Bayes Filtering in SpamAssassin

SpamAssassin’s Bayesian classifier began scoring emails a few days ago. I found out because, even though messages were not being marked as spam, my procmail rule started diverting all of them to my spam folder. The old rule was not particular about where “Yes” appeared in the X-Spam-Status header. Procmail matches case-insensitively by default, and the BAYES test names now listed in that header contain “yes”, so every email looked like spam. The new rule only looks for “Yes” at the beginning of the header value.

# Old Rule
:0H:
* ^X-Spam-Status:.*Yes
$MAIL/spam
# New Rule
:0H:
* ^X-Spam-Status: Yes
$MAIL/spam
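To see why the old condition caught everything, the two rules can be approximated with Python regular expressions. This is a minimal sketch that assumes procmail’s default case-insensitive matching (procmail only distinguishes case when the D flag is given). The header lines are made-up examples, not copies of real messages.

```python
import re

# Python approximations of the two procmail conditions. Procmail ignores
# case unless the D flag is set, hence re.IGNORECASE.
OLD_RULE = re.compile(r"^X-Spam-Status:.*Yes", re.IGNORECASE)
NEW_RULE = re.compile(r"^X-Spam-Status: Yes", re.IGNORECASE)

# Hypothetical header lines; the exact test list varies by installation.
HAM = "X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00"
SPAM = "X-Spam-Status: Yes, score=9.1 required=5.0 tests=BAYES_99"

for label, header in (("ham", HAM), ("spam", SPAM)):
    old = bool(OLD_RULE.search(header))
    new = bool(NEW_RULE.search(header))
    print(f"{label}: old rule matches={old}, new rule matches={new}")
```

Running it shows that the old rule matches both headers, because “.*Yes” finds the “YES” inside BAYES_00. The anchored new rule matches only the spam header.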

Now incoming spam messages contain an additional Bayes line in the spam report:

X-Spam-Report:
        *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
        *      [score: 1.0000]

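I was surprised that it took the Bayes filter three months to gather enough email to begin scoring incoming messages. By default, SpamAssassin’s Bayes classifier does not contribute to scores until it has learned at least 200 spam and 200 ham messages (the bayes_min_spam_num and bayes_min_ham_num settings). It is a nice addition because it raises the scores of spam messages enough that more of them get marked as spam.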

AT&T contracts are meaningless

I had a great experience on the phone with an AT&T representative when I called to transfer my number from an existing corporate plan to my own personal plan. I selected my options, and the representative explained to me that because it was a transfer, I would get a standard zero-month contract. He explained to me that I was free to change my service or transfer my number to another provider whenever I wanted to. This all went through successfully, and I was set.

Two weeks later, I noticed that my account had been slapped with an 18-month contract. When I called, AT&T had no idea why I would have been told I had no contract, and they said that the standard contract for a transfer of service is 11 months. They told me that the first representative I spoke with was mistaken and that there was no record of my zero-month contract. Luckily, I got my contract reduced to 11 months; however, that doesn’t explain what happened to the zero-month contract.

Edit: 11 months is the standard length for a transfer of service. However, I did not agree to that over the phone. If they can slap me with a contract without any change in service and without my acceptance, their word and their contracts are meaningless.

Pandora Alternatives

When Pandora announced a 40-hour cap on free listening, I decided to investigate some alternatives. (I was probably listening to 40 hours of music on Pandora in a week!) I quickly found Slacker, which provides web-based music streaming much like Pandora. I have found that Slacker offers a tighter selection of music and that its music discovery is not quite as good, probably because it is not based on the Music Genome Project. However, much like Pandora, its selection has improved the longer I have listened. Slacker’s commercials are more radio-like and therefore more intrusive; however, there is no cap, so it’s hard to complain. Since my music isn’t always on the machine I am using, I have also been taking advantage of my Subsonic installation to listen to music.

Both Pandora and Slacker have mobile clients available for the BlackBerry. I have both installed, and both work well over 3G and Wi-Fi. I don’t use the mobile clients very often, but when I want some music and am not at my computer, it’s convenient to be able to play it right from my BlackBerry. Subsonic is supposed to stream to mobile devices, but I have yet to get that working.

Another benefit is that the LastFM Firefox Extension supports both Pandora and Slacker, so I can continue “scrobbling” my tracks to Last.fm. Subsonic has built-in support for Last.fm “scrobbling.”

Edit: Recently, I have started listening to my “library” on Last.fm. This feature plays songs that I have already listened to. I am not sure how Last.fm selects songs, but I assume it is based on what I have listened to most often. Since I have “scrobbled” a large number of songs, Last.fm does a decent job of picking songs for me. It’s still not as good as Pandora, and I imagine it would do much worse with a smaller set of “scrobbled” songs.

BlackBerry Bold Tethering on Windows XP

I adapted and simplified the directions for tethering available throughout this forum thread at CrackBerry.com. My directions are specific to AT&T in the United States.

Configuration:

Install the latest version of BlackBerry Desktop Manager.
Open the “Phone and Modem Options” control panel.
Click the “Modems” tab.
Select “Standard Modem” and click on “Properties”.
Click the “Advanced” tab.
In the “Extra initialization commands:” field, enter the following:

AT+cgdcont=1,"IP","wap.cingular"

Click “OK” twice.

Open “Network Connections”.
Open the “New Connection Wizard”.
Click “Next”.
Select the “Connect to the Internet” option, and click “Next”.
Select “Connect using a dial-up modem” and click “Next”.
In the “ISP Name” field, type “BlackBerry”, and click “Next”.
In the “Phone number:” field, type “*99#”, and click “Next”.
Select “Anyone’s use”, and click “Next”.
Leave the user name and password fields blank, uncheck “Make this the default Internet connection”, and click “Next”.
Click “Finish”.
Click “Cancel”.
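The extra initialization string defines a packet data (PDP) context for the modem with three parameters: context ID 1, packet type IP, and the access point name wap.cingular. As a sketch of how those parameters fit together, the hypothetical helper below builds the same command. The function name is made up for illustration. The output uses uppercase, as the command usually appears in modem documentation, and modems generally accept either case.

```python
def cgdcont(cid: int, pdp_type: str, apn: str) -> str:
    """Build an AT+CGDCONT command that defines a PDP context.

    Parameter order: context ID, PDP type, access point name (APN).
    """
    if cid < 1:
        raise ValueError("PDP context IDs start at 1")
    return f'AT+CGDCONT={cid},"{pdp_type}","{apn}"'


# Context 1, IP packets, and the AT&T APN used in the steps above.
print(cgdcont(1, "IP", "wap.cingular"))
```

For another carrier, only the APN argument should need to change. Check your carrier’s documentation for the correct value.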

To connect:

Connect the BlackBerry Bold to the computer using a USB cable.
Open the BlackBerry Desktop Manager.
Open “Network Connections”.
Open the “BlackBerry” network connection.
Click “Dial”.

You should now be connected. When you are done, go ahead and disconnect.

BlackBerry Bold Tethering on Mac OS X

This forum post at BlackBerryForums.com has all of the details necessary to set up tethering for a BlackBerry Bold on Mac OS X.

I set up tethering quite a while ago. I then updated my Bold firmware not realizing that it would break tethering. After some digging, I found that the solution is to install the BlackBerry Bold PPPD Replacement. This solution is mentioned on the last page of the same BlackBerryForums.com post and also on this CrackBerry.com forum post.

Now tethering works correctly again, and I still have the benefits of the upgraded firmware on my Bold.

Update: BlackBerryForums.com has removed the old post for tethering on Mac OS X. This post appears to be a suitable replacement, but I have not tried these exact instructions.

WordPress Upgrade Problems

I decided to upgrade WordPress to version 2.8 using the Automatic Upgrade Tool. The upgrade looked something like this:

Downloading update from http://wordpress.org/wordpress-2.8.zip
Unpacking the core update
Verifying the unpacked files
Installing the latest version
Warning: copy(/home/www/blog/wp-content/themes/default/index.php) [function.copy]: failed to open stream: Permission denied in /home/www/blog/wp-admin/includes/class-wp-filesystem-direct.php on line 122
Warning: copy(/home/www/blog/wp-content/themes/default/index.php) [function.copy]: failed to open stream: Permission denied in /home/www/blog/wp-admin/includes/class-wp-filesystem-direct.php on line 122

After that, my blog wouldn’t load at all, and when I looked on my server, the entire blog directory was empty. I recognized the file in these warnings as one of the two I had changed in this post about displaying the author. It turns out that I had inadvertently set the file’s owner to root, which left the WordPress upgrade without permission to overwrite it. Instead of failing gracefully, the upgrade simply deleted the entire blog directory.
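A single file that the web server could not overwrite was enough to derail the upgrade, so it may be worth checking ownership before running the Automatic Upgrade Tool. The Python sketch below assumes a Unix server where the web server runs as one user. The user name www-data (the Apache user on Ubuntu and Debian) is an assumption about your setup, and files_not_owned_by is a hypothetical helper, not part of WordPress.

```python
import os
import pwd
from typing import List, Union


def files_not_owned_by(root: str, owner: Union[int, str]) -> List[str]:
    """Return paths under root (files and directories) not owned by owner.

    owner may be a numeric UID or a user name such as "www-data".
    """
    uid = owner if isinstance(owner, int) else pwd.getpwnam(owner).pw_uid
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat reports a symlink's own owner instead of following it.
            if os.lstat(path).st_uid != uid:
                offenders.append(path)
    return sorted(offenders)
```

For example, files_not_owned_by("/home/www/blog", "www-data") would list anything under the blog directory that the upgrade might fail to overwrite. Ownership is only a proxy, though: group or world write permissions could still let the upgrade succeed on some of those files.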

It appears this bug has been addressed in this ticket. Hopefully a similar error won’t cause me any problems during my next upgrade.

Edit: It turns out the damage was a bit greater than I initially realized. This bug deleted almost every file that was owned by the Apache user. This included two wikis that I run and several other miscellaneous sites. Luckily I was able to restore everything from backups. I have also changed the ownership of many of the files to something other than the Apache user.

BlackBerry Bold Firmware Upgrade

I upgraded the firmware on my BlackBerry Bold 9000 from the default 4.6.0.167 to version 4.6.0.266 using the instructions available on the CrackBerry.com forums. I followed the directions, and the flash succeeded. Most of my settings were restored, although I had to re-enter user names and passwords for the YouMail client and the Google Apps sync client. The Pandora client, however, retained its credentials.

I had decided to upgrade my firmware because I had begun experiencing a lot of dropped calls and data connectivity issues. I often had to browse to a website or turn the radio off and on to start receiving email again, and my Bold caused my speakers to hum constantly as it toggled between 3G and EDGE networks. After the firmware upgrade, the data connectivity issues and the speaker hum have gone away, but the phone still drops calls more often than it should. Many people had reported a battery life improvement relative to the 167 firmware. I didn’t notice this initially; however, when I updated the YouMail client on my BlackBerry, I discovered a new option to disable polling for new voice mails. Once I disabled that, my battery life improved substantially. After the update, I was also able to connect to Marquette University’s wireless network from my phone. It’s possible I had done something wrong in the past; however, I suspect the new firmware fixed a small bug that made the phone incompatible with the wireless network on campus.

Despite AT&T’s delay in releasing updates for the Bold, I recommend this update if you are experiencing any problems with the current version of the firmware.

Edit: After using the new firmware for a while, I noticed the annoyance of the Visual Voice Mail icon in the application switcher. Since I cannot quit the application, I decided to find a way to remove it. A forum post on PinStack.com has the solution:

Another option is to simply remove all the vvm cod files from the java folder (7 of them) and then start up dm or apploader and run the through the process. It should tell you that it doesn’t recognize those files and remove them.

java folder is located: C:\Program Files\Common Files\Research In Motion\Shared\Loader Files\9000-v4.6.0.247_P4.0.0.206

Of course, the version number in my folder was different, but once I removed the files and ran the loader again, the icon disappeared.
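Instead of deleting the .cod files outright, you can move them aside so they are easy to restore. This Python sketch assumes the Visual Voice Mail files can be picked out with a glob pattern such as *vvm*.cod. That pattern is a guess, so check the actual file names in your own loader folder first. move_matching_files is a hypothetical helper.

```python
import shutil
from pathlib import Path
from typing import List


def move_matching_files(folder: str, pattern: str, backup_dir: str) -> List[str]:
    """Move files in folder that match a glob pattern into backup_dir.

    Returns the names of the moved files. Moving instead of deleting
    keeps the originals around in case they are needed later.
    """
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(folder).glob(pattern)):
        if path.is_file():
            shutil.move(str(path), str(dest / path.name))
            moved.append(path.name)
    return moved
```

You would call it with your own version folder in place of the one quoted above, for example move_matching_files(r"C:\Program Files\Common Files\Research In Motion\Shared\Loader Files\9000-v4.6.0.247_P4.0.0.206", "*vvm*.cod", r"C:\vvm-backup"). Afterwards, run Desktop Manager or the application loader as the forum post describes.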

Show the Author in WordPress

I had added this code to the default WordPress theme a while back, but upgrading WordPress apparently cleared it out. This time I documented the changes I made to the default theme. The changes go in two files in “wp-content/themes/default”, the default theme directory.

The first file is “index.php”. It only requires removing the HTML comment markers around the “the_author()” portion.

<small><?php the_time('F jS, Y') ?> <!-- by <?php the_author() ?> --></small>

The second file is “single.php” and requires the addition of a “the_author()” block similar to the following.

on <?php the_time('l, F jS, Y') ?> at <?php the_time() ?> by <?php the_author() ?>

I achieved these changes with the following sed commands.

cd wp-content/themes/default
mv index.php index.php.default
sed 's/<!-- by <?php the_author() ?> -->/by <?php the_author() ?>/' index.php.default > index.php
mv single.php single.php.default
sed "s/on <?php the_time('l, F jS, Y') ?> at <?php the_time() ?>/on <?php the_time('l, F jS, Y') ?> at <?php the_time() ?> by <?php the_author() ?>/" single.php.default > single.php
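The same edits can also be made as plain string replacements. This avoids sed’s quoting rules and makes it easy to skip files that are already patched. The following is a Python sketch of that alternative, not the method I used, and patch_text and patch_file are hypothetical names. One design choice is worth noting. Rewriting the existing file in place keeps its current owner. The mv-and-redirect approach above creates new index.php and single.php files owned by whoever ran sed, which is how a theme file can end up owned by root.

```python
from pathlib import Path

# The exact strings the two sed commands match and substitute.
INDEX_OLD = "<!-- by <?php the_author() ?> -->"
INDEX_NEW = "by <?php the_author() ?>"
SINGLE_OLD = "on <?php the_time('l, F jS, Y') ?> at <?php the_time() ?>"
SINGLE_NEW = SINGLE_OLD + " by <?php the_author() ?>"


def patch_text(text: str, old: str, new: str) -> str:
    """Replace every occurrence of old with new, skipping patched text."""
    if old not in text:
        return text
    # When new contains old (as for single.php), a second run would append
    # another "by ..." unless we notice the patch is already there.
    if old in new and new in text:
        return text
    return text.replace(old, new)


def patch_file(path: Path, old: str, new: str) -> bool:
    """Patch path in place, first saving the original as <name>.default."""
    text = path.read_text(encoding="utf-8")
    patched = patch_text(text, old, new)
    if patched == text:
        return False
    path.with_name(path.name + ".default").write_text(text, encoding="utf-8")
    # Writing to the existing file keeps its owner; only the backup is new.
    path.write_text(patched, encoding="utf-8")
    return True
```

From the WordPress root, you would run patch_file(Path("wp-content/themes/default/index.php"), INDEX_OLD, INDEX_NEW) and the equivalent call for single.php with SINGLE_OLD and SINGLE_NEW. Unlike the sed commands, which change only the first match on each line, str.replace changes every match.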

The better solution is probably to find a theme that does this by default instead of enabling it every time the theme gets updated; however, I haven’t gone looking for a replacement theme yet.

This has been tested with WordPress 2.7.1.

Edit: It is advisable to then set the ownership of those two files back to the Apache user (www-data on Ubuntu):

chown www-data:www-data index.php single.php

This will help prevent possible problems during an upgrade.

Special Characters and MediaWikis Do Not Mix

I have run a MediaWiki wiki for my house on campus for close to two years. When I set it up, I gave it the name “Saint Claude de La Colombière Men’s Catholic House,” and since MediaWiki accepted this name complete with two accented characters, I assumed it wasn’t a problem. For an unknown reason, it became a problem today. It started throwing the annoying “Fatal error: Allowed memory size of 20971520 bytes exhausted (tried to allocate 122880 bytes) in /home/www/claude/w/includes/Revision.php on line 361” errors. I tried increasing the memory allocated to PHP, looking through Apache log files, and upgrading the version of the MediaWiki software.

After at least an hour of frustration accumulated throughout the day, it occurred to me that this only happened on one page: the “About” page. In standard MediaWiki fashion, this page is actually called “Saint Claude de La Colombière Men’s Catholic House: About.” Then it hit me that perhaps those accented characters weren’t such a good idea after all. As soon as I changed the value in LocalSettings.php to something more reasonable and accent-free:

$wgSitename = "Saint Claude de La Colombiere Men's Catholic House";

everything started working again as normal.
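If you want to keep a name like this but drop the accents, Unicode normalization can do the work instead of editing by hand. This is a minimal Python sketch. NFKD decomposition splits “è” into a plain “e” followed by a combining grave accent, and dropping the combining characters leaves the unaccented letter. The curly apostrophe in “Men’s” is not a combining mark, so it passes through unchanged and would need its own replacement.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after NFKD decomposition (for example, è -> e)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Saint Claude de La Colombière"))
# prints: Saint Claude de La Colombiere
```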

I would be very curious to know why this worked in the past. It could have to do with my versions of Ubuntu Server, Apache, PHP, MediaWiki, etc. Either way, it’s one more thing to check when PHP starts throwing out-of-memory errors.

Edit: It turns out that increasing PHP’s memory limit was the real solution: today I started seeing the same error on other pages. I increased the memory limit in LocalSettings.php from 20 MB to 32 MB:

ini_set( 'memory_limit', '32M' );

This (actually) solved the problem. Before, I had been editing the php.ini file, and it appears that the ini_set() call in LocalSettings.php was overriding the global value even though the php.ini value was higher. Special characters may still be a bad idea.
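The number in the original error message is a useful clue. 20971520 bytes is exactly 20 × 1024 × 1024, which matches the 20 MB limit set in LocalSettings.php rather than the higher value in php.ini. The Python sketch below converts PHP’s shorthand size notation to bytes so that values like these can be compared. PHP reads the K, M, and G suffixes, in either case, as multiples of 1024. The function is a hypothetical helper, not part of PHP or MediaWiki.

```python
def php_shorthand_to_bytes(value: str) -> int:
    """Convert a PHP ini size such as "32M" to a number of bytes.

    K, M, and G (in either case) are multiples of 1024; a value without
    a suffix is already in bytes.
    """
    value = value.strip()
    if not value:
        raise ValueError("empty size")
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = value[-1].lower()
    if suffix in multipliers:
        return int(value[:-1]) * multipliers[suffix]
    return int(value)


print(php_shorthand_to_bytes("20M"))  # 20971520, the limit in the error message
print(php_shorthand_to_bytes("32M"))  # 33554432
```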

URIBL SpamAssassin Settings

A lot of legitimate emails I receive contain web links that are getting them marked as spam. According to URIBL.com, its lists contain domains that appear in links inside spam, not the hosts where the spam originates. Therefore, about all I can do is whitelist the senders or dial down the scores of the rules for these lists. After adding a handful of senders to the whitelist, I decided to adjust the rules.

I found all of the URIBL rules in the /usr/share/spamassassin/50_scores.cf file. I copied them to the /etc/spamassassin/local.cf file where I could change the values to something more reasonable:

score URIBL_AB_SURBL 0 0.800 0 0.900 # n=0 n=2
score URIBL_JP_SURBL 0 1.400 0 0.700 # n=0 n=2
score URIBL_OB_SURBL 0 1.000 0 0.700 # n=0 n=2
score URIBL_PH_SURBL 0 1.000 0 0.800 # n=0 n=2
score URIBL_RHS_DOB 0 0.400 0 0.500 # n=0 n=2
score URIBL_SBL 0 1.200 0 0.700 # n=0 n=2
score URIBL_SC_SURBL 0 1.200 0 0.200 # n=0 n=2
score URIBL_WS_SURBL 0 1.000 0 0.700 # n=0 n=2
score URIBL_BLACK 0 0.900 0 0.900 # n=0 n=2
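Each of these lines carries four scores because SpamAssassin picks one of four score sets, depending on whether network tests and the Bayes classifier are enabled. The URIBL rules are network tests, which is why the values for sets 0 and 2 are zero. The hypothetical Python helper below sketches that selection logic: it reads a score line and returns the value that would apply. Since Bayes is now active on my server and the URIBL network tests are firing, the fourth value is presumably the one that counts.

```python
def active_score(line: str, use_network: bool, use_bayes: bool) -> float:
    """Pick the score SpamAssassin would apply from a "score" line.

    Score sets: 0 = no network tests, no Bayes; 1 = network, no Bayes;
    2 = Bayes, no network; 3 = network and Bayes.
    """
    fields = line.split("#", 1)[0].split()  # ignore trailing comments
    if len(fields) < 3 or fields[0] != "score":
        raise ValueError(f"not a score line: {line!r}")
    values = [float(v) for v in fields[2:]]
    if len(values) == 1:
        return values[0]  # a single score applies to every set
    if len(values) != 4:
        raise ValueError("expected one or four score values")
    score_set = (1 if use_network else 0) + (2 if use_bayes else 0)
    return values[score_set]


line = "score URIBL_JP_SURBL 0 1.400 0 0.700 # n=0 n=2"
print(active_score(line, use_network=True, use_bayes=True))   # 0.7
print(active_score(line, use_network=True, use_bayes=False))  # 1.4
```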

Hopefully, lowering the scores for these rules will reduce the false positives I have been getting.