Spam, Spam, Spam, Spam...

From Wikipedia: "Spam is the use of electronic messaging systems (including most broadcast media, digital delivery systems) to send unsolicited bulk messages indiscriminately. While the most widely recognized form of spam is e-mail spam, the term is applied to similar abuses in other media: instant messaging spam, Usenet newsgroup spam, Web search engine spam, spam in blogs, wiki spam, online classified ads spam, mobile phone messaging spam, Internet forum spam, junk fax transmissions, social networking spam, television advertising and file sharing network spam."

It's everywhere, and seemingly impossible to avoid. Over 95% of all email is now spam of one sort or another, with over 100 billion spam messages being sent worldwide every day. The average email account receives over 400 of these irritating (and sometimes dangerous) messages per day. Just about every method has been tried to stop (or at least slow down) the onslaught, and yet it persists.

However, there are some simple things that can be done to keep your own inbox relatively spam-free. There are three points at which you can stop spam. The first, of course, is to prevent the spammers from getting ahold of your email address in the first place. Removing your email address from your website (or using phonetics and spaces to spell it out, like "cgaba (at) brainwrap (dot) com") can help a little bit with this, but it has the downside of making it more difficult for your customers to actually get ahold of you, so I'm not sure how useful that is.
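If you do want to spell out an address this way across a whole website, it's easy to automate. Here's a minimal Python sketch; the function name obfuscate is made up for illustration, and it only handles the simple "(at)" and "(dot)" substitution shown above:

```python
def obfuscate(address: str) -> str:
    """Spell out '@' and the dots in the domain so simple harvesters miss it.

    Hypothetical helper for illustration only.
    """
    local, _, domain = address.partition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # Only the domain's dots are spelled out; the part before '@' is left alone.
    return f"{local} (at) {domain.replace('.', ' (dot) ')}"


print(obfuscate("cgaba@brainwrap.com"))  # cgaba (at) brainwrap (dot) com
```

Keep in mind that a determined harvester can reverse a substitution this simple, so treat it as a speed bump rather than real protection.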

A better suggestion is to create a special "junk" email address to use just for things like posting on public message boards or newsgroups, or even to create multiple email accounts for different functions (one for personal use, one for work, one for buying stuff online, etc.).

The second stage of spam filtering is done at the server level--that is, before the mail even gets to your inbox. Any decent email hosting service (usually the same company providing the website hosting for your domain name) should have some sort of server-side spam filtering system available for your use. For those of you using Brainwrap's hosting service, there's a powerful--and customizable--server-side spam filter called, appropriately enough, Spam Assassin. If you're not already using it, here's how to set it up:

  1. Log into your web control panel at https://www.yourdomain.com:8443 using your full email address & email password.
  2. On the left-hand side, click "Home"
  3. Under "Tools", click "Spam Filter"
  4. Make sure that Spam Assassin is running--the only icon on the screen should be a lever with the label "Disable" (which means that it's currently enabled, of course)
  5. On this screen you'll see 4 tabs across the top (Preferences, Black List, White List and Training).
    • "Preferences" (the current tab) turns the filter on and off, lets you set how aggressive you want the filter to be, and lets you decide what to do with email that it thinks is spam (either add a special code to the subject line or delete it outright if you're confident that it's not catching any "false positives")
    • The "Black List" tab lets you add email addresses that are constantly sending you spam; mail coming from any addresses listed here will be marked as spam no matter what. Unfortunately, most spammers use randomly made-up, phony "from" addresses anyway, so this isn't terribly useful, but could come in handy in certain situations.
    • The "White List" tab, on the other hand, is very useful. This does the opposite of the black list--email coming from any addresses listed here will not be marked as spam no matter what. This is a good way of preventing the filter from accidentally mistaking mail from your business associates, friends, family, etc. as spam.
    • Finally, the "Training" tab is the most important one to use. This is how you train your individual spam filter to become more accurate:
      • You should see a list of whatever email is still sitting on your server. The number of messages will vary depending on whether you have your email application set up to delete them from the server immediately after downloading, or to leave them on the server for a period of time, so you may have hundreds of messages listed or just a handful (or, possibly, none at all, in which case you'll have to revisit this screen at a later time)
      • The message list will include the Subject, the From address, and the Date Sent. 99% of the time, you should be able to tell whether the message is legitimate mail or not from this information alone.
      • To the left, you'll also see 2 more columns. The "T" column lets you know whether that message has already been trained and, if so, what the filter currently thinks its status should be. A blank circle means it's untrained; a green circle with a checkmark means that Spam Assassin thinks that it's a legitimate message; and a red circle with an exclamation mark means that the system thinks that it's spam.
      • Review each message. Leave the correctly-identified ones alone. Check off the spam messages that are incorrectly marked as legitimate. Then, click the "It's Spam!" link at the top. After taking a moment to process the corrections, the screen will refresh with the updated statuses.
      • Then, do the same thing for the legitimate messages that are incorrectly identified as spam. This time, click the "It's Not Spam!" link at the top.
      • Don't forget to check off the uncategorized messages (blank circle) as well, marking each one as spam or not spam as appropriate.
      • Voila! You've just made Spam Assassin that much more accurate.
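
To make the interaction between the White List, the Black List and the Preferences settings a little more concrete, here's a rough Python sketch of the decision described above. This is not Spam Assassin's actual code. The function name handle_message, the numeric score and threshold, the subject tag text, and the choice to check the White List before the Black List are all assumptions made for illustration:

```python
def handle_message(sender, subject, score, *, threshold=5.0,
                   white_list=(), black_list=(), delete_spam=False,
                   tag="*****SPAM*****"):
    """Return the subject line to deliver, or None if the message is deleted.

    Hypothetical model of the settings described above, not the real filter.
    """
    sender = sender.lower()
    if sender in white_list:
        is_spam = False               # never marked as spam, no matter what
    elif sender in black_list:
        is_spam = True                # always marked as spam, no matter what
    else:
        is_spam = score >= threshold  # a lower threshold is more aggressive

    if not is_spam:
        return subject
    if delete_spam:
        return None                   # "delete it outright"
    return f"{tag} {subject}"         # "add a special code to the subject line"


friends = {"friend@example.com"}
print(handle_message("friend@example.com", "Lunch?", 9.1, white_list=friends))
# Lunch?
```

In this model, training doesn't touch the lists at all; it would only change the score that the filter assigns to mail from everyone else.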

I know the above steps sound tedious, but they actually go quite quickly once you get the hang of them, and after doing a few "training sessions" with several batches of email, you should start to see the filter become more and more accurate. For instance, in my case, I've received almost 630,000 emails in the past 2.5 years--an average of nearly 700 per day. Of those, Spam Assassin has filtered out 92% (almost 580,000 messages) before they ever got to my inbox.

Of course, that still leaves nearly 50,000 messages that did make it through to my inbox, most of which are still spam. Even so, I've cut down the number of messages reaching my inbox from nearly 700 per day to fewer than 60--not too shabby!
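
If you'd like to run the same numbers for your own mailbox, the arithmetic is simple. Here's a small Python sketch using my rounded totals from above; the variable names are just for illustration, and a year is assumed to be 365 days:

```python
total_received = 630_000  # messages received over the period (rounded)
filtered_out = 580_000    # messages stopped at the server (rounded)
days = 2.5 * 365          # 2.5 years, assuming 365 days per year

reached_inbox = total_received - filtered_out
filter_rate = filtered_out / total_received
per_day_before = total_received / days
per_day_after = reached_inbox / days

print(f"Reached the inbox: {reached_inbox:,}")        # 50,000
print(f"Filtered at the server: {filter_rate:.0%}")   # 92%
print(f"Per day before filtering: {per_day_before:.0f}")  # 690
print(f"Per day after filtering: {per_day_after:.0f}")    # 55
```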

So, what about the third line of defense, the client-side spam filter? Well, this essay is getting a bit long, so I'll save that for part two...