EMail Filtering and Client Side Black/White Listing

I periodically get asked why I want a heads up about someone's new email address before I receive any communications from it. The short answer is that I get around 500 spam emails per day across my dozen or so email accounts (that's about 15,200/month or 182,500/year!). While I'm very careful about who gets which email account, the reality is that somewhere a mail server will be compromised (I've personally seen this happen with AT&T and Verizon service) or the recipient will have a virus or spyware bot harvesting addresses from the inbox of a compromised system (far more common, especially in windoze). Once spammers get your address, they sometimes share it amongst themselves, making the problem worse. Sometimes those spammers will spoof your email address (which is trivial to do) and send spam as you to other people. The harvesters for other spammers then pick up your email address and add you to yet another list.

This becomes a vicious cycle, and things quickly get out of hand. Years ago, when I saw an Internet traffic usage chart showing that spam traffic had exceeded porn traffic, I knew this was a major problem. To be fair to the email system, it was designed back in the 1970s, before the Internet existed, when universities were using ARPANET. Things were much calmer and more controlled back then. To be unfair to the standards committees, they should have come up with a working fix by now; most of their ideas have been mediocre at best. I do like TMDA (Tagged Message Delivery Agent), though. It resides on the mail server rather than in the email client. I haven't had a chance to test and implement it, but it looks far better on paper than any of the others.

So what can the average person with a bad spam problem do? The short answer is to set up white listing and black listing.

"White List" is terminology for a filtering list that takes a matching item and moves it somewhere safe or lets it pass through to somewhere safe. The inverse is a "Black List" in which any matching item will get destroyed in some way.

The concept is to move the bad emails to the junk folder, move the good emails to a safe inbox, and let anything that doesn't match fall through to a generic inbox for manual sorting (almost always spam).

If you're going through the trouble of setting up filtering, you might as well add some more organization to this model instead of dumping all emails into one safe (but messy) inbox. The example tree below is loosely based on what I have for my main email client. Yours may be more or less extensive depending on your email usage and needs. The various folders and sub-folders include top categories for my multiple accounts across multiple domains and sub-categories for personal, family, friends, church, school, work, clients, shopping orders, political, registrations & notices (like for message boards), mailing lists (with sub-folders for the office, CERT, VPN, and high availability lists), phone system voice mail paging, network monitoring, firewall monitoring, a generic spam account (usually open to the world), email-based magazines, and other. In short, if there is a commonality between a bunch of emails, create a categorized folder for it.

Example EMail Folder Tree:

MyPersonalDomain.com-Me
\_Inbox-Safe
\_Family
\_Friends
\_Church
\_School
\_VoiceMailPaging
\_Other

MyPersonalDomain.com-GenericSpammableAccount
\_Shopping
\_SiteRegistrationAndNotices
\_Political
\_eMagazinesAndNewsLetters
\_Other

MyWorkDomain.com-Me
\_Inbox-Safe
\_Management
\_Colleagues
\_Clients
\_AutomatedAgentBots
\_SiteRegistrationAndNotices
\_NetworkMonitoring
\_FirewallMonitoring

MyWorkDomain.com-MailingLists
\_OfficeNewsAndAnnouncements
\_CERT
\_VPN
\_HighAvailability

As you can see, when a new mail alert comes in, a tree like this lets you quickly see which category the new email belongs to and keeps it grouped with similar emails.

Each email client is different, so specific instructions for setting up filters aren't really possible in this document. They all follow a generic set of rules and patterns, though (be warned that some clients are more advanced than others). A "filter set" is made up of a series of "filter match rules" that are like the links in a chain (like a metal chain used to secure an object). The first link in the chain is the first filter match rule entry, and so on. If the first link does not match the email item presented, the item passes to the next link until the end of the chain (the end of the filter set) is reached. If the email item fully matches a filter match rule, that email item is acted on by the "filter action" (what the filter is supposed to do with the email, like move it into the Inbox-Safe folder). The end of the chain (the last link) will have either an explicit match rule (made by you) or an implied match rule (invisible but still there) that says what to do with the item if nothing else has already been done. This is called the default filter action. For most email clients, the default is to let the item pass through and do nothing.
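The chain idea can be modeled in a few lines. This is a minimal sketch, not how any particular client works internally: it assumes a message is just a dict of header name to value, and the names `run_chain` and `chain` are hypothetical. Each link is a match test paired with an action; the first link that matches decides the action, and if none match, the default action applies (`None` standing in for "do nothing").

```python
from typing import Callable, Optional

# Simplification for this sketch: a message is a dict of header name -> value.
Message = dict[str, str]
MatchRule = Callable[[Message], bool]


def run_chain(message: Message,
              links: list[tuple[MatchRule, str]],
              default_action: Optional[str] = None) -> Optional[str]:
    """Walk the chain link by link and return the action of the first match.

    If no link matches, fall through to the default action. None means
    "do nothing", which is what most clients do by default.
    """
    for matches, action in links:
        if matches(message):
            return action
    return default_action


# Two example links using placeholder values from this article.
chain = [
    (lambda m: "viagra" in m.get("Subject", "").lower(), "move to junk"),
    (lambda m: m.get("From", "").lower().endswith("@nearby.com"), "move to Inbox-Safe"),
]
```

For example, `run_chain({"Subject": "Cheap VIAGRA now"}, chain)` stops at the first link and returns `"move to junk"`, while a message matching nothing returns whatever `default_action` was given.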

Each filter set can be put into a group, and that group can be put into a table with other filter groups (like a table of contents in a book). They are evaluated from the top down until a matching filter rule is reached and an action is performed.

To reiterate for clarity: one or more "filter match rules" go into a "filter set". The "filter set" has a "filter action" that does something if there is a match. Filter sets can be put into a group container, and multiple group containers can be put into a list like the table of contents in a book. The list is executed from the top down when run. Once again, not all email clients fully support this.

Filter sets have two common usage configurations. There are many others, but they are beyond the scope of this document and not used very often for email filtering.

A filter set can be grouped in the "OR" usage configuration. This basically says: if any filter link in the chain matches the email item, do what the filter action says. Since OR match rules are more generic, they are usually put towards the end of the list of filter sets. Simple OR example with 2 match filters: if I find "this" OR "that", then I do "action".

A filter set can also be grouped in the "AND" usage configuration. This basically says: all filter links in the chain must match the email item before the filter action can be executed. Since AND match rules are more specific, they are usually put towards the beginning of the list of filter sets. This prevents them from being preempted by more general OR filters. Simple AND example with 2 match filters: I must find both "this" AND "that" before I can do "action".
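The difference between the two configurations maps directly onto Python's `any()` and `all()`. The sketch below is illustrative only; `FilterSet` and `contains` are hypothetical names, and it assumes "contains" is a case-insensitive substring test (check what your own client does). Both built-ins stop early, just like a chain that stops once the outcome is decided.

```python
from dataclasses import dataclass
from typing import Callable

Message = dict[str, str]  # header name -> value (simplified)


@dataclass
class FilterSet:
    name: str
    mode: str  # "OR": any rule may match; "AND": every rule must match
    rules: list[Callable[[Message], bool]]
    action: str

    def matches(self, message: Message) -> bool:
        results = (rule(message) for rule in self.rules)
        if self.mode == "OR":
            return any(results)
        if self.mode == "AND":
            return all(results)
        raise ValueError(f"unknown mode: {self.mode}")


def contains(field: str, text: str) -> Callable[[Message], bool]:
    """Case-insensitive 'field contains text' match rule."""
    return lambda m: text.lower() in m.get(field, "").lower()


# The "this" / "that" examples from the text.
or_set = FilterSet("this OR that", "OR",
                   [contains("Subject", "this"), contains("Subject", "that")], "action")
and_set = FilterSet("this AND that", "AND",
                    [contains("Subject", "this"), contains("Subject", "that")], "action")
```

A subject of "only this one" satisfies `or_set` but not `and_set`; "this and that" satisfies both.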

EMail Black Listing. Depending on the complexity of the email client, this will usually be a few filter set groups that are executed before any others (the top of the book's table of contents). Black listing isn't overly effective and is like trying to stop the wind, but it can catch some of the bigger, more annoying stuff. Black lists contain both OR and AND filter set groups.

Example OR Black List. For some reason spammers think I need certain prescription drugs without a doctor's prescription. Others think I need videos to get aroused or maybe another degree. I find them all offensive and want them removed. Many of these spam emails follow the same boring pattern, and we can make use of that weakness. Filter set:

Filter Name: Blatant Junk
if from contains "viagra.com" OR
if subject contains "viagra" OR
if from contains "v_i_a_g_r_a" OR
if subject contains "v_i_a_g_r_a" OR
if subject contains "on pfizer" OR
if subject contains "doctor recommended treatment" OR
if subject contains "xxx free video" OR
if subject contains "you are nominated"
then execute filter action "move to junk"
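As a sketch of what a client does with the OR black list above, the same (field, substring) pairs can be checked against a real parsed message using Python's standard `email` package. The table and function names are hypothetical, and the case-insensitive comparison is an assumption about how "contains" behaves.

```python
from email import message_from_string
from email.message import Message

# (field, substring) pairs copied from the OR black list above.
BLATANT_JUNK = [
    ("From", "viagra.com"),
    ("Subject", "viagra"),
    ("From", "v_i_a_g_r_a"),
    ("Subject", "v_i_a_g_r_a"),
    ("Subject", "on pfizer"),
    ("Subject", "doctor recommended treatment"),
    ("Subject", "xxx free video"),
    ("Subject", "you are nominated"),
]


def is_blatant_junk(msg: Message) -> bool:
    """OR black list: a single (field, substring) hit is enough."""
    for field, needle in BLATANT_JUNK:
        value = str(msg.get(field, ""))  # missing header -> empty string
        if needle.lower() in value.lower():
            return True
    return False


spam = message_from_string(
    "From: deals@spam.example\nSubject: Doctor Recommended Treatment!\n\nbuy now\n")
```

Here `is_blatant_junk(spam)` is true because of the subject line alone, which is exactly the OR behavior: one matching link is enough to trigger "move to junk".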

Example AND Black Lists. Some spammers get creative and try to spoof my own email domain to send me junk. I'm the one who paid for it, and I find this highly offensive. Since AND filters are more specific, they cannot contain everything in one group like the OR filter above. This means there will be multiple AND filters in the top list.

Filter Name: Spoofed Junk 1 (my domain always returns to me)
if from contains "MyPersonalDomain.com" AND
if return-path does not contain "MyPersonalDomain.com"
then execute filter action "move to junk"

Filter Name: Spoofed Junk 2 (I always have my real name in the address)
if from contains "me@MyPersonalDomain.com" AND
if from does not contain "My Real Name"
then execute filter action "move to junk"

Filter Name: Spoofed Junk 3 (I don't send myself high priority email)
if from contains "me@MyPersonalDomain.com" AND
if priority is "high"
then execute filter action "move to junk"

Filter Name: Spoofed Junk 4 (a friend is specifically getting spoofed)
(note that this would be better taken care of in the "OR Black List")
if from contains "MyFriend@somewhere.com" AND
if subject contains "Everyone will be jealous of your sexy body"
then execute filter action "move to junk"
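The first three AND filters above can be sketched the same way; each function is one filter set, and every condition inside it must hold. The constants use the article's placeholder domain and name. How "priority is high" is detected is an assumption: this sketch treats an `X-Priority` of 1 or 2, or `Importance: high`, as high priority. Spoofed Junk 4 is friend-specific and left out.

```python
from email import message_from_string
from email.message import Message

MY_DOMAIN = "mypersonaldomain.com"          # placeholder domain from the examples
MY_ADDRESS = "me@mypersonaldomain.com"
MY_REAL_NAME = "My Real Name"


def header_text(msg: Message, name: str) -> str:
    return str(msg.get(name, ""))


def spoofed_junk_1(msg: Message) -> bool:
    # From claims my domain AND Return-Path does not point back to it.
    return (MY_DOMAIN in header_text(msg, "From").lower()
            and MY_DOMAIN not in header_text(msg, "Return-Path").lower())


def spoofed_junk_2(msg: Message) -> bool:
    # From is my address AND my real name is missing.
    return (MY_ADDRESS in header_text(msg, "From").lower()
            and MY_REAL_NAME not in header_text(msg, "From"))


def spoofed_junk_3(msg: Message) -> bool:
    # From is my address AND the message is flagged high priority (assumed headers).
    x_priority = header_text(msg, "X-Priority").strip()[:1]
    importance = header_text(msg, "Importance").strip().lower()
    high = x_priority in ("1", "2") or importance == "high"
    return MY_ADDRESS in header_text(msg, "From").lower() and high


# Separate AND filter sets, checked one after another.
SPOOF_FILTERS = [spoofed_junk_1, spoofed_junk_2, spoofed_junk_3]


def is_spoofed(msg: Message) -> bool:
    return any(f(msg) for f in SPOOF_FILTERS)
```

`Return-Path` is normally added by the receiving server at final delivery, so the first check only makes sense on mail that has actually been delivered to you.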

Black List Safety. Do not get overly generic or zealous with black list filter match rules. If you do, you could end up deleting important emails from valid sources. I recommend adding new rules slowly and carefully and monitoring whether they actually work. Get some friends to send test emails that should trigger both your black list and your white list rules to make sure they behave as expected.

Example OR White Lists. These go towards the bottom of the filter set list (the "table of contents") since they are the most generic. The white lists are what most people will be interested in, and they are also what sorts emails into categories like those in the tree example above. As you can see, these get very repetitive.

Filter Name: White List: Personal Inbox-Safe
if from contains "someone1@somewhere.com" OR
if from contains "someone2@somewhereelse.com" OR
if from contains "someone3@nearby.com" OR
if from contains "TrustedFriendsDomain.com"
then execute filter action "move to folder Inbox-Safe"

Filter Name: White List: Personal Friends
if from contains "myfriend1@somewhere.com" OR
if from contains "myfriend2@somewhereelse.com" OR
if from contains "myfriend3@nearby.com"
then execute filter action "move to folder Friends"

Filter Name: White List: Personal Family
if from contains "myfamily1@somewhere.com" OR
if from contains "myfamily2@somewhereelse.com" OR
if from contains "myfamily3@nearby.com"
then execute filter action "move to folder Family"

Filter Name: White List: Work Colleagues
if from contains "mycolleague1@somewhere.com" OR
if from contains "mycolleague2@somewhereelse.com" OR
if from contains "mycolleague3@nearby.com"
then execute filter action "move to folder Colleagues"

Filter Name: White List: Work Clients
if from contains "myclient1@somewhere.com" OR
if from contains "myclient2@somewhereelse.com" OR
if from contains "myclient3@nearby.com"
then execute filter action "move to folder Clients"

Filter Name: White List: Work Message Boards
if from contains "boardbot1@somewhere.com" OR
if from contains "announce2@somewhereelse.com" OR
if subject contains "[List Digest]" OR
if from contains "maillist3@nearby.com"
then execute filter action "move to folder SiteRegistrationAndNotices"

Filter Name: White List: Work Mailing List CERT
if subject contains "US-CERT Technical Cyber Security Advisory"
then execute filter action "move to folder CERT"

Filter Name: White List: Work Mailing List VPN
if subject contains "[VPN]"
then execute filter action "move to folder VPN"
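Because the white lists are so repetitive, it can help to think of them as a table of folder to trusted senders. The sketch below is a hypothetical illustration using the placeholder addresses above; `WHITE_LISTS`, `SUBJECT_TAGS`, and `white_list_folder` are made-up names. One deliberate deviation: it compares whole addresses or bare domains instead of a raw "contains" test, so "notmyfriend1@somewhere.com" does not slip through on a partial match.

```python
from email.utils import parseaddr
from typing import Optional

# Folder -> trusted senders (full addresses or bare domains). First hit wins.
WHITE_LISTS = {
    "Inbox-Safe": ["someone1@somewhere.com", "someone2@somewhereelse.com",
                   "someone3@nearby.com", "trustedfriendsdomain.com"],
    "Friends": ["myfriend1@somewhere.com", "myfriend2@somewhereelse.com",
                "myfriend3@nearby.com"],
    "Family": ["myfamily1@somewhere.com", "myfamily2@somewhereelse.com",
               "myfamily3@nearby.com"],
    "Colleagues": ["mycolleague1@somewhere.com", "mycolleague2@somewhereelse.com",
                   "mycolleague3@nearby.com"],
    "Clients": ["myclient1@somewhere.com", "myclient2@somewhereelse.com",
                "myclient3@nearby.com"],
    "SiteRegistrationAndNotices": ["boardbot1@somewhere.com",
                                   "announce2@somewhereelse.com",
                                   "maillist3@nearby.com"],
}

# Subject tag -> folder, for lists that are easier to spot by subject.
SUBJECT_TAGS = {
    "[List Digest]": "SiteRegistrationAndNotices",
    "US-CERT Technical Cyber Security Advisory": "CERT",
    "[VPN]": "VPN",
}


def white_list_folder(from_header: str, subject: str) -> Optional[str]:
    """Return the destination folder for a trusted message, or None."""
    _, address = parseaddr(from_header)   # strips the display name
    address = address.lower()
    domain = address.rpartition("@")[2]
    for folder, entries in WHITE_LISTS.items():
        for entry in entries:
            # Whole address or whole domain must match, not a substring.
            if entry.lower() in (address, domain):
                return folder
    for tag, folder in SUBJECT_TAGS.items():
        if tag.lower() in subject.lower():
            return folder
    return None
```

Adding a new trusted sender then means adding one string to one list, rather than editing a filter rule by hand.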

The Final Default Filter Rule. This is neither black nor white; it simply puts everything not otherwise filtered into a generic, catch-all folder at the very end of all the filter groups. Some email clients want this, others don't. It should be defined per email account so the various accounts don't get jumbled up. If you keep your filter list up to date, everything in this folder should be spam. It should still be given a quick once-over before deleting, though, to make sure something important isn't lost inside it. If something important is found, create a new white list filter for it. This is also why I say "give me a heads up on your new email address" if you happen to get one. It is often hard to find something important in this folder without knowing what it is.

Filter Name: Explicitly Defined Default Rule
if from contains "me@MyPersonalDomain.com" OR
if deliver-to contains "me@MyPersonalDomain.com"
then execute filter action "move to folder MyPersonalDomain.com-Me"
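A per-account default rule like the one above can be sketched as a lookup from account address to catch-all folder. This is a hypothetical illustration: the account table is made up from the example folder tree, and it assumes the "deliver-to" field corresponds to a `Delivered-To` header added by the receiving server.

```python
from typing import Optional

# Hypothetical account address -> catch-all folder, one entry per account
# so unfiltered mail from different accounts does not get jumbled together.
ACCOUNT_DEFAULTS = {
    "me@mypersonaldomain.com": "MyPersonalDomain.com-Me",
    "me@myworkdomain.com": "MyWorkDomain.com-Me",
}


def default_folder(headers: dict[str, str]) -> Optional[str]:
    """Explicit default rule: From OR Delivered-To names one of my accounts."""
    sender = headers.get("From", "").lower()
    delivered_to = headers.get("Delivered-To", "").lower()
    for address, folder in ACCOUNT_DEFAULTS.items():
        if address in sender or address in delivered_to:
            return folder
    return None  # implied default: leave the message where it is
```

Spam delivered to the personal account lands in "MyPersonalDomain.com-Me", while anything that names none of the accounts falls through untouched.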

Even with all this filtering, spammers will still try to find ways of breaking through and getting into your white list folders. There really isn't much that can be done about these except to delete them manually or create another black list rule if there is a consistent, common pattern.

A little more on filters and clients. Depending on the email client, the filter match rules may only be attachable globally to all the email accounts the client watches, they may have to be duplicated for each email account, or there may be a mix of both. You'll have to use your head when figuring out your email setup. For example, the free Thunderbird email client attaches filters by email account and has no global option (a pain for black listing). It does have good grouping, though. The various web mail services only cover one email account, so everything will be global with no grouping.
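When a client only supports per-account filters, one way to think about it is to keep a single shared black list and put a copy in front of each account's own chain, so the copies cannot drift apart. The sketch below models that idea; all names are hypothetical, and each account's default rule is simplified to an unconditional catch-all.

```python
from typing import Callable, Optional

Message = dict[str, str]                     # header name -> value (simplified)
Filter = Callable[[Message], Optional[str]]  # returns a folder, or None for no match


def subject_contains(text: str, folder: str) -> Filter:
    return lambda m: folder if text.lower() in m.get("Subject", "").lower() else None


def sender_is(address: str, folder: str) -> Filter:
    return lambda m: folder if address.lower() in m.get("From", "").lower() else None


# Maintained once, shared by every account.
GLOBAL_BLACK_LIST: list[Filter] = [
    subject_contains("viagra", "Junk"),
    subject_contains("you are nominated", "Junk"),
]

# Per-account white lists, each ending with its own catch-all default.
PER_ACCOUNT: dict[str, list[Filter]] = {
    "MyPersonalDomain.com-Me": [
        sender_is("myfriend1@somewhere.com", "Friends"),
        lambda m: "MyPersonalDomain.com-Me",
    ],
    "MyWorkDomain.com-Me": [
        sender_is("myclient1@somewhere.com", "Clients"),
        lambda m: "MyWorkDomain.com-Me",
    ],
}


def route(account: str, message: Message) -> Optional[str]:
    # The shared black list runs first, then this account's own chain.
    for rule in GLOBAL_BLACK_LIST + PER_ACCOUNT[account]:
        folder = rule(message)
        if folder is not None:
            return folder
    return None
```

In a client without global filters you would still copy the black list rules into each account by hand; the point of the model is simply that there is one source list to copy from.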

EMail Client Spam Filters. Some email clients come with these. If yours does, use it. They make a great additional layer on top of the filtering setup. Just keep in mind that they are not perfect and will let unknown spam through. They also have to be trained on what is good and bad. Spam filters can greatly cut down the trash in the folders that the default filters dump into, which makes important emails that slipped past the filter chains far easier to find and correct.
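To show what "training" means, here is a toy statistical classifier in the naive Bayes style that many junk filters loosely resemble. It is an illustration under simplifying assumptions (crude word tokens, add-one smoothing, equal treatment of all headers and body text) and is not the algorithm of any specific email client; `TinyJunkFilter` is a hypothetical name.

```python
import math
import re
from collections import Counter


class TinyJunkFilter:
    """Toy naive Bayes junk classifier, for illustration only."""

    LABELS = ("junk", "good")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokens(text: str) -> list[str]:
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text: str, label: str) -> None:
        """Mark one message as "junk" or "good" (the training step)."""
        self.word_counts[label].update(self.tokens(text))
        self.message_counts[label] += 1

    def junk_probability(self, text: str) -> float:
        """Estimate P(junk | text) with add-one (Laplace) smoothing."""
        vocab_size = len(set(self.word_counts["junk"]) | set(self.word_counts["good"]))
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in self.LABELS:
            total_words = sum(self.word_counts[label].values())
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            for word in self.tokens(text):
                # +1 in the denominator reserves a slot for never-seen words.
                score += math.log((self.word_counts[label][word] + 1)
                                  / (total_words + vocab_size + 1))
            log_score[label] = score
        diff = log_score["good"] - log_score["junk"]
        diff = max(-700.0, min(700.0, diff))  # keep math.exp from overflowing
        return 1.0 / (1.0 + math.exp(diff))
```

Each message you mark as junk or not junk updates the word counts, which is why these filters get noticeably better after a few weeks of consistent marking.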

In summary, email filtering may sound very complicated, but it isn't. It is just tedious to set up. If you want to try it but still find it daunting, I recommend creating a simple folder tree for the categories (like the example tree above), adding a few white list rules at a time, and watching the results. White lists won't delete anything, but black lists will and are less safe to learn with. Have some friends send you test emails if you don't want to wait for regular emails.