Email Relay Detection

Please pardon the basic webpage. I hope to someday add more information, including a summary of the many papers I have read on spam.

Introduction

Email Relay Detection is a preliminary finding. It uses header analysis and DNS lookups to determine whether an email was relayed, that is, whether it came from a server other than the one that should have sent it. Most spam is relayed through open relays or botnets, so relay detection can be used to detect and filter spam. One of its big advantages is that the analysis can be performed very quickly.
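
As a rough illustration of the idea, the following Python sketch reads the HELO name and connecting IP address from one Received header and checks whether the name resolves to that address. It is not the algorithm from the paper: the header pattern, the function names (parse_received, resolve_ipv4, looks_relayed), and the choice to treat unparseable headers as suspicious are all assumptions made for this example.

```python
import re
import socket

# Matches the common "from HELO (rdns [ip])" shape of a Received header.
# Real headers vary widely; this pattern is an illustrative assumption.
RECEIVED_RE = re.compile(
    r"from\s+(?P<helo>\S+)\s+\((?:(?P<rdns>[^\s\[\]]+)\s+)?\[(?P<ip>[0-9.]+)\]\)"
)


def parse_received(header_value):
    """Return (helo, ip) from a Received header value, or None if unparseable."""
    match = RECEIVED_RE.search(header_value)
    if match is None:
        return None
    return match.group("helo"), match.group("ip")


def resolve_ipv4(hostname):
    """Forward DNS lookup; returns a set of IPv4 strings (empty on failure)."""
    try:
        return set(socket.gethostbyname_ex(hostname)[2])
    except OSError:
        return set()


def looks_relayed(header_value, resolver=resolve_ipv4):
    """Hypothetical test: the HELO name does not resolve to the connecting IP."""
    parsed = parse_received(header_value)
    if parsed is None:
        return True  # assumption: an unparseable header is treated as suspicious
    helo, ip = parsed
    return ip not in resolver(helo)
```

The resolver is passed in as a parameter so the check can be exercised without network access; a real filter would probably also cache lookups.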

You can read the paper that explains how it works and how it should be performed. You can also see the slides of the presentation I made at the 2007 MIT Spam Conference.

Reference Implementation

You can also download the reference implementation I used to obtain my results. Be aware that it was based on an older, less efficient specification; the results, however, should be the same. The reference implementation can analyze an email read from STDIN (if no filename is given) or one or more files named on the command line, each containing a single email (a la Maildir, not Unix mailbox). If the email is read from STDIN, the filter writes the email back out with an X-RelayFilter header added. Otherwise, it prints the results of the analysis to STDOUT for each file processed.
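
The two input modes described above can be sketched in Python as follows. This is not the reference implementation: the analyze function is a placeholder, and the header value and the per-file output format are assumptions, since only the X-RelayFilter header name is given here.

```python
import email
import sys


def analyze(message):
    """Placeholder verdict; the actual tests are described in the paper."""
    return "unknown"


def run(filenames, stdin=sys.stdin, stdout=sys.stdout, analyzer=analyze):
    if not filenames:
        # Filter mode: echo the message back with the verdict in a new header.
        message = email.message_from_file(stdin)
        message["X-RelayFilter"] = analyzer(message)
        stdout.write(message.as_string())
        return
    # Report mode: one email per file (Maildir style), one result line per file.
    for name in filenames:
        with open(name, encoding="utf-8", errors="replace") as handle:
            message = email.message_from_file(handle)
        stdout.write(f"{name}: {analyzer(message)}\n")

# A command-line entry point would call: run(sys.argv[1:])
```

Called with an empty filename list, run behaves as a filter suitable for a mail delivery pipeline; called with filenames, it prints one line per file.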

Thank you for visiting. If you have any questions or comments, please drop me a line at alberto at BYU dot edu.

Responses to Issues Brought Up During the Spam Conference Presentation

Several questions were raised during the question period of the presentation. My responses follow.

Does it only report relayed/not relayed email?

Yes, the current proposed solution only handles those two values. Assigning a point value to each test could produce a graded relay score instead of a yes/no answer, which could improve heuristics and make the result easier to combine with other spam filtering solutions.
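
A minimal sketch of what such scoring might look like is shown below. The test names, weights, and threshold are hypothetical; no values are assigned in the current proposal.

```python
# Hypothetical test names and weights, chosen only for illustration.
WEIGHTS = {
    "helo_not_fqdn": 2.0,
    "helo_ip_mismatch": 3.0,
    "from_domain_mismatch": 1.5,
}


def relay_score(failed_tests, weights=WEIGHTS):
    """Sum the weights of the failed tests; unknown names raise KeyError."""
    return sum(weights[name] for name in failed_tests)


def classify(score, threshold=3.0):
    """Collapse the score back to the two current values when needed."""
    return "relayed" if score >= threshold else "not relayed"
```

A downstream spam filter could consume the numeric score directly and ignore classify altogether.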

There may be cases where header forgeries are OK.

One audience member said there may be legitimate cases for header forgeries, such as a New York Times reader sharing a story with a friend. Although that type of forgery is very common and very much appreciated by many users, I think it has several problems:
  1. Forgery is forgery. Although forgery does have a place in crime prevention and investigation, I don't think it has a place in business. Think of a retailer that can generate a forged receipt at a client's request to avoid the client's company purchasing rules. As far as I know that's illegal. Why should email forgery be considered a legitimate business practice?
  2. RFC 2635 asks people not to relay email. By forging From fields in email they are in a way relaying email.
  3. It breaks my tests! If they didn't do it, I would have a nearly perfect system! :-)

Contradiction with RFC 1123

RFC 1123 states in section 5.2.5:
The HELO receiver MAY verify that the HELO parameter really corresponds to the IP address of the sender. However, the receiver MUST NOT refuse to accept a message, even if the sender's HELO command fails verification.
My response to this recommendation is as follows (thank you to the other Spam Conference attendees who helped pen this response):
  1. RFC 1123 was published in October 1989. It is nearly 18 years old.
  2. RFC 2505, when describing how Received header lines should be built (section 2.2.1), adds the following:
    These recommendations are deliberately stronger than RFC1123, [3], and are there to assure that mail sent directly from a spammer's host to a recipient can be traced with enough accuracy; . . .
    This suggests that a) RFC 1123 is becoming obsolete and b) recommendations in RFC 1123 are not good enough. RFC 2505 was written in February 1999, nearly 10 years after RFC 1123 and over 8 years ago. It seems to me that RFC 1123 is practically obsolete and that RFC 2505 should be revised shortly.
  3. RFC stands for Request for Comments. Here is my comment: verifying HELO identities can help eliminate over 50% of current spam. Shouldn't we supersede RFC 1123's recommendation? Everyone would agree that the spam problem today is far worse than it was in 1989. So why are we basing our spam strategy on a 1989 opinion?
  4. RFC 821 and RFC 2505 require an FQDN in the HELO command. RFC 2505 clarifies that this is needed to trace the true origin of the email. If a spammer withholds an FQDN in order to hide his or her identity, violating RFC 2505, should I not have the right to reject the message?
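
To make the last point concrete, here is a small Python sketch of a purely syntactic FQDN check on a HELO argument. The function name is_fqdn and the exact rules (at least two labels, letters, digits, and hyphens only, no all-numeric final label) are assumptions for this example, not the test from the paper, and the check does not confirm that the name exists in DNS.

```python
import re

# One DNS label: 1-63 letters, digits, or hyphens, not starting or ending with "-".
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_fqdn(helo):
    """Return True if the HELO argument is shaped like a fully qualified name."""
    name = helo[:-1] if helo.endswith(".") else helo  # allow one trailing dot
    labels = name.split(".")
    if len(name) > 253 or len(labels) < 2:
        return False  # too long, or a bare host name such as "localhost"
    if not all(LABEL_RE.fullmatch(label) for label in labels):
        return False  # also rejects address literals such as "[192.0.2.10]"
    return not labels[-1].isdigit()  # rejects bare IPv4 addresses
```

Whether to reject address literals outright is a policy decision; this sketch simply reports them as not being an FQDN.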

My tests are a reinvention of SPF

Yes, in a way they are. However, my tests offer two great advantages over SPF:
  1. They use existing DNS records and do not depend on people creating new ones. In other words, they work now across all domains with valid DNS records.
  2. They do not have an "allow from all" feature. Many spammers have set up SPF records that allow email from any IP address, rendering SPF useless for stopping their email.
SPF is a good idea, in my opinion. It provides the ability to verify Sender SMTP connections. However, it should not be the only tool used to determine the level of trust a host should have.
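
For readers unfamiliar with the "allow from all" problem, the sketch below shows one way to spot such a record in Python. It assumes the SPF TXT record has already been fetched as a string, and the function name spf_allows_all is made up for this example. It only looks at the "all" mechanism, so equivalent tricks such as listing a very wide IP range are not caught.

```python
def spf_allows_all(record):
    """True if the first "all" mechanism in an SPF record has a pass qualifier.

    In SPF, a mechanism without a qualifier defaults to "+" (pass), so both
    "all" and "+all" let any sender not matched earlier pass the check.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False  # not an SPF version 1 record
    for term in terms[1:]:
        lowered = term.lower()
        if lowered in ("all", "+all"):
            return True
        if lowered in ("-all", "~all", "?all"):
            return False  # mechanisms are evaluated left to right
    return False
```

A filter could use a check like this to ignore an SPF pass from any domain whose record is this permissive.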

A false-positive rate over 0.5% is unacceptable

Everyone's rules as to what is acceptable and what is not are different. If this solution does not meet your standards, you don't have to use it. However, if it provides you with a new heuristic or tool to help you filter your email, great! If your biggest problem is storage space or bandwidth utilization and deploying this at the SMTP level would help, great! If this solution causes you more problems and doesn't work for you, you don't have to implement it. I'm sure you already have something that meets your needs better than this solution anyway.