Email Relay Detection
Please pardon the basic webpage. I hope to someday add more information, including
a summary of the many papers I have read on spam.
Introduction
Email Relay Detection is a preliminary finding. It uses header analysis and
DNS lookups to determine whether an email was relayed (that is, whether it came
from a server other than the server that should have sent it). Most spam is relayed through
open relays or botnets. Hence, relay detection can be used as a method
of detecting and filtering spam. One of the big advantages is that this
analysis can be performed very, very quickly.
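As a rough illustration of the header-analysis half of the idea, the sketch below pulls the HELO name and connecting IP address out of each Received header. It is not the algorithm from the paper. The `received_hops` function, the `RECEIVED_RE` pattern, and the sample message are assumptions made for this example, and real Received lines vary enough between mail servers that a single pattern will miss many of them.

```python
import re
from email import message_from_string

# Assumed layout: "from <helo-name> (<anything> [<ipv4>]) by ..."
# Real Received lines differ between mail servers; this pattern is illustrative.
RECEIVED_RE = re.compile(
    r"from\s+(?P<helo>\S+)\s+\(.*?\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]",
    re.DOTALL,
)


def received_hops(raw_email):
    """Return (helo_name, ip) pairs in the order the headers appear."""
    msg = message_from_string(raw_email)
    hops = []
    for value in msg.get_all("Received", []):
        match = RECEIVED_RE.search(value)
        if match:
            hops.append((match.group("helo"), match.group("ip")))
    return hops


# Hypothetical message using documentation-only names and addresses.
SAMPLE = (
    "Received: from mail.example.org (mail.example.org [192.0.2.10])\n"
    "\tby mx.example.net with ESMTP; Mon, 1 Jan 2007 00:00:00 +0000\n"
    "From: someone@example.org\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
```

Calling `received_hops(SAMPLE)` yields the name and address pairs that a DNS check could then compare against each other.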
You can read the paper that explains how it works and how it should be
performed. You can also see the slides of the presentation I made at the 2007
MIT Spam Conference.
Reference Implementation
You can also download the reference implementation I
used to obtain my results. Be aware that the reference implementation was
based on an older, less efficient specification.
The results, however, should be the same. The reference implementation is capable
of analyzing an email through STDIN (if no filename is given) or a file (or several
files) containing a single email per file (a la Maildir, not Unix mailbox) by passing
the filenames on the command line. If the email is analyzed through STDIN, the filter
will spit the email back out with an X-RelayFilter header. Otherwise, it will print
the results of the analysis to STDOUT for each file processed.
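The two input modes described above could be wired up roughly as follows. This is a sketch of the command-line behavior only, not the reference implementation. The `analyze` function is a placeholder that always returns "unknown", and the value written to the X-RelayFilter header is an assumption, since the real header format is not described here.

```python
import sys
from email import message_from_string


def analyze(raw_email):
    """Placeholder verdict; a real filter would run the relay tests here."""
    return "unknown"


def filter_stdin(raw_email):
    """Return the email text with an X-RelayFilter header added."""
    msg = message_from_string(raw_email)
    msg["X-RelayFilter"] = analyze(raw_email)  # header value is hypothetical
    return msg.as_string()


def report_files(paths):
    """Return one result line per file, each file holding a single email."""
    lines = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            lines.append(f"{path}: {analyze(handle.read())}")
    return lines


def main(argv):
    """Dispatch on whether filenames were given on the command line."""
    if len(argv) > 1:
        print("\n".join(report_files(argv[1:])))
    else:
        sys.stdout.write(filter_stdin(sys.stdin.read()))
```

A script would call `main(sys.argv)` so that running it with no filenames filters STDIN, and running it with filenames prints one line per file.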
Thank you for visiting. If you have any questions or comments, please drop me a line
at alberto at BYU dot edu.
Responses to Issues Brought Up During the Spam Conference Presentation
During the questions phase of the presentation, several issues were brought
up. I wish to present my responses to them here.
Does it only report relayed/not relayed email?
Yes, the current proposed solution only handles those two values. Assigning
different point values to each test could provide a broader range of "relay
detection" that could improve heuristics and combination with other spam
filtering solutions.
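The point-value idea could look something like the sketch below. The test names, weights, and threshold are all hypothetical and are not taken from the paper. They only show how a yes/no result could become a score that another filter can combine with its own.

```python
# Hypothetical test names and weights, chosen only for illustration.
WEIGHTS = {
    "helo_not_fqdn": 2.0,
    "helo_ip_mismatch": 3.0,
    "no_reverse_dns": 1.5,
}


def relay_score(failed_tests, weights=WEIGHTS):
    """Sum the weights of the tests an email failed."""
    return sum(weights[name] for name in failed_tests)


def is_relayed(failed_tests, threshold=3.0):
    """Collapse the score back to relayed/not relayed at an assumed threshold."""
    return relay_score(failed_tests) >= threshold
```

With these made-up weights, an email that fails only the FQDN test stays under the threshold, while one that also lacks reverse DNS crosses it.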
There may be cases where header forgeries are OK.
One audience member said there may be legitimate cases for header forgeries
such as the New York Times reader sharing a story with a friend. Although that
type of forgery is very common and very much appreciated by many users, I
think it has several problems:
- Forgery is forgery. Although forgery does have a place in crime
prevention and investigation, I don't think it has a place in business.
Think of a retailer that can generate a forged receipt at a client's
request to avoid the client's company purchasing rules. As far as I know
that's illegal. Why should email forgery be considered a legitimate
business practice?
- RFC 2635 asks people not to relay email. By forging From fields in
email they are in a way relaying email.
- It breaks my tests! If they didn't do it, I would have a nearly
perfect system! :-)
Contradiction with RFC 1123
RFC 1123 states in section 5.2.5:
The HELO receiver MAY verify that the HELO parameter really corresponds to the
IP address of the sender. However, the receiver MUST NOT refuse to accept a
message, even if the sender's HELO command fails verification.
My response to this recommendation is as follows (and thank you to the other
Spam Conference attendees who helped pen this response):
- RFC 1123 was published in October 1989. It is nearly 18 years old.
- RFC 2505, when describing how Received header lines should be built,
(section 2.2.1) adds the following:
These recommendations are deliberately stronger than RFC1123, [3],
and are there to assure that mail sent directly from a spammer's host
to a recipient can be traced with enough accuracy; . . .
This suggests that a) RFC 1123 is becoming obsolete and b) recommendations
in RFC 1123 are not good enough. RFC 2505 was written in February 1999,
nearly 10 years after RFC 1123 and over 8 years ago. It seems to me that
RFC 1123 is practically obsolete and RFC 2505 should be revised shortly.
- RFC stands for Request for Comment. Here is my comment: Verifying HELO
identities can help eliminate over 50% of current spam. Shouldn't we
supersede RFC 1123's recommendation? Everyone would agree that the spam
problem today is far worse than it was in 1989. So why are we
basing our spam strategy on a 1989 opinion?
- RFC 821 and RFC 2505 require a FQDN in the HELO command. RFC 2505
clarifies this needs to be done to be able to trace the true origins of the
email. If a spammer does not provide a FQDN in order to hide their
identity, violating RFC 2505, should I not have the right to reject their
message?
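To make the HELO discussion concrete, here is a minimal sketch of what verifying a HELO identity might involve: a syntax check that the name looks like a FQDN, followed by a forward lookup to see whether the name resolves to the connecting IP address. It is an assumption about one reasonable way to do this, not the procedure from the paper. The function names are made up for the example, and the resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import re
import socket

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen.
LABEL_RE = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def looks_like_fqdn(name):
    """Syntax check only: at least two valid labels, not a bare IPv4 address."""
    name = name.rstrip(".")
    labels = name.split(".")
    if len(labels) < 2 or len(name) > 253:
        return False
    if labels[-1].isdigit():  # rejects literals such as 192.0.2.1
        return False
    return all(LABEL_RE.match(label) for label in labels)


def resolve_ipv4(name):
    """Default resolver: forward lookup through the system resolver."""
    try:
        return set(socket.gethostbyname_ex(name)[2])
    except OSError:
        return set()


def helo_verifies(helo_name, peer_ip, resolver=resolve_ipv4):
    """True if the HELO name is a plausible FQDN that resolves to peer_ip."""
    return looks_like_fqdn(helo_name) and peer_ip in resolver(helo_name)
```

Whether a failed check should lead to rejection, a header annotation, or a score is a policy decision that sits outside this sketch.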
My tests are a reinvention of SPF
Yes, in a way they are. However, my techniques offer two great advantages
over SPF:
- It uses existing DNS records and does not depend on people
creating new DNS records. In other words, it works now across all
domains with valid DNS records.
- It does not have an "allow from all" feature. A lot of spammers have
set up SPF records that allow email from any IP address, rendering SPF
useless for stopping their email.
SPF is a good idea, in my opinion. It provides the ability to verify Sender
SMTP connections. However, it should not be the only tool used to determine
the level of trust a host should have.
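The "allow from all" problem mentioned above is easy to spot in a record's text. In SPF syntax, the `all` mechanism matches every sender, and a mechanism with no qualifier defaults to `+` (pass), so both `all` and `+all` authorize any IP address. The sketch below checks an SPF record string that has already been fetched. The function name is made up for the example, and it deliberately ignores other ways of allowing everyone, such as a very wide `ip4` range or an `include` of a permissive record.

```python
def spf_allows_all(record):
    """Return True if an SPF record string passes every sender via 'all'."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False  # not an SPF version 1 record
    # A bare "all" carries the default "+" qualifier, so it also passes everyone.
    return any(term.lower() in ("all", "+all") for term in terms[1:])
```

A filter that consults SPF could treat a pass from such a record as carrying no information about the sender.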
A false-positive rate over 0.5% is unacceptable
Everyone's rules as to what is acceptable and what is not acceptable are
different. If this solution does not meet your standards, you don't have to use
it. However, if it provides you with a new heuristic or tool to help you filter
your email, great! If your biggest problem is storage space or bandwidth
utilization and deploying this at the SMTP level would help your problems,
great! If this solution causes you more problems and doesn't work for you,
you don't have to implement it. I'm sure you already have something that
meets your needs better than this solution anyway.