Greylisting is looking a bit long in the tooth

Back in 2006, when I set up thinktank.co.nz’s mail server defenses, SPAM was increasing at an astronomical rate. There were a number of tools available on mail servers to help stem the tide, including greylisting, SpamAssassin, Realtime Blackhole Lists (RBLs), and signature systems such as Vipul’s Razor and Pyzor. Other technologies such as Sender Policy Framework (SPF), DomainKeys Identified Mail (DKIM), and more recently Domain-based Message Authentication, Reporting & Conformance (DMARC) all play a part in detecting badly-behaving mail senders and SPAMmy messages.

But greylisting is the simplest and computationally least expensive method, and produced generally great returns. Until recently.

Greylisting runs on an email recipient’s mail server, and works like this: it collects “triplets” of the sender’s email address, the sender’s email server IP address, and the recipient’s email address. The first time a particular sender tries to get in touch with a particular recipient, the greylist software rejects the email with a temporary failure, essentially saying “sorry I’m temporarily unavailable – come back later”, and waits for the sending email server to try again. If the retry arrives before a configurable period of time has elapsed, typically five minutes, the sender will get the same temporary failure again. When a retry arrives after that period, the message will get through. All of this requires next to no computing power, just looking up whether, and when, this particular sender at a specific email server has already tried to contact a specific recipient – a triplet match.
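
To make the triplet logic concrete, here is a minimal sketch in Python. It is not the code of any particular greylisting package: the class name, the reply strings and the in-memory dictionary are all assumptions for illustration, and a real implementation would also persist its triplets and expire old ones.

```python
from dataclasses import dataclass, field

# Illustrative reply strings (assumed wording); the important part is that
# a 4xx SMTP code means "temporary failure, try again later".
TEMP_FAIL = "451 temporarily unavailable, come back later"
ACCEPT = "250 OK"


@dataclass
class Greylist:
    """Hypothetical in-memory greylist keyed on (sender, IP, recipient)."""

    delay_seconds: int = 300  # the "configurable period", five minutes here
    first_seen: dict = field(default_factory=dict)

    def check(self, sender: str, client_ip: str, recipient: str, now: float) -> str:
        triplet = (sender.lower(), client_ip, recipient.lower())
        # Record the time of the first attempt; later attempts reuse it.
        seen_at = self.first_seen.setdefault(triplet, now)
        if now - seen_at >= self.delay_seconds:
            return ACCEPT   # the retry came after the waiting period
        return TEMP_FAIL    # first attempt, or a retry that came too soon


if __name__ == "__main__":
    gl = Greylist()
    args = ("alice@example.com", "192.0.2.10", "bob@example.org")
    print(gl.check(*args, now=0))    # first attempt: temporary failure
    print(gl.check(*args, now=60))   # too early: temporary failure again
    print(gl.check(*args, now=360))  # after five minutes: accepted
```

The whole decision is one dictionary lookup and one subtraction per message, which is why greylisting costs so little compared with content scanning.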

The only downside to greylisting is that sometimes you have to wait for an email from a new sender to arrive, which can be particularly annoying when subscribing to a new service and waiting for a confirmation email.

The big idea behind greylisting was that, in the olden days at least, SPAM was generally sent by poorly configured senders, typically as part of malware inadvertently installed on an unsuspecting correspondent’s computer. Primitive SPAM software was optimised to try the maximum number of addresses in the shortest possible time. If it received a temporary failure, it would typically just move on to the next target, and not expend any effort in retrying. But the SPAMmers have honed their craft, and are a lot smarter these days.

Recently, as email volumes continue to expand, some senders (such as Microsoft’s online Outlook and AliExpress) have come to use a large number of outgoing mail servers. The second element of the aforementioned triplet, the sending email server IP address, can be different on each retry. This means that a retry from one of these senders can easily look like a brand new first attempt, a “false positive”, and the email can be caught up in greylisting purgatory for a very long time.
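
A rough simulation shows how quickly this gets out of hand. The sketch below assumes the sender rotates through its pool of outgoing IP addresses in strict round-robin order and retries at a fixed interval; both are simplifications (real senders pick servers in their own way and usually back off between retries), and the function name and default values are made up for illustration.

```python
import itertools


def attempts_until_accepted(pool, retry_interval=600, delay=300, max_attempts=50):
    """Count delivery attempts until a greylist keyed on the sending IP accepts.

    The sender and recipient addresses are held fixed, so the triplet only
    varies by IP. Returns None if the message is still being deferred after
    max_attempts tries.
    """
    first_seen = {}                # IP -> time of first attempt from that IP
    ips = itertools.cycle(pool)    # assumed round-robin rotation
    for attempt in range(1, max_attempts + 1):
        now = (attempt - 1) * retry_interval
        ip = next(ips)
        seen_at = first_seen.setdefault(ip, now)
        if now - seen_at >= delay:
            return attempt         # this IP was seen long enough ago
    return None


if __name__ == "__main__":
    for size in (1, 4, 20):
        pool = [f"198.51.100.{i}" for i in range(size)]
        print(size, "server(s):", attempts_until_accepted(pool), "attempts")
```

Under these assumptions a single-server sender gets through on its second attempt, while a sender with a pool of N servers needs N + 1 attempts, because every server in the pool has to be turned away once before any of them comes around again.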

I recently decided to disable greylisting on our mail server to see if the other defenses could close the gap – and they performed admirably. It’s hard to measure accurately, but anecdotally the amount of spam coming through to client email addresses is roughly the same as it was when greylisting was in effect. It’s just computationally a lot more expensive. But given the relatively low mail volumes we receive here at thinktank.co.nz, it hasn’t been a major issue.

You may be asking yourself, “Why on earth are you running your own mail server? Why not just use Gmail or Outlook like everyone else?” The answer is, we don’t want Google and Microsoft reading all our mail. It’s strange to me how upset people get about Facebook reading their posts and understanding their social graph, when I think the email trail of most people is much more telling. Problem is, 43% of our emails have a correspondent address going through google.com or outlook.com servers – meaning that Google and Microsoft will be reading our mail anyway. I’ll continue to be defiant about the other 57%, and hope that others, in time, will see the light, and break free of corporate surveillance of their email.