The Paradox Of The Mail Server On The Cloud

Providing your web application with a mail service that works flawlessly is probably essential for your business. You need to send activation emails to users, password reset emails, newsletters and probably a whole bunch of other emails that have to do with interactions with your application.

When there were only physical servers and static IP addresses, everything worked perfectly. But now that your application is in the cloud, setting up a working mail server next to it is practically impossible. If your application is successful and you would like to send emails to your millions of satisfied users, your options come down to:

  1. Use a physical hosted server.
  2. Use a 3rd party email service.
  3. Set up a mail server in the cloud and accept that some (or most) of your mail will be marked as spam.

For us cloud-oriented developers, option 1 is about as useful as someone suggesting you record your favorite songs on a cassette tape: it’s old, unreliable, and doesn’t scale. Option 2 gets very costly once your business is successful, and most of these services can’t handle the volume of mail you need to send to a large user base. Option 3 makes your email communication with your users almost non-existent, which means you can’t afford it either. So your only option is to compromise somewhere.

Why is sending email from the cloud so difficult?

In order for your mail server to operate successfully and be trusted by mail services around the world, you need to abide by the following rules:

  1. Don’t be an open relay.
  2. Implement (and follow) an SPF policy (and DKIM if possible).
  3. Have a PTR record that resolves back exactly to your mail server hostname.
  4. Don’t let your public IP address be listed in any RBLs.

Rule #1 is easily implemented in any mail server configuration, and there are also a number of online tools to test whether you’re an open relay. Rule #2 is also pretty easy to implement, assuming you control your DNS zone files and know your way around them.
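For rule #2, an SPF policy is published as a TXT record on the sending domain, for example "v=spf1 ip4:1.2.3.0/24 -all", and a receiving server checks the connecting IP address against it. The sketch below is a deliberately tiny evaluator, assuming a record that only uses ip4 mechanisms and a final all term (with the +, -, ~ and ? qualifiers). A real check per RFC 7208 also handles a, mx, include, redirect and more. The function name and the sample record are illustrative only.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_ip4_result(record: str, client_ip: str) -> str:
    """Evaluate a simplified SPF record against an IP address.

    Only 'ip4:' mechanisms and 'all' are understood; every other term is
    skipped, which a real SPF checker would NOT do.
    Returns 'pass', 'fail', 'softfail', 'neutral' or 'none'.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier is "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term.lower() == "all":
            return QUALIFIERS[qualifier]
        if term.lower().startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip.version == 4 and ip in network:
                return QUALIFIERS[qualifier]
    return "neutral"  # nothing matched and there was no 'all' term


if __name__ == "__main__":
    policy = "v=spf1 ip4:1.2.3.0/24 -all"
    print(spf_ip4_result(policy, "1.2.3.4"))  # pass
    print(spf_ip4_result(policy, "5.6.7.8"))  # fail
```

Ending the record with -all is what makes unlisted senders fail outright; ~all only marks them as a soft failure.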

The problem of mail on the cloud begins with rules #3 and #4. A PTR record, which is a reverse DNS entry, must be present and correct for your mail server not to be considered spammy. If your mail server is at 1.2.3.4 and is called mail.example.com, the PTR query for 1.2.3.4 (well, for 4.3.2.1.in-addr.arpa) must return mail.example.com. The PTR record can only be changed by the owner of the IP address, or by someone the owner has delegated that authority to. Amazon Web Services does not let you control PTR records, so there goes the option of a mail server on EC2.
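The reversed-octet name can be computed with Python's standard ipaddress module. Many receiving servers also check the forward direction: the name returned by the PTR lookup must resolve back to the same address (forward-confirmed reverse DNS). The following is a minimal sketch of that check, assuming the system resolver via socket. The lookup functions are injectable parameters (an illustrative design choice) so the logic can be tried without network access. The hostname and address are the article's examples.

```python
import ipaddress
import socket
from typing import Callable, List


def ptr_query_name(ip: str) -> str:
    """Return the in-addr.arpa name that a PTR query for `ip` actually asks about."""
    return ipaddress.ip_address(ip).reverse_pointer


def forward_confirmed(
    ip: str,
    expected_host: str,
    reverse_lookup: Callable[[str], str] = lambda a: socket.gethostbyaddr(a)[0],
    forward_lookup: Callable[[str], List[str]] = lambda h: socket.gethostbyname_ex(h)[2],
) -> bool:
    """True if the PTR for `ip` names `expected_host` and that host resolves back to `ip`."""
    try:
        ptr_name = reverse_lookup(ip).rstrip(".").lower()
        if ptr_name != expected_host.rstrip(".").lower():
            return False  # PTR points elsewhere: rule #3 is violated
        return ip in forward_lookup(ptr_name)
    except (socket.herror, socket.gaierror):
        return False  # no PTR record, or the name does not resolve


if __name__ == "__main__":
    print(ptr_query_name("1.2.3.4"))  # 4.3.2.1.in-addr.arpa
```

With the default arguments, forward_confirmed("1.2.3.4", "mail.example.com") queries real DNS, so run it against your own server's address and hostname.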

Other clouds let you control the PTR records for the IP addresses they assign to you, but they fail on rule #4. While your specific IP address might not be blacklisted in RBLs, the entire block it belongs to might be, because addresses in such blocks are assigned dynamically and are therefore always treated as suspect by these lists. This is the case with Rackspace Cloud, for example, and it is the only thing left to solve before you can run a mail server there. Although they’re trying to get their address block de-listed, the problem still persists.
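An RBL (DNSBL) lookup reuses the reversed-octet trick. To ask whether 1.2.3.4 is listed in a given list zone, a client queries the A record of 4.3.2.1.<zone>. An answer, conventionally an address in 127.0.0.0/8, means "listed", and NXDOMAIN means "not listed" (RFC 5782). If the whole block is listed, your individual address will show up as listed too. This minimal sketch uses dnsbl.example.org, a hypothetical placeholder for whichever list you want to check. The resolver function is injectable so the logic can be tested offline.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL name for `ip`: the reversed octets prepended to `zone`."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def is_listed(
    ip: str,
    zone: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """True if the DNSBL `zone` returns an A record in 127.0.0.0/8 for `ip`."""
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # NXDOMAIN (or lookup failure): treated as not listed
    # Listings are conventionally answered with a 127.0.0.x address.
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")


if __name__ == "__main__":
    print(dnsbl_query_name("1.2.3.4", "dnsbl.example.org"))  # 4.3.2.1.dnsbl.example.org
```

Some lists use special 127.x answers to report errors or refused queries, so check each list's documentation before trusting a bare "listed" result.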

Other clouds I’ve examined in this space are GoGrid and Joyent. GoGrid wants you to fill out a questionnaire, and only then opens port 25 for you. This sounds absurd and runs against the on-demand nature of the cloud (I also personally don’t trust ServePath, the company that operates GoGrid). Joyent’s offering seems to disregard the option of hosting a mail server with them, and I couldn’t get a response from them on this matter.

So unless Rackspace Cloud solves its IP block blacklisting problem, or AWS offers a way to set PTR records (with no blacklisting as well), we’re left with the need to compromise.

The only feasible solution right now seems to be going back to physical hosting.

Blackhole Name Servers

If you are running a name server that serves your application or internal network in some way, and you start seeing a slowdown in reverse name resolution, check your logs (or, if your name server doesn’t log queries, run tcpdump on port 53) and search for requests to BLACKHOLE-1.IANA.ORG (192.175.48.6) or BLACKHOLE-2.IANA.ORG (192.175.48.42).
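To search a saved capture or a log for these servers, a small filter is enough. This is a minimal sketch, assuming text lines that contain either the dotted addresses (as in numeric tcpdump output, where the port follows the address as ".53") or the hostnames. The regular expression avoids false hits such as 192.175.48.60. The sample lines are made up for illustration and only loosely modeled on tcpdump output.

```python
import re
from typing import Iterable, List

# The two IANA blackhole servers mentioned above, by address or by name.
BLACKHOLE_PATTERN = re.compile(
    r"(?<![\d.])192\.175\.48\.(?:6|42)(?!\d)"  # raw addresses, not e.g. .60
    r"|blackhole-[12]\.iana\.org",             # hostnames, any letter case
    re.IGNORECASE,
)


def blackhole_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention either blackhole server."""
    return [line for line in lines if BLACKHOLE_PATTERN.search(line)]


if __name__ == "__main__":
    sample = [
        "IP 10.0.0.2.33333 > 192.175.48.6.53: 1+ PTR? 3.0.0.10.in-addr.arpa. (40)",
        "IP 10.0.0.2.33334 > 8.8.8.8.53: 2+ A? example.com. (29)",
    ]
    for line in blackhole_lines(sample):
        print(line)  # only the first sample line is printed
```

The same function can be fed sys.stdin to filter a live capture, for example by piping tcpdump -n -l port 53 into the script.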

When I saw these for the first time I thought it was some Chris Cornell joke.

If you’re seeing these and experiencing a slowdown, you have a problem: your name server is recursing and trying to resolve addresses in the reserved private space, instead of replying with an authoritative answer, or at least replying with a referral.
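The blackhole servers are the AS112 name servers delegated for the reverse zones of private address space. A PTR query for an address like 10.0.0.3 that your server chases out onto the Internet therefore ends up with them. This sketch flags such addresses and assumes only the three RFC 1918 blocks. AS112 may cover other zones as well, so treat the list as a starting point. The function name is illustrative.

```python
import ipaddress

# RFC 1918 private blocks. If your name server recurses for their reverse
# zones, the queries are delegated to the IANA blackhole (AS112) servers.
RFC1918_BLOCKS = [
    ipaddress.IPv4Network("10.0.0.0/8"),
    ipaddress.IPv4Network("172.16.0.0/12"),
    ipaddress.IPv4Network("192.168.0.0/16"),
]


def reverse_goes_to_blackhole(ip: str) -> bool:
    """True if a recursive PTR lookup for `ip` would be delegated to the blackhole servers."""
    address = ipaddress.IPv4Address(ip)
    return any(address in block for block in RFC1918_BLOCKS)


if __name__ == "__main__":
    for ip in ("10.0.0.3", "192.168.1.10", "1.2.3.4"):
        print(ip, reverse_goes_to_blackhole(ip))
```

Any address for which this returns True needs a local authoritative answer (or a server that will not recurse for it).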

There are 2 solutions (assuming you are using BIND):

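  1. Configure your name server to be authoritative for the reserved space:
    In /etc/named.conf:

    // Answer reverse lookups for 10.0.0.0/24 locally instead of recursing
    zone "0.0.10.in-addr.arpa" {
        type master;
        file "/var/named/0.0.10.in-addr.arpa.zone";
    };

    And in the zone file /var/named/0.0.10.in-addr.arpa.zone, if for example you want 10.0.0.3 to resolve to web.example.com:

    $TTL 14400
    @ IN SOA ns1.example.com. admin.example.com. (
        2009012501 ; serial
        28800      ; refresh
        604800     ; retry
        604800     ; expire
        86400      ; negative-caching TTL
    )

    @ IN NS ns1.example.com.
    ; The trailing dot makes the name absolute; without it BIND would
    ; append the zone origin (web.example.com.0.0.10.in-addr.arpa.)
    3 IN PTR web.example.com.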
  2. If you know (or can assume) there’s a name server along the way that is configured to reply authoritatively for these queries, configure your name server not to perform recursion. This way it replies to the query with “I don’t know who 10.0.0.3 is, go look for yourself, here’s a hint”. In /etc/named.conf, add this to the options block:

    options {
        // ...keep your existing options here...
        recursion no;
    };
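If you need more than a handful of reverse entries for solution 1, you can generate the PTR lines instead of typing them. This is a minimal sketch, assuming one reverse zone per /24, the same layout as the 0.0.10.in-addr.arpa zone above, and a plain mapping from address to hostname. It only emits the PTR lines. Each zone file still needs its own $TTL, SOA and NS records. The function name is hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List


def ptr_records_by_zone(hosts: Dict[str, str]) -> Dict[str, List[str]]:
    """Group PTR lines for IPv4 `hosts` (address -> hostname) by /24 reverse zone.

    Example: 10.0.0.3 -> zone "0.0.10.in-addr.arpa", line "3 IN PTR web.example.com."
    """
    zones: Dict[str, List[str]] = defaultdict(list)
    # Sort numerically so each zone's records come out in address order.
    for ip, hostname in sorted(hosts.items(), key=lambda item: ipaddress.IPv4Address(item[0])):
        octets = str(ipaddress.IPv4Address(ip)).split(".")
        zone = ".".join(reversed(octets[:3])) + ".in-addr.arpa"
        # Always emit an absolute name so BIND does not append the zone origin.
        target = hostname if hostname.endswith(".") else hostname + "."
        zones[zone].append(f"{octets[3]} IN PTR {target}")
    return dict(zones)


if __name__ == "__main__":
    records = ptr_records_by_zone({"10.0.0.3": "web.example.com"})
    for zone, lines in records.items():
        print(zone)
        for line in lines:
            print("   ", line)
```

Each list can then be appended below the SOA and NS lines of the matching zone file. Remember to bump the serial whenever the file changes.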

Since there was indeed a name server configured properly to reply for all the 10.0.0.0/8 addresses in my network, and I had configured the inner name server to reply only for what the application needed, adding the “recursion no” option solved the problem in my case.
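To confirm that "recursion no" took effect, you can look at the RA (Recursion Available) bit in the server's reply header (RFC 1035, section 4.1.1). dig shows it as "ra" in its flags line. The following is a minimal Python sketch that builds a single query by hand and reads that bit. It assumes plain UDP on port 53, no EDNS, and a reply that fits in 512 bytes. The function names are illustrative.

```python
import socket
import struct

RD_FLAG = 0x0100  # "Recursion Desired", set by the client in the query
RA_FLAG = 0x0080  # "Recursion Available", set by the server in the reply


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query: header with RD set, one question, class IN."""
    header = struct.pack("!HHHHHH", query_id, RD_FLAG, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii") for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def recursion_available(response: bytes) -> bool:
    """Read the RA bit from the flags word of a DNS response header."""
    _query_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & RA_FLAG)


def server_offers_recursion(server: str, name: str = "example.com", timeout: float = 2.0) -> bool:
    """Send one UDP query to `server` on port 53 and report whether RA is set."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        response, _addr = sock.recvfrom(512)
    return recursion_available(response)
```

For example, server_offers_recursion("10.0.0.1"), where 10.0.0.1 is a hypothetical address for your inner name server, should return False once recursion is disabled.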

By the way, adding “recursion no” to a name server that is only there to serve some specific application need is good practice both security-wise and performance-wise.

Oh, and here’s what IANA have to say about the blackhole servers. Creepy.