ERROR 2026 (HY000): SSL connection error — The Solution

If you’re running a MySQL server with SSL and have ever encountered this error when trying to access the database after it’s up and running:

ERROR 2026 (HY000): SSL connection error

One thing to check is whether your client certificate and server certificate have the same common name. You’ve probably gone through the certificate generation procedure and (like I did) just entered the same common name for both without noticing.

This is a nasty error message that doesn’t tell you anything, and there’s nothing in the error log to hint at what went wrong. So remember – when generating your own certificates for a MySQL server, use different common names for the client and the server!
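
For reference, here’s a minimal sketch of generating a CA plus server and client certificates with OpenSSL, using distinct common names (all the hostnames below are placeholders, adjust them to your setup):

# Generate a CA key and certificate
openssl genrsa 2048 > ca-key.pem
openssl req -new -x509 -nodes -days 3650 -key ca-key.pem -out ca-cert.pem -subj "/CN=example-ca"

# Server certificate (CN=server.example.com)
openssl req -newkey rsa:2048 -nodes -keyout server-key.pem -out server-req.pem -subj "/CN=server.example.com"
openssl x509 -req -in server-req.pem -days 3650 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem

# Client certificate (note the DIFFERENT common name)
openssl req -newkey rsa:2048 -nodes -keyout client-key.pem -out client-req.pem -subj "/CN=client.example.com"
openssl x509 -req -in client-req.pem -days 3650 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 02 -out client-cert.pem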

The Paradox Of The Mail Server On The Cloud

Providing your web application with a mail service that works flawlessly is probably essential for your business. You need to send activation emails to users, password reset emails, newsletters and probably a whole bunch of other emails that have to do with interactions with your application.

When there were only physical servers and static IP addresses, everything worked perfectly. But now that your application is in the cloud, setting up a working mail server next to it is practically impossible. If your application is successful and you would like to send emails to your millions of satisfied users, your options come down to:

  1. Use a physical hosted server.
  2. Use a 3rd party email service.
  3. Set up a mail server in the cloud and accept that some (or most) of your email will be marked as spam.

For us cloud-oriented developers, option 1 is as useful as somebody suggesting you use a cassette tape recorder to put your favorite songs on: it’s old, unreliable and can’t scale. Option 2 is very costly if your business is successful, and most of these services can’t handle the volume of mail you need to send if you have a large-scale user base. Option 3 will make your email communication efforts with your users almost non-existent, which means you can’t afford it either. So your only option is to compromise somewhere.

Why is sending email from the cloud so difficult?

In order for your mail server to operate successfully and be trusted by mail services around the world, you need to abide by the following rules:

  1. Don’t be an open relay.
  2. Implement (and follow) SPF policy (and DKIM if possible).
  3. Have a PTR record that resolves back exactly to your mail server hostname.
  4. Don’t let your public IP address be listed in any RBLs.

Rule #1 is easily implemented in any mail server configuration, and there are a number of online tools to test whether you’re an open relay or not. Rule #2 is also pretty easy to implement, assuming you control your DNS zone files and know your way around them.
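
For example, a basic SPF policy is just a TXT record in your DNS zone, and you can check what a domain currently publishes with dig (the domain and address below are placeholders):

# A minimal SPF record in a zone file (placeholder values):
# example.com.   IN   TXT   "v=spf1 ip4:1.2.3.4 mx -all"

# Check what a domain currently publishes:
dig +short TXT example.com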

The problem of mail in the cloud begins with rules #3 and #4. A PTR record, which is a reverse DNS entry, must be present and correct for your mail server not to be considered spammy. If your mail server is at 1.2.3.4 and is called mail.example.com, the PTR query for 1.2.3.4 (well, for 4.3.2.1.in-addr.arpa) must return mail.example.com. The PTR record can only be changed by the owner of the IP address, or by delegating that authority to you. Amazon Web Services does not let you control PTR records, so there goes the option of a mail server on EC2.
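
You can verify both directions with dig, using the placeholder values from the paragraph above:

# Forward lookup: hostname to IP
dig +short A mail.example.com

# Reverse (PTR) lookup: IP back to hostname; for a trusted mail server
# this must return mail.example.com
dig +short -x 1.2.3.4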

Other clouds let you control the PTR records for the IP addresses they assign to you, but they fail on rule #4. While your specific IP address might not be blacklisted in RBLs, the entire block it belongs to might be, because these IP addresses are assigned dynamically and are therefore always suspected of being spammy by these lists. This is the case with Rackspace Cloud, for example, and it’s the only thing left to be solved before you can run a mail server there. And although they’re trying to get their address block de-listed, the problem still persists.
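
Checking whether an address is listed is straightforward, since most RBLs are queried over DNS (1.2.3.4 is a placeholder again, and zen.spamhaus.org is just one example of such a list):

# Reverse the octets, append the blacklist zone, and query for an A record.
# Any answer in 127.0.0.x means the address (or its block) is listed.
dig +short 4.3.2.1.zen.spamhaus.org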

Other clouds I’ve examined in this space are GoGrid and Joyent. GoGrid wants you to fill out a questionnaire, and only then do they open up port 25 for you. This sounds absurd and goes against the whole on-demand nature of the cloud (and I also personally don’t trust ServePath, the company that operates GoGrid). Joyent’s offering seems to disregard the option of hosting a mail server with them, and I couldn’t get a response from them on this matter.

So unless Rackspace Cloud solves its IP block blacklisting problem, or AWS offers a PTR setting option (plus no blacklisting as well), we’re left with the need to compromise.

The only feasible solution right now, it seems, is to go back to physical hosting.

Improving WordPress Caching: Batcache and Redirects

There’s a great plugin for WordPress called Batcache by Andy Skelton, which uses memcached as the cache backend for object cache and full page caches. It’s a great piece of code, and using it is a must if you want a scalable distributed cache for your WordPress install (avoiding the local file system caching WP-Super-Cache offers).

I’ve been running Batcache for a while, and noticed that although I had set up a query-heavy homepage to be cached for a long time, the homepage query was still running every couple of minutes. After some digging and logging I came to realize that Batcache was missing an important piece: caching redirects.

Why is it important to cache redirects? Here’s an example. If I go to http://example.com/, my homepage will be cached. But when I go to http://www.example.com/, the homepage query still runs. This is because the default www to no-www redirect happens at ‘template_redirect’ action time (with redirect_canonical()), which occurs after the main posts query has run. Batcache wouldn’t cache the redirect itself, nor would it associate the www version with the no-www version (it differentiates pages by a couple of parameters, including HTTP_HOST, which is different in this case). So any request to http://www.example.com would actually first run the homepage query, then redirect to http://example.com, and only then serve the cached version.
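
You can see the effect with a quick timing test (the hostnames are just an illustration): the bare domain comes back from the page cache almost instantly, while the www variant still pays for the full homepage query before issuing its 301:

# Served from the page cache:
time curl -s -o /dev/null http://example.com/

# Still runs the homepage query, then redirects to the bare domain:
time curl -s -o /dev/null -L http://www.example.com/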

After a short correspondence with Andy, he committed a changeset to advanced-cache.php (and readme.txt ;) ) that now makes it possible to cache redirects (it’s off by default, so be sure to turn it on!). Great to see such responsiveness from a plugin author, and I hope it helps others as well.

Cron Script To Snapshot Any Attached EBS Volume

If you would like to set up a cron job that snapshots every EBS volume attached to an instance, you can use the following script. It uses the EC2 command line tools to see which volumes are currently attached to the instance, and takes a snapshot of each. Make sure to replace all the variables at the top of the script to match your own setup.

#!/bin/bash

export JAVA_HOME=/usr/java/default
export EC2_HOME=/vol/snap/ec2-api-tools-1.3-26369
export EC2_PRIVATE_KEY=[PATH-TO-YOUR-PRIVATE-KEY]
export EC2_CERT=[PATH-TO-YOUR-CERT]
export AWS_ACCESS_KEY_ID="[YOUR-ACCESS-KEY]"
export AWS_SECRET_ACCESS_KEY="[YOUR-SECRET-KEY]"

# Find this instance's ID via the EC2 instance metadata service
INSTANCE_ID=`curl -s http://169.254.169.254/latest/meta-data/instance-id`
echo "Instance ID is $INSTANCE_ID"
# List the IDs of all volumes currently attached to this instance
VOLUMES=`$EC2_HOME/bin/ec2-describe-volumes | grep "ATTACHMENT" | grep "$INSTANCE_ID" | awk '{print $2}'`
echo "Volumes are: $VOLUMES"

for VOLUME in $VOLUMES; do
        echo "Snapping Volume $VOLUME"
        DEVICE=`$EC2_HOME/bin/ec2-describe-volumes $VOLUME | grep "ATTACHMENT" | grep "$INSTANCE_ID" | awk '{print $4}'`
        echo "Device is $DEVICE"
        MOUNTPOINT=`df | grep "$DEVICE" | awk '{print $6}'`
        echo "Mountpoint is $MOUNTPOINT"

        # Snapshot
        SNAPSHOT_ID=`$EC2_HOME/bin/ec2-create-snapshot $VOLUME`

        echo "Snapshotted: $SNAPSHOT_ID"

done
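
To run it periodically, drop it into cron; the path and schedule below are just an example:

# Snapshot all attached volumes every night at 03:00
0 3 * * * /vol/snap/snapshot-attached-volumes.sh >> /var/log/ebs-snapshots.log 2>&1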

If you’re wondering why $MOUNTPOINT is important (it’s not used here, after all), it’s because you might want to freeze the filesystem if it’s XFS, so that you can safely take a snapshot of a MySQL database, for example. You can easily wrap the snapshot command like this:

        # freeze
        xfs_freeze -f $MOUNTPOINT

        # Snapshot
        SNAPSHOT_ID=`$EC2_HOME/bin/ec2-create-snapshot $VOLUME`

        # unfreeze
        xfs_freeze -u $MOUNTPOINT

And if you are indeed using this script to snapshot a volume with MySQL on it, you also need to flush tables with a read lock and gather information on the master and slave positions. For this task you can use Eric Hammond's script and incorporate it into the cron script. (You can read more about MySQL and XFS on EC2 on the AWS site.)
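
Eric Hammond's script is the robust way to do that; just to illustrate the idea, here is a rough sketch of holding the read lock while freezing and snapshotting, relying on the mysql client's system command so the locking session stays open ($MYSQL_PASSWORD is a placeholder; $MOUNTPOINT, $VOLUME and $EC2_HOME come from the script above):

# Keep the read lock held for the whole freeze/snapshot window by running
# the shell commands from inside the same mysql client session
mysql -u root -p"$MYSQL_PASSWORD" <<SQL
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
SYSTEM xfs_freeze -f $MOUNTPOINT
SYSTEM $EC2_HOME/bin/ec2-create-snapshot $VOLUME
SYSTEM xfs_freeze -u $MOUNTPOINT
UNLOCK TABLES;
SQL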

Apache mod_ssl Makes Your Clients Crawl Compared To Nginx SSL Module

The SSL/TLS process is a heavy one: it involves algorithm negotiation between client and server, key exchange, encryption, decryption and authentication. But what’s surprising is that the server you’re connecting to can directly influence the performance of your client and its CPU consumption.

I had a PHP command line process spawning child processes and connecting through SSL to a web server, in two scenarios. The first scenario was against an out-of-the-box Apache httpd server with mod_ssl, and the second was against an out-of-the-box Nginx with the SSL module. Both ran on the exact same box, and “out of the box” means I used the default configuration for both.

In the first scenario I was able to spawn no more than 6 (!) PHP processes before the box running them began to show load and the CPU queue started to fill up. Each PHP child was taking between 15% and 30% CPU at any given moment.

In the second scenario, I was able to spawn 40 (!!) PHP child processes without the box being loaded. Each PHP child was taking around 1.5% CPU.

I’m no SSL expert, and there might be a way to configure Apache to inflict less load on the connecting client. There is also SSLSessionCache, which might relieve load on both the server and the client. But the out-of-the-box configuration shows that Nginx is a real winner again.
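
If you want a rough feel for the client-side cost yourself, something as crude as timing a batch of fresh HTTPS requests against each server will show the difference (the URL is a placeholder, and -k is only there in case you’re testing with self-signed certificates):

# Time 100 fresh HTTPS handshakes against the server under test
time for i in `seq 1 100`; do curl -sk -o /dev/null https://ssl.example.com/; done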

If you can, avoid SSL altogether. If not, terminate it at a front end before passing requests on to Apache.

Google Chrome 2, CSS Content-Type and Amazon S3

It seems that ever since Google Chrome 2 was released, some of the CSS files I was serving from S3 were not being treated as valid by it, and the page layouts would break because of it. Firefox and IE were fine with it, and Chrome 1 was ok with it too. It was just Chrome 2.

A little inspection showed that the CSS files I stored on S3 were not being served with a Content-Type header, while from a standard Apache web server they were. This, combined with the new strictness of Chrome 2 (actually resulting from a new strictness in WebKit), made Chrome not treat these files as actual CSS and break the page.

So the obvious solution was to make the CSS files be delivered from S3 with the correct “Content-Type: text/css” header. Fortunately, this is very easy to do with the S3 API. Just pass a “Content-Type: text/css” header when you’re uploading the file to S3, and it will be present in the response headers whenever someone requests the file.
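
Any S3 client or library lets you set this at upload time; as one example, with the aws CLI it’s a single flag (the bucket and key below are placeholders):

# Upload with an explicit Content-Type
aws s3 cp style.css s3://my-bucket/css/style.css --content-type "text/css"

# Verify the header S3 now returns
curl -sI https://my-bucket.s3.amazonaws.com/css/style.css | grep -i content-type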

Here’s to the browser wars, which never end and just got more complicated with the new player in town, Google Chrome.

Detaching Infrastructure From Physical Hosts: Fantasy vs. Reality

(Image: a dead hard drive, via http://www.flickr.com/photos/martinlatter/)

Cloud computing has brought along the promise of easy-to-scale-and-yet-affordable computer clusters. There are various clouds out there that provide Infrastructure as a Service, such as Amazon EC2, Google App Engine, Mosso, and the newcomer Force.com Sites to name a few. I personally have experience as a developer only with Amazon EC2, and I am a devoted fan and user of the entire AWS stack. Nonetheless, I believe that what I have to say here is relevant to all other platforms.

While the cloud and the IaaS model have many significant advantages over traditional physical hosting, there is one major annoyance still to overcome in this space, and that is: your virtual host is still tied to a physical machine. And that machine is non-redundant, it doesn’t have any hot backup, and there’s no way to fail over from it transparently and hassle-free once it’s malfunctioning. This is why, from time to time, I get this email from Amazon:

Hello,

We have noticed that one or more of your instances are running on a host degraded due to hardware failure.

i-XXXXXXXX

The host needs to undergo maintenance and will be taken down at XX:XX GMT on XXXX-XX-XX. Your instances will be terminated at this point.

The risk of your instances failing is increased at this point. We cannot determine the health of any applications running on the instances. We recommend that you launch replacement instances and start migrating to them.

Feel free to terminate the instances with the ec2-terminate-instance API when you are done with them.

Let us know if you have any questions.

Sincerely,

The Amazon EC2 Team

At this stage, this is one of the greatest shortcomings of EC2 from my point of view. As a customer of EC2, I don’t want to care whether a host has a hardware failure. Why can’t my instance just be mirrored somewhere else, consistent hot-backup style, and upon a host hardware failure be transparently switched to the backup host? I wouldn’t mind paying the extra buck for this service.

In my vision, in a true IaaS cloud there is no connection between the virtual machine and the physical host. The virtual machine is truly floating in the cloud, unbound to the physical realm by means of some consistent mirroring across physical hosts.

And you might be thinking “you can implement this on your own on the existing infrastructure that EC2 offers”, and “you should be prepared for any instance going poof”. And you are correct: with EC2’s current offering, this is the case. You always have to be prepared for an instance failure (in the last month, I had 2 physical host failures out of about 20 hosts, which is roughly a 10% (!!) monthly failure rate), and you always have to build your architecture so that a single host failure can fail over gracefully. But were my vision a reality, I wouldn’t have to worry about these things, and wouldn’t have to spend time and money on the overhead they incur.

I am not certain this is the situation in the other clouds, but if it is not, it might come at the price of less flexibility, which is a major part of EC2 that I am not willing to give up. If that flexibility can be maintained, I would love to see my vision become a reality on EC2.