ERROR 2026 (HY000): SSL connection error — The Solution

If you’re running a MySQL server with SSL and have ever encountered this error when trying to connect to the database:

ERROR 2026 (HY000): SSL connection error

One thing to check is whether your client certificate and server certificate have the same common name. You’ve probably gone through the certificate generation procedure and (like I did) just entered the same common name for both without noticing.

This is a nasty error message that tells you nothing, and there’s nothing in the error log to indicate what went wrong. So remember: when generating your own certificates for a MySQL server, use different common names for the client and server certificates!
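For reference, a generation run along these lines keeps the common names distinct (the file names and CN values here are just illustrative):

# CA -- give it its own distinct CN as well
openssl genrsa 2048 > ca-key.pem
openssl req -new -x509 -nodes -days 3650 -key ca-key.pem -out ca-cert.pem -subj "/CN=mysql-ca"

# server certificate -- note the CN
openssl req -newkey rsa:2048 -days 3650 -nodes -keyout server-key.pem -out server-req.pem -subj "/CN=mysql-server"
openssl x509 -req -in server-req.pem -days 3650 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem

# client certificate -- a different CN
openssl req -newkey rsa:2048 -days 3650 -nodes -keyout client-key.pem -out client-req.pem -subj "/CN=mysql-client"
openssl x509 -req -in client-req.pem -days 3650 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 02 -out client-cert.pem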

Improving WordPress Caching: Batcache and Redirects

There’s a great plugin for WordPress called Batcache by Andy Skelton, which uses memcached as the backend for both the object cache and full-page caching. It’s a great piece of code, and using it is a must if you want a scalable, distributed cache for your WordPress install (avoiding the local file system caching that WP-Super-Cache offers).

I’ve been running Batcache for a while, and noticed that although I set up a query-heavy homepage to be cached for a long while, the homepage query was still running every couple of minutes. After some digging and logging, I came to realize that Batcache was missing an important piece: caching redirects.

Why is it important to cache redirects? Here’s an example. If I go to http://example.com/, my homepage will be cached. But when I go to http://www.example.com/, the homepage query still runs. This is because the default www to no-www redirect happens at ‘template_redirect’ action time (in redirect_canonical()), which occurs after the main posts query has run. And Batcache wouldn’t cache the redirect itself, nor would it associate the www version with the no-www version (it differentiates pages by a couple of parameters, including HTTP_HOST, which differs in this case). So any request to http://www.example.com would actually first run the homepage query, then redirect to http://example.com, and only then serve the cached version.

After a short correspondence with Andy, he committed a changeset to advanced-cache.php (and readme.txt ;) ) that makes it possible to cache redirects (off by default, so be sure to turn it on!). Great to see such responsiveness from a plugin author, and I hope it helps others as well.
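If I recall the configuration mechanics correctly, Batcache’s class properties can be overridden through a global $batcache array, so enabling the new behavior should look roughly like this (the option name is my assumption based on the changeset):

// in wp-config.php (or at the top of advanced-cache.php); assuming the
// new option is exposed as a property named cache_redirects, off by default
$batcache = array(
	'cache_redirects' => true
);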

Apache mod_ssl Makes Your Clients Crawl Compared To Nginx SSL Module

The SSL/TLS process is a heavy one: it involves algorithm negotiation between client and server, key exchange, encryption, decryption and authentication. But what’s surprising is that the server you’re connecting to can directly influence the performance of your client and its CPU consumption.

I had a PHP command line process spawning child processes that connected through SSL to a web server, in two scenarios. The first scenario was an out-of-the-box Apache httpd server with mod_ssl, and the second was an out-of-the-box Nginx with its SSL module. Both ran on the exact same box, and “out of the box” means I used the default configuration for both.

In the first scenario I was able to spawn no more than 6 (!) PHP processes before the box running them began to show load and the CPU queue started to fill up. Each PHP child was taking between 15% and 30% CPU at any given moment.

In the second scenario, I was able to spawn 40 (!!) PHP child processes without the box being loaded. Each PHP child was taking around 1.5% CPU.

I’m no SSL expert, and there might be a way to configure Apache to inflict less load on the connecting client. There is also SSLSessionCache, which might relieve load from both the server and the client. But the out-of-the-box comparison shows that Nginx is a real winner again.
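Enabling the session cache in Apache is just a couple of directives in the SSL configuration (the path and values below are illustrative):

# in httpd.conf / ssl.conf -- allows session resumption instead of a
# full handshake on every new connection
SSLSessionCache        shmcb:/var/run/apache2/ssl_scache(512000)
SSLSessionCacheTimeout 300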

If you can, avoid SSL altogether. If not, terminate it at a front-end before proceeding to Apache.
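A minimal sketch of such a front-end, with Nginx terminating SSL and proxying plain HTTP to Apache on a local port (the server name, certificate paths and port are assumptions):

server {
	listen 443;
	ssl on;
	server_name example.com;

	ssl_certificate     /etc/nginx/ssl/example.com.crt;
	ssl_certificate_key /etc/nginx/ssl/example.com.key;

	location / {
		# Apache listens on plain HTTP behind the SSL terminator
		proxy_pass http://127.0.0.1:8080;
		proxy_set_header Host $host;
		proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
		proxy_set_header X-Forwarded-Proto https;
	}
}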

Google Chrome 2, CSS Content-Type and Amazon S3

It seems that ever since Google Chrome 2 was released, some of the CSS files I was serving from S3 were no longer treated as valid by it, and page layouts would break because of it. Firefox and IE were fine with them, and so was Chrome 1. It was just Chrome 2.

A little inspection showed that the CSS files I stored on S3 were not being served with a Content-Type header, while the same files served from a standard Apache web server were. This, combined with the new strictness of Chrome 2 (actually resulting from new strictness in WebKit), made Chrome not treat these files as actual CSS, and break the page.

So the obvious solution was to have the CSS files delivered from S3 with the correct “Content-Type: text/css” header. Fortunately, this is very easy to do with the S3 API. Just pass the “Content-Type: text/css” header when uploading the file to S3, and it will be present in the response headers whenever someone requests the file.
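For instance, with the popular standalone S3 PHP class (I’m assuming its putObject() signature here, which takes an array of request headers as the last argument), the upload might look like this:

require_once 'S3.php'; // Donovan Schönknecht's standalone S3 class (assumed)

S3::setAuth($awsAccessKey, $awsSecretKey);

// S3 stores the Content-Type passed at upload time and sends it
// back in the response headers on every GET
S3::putObject(
	S3::inputFile('style.css'),
	'my-bucket',
	'css/style.css',
	S3::ACL_PUBLIC_READ,
	array(),                            // meta headers
	array('Content-Type' => 'text/css') // request headers
);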

Here’s to the browser wars, which never end and just got more complicated with the new player in town, Google Chrome.

Cross Domain AJAX Calls and Iframe Communication How To

I set out to write a short tutorial about cross domain ajax calls and communication between iframes using url hashes (fragments), but I found this great tutorial by Michael Mahemoff covering the subject inside and out.

It is important to note that the code in his demo uses window.location.hash, although it seems this practice should currently be avoided because of a known open bug in the way Firefox URI-decodes location.hash when reading its value (this could be considered expected behavior rather than a bug, but it is still different from every other browser). Facebook’s code takes a substring of the full URL instead, and it seems to work cross-browser. Facebook also has a nice visual explanation of how it works on their wiki.
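In essence, the workaround looks like this (a sketch, not Facebook’s actual code):

// read the fragment from the raw URL, sidestepping Firefox's
// automatic URI-decoding of location.hash
var href = window.location.href;
var idx = href.indexOf('#');
var rawHash = (idx >= 0) ? href.substring(idx + 1) : '';
// decode explicitly, and only once, when the value is actually needed
var message = decodeURIComponent(rawHash);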

Until HTML5’s postMessage is widely adopted, we will have to keep using this hack of sorts. But it works, and that’s what’s important.

Network Latency Inside And Across Amazon EC2 Availability Zones

I couldn’t find any info out there comparing network latency across EC2 Availability Zones and within a single Availability Zone. So I took 6 instances (2 in each US zone), ran some tests using a simple ping, and measured 10 round trip times (RTT) for each pair. Here are the results.
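The methodology was nothing fancy; roughly this between each pair of instances, using their internal IPs (the address below is illustrative):

# 11 pings; the first sample is discarded (see the note below the tables)
ping -c 11 10.251.42.17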

Single Availability Zone Latency

Availability Zone    Minimum RTT    Maximum RTT    Average RTT
us-east-1a           0.215ms        0.348ms        0.263ms
us-east-1b           0.200ms        0.327ms        0.259ms
us-east-1c           0.342ms        0.556ms        0.410ms

It seems that at the time of my testing, zone us-east-1c had the worst RTT between two instances within it, almost twice as slow as the other two zones.

Cross Availability Zone Latency

Availability Zones                   Minimum RTT    Maximum RTT    Average RTT
Between us-east-1a and us-east-1b    0.885ms        1.110ms        0.937ms
Between us-east-1a and us-east-1c    0.937ms        1.080ms        1.031ms
Between us-east-1b and us-east-1c    1.060ms        1.250ms        1.126ms

It’s worth noting that for cross availability zone traffic, the first ping was usually off the charts, so I disregarded it. For example, it could be anywhere between 300ms and 400ms, and the rest would fall down to ~0.300ms. Probably some lazy routing technique by Amazon’s routers.

Conclusions

  1. Zones are not created equal! At least at the time of testing, a cluster on us-east-1b performs almost twice as well, in terms of RTT between machines, as a cluster on us-east-1c.
  2. Cross availability zone latency can be up to 6 times higher than intra-zone latency. For a network-intensive application, better keep your instances crowded in the same zone.

I should probably also make a throughput comparison within and across Availability Zones. I promise to share if I get to test it.

Connecting Several PHP Processes To The Same MySQL Transaction

The following concept is still under examination, but my initial tests proved successful, so I thought it was time to share.

Here is the problem: there is a batch job reading remote data through XML-RPC and updating a local MySQL database according to the XML-RPC responses. The database is InnoDB, and the entire batch job should be one transaction. That is, on any failure it should roll back, and on success it should commit.

So, the simple way is of course a single-process script with a linear workflow (a minimal sketch follows the list):

  1. START TRANSACTION locally.
  2. Make XML-RPC request, fetch data.
  3. Insert data into db as needed.
  4. Repeat 2-3 until Error or Finish.
  5. If Error ROLLBACK, if Finish COMMIT.
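Here is that linear version in a nutshell. fetch_batch() and the items table are hypothetical placeholders for the real XML-RPC call and schema; DBHandler is the class shown further down:

$dbh = new DBHandler(DBHOST, DBNAME, DBUSER, DBPASS);
$dbh->beginTransaction();
try
{
	// fetch_batch() is hypothetical -- it wraps the remote XML-RPC call
	// and returns the next chunk of rows, or false when there is no more data
	while (($rows = fetch_batch()) !== false)
	{
		foreach ($rows as $row)
		{
			// values should be properly escaped in real code
			$dbh->query("INSERT INTO items (id, data) " .
			            "VALUES ({$row['id']}, '{$row['data']}')");
		}
	}
	$dbh->commit();
}
catch (Exception $e)
{
	$dbh->rollback();
}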

This works, but you may notice a bottleneck: the XML-RPC request. It goes over HTTP to a remote server. Sometimes the XML-RPC server also takes time to perform the work that generates the response. Add the network latency, and you get a single process that sits idle most of the time, waiting for responses.

So if we have a process that just sits and waits most of the time, let’s spread its work over several processes, and assume that while most of the processes will be waiting, at least one can be free to deal with the local database. This way we will get maximum utilization of our resources.

So the multi-process workflow:

  1. START TRANSACTION locally.
  2. Fork children as necessary.
  3. From child, make XML-RPC request, fetch data.
  4. From child, acquire database access through semaphore.
  5. From child, insert data into db as needed.
  6. From child, release database access through semaphore.
  7. From child, repeat 3-6 until Error or Finish.
  8. From parent, monitor children until Error or Finish.
  9. From parent, if Error ROLLBACK, if Finish COMMIT.

Now, the workflow seems all well and good in theory, but can it work in practice? Can we connect to the same transaction from several different PHP processes?

I was surprised to find out that the answer is positive. As long as all processes share the same connection resource, they all use the same connection. And in MySQL, the same connection means the same transaction, given that a transaction was started and not yet committed or rolled back (either explicitly or implicitly).

The secret is to create the connection resource in the parent; when children are forked, they inherit a reference to the same connection. The caveat is that they must access the resource atomically, otherwise unexpected behavior occurs (usually the connection hangs; my guess is that this happens when one child tries to read() from the socket while another write()s to it). So to serialize access to the db connection, we use a semaphore. Each child can access the connection only when it’s available, and blocks otherwise.

At the end of the workflow, our parent process acts much like a transaction manager in an XA transaction and, according to what the children report, decides whether to commit or roll back.

Here is proof-of-concept code (this exact version is untested, but similar code was tested successfully):

The DBHandler Class

class DBHandler
{
	private $link;
	private $result;
	private $sem;

	const SEMKEY = 123456; // sem_get() expects an integer key

	public function __construct($host, $dbname, $user, $pass, $new_link = false, $client_flags = 0)
	{
		$this->link = mysql_connect($host, $user, $pass, $new_link, $client_flags);
		if (!$this->link)
			throw new Exception ('Could not connect to db. MySQL error was: '. mysql_error());
		$isDb = mysql_select_db($dbname,$this->link);
		if (!$isDb)
			throw new Exception ('Could not select db. MySQL error was: '. mysql_error());
	}

	private function enterSemaphore()
	{
		// max_acquire = 1: only one process may use the connection at a time
		$this->sem = sem_get(self::SEMKEY,1);
		sem_acquire($this->sem);
	}

	private function exitSemaphore()
	{
		sem_release($this->sem);
	}


	public function query($sql)
	{
		$this->enterSemaphore();

		$this->result = mysql_unbuffered_query($sql, $this->link);
		if (!$this->result)
		{
			// release the semaphore before throwing, or every other child deadlocks
			$this->exitSemaphore();
			throw new Exception ('Could not query: {' . $sql . '}. MySQL error was: '. mysql_error());
		}
		if ($this->result === true)
		{
			// INSERT, UPDATE, etc..., no result set
			$ret = true;
		}
		else
		{
			// SELECT etc..., we have a result set
			$retArray = array();
			while ($row = mysql_fetch_assoc($this->result))
				$retArray[] = $row;
			mysql_free_result($this->result);
			$ret = $retArray;
		}

		$this->exitSemaphore();

		return $ret;
	}

	public function beginTransaction()
	{
		$this->query('SET AUTOCOMMIT = 0');
		$this->query('SET NAMES utf8');
		$this->query('START TRANSACTION');
	}

	public function rollback()
	{
		$this->query('ROLLBACK');
	}

	public function commit()
	{
		$this->query('COMMIT');
	}
}

The Forking Process

$pid = 'initial';
$maxProcs = isset($argv[1]) ? (int)$argv[1] : 3; // avoid a notice when no argument is given
$runningProcs = array(); // will be $runningProcs[pid] = status;
// children report their status through their process priority (nice value);
// note that setting negative values requires root privileges
define('PRIORITY_SUCCESS',-20);
define('PRIORITY_FAILURE',-19);

try
{
	$dbh = new DBHandler(DBHOST,DBNAME,DBUSER,DBPASS);

	$dbh->beginTransaction();

		// fork all needed children
		$currentProcs = 0;
		while ($pid && $currentProcs < $maxProcs)
		{
			$pid = pcntl_fork();
			if ($pid == -1)
				throw new Exception ("fork failed"); // a fork failure aborts the whole job
			$currentProcs++;
			if ($pid)
				$runningProcs[$pid] = 0; // only the parent tracks child pids
		}

		if ($pid)
		{
			// parent
			echo "+++ in parent +++n";
			echo "+++ children are: " . implode(",",array_keys($runningProcs)) . "n";

			// wait for children
			// NOTE -- here we do it with priority signaling
			// @TBD -- posix signaling or IPC signaling.
			while (in_array(0,$runningProcs))
			{
				if (in_array(PRIORITY_FAILURE,$runningProcs))
				{
					echo "+++ some child failed, finish waiting for children +++\n";
					break;
				}
				foreach ($runningProcs as $child_pid => $status)
				{
					// refresh and report the child's current priority, not the stale $status
					$runningProcs[$child_pid] = pcntl_getpriority($child_pid);
					echo "+++ child status: $child_pid, " . $runningProcs[$child_pid] . " +++\n";
				}
				echo "\n";
				sleep(1);
			}

			echo "+++ checking if should commit or rollback +++n";
			if (in_array(PRIORITY_FAILURE,$runningProcs) || in_array(0,$runningProcs))
			{
				echo "+++ some child had problem! rollback! +++n";
				$dbh->rollback();
			}
			else
			{
				echo "+++ all my sons successful! committing! +++n";
				$dbh->commit();
			}

			// signal all children to exit
			foreach ($runningProcs as $child_pid => $status)
			{
				echo "+++ killing child $child_pid +++n";
				posix_kill($child_pid,SIGTERM);
			}
		}
		else
		{
			// child
			$mypid = getmypid();
			echo "--- in child $mypid ---n";
			//sleep(1);
			echo "--- child $mypid current priority is " . pcntl_getpriority() . " ---n";

			// NOTE -- following queries do not work, for example only
			$dbh->query("select ...");

			echo "--- child $mypid finished, setting priority to success and halting ---n";
			pcntl_setpriority(PRIORITY_SUCCESS);
			while (true)
			{
				echo "--- child $mypid waiting to be killed ---n";
				sleep(1);
			}
		}

}
catch (Exception $e)
{
	// output error
	print "Error!: " . $e->getMessage() . "n";

	// if parent -- rollback and signal children to exit
	// if child  -- set priority to failure to signal the parent
	if ($pid)
	{
		// rollback
		$dbh->rollBack();
		foreach ($runningProcs as $child_pid => $status)
			posix_kill($child_pid,SIGTERM);
	}
	else
	{
		pcntl_setpriority(PRIORITY_FAILURE);
		$mypid = getmypid();
		while (true)
		{
			echo "--- child $mypid waiting to be killed ---n";
			sleep(1);
		}
	}

}

Well, all of this sounds good, and it also worked well in a development environment. But it should be taken out of the lab and tested in a production environment. Once I give it a shot, I will update with benchmarks.