ERROR 2026 (HY000): SSL connection error — The Solution

If you’re running a MySQL server with SSL and you’ve ever encountered this error when trying to connect to the database:

ERROR 2026 (HY000): SSL connection error

One thing to check is whether your client certificate and server certificate have the same common name. You’ve probably gone through the certificate generation procedure and (like I did) just entered the same common name for both without noticing.

This is a nasty error message that doesn’t tell you anything, and there’s nothing in the error log to indicate what went wrong. So remember: when generating your own certificates for a MySQL server, use different common names for the client and the server!
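For reference, here is roughly what an SSL connection from PHP looks like with mysqli. The file paths, host and credentials below are placeholders; the point is simply that client-cert.pem and server-cert.pem must carry different common names:

// Connecting to MySQL over SSL with mysqli (paths and credentials are placeholders).
// Remember: the CN in client-cert.pem must differ from the CN in server-cert.pem.
$mysqli = mysqli_init();
mysqli_ssl_set($mysqli,
	'/etc/mysql/client-key.pem',   // client private key
	'/etc/mysql/client-cert.pem',  // client certificate
	'/etc/mysql/ca-cert.pem',      // CA certificate that signed both certificates
	NULL, NULL);
if (!mysqli_real_connect($mysqli, 'db.example.com', 'dbuser', 'dbpass', 'mydb',
		3306, NULL, MYSQLI_CLIENT_SSL))
	die('Connect failed: ' . mysqli_connect_error());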

Connecting Several PHP Processes To The Same MySQL Transaction

The following concept is still under examination, but my initial tests proved successful, so I thought it was time to share it.

Here is the problem: there is a batch job that reads remote data through XML-RPC and updates a local MySQL database according to the XML-RPC responses. The db is InnoDB, and the entire batch job should run as one transaction. That is, on any failure there should be a rollback, and on success a commit.

So, the simple way is of course a single-process script that uses a linear workflow (a minimal sketch follows the list):

  1. START TRANSACTION locally.
  2. Make XML-RPC request, fetch data.
  3. Insert data into db as needed.
  4. Repeat 2-3 until Error or Finish.
  5. If Error ROLLBACK, if Finish COMMIT.
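A minimal sketch of this linear workflow could look like the following. fetch_remote_batch() is a hypothetical stand-in for the XML-RPC call, the items table is made up for the example, and the DB* constants are assumed to be defined elsewhere (as in the forking code further down):

// Single-process version: one transaction, one loop, commit or rollback at the end.
$link = mysql_connect(DBHOST, DBUSER, DBPASS);
mysql_select_db(DBNAME, $link);
mysql_query('SET AUTOCOMMIT = 0', $link);
mysql_query('START TRANSACTION', $link);      // 1. start the transaction
try
{
	while ($batch = fetch_remote_batch())     // 2. blocks on the remote XML-RPC server
	{
		foreach ($batch as $row)              // 3. insert the fetched data locally
		{
			$sql = sprintf("INSERT INTO items (id, payload) VALUES (%d, '%s')",
				(int)$row['id'], mysql_real_escape_string($row['payload'], $link));
			if (!mysql_query($sql, $link))
				throw new Exception('Insert failed: ' . mysql_error($link));
		}
	}                                         // 4. loop until the remote data runs out
	mysql_query('COMMIT', $link);             // 5. Finish: commit
}
catch (Exception $e)
{
	mysql_query('ROLLBACK', $link);           // 5. Error: rollback
}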

This works, but you may notice a bottleneck: the XML-RPC request. It goes over HTTP and connects to a remote server. Sometimes the XML-RPC server also takes time to perform the work that generates the response. Add the network latency, and you get a single process that sits idle most of the time, waiting for responses.

So if we have a process that just sits and waits most of the time, let’s spread its work over several processes, and assume that while most of the processes will be waiting, at least one can be free to deal with the local database. This way we will get maximum utilization of our resources.

So the multi-process workflow:

  1. START TRANSACTION locally.
  2. Fork children as necessary.
  3. From child, make XML-RPC request, fetch data.
  4. From child, acquire database access through semaphore.
  5. From child, insert data into db as needed.
  6. From child, release database access through semaphore.
  7. From child, repeat 3-6 until Error or Finish.
  8. From parent, monitor children until Error or Finish.
  9. From parent, if Error ROLLBACK, if Finish COMMIT.

Now, the workflow seems all well and good in theory, but can it work in practice? Can we connect to the same transaction from several different PHP processes?

I was surprised to find out that the answer is yes. As long as all processes share the same connection resource, they all use the same connection. And in MySQL, the same connection means the same transaction, given that a transaction was started and not yet committed or rolled back (either explicitly or implicitly).

The secret is to create the connection resource in the parent; when children are forked, they inherit a reference to the same connection. The caveat is that they must access the resource atomically, otherwise unexpected behavior occurs (usually the connection hangs; my guess is that this happens when one child tries to read() from the socket while another write()s to it). So in order to serialize access to the db connection, we use a semaphore. Each child can access the connection only when it’s available, and blocks otherwise.

At the end of the workflow, our parent process acts much like a Transaction Manager in an XA transaction: according to what the children report, it decides whether to commit or roll back.

Here is proof-of-concept code (this exact version is untested, but similar code was tested and worked):

The DBHandler Class

class DBHandler
{
	private $link;
	private $result;
	private $sem;

	const SEMKEY = 123456; // sem_get() expects an integer key

	public function __construct($host, $dbname, $user, $pass, $new_link = false, $client_flags = 0)
	{
		$this->link = mysql_connect($host, $user, $pass, $new_link, $client_flags);
		if (!$this->link)
			throw new Exception ('Could not connect to db. MySQL error was: '. mysql_error());
		$isDb = mysql_select_db($dbname,$this->link);
		if (!$isDb)
			throw new Exception ('Could not select db. MySQL error was: '. mysql_error());
	}

	private function enterSemaphore()
	{
		$this->sem = sem_get(self::SEMKEY,1);
		sem_acquire($this->sem);
	}

	private function exitSemaphore()
	{
		sem_release($this->sem);
	}


	public function query($sql)
	{
		$this->enterSemaphore();

		$this->result = mysql_unbuffered_query($sql, $this->link);
		if (!$this->result)
		{
			// release the lock before throwing, so the other children don't deadlock
			$this->exitSemaphore();
			throw new Exception ('Could not query: {' . $sql . '}. MySQL error was: '. mysql_error());
		}
		if ($this->result === true)
		{
			// INSERT, UPDATE, etc..., no result set
			$ret = true;
		}
		else
		{
			// SELECT etc..., we have a result set
			$retArray = array();
			while ($row = mysql_fetch_assoc($this->result))
				$retArray[] = $row;
			mysql_free_result($this->result);
			$ret = $retArray;
		}

		$this->exitSemaphore();

		return $ret;
	}

	public function beginTransaction()
	{
		$this->query('SET AUTOCOMMIT = 0');
		$this->query('SET NAMES utf8');
		$this->query('START TRANSACTION');
	}

	public function rollback()
	{
		$this->query('ROLLBACK');
	}

	public function commit()
	{
		$this->query('COMMIT');
	}
}

The Forking Process

$pid = 'initial'; // truthy placeholder so the fork loop below runs at least once
$maxProcs = isset($argv[1]) ? (int)$argv[1] : 3; // number of children, default 3
$runningProcs = array(); // will be $runningProcs[pid] = status;
// children report their status through their process priority ("nice" value);
// note that setting negative priorities normally requires root privileges
define('PRIORITY_SUCCESS', -20);
define('PRIORITY_FAILURE', -19);

try
{
	$dbh = new DBHandler(DBHOST,DBNAME,DBUSER,DBPASS);

	$dbh->beginTransaction();

		// fork all needed children
		$currentProcs = 0;
		while ($pid && $pid != -1 && $currentProcs < $maxProcs)
		{
			$pid = pcntl_fork();
			if ($pid > 0)
				$runningProcs[$pid] = 0; // parent: remember this child's pid
			$currentProcs++;
		}

		if ($pid==-1)
		{
			throw new Exception ("fork failed");
		}
		elseif ($pid)
		{
			// parent
			echo "+++ in parent +++n";
			echo "+++ children are: " . implode(",",array_keys($runningProcs)) . "n";

			// wait for children
			// NOTE -- here we do it with priority signaling
			// @TBD -- posix signaling or IPC signaling.
			while (in_array(0,$runningProcs))
			{
				if (in_array(PRIORITY_FAILURE,$runningProcs))
				{
					echo "+++ some child failed, finish waiting for children +++n";
					break;
				}
				foreach ($runningProcs as $child_pid => $status)
				{
					$runningProcs[$child_pid] = pcntl_getpriority($child_pid);
					echo "+++ children status: $child_pid, $status +++n";
				}
				echo "n";
				sleep(1);
			}

			echo "+++ checking if should commit or rollback +++n";
			if (in_array(PRIORITY_FAILURE,$runningProcs) || in_array(0,$runningProcs))
			{
				echo "+++ some child had problem! rollback! +++n";
				$dbh->rollback();
			}
			else
			{
				echo "+++ all my sons successful! committing! +++n";
				$dbh->commit();
			}

			// signal all children to exit
			foreach ($runningProcs as $child_pid => $status)
			{
				echo "+++ killing child $child_pid +++n";
				posix_kill($child_pid,SIGTERM);
			}
		}
		else
		{
			// child
			$mypid = getmypid();
			echo "--- in child $mypid ---n";
			//sleep(1);
			echo "--- child $mypid current priority is " . pcntl_getpriority() . " ---n";

			// NOTE -- the following query is a placeholder, for example only
			$dbh->query("select ...");

			echo "--- child $mypid finished, setting priority to success and halting ---n";
			pcntl_setpriority(PRIORITY_SUCCESS);
			while (true)
			{
				echo "--- child $mypid waiting to be killed ---n";
				sleep(1);
			}
		}

}
catch (Exception $e)
{
	// output error
	print "Error!: " . $e->getMessage() . "n";

	// if parent -- rollback, signal children to exit
	// if child  -- make priority failure to signal
	if ($pid)
	{
		// rollback (the connection may not exist if the failure happened on connect)
		if (isset($dbh))
			$dbh->rollback();
		foreach ($runningProcs as $child_pid => $status)
			posix_kill($child_pid, SIGTERM);
	}
	else
	{
		pcntl_setpriority(PRIORITY_FAILURE);
		$mypid = getmypid();
		while (true)
		{
			echo "--- child $mypid waiting to be killed ---n";
			sleep(1);
		}
	}

}

Well, all of this sounds good, and it also worked well in a development environment. But it should be taken out of the lab and tested in a production environment. Once I give it a shot, I will update with benchmarks.

MMM (MySQL Master-Master Replication) on EC2

Maintaining a MySQL high availability cluster is one of the first missions you encounter when scaling web applications. Very quickly your application gets to the point where the one database machine you have is not enough to handle the load, and you need to make sure that when failure happens (and it always happens), your cluster is ready to fail over gracefully.

Some basic MySQL replication paradigms

MySQL master-slave replication was one of the first architectures used for failover. The rationale is that if a master fails, a slave can be promoted to master and start handling the writes. For this you could use several combinations of IP tools and monitoring software, for example iproute and Nagios, or Heartbeat and mon.

However, the master-slave architecture for MySQL replication has several flaws, the most notable being:

  • The need to manually take care of bringing the relegated master back to life as a slave of the newly promoted master (this can be scripted, but automating it is full of pitfalls); a rough sketch of these steps follows this list.
  • The possibility of a failover during a crash, which can result in the same transaction being committed on both the old master and the new master. Good luck then, when trying to bring back the old master as a slave. You’ll most likely get a duplicate-key failure on the last transaction’s auto-increments when starting replication again, and then the whole database on the relegated master is useless.
  • The inability to switch roles quickly. Say the master is on a better machine than the slave, and a failover just happened. How can you easily restore the situation to the way it was before, with the master on the better machine? Double the headache.
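To illustrate the first point, the manual promotion steps look roughly like this (a sketch only: the binlog file and position have to be read off SHOW MASTER STATUS on the promoted slave, and the replication user and host names here are made up):

-- On the slave being promoted: stop applying replication and note its coordinates
STOP SLAVE;
SHOW MASTER STATUS;   -- note File and Position for the step below

-- On the relegated (old) master, once it is healthy again:
CHANGE MASTER TO
    MASTER_HOST = 'new-master.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000123',
    MASTER_LOG_POS  = 98;
START SLAVE;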

Along came the master-master architecture, which in essence keeps two live masters at all times, one being a hot standby for the other, and switching between them is painless. (Baron Schwartz has a very interesting post about why referring to master-master replication as a “hot” standby could be dangerous, but this is out of the scope of this post.) One of the important ideas at the bottom of this paradigm is that each master works in its own range of auto-increment keys, thanks to the configuration settings auto_increment_increment and auto_increment_offset. For example, say you have two masters, call them db1 and db2; then db1 works on the odd auto-increments and db2 on the even ones. Thus the problem of duplicate auto-increment keys is avoided.
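For a two-master cluster, the settings could look something like the following (the two-node setup and server names are just the example from above; the same values would normally also go into my.cnf so they survive a restart):

-- on db1 (generates 1, 3, 5, ...)
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset    = 1;

-- on db2 (generates 2, 4, 6, ...)
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset    = 2;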

Managing master-master replication with MMM

Master-master replication is easily managed by a great piece of Perl code called MMM. Written initially by Alexey Kovyrin, and now maintained and installed daily in production environments by Percona, MMM is a management daemon for master-master clusters. It’s a convenient and reliable cluster management tool, which simplifies switching roles between machines in the cluster, takes care of their monitoring and health issues, and prevents you from making stupid mistakes (like turning an offline server into an active master…).

And now comes the hard, EC2 part. Managing highly available MySQL clusters has always been based on the ability to control machines’ IP addresses on the internal network. You set up a floating internal IP address for each of the masters, configure your router to handle these addresses, and you’re done. When the time comes to fail over, the passive master sends ARP and takes over the active master’s IP address, and everything switches smoothly. It’s just that on EC2, the router part, and internal IP addresses in general, cannot be controlled by you (I was told FlexiScale gives you the option to control IP addresses on the internal network, but I never got around to testing it).

So how can we use MMM on EC2 for master-master replication?

One way is to try using EC2’s elastic IP feature. The problem with it is that currently, moving an elastic IP address from one instance to another takes several minutes. Imagine a failover from the active master to the passive master in which you have to wait several minutes for the application to respond again — not acceptable.

Another way is to use name resolution instead of IP addresses. This has major drawbacks, especially the unreliable nature of DNS. But it seems to work if you can set up a name resolver that serves your application alone, and use the MMM ns_agent contribution I wrote. You can check out the MMM source and install the ns_agent module according to contrib/ns_agent/README.

I am happy to say that I currently have a testing cluster set up on EC2 using this feature, and so far it has worked as expected, with the exception of a few false-positive master switches due to routing issues (failed pings). Any questions or comments on the issue are welcome, and you can also post to the development group.

MySQL off-site replication and timezones

Recently we were doing some testing on a MySQL slave server located off-site, away from the master server. While the on-site slave had no problems replicating, the off-site slave would break replication once a day with a duplicate-entry error caused by a unique key insert.

This was very weird, especially because the on-site slave was having no trouble at all. After some brainstorming aided by Percona, we realized the cause. The off-site slave was in a different timezone than the master and the on-site slave. This, together with the fact that we had a unique key that contained curdate(), caused the following scenario:

  • On January 3, 23:57, there was an insert on table t, unique key was 2009-01-03.
  • On January 4, 00:01, there was an insert on table t, unique key was 2009-01-04.

No problem, but — times are replicated as timestamps, so on the off-site slave, which was 1 hour ahead (EST instead of CST):

  • First insert was replicated as January 4, 00:57, insert on table t, unique key 2009-01-04
  • Second insert was replicated as January 4, 01:01, insert on table t, and error on unique key for 2009-01-04
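For concreteness, the kind of table that triggers this looks something like the following (a made-up schema, not our actual one):

CREATE TABLE t (
    user_id INT NOT NULL,
    day     DATE NOT NULL,
    hits    INT NOT NULL,
    UNIQUE KEY (user_id, day)
) ENGINE=InnoDB;

-- the kind of statement issued around midnight; the time is replicated as a
-- timestamp, so the slave evaluates CURDATE() in its own timezone
INSERT INTO t (user_id, day, hits) VALUES (42, CURDATE(), 1);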

The solution is either to set the whole machine’s timezone to the timezone of the master you’re replicating from, or to use the same time-zone and default-time-zone settings for the MySQL servers on both sides.
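In configuration terms, the second option amounts to something like this in my.cnf on both servers (the zone name is just an example; named zones require the time zone tables to be loaded, and an offset such as '-06:00' works without them):

[mysqld]
# make the master and all slaves agree on the session time zone
default-time-zone = 'America/Chicago'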