Have you been using MySQL for your web applications since your first foray into website development? Perhaps you now run, or are thinking about running, your own server and you will set up MySQL because that is what you have always used. Did you know there are alternatives? A NoSQL database is one of them, and it has its use cases, but that will need some consideration, and almost certainly some rewriting of your application, to take advantage of. Instead, let me talk to you about some databases that speak your applications language, namely MySQL's client server protocol.
A quick recap
MySQL AB was founded by Michael Widenius (Monty), David Axmark and Allan Larsson in 1995. After the 3.x and 4.x release series, MySQL 5.0 arrived on our servers in 2005, bringing in big changes with new features and improved performance over the 4.x releases. In 2008 Sun Microsystems acquired MySQL AB. After being critical of the release process for 5.1, Monty left Sun in 2009 to form Monty Program AB. Also in 2009, Oracle made a bid to acquire Sun, which reached final approval in 2010. Since then, Oracle have released MySQL 5.5 bringing scalability and monitoring improvements.
These changes of ownership and personnel changes, amid concerns over the direction of MySQL development, have sparked off a thriving community of databases based on the open source MySQL code, developed by people who have strong ideas of the direction they' would like to see MySQL take.
Percona Server with XtraDB
Percona was founded by Peter Zaitsev, one of the worlds foremost experts in high performance MySQL, having run the "High Performance MySQL" group within MySQL AB's support group from 2002 to 2006. Percona Server is a patched build of MySQL with Percona's XtraDB storage engine, which itself is based on MySQL's InnoDB storage engine. The patches applied have been developed to solve real world problems encountered by the Percona team. These patches are readily available and quite often are integrated back into MySQL releases. The XtraDB storage engine aims to improve the performance and scalability above that of the InnoDB storage engine by taking advantage of hardware features that weren't available or were relatively uncommon when InnoDB was first designed, such as multiple processor cores.
Monty formed Monty Program AB to, amongst other things, continue developing MySQL in the direction he thought it should go. To achieve this, he forked MySQL and named the fork MariaDB. Key aims of the MariaDB project are to create a more open development model to encourage community participation and to provide a complete drop-in replacement for MySQL. Features from the MySQL code base are regularly merged to bring in fixes and to help maintain compatibility, although there has been some criticism of how quickly this happens. On top of this MariaDB looks to improve performance and add new and better features.
In 2008 Brian Aker, who was previously MySQL's Director of Architecture, forked MySQL to create the Drizzle project. The Drizzle project follows a similar process to the development process of the Linux Kernel, promoting control by the community over control by a company, both at the code level and at the developer level. It's designed as a pluggable micro-kernel promoting easy extension. While Drizzle was initially forked from the MySQL 6.0 code base, effort has been made to favour performance over compatibility with MySQL, particularly with reference to scalable performance for multiple cores and cloud deployments.
Which one should I use?
After reading this article, resist the temptation to go running off and install one of the alternatives on your production hardware! If after reading this article you do decide to investigate migrating, I would highly recommend benchmarking the performance, stability and compatibility of the alternatives against your application. In fact, it is less of a recommendation and more of a requirement. If you take anything away from this article, it should be that each of the teams have concentrated on different areas of MySQL. Which one works best for you depends on your application workload. What works best for sustained throughput of short running simple queries isn't necessarily going to work best for lots of long running complex queries. Whether they continue to diverge to fulfil different needs or whether one of the projects manages to successfully integrate the best features of all projects to create an über-database remains to be seen. If that happens, then you get to have it all, but for now to find the best database for you, you will need to benchmark, benchmark, benchmark!
Percona Server and MariaDB
Both Percona and MariaDB are binary compatible. You can grab a tarball of either, extract it stop MySQL, start Percona/MariaDB, load up your client and start running queries like you'd just restarted MySQL. This is a great starting point to test your application and see what improvements you get out of the box, it is also very good for benchmarking as it means you don't have to migrate your data to make comparisons.
The Drizzle acorn has fallen furthest from the oak that is MySQL. While they share common ancestry, Drizzle is no longer binary compatible with MySQL and therefore more care needs to be taken over migration. Here's what most people will need to know:
- Drizzle supports InnoDB, but the InnoDB file formats are different. This means you have to transform your data, you can't just start Drizzle and point it at your MySQL files.
- MyISAM is supported only for explicitly created temporary tables, and is expected to be completely unavailable in a future release.
- MERGE, FEDERATED, ARCHIVE and CSV storage engines have all been removed.
- Triggers and views aren't supported natively, however appropriate hooks are provided to allow these to be added as plugins
- MySQL's stored procedures implementation has been deemed to be less than optimal and has been removed. There's no indication that this will be reimplemented, they are favouring improved batching of commands over the wire to reduce the time row locks are held, which is the main advantage of stored procedures in MySQL.This decision also fits with the Drizzle aim to reduce database complexity.
- Some functions, commands and keywords have been removed
- Some objects have been removed. A rule of thumb as to what has been removed is where there were previosuly TINY-, SMALL- and MEDIUM- types, there is now one optimised type. Aliases have also been removed, e.g. SET has been removed leaving ENUM. A comprehensive list of removed objects is available.
Most of these changes are likely to affect your database administration more than your application implementation, sticking to the idea that you should still be able to drop Drizzle in with little or no change to your application.
To simplify the migration of data from MySQL to Drizzle, Drizzle provide the drizzledump tool. While designed for Drizzle backup, it can connect to a MySQL server and produce a Drizzle-compatible export. It manages the above compatibility issues for you, e.g. conversion of data types in table definitions. If you've got both servers live, drizzledump can even import from MySQL straight into Drizzle without an intermediary file.
Not only do the alternatives offer improved performance for some use cases, but they also offer additional features that are not present in MySQL. Deciding to take advantage of these features means you move further away from MySQL compatibility and portability, but if there is a killer feature you want to use in your application then adopt the alternative but be aware of this trade-off. While you're still deciding on the best database server for you, you need to track and isolate these changes to allow you to switch between the alternatives. I'd suggest creating a branch, or equivalent, in your version control system for application change. It's worth version controlling your database config as well, if you're not already (you can read more about this in Harrie's post about database version control). Even better you could use a DevOps tool like chef or puppet to create reproducible database deployments, but that's a subject for another article.
Percona server with XTRADB
Percona Server comes with XtraDB, a drop-in replacement for the InnoDB storage engine. It offers compatibility with InnoDB, so everything you can do with InnoDB works with XtraDB. The differences are improved performance, which we'll cover shortly, diagnostics and reliability. If you've ever had problems due to corrupt InnoDB tables you'll be pleased to know that XtraDB gives you a corrupt table error rather than a crash.
HandlerSocket is a MySQL daemon plugin that gives you a NoSQL interface to MySQL storage engines. It is fast! Benchmarks have shown it supporting nearly twice the number of queries per second as memcached. That's right, not twice as fast as MySQL client access (it is nearly 7 times as fast actually) but nearly twice as fast as memcached.
Percona Server comes with the HandlerSocket plugin built and ready to run with just a couple of lines of config. The HandlerSocket plugin can be built for MySQL as well, but Percona Server ships with it ready to go and includes documentation on its configuration so, for that effort, it goes down as a feature for Percona Server.
Diagnostics and Monitoring
Percona's business is analysing MySQL installations with performance problems and providing solutions. This wouldn't be possible without being able to first identify the problem. While MySQL provides some indicators for status and performance, Percona found that they fell short so, during the course of their work, they've written many improvements and additions to the reporting of Percona Server status and performance. Percona Server now offers over 20 more tables in INFORMATION_SCHEMA and 60 more status and performance counters, along with breakdowns of performance per table, per index, per client and per thread, and this is still not a complete list of diagnostic capabilities added in Percona Server! A MySQL feature that you may be familiar with, the slow query log, has also had some additions, bringing microsecond precision to query run times, additional query statistics and the ability to enable and disable it at runtime. With this information more readily available, application and database optimisation becomes simpler, bottlenecks become easier to identify and problems are solved faster, all of which are as important to application development as squeezing better performance out of MySQL out of the box.
MariaDB offers additional storage engines which can provide optimisations for specific data models.
- Aria: Intended to be a fully transactional storage engine comparable with InnoDB and planned to be the default storage engine for MariaDB
- XtraDB: This is the same engine as discussed above and developed by Percona
- PBXT: A transactional storage engine originally developed by Primebase, now merged into the MariaDB development tree and maintained there
- FederatedX: A drop in replacement for the Federated storage engine
- OQGRAPH: A query engine designed to handle heirarchies and complex graphs. More information can be found at http://openquery.com/graph/doc.
- SphinxSE: While implemented as a storage engine, this plugin actually provides a connection to a sphinx search server.
- IBMDB2I:The MySQL IBMDB2I storage engine that was removed in MySQL 5.1.55
HandlerSocket and HANDLER statement
MariaDB also comes bundled with HandlerSocket as I mentioned above for Percona Server. It also takes direct storage access one step further by adding a HANDLER statement allowing for direct reads from storage engines via key lookup, allowing low level access that bypasses the query optimiser. This can offer up to 50% performance increase, but beware that it doesn't offer consistency. There are also a few cases where it might end badly; as always with great power comes great responsibility.
Virtual columns and dynamic columns
If you have used Oracle or MSSQL you may already be familiar with virtual columns. For those who haven't seen them before, they allow a derived column in your table. They operate in two modes, VIRTUAL and PERSISTENT. The behave the same but are implemented slightly differently. PERSISTENT virtual columns are calculated and written on row writes, whereas VIRTUAL virtual columns are calculated when the row is read. Which you use depends on whether you need to improve performance on read or write.
Dynamic columns allow you to store different columns for each row in a table. One use case for this is a products table holding multiple types of products, each with different attributes. Often this is handled by creating an EAV structure maintained by an application, however dynamic rows can provide a simpler solution (I know it looks like NoSQL, but it's not, promise ;) ). It is implemented by storing the dynamic columns in a blob and providing native functions for accessing this data. While the current implementation is stable and usable, there are some limitations and improvements are planned to address these limitations. The aim is to allow dynamic columns to be almost indistinguishable from normal columns.
Drizzle's focus on performance and simplification led to the Drizzle team designing a microkernel architecture, supporting plugins to provide features rather than implementing them in the core. This simplifies the process of adding features which will hopefully cultivate an ecosystem of Drizzle plugins and Drizzle plugin developers. It's clear that developing a thriving development community around Drizzle is the intention, as it's been made as simple as possible to develop and build plugins. As noted by Padraig O'Sullivan, this is 'freaking awesome'.
To date there are 75 official Drizzle plugins covering everything from authentication (Schema, LDAP, HTML, File, Allow all) to server protocols (MySQL, Drizzle) to storage engines (InnoDB, MyISAM, Memory). There is some criticism to this approach, as people are used to monolithic applications and expect to features to be available without needing to decide which plugins are enabled, with authentication being the usually-cited example. Personally I like the plugin approach, but it's not for everyone. As almost everything is a plugin, most plugins provide functionality that's present in the core of MySQL, so I won't go into those. Here are the best of the rest:
Authentication is no longer bound to a table in the MySQL schema. One of the most interesting authentication plugins is the LDAP plugin, allowing you to use your central LDAP server for authentication rather than maintaining a separate list of credentials within your database server, however this can be complex if you don't already use LDAP in your organisation. For development you might want to use the PAM plugin so you can log in with your usual linux credentials. If you're happy with the MySQL way of doing things, the Schema authentication plugin allows authenticating against credentials stored in the database. Whichever authentication scheme you choose, you'll probably want to disable the "Allow All" plugin which allows unrestricted access.
With the console plugin enabled you can connect directly to the database, no client required. As this is like most other *nix programs, simply connecting STDIN and STDOUT, this allows for piping input and output directly to and from Drizzle without an intermediary client.
For most applications, caching is a Good Idea (TM). Memcached is commonly used in the same stack as MySQL, and Drizzle has a few tricks up its sleeve to make this easier. The normal access pattern is:
- query the cache
- if a key isn't present in the cache, query the database
- once we've got the information out of the database, push it back into the cache
With the Drizzle memcached_functions plugin, we can send the data off to memcached in the same SQL statement we use to retreive the data. This gives us a couple of interesting possibilities. How about a trigger to prime the cache when data is inserted into Drizzle, or to invalidate cache when rows are deleted? The only slight issue with this is that there isn't a plugin that implements triggers yet, but I'm sure one will be along shortly.
Drizzle provides plugins to publish transactions to both RabbitMQ and ZeroMQ. If you're currently polling your database for new data, this would allow you to move the load from your database to a message queue, providing better scalability. It could also be used as a basis for replication from Drizzle to other data stores.
If the query logging plugin is loaded, by default it logs nothing. This can be controlled at runtime allowing for logging of specific queries or globally at specific times or events. This means that it only logs on demand, and importantly does not require a server restart to turn it on or off. It's also possible to log queries out to a Gearman server for processing. There is at least one person out there using Gearman to integrate Drizzle with MySQL Enterprise Monitor.
Replication has had a complete overhaul from its MySQL origins. Replication is implemented by storing replication events as Google Protocol Buffer messages in an InnoDB table. Storing replication events in a standardised format in a readily accessible table means that writing replication plugins and clients becomes much easier, simplifying the replication of data between different formats and technologies.
As I already mentioned, each variant has its strengths and weaknesses. Currently, none of the contenders, including MySQL, offer a complete best, or worst, performance profile for all use cases. If I benchmarked all four databases on identical hardware for various workloads with various levels of concurrency I could easily be here for weeks. So to make everything I've written about features and compatibility relevant by the time you read this, I'm going to give you the distilled highlights and aims for each project. I also can't stress enough that performance is extremely dependant on workload, meaning that the only way to be sure which database server will work best for you is to benchmark it against your application. So let us take a look at the contenders:
The bulk of Percona Server's performance improvements come from its XtraDB storage engine. Percona identified a situation in MySQL and InnoDB where throughput would steadily increase and then plummet, stalling briefly, before building back up, continually repeating this cycle of build up and stall. XtraDB has made big strides in addressing this problem and will now sustain high throughput, often at higher peak performance, without suffering the same stalling. If your workload requires a high sustained throughput, Percona Server is likely to be a good choice. If your database activity is short bursts of transactions, you're less likely to see such a pronounced improvement.Another significant improvement is its ability to scale vertically (more CPUs and memory within one server). Percona Server will scale consistently to 48 cores allowing all but the biggest applications to completely avoid the necessity to scale horizontally (more servers using replication and/or sharding), reducing application and administration complexity.
While MariaDB has backported some MySQL 6.0 features and improvements, the current release is based on MySQL 5.1, so there are some advancements that MySQL and Percona Server have that are not yet integrated into MariaDB. At the time of writing there is an alpha release of MariaDB 5.5 available which is based on MySQL 5.5 that will address this. There is criticism about the time it takes for MariaDB to rebase on a new version of MySQL, although I wonder if the project would make its own improvements as quickly if more time was spent in merging in changes.
MariaDB has not been idling in the performance department however, the biggest improvements are in the query optimiser. The role of the query optimiser is to try to run the query in the most efficient manner by calculating the most efficient joins and making the best use of indexes to reduce the amount of disk reads and computations needed to return the result set. Have you ever had to rewrite a subquery into a less readable, more complex join, to make it perform better? MariaDB's query optimiser will do a good job of this under the bonnet. This helps in some cases with writing more readable SQL and also protects your application from a certain class of performance-killing queries that could sneak in from an unwary developer. As well as subquery optimisation, join optimisation has also been improved. Added together, an application that uses lots of complex queries should see good improvements from this. If, however, your application is performing lots of simple queries, it is unlikely you will see the benefit of the improved optimiser.
Instead of using InnoDB, MariaDB makes use of Percona's XtraDB storage engine, so receives similar performance benefits to Percona Server in this area.
Drizzle takes a different approach to performance. Rather than concentrating on raw performance, e.g. how fast Drizzle can process a transaction or number of transactions, they're focused on scaling. Drizzle is targeting workloads where applications need to be able to run 1024+ concurrent connections. MySQL's sweet spot was traditionally around 16 threads, although recent versions appear to cope better with higher concurrency and, as usual, this all depends on workload. From an architecture perspective this means systematically removing contention points within Drizzle that have historically prevented MySQL from scaling. This support for high concurrency workloads has sometimes resulted in design decisions that have caused worse performance at lower concurrency so if your application does not require high levels of concurrency you may not see any performance improvement compared to MySQL and in some cases may actually see worse performance. If you need high concurrency, however, the drizzle is an excellent choice for you.
The most satisfying conclusion possible to draw from this would be that one of the alternatives offers the best in everything and I could tell you to run out and install it right away. Life is not that simple, however. If you've skipped ahead to peek at the answer, unfortunately this is one of those open ended questions where there is no single right answer.
Each alternative has its advantages and disadvantages, things it does well and areas where others have overtaken it. Which you choose is entirely dependant on on your application workload.
What is abundantly clear though is that MySQL is still going strong after its changes in ownership and this is providing a strong base, and competition, for some interesting projects to develop. It will be interesting to see how they continue to improve and differentiate themselves. It's also impressive to see the community that's developing around them and the level of passion displayed, as evidenced by the comments made around the recent MariaDB benchmarks.
Your next step
After doing some initial benchmarking and compatibility testing, the best way of seeing how your application works with one of the alternatives is to run your application against it, however you don't want to migrate everything over until you're sure you've settled on the right one. A good way to do this is to set up a slave that you can point a copy of your application at to perform reads against. You can then configure your application to perform an a/b test, monitoring performance between your MySQL slave(s) versus your alternative slave. May the best alternative win!