At Sportradar, we have several products where everything is hosted on our servers, but our customers embed them into their websites. The result is that we concurrently handle the accumulated traffic of all our customers. On a typical Saturday this is a five-digit number of requests per second. In order to handle all this traffic, and more importantly, to make it easy to scale up to meet future traffic demands as we sell more products, we have spent quite a bit of time researching what kind of service infrastructure works best with as little hardware as possible.
The stack I will describe here is not the same as the one we are using. It is a simplification. Linux provides a lot of buttons to push and knobs to turn that affect performance. But these settings are typically tied closely to the workload and are very difficult to generalise. We have achieved an understanding, mostly by trial and error, of what works for us, but the same settings will probably not be useful to anyone else.
This article is only concerned with how requests move from your users to the web servers serving content. It does not deal with how to scale the web application itself. I will also not go into much detail about how to configure each of the services mentioned.
Principles
I have to say I am a big fan of the Unix philosophy of using small, specialised services. It is the primary reason I like to use web servers like nginx, which only handles one single task, and why I think using PHP FPM instead of Apache/Mod_PHP is a good idea. Just like with programming, keeping stuff compartmentalised makes debugging easier, it leads to single failing nodes affecting only single services, and it is a whole lot easier to scale where necessary.
All of the machines in this setup are virtualised using Kernel-based Virtual Machine (KVM), and managed by Ganeti. The cool thing about using Ganeti is that it supports syncing disks to a secondary hypervisor using Distributed Replicated Block Device (DRBD). If any of these nodes fail, they can just be booted on the secondary hypervisor and pick up where the failing node left off. Note that if your application is very CPU bound, I would not use virtualisation. You lose quite a bit of CPU and I/O performance when virtualising.
The stack
Let me start off by presenting the stack. Then I’ll go through each level and give some more thorough explanations.
- Gateway
- SSL termination/proxy
- HTTP Accelerator
- Web server/FCGI

Granted, using this many systems comes at a cost in system administration. But since the nodes individually are so simple, running software upgrades is rather trivial as there are no conflicting dependencies. Using virtual machines also makes dist-upgrades trivial. We simply do not ever do it. Instead we fire up a new virtual machine with the newest OS version, configure it, deploy software and do some simple testing, and then just let it be a drop-in replacement of the old node.
The gateway
The purpose of the gateway is to handle routing between the Internet and the application-specific subnets. I like using a load balancer like Linux Virtual Server (LVS) for this, because it allows me to scale the following layer horizontally. LVS can basically handle any amount of traffic you throw at it on a single node so there is no need to think about how to add more nodes into this layer. If it really became necessary to do so, and adding more hardware to the existing two nodes would not be possible, DNS round-robin could be a way to achieve a form of load balancing.
Even though I do not find load balancing necessary in this layer, I would still recommend redundancy. Not only can nodes fail, but every now and then I would like to be able to take the gateway out of production to perform maintenance on it. Redundancy on this level is achieved by using Linux-HA. The simple explanation of what this software suite does is this: If the active node dies, the stand-by node takes over its IP, sends an ARP announcement, and, if configured correctly, resumes the work of the failed node.
SSL termination/proxy
So you may ask “Why do we need dedicated nodes to terminate SSL?”. Firstly it is because both web applications and SSL termination are typically CPU-bound so you do not want these two parts fighting over resources. Secondly, Varnish, the next service in the stack, does not speak SSL.
This layer needs to be horizontally scalable due to the CPU cycles required to terminate SSL, especially if you allow ciphers using one-time (ephemeral) Diffie-Hellman. I always make sure that I have enough nodes on this layer to handle at least a single node failure.
These days I use nginx for this layer, but any kind of light-weight, high-performance web server will do the job. The one thing worth mentioning about using nginx is that it does not (yet) support HTTP 1.1 towards the backend, so there are no keep-alive connections and no chunked responses on that side. But since the backend is Varnish, this is not that big of an issue.
HTTP Accelerator
And now for the stack’s superhero: Varnish. It is an HTTP cache server that can handle pretty much any amount of traffic. During my stress testing I have seen Varnish handle thousands of connections on a single CPU core. Therefore I would not worry about scaling this bit horizontally unless you have to cache a huge amount of data.
Another reason for only having a single active node in this layer is that with several active nodes there is a chance of the same page being cached at different times with different contents. If the user continuously hits ‘refresh’ they would end up flipping between the two different cached versions, making your site look silly.
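The flipping effect can be illustrated with a toy model. This is only a sketch, not Varnish itself: the CacheNode class, the /scores URL and the round-robin selection are hypothetical stand-ins for two independent cache nodes that each fetched the page at a different time.

```php
<?php
// Hypothetical model: a node keeps whatever the backend returned
// the first time that node was asked for the page.
class CacheNode
{
    private $store = [];

    public function get(string $url, callable $backend): string
    {
        if (!array_key_exists($url, $this->store)) {
            $this->store[$url] = $backend($url);
        }
        return $this->store[$url];
    }
}

$version = 1;
$backend = function (string $url) use (&$version): string {
    return "content v{$version}";
};

$nodes = [new CacheNode(), new CacheNode()];

// Node 0 caches the page first; the content changes before node 1 is asked.
$seen = [$nodes[0]->get('/scores', $backend)];
$version = 2;
for ($i = 1; $i <= 3; $i++) {
    // Alternate between the two nodes, as a balancer in front might do.
    $seen[] = $nodes[$i % 2]->get('/scores', $backend);
}

echo implode(', ', $seen), "\n";
// prints "content v1, content v2, content v1, content v2"
```

With a single active node the same request sequence would return one version consistently until the cached object expires.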
The redundancy setup is identical to that of the gateway layer.
Web server
In my sketch above, I just added a bunch of Nginx/PHP FPM servers behind Varnish. This is what the setup would look like in its simplest form, assuming that you do not require cookies, user logins or anything else that requires this layer to simulate some form of state.
The important bit is that this layer is easy to scale horizontally. All you need to do is add another server to the director configuration of Varnish. Varnish supports several different forms of directors, even directors that will help you maintain state. Going into details about this, however, is in itself worthy of an article.
Some final remarks
This setup is a bit simpler than I would put into production, but it contains the essential details. All of the services mentioned are quite trivial to configure and there should be lots of resources online about each of them.
Avoid getting individuals lost in Scrum
A recent post by Marianne sparked a bit of an interesting discussion among my peers about how agile development processes have a bias against certain types of people.
Specialists not welcome
Marianne’s claim is that agile development puts a higher value on generalists, while specialists are unwelcome:
(..) Instead of having one team of architects, one team of usability guys etc, we try to compose the teams with a mix of skills. Rotating tasks among team members is also recommended. A natural consequence of this is that each team member has to perform a broad variety of tasks. These cross-functional teams therefore work best when everyone knows a bit about everything. Scrum promotes the generalist and degrades the specialist––unless they are specialized in several fields.
My opinion is that if you choose to specialise in a single, exclusive field, you will not be able to bring much to a team. The main problem would be the specialist’s inherent inability to obtain a holistic view of the application. A developer who is strong in one aspect of the application, but is unable to, or worse, unwilling to learn about the other aspects of the application, will cause problems, most notably sub-optimal interfaces towards the rest of the code.
Processes and individuals
While I think developers should be somewhat familiar with all aspects of the code, developers should certainly be allowed to, and indeed encouraged to, specialise in a few aspects. Due to human nature, these experts will feel ownership of the parts of the code they choose to specialise in. But I do not think this is a problem! It is only a problem if other developers feel that they are not allowed to venture into their code. Code owned by a small set of developers who have no interest in sharing their code with the rest of the team will end up unreadable and unmaintainable.
The typical developer I work with has some kind of bias towards frontend development (i.e. JavaScript/CSS) or backend development (PHP/SQL). They do not mind working on either part, but they prefer only one of them. And the assignment of tasks is typically weighted accordingly; those who like backend development get more backend tasks than those who prefer frontend development. But everyone has to dive into the other part now and again.
With this arrangement it is natural that those with less understanding of, say, frontend work require help from those who are experts. XP would say the solution to this is pair programming, and those who like following things by the book would say you absolutely have to do pair programming. You are not doing it right if you do not enforce pair programming. Here I strongly agree with Marianne. Mandatory pair programming is a plain stupid idea and is a clear example of requiring a team to adapt to processes instead of adapting processes to the team. If the team wants to do constant pair programming, then by all means do so, but if the team does not want to, then please drop this idea. I find that sessions of pair programming happen spontaneously when needed anyway (and without developers realising they are actually pair programming). All you need is a culture that encourages helping each other when needed.
The same goes for having meetings just for the sake of having meetings (because Scrum said so). In larger teams it makes sense to formalise interaction more through meetings, but for smaller teams I find that developers interact enough to know what other people are working on. Developers are required to keep an eye on the commit log too, to get an idea of which part of the system is being changed.
When it comes to stakeholders such as product owners and those from the business side of things outside of the team, or even architects like me who work daily in multiple teams, I find that it works very well to let architects and team leaders interact with product owners regularly, and keep the scope of interaction between developers and product owners to that of single tasks.
Individuals and team
Unlike Marianne, I do not think that a rule never to “interrupt a developer (or anyone else for that matter) who is clearly trying to concentrate” is a good solution. When some developers feel they are interrupted too frequently we arrange for time-slotting the day so that we have periods where interruption is allowed and periods where people can focus on their own tasks. Yes, this blocks some developers, but in return it enables others, and I find that people are happier with such arrangements and it keeps the interaction at an acceptable level.
I feel that the well-being of the team is more important than that of each individual in isolation. This means everyone needs to compromise. Those who feel the need to talk and poke need to respect that other team members like to be left alone. And those who want to sit by themselves and code all day need to understand that they have to help someone else code without grabbing their keyboard and doing the job for them.
Fin
To conclude, I find it imperative that the team is allowed to set its own rules. People outside of the team (architects, managers, etc.) should only interfere when they see that the team is heading in a problematic direction such as endless discussions and low productivity (too much interaction), or code getting partitioned into “their code and our code” (too little interaction).
PHP 5.4 RCs just entered ~arch
Hi all. Just a quick update on the process of adding PHP 5.4 support to Gentoo.
PHP 5.4 has just entered the release candidate phase upstream. Since I have made quite a few changes to the PHP 5.4 ebuilds compared to the PHP 5.3 ebuilds, I want to have these changes thoroughly tested so that we do not need more than the typical 30 days of testing after PHP 5.4 goes final.
Since not all extensions compile with the new ABI, I also need to manually revbump every extension that has adapted its code to work with the new ABI. This process is somewhat time-consuming, so if you have any extensions you really want to test together with PHP, let me know and I will do those first. Among the extensions known NOT to work are APC, memcached and xdebug.
Testing PHP 5.4 alphas on Gentoo
To ensure a timely release
As you may remember, Gentoo was very late to the PHP 5.3 party. Our stable version was released even after Ubuntu released their stable version. That is not how it should be. One of the main reasons was that the PHP ebuilds were stuck in a quagmire and it was very difficult to get any upgrades done. However, just before leaving Gentoo, while being swamped with work, Hoffie amazingly rewrote most of the PHP ebuild’s logic, making it a whole lot more maintainable. This is what has enabled me to release updates for PHP this quickly.
Testers wanted
Now, the PHP developers have just embarked on their PHP 5.4 release cycle, starting out with alpha releases. To ensure that PHP 5.4 will be released quickly after upstream releases their version, I have written ebuilds for these alpha versions. If you are interested in testing all the cool stuff that PHP 5.4 will bring, simply unmask and emerge dev-lang/php:5.4. You can then switch your preferred SAPIs to 5.4 using the eselect PHP module. Because of the minor version slotting, there is no need to remove PHP 5.3 in order to test PHP 5.4. Both versions should work fine in parallel.
Do note that Gentoo offers no support on this except for the ebuilds themselves. Alpha versions should never be used in production. If you see anything in these ebuilds that should be done differently, or if I am missing support for something, do not hesitate to open a bug about it.
Extensions for PHP 5.4
Quite a lot of extensions do not work with PHP 5.4 due to ABI breakage, and because the ABI is still unstable. Because of this, none of the extension ebuilds will support PHP 5.4 yet. Once the PHP developers announce that the PHP 5.4 ABI is stable, I will start migrating ebuilds to support the new ABI.
Ebuild cleanup
The current Gentoo PHP ebuilds have defined a lot of USE flags that have been marked as experimental. All of those have now been removed, making it a lot less confusing to be a PHP user on Gentoo.
Also, I have removed a few very old patches from the Gentoo patchset that enable some undocumented features I am sure no one even knows about, such as defining the default charset for a MySQL connection in php.ini. Those who need those features are free to take the patch and apply it using the epatch_user feature.
The reason for making these changes is that I want to bring PHP on Gentoo as close to upstream as possible. The only patches that I find useful for the Gentoo PHP distribution are patches that change default settings to reflect the defaults of the Gentoo system itself, such as pid files always going in /var/run.
dev-lang/php:5
As you may know, PHP 5.2 is no longer supported by upstream. The reason we still have ebuilds for it in Gentoo is that certain packages still depended on the old major version slotted ebuilds, where PHP 5.2 is the only stable version on many architectures.
Currently, there is only one known such package left, and it now has a new revbump pending stabilisation. Once that package version has been stabilised, the older versions will be removed and so will all of dev-lang/php:5. To minimise the amount of pain this may cause you, please start migrating to minor version slotted PHP now.
dev-lang/php:5.2
I have not yet decided when dev-lang/php:5.2 will be removed, but my plan so far is something like this: Once dev-lang/php:5 is gone, dev-lang/php:5.2 will be hard masked. This makes sure that every single PHP 5.2 user gets the message, as well as the necessary time to migrate the code. Once the cries of outrage have settled, for there will certainly be cries of outrage, the entire slot will be removed from the tree. As I said, I have not yet decided when that will happen, but I expect it to happen sometime around the end of the year.
PHP.next on Gentoo
As you may know, Hannes Magnusson made PHP.next snapshots available for Ubuntu. This got me curious about how difficult it would be to create snapshot ebuilds for PHP.next. Turns out it wasn’t all that difficult.
The only issue I encountered was how to deal with extensions. As the snapshot ebuilds would be masked, the php5-4 PHP_TARGETS USE flag would also have to be masked. In order to unmask the USE flag, add -php_targets_php5-4 to /etc/portage/profile/use.mask.
Anyways, to get PHP.next working, do the following steps:
- Unmask the latest PHP snapshot
- Unmask the USE flag as mentioned above
- Add php5-4 to PHP_TARGETS in /etc/make.conf
- Update world
- Use eselect php and choose php5.4 as target for cli.
After following these steps you should be able to see something like this:
php --version
PHP 5.3.99-pl0-gentoo (cli) (built: May 7 2011 09:24:56)
Copyright (c) 1997-2011 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2011 Zend Technologies
As for support, if you think there is something wrong with the ebuild, feel free to file a bug on b.g.o. If you think something is wrong with PHP, file a bug upstream. If you want me to add another snapshot to portage because something cool happened in PHP’s subversion, just drop me an email and I will see what I can do.
For the last year or so I have been intrigued by a form of emergent design where the design emerges from keeping the code base manageably small at all cost, even if it means ripping apart packages/namespaces, software layers and domain concepts. This is in contrast to the default agile practice of test-driven development and refactoring in small steps.
First the concept
In order to develop in this fashion it is imperative that one follows a few key development processes:
- Only develop for today
- Always have a finished product
- Your code will be gone tomorrow
This has certain implications that you also must take into consideration:
- Defer unit testing
- Discuss constantly
- Know when to stop
Only develop for today
This is pretty much what emergent design means. You have a product that you want to create, you start off with a couple of features, and you design the architecture and implement code to satisfy only these few features. Forget about anything else. By the time they are done you have learnt something new about the project or something about the feature has changed. In my experience one of these things happens all the time. When you pick features, always pick the most difficult ones and the ones you understand the least. Those are the high-risk features that will affect your design the most.
Always have a finished product
At the latest, have a finished product at the end of every feature implementation. This is pretty much what is known as continuous integration. There are several reasons why you want to do this. The most important one in my opinion is that you can prove progress. It is also important as it allows you to get feedback and confirm that you are moving in the right direction.
Your code will be gone tomorrow
If a new feature comes along and you feel that something in the existing code feels off, or does not quite fit with this new feature, start from scratch with that particular part of the code. Really. The alternative is to keep code that does not perfectly fit with what you are creating. If you do not clear away this code, your code base will grow unnecessarily large. It is imperative that after you finish implementing a feature and start reflecting on the current code base, you and your fellow developers feel that if you were to develop this product from scratch knowing what you now know, the code would look exactly like this.
And herein lies the challenge …
The implications
Defer unit testing
This is the part where the agile zealots take out their pitchforks and prepare the bonfire. Test-driven development will only hold you back! When you write tests, you actually commit to an interface and start developing contracts between the different units of your code. Your mother may have told you that this is the safe way to code, but you are an adult now and should know how to navigate without such safety nets. I am not saying you should not test, or even write automated tests, but they should not be on the unit level. And you should certainly never test first.
Rewrite, not refactor
The idea is that when the code base is really small, small enough for all developers in a team to have an overview of everything in the code base, the code becomes more manageable and major rewrites possible. Remember that it is major rewrites you want. Refactoring is not sufficient because it implies that you maintain the interface when what you want to do is to redo the entire layer of architecture.
Discuss constantly
This is not a style of coding that you can do alone. You must develop in a team. Also, pair programming is not enough. When any abstraction is to be made or any part of the code is to be reviewed, all technical team members must be involved. Everyone in the team needs to know what is going on and how every part works. This means that there will be a lot of talking, but then, the more you talk about code, the better you get to know the code.
Know when to stop
This is the most critical part. This method for development only takes you so far. At some point you have to say “Well, I think we reached the point where we cannot go further without increasing the complexity beyond where everyone can know everything”. Then you need to start adding unit tests and move to the more baby-step, traditional agile style of programming. However, I would still recommend you stay away from this idea about test first. You still want to write larger chunks of code in order to obtain a feel for what you want everything to look like in the end.
Am I crazy?
Probably. But I have actually experienced something close to this. In this project, we managed to go on for a few months without any unit testing at all, and despite this, we did not introduce many bugs. We also found that when we did want to start adding unit tests, this was fairly easy and only a small set of modifications was necessary in order to get most of the code under testing. Also, when we reflected on the code we acknowledged that knowing what we did then, with all the obligatory changes in the specifications, customers going in circles over what they want and features implemented and removed, we could not have made a better architecture. If we were to start from scratch again, the code would end up looking exactly like it did then. And this felt really great to us. Usually after months of development I look back and think “I will certainly never do that again”, but this time I really did not.
… So maybe I am on to something?
These last few weeks I have been working a lot with quality assurance and performance optimisation at work. Most of the PHP QA tools are fairly new and thus did not have any ebuilds in portage. So over the past few days I have been adding support for all the QA tools that I am using. Thanks to Denny Reeh I have also managed to bump most of the PHPUnit stack.
The QA tools I added are the following:
- dev-php/phpmd – PHP mess detector
- dev-php/phpcpd – Copy/Paste detector
- dev-php/phpdepend – Static code analyser for PHP
- dev-php/phploc – A tool for quickly measuring the size of a PHP project
- dev-php/php-codebrowser – Generates a highlighted code browser from XML reports generated by codesniffer or phpunit
- dev-php/xhprof – A Hierarchical Profiler for PHP
If there are any QA tools for PHP out there not yet available in portage, let me know and I will see if they can be added.
Also, let me know if you are interested in knowing about how I make use of these tools and I will see if I can find the time to write a little bit about it.
Catching fatal errors in PHP
Introduction
I often hear fellow developers voicing concerns about running PHP in a production environment without any possibility of catching fatal errors such as E_ERROR, E_PARSE and E_CORE_ERROR. In dynamic languages like PHP those kinds of errors happen all the time, for example when trying to call a method on a variable you assumed was an instance of a specific class, but which for some reason was not instantiated. Not only are they often not caught, but often it is also difficult to even know that they are occurring.
You probably already know of set_error_handler(), which allows you to catch the basic errors and warnings. For uncaught exceptions, you also have set_exception_handler(). However, neither of these features enables you to log fatal errors.
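As a reminder of what set_error_handler() does cover, here is a minimal sketch that turns ordinary warnings into ErrorException instances, a pattern described in the PHP manual. The file path is a hypothetical one that is assumed not to exist.

```php
<?php
// Convert ordinary PHP errors (warnings, notices) into exceptions.
set_error_handler(function ($severity, $message, $file, $line) {
    // Respect the current error_reporting level and the @ operator.
    if (!(error_reporting() & $severity)) {
        return false;
    }
    throw new ErrorException($message, 0, $severity, $file, $line);
});

$caught = null;
try {
    // Reading a missing file raises E_WARNING, which the handler rethrows.
    file_get_contents('/hypothetical/path/that/does/not/exist');
} catch (ErrorException $e) {
    $caught = $e;
}
restore_error_handler();

echo get_class($caught), ': severity ', $caught->getSeverity(), "\n";
```

A handler like this is never invoked for E_ERROR or E_PARSE, which is the gap the rest of this article deals with.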
A solution – register_shutdown_function()
The solution to the problem is register_shutdown_function(). This function lets you register a callback function that will be run when PHP is shutting down, even when it is forced to shut down due to entering an unstable state.
Be aware that you will not be able to resume running a script after the shutdown function is finished, but in this function you will be able to do some last-minute clean-up of open resources or, as I will shortly explain, send notification emails about the script crashing.
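A plain-PHP sketch of the idea, without any framework, might look like the following. The describeFatal() helper and the FATAL_TYPES constant are hypothetical names introduced for illustration. Since error_get_last() also reports non-fatal errors such as warnings, the sketch filters on the error type before logging.

```php
<?php
// Error types that stop the script and never reach set_error_handler().
const FATAL_TYPES = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR;

// Hypothetical helper: returns a log line for a fatal error, or null otherwise.
function describeFatal($error)
{
    if ($error === null || !($error['type'] & FATAL_TYPES)) {
        return null;
    }
    return sprintf('%s in %s line %d', $error['message'], $error['file'], $error['line']);
}

// The callback runs on every shutdown, including normal exits.
register_shutdown_function(function () {
    $line = describeFatal(error_get_last());
    if ($line !== null) {
        error_log($line);
    }
});
```

Keeping the formatting in a separate function makes the decision logic easy to exercise without actually crashing a script.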
How it works
I never make much use of frameworks. Zend Framework, however, does have some nifty components that I like to use whenever I can. One of these is Zend_Log. For the basic errors, Zend_Log already provides an error handler, which may be initiated by running something like this:
$logger = Zend_Log::factory(
/*
enter your Zend_Log configuration here
*/
);
$mail = new Zend_Mail();
/*
Do some Zend_Mail setup
...
*/
$writer = new Zend_Log_Writer_Mail($mail);
/*
Do some writer setup
...
*/
$logger->addWriter($writer);
$logger->registerErrorHandler();
As you can see, I am making use of Zend_Mail to notify me of any bad stuff.
Unfortunately, it does not provide something equally simple for exceptions. But Zend_Log does have some nifty support for logging exceptions. So let's add a quick exception handler that makes use of Zend_Log:
set_exception_handler(function ($e) use ($logger)
{
    $logger->ERR($e);
});
Nothing exciting here and you probably already know how to do this. Most likely you were doing something like it already.
Now for the interesting part: The errors that make PHP crash. Here is the code:
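register_shutdown_function(function () use ($logger)
{
    // error_get_last() returns null when no error occurred during the request.
    if ($e = error_get_last()) {
        $logger->ERR($e['message'] . " in " . $e['file'] . ' line ' . $e['line']);
        // Destruct explicitly so that a mail-based writer sends its email now.
        $logger->__destruct();
    }
});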
This function now gets called just before PHP exits. So what happens here?
Remember that with register_shutdown_function() we register a callback that is called when the PHP engine is shutting down. Therefore it is also called when PHP is exiting normally. So the first thing we do is to check if an error occurred while executing the script. If an error occurred, we log it. Notice how we can even make use of variables that originate from outside of the scope of the callback function.
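How a callback gets hold of outside variables can be shown in isolation. This is a small sketch with hypothetical variable names; it contrasts capturing by value, as done with the logger in the callback, with capturing by reference.

```php
<?php
$prefix = 'app';

// $prefix is copied into the closure at the moment the closure is created.
$byValue = function ($message) use ($prefix) {
    return "[$prefix] $message";
};

$count = 0;
// With &, the closure shares the variable with the enclosing scope.
$byReference = function () use (&$count) {
    $count++;
};

$prefix = 'changed'; // does not affect $byValue
$byReference();
$byReference();

echo $byValue('started'), ' ', $count, "\n";
// prints "[app] started 2"
```

Objects are handles in PHP, so a logger captured by value still refers to the same logger instance that the rest of the script uses.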
The main difference between the shutdown callback and the exception handler callback, besides explicitly checking for errors, is that if you use something like Zend_Mail for logging you have to explicitly destruct the logger. If you do not do this your log email will not be sent.
Conclusion
If you, like some of my fellow developers, did not know about these tricks, you should now be able to implement something like the example code above. Hopefully this article will lead to someone sleeping better at night.
If you want to look at a library that handles a lot of this stuff for you automatically, check out the PHP error handler library by Ferenc Kovacs. This source repository also contains some examples and demos you can use to test how such error handling works.
