More on PHP performance

After writing the post criticizing Google’s “performance advice” for PHP beginners, I started thinking – OK, I don’t like Google’s advice, what would I propose instead?

So here are my thoughts on what a beginner should consider when starting with PHP performance optimization. Note that I'm not saying these are the only things you should do: there are plenty of articles, talks, blogs, etc. about PHP performance, and many of them contain very good advice and go into much more detail than I intend to. But I think the items below are ones you should make sure you are doing to the full extent before you go looking for performance tricks.

Also, I want to say up front that I work for Zend Technologies and have participated in the development of many Zend solutions, both free and commercial. I am going to mention both kinds in this article where relevant. I am aware that there are alternative solutions, but I will mention the ones I know best. So please do not take this as an advertisement or as any claim about the relative merits of other solutions; that is not the intent. The intent is to give a general direction and some examples; if somebody prefers other solutions that go in the same direction, that's fine.

Bytecode cache
If you care about performance and don't use a bytecode cache, then you don't really care about performance. Please get one and start using it. If you want a ready-made, commercially supported solution with a nice GUI, etc., look at Zend Server; if you're more into compile-it-yourself and the command line, you may want to look at APC or other alternatives.
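As a side note beyond the products named above: since PHP 5.5, PHP ships with the OPcache extension as its bundled bytecode cache. A minimal php.ini sketch for turning it on might look like this, assuming the extension is already loaded by your build; the numeric values are illustrative assumptions, not tuned recommendations.

```ini
; Turn the bytecode cache on for web requests.
opcache.enable=1
; Shared memory for cached bytecode, in megabytes (illustrative value).
opcache.memory_consumption=128
; Upper bound on the number of cached scripts (illustrative value).
opcache.max_accelerated_files=10000
; Check file timestamps so edited scripts get recompiled. Setting this to 0
; skips the check for extra speed, but then every deploy needs a cache reset.
opcache.validate_timestamps=1
```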

Profile your code
Profile your code before you start optimizing it! Otherwise it's like travelling around a foreign city whose signs are written in an unreadable language, without any map or GPS. You'll probably get somewhere, but you won't have any idea where you are, where you should go, or how far you are from where you need to be. Profiling tells you which parts of the code are worth investing in and which aren't. You can use Zend Studio/Debugger or Xdebug for that.
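One way to collect a profile, assuming Xdebug 3 (older Xdebug 2 releases used different setting names, such as xdebug.profiler_enable), is a php.ini fragment like the one below. It profiles only requests that carry the XDEBUG_TRIGGER cookie or GET/POST parameter and writes cachegrind.out.* files, which tools such as KCachegrind or QCachegrind can open. The output directory is a hypothetical example.

```ini
; Enable Xdebug's profiler (Xdebug 3 setting names).
xdebug.mode=profile
; Profile only requests that carry the XDEBUG_TRIGGER cookie/GET/POST parameter.
xdebug.start_with_request=trigger
; Where cachegrind.out.* files are written (hypothetical path).
xdebug.output_dir=/tmp/xdebug-profiles
```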

Cache expensive operations
Most PHP installations run in a “shared nothing” mode: as soon as request processing ends, all the data associated with the request is gone. This has some advantages, but also one big disadvantage: you cannot preserve the results of repeated operations. That is, unless you use caching.
You should look into caching every operation that takes considerable time and returns the same result for a prolonged period of time, or for the same data set. That may include configuration, database queries, service requests, complex calculations, full pages or page fragments, etc. Caching expensive operations is one of the most powerful performance improvements you can make.
There are numerous low-level caching solutions: memcached, APC, Zend Server (you can find a good guide to it on DevZone) and others. On top of these, you may look into Zend Framework's caching infrastructure, which supports the backends mentioned above and more, and makes caching much easier.
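To make the idea concrete, here is a minimal cache-aside sketch in plain PHP. The helper name cached() is hypothetical (it is not Zend Framework's caching API); it assumes the APCu extension for the actual storage and falls back to a per-request static array when APCu is unavailable, which keeps the example runnable but does not survive between requests.

```php
<?php
/**
 * Hypothetical cache-aside helper: return the cached value for $key if present,
 * otherwise compute it with $compute, store it for $ttl seconds and return it.
 *
 * Uses APCu when it is loaded and enabled; otherwise falls back to a static
 * array that lives only for the current request (so the example still runs).
 */
function cached(string $key, int $ttl, callable $compute)
{
    static $local = [];

    $useApcu = function_exists('apcu_enabled') && apcu_enabled();

    if ($useApcu) {
        $value = apcu_fetch($key, $hit);   // $hit is set to true on a cache hit
        if ($hit) {
            return $value;
        }
    } elseif (array_key_exists($key, $local)) {
        return $local[$key];
    }

    $value = $compute();                   // cache miss: do the expensive work

    if ($useApcu) {
        apcu_store($key, $value, $ttl);
    } else {
        $local[$key] = $value;             // the fallback ignores $ttl
    }
    return $value;
}

// Usage: the "expensive" closure runs only on the first (missing) lookup.
$calls = 0;
$loadConfig = function () use (&$calls) {
    $calls++;
    return ['db_host' => 'localhost'];
};

$first  = cached('app_config', 300, $loadConfig);
$second = cached('app_config', 300, $loadConfig);
```

The design point is that callers only describe how to compute a value; whether it comes from the cache or is recomputed is decided in one place, so switching the storage to memcached later would touch only the helper.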

Optimize your data
Usually the most expensive parts of a PHP application are where it accesses external data: the database, the filesystem, or the network. Look hard into optimizing those: reduce the number of queries, improve the database structure, reduce filesystem accesses, try to bundle data so you make one service call instead of several, etc. For a more in-depth look, use tools like strace (Unix) and Process Explorer (Windows) to examine the system calls your script makes, and think about ways to eliminate some of them. You won't be able to eliminate all of them, but each one is a worthy target.
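As an example of reducing the number of queries, the sketch below replaces the common "one query per ID" loop with a single parameterized IN query using PDO. The users table, its columns and the fetchUserNames() helper are hypothetical; the usage part assumes the pdo_sqlite extension for an in-memory test database.

```php
<?php
/**
 * Fetch names for many user IDs with one query instead of one query per ID.
 * The `users` table and its columns are hypothetical.
 *
 * @param int[] $ids
 * @return array<int,string> map of id => name
 */
function fetchUserNames(PDO $pdo, array $ids): array
{
    if ($ids === []) {
        return [];                 // avoid generating an invalid "IN ()" clause
    }
    // One "?" placeholder per ID keeps the query parameterized (no SQL injection).
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $stmt = $pdo->prepare("SELECT id, name FROM users WHERE id IN ($placeholders)");
    $stmt->execute(array_values($ids));

    // FETCH_KEY_PAIR turns two-column rows into an id => name array.
    return $stmt->fetchAll(PDO::FETCH_KEY_PAIR);
}

// Usage with an in-memory SQLite database (requires pdo_sqlite).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol')");

$names = fetchUserNames($pdo, [1, 3]);   // one round trip for both users
```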

Don’t try to outsmart the engine
There are a lot of “tips” floating around about which constructs in PHP are faster or slower than others. I think you can safely ignore all of them, especially if you're a beginner. Odds are that in 9 cases out of 10 they won't give you any improvement at all, and in the remaining case the tip will either not apply to your code or not be worth the time spent on it. Yes, there are ways to save a couple of opcodes and remove a couple of lookups here and there, but unless you've already done all of the previous steps, it is not worth it. And some of the advice out there will actually make your code slower, less robust and less secure without you even noticing. So I think it's better for beginners to stay away from trying to outsmart the engine altogether.

Benchmark in real life

Much of the advice I mentioned above comes with benchmarks as proof. The problem is that these benchmarks always test only a short piece of code. However, you won't be running that one-liner; you'll be running the whole big application. This reminds me of the joke about a physicist who developed a model of a spherical horse in a vacuum in order to win bets on horse racing. If you want better chances of winning than that physicist, test in a real environment, not in a vacuum. If you have an idea for an improvement, verify that it actually improves your application, not just an artificial benchmark. If that is impossible, use profiling results to estimate the potential benefit: if you find a way to optimize a function that accounts for 0.1% of the overall execution time, you probably won't do the application as a whole any good.
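The 0.1% example is an instance of Amdahl's law: if a fraction f of the run time is sped up by a factor s, the whole run becomes 1 / ((1 - f) + f / s) times faster. A small sketch of that estimate (the function name overallSpeedup() is just for illustration):

```php
<?php
/**
 * Amdahl's law: overall speedup when only part of the run time gets faster.
 *
 * @param float $fraction     share of total run time spent in the optimized code (0..1)
 * @param float $localSpeedup how many times faster that code becomes (> 0, may be INF)
 */
function overallSpeedup(float $fraction, float $localSpeedup): float
{
    return 1.0 / ((1.0 - $fraction) + $fraction / $localSpeedup);
}

// A function taking 0.1% of the run time, made twice as fast:
$small = overallSpeedup(0.001, 2.0);   // about 1.0005, i.e. roughly 0.05% faster overall
// Even an infinitely fast version cannot gain more than the 0.1% it occupied:
$limit = overallSpeedup(0.001, INF);   // about 1.001
// Compare: code taking half of the run time, made twice as fast:
$big = overallSpeedup(0.5, 2.0);       // about 1.333
```

This is why the profile, not a micro-benchmark, should decide where to spend effort: the fraction f caps the benefit no matter how clever the local optimization is.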

Leverage the extensions
That seems too obvious, but I have seen a lot of code that duplicates functions available in some PHP extension. PHP has a lot of functions, so if you're doing something others may have done before, check the manual. You have the DOM/SimpleXML extensions for XML, the JSON extension for JSON, the SOAP extension for SOAP, etc. Do not create custom serialization/deserialization if serialize()/unserialize() would work for you.
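For instance, instead of a hand-written format, the built-in serializers already round-trip ordinary PHP data. A minimal sketch (the $data array is made up):

```php
<?php
$data = ['id' => 42, 'tags' => ['php', 'perf'], 'active' => true];

// PHP-native format: preserves PHP types exactly.
// The PHP manual warns against passing untrusted input to unserialize();
// use JSON for data that comes from outside.
$native   = serialize($data);
$restored = unserialize($native);

// JSON via the json extension: portable to other languages.
// The second argument `true` decodes JSON objects into associative arrays.
$json    = json_encode($data);
$decoded = json_decode($json, true);
```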
If you have a very performance-sensitive bit of script and you can do C programming (a beginner in PHP isn't necessarily a beginner in everything :), consider even writing your own extension; it's not that hard.

Avoid extra notices/errors/etc.
Even suppressed errors have a cost in PHP, so try to write your code so that it doesn't produce notices, strict notices, warnings, etc. Enabling logging of all errors is a good way to find them. Never enable displaying errors in production, though; it will only lead to major public embarrassment.
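A quick way to see that suppression does not make an error free: a handler installed with set_error_handler() is still called for errors silenced with @, so the engine does the work of raising them either way. The sketch below counts raised errors for a suppressed undefined-key read versus one guarded with the ?? operator (PHP 7 and later); the $config array is made up.

```php
<?php
// Count every error the engine raises, including @-suppressed ones.
$raised = 0;
set_error_handler(function (int $errno, string $errstr) use (&$raised): bool {
    $raised++;
    return true;   // treat as handled; skip PHP's standard error handler
});

$config = ['db_host' => 'localhost'];

// Undefined key silenced with @: the notice (warning in PHP 8) is still raised.
$port = @$config['db_port'];
$afterSuppressed = $raised;

// Guarding with ?? raises nothing at all and supplies a default.
$port = $config['db_port'] ?? 3306;
$afterCoalesce = $raised;

restore_error_handler();
```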

Use php.ini-production as a start
If you need a set of php.ini settings that won't hurt your performance or break anything, look at php.ini-production in the PHP source. You may need to change a couple of details (e.g. the include path), but it's a good starting point.

Use big realpath cache
The realpath cache is very useful for the engine when it tries to find the unique full name of a file from just a file name or a relative path. By default it's 16K (newer PHP releases use a much larger default), but if you have a lot of files with long paths, it's better to increase the size: it saves expensive disk accesses.
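A php.ini sketch for a larger cache follows; the values are illustrative assumptions. To see how much of the cache your application actually uses, PHP also provides realpath_cache_size() (bytes currently used) and realpath_cache_get() (the cached entries).

```ini
; Size of the realpath cache (illustrative value; check realpath_cache_size()
; on a warmed-up application to see how much is really needed).
realpath_cache_size=4096K
; How long, in seconds, cached path information stays valid (illustrative value).
realpath_cache_ttl=600
```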

There are probably more things that could be said, but this post is pretty long already, so I will end it here and you are welcome to add your opinion in comments.

Displaying errors

PHP has a setting named display_errors that controls whether error messages are sent to the output. It is recommended to keep it off, especially for public sites, since it may reveal too much information about the application and looks awful when seen on a public site.

However, for a developer an error report shown at the right time and place can be quite valuable, and it is usually easier to work with than logs, etc. Of course, that means keeping errors on in development and off in production. OK, then what do we do if something weird happens in production and we want to see the errors, but we don't want others to see them?
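The usual split can be expressed in php.ini roughly like this; the log path is a hypothetical example, and the commented-out lines are what a development machine would additionally enable.

```ini
; Production: keep errors out of the page, but log all of them.
display_errors=Off
display_startup_errors=Off
log_errors=On
error_reporting=E_ALL
; Hypothetical log location; it must be writable by the web server user.
error_log=/var/log/php/errors.log

; Development: additionally show errors in the output.
; display_errors=On
; display_startup_errors=On
```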

ASP has an interesting feature here: it can display a detailed error page only when the site is accessed from a local browser, and display something generic when it is accessed from “outside”. Maybe PHP could have a setting like display_errors=local, which would enable display_errors for requests originating from the developer's machine but disable it when an outsider accesses the site? Of course, this would need to be done carefully to prevent security problems, but I have a feeling it might be useful.

This can be done with an extension or even a user-defined prepend script, but I think a system-level mechanism might help people use it correctly and avoid embarrassing themselves with publicly displayed errors, while keeping the errors easy for developers to spot. Would that be useful?
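A rough userland approximation of the display_errors=local idea, as a prepend script loaded through the auto_prepend_file ini setting, might look like the sketch below. The allowlist and the shouldDisplayErrors() helper are hypothetical, and as the comment notes, REMOTE_ADDR is only trustworthy when PHP sees clients directly rather than through a proxy.

```php
<?php
// Hypothetical prepend script (configured via auto_prepend_file) that shows
// errors only to requests coming from an allowlisted developer address.

/**
 * Decide whether errors may be displayed to this client.
 * $allowed is a hypothetical allowlist of developer IP addresses.
 */
function shouldDisplayErrors(?string $remoteAddr, array $allowed): bool
{
    // No address (e.g. CLI) or an address not on the list: keep errors hidden.
    return $remoteAddr !== null && in_array($remoteAddr, $allowed, true);
}

$developerIps = ['127.0.0.1', '::1'];

// Caution: REMOTE_ADDR is the direct peer. Behind a reverse proxy it is the
// proxy's address, so if the proxy runs on this host every visitor looks local.
if (shouldDisplayErrors($_SERVER['REMOTE_ADDR'] ?? null, $developerIps)) {
    ini_set('display_errors', '1');
} else {
    ini_set('display_errors', '0');
}
```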

Graceful recovery

Right now some situations in PHP (parse errors, a call to an undefined function, running out of memory) result in a fatal error, which means the engine cannot continue with the request beyond that point. From the user's point of view, this often results in a blank page. I wonder if it would be possible to have a standard recovery mechanism that would let the PHP engine, on a fatal error, enter some kind of “recovery mode” in which it would output a very basic page saying “OK, I have some problems here, but don't panic, just tell my programmer to fix the code”. It probably won't give much information, but it would allow production sites to display a nice message to users instead of the boring snowfield panorama they display now (that is, if the administrator was smart enough to set display_errors to off).

Maybe it should allow only fixed HTML, or maybe some kind of “request recovery” mode that would create a “recovery mode” sub-request in which more is allowed, like sending an email to the webmaster :). This may need some creative thinking, but the main idea is to move away from the snowfield thing.
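Until something like that exists in the engine, a common userland approximation is a shutdown function that checks error_get_last() and prints a generic page. This is only a sketch under stated assumptions: it must be registered from a file that is already running (for example a prepend script or front controller), so it cannot catch a parse error in that same file, and after running out of memory it has very little room to work. It logs with error_log() rather than emailing anyone. Also note that since PHP 7, many of these conditions (such as calling an undefined function) throw Error exceptions that can be caught with try/catch. The isFatalError() helper is hypothetical.

```php
<?php
/** Error types that abort the request but are still visible to a shutdown function. */
function isFatalError(?array $error): bool
{
    if ($error === null) {
        return false;
    }
    $fatalTypes = [E_ERROR, E_PARSE, E_CORE_ERROR, E_COMPILE_ERROR];
    return in_array($error['type'], $fatalTypes, true);
}

// Runs after the script ends, including after a fatal error.
register_shutdown_function(function (): void {
    $error = error_get_last();
    if (!isFatalError($error)) {
        return;                      // normal completion or a non-fatal error
    }
    // Details go to the log for the developer; users get a generic message.
    error_log(sprintf('Fatal error: %s in %s:%d',
        $error['message'], $error['file'], $error['line']));
    if (!headers_sent()) {
        http_response_code(500);
    }
    echo '<h1>Sorry, something went wrong.</h1>',
         '<p>The problem has been logged and will be fixed.</p>';
});
```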