Zend Server PHP sources

I was asked about PHP going with Zend Server, specifically from which sources it is built – as we don’t ship source packages for the builds. Since Zend Server includes PHP build that can have some patches applied from SVN past the release (i.e. if the package has version 5.2.10 it might have some patches that were in SVN 5.2 branch past 5.2.10 tag) – I think it is important that people know what they are going to run if the use Zend Server.

So, there is some solution for that – we do have packages with PHP sources called php-5.2-source-zend-server and php-5.3-source-zend-server. If you are using RPM/DEB setup for Zend Server, you can just install these packages by name and get it in /usr/local/zend/share/php-source/ and if you install it any other way (or just curious?)  I guess you could take packages directly from RPM or DEB repo and install them manually :)  Those are exactly the sources from which the binaries are built.

What we don’t have yet is a changelog that describes in detail which patches are going into that release additionally to pure X.Y.Z tag. I’ve asked Powers That Be to add it, so hopefully it’d happen soon too.

P.S. If you need help on how to add custom extension to ZS, we have DevZone article about it.

syntax I miss in PHP

Here are some syntax additions I’d like to see in PHP:

1. a()()
When a() returns a callable object (such as a closure) the second set of brackets would call it. That would allow to write some neat code. I am working on a patch for that but it’s a bit harder than I thought.

2. a()[$x]
This is kind of obvious, I wonder why don’t we have it? Should also be able to be an lvalue, though unless a() returns by-ref or return value is an object it may not do what one wanted.

3. foo(1,2,,4)
Syntax to skip a parameter in a call, which then will be substituted with the default as defined by the function. Would come handy if you have function with many defaults, and you only want to change the last one but leave the rest alone – now you don’t have to look up the actual default’s values.

4. foo(“a” => 1, “c” => 2, “d” => 4)
Named parameters call, which allows you to specify parameters in arbitrary order by name. Would allow to build nice APIs which could accept wider ranges of parameters without resorting to using arrays. That’d also imply call_user_func_array() would accept named arguments.
The problem here might be what to do with unknown arguments – i.e. ones that the function did not expect. I guess the function could just ignore them.

5. $a = ["a", "b" => "c"];
I’d really like to have short array syntax. Yes, I know it was rejected so many times already, but I still like it.

More on PHP performance

After writing the post criticizing Google’s “performance advice” for PHP beginners, I started thinking – OK, I don’t like Google’s advice, what would I propose instead?

So here are my thoughts about what would be good for the beginner to consider when he starts with PHP performance optimizations. Note that I do not say it’s the only thing you should do – there are a bunch of articles, talks, blogs, etc. about PHP performance and many of them contain very good advice and go into much more details than I intend to go into. But I think the items below are ones that you should ensure you are doing to the full extent before you go to look around for performance tricks.

Also, from the start I want to say that I work for Zend Technologies and I participated in development of many Zend solutions, both free and commercial. I am going to mention both kinds in this article, where relevant. I am aware that there are alternative solutions, but I will mention the ones I know the best. So please do not take this as commercial advertisement or any claim on relative merits of other solutions – it is not the intent. The intent is to give general direction and some examples, if somebody prefers other solutions in the same direction – that’s fine.

Bytecode cache
If you care about performance and don’t use bytecode cache then you don’t really care about performance. Please get one and start using it. If you want ready-made commercially-supported solution with nice GUI, etc., look at Zend Server, if you’re more into compile-it-yourself command-line then you may want to look at APC, or other alternatives.

Profiling
Profile you code before you start optimizing it! Otherwise it would be like travelling around a foreign city with signs written in an unreadable language witout any map or GPS. You’ll probably get somewhere, but you wouldn’t have any idea where you are, where you should go and how far are you from the place you need to be. Profiling would allow you to know which parts of code are worth investing into and which aren’t. You can use Zend Studio/Debugger or Xdebug for that.

Caching
Most PHP installations run in “shared nothing” mode where as soon as the request processing ends, all the data associated with the request is gone. It has some advantages, but also one big disadvantage – you can not preserve results of repeated operations. That is, unless you use caching.
You should look into caching all operations which take considerable time and can return the same result for a prolonged period of time or same data set. That may include configurations, database queries, service requests, complex calculations, full pages or page fragments, etc., etc. Caching expensive operations is one of the most powerful performance improvements you can do.
There are numerous low-level caching solutions – memcached, APC, Zend Server (you can find a good guide to it on DevZone) and others. On top of it, you may look into Zend Framework’s caching infrastructure – which support the backends described above and more and makes caching much easier.

Optimize your data
Usually the most expensive places of the PHP application are where it accesses external data – namely, database or filesystem or network. Look hard into optimizing that – reduce number of queries, improve database structure, reduce filesystem accesses, try to bundle data to make one service call instead of several, etc. For more advanced in-depth look, use tools like strace (Unix) and Process Explorer (Windows) to look into system calls your script produces and think about ways to eliminate some of  them. You would not be able to eliminate all of them but each of them is a worthy target.

Don’t try to outsmart the engine
There are a lot of “tips” floating around about which constructs in PHP are faster or slower than others. I think you can safely ignore all of these tips, especially if you’re a beginner. Odd are, 9 cases out of 10 they won’t give you any improvement at all, and in the remaining one case it will be either not applicable in your code or not worth the time spent on it. Yes, there are ways to save couple of opcodes and remove couple of lookups here and there – but unless you’ve already done with all of the previous steps it is not worth it. And some of the advice out there will actually make you code slower, less robust and less secure without you even noticing. So I think for the beginners is better to stay away from trying to outsmart the engine altogether.

Benchmark in real life

Many of the advices I mentioned above have benchmarks as a proof. The problem is these benchmarks always test only a short piece of code. However, you would not be running that one-liner – you would be running the whole big application. This reminds me of a joke about a physicist that developed the model of a spherical horse in vacuum in order to use it to win bets on horse racing. If you want better chances to win than that physicist, test in real environment, not in vacuum. If you have an idea for some improvement, verify that this improvement actually improves your application, not just an artificial benchmark. If this is impossible, use profile results to estimate potential benefit – if you find a way to optimize function that summarily runs for 0.1% of overall execution time, you probably won’t do any good to the application as a whole.

Leverage the extensions
That seems too obvious, but I have seen a lot of code that duplicates functions available in some PHP extension. There are a lot of functions in PHP and if you do something that others may have done before, check in the manual. You have DOM/SimpleXML extensions for XML, JSON extension for JSON, SOAP extension for doing SOAP, etc., etc. Do not create custom serialization/deserialization if serialize()/deserialize() would work for you.
If you have some very performance-sensitive bit of script and you can do C programming (beginner in PHP doesn’t mean beginner in everything :), consider even making your own extension, it’s not that hard.

Avoid extra notices/errors/etc.
Even suppressed errors have cost in PHP, so try and write your code so it would not produce notices, strict notices, warnings, etc. You may want to enable logging of all errors to examine that. Never enable displaying errors in production though – it will only lead to a major public embarrassment.

Use php.ini-production as a start
If you need a set of php.ini settings which would not hurt your performance and not break anything, look into php.ini-production in PHP source. You may need to change a couple of details (e.g. include path) but it’s a good starting point.

Use big realpath cache
Realpath cache is very useful for the engine when it tries to find the unique full name of the file from just filename or relative path. By default, it’s 16K but if you have a lot of files with long pathes, it’s better to increase the size – it would save the expensive disk accesses.

There are probably more things that could be said, but this post is pretty long already, so I will end it here and you are welcome to add your opinion in comments.

5.3!!!

After a long string of delays, PHP 5.3 is finally out.  On the course of last 2 years, I was pretty sure a number of times that it will happen next month the latest, but there always were good reasons to postpone it. Now finally it’s officially out. I think it’s a huge step for PHP. Download it and try it!

Some major new features in 5.3:

  1. Namespaces! They didn’t end up exactly as I thought they would but they are a major feature PHP was missing for a long time, and I’m very curious to see how it works out in big projects.
  2. Closures and anonymous functions! PHP now has first-class functions, and you can do all kinds of crazy stuff with it. Or just make your code easier to read and maintain :)
  3. Garbage collection. PHP engine, being refcount-based, always has had a slight problem with reference loops. Even though usually it was not a big issue since at the end of the request everything is cleaned up, for long-running PHP applications not based on short request pattern it became a problem. Not anymore – now the engine knows to clean up such loops.
  4. Late static binding – it’s somewhat exotic thing for people that never encountered it, but was very burning issue for people that did need it. Basically, when class Foo extends class Bar, and the method func() defined in Foo is called as Bar::func(), there was no way to distinguish it from Foo::func(). Now there is. This allows to implement all kinds of cool patterns like ActiveRecord.
  5. Intl extension in core – lots of functions to allow you to internationalize your application.
  6. Phar in core – now you can pack all the application in one neat file and still be able to run it!

Also in 5.3:

  1. Nowdocs – same as heredocs, but doesn’t parse variables. Excellent feature for somebody that wants to include bing chunk of text into the script which can happen to have $’s etc. in it.
  2. ?: shortcut. That’s simple – $a?:$b is $a if $a is true, otherwise it’s $b.
  3. goto. Yes, I know. But now we have it too. Deal with it. :)
  4. mysqlnd – native PHP-specific mysql driver.

Last but definitely not least – tons of performance improvements, bug fixes, etc. Download it today! :)

PHP performance tips from Google

I saw a link on twitter referring to PHP optimization advice from Google. There are a bunch of advices there, some of them are quite sound, if not new – like use latest versions if possible, profile your code, cache whatever can be cached, etc. Some are of doubtful value – like the output buffering one, which could be useful in some situations but do nothing or be worse in others, and if you’re a beginner generally it’s better for you to leave it alone until you’ve solved the real performance problems.

However some of the advices make no sense at best and are potentially harmful at worst. Let’s get to it:

First one: Don’t copy variables for no reason. I don’t know what the author intended to describe there, but PHP engine is refcounting copy-on-write, and there’s absolutely no copying going on when assigning variables as they described it:

$description = strip_tags($_POST['description']);
echo $description;

I don’t know where it comes from but it’s just not so, unless maybe in some prehistoric version of PHP. Which means unless you’re going back to 1997 in a time machine this advice is no good for you.

Next one: Avoid doing SQL queries within a loop. This actually might make sense in some situations, however the code examples they give there is missing one important detail that makes it potentially harmful for beginners (see if you can spot it):

$userData = [];
foreach ($userList as $user) {
$userData[] = '("' . $user['first_name'] . '", "' . $user['last_name'] . '")';
}
$query = 'INSERT INTO users (first_name,last_name) VALUES' . implode(',', $userData);
mysql_query($query);

Please repeat after me – DO NOT INSERT USER DATA INTO SQL WITHOUT SANITIZING IT!
Of course, I can not know that $user was not sanitized. Maybe the intent was that it was. But if you give such example and target beginners, you should say so explicitly, every time! People tend to copy/paste examples, and then you get SQL injection in a government site.

Another thing: most of real-life PHP applications usually do not insert data in bulk, except for some very special scenarios (bulk data imports, etc.) – so actually in most cases one would be better off using PDO and prepared statements. Or some higher-level frameworks which will do it for you. But if you roll your own SQL – sanitize the data! This is much more important than any performance tricks.

Next one: Use single-quotes for long strings. PHP code is parsed and compiled, and any possible difference in speed between parsing “” and ” is really negligible unless you operate with hundreds of megabyte-size strings embedded in your code. If you do so, your quotes probably aren’t where you should start optimizing. And of course, using caching (see below) eliminates this difference altogether.

Next one: Use switch/case instead of if/else. This makes no sense since switch does essentially the same things as if’s do. See for yourself, here is the “if” code:

0       2     A(0) = FETCH_R(C("_POST")) [global]
1       2     A(1) = FETCH_DIM_R(A(0), C("action")) [Standard]
2       2     T(2) = IS_EQUAL(A(1), C("add"))
3       2     JMPZ(T(2), 7)
4       3     INIT_FCALL_BY_NAME(function_table, C("addUser"))
5       3     Au(3) = DO_FCALL_BY_NAME() [0 arguments]
6       4     JMP(16)
7       4     A(4) = FETCH_R(C("_POST")) [global]
8       4     A(5) = FETCH_DIM_R(A(4), C("action")) [Standard]
9       4     T(6) = IS_EQUAL(A(5), C("delete"))
10      4     JMPZ(T(6), 14)
11      5     INIT_FCALL_BY_NAME(function_table, C("deleteUser"))
12      5     Au(7) = DO_FCALL_BY_NAME() [0 arguments]
13      6     JMP(16)
14      7     INIT_FCALL_BY_NAME(function_table, C("defaultAction"))
15      7     Au(8) = DO_FCALL_BY_NAME() [0 arguments]
16      9     RETURN(C(1))
17      9     HANDLE_EXCEPTION()

Here is the “switch” code:

0       2     A(0) = FETCH_R(C("_POST")) [global]
1       2     A(1) = FETCH_DIM_R(A(0), C("action")) [Standard]
2       3     T(2) = CASE(A(1), C("add"))
3       3     JMPZ(T(2), 8 )
4       4     INIT_FCALL_BY_NAME(function_table, C("addUser"))
5       4     Au(3) = DO_FCALL_BY_NAME() [0 arguments]
6       5     BRK(0, C(1))
7       6     JMP(10)
8       6     T(2) = CASE(A(1), C("delete"))
9       6     JMPZ(T(2), 14)
10      7     INIT_FCALL_BY_NAME(function_table, C("deleteUser"))
11      7     Au(4) = DO_FCALL_BY_NAME() [0 arguments]
12      8     BRK(0, C(1))
13      9     JMP(15)
14      9     JMP(19)
15     10     INIT_FCALL_BY_NAME(function_table, C("defaultAction"))
16     10     Au(5) = DO_FCALL_BY_NAME() [0 arguments]
17     11     BRK(0, C(1))
18     12     JMP(20)
19     12     JMP(15)
20     12     SWITCH_FREE(A(1))
21     13     RETURN(C(1))
22     13     HANDLE_EXCEPTION()
No.     CONT    BRK     Parent
0         20          20           -1

You can see there’s a little difference – the latter has CASE/BRK opcodes, which act more or less like IS_EQUAL and JMP, but their plumbing is a bit different, but in general, code is the same (you could even argue “switch” code is a bit less optimal, but that is really the area you shouldn’t be concerned with before you can read and understand the code in zend_vm_def.h – which is not exactly a beginner stuff.

Another thing that the author absolutely failed to mention and which should be one of the very first things anybody who cares about performance should do – is to use a bytecode cache. There are plenty of free ones (shameless plug: Zend Server CE includes one of them – all the performance improvements for $0 :) and you don’t have to change a bit of code to run it.

Now, I understand Google is not a PHP shop like Yahoo or Facebook or many others. But this article is signed “Eric Higgins, Google Webmaster” and one would expect something much more sound from such source. And in fact there are a lot of blogs and conference talks on the topic and lots of community folks around that I am sure would be ready to help with such article – I wonder why wasn’t it done? Why apparently the best advice we can find from Google is either trivial or useless or wrong?

I think they can do much better, and they should if they take “making the web faster” seriously.

P.S. After having all this written, I also found a comment from Gwynne Raskind, which I advise to read too.

Y-Combinator in PHP

Since PHP 5.3 now has closures, all things that other languages with closures do should also be possible. One of them is having recursive closures. I.e. something like this:

$factorial = function($n) {
   if ($n <= 1)
     return 1;
   else
     return $n call_user_func(__FUNCTION__$n 1);
};

which does not work. One of the ways to do it is to use Y combinator function, which allows, by application of dark magic and friendly spirits from other dimensions, to convert non-recursive code to recursive code. In PHP, Y combinator function would look like this:

function Y($F) {
    $func =  function ($f) { return $f($f); };
    return $func(function ($f) use($F) {
            return $F(function ($x) use($f) {
            $ff $f($f);
            return $ff($x);
        });
    });
}

And then the factorial function would be:

$factorial Y(function($fact) {
    return function($n) use($fact) {
        return ($n <= 1)?1:$n*$fact($n-1);
    };
});

Which does work:

var_dump($factorial(6)); ==> int(720)

Of course, we could also cheat and go this way:

$factorial = function($n) use (&$factorial) {
      if ($n <= 1)
        return 1;
      else
        return $n $factorial($n 1);
};

Doing Y-combinator in PHP was attempted before (and here), but now I think it works better. It could be even nicer if PHP syntax allowed chaining function invocations – ($foo($bar))($baz) – but for now it doesn’t.

If you wonder, using such techniques does have legitimate applications, though I’m not sure if doing it in PHP this way is worth the trouble.