PHP performance tips from Google

I saw a link on Twitter to PHP optimization advice from Google. There is a lot of advice there, and some of it is quite sound, if not new: use the latest version if possible, profile your code, cache whatever can be cached, etc. Some of it is of doubtful value, like the output buffering tip, which could be useful in some situations but do nothing or make things worse in others. If you’re a beginner, you’re generally better off leaving it alone until you’ve solved the real performance problems.

However, some of the advice makes no sense at best and is potentially harmful at worst. Let’s get to it:

First one: Don’t copy variables for no reason. I don’t know what the author intended to describe there, but the PHP engine uses reference counting with copy-on-write, so absolutely no copying happens when assigning variables the way they described it:

$description = strip_tags($_POST['description']);
echo $description;

I don’t know where it comes from but it’s just not so, unless maybe in some prehistoric version of PHP. Which means unless you’re going back to 1997 in a time machine this advice is no good for you.
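The copy-on-write claim can be checked with a rough measurement. This is only a sketch: measureCopyOnWrite() is a hypothetical helper, the 1 MB string stands in for user input, and exact byte counts vary between PHP versions. On a current PHP the assignment should cost next to nothing, and memory should grow only when the second variable is modified.

```php
<?php
// Hypothetical helper: measures memory growth for an assignment versus a write.
function measureCopyOnWrite(int $size = 1048576): array
{
    $original = str_repeat('x', $size); // stand-in for a large input string

    $before = memory_get_usage();
    $copy = $original;                  // plain assignment: both names share one buffer
    $afterAssign = memory_get_usage();

    $copy .= '!';                       // first write: only now is the data duplicated
    $afterWrite = memory_get_usage();

    return [
        'assign_bytes' => $afterAssign - $before,
        'write_bytes'  => $afterWrite - $afterAssign,
    ];
}

$result = measureCopyOnWrite();
printf("assignment cost: %d bytes\n", $result['assign_bytes']);
printf("write cost:      %d bytes\n", $result['write_bytes']);
```

The second number is where the copy actually happens, so removing the intermediate variable from a read-only snippet like the one above saves nothing measurable.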

Next one: Avoid doing SQL queries within a loop. This actually might make sense in some situations; however, the code example they give there is missing one important detail, which makes it potentially harmful for beginners (see if you can spot it):

$userData = [];
foreach ($userList as $user) {
    $userData[] = '("' . $user['first_name'] . '", "' . $user['last_name'] . '")';
}
$query = 'INSERT INTO users (first_name,last_name) VALUES' . implode(',', $userData);
mysql_query($query);

Please repeat after me – DO NOT INSERT USER DATA INTO SQL WITHOUT SANITIZING IT!
Of course, I cannot know that $user was not sanitized. Maybe the intent was that it was. But if you give such an example and target beginners, you should say so explicitly, every time! People tend to copy/paste examples, and then you get SQL injection on a government site.

Another thing: most real-life PHP applications do not insert data in bulk, except in some very special scenarios (bulk data imports, etc.), so in most cases one would be better off using PDO and prepared statements, or a higher-level framework that will do it for you. But if you roll your own SQL, sanitize the data! This is much more important than any performance trick.
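For illustration, here is roughly what the prepared-statement version could look like. It is a sketch under assumptions: insertUsers() is a hypothetical helper, the table layout is borrowed from the example above, and an in-memory SQLite database (which needs the pdo_sqlite extension) stands in for a real server so the snippet runs on its own.

```php
<?php
// Hypothetical helper: inserts each user through one reusable prepared statement.
function insertUsers(PDO $pdo, array $userList): int
{
    // Placeholders keep the values out of the SQL text entirely.
    $stmt = $pdo->prepare(
        'INSERT INTO users (first_name, last_name) VALUES (:first, :last)'
    );

    $inserted = 0;
    foreach ($userList as $user) {
        $stmt->execute([
            ':first' => $user['first_name'],
            ':last'  => $user['last_name'],
        ]);
        $inserted += $stmt->rowCount();
    }
    return $inserted;
}

// Usage: an in-memory SQLite database is assumed here purely for the demo.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (first_name TEXT, last_name TEXT)');

$count = insertUsers($pdo, [
    ['first_name' => 'Robert', 'last_name' => '"); DROP TABLE users; --'],
    ['first_name' => "O'Brien", 'last_name' => 'Smith'],
]);
echo $count, " rows inserted\n";
```

The hostile last name ends up stored as an ordinary string, because the driver never treats bound values as SQL. The statement is also prepared once and reused, so the loop is not as wasteful as building a fresh query string on every iteration.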

Next one: Use single-quotes for long strings. PHP code is parsed and compiled, and any possible difference in speed between parsing "" and '' strings is really negligible unless you have hundreds of megabyte-sized strings embedded in your code. If you do, your quotes probably aren’t where you should start optimizing. And of course, using a bytecode cache (see below) eliminates this difference altogether.

Next one: Use switch/case instead of if/else. This makes no sense, since switch does essentially the same thing as a chain of ifs. See for yourself; here are the compiled opcodes for the “if” code:

0       2     A(0) = FETCH_R(C("_POST")) [global]
1       2     A(1) = FETCH_DIM_R(A(0), C("action")) [Standard]
2       2     T(2) = IS_EQUAL(A(1), C("add"))
3       2     JMPZ(T(2), 7)
4       3     INIT_FCALL_BY_NAME(function_table, C("addUser"))
5       3     Au(3) = DO_FCALL_BY_NAME() [0 arguments]
6       4     JMP(16)
7       4     A(4) = FETCH_R(C("_POST")) [global]
8       4     A(5) = FETCH_DIM_R(A(4), C("action")) [Standard]
9       4     T(6) = IS_EQUAL(A(5), C("delete"))
10      4     JMPZ(T(6), 14)
11      5     INIT_FCALL_BY_NAME(function_table, C("deleteUser"))
12      5     Au(7) = DO_FCALL_BY_NAME() [0 arguments]
13      6     JMP(16)
14      7     INIT_FCALL_BY_NAME(function_table, C("defaultAction"))
15      7     Au(8) = DO_FCALL_BY_NAME() [0 arguments]
16      9     RETURN(C(1))
17      9     HANDLE_EXCEPTION()

Here are the opcodes for the “switch” code:

0       2     A(0) = FETCH_R(C("_POST")) [global]
1       2     A(1) = FETCH_DIM_R(A(0), C("action")) [Standard]
2       3     T(2) = CASE(A(1), C("add"))
3       3     JMPZ(T(2), 8 )
4       4     INIT_FCALL_BY_NAME(function_table, C("addUser"))
5       4     Au(3) = DO_FCALL_BY_NAME() [0 arguments]
6       5     BRK(0, C(1))
7       6     JMP(10)
8       6     T(2) = CASE(A(1), C("delete"))
9       6     JMPZ(T(2), 14)
10      7     INIT_FCALL_BY_NAME(function_table, C("deleteUser"))
11      7     Au(4) = DO_FCALL_BY_NAME() [0 arguments]
12      8     BRK(0, C(1))
13      9     JMP(15)
14      9     JMP(19)
15     10     INIT_FCALL_BY_NAME(function_table, C("defaultAction"))
16     10     Au(5) = DO_FCALL_BY_NAME() [0 arguments]
17     11     BRK(0, C(1))
18     12     JMP(20)
19     12     JMP(15)
20     12     SWITCH_FREE(A(1))
21     13     RETURN(C(1))
22     13     HANDLE_EXCEPTION()
No.     CONT    BRK     Parent
0         20          20           -1

You can see there’s a little difference: the latter has CASE/BRK opcodes, which act more or less like IS_EQUAL and JMP with slightly different plumbing, but in general the code is the same. You could even argue the “switch” code is a bit less optimal, but that is really an area you shouldn’t be concerned with before you can read and understand the code in zend_vm_def.h, which is not exactly beginner stuff.
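For reference, the PHP source behind the two dumps probably looked something like the sketch below. This is a reconstruction from the opcodes, not the original files. The function bodies are placeholders, and the dispatch is wrapped in two hypothetical functions that take the request array as a parameter and return a value, so the two variants can be compared directly (the dumps themselves read $_POST and use break).

```php
<?php
// Placeholder bodies; only the function names come from the opcode dumps.
function addUser(): string { return 'added'; }
function deleteUser(): string { return 'deleted'; }
function defaultAction(): string { return 'default'; }

// The if/elseif/else variant.
function dispatchIf(array $post): string
{
    if ($post['action'] == 'add') {
        return addUser();
    } elseif ($post['action'] == 'delete') {
        return deleteUser();
    } else {
        return defaultAction();
    }
}

// The switch variant; case labels use the same loose comparison as ==.
function dispatchSwitch(array $post): string
{
    switch ($post['action']) {
        case 'add':
            return addUser();
        case 'delete':
            return deleteUser();
        default:
            return defaultAction();
    }
}

foreach (['add', 'delete', 'something-else'] as $action) {
    $post = ['action' => $action];
    printf("%s: if=%s switch=%s\n", $action, dispatchIf($post), dispatchSwitch($post));
}
```

Both functions return the same result for every input, so the choice between them is a matter of readability rather than speed.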

Another thing that the author absolutely failed to mention, and which should be one of the very first things anybody who cares about performance does, is using a bytecode cache. There are plenty of free ones (shameless plug: Zend Server CE includes one of them, all the performance improvements for $0 :) and you don’t have to change a bit of code to run one.
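If you are not sure whether a bytecode cache is active on your setup, a quick check could look like the sketch below. It assumes a modern PHP where the bundled OPcache extension plays this role (other caches have their own status functions), and bytecodeCacheEnabled() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: reports whether OPcache is loaded and enabled
// for the SAPI running this script.
function bytecodeCacheEnabled(): bool
{
    if (!function_exists('opcache_get_status')) {
        return false; // OPcache extension is not loaded at all
    }

    // false: skip the per-script details, we only need the summary.
    // The call returns false when the cache is disabled or access is restricted.
    $status = @opcache_get_status(false);

    return is_array($status) && !empty($status['opcache_enabled']);
}

echo bytecodeCacheEnabled() ? "bytecode cache: on\n" : "bytecode cache: off\n";
```

Keep in mind that the command line and the web server often use different settings, so run the check through the same SAPI that serves your pages.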

Now, I understand Google is not a PHP shop like Yahoo or Facebook or many others. But this article is signed “Eric Higgins, Google Webmaster” and one would expect something much more sound from such a source. In fact, there are a lot of blogs and conference talks on the topic, and lots of community folks around who I am sure would be ready to help with such an article. I wonder why it wasn’t done? Why is the best advice we can find from Google apparently either trivial, useless, or wrong?

I think they can do much better, and they should if they take “making the web faster” seriously.

P.S. After writing all this, I also found a comment from Gwynne Raskind, which I recommend reading too.

The secret of PHP

So, another “PHP sucks” post, this time from Jeff Atwood. He actually ends up even kind of praising PHP, surprised by its success. I have a couple of thoughts on that topic too.

First, people really need to stop reading something about PHP written around 2005 (probably about experiences that happened in 2001) and applying it to PHP as it is now, without even checking around for current trends. It’s as if people dug up books from the Middle Ages saying that there are only seven metals in existence, or debating phlogiston, and used them to talk about modern chemistry. Come on!

Then the next thing apparently wrong with PHP is that it has too many functions. Right. Since when? Since when is having a lot of functions a problem? Does it hurt anybody? Does it make writing PHP code harder? Does it make a programmer less successful in achieving his goals?
About keywords I could kind of understand: OK, a lot to remember (though I haven’t seen anybody really having trouble remembering such complicated keywords as “while”, “if”, “class” or “public”), and it takes out some good English words that could be used as function/method names to confuse the enemy (who wouldn’t want to have a function named endforeach() or static(), not to mention function()? too bad those are not available!). But complaining that there are too many actual functions that allow you to do real useful stuff? That is the thing that is bothering people? That is what scares people away from using the language “for years”?

The next beef with PHP is that people write sucky code in it. No, really, they do? Must be something really wrong with this language. It’s not like people write mind-bogglingly sucky code in every other “good” language on the planet. But I get it. The intent was: PHP makes it easy to write sucky code. Yes, this is true. As true as “Porsche 997 makes it easy to drive at 100mph into a brick wall”. PHP makes it easy to write various kinds of code, and if 90% of all code written is sucky, then 90% of PHP code will be sucky. But my experience says the quality of production code almost never has much to do with the language, only with the culture, organizational and personal, and with choosing the right ways to do the job. The rest is just bad statistics in play. Like “I know a 7-year-old writing websites, and his PHP code sucks”. I bet his Haskell code rules though ;)

That’s not to say PHP couldn’t use improvement. It could. And it is improving, actually, with enough room for improvement still, in many areas. But it probably will never satisfy purists. It’s practical. Maybe it doesn’t allow you to write whole programs in one line of incomprehensible character soup or play with high-level math theory concepts, but it allows people to write web applications. So they do. So where’s the surprise when one morning somebody wakes up and discovers there’s a ton of web applications around and they are written in PHP? :)

P.S. I wish for every 50 “PHP sucks” blogs people would write one good RFC.