PHP 10.0 Blog

What if…

PHP performance tips from Google

Posted by Stas on June 26, 2009

I saw a link on Twitter referring to PHP optimization advice from Google. There is a bunch of advice there. Some of it is quite sound, if not new: use the latest versions if possible, profile your code, cache whatever can be cached, etc. Some is of doubtful value, like the output buffering tip, which can be useful in some situations but does nothing or makes things worse in others; if you’re a beginner, it’s generally better to leave it alone until you’ve solved the real performance problems.

However, some of the advice makes no sense at best and is potentially harmful at worst. Let’s get to it:

First one: Don’t copy variables for no reason. I don’t know what the author intended to describe there, but the PHP engine uses reference counting with copy-on-write, and absolutely no copying takes place when assigning variables the way they describe it:

$description = strip_tags($_POST['description']);
echo $description;

I don’t know where it comes from but it’s just not so, unless maybe in some prehistoric version of PHP. Which means unless you’re going back to 1997 in a time machine this advice is no good for you.
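The copy-on-write behaviour described above can be observed directly. What follows is a minimal sketch, not a benchmark: the function name, variable names and the 1 MB size are illustrative assumptions, and the exact byte counts depend on the PHP version.

```php
<?php
// Sketch: measure memory around an assignment and around the first write.
// measureCopyOnWrite() and the 1 MB size are assumptions for illustration.

function measureCopyOnWrite()
{
    $original = str_repeat('x', 1000000);   // roughly 1 MB of string data

    $before = memory_get_usage();
    $copy = $original;                      // assignment: both names share one value
    $afterAssign = memory_get_usage();

    $copy .= 'y';                           // first write: the value is separated now
    $afterWrite = memory_get_usage();

    return [
        'assign' => $afterAssign - $before,
        'write'  => $afterWrite - $afterAssign,
    ];
}

$result = measureCopyOnWrite();
echo "Extra memory after assignment:  {$result['assign']} bytes\n";
echo "Extra memory after first write: {$result['write']} bytes\n";
```

The assignment itself should cost next to nothing; the megabyte only shows up once one of the two variables is written to.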

Next one: Avoid doing SQL queries within a loop. This actually might make sense in some situations; however, the code example they give there is missing one important detail, which makes it potentially harmful for beginners (see if you can spot it):

$userData = [];
foreach ($userList as $user) {
    $userData[] = '("' . $user['first_name'] . '", "' . $user['last_name'] . '")';
}
$query = 'INSERT INTO users (first_name,last_name) VALUES' . implode(',', $userData);
mysql_query($query);

Please repeat after me – DO NOT INSERT USER DATA INTO SQL WITHOUT SANITIZING IT!
Of course, I cannot know that $user was not sanitized. Maybe the intent was that it was. But if you give such an example and target beginners, you should say so explicitly, every time! People tend to copy/paste examples, and then you get SQL injection in a government site.

Another thing: most real-life PHP applications do not insert data in bulk, except in some very special scenarios (bulk data imports, etc.), so in most cases one would be better off using PDO and prepared statements, or a higher-level framework which will do it for you. But if you roll your own SQL, sanitize the data! This is much more important than any performance tricks.
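For illustration, here is one way the bulk insert could be written with PDO placeholders instead of string concatenation. It is a sketch under stated assumptions: an in-memory SQLite database stands in for the real one (so the pdo_sqlite extension is assumed to be available), and the table layout and sample rows are made up.

```php
<?php
// Sketch only: SQLite in memory replaces the real database,
// and the table and sample data are hypothetical.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (first_name TEXT, last_name TEXT)');

$userList = [
    ['first_name' => 'Ann',    'last_name' => "O'Brien"],
    ['first_name' => 'Robert', 'last_name' => '"); DROP TABLE users; --'],
];

// One "(?, ?)" group per row; the values travel separately from the SQL text.
$placeholders = implode(', ', array_fill(0, count($userList), '(?, ?)'));

$params = [];
foreach ($userList as $user) {
    $params[] = $user['first_name'];
    $params[] = $user['last_name'];
}

$stmt = $pdo->prepare(
    "INSERT INTO users (first_name, last_name) VALUES $placeholders"
);
$stmt->execute($params);

$rowCount = (int) $pdo->query('SELECT COUNT(*) FROM users')->fetchColumn();
echo "Inserted {$rowCount} rows\n";
```

This keeps the single round trip from the original tip while the hostile-looking last name is stored as plain data rather than interpreted as SQL.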

Next one: Use single-quotes for long strings. PHP code is parsed and compiled, and any possible difference in speed between parsing "" and '' is really negligible unless you operate with hundreds of megabyte-size strings embedded in your code. If you do so, your quotes probably aren’t where you should start optimizing. And of course, using caching (see below) eliminates this difference altogether.
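One way to see how little is at stake is to ask the tokenizer what it makes of each form. This is a small sketch assuming the tokenizer extension is available (it is enabled by default); tokenNames() is a hypothetical helper written for this example.

```php
<?php
// Sketch: list the token names PHP's own tokenizer produces for a snippet.
// tokenNames() is a helper made up for this illustration.

function tokenNames($source)
{
    $names = [];
    foreach (token_get_all($source) as $token) {
        // Array tokens are [id, text, line]; single characters such as ';'
        // come back as plain strings and are skipped here.
        if (is_array($token)
            && $token[0] !== T_OPEN_TAG
            && $token[0] !== T_WHITESPACE
        ) {
            $names[] = token_name($token[0]);
        }
    }
    return $names;
}

$single = tokenNames("<?php 'hello world';");
$double = tokenNames('<?php "hello world";');

echo 'single quotes: ' . implode(', ', $single) . "\n";
echo 'double quotes: ' . implode(', ', $double) . "\n";
```

For a string without variables, both forms come out of the tokenizer as the same kind of token, so whatever difference exists is confined to the parsing step.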

Next one: Use switch/case instead of if/else. This makes no sense since switch does essentially the same things as if’s do. See for yourself, here is the “if” code:

0       2     A(0) = FETCH_R(C("_POST")) [global]
1       2     A(1) = FETCH_DIM_R(A(0), C("action")) [Standard]
2       2     T(2) = IS_EQUAL(A(1), C("add"))
3       2     JMPZ(T(2), 7)
4       3     INIT_FCALL_BY_NAME(function_table, C("addUser"))
5       3     Au(3) = DO_FCALL_BY_NAME() [0 arguments]
6       4     JMP(16)
7       4     A(4) = FETCH_R(C("_POST")) [global]
8       4     A(5) = FETCH_DIM_R(A(4), C("action")) [Standard]
9       4     T(6) = IS_EQUAL(A(5), C("delete"))
10      4     JMPZ(T(6), 14)
11      5     INIT_FCALL_BY_NAME(function_table, C("deleteUser"))
12      5     Au(7) = DO_FCALL_BY_NAME() [0 arguments]
13      6     JMP(16)
14      7     INIT_FCALL_BY_NAME(function_table, C("defaultAction"))
15      7     Au(8) = DO_FCALL_BY_NAME() [0 arguments]
16      9     RETURN(C(1))
17      9     HANDLE_EXCEPTION()

Here is the “switch” code:

0       2     A(0) = FETCH_R(C("_POST")) [global]
1       2     A(1) = FETCH_DIM_R(A(0), C("action")) [Standard]
2       3     T(2) = CASE(A(1), C("add"))
3       3     JMPZ(T(2), 8)
4       4     INIT_FCALL_BY_NAME(function_table, C("addUser"))
5       4     Au(3) = DO_FCALL_BY_NAME() [0 arguments]
6       5     BRK(0, C(1))
7       6     JMP(10)
8       6     T(2) = CASE(A(1), C("delete"))
9       6     JMPZ(T(2), 14)
10      7     INIT_FCALL_BY_NAME(function_table, C("deleteUser"))
11      7     Au(4) = DO_FCALL_BY_NAME() [0 arguments]
12      8     BRK(0, C(1))
13      9     JMP(15)
14      9     JMP(19)
15     10     INIT_FCALL_BY_NAME(function_table, C("defaultAction"))
16     10     Au(5) = DO_FCALL_BY_NAME() [0 arguments]
17     11     BRK(0, C(1))
18     12     JMP(20)
19     12     JMP(15)
20     12     SWITCH_FREE(A(1))
21     13     RETURN(C(1))
22     13     HANDLE_EXCEPTION()
No.     CONT    BRK     Parent
0         20          20           -1

You can see there’s a little difference: the latter has CASE/BRK opcodes, which act more or less like IS_EQUAL and JMP with slightly different plumbing, but in general the code is the same. You could even argue that the “switch” code is a bit less optimal, but that is really an area you shouldn’t be concerned with before you can read and understand the code in zend_vm_def.h, which is not exactly beginner stuff.
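The PHP source behind the two dumps is not shown in the post. Judging by the function names and the $_POST['action'] lookups in the opcodes, it was probably close to the following reconstruction, which should be read as an assumption. The handlers are stubs that return a label, and the input array is passed in as a parameter so the two forms can be run and compared.

```php
<?php
// Reconstructed from the opcode dumps; the original source is an assumption.
// Stub handlers return a label so both dispatch styles can be compared.

function addUser()       { return 'add'; }
function deleteUser()    { return 'delete'; }
function defaultAction() { return 'default'; }

function dispatchWithIf(array $post)
{
    if ($post['action'] == 'add') {
        return addUser();
    } elseif ($post['action'] == 'delete') {
        return deleteUser();
    } else {
        return defaultAction();
    }
}

function dispatchWithSwitch(array $post)
{
    switch ($post['action']) {
        case 'add':
            return addUser();
        case 'delete':
            return deleteUser();
        default:
            return defaultAction();
    }
}

foreach (['add', 'delete', 'something-else'] as $action) {
    $post = ['action' => $action];
    echo $action . ': if=' . dispatchWithIf($post)
        . ' switch=' . dispatchWithSwitch($post) . "\n";
}
```

Both functions take the same branch for every input, which is the practical meaning of the near-identical opcodes.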

Another thing that the author absolutely failed to mention, and which should be one of the very first things anybody who cares about performance does, is to use a bytecode cache. There are plenty of free ones (shameless plug: Zend Server CE includes one of them, so you get all the performance improvements for $0), and you don’t have to change a bit of code to run it.
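On current PHP versions the bundled bytecode cache is OPcache, which postdates this post. Assuming that is the cache in question, a quick way to check whether it is present and switched on is sketched below; the variable names are illustrative.

```php
<?php
// Sketch, assuming the bundled OPcache extension is the bytecode cache in use.
// Note that the CLI has its own switch (opcache.enable_cli), so a script run
// from the command line may report the cache as loaded but not active.

$opcacheLoaded  = extension_loaded('Zend OPcache');
$opcacheEnabled = $opcacheLoaded && (bool) ini_get('opcache.enable');

echo 'OPcache loaded:  ' . ($opcacheLoaded ? 'yes' : 'no') . "\n";
echo 'OPcache enabled: ' . ($opcacheEnabled ? 'yes' : 'no') . "\n";
```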

Now, I understand Google is not a PHP shop like Yahoo or Facebook or many others. But this article is signed “Eric Higgins, Google Webmaster” and one would expect something much more sound from such a source. In fact, there are a lot of blogs and conference talks on the topic, and lots of community folks around who I am sure would be ready to help with such an article, so I wonder why it wasn’t done. Why is the best advice we can find from Google apparently either trivial, useless or wrong?

I think they can do much better, and they should if they take “making the web faster” seriously.

P.S. After having written all this, I also found a comment from Gwynne Raskind, which I advise you to read too.

54 Responses to “PHP performance tips from Google”

  1. [...] PHP performance tips from Google [...]

  2. danielj said

    erm. Just quoting is not enough and will not prevent sql injection.
    You shall properly cast and escape. Cast numerics and use mysql_real_escape_string() – or an appropriate function for your DB – on all literals. I prefer sprintf() for that:

    $sql = sprintf( '
    SELECT …
    FROM table
    WHERE numeric_column = %d
    AND string_column = "%s"
    AND float_column = %f '
    , $numeric
    , mysql_real_escape_string( $string, $connection)
    , $float
    );

    • Stas said

      You are right, quoting is not a good description, as it might be misunderstood to mean that just sticking quotes around the variable is enough. I’ve corrected the wording.

    • Julien said

      $sql = "SELECT `…` FROM `table` "
      . "WHERE `numeric_column` = '" . (int)$numeric . "' "
      . "AND `string_column` = '" . mysql_real_escape_string( $string, $connection) . "' "
      . "AND `float_column` = '" . (float)$float . "'";

      This is how I’d do it

      • Phil said

        That’s all well and good until someone who doesn’t know what they are doing comes along and “maintains” your code, or uses the same style and forgets a cast.

        Best bet is to move over to PDO and use prepared statements with type hinting.
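A sketch of what that suggestion might look like, with the types stated through bindValue(). The assumptions: an in-memory SQLite database (pdo_sqlite available), a made-up items table, and column names borrowed from the comments above. PDO has no dedicated float parameter type, so this example sticks to an integer and a string.

```php
<?php
// Sketch only: SQLite in memory and a hypothetical "items" table.

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE items (numeric_column INTEGER, string_column TEXT)');
$pdo->exec("INSERT INTO items VALUES (42, 'it''s here')");
$pdo->exec("INSERT INTO items VALUES (7, 'other')");

$numeric = 42;
$string  = "it's here";

$stmt = $pdo->prepare(
    'SELECT numeric_column, string_column
       FROM items
      WHERE numeric_column = :num
        AND string_column = :str'
);
// The third argument states the type each value is sent as.
$stmt->bindValue(':num', $numeric, PDO::PARAM_INT);
$stmt->bindValue(':str', $string, PDO::PARAM_STR);
$stmt->execute();

$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
echo count($rows) . " matching row(s)\n";
```

There is no cast to forget and no escaping function to call by hand, which is the maintenance argument being made in the comment.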

  3. Tom Wardrop said

    I don’t think there’s anything misleading or inappropriate about said article. I think all tips mentioned would be quite helpful to beginners. Sure, you may be saving microseconds here and there, but considering it doesn’t take any extra effort to implement these tips (besides just knowing and getting used to them), it’s definitely worth it.

    Now, let’s talk about the first example “Don’t copy variables for no reason.”. This is a perfectly valid tip. The result of the example code given is two completely separate strings, almost identical (hence, double the memory is taken up).

    As for the second example, this is also fine. As to not confuse the beginner, he’s focused only on the tip in focus, without confusing the user by talking about SQL injection, which hasn’t got anything to do with performance at all. If he were to mention such a thing, he’d almost have to write a completely separate article.

    Now finally, for the last tip you criticise, this is also a valid tip. I guess his only mistake was to not mention under what circumstances the switch statement would be quicker. If you have an if statement which contains some form of processing (besides the actual boolean evaluation), such as a function call, then the switch statement will be quicker as it will only have to run that function (as an example) once, whereas an else/if statement would need to re-run the function for every ‘else if’.

    • Stas said

      I think there’s a much more important point, and my fault is that I failed to convey it beyond all the technical details. The point is that even if those tricks were valid (which I still maintain most of them aren’t, at least in the form they were presented), if you are a beginner starting to optimize your application, a random collection of engine tricks that might save 0.1% of execution time in random places is absolutely the worst advice you could ever be given. Optimization should not start with engine tricks. Actually, when you optimize properly you will probably need no such tricks anyway, as I can hardly imagine any application whose performance depends on whether you use single or double quotes.
      Giving a beginner these random tricks of doubtful use, and mentioning the really important things only in passing, gives readers the impression that performance optimization is, at least when it comes to PHP, a collection of weird tricks, and that this is the way to make your applications perform. Nothing could be farther from the truth.
      I think I need another post on the matter… :)

      • The topic of single quotes is often dismissed, but the Zend Framework coding standards, for example, require single quotes. I think it’s a KISS approach: don’t use double quotes if you don’t do variable substitution, because people will think that there are variables in the string, just as you use private methods instead of making everything public so that no one will suppose the method is being called from an external class (and thus cannot be refactored easily). And you also avoid having '\n' parsed.

    • The variables tip is misleading. Obviously variable assignment in any language costs something, even in C. The author is focusing on PHP here and making the inference that there is enough of a cost to consciously avoid variable assignment. A beginner will take this “tip” and try to use as few variables as possible to make the code “fast”. As Stas points out, this will have almost zero effect on the speed of the code. It will, however, have disastrous consequences for the readability and maintainability of the code.

    • $foo = $bar;

      This does *NOT* take up twice as much memory!

      It takes an extra 8 bytes or so, until you *change* $foo or $bar, at which point copy-on-write kicks in.

      • Katai said

        But the example didn’t show $foo = $bar;
        It used a function that actually _changes_ the variable

        $description = strip_tags($_POST['description']);

        The example given is completely valid. And even if $foo = $bar; doesn’t double the memory, you could argue that anyone who changes $foo doubles the memory later. It makes no sense to use two vars if it can be done with one.

  4. Jamie said

    Case/switch is actually a little less optimal. I’ve seen benchmarks done using if blocks and switch blocks. The if blocks performed a bit faster. My only guess is that Google is going for readability of the code.

  5. [...] this new post to the PHP 10.0 blog Stas has some responses to the recent suggestions from Google as to how to [...]

  7. Adam said

    Great article. Way to stick it to Google.
    The only thing I might say in response is that they may have a point with the if/else vs. switch which is that if you look at the opcode you see that the if/else makes a call to access the key in the hash table for each if/elseif whereas switch only does it once. I don’t know how much of a difference this makes, but I’ve always tried to avoid hash table reads vs a read to a regular variable.
    Just my two cents.

    • Stas said

      It might make some sense if they had a function and noted that it takes significant time to run it, but saving a couple of hash lookups isn’t really a thing you should be worried about. Even then the difference is not between if() and switch() per se as between doing same thing once and multiple times.
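The distinction drawn in this reply, doing the same thing once versus several times, can be shown without switch at all. In this sketch expensiveLookup() is a hypothetical stand-in for a slow call, and the CallCounter class exists only to count how often it runs.

```php
<?php
// Sketch: the saving comes from evaluating once, not from switch itself.
// CallCounter and expensiveLookup() are made up for this illustration.

class CallCounter
{
    public static $calls = 0;
}

function expensiveLookup()
{
    CallCounter::$calls++;
    return 'delete';
}

function dispatchRepeated()
{
    // The function runs again in every condition that gets evaluated.
    if (expensiveLookup() == 'add') {
        return 'addUser';
    } elseif (expensiveLookup() == 'delete') {
        return 'deleteUser';
    }
    return 'defaultAction';
}

function dispatchOnce()
{
    $action = expensiveLookup();   // evaluated a single time
    if ($action == 'add') {
        return 'addUser';
    } elseif ($action == 'delete') {
        return 'deleteUser';
    }
    return 'defaultAction';
}

CallCounter::$calls = 0;
dispatchRepeated();
$repeatedCalls = CallCounter::$calls;

CallCounter::$calls = 0;
dispatchOnce();
$onceCalls = CallCounter::$calls;

echo "Calls with the function in each condition: {$repeatedCalls}\n";
echo "Calls with the result stored first: {$onceCalls}\n";
```

Storing the result in a variable gives an if/elseif chain the same single evaluation that a switch gets.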

  8. Andrei said

    The Google list of advice for PHP optimizations is not even funny …

    “Optimization is hard! Let’s go shopping”

  9. [...] PHP performance tips from Google I saw a link on twitter referring to PHP optimization advice from Google. There are a bunch of advices there, some of [...] [...]

  10. Visko said

    The article is a joke. They are taking the piss, don’t you get it?

  11. Mike said

    Good article, but I disagreed with you when you said “any possible difference in speed between parsing "" and '' is really negligible”. Even though it is a small amount of extra processing time, it can slowly add up, and if you have a high-traffic website a 0.1% increase in efficiency can lower server costs considerably.

    • This is not true. The " vs ' distinction is resolved before the code is even executed in PHP. They both have different token names. The cost is so low that it’s not even worth measuring. Use " when it’s convenient. Use ' when it’s convenient.

      Also, not that it matters, but "hello $name" is probably marginally faster than 'hello ' . $name.

      But I don’t think anyone should care. I doubt it’s 0.1, and if your web site has that much traffic, you should focus on loops and, in general, algorithms.

      Focusing on string grammars is pointless.

  12. [...] Source: WordPress [...]

  13. Mark said

    Hey,

    Nice article you have there.
    Now I’m wondering… how did you produce those if and switch code blocks? Is that assembly code or C code? Could you explain how I can get output like that?

    Thanx,
    Mark.

  14. Tim said

    Note that in the “Avoid unnecessary copies” bit you seem to have missed the call to strip_tags(). After the assignment, $description != $_POST['description'].

    • Stas said

      I know. However, whatever strip_tags produces, it’s there. It doesn’t matter if you assign it to a variable, to 10 variables or to no variables. All the assignment does is create one more entry in the hashtable and change the refcount.

  15. [...] recommended a series of tips for optimizing our PHP code, and the community’s response was not long in coming, saying that they are tips [...]

  16. Sebastian said

    Thanks Stas for pointing out the article’s wrongness and shortcomings. Google should really be doing much better than that pap piece on PHP performance. To write about performance and not include any metrics is shameful. As is not following one’s own first piece of advice: “Profile your code to pinpoint bottlenecks”.

    There are many pages on the web giving similar performance tips for PHP and all they really seem to do is give PHP a bad name. Clearly, quite some effort went into making the Google article (the video is slick). It would have been much better if Google put the effort into making a more in-depth piece on profiling or caching. I know I’ve found information on APC, for instance, a bit lacking.

  17. [...] PHP performance tips from Google [...]

  18. [...] A Note on Google’s So-called Best Practices Make the Web Faster – Google groups PHP performance tips from Google [...]

  19. [...] unmissable, although the comments and references should not be skipped, because they have made the odd blunder with [...]

  20. [...] post on the PHP 10.0 blog, Stas looks at performance in PHP applications as his own response to the Google suggestions they recently released. So here are my thoughts about what would be good for the beginner to [...]

  22. [...] PHP performance tips from google by Stas Malyshev [...]

  23. Steve-o said

    Switch statements vs if/then/else statement performance is different from language to language. The preference for switch statements is primarily for “read-ability.” It’s actually good that the performance of switch and if/then/else statements in PHP is on the same level. There should now be no excuse to NOT use switch statements when applicable, for the sake of yourself and all other developers. Avoiding complicated logic chains should always be the goal.

    But I agree that most of the code examples by Google were horrendous.

  24. [...] two articles that prove that most of Google’s tips are not correct. The first is the PHP 10.0 blog and the second is a discussion on Google [...]

  25. [...] wrote yesterday about how the PHP community slams Google’s tips. Above all, he links to some other articles on the same topic that are very interesting and to some extent worrying (for those [...]

  26. [...] optimization advice from Google Response to Google’s optimization advices Another response to Google’s optimization advices More on PHP performance Tags: google, php, zend Comments (0) [...]

  27. [...] tips about optimizing PHP code and at first I happily took them in. Later on, having read other points of view, I started to wonder a bit about some of the optimizations and later still I realised that [...]

  28. OnGe said

    You are wrong on many points of your article.
    Don’t copy variables for no reason
    $description = strip_tags($_POST['description']);
    echo $description;

    this actually consumes more memory than just echo, simply because you need to save it to a variable and then remember the value. There would be no difference if there wasn’t the strip_tags function, but here you do not make a reference to the $_POST['description'] value; you make another value from the return of the strip_tags function. Just echo doesn’t remember this at all.

    Avoid doing SQL queries within a loop
    There is apparently no need to bother with sanitation in an article about optimization, just as there is no need to bother with optimization in an article about sanitation. Second, there is no user input; it can be expected that the data is loaded from a database and thus must already be safe.

    Avoid doing SQL queries within a loop
    If you care about performance, you will probably not use PDO. Another thing is that PDO doesn’t shrink your INSERT commands into one, it just launches them in one bunch. It saves communication between the web and DB servers, but it is still slower. Some SQL framework is a good idea, of course, but such a framework needs to be written by someone, and those people need to know such tweaks.

    About bytecode caching, it is of course a nice thing but it doesn’t make code run faster, it just saves time for checking and parsing source code. You can still get very slow code even with such optimizers, and you can get very fast code without them. No need to mention that not all web hosts offer such software on their machines.

    So, the only thing I can agree with you on is that there is no significant gain in performance from the if/switch and "/' stuff. Otherwise, you have perhaps missed something, or you are just arguing about things you do not fully understand and feeling great because you “beat” some giant like Google. That’s fine, many people do the same thing. But next time, try to test your stuff first; that Google article is much more correct than yours is ;)

    • Stas said

      Don’t copy variables for no reason
      This consumes marginally more memory (about 30-40 bytes I’d say) but that’s really not what you should be thinking about when writing an app.

      Just echo doesn’t remember this at all.

      If you talk about the difference in the lifetimes, it does exist but if your code is properly modular, it would matter only as long as you’re in the same scope. Google’s claim that that code duplicates the variable is false anyway.

      There is apparently no need to bother with sanitation in article about optimization, as well as there is no need to bother with optimization in article about sanitation.

      You are neglecting the fact that people tend to copy code from such articles. And we’re not talking about some security infrastructure that takes major effort – we are talking about basic code hygiene. Which needs to happen from the start, not come as an afterthought.

      About bytecode caching, it is of course nice thing but it doesnt make run code faster, it just saves time for checking and parsing sourcecode

      And if you have ever tried its effect on an application, it’s usually quite significant (2-3x is common), unless your application code is seriously slow.

      No need to mention that not all webhosters offers such software on their machines.

      If you care about performance, you should then choose hosters that do offer that option. There are pretty cheap VPSes out there, too.

      you are just arguing about things you do not fully understand and feeling great, because you “beat” some giant as Google.

      I don’t want to get into bragging contests, but I have reasonable confidence that after 10 years of working in the field, writing some of the best PHP tools out there and some of the actual engine code that runs PHP, I have at least some understanding of these matters. Of course, it can be flawed and I can be mistaken, and when it is pointed out to me, with proof, that I am, I will always be ready to admit it and correct it.

      • OnGe said

        That Google article isn’t about how to tune up your server, it’s about how to write faster-running PHP code. That’s it. Writing about optimizers there would be for a book, not an article. I do not dispute the gain from bytecode caching, but it simply isn’t the subject of that article. Same thing about sanitation (of something that probably was sanitized before).

        Anyway, you write that you are ready to admit and correct a mistake. Then do that. You have a big one in the “Don’t copy variables for no reason” part. The difference between echo function('something'); and echo $variable = function('something'); increases with the size of the variable. If you want proof, write these few lines of code and run them. Even if you echo small things, which really costs you just a few bytes, it can make a big difference when you are preparing a big data feed or logging the progress of parsing such feeds, simply because it happens hundreds, thousands or hundreds of thousands of times :)

        Btw, Google doesn’t claim the variable is copied all the time, only when it’s altered. See the article:
        What this actually results in is doubled memory consumption (when the variable is altered), and therefore, slow scripts.

        PS: you can say one should not bother with such things when writing an application. You are right, but that is true for any optimization. This is something that should be paid attention to when the app is done.

        • Stas said

          That article is a collection of random tricks whose value is dubious at best. This is not how performance optimization is properly done, nor should it be “paid attention when app is done”. It should be paid attention from the start, but what you should pay attention to is not stupid tricks that try to cheat the engine of a dozen bytes or CPU cycles, but proper architecture and design, and using the right tools for the job. I’ve seen too many apps full of such stupid tricks but not even bothering to cache database accesses or minimize filesystem interaction, etc. That’s the consequence of thinking performance optimization is a bag of tricks.

          I don’t know what Google meant by “when variable is altered” (in that case the old value would just be destroyed, not duplicated), but nothing like that happens in the code there: it is not altered and not duplicated. And in the example they cited, BOTH implementations would use 1MB of memory at the peak.

          • OnGe said

            You apparently do not distinguish between code optimization and design optimization (not sure about the term, English isn’t my language). The first is exactly a bunch of tricks, and that is exactly what that article is about (IMHO). Design optimization is what you are talking about; it is often the place where big gains can be achieved (because of a design that is poor in terms of performance). This is something that can hardly be generalized and written up in an understandable article.

            To altered variable problem:
            Look at the code:
            $description = strip_tags($_POST['description']);

            Here you pass the _POST variable to the strip_tags function and store the return value in the variable description. The strip_tags function makes an altered _POST['description'] variable. So you get something different, which you store in a new variable, and then you have another value, because the old _POST value is (of course) still there.

            This does not happen when you just echo it, and this is exactly what this example is about.

  29. Stas said

    I am sorry, OnGe, but what you describe is not what happens. If you look at the opcodes for echo strip_tags($_POST['description']), you see something like:

              0       2     A(1) = FETCH_R(C("_POST")) [global]
              1       2     A(0) = FETCH_DIM_R(A(1), C("description")) [Standard]
              2       2     SEND_VAR(A(0), 1)
              3       2     A(0) = DO_FCALL(C("strip_tags")) [1 arguments]
              4       2     ECHO_OP(A(0))
    

    Function strip_tags DOES NOT alter its argument. It returns the modified string into A(0). The only difference is that when you assign it, A(0) survives longer. But it is always created. Old POST variable is ALWAYS there.

    This is one more reason why you should avoid tricks. It’s harder to get them right than you think.
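The point that strip_tags returns a new string and leaves its argument alone is easy to check. A small sketch, using a local array as a stand-in for $_POST and a made-up input value:

```php
<?php
// Sketch: $post is a hypothetical stand-in for $_POST.

$post = ['description' => '<b>Hello</b> world'];

$description = strip_tags($post['description']);

echo 'Original: ' . $post['description'] . "\n";
echo 'Stripped: ' . $description . "\n";
```

Whether the return value is echoed directly or assigned first, the stripped string has to exist for a moment alongside the original; assigning it only keeps it alive for longer.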

  30. Ra said

    Good work. Looking for more tips.

  31. [...] January 19, 2010 by bmchaoshi PHP performance tips from Google [...]

  32. [...] it was published. At that time, a number of the tips were massively criticized – a lot of blog posts were written, comments made, etc. Seemingly, this has led to the article being changed, which [...]

  33. Php2ranjan said

    Hi all,
    PHP performance tips from Google: I saw a link on Twitter that also referred to the PHP optimization page. There is a lot more advice on that page.
    Thanks, it’s a nice article.

  34. Jack said

    Seems double quote string is faster: http://www.linuxask.com/questions/should-i-always-use-single-quotes-for-php-strings

    • Stas said

      I think you’re missing a very important point: it doesn’t matter, since if you’re optimizing against string parsing, you’re looking for optimizations in the wrong place. It doesn’t matter if a particular microbenchmark gives one result or another (which would most probably be the result of random fluctuations); it’s not where you should look for optimizations.

  35. Henry said

    “Don’t copy variables for no reason”
    Google is right.
    Their first example creates two different variables, one with stripped tags, one without. PHP saves both instead of a diff for performance reasons. I doubt the first variable would be needed afterwards and the example won’t even need the second one after that either.

  36. zvz said

    I always wondered whether data sanitization using real_escape_string carries any performance penalty.

  37. [...] article in particular, https://php100.wordpress.com/2009/06/26/php-performance-google/, is interesting as the author compares internal php code of if vs switch and they are nearly [...]
