PHP performance tips from Google

I saw a link on Twitter referring to PHP optimization advice from Google. There is a bunch of advice there. Some of it is quite sound, if not new: use the latest versions if possible, profile your code, cache whatever can be cached, etc. Some is of doubtful value, like the output buffering tip, which can be useful in some situations but does nothing or makes things worse in others; if you’re a beginner, it’s generally better to leave it alone until you’ve solved the real performance problems.

However, some of the advice makes no sense at best and is potentially harmful at worst. Let’s get to it:

First one: Don’t copy variables for no reason. I don’t know what the author intended to describe there, but the PHP engine uses reference counting with copy-on-write, and absolutely no copying happens when you assign variables the way they describe:

$description = strip_tags($_POST['description']);
echo $description;

I don’t know where this comes from, but it’s just not so, except maybe in some prehistoric version of PHP. Unless you’re going back to 1997 in a time machine, this advice is no good for you.
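One way to see copy-on-write in action is to watch memory_get_usage() around an assignment. This is only a minimal sketch, assuming a reasonably modern PHP (7 or later); the variable names and the 1 MB size are made up for illustration, and exact byte counts will vary by version and platform.

```php
<?php
// Build a 1 MB string at run time so it is an ordinary refcounted value.
$size = 1024 * 1024;
$original = str_repeat('x', $size);

$before = memory_get_usage();
$copy = $original;              // no string data is copied, only a refcount changes
$afterAssign = memory_get_usage();

$copy .= 'y';                   // first write: the engine now has to separate the values
$afterWrite = memory_get_usage();

$assignCost = $afterAssign - $before;
$writeCost  = $afterWrite - $afterAssign;

printf("assignment cost: %d bytes\n", $assignCost);
printf("write cost:      %d bytes\n", $writeCost);
```

The assignment itself should cost next to nothing; the memory for a second string is only taken when one of the two variables is actually modified.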

Next one: Avoid doing SQL queries within a loop. This actually might make sense in some situations; however, the code example they give is missing one important detail, which makes it potentially harmful for beginners (see if you can spot it):

$userData = [];
foreach ($userList as $user) {
    $userData[] = '("' . $user['first_name'] . '", "' . $user['last_name'] . '")';
}
$query = 'INSERT INTO users (first_name,last_name) VALUES' . implode(',', $userData);
mysql_query($query);

Please repeat after me: DO NOT INSERT USER DATA INTO SQL WITHOUT SANITIZING IT!
Of course, I cannot know whether $user was sanitized. Maybe the intent was that it had been. But if you give such an example and target beginners, you should say so explicitly, every time! People tend to copy/paste examples, and then you get SQL injection in a government site.

Another thing: most real-life PHP applications do not insert data in bulk, except in some very special scenarios (bulk data imports, etc.), so in most cases one would be better off using PDO and prepared statements, or a higher-level framework that will do it for you. But if you roll your own SQL, sanitize the data! This is much more important than any performance tricks.
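As a rough illustration of the prepared-statement approach, here is a minimal sketch. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database purely to keep the example self-contained; the contents of $userList are invented, and the table simply mirrors the one in the example above.

```php
<?php
// Assumption: pdo_sqlite is installed; swap the DSN for your own database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (first_name TEXT, last_name TEXT)');

// Hypothetical input, including values that would break naive concatenation.
$userList = [
    ['first_name' => 'Alice', 'last_name' => 'Smith'],
    ['first_name' => 'Bob"); DROP TABLE users; --', 'last_name' => "O'Brien"],
];

// Prepared once; the values travel separately from the SQL text.
$stmt = $pdo->prepare(
    'INSERT INTO users (first_name, last_name) VALUES (:first, :last)'
);

$pdo->beginTransaction();       // one transaction keeps the per-row cost low
foreach ($userList as $user) {
    $stmt->execute([
        ':first' => $user['first_name'],
        ':last'  => $user['last_name'],
    ]);
}
$pdo->commit();

$rowCount  = (int) $pdo->query('SELECT COUNT(*) FROM users')->fetchColumn();
$lastNames = $pdo->query('SELECT last_name FROM users ORDER BY rowid')
                 ->fetchAll(PDO::FETCH_COLUMN);
```

The hostile-looking values end up stored as plain data, because they are never spliced into the SQL text.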

Next one: Use single-quotes for long strings. PHP code is parsed and compiled, and any possible difference in speed between parsing "" and '' is really negligible unless you operate with hundreds of megabyte-size strings embedded in your code. If you do, your quotes probably aren’t where you should start optimizing. And of course, using a bytecode cache (see below) eliminates this difference altogether.

Next one: Use switch/case instead of if/else. This makes no sense, since switch does essentially the same thing as a chain of ifs. See for yourself; here are the opcodes for the “if” code:

0       2     A(0) = FETCH_R(C("_POST")) [global]
1       2     A(1) = FETCH_DIM_R(A(0), C("action")) [Standard]
2       2     T(2) = IS_EQUAL(A(1), C("add"))
3       2     JMPZ(T(2), 7)
4       3     INIT_FCALL_BY_NAME(function_table, C("addUser"))
5       3     Au(3) = DO_FCALL_BY_NAME() [0 arguments]
6       4     JMP(16)
7       4     A(4) = FETCH_R(C("_POST")) [global]
8       4     A(5) = FETCH_DIM_R(A(4), C("action")) [Standard]
9       4     T(6) = IS_EQUAL(A(5), C("delete"))
10      4     JMPZ(T(6), 14)
11      5     INIT_FCALL_BY_NAME(function_table, C("deleteUser"))
12      5     Au(7) = DO_FCALL_BY_NAME() [0 arguments]
13      6     JMP(16)
14      7     INIT_FCALL_BY_NAME(function_table, C("defaultAction"))
15      7     Au(8) = DO_FCALL_BY_NAME() [0 arguments]
16      9     RETURN(C(1))
17      9     HANDLE_EXCEPTION()

Here is the “switch” code:

0       2     A(0) = FETCH_R(C("_POST")) [global]
1       2     A(1) = FETCH_DIM_R(A(0), C("action")) [Standard]
2       3     T(2) = CASE(A(1), C("add"))
3       3     JMPZ(T(2), 8)
4       4     INIT_FCALL_BY_NAME(function_table, C("addUser"))
5       4     Au(3) = DO_FCALL_BY_NAME() [0 arguments]
6       5     BRK(0, C(1))
7       6     JMP(10)
8       6     T(2) = CASE(A(1), C("delete"))
9       6     JMPZ(T(2), 14)
10      7     INIT_FCALL_BY_NAME(function_table, C("deleteUser"))
11      7     Au(4) = DO_FCALL_BY_NAME() [0 arguments]
12      8     BRK(0, C(1))
13      9     JMP(15)
14      9     JMP(19)
15     10     INIT_FCALL_BY_NAME(function_table, C("defaultAction"))
16     10     Au(5) = DO_FCALL_BY_NAME() [0 arguments]
17     11     BRK(0, C(1))
18     12     JMP(20)
19     12     JMP(15)
20     12     SWITCH_FREE(A(1))
21     13     RETURN(C(1))
22     13     HANDLE_EXCEPTION()
No.     CONT    BRK     Parent
0         20          20           -1

You can see there’s a little difference: the latter has CASE/BRK opcodes, which act more or less like IS_EQUAL and JMP with slightly different plumbing, but in general the code is the same. (You could even argue the “switch” code is a bit less optimal, but that is really an area you shouldn’t be concerned with before you can read and understand the code in zend_vm_def.h, which is not exactly beginner stuff.)
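For reference, PHP source along the following lines would produce dumps like the two above. This is a reconstruction from the opcodes, not the original test script; the handler bodies and the dispatchIf/dispatchSwitch wrappers are hypothetical and exist only to make the sketch runnable.

```php
<?php
// Hypothetical handlers standing in for the functions named in the dumps.
function addUser()       { return 'add'; }
function deleteUser()    { return 'delete'; }
function defaultAction() { return 'default'; }

// The "if" variant: $_POST['action'] is fetched for each comparison.
function dispatchIf()
{
    if ($_POST['action'] == 'add') {
        return addUser();
    } elseif ($_POST['action'] == 'delete') {
        return deleteUser();
    } else {
        return defaultAction();
    }
}

// The "switch" variant: the value is fetched once, then compared per case.
function dispatchSwitch()
{
    switch ($_POST['action']) {
        case 'add':
            return addUser();
        case 'delete':
            return deleteUser();
        default:
            return defaultAction();
    }
}

// Both variants should pick the same handler for every input.
$results = [];
foreach (['add', 'delete', 'something-else'] as $action) {
    $_POST['action'] = $action;
    $results[$action] = [dispatchIf(), dispatchSwitch()];
}
```

Either form dispatches to the same handler; which one to write is a readability question rather than a performance one.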

Another thing the author completely failed to mention, and which should be one of the very first things anybody who cares about performance does, is to use a bytecode cache. There are plenty of free ones (shameless plug: Zend Server CE includes one of them, all the performance improvements for $0 :) and you don’t have to change a bit of code to run one.
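If you want to check from code whether a bytecode cache is active, something like the following may help. It is only a sketch and assumes the OPcache extension that ships with current PHP releases (an assumption about your setup, since this post predates it); the helper name bytecodeCacheEnabled is made up for illustration.

```php
<?php
// Hypothetical helper: reports whether OPcache is loaded and enabled
// for the current process.
function bytecodeCacheEnabled(): bool
{
    if (!function_exists('opcache_get_status')) {
        return false;           // extension is not loaded at all
    }
    // Yields false when the cache is disabled (e.g. on the CLI by default).
    $status = @opcache_get_status(false);
    return is_array($status) && !empty($status['opcache_enabled']);
}

$enabled = bytecodeCacheEnabled();
echo $enabled ? "bytecode cache: on\n" : "bytecode cache: off\n";
```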

Now, I understand Google is not a PHP shop like Yahoo or Facebook or many others. But this article is signed “Eric Higgins, Google Webmaster”, and one would expect something much more sound from such a source. In fact, there are a lot of blogs and conference talks on the topic, and lots of community folks who I am sure would be ready to help with such an article. I wonder why that wasn’t done. Why is the best advice we can apparently find from Google either trivial, useless or wrong?

I think they can do much better, and they should if they take “making the web faster” seriously.

P.S. After writing all this, I also found a comment from Gwynne Raskind, which I recommend reading too.

54 thoughts on “PHP performance tips from Google”

  2. “Don’t copy variables for no reason”
    Google is right.
    Their first example creates two different variables, one with stripped tags, one without. PHP saves both instead of a diff for performance reasons. I doubt the first variable would be needed afterwards and the example won’t even need the second one after that either.

    • I think you’re missing a very important point: it doesn’t matter, since if you’re optimizing against string parsing, you’re looking for optimizations in the wrong place. It doesn’t matter whether a particular microbenchmark gives one result or another (which would most probably be the result of random fluctuations); it’s not where you should look for optimizations.

  3. Hi all,
    I also saw a link on Twitter that referred to the PHP optimization page. There is a lot more advice on that page.
    Thanks, it’s a nice article.

  6. I am sorry, OnGe, but what you describe is not what happens. If you look at the opcodes for echo strip_tags($_POST['description']), you see something like:

              0       2     A(1) = FETCH_R(C("_POST")) [global]
              1       2     A(0) = FETCH_DIM_R(A(1), C("description")) [Standard]
              2       2     SEND_VAR(A(0), 1)
              3       2     A(0) = DO_FCALL(C("strip_tags")) [1 arguments]
              4       2     ECHO_OP(A(0))
    

    Function strip_tags DOES NOT alter its argument. It returns the modified string into A(0). The only difference is that when you assign it, A(0) survives longer. But it is always created. Old POST variable is ALWAYS there.

    This is one more reason why you should avoid tricks. It’s harder to get them right than you think.

  7. You are wrong on many points in your article.
    Don’t copy variables for no reason
    $description = strip_tags($_POST['description']);
    echo $description;

    This actually consumes more memory than just echo, simply because you need to save the value to a variable and then remember it. There would be no difference if there weren’t a strip_tags call, but here you do not make a reference to the $_POST['description'] value; you make another value from the return of the strip_tags function. Just echo doesn’t remember this at all.

    Avoid doing SQL queries within a loop
    There is apparently no need to bother with sanitation in an article about optimization, just as there is no need to bother with optimization in an article about sanitation. Second, there is no user input here; it can be expected that the data is loaded from a database and thus must already be safe.

    Avoid doing SQL queries within a loop
    If you care about performance, you will probably not use PDO. Another thing is that PDO doesn’t shrink your INSERT commands into one, it just launches them in one batch. It saves communication between the web and DB servers, but it is still slower. Some SQL framework is a good idea, of course, but such a framework needs to be written by someone, and those people need to know such tweaks.

    About bytecode caching, it is of course a nice thing, but it doesn’t make code run faster, it just saves the time for checking and parsing source code. You can still get very slow code even with such optimizers, and you can get very fast code without them. No need to mention that not all web hosts offer such software on their machines.

    So, the only thing I can agree with you on is that there is no significant performance gain in the if/switch and "/' stuff. Otherwise, you have perhaps missed something, or you are just arguing about things you do not fully understand and feeling great because you “beat” a giant like Google. That’s fine, so many people do the same thing. But next time, try to test your stuff first. That Google article is much more correct than yours is ;)

    • “Don’t copy variables for no reason”
      This consumes marginally more memory (about 30-40 bytes, I’d say), but that’s really not what you should be thinking about when writing an app.

      “Just echo doesn’t remember this at all.”

      If you are talking about the difference in lifetimes, it does exist, but if your code is properly modular, it matters only as long as you’re in the same scope. Google’s claim that that code duplicates the variable is false anyway.

      “There is apparently no need to bother with sanitation in an article about optimization, just as there is no need to bother with optimization in an article about sanitation.”

      You are neglecting the fact that people tend to copy code from such articles. And we’re not talking about some security infrastructure that takes major effort; we are talking about basic code hygiene, which needs to happen from the start, not come as an afterthought.

      “About bytecode caching, it is of course a nice thing, but it doesn’t make code run faster, it just saves the time for checking and parsing source code.”

      And if you have ever measured its effect on an application, you know it’s usually quite significant (2-3x is common), unless your application code is seriously slow.

      “No need to mention that not all web hosts offer such software on their machines.”

      If you care about performance, you should choose a host that does offer that option. There are pretty cheap VPSes out there, too.

      “You are just arguing about things you do not fully understand and feeling great because you ‘beat’ a giant like Google.”

      I don’t want to get into bragging contests, but I have reasonable confidence that after 10 years of working in the field, writing some of the best PHP tools out there and some of the actual engine code that runs PHP, I have at least some understanding of these matters. Of course, my understanding can be flawed and I can be mistaken, and when someone points out that I am, with proof, I will always be ready to admit it and correct it.

      • That Google article isn’t about how to tune up your server, it’s about how to write faster-running PHP code. That’s it. Writing about optimizers there would be material for a book, not an article. I do not dispute the gain from bytecode caching, but it simply isn’t the subject of that article. The same goes for sanitation (of something that probably was sanitized before).

        Anyway, you write that you are ready to admit and correct a mistake. Then do that. You have a big one in the “Don’t copy variables for no reason” part. The difference between echo function('something'); and echo $variable = function('something'); increases with the size of the variable. If you want proof, write these few lines of code and run them. Even if you echo small things, which really costs you just a few bytes, it can make a big difference when you are preparing a big data feed or logging the progress of parsing such feeds, simply because it happens hundreds, thousands or hundreds of thousands of times :)

        Btw, Google doesn’t claim the variable is copied all the time, only when it’s altered. See the article:
        “What this actually results in is doubled memory consumption (when the variable is altered), and therefore, slow scripts.”

        PS: you can say one should not bother with such things when writing an application. You are right, but that is true for any optimization. This is something that should be paid attention to when the app is done.

        • That article is a collection of random tricks whose value is dubious at best. This is not how performance optimization is properly done, nor should it be “paid attention to when the app is done”. It should be paid attention to from the start, but what you should pay attention to is not stupid tricks that try to cheat the engine out of a dozen bytes or CPU cycles, but proper architecture and design, and using the right tools for the job. I’ve seen too many apps full of such stupid tricks that don’t even bother to cache database accesses or minimize filesystem interaction, etc. That’s the consequence of thinking performance optimization is a bag of tricks.

          I don’t know what Google meant by “when the variable is altered” (in that case the old value would just be destroyed, not duplicated), but nothing like that happens in the code there: it is not altered and not duplicated. And in the example they cited, BOTH implementations would use 1MB of memory at the peak.

          • You apparently do not distinguish between code optimization and design optimization (not sure about the terms, English isn’t my language). The first is exactly a bunch of tricks, and it is exactly what that article is about (IMHO). Design optimization is what you are talking about; it is often the place where big gains can be achieved (because of a design that is poor in terms of performance). This is something that can hardly be generalized and written up in an understandable article.

            On the altered-variable problem, look at the code:
            $description = strip_tags($_POST['description']);

            Here you pass the _POST variable to the strip_tags function and store the return value in the variable description. The strip_tags function makes an altered version of the _POST['description'] variable. So you get something different, which you store in a new variable, and then you have another value, because the old _POST value is (of course) still there.

            This does not happen when you just echo it, and this is exactly what the example is about.

  12. Switch statements vs if/then/else statement performance is different from language to language. The preference for switch statements is primarily for “read-ability.” It’s actually good that the performance of switch and if/then/else statements in PHP is on the same level. There should now be no excuse to NOT use switch statements when applicable, for the sake of yourself and all other developers. Avoiding complicated logic chains should always be the goal.

    But I agree that most of the code examples by Google were horrendous.

  19. Thanks Stas for pointing out the article’s wrongness and shortcomings. Google should really be doing much better than that pap piece on PHP performance. To write about performance and not include any metrics is shameful, as is not following one’s own first piece of advice: “Profile your code to pinpoint bottlenecks”.

    There are many pages on the web giving similar performance tips for PHP and all they really seem to do is give PHP a bad name. Clearly, quite some effort went into making the Google article (the video is slick). It would have been much better if Google put the effort into making a more in-depth piece on profiling or caching. I know I’ve found information on APC, for instance, a bit lacking.

  21. Note that in the “Avoid unnecessary copies” bit you seem to have missed the call to strip_tags(). After the assignment, $description != $_POST['description'].

    • I know. However, whatever strip_tags produces, it’s there. It doesn’t matter if you assign it to one variable, to 10 variables or to no variables. All the assignment does is create one more entry in a hashtable and change a refcount.

  22. Hey,

    Nice article you have there.
    Now I’m wondering… how did you produce those if and switch code blocks? Is that assembly code or C code? Could you explain how I can get output like that?

    Thanx,
    Mark.

  24. Good article, but I disagreed with you when you said “any possible difference in speed between parsing "" and '' is really negligible”. Even though it is a small amount of extra processing time, it can slowly add up, and if you have a high-traffic website, a 0.1% increase in efficiency can lower server costs considerably.

    • This is not true. The " vs ' distinction is resolved before the code is even executed in PHP. They both have different token names. The cost is so low that it’s not even worth measuring. Use " when it’s convenient. Use ' when it’s convenient.

      Also, not that it matters, but "hello $name" is probably marginally faster than 'hello ' . $name.

      But I don’t think anyone should care. I doubt it’s 0.1%, and if your web site has that much traffic, you should focus on loops and, in general, algorithms.

      Focusing on string grammars is pointless.

  26. Great article. Way to stick it to Google.
    The only thing I might say in response is that they may have a point with the if/else vs. switch which is that if you look at the opcode you see that the if/else makes a call to access the key in the hash table for each if/elseif whereas switch only does it once. I don’t know how much of a difference this makes, but I’ve always tried to avoid hash table reads vs a read to a regular variable.
    Just my two cents.

    • It might make some sense if they had a function call there and noted that it takes significant time to run, but saving a couple of hash lookups isn’t really a thing you should be worried about. Even then, the difference is not so much between if() and switch() per se as between doing the same thing once and doing it multiple times.
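To make the once-versus-many point concrete, here is a small sketch. The currentAction function and its call counter are hypothetical stand-ins for any expensive lookup.

```php
<?php
$calls = 0;

// Hypothetical "expensive" lookup; the counter records how often it runs.
function currentAction(): string
{
    global $calls;
    $calls++;
    return 'delete';
}

// Calling the function in every condition repeats the work.
$calls = 0;
if (currentAction() == 'add') {
    $viaIf = 'add';
} elseif (currentAction() == 'delete') {
    $viaIf = 'delete';
} else {
    $viaIf = 'default';
}
$callsWithIf = $calls;          // two calls for this input

// switch evaluates its subject once...
$calls = 0;
switch (currentAction()) {
    case 'add':    $viaSwitch = 'add';     break;
    case 'delete': $viaSwitch = 'delete';  break;
    default:       $viaSwitch = 'default';
}
$callsWithSwitch = $calls;      // one call

// ...but so does an if chain once the result is kept in a local variable.
$calls = 0;
$action = currentAction();
if ($action == 'add') {
    $viaLocal = 'add';
} elseif ($action == 'delete') {
    $viaLocal = 'delete';
} else {
    $viaLocal = 'default';
}
$callsWithLocal = $calls;       // one call
```

The saving comes from evaluating the expression once, which an if chain gets just as easily by storing the result first.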

  29. Case/switch is actually a little less optimal. I’ve seen benchmarks done using if blocks and switch blocks. The if blocks performed a bit faster. My only guess is that Google is going for readability of the code.

  30. I don’t think there’s anything misleading or inappropriate about said article. I think all the tips mentioned would be quite helpful to beginners. Sure, you may be saving microseconds here and there, but considering it doesn’t take any extra effort to implement these tips (besides just knowing them and getting used to them), it’s definitely worth it.

    Now, let’s talk about the first example, “Don’t copy variables for no reason”. This is a perfectly valid tip. The result of the example code given is two completely separate strings, almost identical (hence, double the memory is taken up).

    As for the second example, this is also fine. So as not to confuse the beginner, he focuses only on the tip at hand, without bringing in SQL injection, which hasn’t got anything to do with performance at all. If he were to mention such a thing, he’d almost have to write a completely separate article.

    Now finally, the last tip you criticise is also valid. I guess his only mistake was not mentioning under what circumstances the switch statement would be quicker. If you have an if statement which contains some form of processing (besides the actual boolean evaluation), such as a function call, then the switch statement will be quicker, as it only has to run that function once, whereas an else/if chain would need to re-run the function for every ‘else if’.

    • I think there’s a much more important point, and it’s my fault that I failed to convey it beyond all the technical details. The point is that even if those tricks were valid (which I still maintain most of them aren’t, at least in the form they were presented), if you are a beginner starting to optimize your application, a random collection of engine tricks that might save 0.1% of execution time in random places is absolutely the worst advice you could be given. Optimization should not start with engine tricks. Actually, when you optimize properly you will probably need no such tricks anyway, as I can hardly imagine any application whose performance depends on whether you use single or double quotes.
      Giving a beginner these random tricks of doubtful use, and mentioning the really important things only in passing, gives readers the impression that performance optimization is, at least when it comes to PHP, a collection of weird tricks, and that this is the way to make your applications perform. Nothing could be farther from the truth.
      I think I need another post on the matter… :)

      • The topic of single quotes is often dismissed, but the Zend Framework coding standards, for example, require single quotes. I think it’s a KISS approach: don’t use double quotes if you don’t do variable substitution, because people will think that there are variables in the string, just as you use private methods instead of making everything public, because this way no one will suppose that the method is being called from an external class (and thus cannot be refactored easily). And you also avoid having '\n' parsed.

    • The variables tip is misleading. Obviously variable assignment in any language costs something, even in C. The author is focusing on PHP here and implying that there is enough of a cost to consciously avoid variable assignment. A beginner will take this “tip” and try to use as few variables as possible to make the code “fast”. As Stas points out, this will have almost zero effect on the speed of the code. It will, however, have disastrous consequences for the readability and maintainability of the code.

    • $foo = $bar;

      This does *NOT* take up twice as much memory!

      It takes an extra 8 bytes or so, until you *change* $foo or $bar, at which point copy-on-write kicks in.

      • But the example didn’t show $foo = $bar;
        It used a function that actually _changes_ the variable:

        $description = strip_tags($_POST['description']);

        The example given is completely valid. And even if $foo = $bar; doesn’t double the memory, you could argue that anyone who changes $foo doubles the memory later. It makes no sense to use two vars if it can be done with one.

  31. Erm. Just quoting is not enough and will not prevent SQL injection.
    You should properly cast and escape: cast numerics, and use mysql_real_escape_string() (or an appropriate function for your DB) on all strings. I prefer sprintf() for that:

    $sql = sprintf('
        SELECT …
        FROM table
        WHERE numeric_column = %d
        AND string_column = "%s"
        AND float_column = %f'
        , $numeric
        , mysql_real_escape_string($string, $connection)
        , $float
    );

    • You are right, quoting is not a good description, as it might be misunderstood to mean that just sticking ''s around the variable is enough. I’ve corrected the wording.

    • $sql = "SELECT `…` FROM `table` "
          . "WHERE `numeric_column` = '" . (int)$numeric . "' "
          . "AND `string_column` = '" . mysql_real_escape_string($string, $connection) . "' "
          . "AND `float_column` = '" . (float)$float . "'";

      This is how I’d do it

      • That’s all well and good until someone who doesn’t know what they are doing comes along and “maintains” your code, or uses the same style and forgets a cast.

        Best bet is to move over to PDO and use prepared statements with type hinting.
