PHP 10.0 Blog

What if…

Archive for the ‘Engine’ Category

5.3!!!

Posted by Stas on June 30, 2009

After a long string of delays, PHP 5.3 is finally out. Over the course of the last two years, I was sure a number of times that it would happen next month at the latest, but there were always good reasons to postpone it. Now it’s finally, officially out. I think it’s a huge step for PHP. Download it and try it!

Some major new features in 5.3:

  1. Namespaces! They didn’t end up exactly as I thought they would, but they are a major feature PHP has been missing for a long time, and I’m very curious to see how they work out in big projects.
  2. Closures and anonymous functions! PHP now has first-class functions, and you can do all kinds of crazy stuff with them. Or just make your code easier to read and maintain :)
  3. Garbage collection. The PHP engine, being refcount-based, has always had a slight problem with reference loops. Usually this was not a big issue, since everything is cleaned up at the end of the request, but for long-running PHP applications not based on the short request pattern it became a problem. Not anymore – now the engine knows how to clean up such loops.
  4. Late static binding – a somewhat exotic thing for people who have never encountered it, but a burning issue for people who needed it. Basically, when class Foo extends class Bar, and the method func() defined in Bar is called as Foo::func(), there was no way for func() to distinguish that call from Bar::func(). Now there is. This makes it possible to implement all kinds of cool patterns like ActiveRecord.
  5. Intl extension in core – lots of functions to allow you to internationalize your application.
  6. Phar in core – now you can pack all the application in one neat file and still be able to run it!
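
To make the late static binding item more concrete, here is a minimal sketch. The class names Foo and Bar follow the description above; the method names func(), create() and createSelf() are just illustrations, not anything from the release notes:

```php
<?php
class Bar {
    // get_called_class() (new in 5.3) reports the class the call was made through.
    public static function func() {
        return get_called_class();
    }

    // "static" is resolved at call time, "self" is fixed to Bar at compile time.
    public static function create() {
        return new static();
    }

    public static function createSelf() {
        return new self();
    }
}

class Foo extends Bar {}

$calledThroughFoo = Foo::func();                   // name of the calling class
$lateBound        = get_class(Foo::create());      // object built via "static"
$earlyBound       = get_class(Foo::createSelf());  // object built via "self"
```

The create() method is the piece ActiveRecord-style base classes rely on: a finder defined once in the parent can return instances of whichever child class it was called through.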

Also in 5.3:

  1. Nowdocs – same as heredocs, but variables are not parsed. An excellent feature for anybody who wants to include a big chunk of text in a script that may happen to have $’s etc. in it.
  2. ?: shortcut. That’s simple – $a ?: $b is $a if $a evaluates to true, otherwise it’s $b.
  3. goto. Yes, I know. But now we have it too. Deal with it. :)
  4. mysqlnd – native PHP-specific mysql driver.
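
A quick sketch of the first two items in this list, with made-up variable names, in case the difference is not obvious:

```php
<?php
$name = 'world';

// Heredoc: variables inside the text are interpolated.
$heredoc = <<<EOT
Hello $name
EOT;

// Nowdoc: note the single quotes around the label; the text is taken literally.
$nowdoc = <<<'EOT'
Hello $name
EOT;

// ?: shortcut: use the left side if it is truthy, otherwise the right side.
$title = '';
$shown = $title ?: 'Untitled';
```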

Last but definitely not least – tons of performance improvements, bug fixes, etc. Download it today! :)

Posted in Engine, Functions, PHP | 2 Comments »

PHP performance tips from Google

Posted by Stas on June 26, 2009

I saw a link on Twitter to PHP optimization advice from Google. There is a bunch of advice there, and some of it is quite sound, if not new – like use the latest versions if possible, profile your code, cache whatever can be cached, etc. Some is of doubtful value – like the output buffering tip, which could be useful in some situations but do nothing or be worse in others; if you’re a beginner, it’s generally better to leave it alone until you’ve solved the real performance problems.

However, some of the advice makes no sense at best and is potentially harmful at worst. Let’s get to it:

First one: Don’t copy variables for no reason. I don’t know what the author intended to describe there, but the PHP engine uses refcounting with copy-on-write, and there’s absolutely no copying going on when assigning variables the way they describe it:

$description = strip_tags($_POST['description']);
echo $description;

I don’t know where it comes from but it’s just not so, unless maybe in some prehistoric version of PHP. Which means unless you’re going back to 1997 in a time machine this advice is no good for you.
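
If you want to see copy-on-write for yourself, a rough sketch like the following should do. The 1 MB string and the variable names are arbitrary choices, and the exact byte counts depend on the PHP version and memory manager, so treat the numbers as indicative only:

```php
<?php
$a = str_repeat('x', 1000000);      // roughly 1 MB of data

$before = memory_get_usage();
$b = $a;                            // plain assignment: both names share one value
$afterAssign = memory_get_usage();

$b .= 'y';                          // first write to $b: only now is the data copied
$afterWrite = memory_get_usage();

$assignCost = $afterAssign - $before;      // expected to be far below 1 MB
$writeCost  = $afterWrite - $afterAssign;  // expected to be around 1 MB

echo "assignment: $assignCost bytes, write: $writeCost bytes\n";
```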

Next one: Avoid doing SQL queries within a loop. This actually might make sense in some situations; however, the code example they give is missing one important detail, which makes it potentially harmful for beginners (see if you can spot it):

$userData = [];
foreach ($userList as $user) {
    $userData[] = '("' . $user['first_name'] . '", "' . $user['last_name'] . '")';
}
$query = 'INSERT INTO users (first_name,last_name) VALUES' . implode(',', $userData);
mysql_query($query);

Please repeat after me – DO NOT INSERT USER DATA INTO SQL WITHOUT SANITIZING IT!
Of course, I cannot know that $user was not sanitized. Maybe the intent was that it was. But if you give such an example and target beginners, you should say so explicitly, every time! People tend to copy/paste examples, and then you get SQL injection on a government site.

Another thing: most real-life PHP applications do not insert data in bulk, except in some very special scenarios (bulk data imports, etc.) – so in most cases one would actually be better off using PDO and prepared statements. Or some higher-level framework which will do it for you. But if you roll your own SQL – sanitize the data! This is much more important than any performance tricks.
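
For the record, here is roughly what the PDO version could look like. This is only a sketch: the in-memory SQLite database and the sample rows are my assumptions so that the snippet runs on its own (it needs the pdo_sqlite extension); with MySQL only the DSN would change.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (first_name TEXT, last_name TEXT)');

// Hypothetical user input, including values that would break naive string building.
$userList = array(
    array('first_name' => 'Robert', 'last_name' => '"); DROP TABLE users; --'),
    array('first_name' => 'Ann',    'last_name' => "O'Neil"),
);

// The statement is prepared once; values travel separately from the SQL text.
$stmt = $db->prepare('INSERT INTO users (first_name, last_name) VALUES (?, ?)');
foreach ($userList as $user) {
    $stmt->execute(array($user['first_name'], $user['last_name']));
}

$count = (int) $db->query('SELECT COUNT(*) FROM users')->fetchColumn();
```

Yes, this runs a query inside a loop, but the statement is parsed only once and nothing the user typed ever becomes part of the SQL.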

Next one: Use single-quotes for long strings. PHP code is parsed and compiled, and any possible difference in speed between parsing "..." and '...' is really negligible unless you operate with hundreds of megabyte-size strings embedded in your code. If you do so, your quotes probably aren’t where you should start optimizing. And of course, using caching (see below) eliminates this difference altogether.

Next one: Use switch/case instead of if/else. This makes no sense since switch does essentially the same things as if’s do. See for yourself, here is the “if” code:

0       2     A(0) = FETCH_R(C("_POST")) [global]
1       2     A(1) = FETCH_DIM_R(A(0), C("action")) [Standard]
2       2     T(2) = IS_EQUAL(A(1), C("add"))
3       2     JMPZ(T(2), 7)
4       3     INIT_FCALL_BY_NAME(function_table, C("addUser"))
5       3     Au(3) = DO_FCALL_BY_NAME() [0 arguments]
6       4     JMP(16)
7       4     A(4) = FETCH_R(C("_POST")) [global]
8       4     A(5) = FETCH_DIM_R(A(4), C("action")) [Standard]
9       4     T(6) = IS_EQUAL(A(5), C("delete"))
10      4     JMPZ(T(6), 14)
11      5     INIT_FCALL_BY_NAME(function_table, C("deleteUser"))
12      5     Au(7) = DO_FCALL_BY_NAME() [0 arguments]
13      6     JMP(16)
14      7     INIT_FCALL_BY_NAME(function_table, C("defaultAction"))
15      7     Au(8) = DO_FCALL_BY_NAME() [0 arguments]
16      9     RETURN(C(1))
17      9     HANDLE_EXCEPTION()

Here is the “switch” code:

0       2     A(0) = FETCH_R(C("_POST")) [global]
1       2     A(1) = FETCH_DIM_R(A(0), C("action")) [Standard]
2       3     T(2) = CASE(A(1), C("add"))
3       3     JMPZ(T(2), 8)
4       4     INIT_FCALL_BY_NAME(function_table, C("addUser"))
5       4     Au(3) = DO_FCALL_BY_NAME() [0 arguments]
6       5     BRK(0, C(1))
7       6     JMP(10)
8       6     T(2) = CASE(A(1), C("delete"))
9       6     JMPZ(T(2), 14)
10      7     INIT_FCALL_BY_NAME(function_table, C("deleteUser"))
11      7     Au(4) = DO_FCALL_BY_NAME() [0 arguments]
12      8     BRK(0, C(1))
13      9     JMP(15)
14      9     JMP(19)
15     10     INIT_FCALL_BY_NAME(function_table, C("defaultAction"))
16     10     Au(5) = DO_FCALL_BY_NAME() [0 arguments]
17     11     BRK(0, C(1))
18     12     JMP(20)
19     12     JMP(15)
20     12     SWITCH_FREE(A(1))
21     13     RETURN(C(1))
22     13     HANDLE_EXCEPTION()
No.     CONT    BRK     Parent
0         20          20           -1

You can see there’s a little difference – the latter has CASE/BRK opcodes, which act more or less like IS_EQUAL and JMP but with slightly different plumbing. In general, though, the code is the same. You could even argue the “switch” code is a bit less optimal, but that is really an area you shouldn’t be concerned with before you can read and understand the code in zend_vm_def.h – which is not exactly beginner stuff.
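
For readers who don’t read opcode dumps for breakfast, the two dumps above correspond to source code along these lines. This is my reconstruction from the opcodes, not a verbatim copy of the original example; the stub functions and the two wrapper functions taking $action (instead of reading $_POST['action'] directly) are there only to make the sketch runnable:

```php
<?php
// Stubs so the snippet runs on its own.
function addUser()       { return 'added'; }
function deleteUser()    { return 'deleted'; }
function defaultAction() { return 'default'; }

// The "if" variant.
function dispatchWithIf($action) {
    if ($action == 'add') {
        return addUser();
    } elseif ($action == 'delete') {
        return deleteUser();
    } else {
        return defaultAction();
    }
}

// The "switch" variant: same comparisons, same calls.
function dispatchWithSwitch($action) {
    switch ($action) {
        case 'add':
            return addUser();
        case 'delete':
            return deleteUser();
        default:
            return defaultAction();
    }
}
```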

Another thing that the author completely failed to mention, and which should be one of the very first things anybody who cares about performance does, is to use a bytecode cache. There are plenty of free ones (shameless plug: Zend Server CE includes one of them – all the performance improvements for $0 :) and you don’t have to change a bit of code to run it.

Now, I understand Google is not a PHP shop like Yahoo or Facebook or many others. But this article is signed “Eric Higgins, Google Webmaster” and one would expect something much more sound from such a source. In fact, there are a lot of blogs and conference talks on the topic, and lots of community folks around who I am sure would be ready to help with such an article – I wonder why that wasn’t done. Why is the best advice we can apparently find from Google either trivial, useless or wrong?

I think they can do much better, and they should if they take “making the web faster” seriously.

P.S. After writing all this, I also found a comment from Gwynne Raskind, which I recommend reading too.

Posted in Engine, PHP | 28 Comments »

Y-Combinator in PHP

Posted by Stas on April 13, 2009

Since PHP 5.3 now has closures, all things that other languages with closures do should also be possible. One of them is having recursive closures. I.e. something like this:

$factorial = function($n) {
   if ($n <= 1)
     return 1;
   else
     return $n * call_user_func(__FUNCTION__, $n - 1);
};

which does not work. One of the ways to do it is to use Y combinator function, which allows, by application of dark magic and friendly spirits from other dimensions, to convert non-recursive code to recursive code. In PHP, Y combinator function would look like this:

function Y($F) {
    $func = function ($f) { return $f($f); };
    return $func(function ($f) use($F) {
        return $F(function ($x) use($f) {
            $ff = $f($f);
            return $ff($x);
        });
    });
}

And then the factorial function would be:

$factorial = Y(function($fact) {
    return function($n) use($fact) {
        return ($n <= 1)?1:$n*$fact($n-1);
    };
});

Which does work:

var_dump($factorial(6)); ==> int(720)

Of course, we could also cheat and go this way:

$factorial = function($n) use (&$factorial) {
      if ($n <= 1)
        return 1;
      else
        return $n * $factorial($n - 1);
};

Doing Y-combinator in PHP was attempted before (and here), but now I think it works better. It could be even nicer if PHP syntax allowed chaining function invocations – ($foo($bar))($baz) – but for now it doesn’t.

In case you wonder, such techniques do have legitimate applications, though I’m not sure if doing it in PHP this way is worth the trouble.

Posted in Engine, PHP | 11 Comments »

5.3!

Posted by Stas on August 2, 2008

The PHP 5.3 release process has officially started with alpha1, which hopefully means we’ll have a release in about 2-3 months.

This 5.3 release has two huge features that I think will have a big influence on the future of PHP – namespaces and closures. There’s also late static binding, which makes all kinds of cool tricks possible, new cool extensions, a new and faster re2c-based parser, and many other smaller improvements.

Big thanks to everyone who helped to create it, provided feedback, helped with tests, documentation, etc. I think this version will be very successful. And separate thanks to Lukas, who turned this release from “erm… sometime soon” into “full speed ahead”!

Posted in Engine, PHP | 2 Comments »

duck operator

Posted by Stas on June 5, 2008

Crazy idea for today – an operator to check conformance to a specific interface without actually implementing it. Why would one want that?
Well, if you are into the duck typing style of programming, it may be interesting for you to have an object that implements a certain set of functions but does not necessarily declare it in the class definition. Languages like Smalltalk do it all day long, so why couldn’t PHP? The idea is that it looks like this:

interface Cow {
  function moo();
  function eatGrass();
}
/* somewhere else */
class MooingGrassEater {
  function moo() {/*stuff */}
  function eatGrass() {/*stuff */}
  /*stuff */
}
/* somewhere else */
function CowConsumer($classname) {
  $foo = new $classname();
  if($foo implements Cow) {
    echo "Behold the cow:";
    $foo->eatGrass();
    $foo->moo();
  } else {
    echo "$classname is not a cow!";
  }
}

implements here is our duck operator. Note that unlike instanceof, no formal relationship is required, but only practical implementation. So another name would be “common law marriage operator” ;)

Of course, this one would be anathema to “strict OO” camp, so if you subscribe to that, just ignore this post :)

Two challenges to this idea are:

  1. __call() – we have no way to know what __call does. So either we ignore it or say “ok, __call does everything”. I’d go for the latter.
  2. Performance. To check a duck implementation, one would basically have to match method lists, which amounts to a number of is_callable calls equal to the number of methods in the interface being checked.

Actually, PHP uses this style sometimes – see, for example, user defined streams. But there’s no nice way to work with it from the consumer side.
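
Until there is such an operator, the check can be approximated in userland. The sketch below is just one way to do it; duck_implements() is a made-up helper name, the Rock class exists only for the negative case, and the __call handling follows the “__call does everything” choice from the list above:

```php
<?php
interface Cow {
    function moo();
    function eatGrass();
}

// Note: no "implements Cow" here.
class MooingGrassEater {
    function moo()      { return 'Moo!'; }
    function eatGrass() { return 'Munch'; }
}

class Rock {}

// Hypothetical helper: true if $object can answer every method of $interface.
function duck_implements($object, $interface) {
    // Challenge 1: we cannot know what __call does, so assume it does everything.
    if (method_exists($object, '__call')) {
        return true;
    }
    // Challenge 2: one is_callable check per method of the interface.
    $reflection = new ReflectionClass($interface);
    foreach ($reflection->getMethods() as $method) {
        if (!is_callable(array($object, $method->getName()))) {
            return false;
        }
    }
    return true;
}

$cowLike = duck_implements(new MooingGrassEater(), 'Cow');
$rock    = duck_implements(new Rock(), 'Cow');
```

This only compares method names, not argument lists, so it is weaker than what a real operator in the engine could check.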

Posted in Engine | 15 Comments »

the secret of PHP

Posted by Stas on May 21, 2008

So, another “PHP sucks” post, this time from Jeff Atwood. He actually ends up even kind of praising PHP, surprised by its success. I have a couple of thoughts on that topic too.

First, people really need to stop reading something about PHP written somewhere in 2005 (probably about experiences that happened in 2001) and applying it to PHP as it is now, without even checking around for current trends. It’s as if people would dig up books from the Middle Ages saying that there are only seven metals in existence, or debating phlogiston, and use them when speaking about modern chemistry. Come on!

Then the next thing apparently wrong with PHP is too many functions. Right. Since when? Since when is having a lot of functions a problem? Does it hurt anybody? Does it make writing PHP code harder? Does it make the programmer less successful in achieving his goals?
About keywords I could kind of understand – OK, a lot to remember (though I haven’t seen anybody really having trouble remembering such complicated keywords as “while”, “if”, “class” or “public”) and it takes away some good English words that could be used as function/method names to confuse the enemy (who wouldn’t want to have a function named endforeach() or static(), not to mention function()? too bad those are not available!). But complaining that there are too many actual functions that allow you to do real useful stuff? That is the thing that is bothering people? That is what scares people away from using the language “for years”?

The next beef with PHP is that people write sucky code in it. No, really, they do? Must be something really wrong with this language. It’s not like people write mind-bogglingly sucky code in every other “good” language on the planet. But I get it. The intent was – PHP makes it easy to write sucky code. Yes, this is true. As true as “Porsche 997 makes it easy to drive at 100mph into a brick wall”. PHP makes it easy to write various kinds of code – and if 90% of code written is sucky, then 90% of PHP code would be sucky. But my experience says the quality of production code almost never has much to do with the language, but only with the culture – organizational and personal – and with choosing the right ways to do the job. The rest is just bad statistics in play. Like “I know a 7-year-old writing websites, and his PHP code sucks”. I bet his Haskell code rules though ;)

That’s not to say PHP couldn’t use improvement. It could. And it does improve, actually – and there’s still enough room for improvement, in many areas. But it probably would never satisfy purists. It’s practical. Maybe it doesn’t allow you to write whole programs in one line of incomprehensible character soup or play with high-level math theory concepts, but it allows people to write web applications. So they do – so where’s the surprise when one morning somebody wakes up and discovers there’s a ton of web applications around and they are written in PHP? :)

P.S. I wish for every 50 “PHP sucks” blogs people would write one good RFC.

Posted in Engine, Functions, PHP | 17 Comments »

Namespaces FAQ

Posted by Stas on August 17, 2007

We now have an implementation of namespaces in PHP 6 HEAD, so here’s a short FAQ about how they work for those that are too laz^H^H^Hbusy to read the whole README.namespaces.

Q. Why does PHP need namespaces?
A. Because long names like PEAR_Form_Loader_Validate_Table_Element_Validator_Exception are really tiresome.

Q. What is the main goal of the namespace implementation?
A. To solve the problem above.

Q. What does “namespace X::Y::Z” mean?
A. 1. All class/function/method names are prefixed with X::Y::Z.
2. All class/function/method names are resolved first against X::Y::Z.

Q. What does “import X::Y::Z as Foo” mean?
A. Every time there’s Foo as a class/function name or a prefix to the name, it really means X::Y::Z.

Q. What does “import X::Y::Z” mean?
A. “import X::Y::Z as Z”, then see above.

Q. What does “import Foo” mean?
A. Nothing.

Q. What is the scope of namespace and import?
A. Current file.

Q. Can the same namespace be used in multiple files?
A. Yes.

Q. Is there any relation between namespaces X::Y::Z and X::Y?
A. Only in the programmer’s mind.

Q. How do I import all classes from namespace X::Y::Z into global space?
A. You don’t, since it brings back the global space pollution problem.
Instead, you import X::Y::Z and then prefix your classes with Z::.

Q. But doesn’t it mean I will still have long names?
A. Not longer than three elements: Namespace::Class::Element.

Q. Why is it not implemented like in <insert your favorite language here>?
A. Because PHP is not <insert your favorite language here> ;)

Also, we are considering adding one more feature to namespaces – the ability to declare a namespaced constant, i.e. a constant named Name::Space::NAME, with the same resolution rules as classes, using the const operator. Consequently, it may also become possible to have const NAME = 'value' in global context, meaning the same as define('NAME', 'value').

Also note that namespaces are still a work in progress, so they may change a lot by the time they are released.
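
A note for readers coming to this FAQ later: the syntax did change before release. In PHP 5.3 as shipped, the separator is a backslash and “import” is spelled “use”. The rules from the FAQ map over roughly as in this sketch (the Validator class is a made-up example):

```php
<?php
namespace X\Y\Z;            // shipped spelling of "namespace X::Y::Z"

use X\Y\Z as Foo;           // "import X::Y::Z as Foo"
use X\Y\Z;                  // "import X::Y::Z", i.e. "... as Z"

class Validator {}          // full name: X\Y\Z\Validator

$viaAlias     = new Foo\Validator();   // Foo is expanded to X\Y\Z
$viaShortName = new Z\Validator();     // Z is expanded to X\Y\Z
$local        = new Validator();       // resolved against the current namespace

$fullName = get_class($local);
```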

Posted in Engine | 7 Comments »

Namespaces – can we keep it simple?

Posted by Stas on July 5, 2007

Dmitry Stogov has published the patch on PHP-internals implementing the simple namespace model for PHP that I co-authored. I urge everybody to please take a look and discuss it – best on the internals list since the audience is bigger, but comments here are welcome too.

The main idea of the proposal is to attack one target and this target only – the Super_Long_Really_Annoying_Enormous_Class_Names that lately became the bane of big-project developers. All other things are considered secondary to this goal – no attempt to make some different include model, packaging model, etc. This approach, in my opinion, greatly simplifies the concept and the mechanics involved. It reduces most of the work to simple text transformation, without any need to create complex hierarchies with obscure rules.

Of course, there are some edge cases still, but we aim to make frequently used cases easy and converting existing code to this model easier, while accepting that some edge cases might be uncomfortable. I am also sure that there are scenarios of which we did not think – and you are welcome to point those out.

One thing I feel might be missing from the current patch is runtime resolution of namespaced names – currently, if you use a variable (new $classname), it has to contain the full name, possibly built with the __NAMESPACE__ constant, which contains the current namespace name. I am not sure if we need runtime resolution – it adds some convenience, but requires the engine to do much more work.
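
To illustrate the point about variable class names, here is a small sketch. It uses the backslash syntax of namespaces as they eventually shipped in PHP 5.3 rather than the syntax of the patch discussed here, and the App\Model namespace and User class are made up:

```php
<?php
namespace App\Model;

class User {}

$short = 'User';

// A class name in a variable is not resolved against the current namespace,
// so the short name alone does not refer to App\Model\User.
$shortNameFound = class_exists($short);

// Building the full name with __NAMESPACE__ works.
$full = __NAMESPACE__ . '\\' . $short;
$user = new $full();
```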

Posted in Engine | 18 Comments »

Kill resources

Posted by Stas on May 16, 2007

I wonder why we still have the resource type in PHP?

Since 5.x, objects are perfectly capable of encapsulating any void * transparently (there are at least 2 Java bridges doing that, for example), and of course using objects doesn’t force you to use OO syntax – i.e. you can do fread($foo) with $foo being either a resource or an object equally well. We can also see ext/unicode/collator.c in PHP 6 as one example of a dual interface (I’m sure there are more, I just had to pick one). So objects, as I see it, can do anything resources can do. And much more – you could extend them (had we had file as an object and not a resource, streams probably would be much easier to implement), serialize them (provided correct methods, of course), etc., etc.
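
A dual interface of this kind already exists in userland-visible form in the date extension, so here is a small sketch using it next to a stream, which is still a resource. The date string is arbitrary:

```php
<?php
// Object-backed handle with a dual interface.
$date = date_create('2007-05-16');            // procedural call, returns a DateTime object
$viaFunction = date_format($date, 'Y-m-d');   // procedural style
$viaMethod   = $date->format('Y-m-d');        // OO style, same underlying object

// Resource-backed handle: procedural style only.
$stream = fopen('php://memory', 'r+');
fwrite($stream, 'hello');
rewind($stream);
$data = fread($stream, 5);

$types = array(gettype($date), gettype($stream));
fclose($stream);
```

Code that only passes $date or $stream around never notices which of the two it holds; only explicit type checks do.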

Also, with some effort I think it would be possible to modify all resource-using code to use objects transparently – so all the scripts except for those that actually check the type to be “resource” (why one would do that anyway?) will keep working.

So, maybe it’s time to let the resource type go? Does anybody see any reason why resources are better than objects?

Posted in Engine | 5 Comments »

Improving executor

Posted by Stas on April 10, 2007

Calling a function in PHP is not cheap. One of the reasons is that the executor has a lot of things to take care of when calling a function – a bunch of globals, execution state, symbol tables, etc., etc. And we do a lot of allocations and reallocations for them. Also, since a number of these things live on the stack, deep recursion depletes the stack. So I was thinking: how could we improve it?

  1. First step could be to unite all execution-state related variables into single structure. In compile-time we know how many Ts, CVs, etc. we might need, so this is fixed. Size of other structures is known too, so we know overall memory size for every function, and we can automatically allocate execution data on the function start. Which means no reallocs, only one allocation per execution cycle and probably even better memory usage due to the reuse of the memory blocks for frequently called functions.
  2. Right now some of the execution data is kept in a kind of stack. But we don’t really need it to be a stack – as I see it, a pointer to the previous structure is enough. Actually, when we do backtraces the stack is even a kind of problem, since we need to figure out each time where we stand and where functions begin and end.
  3. We do need some kind of stack for function arguments and function-call-in-progress information. However, this stack does not need to be global – we do not use this information beyond current function call (counting functions called while calculating parameters for current function call). Thus, we could just make each function keep its own stack, and since we know the maximum function call depth for the code of any given user function at compile time, this stack can have fixed size too and fit into the structure in (1).
  4. Once we have all call information inside the single structure, we could rewrite execute() to use loop instead of recursive call, thus dramatically reducing stack requirements and probably speeding up the execution loop. Internal function calls would still use stack, of course – because that’s how C works :)
    Actually, we might have been able to do this before, but then we’d have had to take care of a lot of different context things, which would be very hard to do right. Having it all in a single structure means we can just switch one pointer and go to a different context.
  5. All the various EG’s that deal with execution state would be made to work through one global “current execution state” pointer to the above mega-structure.
  6. We still need new symbol table for each call, so symbol table allocation and the related cache stays. However, we might have a good idea how many variables would each function require (size of CVs might be a good estimate) and could initialize the hashtable for this size. Downside would be that this hash won’t be then usable for other functions. So maybe we’d want to group cached tables by size (hash table implementation has only limited number of real sizes anyway). This should reduce number of reallocs when adding variables to the tables.
  7. Many functions are called repeatedly, but not recursively. Maybe we could reuse once-allocated memory block for each call of the function. The problem of course is to know if the function will be called again and not waste the block if it won’t – so it might be hard to do.
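
To show the shape of points 1, 2 and 4 without diving into C, here is a toy model in PHP. It is not engine code: the Frame class, its fields and run_factorial() are invented for illustration. Each “call” gets one frame allocated up front, frames are linked through a pointer to the caller, and a single loop switches the current-frame pointer instead of recursing:

```php
<?php
// Toy "execution data" for one call: allocated once, linked to the caller.
final class Frame {
    public $n;          // the argument of this call
    public $state = 0;  // 0 = just entered, 1 = waiting for the callee's result
    public $prev;       // the caller's frame (null for the outermost call)

    public function __construct($n, $prev) {
        $this->n = $n;
        $this->prev = $prev;
    }
}

// Computes n! with an explicit frame chain and a loop, no PHP-level recursion.
function run_factorial($n) {
    $current  = new Frame($n, null);
    $returned = null;                // "return value register"

    while ($current !== null) {
        if ($current->state === 0) {
            if ($current->n <= 1) {
                $returned = 1;
                $current = $current->prev;                        // return to caller
            } else {
                $current->state = 1;
                $current = new Frame($current->n - 1, $current);  // "call"
            }
        } else {
            $returned = $current->n * $returned;                  // resume after the call
            $current = $current->prev;                            // return to caller
        }
    }
    return $returned;
}

$result = run_factorial(6);
```

A backtrace in this model is just a walk along the prev pointers, and the depth of “recursion” is limited by heap memory rather than by the C stack.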

Any other ideas?

Posted in Engine | 1 Comment »