When PHP code is compiled, a lot of the constants stored in opcodes are used for hash table lookups. I wonder why the engine doesn’t precalculate their hashes and store them. OK, PHP 5.1 has CVs (compiled variables), which do this for variables. But there are also classes, functions and constants. This would, of course, make some parts of the engine code more complicated, since we may have to deal with non-constant calls too, in which case the hash can’t be precalculated, but I think it can still be done. On top of that, because class and function names are case-insensitive, a lot of lowercasing goes on at runtime, and that could be saved too.
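
To make the idea concrete, here is a toy model in plain PHP. It is not engine code: the function table, the `name_key()` helper and the "opcode" array are all made up for illustration, and `crc32()` only stands in for whatever hash function the engine really uses.

```php
<?php
// Toy model only: a "function table" keyed by the hash of the lowercased
// name, which is what a case-insensitive lookup boils down to.
function name_key($name)
{
    // crc32() is a stand-in, not the engine's real hash function.
    return crc32(strtolower($name));
}

function strlen_twice($s)
{
    return 2 * strlen($s);
}

$functionTable = array(
    name_key('strlen_twice') => 'strlen_twice',
);

// Without precalculation: lowercase and hash the name on every call.
function call_by_name(array $table, $name, $arg)
{
    return call_user_func($table[name_key($name)], $arg);
}

// With precalculation: the "compiler" sees a constant name, does the
// lowercasing and hashing once, and stores the key in the "opcode".
function compile_call($constantName)
{
    return array('key' => name_key($constantName));
}

function call_compiled(array $table, array $op, $arg)
{
    return call_user_func($table[$op['key']], $arg);
}

$op = compile_call('StrLen_Twice');
```

A call whose name is only known at runtime would still have to go through something like `call_by_name()`, which is the extra complexity mentioned above.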


More inlining

Speaking of inlining, why not have the ability to inline virtually any function? The performance benefit from inlining simple functions might be significant, since a function call in PHP is not cheap. We’d have some potential problems there:

  1. Variable scoping – we don’t want function variables to mess with our scope, so we’d probably rename them or something.
  2. Renaming variables of course is dangerous with functions that mess with current scope like extract() – so maybe such code can’t be inlined.
  3. Then we might get a problem if a function that messes with the current scope is called indirectly, so we can’t really know. Banning indirect function calls from inlining isn’t much of a problem either – I don’t think they’re used that much.
  4. Also, there might be functions that care about their own scope, and having variables from an external scope would drive them crazy even if their own variables are OK. I don’t know of any such code, but maybe it exists. So we would probably have to give developers a means to ensure some functions are never inlined.
  5. And then some may rely on end-of-scope for destruction of variables that have dtors, so when would we clean up these variables? We can of course generate “free” opcodes at the end of the inlined call, but that would probably slow things down…
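
A hand-written sketch of what points 1 and 5 could look like. No such transformation exists in the engine; the `__inl_` prefix and both functions are invented for the example, and the "inlined" version is simply what one would write manually.

```php
<?php
function add_tax($price)
{
    $rate = 0.2;                 // local to add_tax()
    return $price + $price * $rate;
}

// Original code: one function call per item.
function total_with_call(array $prices)
{
    $rate = 'caller value';      // the caller has its own $rate
    $sum = 0;
    foreach ($prices as $p) {
        $sum += add_tax($p);
    }
    return array($sum, $rate);
}

// What an inliner might produce: the body of add_tax() pasted in,
// with its variables renamed so they cannot clash with the caller's.
function total_inlined(array $prices)
{
    $rate = 'caller value';
    $sum = 0;
    foreach ($prices as $p) {
        $__inl_price = $p;
        $__inl_rate = 0.2;
        $sum += $__inl_price + $__inl_price * $__inl_rate;
        // The "free" step from point 5: drop the inlined locals where
        // the real function would have left its scope.
        unset($__inl_price, $__inl_rate);
    }
    return array($sum, $rate);
}
```

Both versions return the same result and leave the caller’s `$rate` alone. If `add_tax()` called something like extract() or get_defined_vars(), the renaming would become visible, which is exactly the problem in point 2.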

So, it’s not as easy as one might like it to be. But interesting, in my opinion.

Custom tags

I wonder how useful it would be to have the PHP parser be able to define custom tags for HTML and bind various functions (or classes) to them. I’m sure somebody must be doing something like that among the dozens of frameworks and template systems. However, doing it in a really nice, fast and clean way would require support from the engine parser. It might be nice as a kind of shortcut template system. It might also make it much easier to use PHP with various visual design tools.
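
A rough userland approximation of the idea, just to show what binding functions to tags could mean. Everything here is hypothetical: the `my:` prefix, the tag names and `render_custom_tags()` are invented, and a regex pass like this handles neither nested tags nor unusual attribute quoting, which is part of why real parser support would be nicer.

```php
<?php
// Hypothetical registry: tag name => callback(attributes, inner text).
$tags = array(
    'my:bold' => function (array $attrs, $inner) {
        return '<b>' . $inner . '</b>';
    },
    'my:greet' => function (array $attrs, $inner) {
        $name = isset($attrs['name']) ? $attrs['name'] : 'world';
        return 'Hello, ' . htmlspecialchars($name) . '!';
    },
);

function render_custom_tags($html, array $tags)
{
    // Matches <my:x a="b"/> and <my:x a="b">inner</my:x> (no nesting).
    $pattern = '~<(my:\w+)((?:\s+\w+="[^"]*")*)\s*(?:/>|>(.*?)</\1>)~s';

    return preg_replace_callback($pattern, function ($m) use ($tags) {
        $name = $m[1];
        if (!isset($tags[$name])) {
            return $m[0];        // unknown tag: leave it untouched
        }
        // Turn the raw attribute string into a name => value array.
        $attrs = array();
        preg_match_all('~(\w+)="([^"]*)"~', $m[2], $pairs, PREG_SET_ORDER);
        foreach ($pairs as $p) {
            $attrs[$p[1]] = $p[2];
        }
        $inner = isset($m[3]) ? $m[3] : '';
        return call_user_func($tags[$name], $attrs, $inner);
    }, $html);
}

$page = '<p><my:greet name="Ann"/> <my:bold>hi</my:bold></p>';
$rendered = render_custom_tags($page, $tags);
```

With the two callbacks above, `$rendered` is `<p>Hello, Ann! <b>hi</b></p>`.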

On the other hand, ColdFusion does that, and ASP does that, and not everybody likes it. But ColdFusion takes it to the extreme – the language there is custom tags (I’m not a CF programmer and never played one on TV, so sorry if I just said something totally wrong and silly), whereas I am thinking of it just as an addition. So maybe it’s still not that stupid 🙂

Object ID

I was writing this post about objects in PHP needing a unique ID, but now 5.2 is out and it has the spl_object_hash() function, which does just that. Too bad it’s not documented in the manual, but besides that it seems to be just what I wanted. But I’d hate to waste all the effort of writing it, so I’ll make it about why I think it’s a good idea.

Objects in PHP 5 are perceived by the engine as a (handle, handler table) pair. This effectively means that each object has a unique ID (note that this ID has mostly nothing to do with pointers, etc. – for example, more than one zval may have the same handle/handler pair, meaning they refer to the same object). You can sort of see this ID if you do var_dump() on an object – you’d see something like:

object(stdClass)#1 (0) {

#1 there is the handle of the object. The handler table is the same for all objects created from user-defined classes, though some objects – like Java or COM objects – would have a different one.
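
A small sketch that makes the handle visible from a script by capturing var_dump() output. `dumped_handle()` is a made-up helper, and it assumes the dump contains the `#N` marker shown above; it is only meant to show that two variables referring to one object share a handle while a clone gets its own.

```php
<?php
// Hypothetical helper: pull the "#N" handle out of var_dump() output.
function dumped_handle($obj)
{
    ob_start();
    var_dump($obj);
    $out = ob_get_clean();
    return preg_match('/#(\d+)/', $out, $m) ? (int) $m[1] : null;
}

$a = new stdClass();
$b = $a;          // second variable, same object
$c = clone $a;    // a genuinely different object

$sameHandle  = dumped_handle($a) === dumped_handle($b);
$cloneHandle = dumped_handle($a) === dumped_handle($c);
```

Here `$sameHandle` comes out true and `$cloneHandle` false. The exact numbers depend on how many objects the script created earlier.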

However, before 5.2 there was no way (except for the perverted way of capturing var_dump() output and parsing it, I guess 🙂) to access this object ID – or, even more interestingly, the full object ID – from a PHP script. This access might be very useful for things like serialization of complex object structures, RPC and such, where it’s important to know whether two variables point to the same object and/or to fetch objects by some unique ID.
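
For example, a serializer walking an object graph needs to notice when it reaches an object it has already seen. A minimal sketch using spl_object_hash(); `collect_objects()` is a hypothetical helper that only follows the properties visible from outside the object and does not guard against arrays that contain references to themselves.

```php
<?php
// Collect every distinct object reachable from $value, keyed by its ID.
function collect_objects($value, array &$seen)
{
    if (is_object($value)) {
        $id = spl_object_hash($value);
        if (isset($seen[$id])) {
            return;              // same object reached by another path
        }
        // Keeping the object in $seen also keeps it alive, so its hash
        // cannot be handed out to a different object during the walk.
        $seen[$id] = $value;
        $value = get_object_vars($value);
    }
    if (is_array($value)) {
        foreach ($value as $item) {
            collect_objects($item, $seen);
        }
    }
}

// Two objects that point at each other, plus a list with duplicates.
$a = new stdClass();
$b = new stdClass();
$a->peer = $b;
$b->peer = $a;
$a->list = array($b, $b);

$seen = array();
collect_objects($a, $seen);
```

After the walk `$seen` holds two entries, one per object, even though `$b` is reachable by three paths and the graph contains a cycle.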

One could, of course, create a “base object” class which provides a custom ID. But since the engine already has one, it’s better to use it.
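
For comparison, a sketch of what such a base class might look like. The class and method names are invented for the example.

```php
<?php
// Hypothetical base class that hands out its own sequential IDs.
abstract class IdentifiedObject
{
    private static $nextId = 1;
    private $objectId;

    public function __construct()
    {
        $this->objectId = self::$nextId++;
    }

    public function getObjectId()
    {
        return $this->objectId;
    }

    public function __clone()
    {
        // A clone is a new object, so it needs a new ID.
        $this->objectId = self::$nextId++;
    }
}

class User extends IdentifiedObject
{
}

$u = new User();
$v = $u;          // same object, same ID
$w = clone $u;    // different object, different ID
```

It works, but every class has to extend the base class and remember to call the parent constructor, and it does nothing for stdClass or for objects coming from extensions. The engine’s own ID has none of these limits.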

The only thing I have doubts about is the use of MD5 in this function. MD5 is a somewhat expensive function, and for a system which calls it a lot it might be too expensive. Fortunately, since these IDs are transient, we could easily replace it if it proves to be a problem.