PHP 10.0 Blog

What if…

Archive for the ‘Engine’ Category

duck operator

Posted by Stas on June 5, 2008

Crazy idea for today – an operator to check conformance to a specific interface without actually implementing it. Why would one want that?
Well, if you are into the duck typing style of programming, it may be interesting for you to have an object that implements a certain set of functions but does not necessarily declare that in its class definition. Languages like Smalltalk do it all day long, so why couldn’t PHP? The idea is that it looks like this:

interface Cow {
  function moo();
  function eatGrass();
}
/* somewhere else */
class MooingGrassEater {
  function moo() {/*stuff */}
  function eatGrass() {/*stuff */}
  /*stuff */
}
/* somewhere else */
function CowConsumer($classname) {
  $foo = new $classname();
  // "implements" used as an operator is the proposed syntax, not valid PHP
  if ($foo implements Cow) {
    echo "Behold the cow:";
  } else {
    echo "$classname is not a cow!";
  }
}

implements here is our duck operator. Note that unlike instanceof, no formal relationship is required, but only practical implementation. So another name would be “common law marriage operator” ;)

Of course, this one would be anathema to “strict OO” camp, so if you subscribe to that, just ignore this post :)

Two challenges to this idea are:

  1. __call() – we have no way to know what __call does. So either we ignore it or say “ok, __call does everything”. I’d go for the latter.
  2. Performance. To check a duck implementation one basically has to match method lists, which amounts to one is_callable call for every method in the interface being checked.

Actually, PHP uses this style sometimes – see, for example, user defined streams. But there’s no nice way to work with it from the consumer side.
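The check described above can be approximated in userland today. What follows is only a minimal sketch, not the proposed engine operator: the function name quacks_like() is made up for this illustration, and it takes the "__call does everything" side of challenge 1.

```php
<?php
// Hypothetical helper (not a PHP built-in): duck-checks $object against
// the method list of $interface, with no formal "implements" required.
function quacks_like(object $object, string $interface): bool
{
    // Policy from the post: we cannot know what __call does,
    // so assume it handles everything.
    if (method_exists($object, '__call')) {
        return true;
    }
    // One is_callable() check per interface method (challenge 2).
    foreach ((new ReflectionClass($interface))->getMethods() as $method) {
        if (!is_callable([$object, $method->getName()])) {
            return false;
        }
    }
    return true;
}

interface Cow {
    function moo();
    function eatGrass();
}

// No formal relationship to Cow, only the practical implementation.
class MooingGrassEater {
    function moo() { return "Moo"; }
    function eatGrass() { return "Munch"; }
}

class Rock {
    function sit() { return "..."; }
}

var_dump(quacks_like(new MooingGrassEater(), 'Cow'));
var_dump(quacks_like(new Rock(), 'Cow'));
```

The cost is visible in the loop: the work grows with the number of methods in the interface, which is exactly the performance concern in point 2.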

Posted in Engine | 15 Comments »

the secret of PHP

Posted by Stas on May 21, 2008

So, another “PHP sucks” post, this time from Jeff Atwood. He actually ends up even kind of praising PHP, surprised by its success. I have a couple of thoughts on that topic too.

First, people really need to stop reading something on PHP written somewhere in 2005 (probably about experiences that happened in 2001) and applying it to PHP as it is now, without even checking around for current trends. It’s as if people dug up books from the Middle Ages saying that there are only seven metals in existence, or debating about phlogiston, and used them when speaking about modern chemistry. Come on!

Then the next thing apparently wrong with PHP is too many functions. Right. Since when? Since when is having a lot of functions a problem? Does it hurt anybody? Does it make writing PHP code harder? Does it make the programmer less successful in achieving his goals?
About keywords I could kind of understand – OK, a lot to remember (though I didn’t see anybody really having trouble remembering such complicated keywords as “while”, “if”, “class” or “public”) and it takes out some good English words that could be used as function/method names to confuse the enemy (who wouldn’t want to have a function named endforeach() or static(), not to mention function()? too bad those are not available!). But complaining that there are too many actual functions that allow you to do real useful stuff? That is the thing that is bothering people? That is what scares people away from using the language “for years”?

The next beef with PHP is that people write sucky code in it. No, really, they do? Must be something really wrong with this language. It’s not like people write mind-bogglingly sucky code in every other “good” language on the planet. But I get it. The intent was – PHP makes it easy to write sucky code. Yes, this is true. As true as “Porsche 997 makes it easy to drive at 100mph into a brick wall”. PHP makes it easy to write various kinds of code – and if 90% of code written is sucky, then 90% of PHP code would be sucky. But my experience says the quality of production code almost never has much to do with the language, but only with the culture – organizational and personal – and with choosing the right ways to do the job. The rest is just bad statistics in play. Like “I know a 7-year-old writing websites, and his PHP code sucks”. I bet his Haskell code rules though ;)

That’s not to say PHP couldn’t use improvement. It could. And it does improve, actually – and there’s still enough room for improvement, in many areas. But it probably will never satisfy purists. It’s practical. Maybe it doesn’t allow you to write whole programs in one line of incomprehensible character soup or play with high-level math theory concepts, but it allows people to write web applications. So they do – so where’s the surprise when one morning somebody wakes up and discovers there’s a ton of web applications around and they are written in PHP? :)

P.S. I wish for every 50 “PHP sucks” blogs people would write one good RFC.

Posted in Engine, Functions, PHP | 18 Comments »

Namespaces FAQ

Posted by Stas on August 17, 2007

We now have an implementation of namespaces in PHP 6 HEAD, so here’s a short FAQ about how they work for those that are too laz^H^H^Hbusy to read the whole README.namespaces.

Q. Why does PHP need namespaces?
A. Because long names like PEAR_Form_Loader_Validate_Table_Element_Validator_Exception are really tiresome.

Q. What is the main goal of the namespace implementation?
A. To solve the problem above.

Q. What does “namespace X::Y::Z” mean?
A. 1. All class/function/method names are prefixed with X::Y::Z.
2. All class/function/method names are resolved first against X::Y::Z.

Q. What does “import X::Y::Z as Foo” mean?
A. Every time Foo appears as a class/function name or as a prefix to a name, it really means X::Y::Z.

Q. What does “import X::Y::Z” mean?
A. “import X::Y::Z as Z”, then see above.

Q. What does “import Foo” mean?
A. Nothing.

Q. What is the scope of namespace and import?
A. Current file.

Q. Can the same namespace be used in multiple files?
A. Yes.

Q. Is there any relation between namespaces X::Y::Z and X::Y?
A. Only in the programmer’s mind.

Q. How do I import all classes from namespace X::Y::Z into global space?
A. You don’t, since it brings back the global space pollution problem.
Instead, you import X::Y::Z and then prefix your classes with Z::.

Q. But doesn’t it mean I will still have long names?
A. No longer than three elements: Namespace::Class::Element.

Q. Why is it not implemented like in <insert your favorite language here>?
A. Because PHP is not <insert your favorite language here> ;)

We are also considering adding one more feature to namespaces – the ability to declare a namespaced constant, i.e. a constant named Name::Space::NAME, using the const operator and with the same resolution rules as classes. Consequently it may also be possible to have const NAME = ‘value’ in global context, meaning the same as define(‘NAME’, ‘value’).

Also note that namespaces are still a work in progress, so they may change a lot by the time they are released.
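For orientation, here is a small sketch of the import rules from the FAQ. One caveat: it uses the syntax that eventually shipped in PHP 5.3, where the separator is a backslash and the keyword is use, rather than the draft “::” and “import” forms described in this post. The class name Validator and the namespace App are made up for the example.

```php
<?php
// Shipped syntax (PHP 5.3+): "\" as separator, "use" instead of "import".
namespace X\Y\Z;

class Validator {
    function check($value) { return $value !== ''; }
}

// A second namespace in the same file, only to keep the sketch in one file.
namespace App;

use X\Y\Z;          // shorthand for "use X\Y\Z as Z"
use X\Y\Z as Foo;   // Foo as a prefix now means X\Y\Z

$a = new Z\Validator();     // resolves to X\Y\Z\Validator
$b = new Foo\Validator();   // same class, reached through the alias

var_dump($a instanceof \X\Y\Z\Validator);
var_dump(get_class($b));
```

As in the FAQ, the names stay short at the point of use (a prefix plus the class name), and the aliases are scoped to the file that declares them.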

Posted in Engine | 12 Comments »

Namespaces – can we keep it simple?

Posted by Stas on July 5, 2007

Dmitry Stogov has published the patch on PHP-internals implementing the simple namespace model for PHP that I co-authored. I urge everybody to please take a look and discuss it – best on the internals list since the audience is bigger, but comments here are welcome too.

The main idea of the proposal is to attack one target and this target only – the Super_Long_Really_Annoying_Enormous_Class_Names that have lately become the bane of big-project developers. All other things are considered secondary to this goal – no attempt to make some different include model, packaging model, etc. This approach, in my opinion, greatly simplifies the concept and the mechanics involved. It reduces most of the work to simple text transformation, without any need to create complex hierarchies with obscure rules.

Of course, there are some edge cases still, but we aim to make frequently used cases easy and converting existing code to this model easier, while accepting that some edge cases might be uncomfortable. I am also sure that there are scenarios of which we did not think – and you are welcome to point those out.

One thing I feel might be missing from the current patch is runtime resolution of namespaced names – currently, if you use a variable (new $classname), it has to contain the full name, possibly built with the __NAMESPACE__ constant, which contains the current namespace name. I am not sure if we need runtime resolution – it adds some convenience, but requires the engine to do much more work.

Posted in Engine | 21 Comments »

Kill resources

Posted by Stas on May 16, 2007

I wonder why we still have the resource type in PHP?

Since 5.x, objects are perfectly capable of encapsulating any void * transparently (there are at least two Java bridges doing that, for example), and of course using objects doesn’t force you to use OO syntax – i.e. you can do fread($foo) with $foo being either a resource or an object equally well. We can see ext/unicode/collator.c in PHP 6 as one example of a dual interface (I’m sure there are more, I just had to pick one). So objects, as I see it, can do anything resources can do. And much more – you could extend them (had we had file as an object and not a resource, streams would probably be much easier to implement), serialize them (provided correct methods, of course), etc., etc.

Also, with some effort I think it would be possible to modify all resource-using code to use objects transparently – so all scripts except those that actually check the type to be “resource” (why would one do that anyway?) will keep working.

So, maybe it’s time to let the resource type go? Does anybody see any reason why resources are better than objects?
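To make the “dual interface” idea concrete, here is a userland sketch only; the real change would happen inside the engine and extensions. The class FileHandle and the function demo_fread() are invented for this illustration. The point is that procedural code can accept either a resource or an object without the caller noticing.

```php
<?php
// Hypothetical wrapper: an object that carries a stream handle.
class FileHandle
{
    private $stream;

    public function __construct(string $path, string $mode)
    {
        $this->stream = fopen($path, $mode);
    }

    public function write(string $data) { return fwrite($this->stream, $data); }
    public function rewind()            { return rewind($this->stream); }
    public function read(int $length)   { return fread($this->stream, $length); }
}

// Hypothetical procedural front end: takes a plain resource or a FileHandle.
function demo_fread($handle, int $length)
{
    return $handle instanceof FileHandle
        ? $handle->read($length)
        : fread($handle, $length);
}

// Object-backed handle.
$object = new FileHandle('php://memory', 'w+');
$object->write('hello');
$object->rewind();

// Classic resource handle.
$resource = fopen('php://memory', 'w+');
fwrite($resource, 'hello');
rewind($resource);

// Same call style, same result, for both kinds of handle.
var_dump(demo_fread($object, 5) === demo_fread($resource, 5));
```

Scripts that only pass the handle around keep working; only code that explicitly tests for the “resource” type would see a difference.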

Posted in Engine | 5 Comments »

Improving executor

Posted by Stas on April 10, 2007

Calling a function in PHP is not cheap. One of the reasons is that the executor has a lot of things to take care of when calling a function – a bunch of globals, execution state, symbol tables, etc., etc. And we do a lot of allocations and reallocations for them. Also, since a number of these things live on the stack, on deep recursion the stack is depleted. So I was thinking: how could we improve it?

  1. The first step could be to unite all execution-state related variables into a single structure. At compile time we know how many Ts, CVs, etc. we might need, so this is fixed. The size of the other structures is known too, so we know the overall memory size for every function, and we can automatically allocate execution data at function start. Which means no reallocs, only one allocation per execution cycle, and probably even better memory usage due to the reuse of memory blocks for frequently called functions.
  2. Right now some of the execution data is kept in a kind of stack. But we don’t really need it to be a stack – as I see it, a pointer to the previous structure is enough. Actually, when we are doing backtraces the stack is even a kind of problem, since we need to figure out each time where we stand and where functions begin and end.
  3. We do need some kind of stack for function arguments and function-call-in-progress information. However, this stack does not need to be global – we do not use this information beyond the current function call (counting functions called while calculating parameters for the current function call). Thus, we could just make each function keep its own stack, and since we know the maximum function call depth for the code of any given user function at compile time, this stack can have a fixed size too and fit into the structure in (1).
  4. Once we have all call information inside a single structure, we could rewrite execute() to use a loop instead of a recursive call, thus dramatically reducing stack requirements and probably speeding up the execution loop. Internal function calls would still use the stack, of course – because that’s how C works :)
    Actually, we might have been able to do it before, but then we’d have had to take care of a lot of different context things, which would be very hard to do right. Having it all in a single structure means we can just switch one pointer and go to a different context.
  5. All the various EG’s that deal with execution state would be made to work through one “current execution state” global pointing to the above mega-structure.
  6. We still need a new symbol table for each call, so symbol table allocation and the related cache stay. However, we might have a good idea how many variables each function would require (the size of CVs might be a good estimate) and could initialize the hashtable for this size. The downside would be that this hash would then not be usable for other functions. So maybe we’d want to group cached tables by size (the hash table implementation has only a limited number of real sizes anyway). This should reduce the number of reallocs when adding variables to the tables.
  7. Many functions are called repeatedly, but not recursively. Maybe we could reuse a once-allocated memory block for each call of the function. The problem of course is to know if the function will be called again and not waste the block if it won’t – so it might be hard to do.
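Points 1, 2 and 4 can be illustrated with a toy interpreter. This is a sketch in PHP for readability, not Zend engine code (which is C); the Frame class, the opcode names and the program layout are all invented for the example. Each call gets one frame whose size is known up front, frames are linked by a pointer to the caller, and execute() is a single loop that switches the current frame instead of recursing.

```php
<?php
// One frame holds all execution state for a call (point 1);
// $prev links to the caller instead of a global stack (point 2).
final class Frame
{
    public $func;
    public $prev;
    public $pc = 0;        // index of the next instruction
    public $slots;         // fixed-size storage for temporaries
    public $returnSlot;    // caller slot that receives the result

    public function __construct(array $func, ?Frame $prev, array $args, int $returnSlot)
    {
        $this->func = $func;
        $this->prev = $prev;
        $this->returnSlot = $returnSlot;
        // Slot count is known "at compile time", so size it once.
        $this->slots = array_replace(array_fill(0, $func['slots'], 0), $args);
    }
}

// Loop-based executor (point 4): a call only switches the $frame pointer.
function execute(array $program, string $entry, array $args)
{
    $frame = new Frame($program[$entry], null, $args, 0);
    $result = null;
    while ($frame !== null) {
        $op = $frame->func['code'][$frame->pc++];
        switch ($op[0]) {
            case 'const': $frame->slots[$op[1]] = $op[2]; break;
            case 'add':   $frame->slots[$op[1]] = $frame->slots[$op[2]] + $frame->slots[$op[3]]; break;
            case 'jnz':   if ($frame->slots[$op[1]] !== 0) { $frame->pc = $op[2]; } break;
            case 'call':  // [call, resultSlot, functionName, argumentSlot]
                $frame = new Frame($program[$op[2]], $frame, [$frame->slots[$op[3]]], $op[1]);
                break;
            case 'ret':
                $value = $frame->slots[$op[1]];
                $caller = $frame->prev;
                if ($caller === null) { $result = $value; }
                else { $caller->slots[$frame->returnSlot] = $value; }
                $frame = $caller;   // back to the caller's context
                break;
        }
    }
    return $result;
}

// sum(n) = n == 0 ? 0 : n + sum(n - 1), hand-"compiled" for the toy VM.
$program = ['sum' => ['slots' => 5, 'code' => [
    ['jnz', 0, 2],          // if n != 0, skip the base case
    ['ret', 0],             // base case: return n (which is 0)
    ['const', 1, -1],
    ['add', 2, 0, 1],       // slot 2 = n - 1
    ['call', 3, 'sum', 2],  // slot 3 = sum(n - 1)
    ['add', 4, 0, 3],       // slot 4 = n + slot 3
    ['ret', 4],
]]];

var_dump(execute($program, 'sum', [3]));
var_dump(execute($program, 'sum', [10000]));
```

The guest program recurses 10000 levels deep, but the host-side execute() never calls itself; the depth lives entirely in the chain of frames.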

Any other ideas?

Posted in Engine | 1 Comment »

static __call

Posted by Stas on March 23, 2007

As everybody knows, one of the very nice OO features PHP 5 has is this: if a method that is not defined is called on an object of a class, the class can define a catch-all method named __call and thus route this method call in any way the developer wants, transparently to the user. This allows a very flexible way of defining interfaces between classes – even with entities whose interface might not be known to the developer of the class, such as SOAP services. Very useful indeed.

However, we cannot do this on a class itself – we can’t define a static __call and have it route class (static) method calls the same way regular __call routes object method calls. I wonder if maybe we should have it. Along with all the other __methods for overloading stuff, of course. We probably couldn’t name it __call, since we already have one __call, but something like __scall could work.
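For reference, PHP did later gain this feature: since PHP 5.3 there is a __callStatic magic method (that name, not __scall). A minimal sketch of the two catch-all methods side by side, with a made-up Router class:

```php
<?php
class Router
{
    // Calls to undefined instance methods land here.
    public function __call($name, $arguments)
    {
        return "instance:$name(" . implode(',', $arguments) . ")";
    }

    // Calls to undefined static methods land here (PHP 5.3+).
    public static function __callStatic($name, $arguments)
    {
        return "static:$name(" . implode(',', $arguments) . ")";
    }
}

echo (new Router())->find(1, 2), "\n";
echo Router::find(1, 2), "\n";
```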

Posted in Engine, Functions | 11 Comments »

Shuffling methods

Posted by Stas on January 16, 2007

I’m writing some quite complicated class structure in PHP, and I have realised there’s one feature I am missing in PHP – I need to be able to define an interface with a default method implementation. Why not a class? Because, of course, I cannot inherit from two classes. And I don’t really want multiple inheritance with all its problems – I want something much more restricted. Let’s see an example.

Let’s say I am defining an interface “Kickable” having a method kickMe, which describes an object that can be kicked. Most of the kickable classes would have this wonderful code:

function kickMe() {
    echo "Oy-vey!";
}

So if I have 20 classes which are kickable, I’d probably have to write this 18 times (2 classes would say something different when kicked). This is boring. I’d want somehow to say “OK, here’s a function, here’s the default, if you want – reimplement it”. If you ask what happens when I define two interfaces with defaults for kickMe – well, then either it would be an error or the latest implemented would win. It is not worse than having to implement two interfaces with same function names and different semantics anyway, and that is supported right now.

Alternatively, another approach would work – if I could “steal” a method implementation from a brother class (or, more nicely, delegate my function to it). I.e., suppose I have the 18 classes mentioned above, and I define the function once and in other classes I say something like:

function kickMe same as DefaultKicker::kickMe;

That would be fun – I could write much less code, and additionally, if I wanted to internationalize it so it could say “Oh my!” and “O-la-la!”, I could do it just once.

There are ways to do it now – I can shuffle functions with runkit, but that’s runtime – not nice (and runkit is not in default PHP anyway). I could also implement the default behaviour separately and call it when needed – which would solve the consistency problem, but would solve the boredom problem only partially, and would not solve the problem that an external implementation can’t access protected class data.

I could probably use some design pattern to do it, but I didn’t find anything that does exactly that (decorator looks similar, but it assembles at runtime, and I need design-time).
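For comparison, traits, which were added to PHP in version 5.4, cover most of this wish: a default implementation written once, pulled into each class at compile time, overridable, and with access to the class’s protected data. A minimal sketch; the class names Chair and Robot and the $complaint property are made up, and the methods return strings instead of echoing so the result is easy to check:

```php
<?php
interface Kickable {
    function kickMe();
}

// The default implementation, written once.
trait DefaultKicker {
    protected $complaint = "Oy-vey!";

    function kickMe() {
        // Runs inside the using class, so protected data is reachable.
        return $this->complaint;
    }
}

// One of the 18 classes that are happy with the default.
class Chair implements Kickable {
    use DefaultKicker;
}

// One of the 2 classes that say something different when kicked.
class Robot implements Kickable {
    use DefaultKicker;

    function kickMe() {   // a method in the class overrides the trait's
        return "Beep.";
    }
}

echo (new Chair())->kickMe(), "\n";
echo (new Robot())->kickMe(), "\n";
```

Internationalizing the complaint would then mean changing DefaultKicker alone.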

Posted in Engine | 14 Comments »

php -T

Posted by Stas on December 8, 2006

Perl and Ruby have variable tainting. Maybe PHP should have it too?

The approaches in Perl and Ruby are somewhat different. One difference is that in Perl you have some operations that untaint variables automagically, while in Ruby you always have to explicitly declare that a variable is no longer tainted.
Also, Ruby has different levels of protection, so tainting can be a light nuisance at a low level or a full sandbox mode at a high level. That’s another interesting thing to explore – using tainting to sandbox scripts. In PHP, because all runtime data are isolated per-request and the engine is built to support multiple requests, it might be easier to implement sandboxing in a different way, but the Ruby approach is still interesting to explore.

Of course, due to the multitude of functions in PHP, the approach of “mark unsafe functions” which Ruby seems to use is prone to the same failures as safe mode – there’s always at least one function that isn’t properly restricted – so if one wants to implement proper tainting or sandboxing, it should probably be based on a more generic approach that would account for the existence of functions unknown at design time. It’s still not 100%, as a carefully miswritten extension can do anything the OS permissions allow C code to do, but some restrictions might still be enforced – e.g., at some security level, calls with tainted arguments to functions not marked “safe for tainted data” might be prohibited by the engine. That’d probably break 99% of the existing code, so it would come at a cost in any case. But the benefit would be that once the application passes such a test, we can reasonably claim a certain level of security – not 100% security, but at least a decent level of protection for people who do not remember to validate their data properly.

This can also be connected to the zval custom info idea, as a tainting flag is a good example of custom info.
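A userland approximation of the Ruby-style rule (data stays tainted until explicitly untainted, and sensitive functions refuse tainted arguments) might look like the sketch below. It is only an illustration of the idea: the Tainted class and run_query() are invented names, and a real implementation would live in the engine as a flag on the value rather than in a wrapper object.

```php
<?php
// Hypothetical wrapper marking a value as untrusted until explicitly cleaned.
final class Tainted
{
    private $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    // The only way out is an explicit untaint through a caller-chosen filter.
    public function untaint(callable $filter)
    {
        return $filter($this->value);
    }
}

// A sink not marked "safe for tainted data": it rejects tainted arguments,
// standing in for the engine-level check described in the post.
function run_query($sql): string
{
    if ($sql instanceof Tainted) {
        throw new InvalidArgumentException('tainted data passed to run_query()');
    }
    return "executing: $sql";
}

$input = new Tainted("42; DROP TABLE users");

// Explicit untainting: keep only the leading integer.
$id = $input->untaint(function ($raw) { return (string) (int) $raw; });
echo run_query("SELECT * FROM users WHERE id = $id"), "\n";

try {
    run_query($input);   // forgot to validate: the sink refuses
} catch (InvalidArgumentException $e) {
    echo "blocked: ", $e->getMessage(), "\n";
}
```

As the post notes, this style breaks any code that passes raw input straight through, which is the price of being able to claim a baseline of protection.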

Posted in Engine, Functions | 2 Comments »


Posted by Stas on November 24, 2006

When compiling PHP code, there are a lot of constants that are used for lookups and are stored in opcodes. I wonder why the engine doesn’t precalculate their hashes and store them? OK, PHP 5.1 has CVs, which do it for variables. But there are also classes, functions, constants. This would, of course, make some parts of the engine code more complicated, since we may have to deal with non-constant calls too, in which case the hash won’t be pre-calculated, but I think it can still be done. On top of that, because class and function names are case-insensitive, a lot of lowercasing is going on too, which could also be saved.

Posted in Engine | 1 Comment »

