Objects as keys

I’m going to put to vote soon another of my RFCs, namely one about “objects as keys“. So, I want to outline the case for it here and address some criticisms and questions raised while discussing it.

Why we may want it it?

Traditionally, in PHP array keys could be strings or numbers, and this is deeply linked to how PHP hash tables (which store mostly everything in PHP that is a collection) work. And this was mostly enough until recently, when special type objects started popping up – such as GMP numbers. These work very closely to native numbers – i.e. you can add them, multiply or subtract them, etc. However, though they look like numbers, there are things you can do with numbers that you can not do with them. One of these things is using it as an array index.

There’s more – we have proposals to make UString class to represent Unicode strings. There may be more to represent special types of quantities and strings. It would be only natural for these to be able to do something that numbers and strings can do – namely, be array keys.

How to use it right

This idea is not without dangers, and certainly can be abused (channeling my internal Yogi Berra – especially if you’re using it wrong).

For starters, not every object is good for using as an array key. For example, mutable objects almost never are, since if you put something under key X and then it mutates to represent something different from X – where are you going to find it? PHP also doesn’t have very good means to control mutability, at least not just yet. Also, it may not be good to use complex objects as keys, e.g. a tree having 1000 elements rarely makes a good key since it’s not clear what exactly it even means it being a key, and how to really make it so same trees would produce same key but different ones would produce different ones – scanning every element would be quite expensive.

So this feature is mostly good for value objects – i.e. objects that express some simple and usually immutable value. Of course, there may be scenarios where other objects find it useful, but those would be exceptions from a general rule.

Why not just use __toString()?

Of course, it would be very easy to say “if the object has __toString, just convert it to string and we know how to deal with those”. However, there are two problems with this:

  1. Human-readable representation of an object and value used for keys may not be the same one. As I mention in the RFC, most languages that allow object keys, have separate functions, some for technical reasons (i.e. needing number instead of string), some for semantical. I think both of these reasons are valid – some values that objects represent may be more efficiently represented as numbers, and some the developer may want to be different when expecting the human and the engine to look at it.
  2. Using __toString has the reverse side – if we use __toString, every object that has __toString becomes hash-able. But, as I noted above, we may not want every object to work this way – for some, it just makes no sense and may be even dangerous, but __toString may still make sense. It would be better if whoever designs the class explicitly allows to do this. Of course, they can choose to go back to __toString – this door is always open – but they have an option not to.

Why not just use spl_object_hash()?

This function provides the identity of the specific object – but for value objects, its identity and what it represents may not be the same. I.e. do we want two distinct objects both representing number 42 considered different? Sometimes we may want it, but if those are true value objects, we may not. It may be even more complicated if the objects have some data that changes but still represent same values. The best solution here would be to give control to the developer, since the engine can not know what which part of the object means.

And, of course, the point of not being able to control which object can be used this way is the same as above.

Inherent implementation problems and disadvantages

One point that was raised when discussing it is that the objects do not really become the array keys – instead, we “using” them as the keys, but the value derived from them in developer-specified manner is used. So, if you do a foreach loop over such array later, you do not get the original object back. You could maybe reconstitute it from the derived value, if you chose the key representation cleverly, but it won’t be the original one. It is true, and the reason for that is what I said at the beginning – the implementation of PHP hashes. So if we really want that, we’d need different – potentially slower and more complex, though maybe not – hash tables. For myself, this is a much bigger task that I want to take now, and I am not sure if it will ever happen in PHP at all, as the percentage of cases where we need object support is not big, and messing with the mechanism that is used in literally everything in the engine for that may be not smart. I don’t say it can not be done, I just don’t believe it will be actually done in PHP within any reasonable time.

This is why, while I recognize my solution is not the ideal “objects as keys” solution, and the criticism pointing in this direction is valid, I think it still would be a useful feature and better than not providing any support for it while waiting for something that may never happen.

The name

The proposal names the proposed new magic function as __hash. In this name, __ is a given, but the rest is not. __toKey was proposed, and various others. I personally do not have a strong preference, and would be fine with any logical name. As both __hash and __toKey have its merits, I think the best may be to just have it as a voting option and see which one is the most appealing to the majority (provided, of course, the majority would support the proposal as a whole).

Default constructors

Consider the following code:

class Animal {
    protected $what = "nothing";
    function sound() {
        echo get_class($this)." says {$this->what}"; 
    }
}

class Cow extends Animal {
    protected $what = "moo";
    protected $owner;
    public function __construct($owner) {
        $this->owner = $owner;
        // parent::__construct(); (?)
    }
}

$a = new Cow("Old McDonald");
$a->sound();

This code represents a simple class hierarchy. Now let us consider the line marked by (?). Of course we can not call the parent ctor there since we do not have one. But let’s say we refactored the base class and added the parent ctor which does some stuff:

class Animal {
   protected $born;
   public function __construct() {
      $this->born = time();
   }
}

Seemingly, we didn’t do anything wrong here, right? But now our code is broken, since Cow::__construct does not call Animal::__construct. So we should go to every class extending Animal and fix them. The problem here we could not avoid this problem – unless we stick empty ctor into Animal when it doesn’t need it, we can not call it from Animal’s child classes. Sticking empty ctor into every class in case we’d ever want to extend it does not sound like a nice idea. Not being able to add a default ctor (i.e. one not needing any parameters) to a base class is also not good.

So what if we make default ctor always exist? If it’s not defined, calling parent::__construct() would just do exactly nothing. But if we ever implement it, all the child classes will be ready.

In fact, in Java for example it is mandatory to call the parent ctor, and if the class has none the default one is supplied by the language.
PHP does not enforce it, but it is very rarely a good idea not to. Right now, PHP does not allow to do the right thing here, but it should.

Static typing

There is some renewed discussion about introducing static typing in PHP. I just read one very interesting post: The Safyness of Static Typing which I suggest everybody that is interested in this topic should read (and the links there). You may agree or disagree, but it is worth reading and even if you disagree it is worth ensuring you know the answers to the questions raised there, otherwise your disagreement lacks substance. I must admit I liked that post because it agreed with my feelings (not substantiated prior to that by any experimental data besides general experience I’ve acquired in the field) that type safety is not as close to silver bullet as some put it.

Within the context of PHP, I’m not sure if more strict typing (coercive typing is something in between and would require a bit different treatment) would be beneficial. I can see where it could be useful – i.e., for making JIT it probably would be very nice. On the other hand, Javascript has excellent JIT engines, as I have heard, without any additions of strict typing, so it’s not absolutely necessary. With PHP code living in runtime and static analysis tools not being routine part of mainstream development, at least as far as I have seen, I’m not sure addition of strict typing would help in any substantial way. Facebook guys, obviously, disagree – I wonder if they have some data to back it up, i.e. how that worked in practice and especially how “hybrid” model – i.e. having typed and untyped code coexist (that as I understand is what is happening, may be I am wrong here) works out and if it indeed provides better safety and reduced development time?

P.S. oh, and if you want a surefire way to annoy me, please call strict typing “type hinting”. I’m sure in the history of PHP there were examples of worse terminology (“safe mode” comes to mind as one) but that does not excuse this most unfortunate decision to name strictly typed arguments “hinting”.

PHP 5.6 – looking forward

Having taken a look in the past, now it’s time to look into the future, namely 5.6 (PHP 7 is the future future, we’ll get there eventually). So I’d like to make some predictions of what would work well and not so well and then see if it would make sense in two years or turn out completely wrong.

High impact

I expect those things to be really helpful for people going to PHP 5.6:

Constant expressions – the fact that you could not define const FOO = BAR + 1; was annoying for some for a long time. Now that this is allowed I expect people to start using it with gusto.

Variadics – while one can argue variadics are not strictly necessary, as PHP can already accept variable number of args for every function, if you’re going to 5.6 the added value would be enough so you’d probably end up using them instead of func_get_args and friends.

Operator overloading for extensions – the fact that you can sum GMP numbers with + is great, and I think more extensions like this would show up. E.g., for business apps dealing with money ability to work with fractions without precision loss is a must, and right now one has to invent elaborate wrappers to handle it. Having an extension for this would be very nice. Finding a way to transition from integer to GMP when number becomes too big would be a great thing too.
Still not convinced having it in userspace is a great idea, what C++ did to it is kind of scary.

phpdbg – not having gdb for PHP was for a long time one of the major annoyances. I expect to use it a lot.

Low impact

Function and constant importing – this was asked for a long time, but I still have hard time believing a lot of people would do it, since people who need imports usually are doing it in OO way anyway.

Hurdles

OpenSSL becoming strict with regard to peer verification by default may be a problem, especially for intranet apps running on self-signed certs. While this problem is easily fixable and the argument can be made that it should have been like this from the start – too many migrations go on very different paths depending on if it requires changing code/configs or not.

Adoption – again, with 5.5 adoption being still in single digits, I foresee a very slow adoption for 5.6. I don’t know a cure for “good enough” problem and I can understand people that do not want to move from something that already works, but look at the features! Look at the performance! I really hope people would move forward on this quicker.

While 5.4 will always have a special place in my heart, I hope people now staying on 5.2 and 5.3 would jump directly to 5.6 or at least 5.5. The BC delta in 5.5 and 5.6 is much smaller – I think 5.3->5.4 was the highest hurdle recently, and 5.4 to 5.5 or 5.6 should go much smoother.

Anything you like in PHP 5.6 and I forgot to mention? Anything that you foresee may be a problem for migration? Please add in comments. 

5.4 is out!

Since May 2011 we have worked on releasing PHP 5.4, and now it happened. Thanks everybody who helped with it!

PHP 5.4 has some new and exciting features – for some of them, like traits, I have no idea right now how they will work out and what people would do with them. It’d be very interesting to see.
For some of them, I feel they are basic common sense and long overdue in PHP (of course, not everybody may share my opinion ;) – like [‘short’,’array’,’syntax’] or detaching <?= from INI settings. Some were just missing features that we didn’t catch up with before – more fluent syntax, linking objects to closures, etc.
Some things in PHP, as we have come to realize, were clearly mistakes – like register_globals, some were driven by real needs but proved to do more harm than good at the end – like magic quotes and “safe” mode – so we had to lay them to rest.

One of the best things that happened in 5.4 though is not immediately apparent. The engine behind 5.4 is significantly faster and consumes less memory than before. How much faster and how much less? Depends on your application, of course, but from some benchmarks 10-20% speed improvement and 20-30% memory improvement can be expected. Do your own benchmarks and blog about the results! Also would be a good time to ensure your application runs fine on 5.4 – since 5.4 is now the stable version and you have to start using it! :)

Another great things that happened – and is continuing to happen – is that we are finally moving towards more streamlined release process, towards having more regular releases (expect 5.4.1 release cycle to start in late March-early April and 5.4.1 be out in about a month after that) and more organized feature/change proposal process. There’s a wiki where people can post their RFC proposals, we have a voting process and while we may be still working out some fine details of the procedure, I feel we definitely have improved in this regard. It is time to get some organization into the process – we don’t need to create a bureaucracy, but some process is definitely needed and we’re establishing it.

Yet another thing that is happening – PHP project is slowly but surely moving from SVN to Git. This will be a great improvement. Having used Git for the last couple of years, I can clearly see that it can make many things we’re doing in PHP everyday so much easier. Can’t wait for it :) Also makes easy for people to do pull requests and for core devs to merge them, among other things – should make bug resolutions, etc. work faster.

And after catching our breaths a bit and relaxing soon there will be time to think what we do in 5.5 – remember the thing about more regular releases? :) I’ll try to post some thoughts about what I’d want in 5.5 soon.

Shortcuts

Being at OSCON, I’ve attended one good talk about Python oddities, which got me thinking about language syntax in general.

PHP is notorious among scripting languages for it’s verbose syntax – you have to spell out many things that are much shorter in other languages. Some people think it’s very bad that they can’t be “expressive”, meaning writing more clever code with less keystrokes. Sometimes they are right, sometimes they are not. Let’s consider two examples:

From PHP: 5.4 has a new array syntax: ['foo', 'bar'] which is the same as array('foo', bar').

I think it’s a good shortcut – because [] is a common expression for arrays (or structures that work like PHP arrays) in many languages, and it is obvious for most people how it works.

What’s an example a shortcut that isn’t good? Python (version 2) has this syntax for exception handling (this was one of the examples in the talk):

except Foo, Bar:

Now what this means: are we catching two exception types, Foo and Bar, or are we catching exception Foo and assigning it to Bar? The correct answer is the latter – it’s exception Foo assigned to Bar. I think it’s bad shortcut – because it uses a comma – which is common expression for lists and enumeration – to separate the type and the parameter. This leads to people writing things like “except KeyError, IndexError” which doesn’t do what one would expect to.

Pyhton people seem to agree with me, as in Python 3 the syntax has been changed to:

except Foo as Bar:

One could argue it’s not as “expressive” and more verbose – bu it’s definitely much more readable and would lead to less broken code. It’d be even better if they used the word “exception” or try/catch as all the rest of the world does :)

Some people in PHP community think all “shortcuts” are best to be avoided. I think some of them could be useful, provided clarity is not sacrificed and there’s not “too much magic”. I know it’s subjective but my personal criteria is that if it’s not immediately clear what’s going on for a person with reasonable knowledge of the matter – it’s probably too much magic.

On PHP features

When people ask why PHP doesn’t have this or that cool feature which allows to do some tricky mind-bending function and requires a dozen pages of manual to describe it, I find this quote from Brian W. Kernighan is very appropriate:

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

Corollary of that is that some clever tricks are better left not done and some cool things may not look that cool when you have to deal with the consequences.

That being said, I think there are many cool things yet to be done in PHP. So, I’ve posted a poll on Programmers.StackExchange site about what people want to see in PHP. I know people regularly post on PHP lists about that, etc. but I think SE crowd would be somewhat different from what is found on PHP internal lists, so it would be interesting to see what people’s wishlists for PHP are.

If you happen to want to add your 2c there or just vote on stuff – feel welcome. Absolutely no promises it would have any weight, influence anything (except maybe my own opinion about what people want in PHP), etc. of course.