Objects as keys

I’m going to put to vote soon another of my RFCs, namely one about “objects as keys“. So, I want to outline the case for it here and address some criticisms and questions raised while discussing it.

Why we may want it it?

Traditionally, in PHP array keys could be strings or numbers, and this is deeply linked to how PHP hash tables (which store mostly everything in PHP that is a collection) work. And this was mostly enough until recently, when special type objects started popping up – such as GMP numbers. These work very closely to native numbers – i.e. you can add them, multiply or subtract them, etc. However, though they look like numbers, there are things you can do with numbers that you can not do with them. One of these things is using it as an array index.

There’s more – we have proposals to make UString class to represent Unicode strings. There may be more to represent special types of quantities and strings. It would be only natural for these to be able to do something that numbers and strings can do – namely, be array keys.

How to use it right

This idea is not without dangers, and certainly can be abused (channeling my internal Yogi Berra – especially if you’re using it wrong).

For starters, not every object is good for using as an array key. For example, mutable objects almost never are, since if you put something under key X and then it mutates to represent something different from X – where are you going to find it? PHP also doesn’t have very good means to control mutability, at least not just yet. Also, it may not be good to use complex objects as keys, e.g. a tree having 1000 elements rarely makes a good key since it’s not clear what exactly it even means it being a key, and how to really make it so same trees would produce same key but different ones would produce different ones – scanning every element would be quite expensive.

So this feature is mostly good for value objects – i.e. objects that express some simple and usually immutable value. Of course, there may be scenarios where other objects find it useful, but those would be exceptions from a general rule.

Why not just use __toString()?

Of course, it would be very easy to say “if the object has __toString, just convert it to string and we know how to deal with those”. However, there are two problems with this:

  1. Human-readable representation of an object and value used for keys may not be the same one. As I mention in the RFC, most languages that allow object keys, have separate functions, some for technical reasons (i.e. needing number instead of string), some for semantical. I think both of these reasons are valid – some values that objects represent may be more efficiently represented as numbers, and some the developer may want to be different when expecting the human and the engine to look at it.
  2. Using __toString has the reverse side – if we use __toString, every object that has __toString becomes hash-able. But, as I noted above, we may not want every object to work this way – for some, it just makes no sense and may be even dangerous, but __toString may still make sense. It would be better if whoever designs the class explicitly allows to do this. Of course, they can choose to go back to __toString – this door is always open – but they have an option not to.

Why not just use spl_object_hash()?

This function provides the identity of the specific object – but for value objects, its identity and what it represents may not be the same. I.e. do we want two distinct objects both representing number 42 considered different? Sometimes we may want it, but if those are true value objects, we may not. It may be even more complicated if the objects have some data that changes but still represent same values. The best solution here would be to give control to the developer, since the engine can not know what which part of the object means.

And, of course, the point of not being able to control which object can be used this way is the same as above.

Inherent implementation problems and disadvantages

One point that was raised when discussing it is that the objects do not really become the array keys – instead, we “using” them as the keys, but the value derived from them in developer-specified manner is used. So, if you do a foreach loop over such array later, you do not get the original object back. You could maybe reconstitute it from the derived value, if you chose the key representation cleverly, but it won’t be the original one. It is true, and the reason for that is what I said at the beginning – the implementation of PHP hashes. So if we really want that, we’d need different – potentially slower and more complex, though maybe not – hash tables. For myself, this is a much bigger task that I want to take now, and I am not sure if it will ever happen in PHP at all, as the percentage of cases where we need object support is not big, and messing with the mechanism that is used in literally everything in the engine for that may be not smart. I don’t say it can not be done, I just don’t believe it will be actually done in PHP within any reasonable time.

This is why, while I recognize my solution is not the ideal “objects as keys” solution, and the criticism pointing in this direction is valid, I think it still would be a useful feature and better than not providing any support for it while waiting for something that may never happen.

The name

The proposal names the proposed new magic function as __hash. In this name, __ is a given, but the rest is not. __toKey was proposed, and various others. I personally do not have a strong preference, and would be fine with any logical name. As both __hash and __toKey have its merits, I think the best may be to just have it as a voting option and see which one is the most appealing to the majority (provided, of course, the majority would support the proposal as a whole).