Objects as keys

I’m going to put to vote soon another of my RFCs, namely one about “objects as keys“. So, I want to outline the case for it here and address some criticisms and questions raised while discussing it.

Why we may want it it?

Traditionally, in PHP array keys could be strings or numbers, and this is deeply linked to how PHP hash tables (which store mostly everything in PHP that is a collection) work. And this was mostly enough until recently, when special type objects started popping up – such as GMP numbers. These work very closely to native numbers – i.e. you can add them, multiply or subtract them, etc. However, though they look like numbers, there are things you can do with numbers that you can not do with them. One of these things is using it as an array index.

There’s more – we have proposals to make UString class to represent Unicode strings. There may be more to represent special types of quantities and strings. It would be only natural for these to be able to do something that numbers and strings can do – namely, be array keys.

How to use it right

This idea is not without dangers, and certainly can be abused (channeling my internal Yogi Berra – especially if you’re using it wrong).

For starters, not every object is good for using as an array key. For example, mutable objects almost never are, since if you put something under key X and then it mutates to represent something different from X – where are you going to find it? PHP also doesn’t have very good means to control mutability, at least not just yet. Also, it may not be good to use complex objects as keys, e.g. a tree having 1000 elements rarely makes a good key since it’s not clear what exactly it even means it being a key, and how to really make it so same trees would produce same key but different ones would produce different ones – scanning every element would be quite expensive.

So this feature is mostly good for value objects – i.e. objects that express some simple and usually immutable value. Of course, there may be scenarios where other objects find it useful, but those would be exceptions from a general rule.

Why not just use __toString()?

Of course, it would be very easy to say “if the object has __toString, just convert it to string and we know how to deal with those”. However, there are two problems with this:

  1. Human-readable representation of an object and value used for keys may not be the same one. As I mention in the RFC, most languages that allow object keys, have separate functions, some for technical reasons (i.e. needing number instead of string), some for semantical. I think both of these reasons are valid – some values that objects represent may be more efficiently represented as numbers, and some the developer may want to be different when expecting the human and the engine to look at it.
  2. Using __toString has the reverse side – if we use __toString, every object that has __toString becomes hash-able. But, as I noted above, we may not want every object to work this way – for some, it just makes no sense and may be even dangerous, but __toString may still make sense. It would be better if whoever designs the class explicitly allows to do this. Of course, they can choose to go back to __toString – this door is always open – but they have an option not to.

Why not just use spl_object_hash()?

This function provides the identity of the specific object – but for value objects, its identity and what it represents may not be the same. I.e. do we want two distinct objects both representing number 42 considered different? Sometimes we may want it, but if those are true value objects, we may not. It may be even more complicated if the objects have some data that changes but still represent same values. The best solution here would be to give control to the developer, since the engine can not know what which part of the object means.

And, of course, the point of not being able to control which object can be used this way is the same as above.

Inherent implementation problems and disadvantages

One point that was raised when discussing it is that the objects do not really become the array keys – instead, we “using” them as the keys, but the value derived from them in developer-specified manner is used. So, if you do a foreach loop over such array later, you do not get the original object back. You could maybe reconstitute it from the derived value, if you chose the key representation cleverly, but it won’t be the original one. It is true, and the reason for that is what I said at the beginning – the implementation of PHP hashes. So if we really want that, we’d need different – potentially slower and more complex, though maybe not – hash tables. For myself, this is a much bigger task that I want to take now, and I am not sure if it will ever happen in PHP at all, as the percentage of cases where we need object support is not big, and messing with the mechanism that is used in literally everything in the engine for that may be not smart. I don’t say it can not be done, I just don’t believe it will be actually done in PHP within any reasonable time.

This is why, while I recognize my solution is not the ideal “objects as keys” solution, and the criticism pointing in this direction is valid, I think it still would be a useful feature and better than not providing any support for it while waiting for something that may never happen.

The name

The proposal names the proposed new magic function as __hash. In this name, __ is a given, but the rest is not. __toKey was proposed, and various others. I personally do not have a strong preference, and would be fine with any logical name. As both __hash and __toKey have its merits, I think the best may be to just have it as a voting option and see which one is the most appealing to the majority (provided, of course, the majority would support the proposal as a whole).

Default constructors

Consider the following code:

class Animal {
    protected $what = "nothing";
    function sound() {
        echo get_class($this)." says {$this->what}"; 
    }
}

class Cow extends Animal {
    protected $what = "moo";
    protected $owner;
    public function __construct($owner) {
        $this->owner = $owner;
        // parent::__construct(); (?)
    }
}

$a = new Cow("Old McDonald");
$a->sound();

This code represents a simple class hierarchy. Now let us consider the line marked by (?). Of course we can not call the parent ctor there since we do not have one. But let’s say we refactored the base class and added the parent ctor which does some stuff:

class Animal {
   protected $born;
   public function __construct() {
      $this->born = time();
   }
}

Seemingly, we didn’t do anything wrong here, right? But now our code is broken, since Cow::__construct does not call Animal::__construct. So we should go to every class extending Animal and fix them. The problem here we could not avoid this problem – unless we stick empty ctor into Animal when it doesn’t need it, we can not call it from Animal’s child classes. Sticking empty ctor into every class in case we’d ever want to extend it does not sound like a nice idea. Not being able to add a default ctor (i.e. one not needing any parameters) to a base class is also not good.

So what if we make default ctor always exist? If it’s not defined, calling parent::__construct() would just do exactly nothing. But if we ever implement it, all the child classes will be ready.

In fact, in Java for example it is mandatory to call the parent ctor, and if the class has none the default one is supplied by the language.
PHP does not enforce it, but it is very rarely a good idea not to. Right now, PHP does not allow to do the right thing here, but it should.

unserialize() and being practical

I have recently revived my “filtered unserialize()” RFC and I plan to put it to vote today. Before I do that, I’d like to outline the arguments on why I think it is a good thing and put it in a somewhat larger context.

It is known that using unserialize() on outside data can lead to trouble unless you are very careful. Which in projects large enough usually means “always”, since practically you rarely can predict all interactions amongst a million lines of code. So, what can we do?

Of course, the first thing would be to never use unserialize() in this context, and this means no problem, right? However, this approach has the following issues:

  1. It goes against what is natural for people (using PHP native serialization mechanisms) to do and what is widely done in the field. Usually when you try to work against what is natural for people to do, it is an uphill battle where losses are much more frequent than wins. Doing the right thing should be easy, and if it is not so, then the chance that right thing is not done raises accordingly. From that perspective, anything that makes doing the right thing easier is a benefit.
  2. There is no other mechanism which matches serialize() by capability but does not have its issues. Yes, I know in many cases data being serialized is simple enough so JSON or something akin to it would suffice. But sometimes it may not, and in that case we need some solution too. Let’s say we said using JSON is a best practice. However, let’s say one finds a rare corner case where it is not enough. What would we offer in that case? If we do not provide any solution, people would do homebrew solutions, and many of these will be done wrong.
  3. Contexts change, and what were internal context before may suddenly become exposed, and then may be in for an expensive refactoring effort if no other solution is available.

So that is why I think we should have a middle ground between “never use unserialize() on external data and if you do, you’re going to hell and we’re not going to talk to a sinner like you until you repent and rewrite all your code” and “let’s rewrite PHP library functions in PHP because that’s what it takes for our code to work”. I think it is a practical solution which allows your code to be more predictable (i.e, less prone to security issues) while allowing you to work with your code as it is and not requiring extensive rewrites.

Is this a security measure? I removed the reference “security” from the RFC title because I think it has lead the discussion in a wrong direction. Yes, it does not provide perfect security, and yes, you should not rely only on that for security. Security, much like ogres and onions, has layers. So this is trying to provide one more layer – in case that is what you need. I think it improves security but I’d much rather concentrate on the useful options that it adds to the programmer’s toolkit than on semantics of the term “security” and its implications.

Static typing

There is some renewed discussion about introducing static typing in PHP. I just read one very interesting post: The Safyness of Static Typing which I suggest everybody that is interested in this topic should read (and the links there). You may agree or disagree, but it is worth reading and even if you disagree it is worth ensuring you know the answers to the questions raised there, otherwise your disagreement lacks substance. I must admit I liked that post because it agreed with my feelings (not substantiated prior to that by any experimental data besides general experience I’ve acquired in the field) that type safety is not as close to silver bullet as some put it.

Within the context of PHP, I’m not sure if more strict typing (coercive typing is something in between and would require a bit different treatment) would be beneficial. I can see where it could be useful – i.e., for making JIT it probably would be very nice. On the other hand, Javascript has excellent JIT engines, as I have heard, without any additions of strict typing, so it’s not absolutely necessary. With PHP code living in runtime and static analysis tools not being routine part of mainstream development, at least as far as I have seen, I’m not sure addition of strict typing would help in any substantial way. Facebook guys, obviously, disagree – I wonder if they have some data to back it up, i.e. how that worked in practice and especially how “hybrid” model – i.e. having typed and untyped code coexist (that as I understand is what is happening, may be I am wrong here) works out and if it indeed provides better safety and reduced development time?

P.S. oh, and if you want a surefire way to annoy me, please call strict typing “type hinting”. I’m sure in the history of PHP there were examples of worse terminology (“safe mode” comes to mind as one) but that does not excuse this most unfortunate decision to name strictly typed arguments “hinting”.

LinkedInSecurity

This is an uncharacteristically non-PHP post, but I thought it may interest the audience anyway, and this is as good place as any to have it. So the TLDR of this post is that I’ve recently had an interaction with certain security issue in LinkedIn, this issue is still there, LinkedIn is not inclined to fix it and you may be affected.

The Story

All names (except, obviously, LinkedIn) in the story has been changed to protect the privacy, but refer to real people, entities and events.
The story of discovering this issue begun when one morning I have woken up and found in my mailbox a message saying “here’s the link to reset your password” from LinkedIn. As I have not reset my password on LinkedIn, I was somewhat surprised, but thought – OK, maybe somebody is trying to play tricks with my account, I’m pretty sure this would go nowhere. Then, as my brain was waking up, I looked at the email closely and discovered two things:

  1. This email has not my name, by the name of my colleague, let’s call him B., at the company, let’s call it Westeros Inc.
  2. The email was not sent to me directly, neither it was sent to B. directly, instead it was sent to an internal company mailing list goldcloaks@westerosinc.com.

I didn’t know what to make out of it but decided maybe B. copy-pasted wrong address to some field in LinkedIn.
Later the same day, talking to B. and other coworkers, I have mentioned this event. B. said that he indeed reset his LI password recently, but he never added the goldcloaks list to LinkedIn. I’ve started to get suspicious and asked how then I’ve got his password reset email? He didn’t know. So we (myself and B.) did an experiment:

We went to LinkedIn, logged out and clicked “forgot password” on the login form. Then we entered the address of the goldcloaks@westerosinc.com and in a couple of seconds, I’ve got the password reset link, with B.’s name on it. Clicking on that link, I’ve got a form to reset the password (no additional questions like what’s my favorite pokemon) and after another click I’ve got the email saying “B., your password was successfully reset“. I used the new password to log in, but then I was stopped by the two-factor verification. Which means two things: 1) password change worked, since 2-factor kicks in only when password is right and 2) B. is a smart man and has protected his account against password thieves. I had to ask him for the code – now that I have his account’s password, this was the only way to give him the control back. After getting the code, I could successfully log into his account and could see all his deepest secrets (which I didn’t) or return the control back to him (which I did). Before that, we verified that goldcloaks@westerosinc.com is indeed in B.’s list of account’s emails.

Then I decided to see how comes the goldcloaks list ended up in B.’s email list. I went to my own email list, and, surprisingly, discovered that in my own list, among my regular emails, there is another mailing list, maesters@westerosinc.com, which I definitely did not ever add there and had zero reason to. I asked other people sitting around in the office to check their lists and they too have discovered a couple of extra emails, added by some mysterious way, in their profiles.

The Analysis

Basing on these discoveries, I have arrived at the following conclusions:

1. There is a way, currently unknown to me, to add a group mailing list to one’s profile on LinkedIn, without their explicit consent (at least without them knowing that this is what they consented to).
2. LinkedIn accepts this group list email and any non-primary email as an email to send password reset requests too.
3. Reading emails from this address is the only thing needed to reset the password – even if 2-factor auth is enabled. With 2-factor auth, you will not be able to access the account after the password has been reset (unless you find a way to cheat there, I did not try) but you will be able to reset the password.
4. For the majority of people asked, LinkedIn password emails to goldcloaks@westerosinc.com ended up in a spam folder, which means the victim of the shenanigans may not even notice what happened.

This looked like a security issue, so I have written up the whole story (in a bit less colorful words than here) to security@linkedin.com and went back to work, expecting the email from LinkedIn with heartfelt thanks and promises of speedy fix implementation.

The Security Response

Of course, that is not what happened. Instead, what happened that I have got an answer from some very helpful individual from frontline support, asking me for “detailed information about your problem and if you think it might help, attach a screenshot, too“. As I have just spent significant time on composing big encrypted email full of details, I was a bit confused as to which details I was missing and where screenshots may be useful there, but I have not relented at first and wrote second explanation of the issue. The response was:

1. LinkedIn support took the extraordinary security measure of logging me out of all my current sessions with LinkedIn.
2. They advised me not to write down my password in publicly accessible places and suggested that if I continue to leave my computer sitting around in public places without logging out, bad things may happen to my account. My sincerest pleas that such thing never happened and the problem I am talking about is not because I forgot my laptop in a pub while being drunk (and so, apparently, did my coworkers) were met with utter disbelief. They also instructed me to not use my LinkedIn password on other sites and gave me a full page of very useful boilerplate password security advise, as prudent as having no relation to the case being discussed.
3. They assured me that my account was not compromised (which I never implied) and my password is safe.
4. They assured me that “The only way to add an email into an account is via the settings after logging in.”

By that time I was sure nobody at LinkedIn is going to believe me there’s a problem (beyond my implied propensity to leaving my laptop around and thus letting strangers add emails to my LinkedIn account) so I decided I’ve done my responsible disclosure part and should not spend more time on it. However, then I’ve got another email from LinkedIn stating this:

Sometimes, when a member accepts an invitation to connect that was sent to an email distribution list, that list becomes associated with the member’s account.

Please be assured that no one on the distribution list would be able to use the password reset link to access your account unless they knew both your email address and your password.

The first part, of course, completely belies the claim that “The only way to add an email into an account is via the settings after logging in.“, as apparently the other way is to send an invitation via the email list and have it accepted. The second part, however, can not be true, as password reset link can not require anybody to know the password – such link would be completely useless, and they do not even need to know my email – only the list email. But this provided the confirmation and brings us to the conclusion.

The Conclusion

  1. There is, indeed, a way to inject group email address into your LinkedIn account, LinkedIn knows about it and they don’t see any problem with it. Most probably, this can be done by sending an invitation for a person to connect to a mailing list. You can imagine the social engineering possibilities.
  2. While you can see the target email in the email connect invite from LinkedIn, you can not see it, AFAIK, in the LinkedIn web interface, which makes “group” invite indistinguishable from a regular one.
  3. There is, and probably will be for a foreseeable time, a way to use that group email address to reset your password using that group, by anybody who has access to group emails.
  4. LinkedIn knows about the issues outlined above but they do not perceive it as a security issue.

The Advice

So here’s some advice if you have a LinkedIn account:

  1. Enable two-factor on your LinkedIn account NOW.
  2. Check your email list (go to Settings, click on “Account” and then “Add & change email addresses”) and see if you don’t have any unknown emails there. Do that at regular intervals, especially after accepting connections.
  3. Do not accept connections from strangers that you do not recognize. 
  4. Do not expect big companies to have a meaningful way to report a security problem.

And a wishlist for LinkedIn:

  1. Make password request only work with primary email.
  2. Make associating an email with the account always an explicit action.
  3. Have some way to escalate security issues. 

If you have any additional info or ideas on this topic, please feel free to comment.

PHP Spec – a dream come true

Almost 8 years ago, I wrote “What is PHP anyway?“. This blog is supposed to be about some long-term dreams, and in this case it was the dream come true – Sara Golemon and the excellent Facebook team made a draft PHP spec and with some paint and polish it can become a real spec pretty soon. Not sure if it can be ready by the 8th anniversary of that post, but it probably will be out by the 9th 🙂

Talking to people, I recently discovered not everybody knows this thing exists. So here it goes – it exists right here. It is still a draft. If you see something wrong, submit a pull request. If you feel you can contribute more by working on it or refining some points, “standards” mailing list was re-purposed to be the working group list.

PHP 5.6 – looking forward

Having taken a look in the past, now it’s time to look into the future, namely 5.6 (PHP 7 is the future future, we’ll get there eventually). So I’d like to make some predictions of what would work well and not so well and then see if it would make sense in two years or turn out completely wrong.

High impact

I expect those things to be really helpful for people going to PHP 5.6:

Constant expressions – the fact that you could not define const FOO = BAR + 1; was annoying for some for a long time. Now that this is allowed I expect people to start using it with gusto.

Variadics – while one can argue variadics are not strictly necessary, as PHP can already accept variable number of args for every function, if you’re going to 5.6 the added value would be enough so you’d probably end up using them instead of func_get_args and friends.

Operator overloading for extensions – the fact that you can sum GMP numbers with + is great, and I think more extensions like this would show up. E.g., for business apps dealing with money ability to work with fractions without precision loss is a must, and right now one has to invent elaborate wrappers to handle it. Having an extension for this would be very nice. Finding a way to transition from integer to GMP when number becomes too big would be a great thing too.
Still not convinced having it in userspace is a great idea, what C++ did to it is kind of scary.

phpdbg – not having gdb for PHP was for a long time one of the major annoyances. I expect to use it a lot.

Low impact

Function and constant importing – this was asked for a long time, but I still have hard time believing a lot of people would do it, since people who need imports usually are doing it in OO way anyway.

Hurdles

OpenSSL becoming strict with regard to peer verification by default may be a problem, especially for intranet apps running on self-signed certs. While this problem is easily fixable and the argument can be made that it should have been like this from the start – too many migrations go on very different paths depending on if it requires changing code/configs or not.

Adoption – again, with 5.5 adoption being still in single digits, I foresee a very slow adoption for 5.6. I don’t know a cure for “good enough” problem and I can understand people that do not want to move from something that already works, but look at the features! Look at the performance! I really hope people would move forward on this quicker.

While 5.4 will always have a special place in my heart, I hope people now staying on 5.2 and 5.3 would jump directly to 5.6 or at least 5.5. The BC delta in 5.5 and 5.6 is much smaller – I think 5.3->5.4 was the highest hurdle recently, and 5.4 to 5.5 or 5.6 should go much smoother.

Anything you like in PHP 5.6 and I forgot to mention? Anything that you foresee may be a problem for migration? Please add in comments. 

PHP 5.4 – looking back

With 5.6.0 having been released and 5.4 branch nearing its well-earned retirement in security-fixes-only status I decided to try and revive this blog. As the last post before the long hiatus was about the release of the 5.4, I think it makes sense to look back and see how 5.4 has been doing so far.

Great

Release process. Combined with RFCs and git. It’s hard to believe we used not to have it. RFC process is working great, git makes all the processes tick and we have scheduled releases, working CI setup and much better predictability and management of releases overall. It’s no big deal unless you remember how it was before.

Built-in webserver. It really helps when you can just set up something browseable (is this a word? now it is) with PHP alone, without bothering with Apache setup and other moving parts. This is again a case of something that you don’t realize how much you missed it until you start using it.

$this support in closures. Having to write 5.3-compatible code for the last couple of years, I can’t emphasize enough how sorely it was missing in 5.3. I really regret the fact we could not get it into 5.3.

Syntax sugar like [] and <?= working everywhere. It’s a small thing but it adds up. I usually do not give much weight to saving couple of keystrokes and so on, but these to me really improve coder’s quality of life.

Removal of old “features” (since they ended up in the dump, is it right to call them features anymore?). Nobody is missing the safe mode or magic quotes or register_globals. Good riddance. Wish we parted ways sooner.

Meh

Traits. I must say I haven’t seen big adoption of the traits feature. Yes, of course people use it, there are tutorials, there are articles, etc. But at the same time compared to how much namespaces were needed or how much closures proved to be a great help, traits adoption, IMHO, remains lukewarm at best. To me, it has not lived yet up to its promise. Maybe I’m missing something, tell me if you have great examples there.

Adoption. This is a problem for new PHP versions and for developers of distributable PHP software – PHP versions are becoming “good enough” and people are reluctant to move forward, which also delays adoption of new features by library & packaged software writers. Look at the numbers: almost 3/4 of the PHP developers are using EOLed versions! 5.4 adoption is low at 22% and 5.5 adoption is abysmal. I hope that more streamlined release cycles and heightened attention to BC matters would bend this tendency. But so far it is not encouraging. WordPress numbers look even worse.

Don’t know

callable type. How wide is the usage? How useful it is in practice? Is it being used in major projects? I really don’t know.

Performance. It feels weird to put an obviously great improvement in this category. I would expect performance be a major driver for people to move forward, but the numbers suggest otherwise. As much as I love the performance improvements (a lot!), I really have no idea on how much it influenced the community and made them go to 5.4 (or beyond). Are there any surveys, links, studies, etc. in this regard? I see a lot of talk about performance but how many people also walk the walk? Are the numbers quoted above misleading?

mysqlnd. 5.4 is the first version where mysqlnd is the default mode of doing mysql. It has better performance, great features and I’ve been using it for years without a problem. But how widely it is adopted – do people still prefer the old way or love the new one? Do they use the plugin API widely?

Anything else?

Did I forget to mention something that really made your life better in 5.4? Did I conceal some flop that you wish we didn’t do? Please tell.

5.4 is out!

Since May 2011 we have worked on releasing PHP 5.4, and now it happened. Thanks everybody who helped with it!

PHP 5.4 has some new and exciting features – for some of them, like traits, I have no idea right now how they will work out and what people would do with them. It’d be very interesting to see.
For some of them, I feel they are basic common sense and long overdue in PHP (of course, not everybody may share my opinion 😉 – like [‘short’,’array’,’syntax’] or detaching <?= from INI settings. Some were just missing features that we didn’t catch up with before – more fluent syntax, linking objects to closures, etc.
Some things in PHP, as we have come to realize, were clearly mistakes – like register_globals, some were driven by real needs but proved to do more harm than good at the end – like magic quotes and “safe” mode – so we had to lay them to rest.

One of the best things that happened in 5.4 though is not immediately apparent. The engine behind 5.4 is significantly faster and consumes less memory than before. How much faster and how much less? Depends on your application, of course, but from some benchmarks 10-20% speed improvement and 20-30% memory improvement can be expected. Do your own benchmarks and blog about the results! Also would be a good time to ensure your application runs fine on 5.4 – since 5.4 is now the stable version and you have to start using it! 🙂

Another great things that happened – and is continuing to happen – is that we are finally moving towards more streamlined release process, towards having more regular releases (expect 5.4.1 release cycle to start in late March-early April and 5.4.1 be out in about a month after that) and more organized feature/change proposal process. There’s a wiki where people can post their RFC proposals, we have a voting process and while we may be still working out some fine details of the procedure, I feel we definitely have improved in this regard. It is time to get some organization into the process – we don’t need to create a bureaucracy, but some process is definitely needed and we’re establishing it.

Yet another thing that is happening – PHP project is slowly but surely moving from SVN to Git. This will be a great improvement. Having used Git for the last couple of years, I can clearly see that it can make many things we’re doing in PHP everyday so much easier. Can’t wait for it 🙂 Also makes easy for people to do pull requests and for core devs to merge them, among other things – should make bug resolutions, etc. work faster.

And after catching our breaths a bit and relaxing soon there will be time to think what we do in 5.5 – remember the thing about more regular releases? 🙂 I’ll try to post some thoughts about what I’d want in 5.5 soon.

ZF Oauth Provider

Zend Framework has pretty good OAuth consumer implementation. However, it has no support for implementing OAuth provider, and it turns out that there aren’t many other libraries for it. Most examples out there base on PECL oauth extension, which works just fine, with one caveat – you have to have this PECL extension installed, while ZF implementation does not require that.

So I went ahead and wrote some code that allows to easily add OAuth provider to your ZF-based or ZF-using application. That should make writing OAuth provider easier.

Note that the code does not implement the whole server – just the OAuth protocol wrapper, you’d still have to do all the work of managing tokens/keys/nonces by yourself. See example server in the repository and the wiki on github for more details on how to do it, but the protocol follows what PECL oauth does pretty closely, so many tutorials for it would be mostly applicable to this one too.

Check out Zend_Oauth_Provider on github, if you want to improve it – please fork and submit pull requests.