Default constructors

Consider the following code:

class Animal {
    protected $what = "nothing";
    function sound() {
        echo get_class($this)." says {$this->what}"; 
    }
}

class Cow extends Animal {
    protected $what = "moo";
    protected $owner;
    public function __construct($owner) {
        $this->owner = $owner;
        // parent::__construct(); (?)
    }
}

$a = new Cow("Old McDonald");
$a->sound();

This code represents a simple class hierarchy. Now let us consider the line marked by (?). Of course we can not call the parent ctor there since we do not have one. But let’s say we refactored the base class and added the parent ctor which does some stuff:

class Animal {
   protected $born;
   public function __construct() {
      $this->born = time();
   }
}

Seemingly, we didn’t do anything wrong here, right? But now our code is broken, since Cow::__construct does not call Animal::__construct. So we should go to every class extending Animal and fix them. The problem here we could not avoid this problem – unless we stick empty ctor into Animal when it doesn’t need it, we can not call it from Animal’s child classes. Sticking empty ctor into every class in case we’d ever want to extend it does not sound like a nice idea. Not being able to add a default ctor (i.e. one not needing any parameters) to a base class is also not good.

So what if we make default ctor always exist? If it’s not defined, calling parent::__construct() would just do exactly nothing. But if we ever implement it, all the child classes will be ready.

In fact, in Java for example it is mandatory to call the parent ctor, and if the class has none the default one is supplied by the language.
PHP does not enforce it, but it is very rarely a good idea not to. Right now, PHP does not allow to do the right thing here, but it should.

unserialize() and being practical

I have recently revived my “filtered unserialize()” RFC and I plan to put it to vote today. Before I do that, I’d like to outline the arguments on why I think it is a good thing and put it in a somewhat larger context.

It is known that using unserialize() on outside data can lead to trouble unless you are very careful. Which in projects large enough usually means “always”, since practically you rarely can predict all interactions amongst a million lines of code. So, what can we do?

Of course, the first thing would be to never use unserialize() in this context, and this means no problem, right? However, this approach has the following issues:

  1. It goes against what is natural for people (using PHP native serialization mechanisms) to do and what is widely done in the field. Usually when you try to work against what is natural for people to do, it is an uphill battle where losses are much more frequent than wins. Doing the right thing should be easy, and if it is not so, then the chance that right thing is not done raises accordingly. From that perspective, anything that makes doing the right thing easier is a benefit.
  2. There is no other mechanism which matches serialize() by capability but does not have its issues. Yes, I know in many cases data being serialized is simple enough so JSON or something akin to it would suffice. But sometimes it may not, and in that case we need some solution too. Let’s say we said using JSON is a best practice. However, let’s say one finds a rare corner case where it is not enough. What would we offer in that case? If we do not provide any solution, people would do homebrew solutions, and many of these will be done wrong.
  3. Contexts change, and what were internal context before may suddenly become exposed, and then may be in for an expensive refactoring effort if no other solution is available.

So that is why I think we should have a middle ground between “never use unserialize() on external data and if you do, you’re going to hell and we’re not going to talk to a sinner like you until you repent and rewrite all your code” and “let’s rewrite PHP library functions in PHP because that’s what it takes for our code to work”. I think it is a practical solution which allows your code to be more predictable (i.e, less prone to security issues) while allowing you to work with your code as it is and not requiring extensive rewrites.

Is this a security measure? I removed the reference “security” from the RFC title because I think it has lead the discussion in a wrong direction. Yes, it does not provide perfect security, and yes, you should not rely only on that for security. Security, much like ogres and onions, has layers. So this is trying to provide one more layer – in case that is what you need. I think it improves security but I’d much rather concentrate on the useful options that it adds to the programmer’s toolkit than on semantics of the term “security” and its implications.

Static typing

There is some renewed discussion about introducing static typing in PHP. I just read one very interesting post: The Safyness of Static Typing which I suggest everybody that is interested in this topic should read (and the links there). You may agree or disagree, but it is worth reading and even if you disagree it is worth ensuring you know the answers to the questions raised there, otherwise your disagreement lacks substance. I must admit I liked that post because it agreed with my feelings (not substantiated prior to that by any experimental data besides general experience I’ve acquired in the field) that type safety is not as close to silver bullet as some put it.

Within the context of PHP, I’m not sure if more strict typing (coercive typing is something in between and would require a bit different treatment) would be beneficial. I can see where it could be useful – i.e., for making JIT it probably would be very nice. On the other hand, Javascript has excellent JIT engines, as I have heard, without any additions of strict typing, so it’s not absolutely necessary. With PHP code living in runtime and static analysis tools not being routine part of mainstream development, at least as far as I have seen, I’m not sure addition of strict typing would help in any substantial way. Facebook guys, obviously, disagree – I wonder if they have some data to back it up, i.e. how that worked in practice and especially how “hybrid” model – i.e. having typed and untyped code coexist (that as I understand is what is happening, may be I am wrong here) works out and if it indeed provides better safety and reduced development time?

P.S. oh, and if you want a surefire way to annoy me, please call strict typing “type hinting”. I’m sure in the history of PHP there were examples of worse terminology (“safe mode” comes to mind as one) but that does not excuse this most unfortunate decision to name strictly typed arguments “hinting”.

LinkedInSecurity

This is an uncharacteristically non-PHP post, but I thought it may interest the audience anyway, and this is as good place as any to have it. So the TLDR of this post is that I’ve recently had an interaction with certain security issue in LinkedIn, this issue is still there, LinkedIn is not inclined to fix it and you may be affected.

The Story

All names (except, obviously, LinkedIn) in the story has been changed to protect the privacy, but refer to real people, entities and events.
The story of discovering this issue begun when one morning I have woken up and found in my mailbox a message saying “here’s the link to reset your password” from LinkedIn. As I have not reset my password on LinkedIn, I was somewhat surprised, but thought – OK, maybe somebody is trying to play tricks with my account, I’m pretty sure this would go nowhere. Then, as my brain was waking up, I looked at the email closely and discovered two things:

  1. This email has not my name, by the name of my colleague, let’s call him B., at the company, let’s call it Westeros Inc.
  2. The email was not sent to me directly, neither it was sent to B. directly, instead it was sent to an internal company mailing list goldcloaks@westerosinc.com.

I didn’t know what to make out of it but decided maybe B. copy-pasted wrong address to some field in LinkedIn.
Later the same day, talking to B. and other coworkers, I have mentioned this event. B. said that he indeed reset his LI password recently, but he never added the goldcloaks list to LinkedIn. I’ve started to get suspicious and asked how then I’ve got his password reset email? He didn’t know. So we (myself and B.) did an experiment:

We went to LinkedIn, logged out and clicked “forgot password” on the login form. Then we entered the address of the goldcloaks@westerosinc.com and in a couple of seconds, I’ve got the password reset link, with B.’s name on it. Clicking on that link, I’ve got a form to reset the password (no additional questions like what’s my favorite pokemon) and after another click I’ve got the email saying “B., your password was successfully reset“. I used the new password to log in, but then I was stopped by the two-factor verification. Which means two things: 1) password change worked, since 2-factor kicks in only when password is right and 2) B. is a smart man and has protected his account against password thieves. I had to ask him for the code – now that I have his account’s password, this was the only way to give him the control back. After getting the code, I could successfully log into his account and could see all his deepest secrets (which I didn’t) or return the control back to him (which I did). Before that, we verified that goldcloaks@westerosinc.com is indeed in B.’s list of account’s emails.

Then I decided to see how comes the goldcloaks list ended up in B.’s email list. I went to my own email list, and, surprisingly, discovered that in my own list, among my regular emails, there is another mailing list, maesters@westerosinc.com, which I definitely did not ever add there and had zero reason to. I asked other people sitting around in the office to check their lists and they too have discovered a couple of extra emails, added by some mysterious way, in their profiles.

The Analysis

Basing on these discoveries, I have arrived at the following conclusions:

1. There is a way, currently unknown to me, to add a group mailing list to one’s profile on LinkedIn, without their explicit consent (at least without them knowing that this is what they consented to).
2. LinkedIn accepts this group list email and any non-primary email as an email to send password reset requests too.
3. Reading emails from this address is the only thing needed to reset the password – even if 2-factor auth is enabled. With 2-factor auth, you will not be able to access the account after the password has been reset (unless you find a way to cheat there, I did not try) but you will be able to reset the password.
4. For the majority of people asked, LinkedIn password emails to goldcloaks@westerosinc.com ended up in a spam folder, which means the victim of the shenanigans may not even notice what happened.

This looked like a security issue, so I have written up the whole story (in a bit less colorful words than here) to security@linkedin.com and went back to work, expecting the email from LinkedIn with heartfelt thanks and promises of speedy fix implementation.

The Security Response

Of course, that is not what happened. Instead, what happened that I have got an answer from some very helpful individual from frontline support, asking me for “detailed information about your problem and if you think it might help, attach a screenshot, too“. As I have just spent significant time on composing big encrypted email full of details, I was a bit confused as to which details I was missing and where screenshots may be useful there, but I have not relented at first and wrote second explanation of the issue. The response was:

1. LinkedIn support took the extraordinary security measure of logging me out of all my current sessions with LinkedIn.
2. They advised me not to write down my password in publicly accessible places and suggested that if I continue to leave my computer sitting around in public places without logging out, bad things may happen to my account. My sincerest pleas that such thing never happened and the problem I am talking about is not because I forgot my laptop in a pub while being drunk (and so, apparently, did my coworkers) were met with utter disbelief. They also instructed me to not use my LinkedIn password on other sites and gave me a full page of very useful boilerplate password security advise, as prudent as having no relation to the case being discussed.
3. They assured me that my account was not compromised (which I never implied) and my password is safe.
4. They assured me that “The only way to add an email into an account is via the settings after logging in.”

By that time I was sure nobody at LinkedIn is going to believe me there’s a problem (beyond my implied propensity to leaving my laptop around and thus letting strangers add emails to my LinkedIn account) so I decided I’ve done my responsible disclosure part and should not spend more time on it. However, then I’ve got another email from LinkedIn stating this:

Sometimes, when a member accepts an invitation to connect that was sent to an email distribution list, that list becomes associated with the member’s account.

Please be assured that no one on the distribution list would be able to use the password reset link to access your account unless they knew both your email address and your password.

The first part, of course, completely belies the claim that “The only way to add an email into an account is via the settings after logging in.“, as apparently the other way is to send an invitation via the email list and have it accepted. The second part, however, can not be true, as password reset link can not require anybody to know the password – such link would be completely useless, and they do not even need to know my email – only the list email. But this provided the confirmation and brings us to the conclusion.

The Conclusion

  1. There is, indeed, a way to inject group email address into your LinkedIn account, LinkedIn knows about it and they don’t see any problem with it. Most probably, this can be done by sending an invitation for a person to connect to a mailing list. You can imagine the social engineering possibilities.
  2. While you can see the target email in the email connect invite from LinkedIn, you can not see it, AFAIK, in the LinkedIn web interface, which makes “group” invite indistinguishable from a regular one.
  3. There is, and probably will be for a foreseeable time, a way to use that group email address to reset your password using that group, by anybody who has access to group emails.
  4. LinkedIn knows about the issues outlined above but they do not perceive it as a security issue.

The Advice

So here’s some advice if you have a LinkedIn account:

  1. Enable two-factor on your LinkedIn account NOW.
  2. Check your email list (go to Settings, click on “Account” and then “Add & change email addresses”) and see if you don’t have any unknown emails there. Do that at regular intervals, especially after accepting connections.
  3. Do not accept connections from strangers that you do not recognize. 
  4. Do not expect big companies to have a meaningful way to report a security problem.

And a wishlist for LinkedIn:

  1. Make password request only work with primary email.
  2. Make associating an email with the account always an explicit action.
  3. Have some way to escalate security issues. 

If you have any additional info or ideas on this topic, please feel free to comment.

PHP Spec – a dream come true

Almost 8 years ago, I wrote “What is PHP anyway?“. This blog is supposed to be about some long-term dreams, and in this case it was the dream come true – Sara Golemon and the excellent Facebook team made a draft PHP spec and with some paint and polish it can become a real spec pretty soon. Not sure if it can be ready by the 8th anniversary of that post, but it probably will be out by the 9th :)

Talking to people, I recently discovered not everybody knows this thing exists. So here it goes – it exists right here. It is still a draft. If you see something wrong, submit a pull request. If you feel you can contribute more by working on it or refining some points, “standards” mailing list was re-purposed to be the working group list.

PHP 5.6 – looking forward

Having taken a look in the past, now it’s time to look into the future, namely 5.6 (PHP 7 is the future future, we’ll get there eventually). So I’d like to make some predictions of what would work well and not so well and then see if it would make sense in two years or turn out completely wrong.

High impact

I expect those things to be really helpful for people going to PHP 5.6:

Constant expressions – the fact that you could not define const FOO = BAR + 1; was annoying for some for a long time. Now that this is allowed I expect people to start using it with gusto.

Variadics – while one can argue variadics are not strictly necessary, as PHP can already accept variable number of args for every function, if you’re going to 5.6 the added value would be enough so you’d probably end up using them instead of func_get_args and friends.

Operator overloading for extensions – the fact that you can sum GMP numbers with + is great, and I think more extensions like this would show up. E.g., for business apps dealing with money ability to work with fractions without precision loss is a must, and right now one has to invent elaborate wrappers to handle it. Having an extension for this would be very nice. Finding a way to transition from integer to GMP when number becomes too big would be a great thing too.
Still not convinced having it in userspace is a great idea, what C++ did to it is kind of scary.

phpdbg – not having gdb for PHP was for a long time one of the major annoyances. I expect to use it a lot.

Low impact

Function and constant importing – this was asked for a long time, but I still have hard time believing a lot of people would do it, since people who need imports usually are doing it in OO way anyway.

Hurdles

OpenSSL becoming strict with regard to peer verification by default may be a problem, especially for intranet apps running on self-signed certs. While this problem is easily fixable and the argument can be made that it should have been like this from the start – too many migrations go on very different paths depending on if it requires changing code/configs or not.

Adoption – again, with 5.5 adoption being still in single digits, I foresee a very slow adoption for 5.6. I don’t know a cure for “good enough” problem and I can understand people that do not want to move from something that already works, but look at the features! Look at the performance! I really hope people would move forward on this quicker.

While 5.4 will always have a special place in my heart, I hope people now staying on 5.2 and 5.3 would jump directly to 5.6 or at least 5.5. The BC delta in 5.5 and 5.6 is much smaller – I think 5.3->5.4 was the highest hurdle recently, and 5.4 to 5.5 or 5.6 should go much smoother.

Anything you like in PHP 5.6 and I forgot to mention? Anything that you foresee may be a problem for migration? Please add in comments. 

PHP 5.4 – looking back

With 5.6.0 having been released and 5.4 branch nearing its well-earned retirement in security-fixes-only status I decided to try and revive this blog. As the last post before the long hiatus was about the release of the 5.4, I think it makes sense to look back and see how 5.4 has been doing so far.

Great

Release process. Combined with RFCs and git. It’s hard to believe we used not to have it. RFC process is working great, git makes all the processes tick and we have scheduled releases, working CI setup and much better predictability and management of releases overall. It’s no big deal unless you remember how it was before.

Built-in webserver. It really helps when you can just set up something browseable (is this a word? now it is) with PHP alone, without bothering with Apache setup and other moving parts. This is again a case of something that you don’t realize how much you missed it until you start using it.

$this support in closures. Having to write 5.3-compatible code for the last couple of years, I can’t emphasize enough how sorely it was missing in 5.3. I really regret the fact we could not get it into 5.3.

Syntax sugar like [] and <?= working everywhere. It’s a small thing but it adds up. I usually do not give much weight to saving couple of keystrokes and so on, but these to me really improve coder’s quality of life.

Removal of old “features” (since they ended up in the dump, is it right to call them features anymore?). Nobody is missing the safe mode or magic quotes or register_globals. Good riddance. Wish we parted ways sooner.

Meh

Traits. I must say I haven’t seen big adoption of the traits feature. Yes, of course people use it, there are tutorials, there are articles, etc. But at the same time compared to how much namespaces were needed or how much closures proved to be a great help, traits adoption, IMHO, remains lukewarm at best. To me, it has not lived yet up to its promise. Maybe I’m missing something, tell me if you have great examples there.

Adoption. This is a problem for new PHP versions and for developers of distributable PHP software – PHP versions are becoming “good enough” and people are reluctant to move forward, which also delays adoption of new features by library & packaged software writers. Look at the numbers: almost 3/4 of the PHP developers are using EOLed versions! 5.4 adoption is low at 22% and 5.5 adoption is abysmal. I hope that more streamlined release cycles and heightened attention to BC matters would bend this tendency. But so far it is not encouraging. WordPress numbers look even worse.

Don’t know

callable type. How wide is the usage? How useful it is in practice? Is it being used in major projects? I really don’t know.

Performance. It feels weird to put an obviously great improvement in this category. I would expect performance be a major driver for people to move forward, but the numbers suggest otherwise. As much as I love the performance improvements (a lot!), I really have no idea on how much it influenced the community and made them go to 5.4 (or beyond). Are there any surveys, links, studies, etc. in this regard? I see a lot of talk about performance but how many people also walk the walk? Are the numbers quoted above misleading?

mysqlnd. 5.4 is the first version where mysqlnd is the default mode of doing mysql. It has better performance, great features and I’ve been using it for years without a problem. But how widely it is adopted – do people still prefer the old way or love the new one? Do they use the plugin API widely?

Anything else?

Did I forget to mention something that really made your life better in 5.4? Did I conceal some flop that you wish we didn’t do? Please tell.