Default constructors

Consider the following code:

class Animal {
    protected $what = "nothing";
    function sound() {
        echo get_class($this)." says {$this->what}";
    }
}

class Cow extends Animal {
    protected $what = "moo";
    protected $owner;
    public function __construct($owner) {
        $this->owner = $owner;
        // parent::__construct(); (?)
    }
}

$a = new Cow("Old McDonald");

This code represents a simple class hierarchy. Now let us consider the line marked by (?). Of course, we cannot call the parent ctor there, since the parent does not have one. But let’s say we refactored the base class and added a ctor that does some work:

class Animal {
   protected $born;
   public function __construct() {
      $this->born = time();
   }
}

Seemingly, we didn’t do anything wrong here, right? But now our code is broken, since Cow::__construct does not call Animal::__construct, so $born is never set for a Cow. So we should go to every class extending Animal and fix it. The problem is that we could not have avoided this: unless we stick an empty ctor into Animal when it doesn’t need one, we cannot call it from Animal’s child classes. Sticking an empty ctor into every class in case we’d ever want to extend it does not sound like a nice idea. Not being able to add a default ctor (i.e. one not needing any parameters) to a base class is also not good.
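
One stopgap available today is to guard the call in the child class. This is only a minimal sketch, assuming a runtime check via method_exists() is acceptable; it is boilerplate that every child ctor would have to repeat, which is hardly nicer than the empty ctor. (For the sake of the example, sound() returns the string instead of echoing it.)

```php
<?php
class Animal {
    protected $what = "nothing";
    public function sound() {
        // Returns the string rather than echoing it, to keep the sketch easy to check.
        return get_class($this) . " says {$this->what}";
    }
}

class Cow extends Animal {
    protected $what = "moo";
    protected $owner;
    public function __construct($owner) {
        $this->owner = $owner;
        // Guard: call the parent ctor only if the parent actually defines one.
        // If Animal later gains a __construct(), this branch starts running.
        if (method_exists(parent::class, '__construct')) {
            parent::__construct();
        }
    }
}

$a = new Cow("Old McDonald");
echo $a->sound(), "\n";
```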

So what if we make the default ctor always exist? If it’s not defined, calling parent::__construct() would do exactly nothing. But if we ever implement it, all the child classes will be ready.

In fact, in Java, for example, it is mandatory to call the parent ctor, and if the class has none, the default one is supplied by the language.
PHP does not enforce this, but it is very rarely a good idea not to call it. Right now, PHP does not let you do the right thing here, but it should.

unserialize() and being practical

I have recently revived my “filtered unserialize()” RFC and I plan to put it to vote today. Before I do that, I’d like to outline the arguments on why I think it is a good thing and put it in a somewhat larger context.

It is known that using unserialize() on outside data can lead to trouble unless you are very careful. In large enough projects that usually means it always leads to trouble, since in practice you can rarely predict all interactions among a million lines of code. So, what can we do?

Of course, the first thing would be to never use unserialize() in this context, and this means no problem, right? However, this approach has the following issues:

  1. It goes against what is natural for people to do (using PHP native serialization mechanisms) and what is widely done in the field. Usually when you try to work against what is natural for people to do, it is an uphill battle where losses are much more frequent than wins. Doing the right thing should be easy, and if it is not, then the chance that the right thing is not done rises accordingly. From that perspective, anything that makes doing the right thing easier is a benefit.
  2. There is no other mechanism which matches serialize() by capability but does not have its issues. Yes, I know in many cases data being serialized is simple enough so JSON or something akin to it would suffice. But sometimes it may not, and in that case we need some solution too. Let’s say we said using JSON is a best practice. However, let’s say one finds a rare corner case where it is not enough. What would we offer in that case? If we do not provide any solution, people would do homebrew solutions, and many of these will be done wrong.
  3. Contexts change, and what was an internal context before may suddenly become exposed, and then you may be in for an expensive refactoring effort if no other solution is available.

So that is why I think we should have a middle ground between “never use unserialize() on external data and if you do, you’re going to hell and we’re not going to talk to a sinner like you until you repent and rewrite all your code” and “let’s rewrite PHP library functions in PHP because that’s what it takes for our code to work”. I think the RFC is a practical solution which allows your code to be more predictable (i.e., less prone to security issues) while allowing you to work with your code as it is and not requiring extensive rewrites.
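
To make the idea concrete, here is a minimal sketch of filtering by class. It assumes the allowed_classes option of unserialize() as it exists in PHP 7.0 and later; the class names Cart and Gadget are hypothetical and stand for a class you expect and one you do not.

```php
<?php
// Hypothetical classes, for illustration only.
class Cart   { public $items = ["milk"]; }
class Gadget { public $payload = "unexpected"; }

// Pretend this string arrived from outside the application.
$input = serialize([new Cart(), new Gadget()]);

// Only Cart may be instantiated; any other class in the stream
// is turned into __PHP_Incomplete_Class instead of a live object.
$data = unserialize($input, ['allowed_classes' => ['Cart']]);

var_dump($data[0] instanceof Cart);
var_dump($data[1] instanceof __PHP_Incomplete_Class);
```

The point of the design is that the existing serialized data and the surrounding code stay as they are; only the call site states which classes it is prepared to receive.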

Is this a security measure? I removed the reference to “security” from the RFC title because I think it has led the discussion in the wrong direction. Yes, it does not provide perfect security, and yes, you should not rely only on it for security. Security, much like ogres and onions, has layers. So this is trying to provide one more layer, in case that is what you need. I think it improves security, but I’d much rather concentrate on the useful options that it adds to the programmer’s toolkit than on the semantics of the term “security” and its implications.