We are doomed!

Generally I enjoy reading all kinds of “PHP sucks” and “PHP is doomed” articles, of which there’s no shortage. First, many times the authors have very interesting ideas on places where PHP does suck and can use improvement – and the more good ideas the merrier. Second, once you read a dozen of them you can’t help noticing how people say PHP sucks and is doomed for entirely contradictory reasons which makes it fun. And last but not least – if people write about flaws of PHP it means they care. Nobody writes about why PL/1 sucks and how Clipper is doomed ;)

Recently on PHP blogs I saw a reference to one blog entry – named “PHP is Doomed!” of course – that proclaims PHP is doomed for one single reason – it doesn’t support multithreading. I personally think that claiming PHP is doomed because of that just shows that the author does not take the right perspective of what tasks PHP is meant to do and how in general Web applications work.
Having said that, however, I must also note that there are some actions which you may want to do asynchronously. While there are a bunch of possible solutions for that – from implementing async I/O in an extension to using job managers like Zend’s job queue – PHP also does have kind of “multithreading” inside.
This is something named “ticks” – I wonder how many PHP developers have heard about it and, of those, how many have actually used it. Could it be used for offloading long-running I/O-bound tasks or grouping them together (e.g. so we could wait for DB and HTTP in parallel and not sequentially)? Would there be any use at all for such functionality and if so – how is it supposed to work? I.e. how would you know it’s done and how would you collect and use the results?


22 thoughts on “We are doomed!”

  1. Ticks can __not__ be used for async I/O, nor do they allow even pseudo-multithreading.

    A tick is an event that occurs for every N low-level statements executed by the parser within the declare block.

    Given this example code:

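    // Second tick handler: blocks for about three seconds every time it runs.
    function my_function2()
    {
        echo "\trunning tick function2:\n";
        for ($i = 0; $i < 3; $i++)
        {
            echo "\t\trunning tick loop2: $i\n";
            sleep(1);
        }
    }

    // First tick handler: same shape, also fully blocking.
    function my_function()
    {
        echo "\trunning tick function:\n";
        for ($i = 0; $i < 3; $i++)
        {
            echo "\t\trunning tick loop: $i\n";
            sleep(1);
        }
    }

    // Handlers run in registration order on every tick.
    register_tick_function('my_function');
    register_tick_function('my_function2');

    declare(ticks=1)
    {
        // Loops forever on purpose; stop the script with Ctrl+C.
        while (1);
    }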

    You get this result…

        running tick function:
            running tick loop: 0
            running tick loop: 1
            running tick loop: 2
        running tick function2:
            running tick loop2: 0
            running tick loop2: 1
            running tick loop2: 2
        running tick function:
            running tick loop: 0
            running tick loop: 1
            running tick loop: 2

    Note that each tick-based function is fully blocking. Also, if there were real code inside the declare control structure, it would of course suffer from the blocking too.

    Ticks are almost always best left unused. The lack of real threading in the model serves only to obfuscate the code’s behavior; combined with the absence of async I/O, you get a very unfriendly environment for applications needing multiple concurrent I/O operations.
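
To make the "fully blocking" point concrete, here is a minimal sketch (not the commenter's code) that records the order in which ordinary statements and a tick handler run. The names `$log` and `$onTick` are made up for illustration. The handler only ever runs between statements of the declare block, in the same thread, so nothing overlaps.

```php
<?php
// Hypothetical log of everything that runs, in order.
$log = [];

// Tick handler: appends a marker each time PHP fires a tick.
$onTick = function () use (&$log) {
    $log[] = 'tick';
};
register_tick_function($onTick);

declare(ticks=1) {
    // Ticks fire only after statements compiled inside this block.
    $log[] = 'step 1';
    $log[] = 'step 2';
}

// The handler's entries are interleaved with the steps, never concurrent with them.
print_r($log);
```

If the handler slept for a second, each step would simply be delayed by a second, which is the behavior shown in the output above.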

  2. Mauro –

    Thanks! I hope it helps too! While I rarely use PHP, when I do, I always seem to spend a lot of time finding what I need in the docs. I think a lot of it is the usage of a generic search engine (Google) for the searching, rather than something customized for just that purpose. That’s why a lot of my suggestions are organizational, not content-related.

    J.Ja

  3. Justin,

    I think you’ve made really good points… I really like some of your suggestions. I know Stanislav is an important person in the PHP community, hope he also likes them.

    Regards,
    Mauro.

  4. Mauro –

    That link was exceedingly helpful, thank you! Interestingly enough, probably the only piece of useful information that I ever found on Oracle’s site (Oracle takes the prize of “worst documentation in history”, on that note).

    J.Ja

  5. PHP’s documentation is no worse than the JavaDocs, and usually better. Unfortunately, it is a strictly “how based” set of information, and a developer like me also wants (and often needs) a “why based” set of documentation. Microsoft’s documentation occasionally goes in this direction, and Perl’s is almost entirely like this (often at the expense of “how”). In other words, it is a reference to show you what parameters are needed, but it rarely gives any clue as to the internal workings, what the alternatives are, and where/when/why you would do something one way and not another way. Documentation like that is frequently a cause of less-than-expert coders cranking out bad code. A great example of this is all over Microsoft’s .Net documentation. They have examples of nearly everything, but the examples 1) frequently show poor coding practices and 2) are often simply showing the method or property or function being used within a useless context.

    As an example of the difference, Perl’s documentation nicely shows multiple ways to slurp a file, and shows how to do it line by line (by far the most common usage) and explains why that is better for most uses. .Net, Java, and PHP’s documentation most likely are happy to show you how to slurp a file into an array with one command, without ever explaining why you probably would rarely want to do that. Remember, many, many coders simply copy/paste from the docs and change the variable names!

    This session thing is another great example. Understanding how session access works in a Web server environment is crucial. Yet, it is up to a commenter to put up the needed information. Useful and accurate information from the comments should be rolled back into the documentation, especially since the comments are often contradictory or out-of-date.

    It really is not something where submitting a doc bug will help, it requires a thorough change in mindset. Again, it is hardly PHP-specific. It is rare to find truly excellent documentation (MySQL’s documentation is quite good as well, as is Apache’s and the FreeBSD Handbook). Things to improve PHP’s documentation:

    * “Getting started with XYZ” sections which could show off a number of different functions *in context* with good, PHP-style coding practices… a great example would be a simple, database driven page that demonstrates connecting to a database, parameterization (and why!) and so on.
    * An FAQ.
    * More internal details, like how sessions really work.
    * A thorough review of the comments and rolling into the documentation information from comments that is accurate and useful, or refreshing/expanding documentation on things that people often show confusion about.
    * A breakout of the information into “reference only”, “beginner”, “intermediate”, and “advanced” with the user choosing how much detail they want to see on a global level with the ability to see the other levels on a page-by-page basis. This would let a PHP wiz who just needs to be reminded of a parameter see that, while someone like me could see the long, drawn out details of internal operations, while the newcomer could be given information at their level. This would be a *killer feature*.
    * The ability to find what you want by functionality, not just module/function name. Someone who has no idea what they are looking for will end up using the first thing they find which looks like it might work, instead of doing it the right way. For example, group it by things like “string processing” and “database connectivity”. Someone not familiar with PHP might not know that “PCRE” is where to find regex information for example, but they would definitely go to a “string processing” category!

    You have to remember, you are a pretty darned smart guy, much smarter and more experienced than your average programmer (or at least that is the impression I get from your writing!). Whether or not the docs make sense to you is probably not a good yardstick. Put yourself in the shoes of a first year CS student with no background in programming, and ask yourself if the documentation is adequate; that is the real yardstick. Because most developers really are shake-n-bake folks who can glue libs together and not much more. They need better docs to learn how to do things right and to help them grow.

    J.Ja

  6. Well, I was not trying to compare PHP with C, it was just a joke. That was just a part of my argument.

    While I don’t totally agree with saying that PHP is a poorly documented language, I have to tell you that more than once I’ve discussed this same issue in lots of places and people don’t really know what’s happening under the hood. I don’t think it should be the first thing to mention in the manual, but maybe an appendix or something like that could help someone who wants to know more than the basics. Besides that, I think the PHP manual is a great reference: it gets to the point when it needs to and includes references all over the place (also, user comments usually show you tips and very good tutorials).

    Now, regarding the session approach, it has its advantages and its drawbacks. While it offers a safe alternative to a great majority (I can’t really talk about percentage numbers), it gives complete freedom to the advanced user who wants to implement his own session handler module. Try a google search: you’ll find PHP session handlers of all kinds (database, memcached, sharedance, mohawk, etc etc). IIRC this approach was chosen because it plays well with the shared-nothing architecture intended for the language (*).

    Regards,
    Mauro.

    (*) http://www.oracle.com/technology/pub/articles/php_experts/rasmus_php.html

  7. I can not agree that PHP is “poorly documented”. PHP is documented quite well, though indeed some things may be missing. If you feel something is really missing, contact the docs team or just submit a documentation bug.

    As for sessions, I agree it can make some scenarios (like opening a massive number of pages from the same site simultaneously) slower – however, most web site accesses differ from this scenario, and where a specific site does work that way, one can use specific techniques – like closing the session ASAP instead of waiting for the request to end – to improve matters. Or even implement a custom session module, if concurrent sessions are really that critical.

  8. Mauro –

    Yes, I actually discovered that behavior about 2 weeks ago, while researching a blog about Web frameworks. I have two things to say about it, really:

    1) Has anyone wondered why the PHP documentation does not mention it? ASP.Net’s documentation has a decent paragraph or two explaining, in depth, how the Session object handles this scenario. Java’s documentation pretends that the topic does not exist, which is typical for the JavaDocs. PHP’s documentation does not mention it… if I had not pored through the comments (and only one comment mentioned concurrency, as far as I could tell), I never would have noticed it. And frankly, I am much more inclined to believe the documentation than a comment, particularly an older comment. So, I pose the question to the PHP folks: why is PHP such a poorly documented language?

    2) In general, I am of two minds about this, in terms of safety vs. performance. On the one hand, it kills the performance from the end user’s point of view. After all, they landed on a home page, scanned for links, did a ton of open in new window/tab, and half the pages took forever to load because they all blocked (it certainly explained a lot of odd site behavior to me, when I learned this!). On the other hand, it *does* guarantee no concurrency issues, and it was dead simple to code, no doubt. It simply puts a file lock on the session file until the session close command (I forget it offhand) is called or the process ends.

    The reasons why C is still around make sense to me. It is capable of producing applications that perform very fast (if you know what you are doing) and it is low level enough to be insanely flexible and do anything (again, if you know what you are doing). It is more or less a less labor-intensive assembly language. Comparing a fairly high level language like PHP (or Perl or Java, or any of the .Net languages, etc.) to C really is not fair. I may add, C’s low level nature makes it able to be adapted to new platforms with great results. Look at what Intel is doing with the Ct compiler if you are curious… blowing away other compilers *with less code*. I am not 100% sure how much of the Ct compiler collateral and information is public at this point, but from everything I’ve read, it works wonders. You simply are not going to see those kinds of gains with a more high level language, because the core language itself would undergo too many changes. I am sure we’ve all heard the legend that C was designed to have no more functionality than a thermostat would use for basic logic built into it, and I believe it. While this means that everything useful is in a library, it also means that the base language itself is not impacted if the libraries or compilers radically change. Higher level languages do not see this, simply because there is too much functionality built into the language itself.

    Wow, that was a massive digression, but an important one.

    At the end of the day, I think PHP’s approach to session handling has the merit of simplicity and is foolproof. And being foolproof is especially important when you consider how many programmers (particularly Web developers) have zero concept of what “concurrency” means, let alone the dangers inherent in a non-thread safe system. If you want to keep the 80%+ of programmers who are completely clueless safe, PHP’s approach is completely rational. On the flip side, it makes life miserable for a “better than shake-n-bake” programmer who is working on a system where blocking on a session is not acceptable.

    J.Ja

  9. Stas – “Though PHP of course has support for some locking mechanisms and can easily support any mechanism available to C programmer.”
    Justin – “The default function list does not list it; I believe there is an add on to support these things, but it is not in the default install, sadly.”

    Just to clarify: By default, PHP uses disk based sessions. That is, a file with the content of the session is stored on disk. Whenever a user opens a session (session_start()) a lock is acquired on that file, so if another request tries to open it, it has to wait until the first one finishes. That’s why the session_write_close() function exists: to allow the programmer to release the session when he knows it won’t be used anymore.

    So, by default, there’s no concurrency problem with sessions if you have only one webserver because requests are serialized.
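
The pattern described above can be sketched roughly as follows, assuming the default file-based session handler. The `visits` key, the temp-dir save path and the `usleep()` stand-in are illustrative choices, not anything from the manual or from this thread.

```php
<?php
// Assumption: default file-based session handler. The save path is set to the
// system temp directory only so that this sketch can run anywhere.
session_save_path(sys_get_temp_dir());

session_start();                      // takes the lock on this session's file
$_SESSION['visits'] = ($_SESSION['visits'] ?? 0) + 1;
$visits = $_SESSION['visits'];        // copy out what the rest of the request needs
session_write_close();                // writes the data and releases the lock

// From here on, other requests for the same session are no longer blocked.
usleep(100000);                       // stand-in for a slow query or HTTP call

echo "visit number: $visits\n";
```

The earlier session_write_close() is called, the shorter the window in which parallel requests from the same user have to wait.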

    If you start to increase the number of webservers, you can always use session_set_save_handler() and use your preferred distributed locking mechanism. Also, I recommend Sharedance ( http://sharedance.pureftpd.org/project/sharedance ).

    So, I tend to agree with the need of having some sort of multithreading support, but I don’t think PHP is doomed because of that. Also, it’s not too late to add anything. I think languages are not doomed because of changes in processors or hardware trends. You have to talk about syntax, learning curves and, why not, marketing strategies. After all, C is pretty “old” ;)

    Regards,
    Mauro.

  10. The only reason PHP could ever really be doomed is if that beech decides to give away MSDN, .NET, ASP.NET and all those for free, with MS library books for free too... nah, that still won’t work :d

    long live we all php-eaters

  11. Pingback: developercast.com » PHP 10.0 Blog: We are doomed! (and Ticks in PHP)

  12. Pingback: PHPDeveloper.org

  13. “I think, especially in .Net, we will see more dynamic languages get used within the system (also things like Groovy in J2EE), but I think that most shops would rather learn 1 framework and 1, maybe 2 languages, than try to work with PHP/Perl/Ruby/Python/whatever code that is completely divorced from the rest of their code.”

    Well, I certainly agree that some shops do it and some don’t, and of course from my perspective I see more shops that do since ones that don’t just do not work with us :) But I see this trend exists and some major players (including companies like IBM or Microsoft or Sun) take very real interest in dynamic languages and interfacing them with enterprise operational backends.

    “On that note, I think that if PHP beefed up its Web Services, SOAP, etc. support, it could reinvent itself as the presentation layer language of choice.”

    That’s what we are doing in PHP itself and in Zend Framework. We are only at the beginning, and we plan to support more and more things.

  14. “Fortunately, concurrent accesses to the same session are quite rare due to the nature of the web application (one user has one brain, one set of eyes, one mouse and one keyboard, so they usually would have just one browser and one application session going on at the same time), thus serializing on sessions is usually OK.”

    I agree 100%. Tabbed browsers are making it more likely, and AJAX makes it *substantially* more likely (after all, when every little mouse click kicks off a request to the server…). But it is still a pretty remote possibility. What is much more likely are two requests from the same user accessing the same external resource, like the database, an external file, etc., or two requests from different users trying to get a hold of the same resource or framework maintained object (someone, please take the “Application” object out of these things!).

    “Though PHP of course has support for some locking mechanisms and can easily support any mechanism available to C programmer.”

    The default function list does not list it; I believe there is an add on to support these things, but it is not in the default install, sadly.

    “In fact, it is very hard to enforce full data consistency without either serializing the accesses (and thus sacrificing at least some scalability) or greatly complicating the code with locks, mutexes and synchronization features – on the application level. Most code dealing directly with multithreaded data is very complex and very hard to make right, so requiring PHP programmers to deal with it would be a disservice.”

    Again, no argument from me! I have been saying the same thing for a while. At the end of the day, it is nigh impossible to reliably get a Web based “application” to meet the levels of thread safety (again, every Web app is multithreaded, unless you have a Web server which puts requests per user into a one by one queue) that client/server apps were meeting, say, 20 years ago, or mainframe apps met 30 years ago. This is why I am a huge proponent of ditching the idea of Web based “applications”. Web sites and dynamic Web sites? Great thing. Web applications? Not even “half baked”. HTTP was there and available, and malleable enough to kludge this junk together. For a true “application”, we would be much better off using something like X; indeed, it would be more reliable, better tolerant of network connectivity problems, and use less bandwidth (compared to AJAX), and possibly even less CPU & RAM. Heck, keeping one thread open and running for the window is a lot less rough than the constant building up and tearing down of HTTP connections, interpreter sessions (or J2EE/.Net threads), and so on. At the end of the day, trying to write a stateful system that maintains a connection over a connectionless, stateless protocol is plain dumb.

    “I think it makes much more sense – and it is actually what we see happening nowadays in the industry – for companies to combine strengths of .net/J2EE platforms with strengths of the dynamic languages.”

    I would love to agree with this (it is the direction I am trying to take my project, in fact), but I am not seeing it. Most shops really would rather use the wrong tool than add another language to the maintenance mix. I think, especially in .Net, we will see more dynamic languages get used within the system (also things like Groovy in J2EE), but I think that most shops would rather learn 1 framework and 1, maybe 2 languages, than try to work with PHP/Perl/Ruby/Python/whatever code that is completely divorced from the rest of their code.

    On that note, I think that if PHP beefed up its Web Services, SOAP, etc. support, it could reinvent itself as the presentation layer language of choice. Let the enterprise folks handle the tricky stuff like concurrency in the middle layer in .Net or J2EE, but have PHP do the front end. That would play quite nicely to each system’s strengths, and the natural separation would make the mixing of languages and environments less of a concern.

    J.Ja

  15. Speaking of sessions, sessions do have a locking mechanism which prevents concurrent modification. However, sessions don’t need PHP to multithread and as a storage mechanism they are quite unique – many other data sharing mechanisms do not have such a property built in. Though PHP of course has support for some locking mechanisms and can easily support any mechanism available to C programmer.
    Fortunately, concurrent accesses to the same session are quite rare due to the nature of the web application (one user has one brain, one set of eyes, one mouse and one keyboard, so they usually would have just one browser and one application session going on at the same time), thus serializing on sessions is usually OK. If you need a massively shared data structure, there can be trouble; however, I don’t see how concurrent scripts would help.

    In fact, it is very hard to enforce full data consistency without either serializing the accesses (and thus sacrificing at least some scalability) or greatly complicating the code with locks, mutexes and synchronization features – on the application level. Most code dealing directly with multithreaded data is very complex and very hard to make right, so requiring PHP programmers to deal with it would be a disservice.

    As for comparing to J2EE and .Net, I absolutely agree with you that PHP is better for some tasks and not better for many others. The trick is to use the right tool for the job, and one who insists on using the same tool everywhere would end up using inadequate and inefficient tools in many cases. Thus I must disagree that most shops would choose .Net and J2EE and for that reason abandon PHP. I think it makes much more sense – and it is actually what we see happening nowadays in the industry – for companies to combine strengths of .net/J2EE platforms with strengths of the dynamic languages.

    And if anything, I do see serious demand for good PHP programmers – actually, Zend has a number of openings right here :).

  16. Thanks for taking the time to read an older post of mine, and stirring up some interesting talk about it! I have a point or two that I would like to respond to, of course. :)

    “I personally think that claiming PHP is doomed because of that just shows that the author does not take the right perspective of what tasks PHP is meant to do and how in general Web applications work.”

    Yes, and no.

    Multithreading is a tool that more and more developers working on logically complex projects will be needing. For a less complex project, multithreading is really not needed, and often hurts performance. Indeed, if you look at the environment that PHP is typically running in, Apache (or some other HTTP server) handles the multithreading, and the assumption is that no single request will need super power. After all, if your application is so hardcore that you need to do something like block requests so that each user may only have one request running at a time, you will end up writing your own app server anyways.

    It is also interesting to see the breakout of these things. WordPress is a good example. Logically, it is a very simple application. The volume that it handles can be huge. But very, VERY little occurs with each request. As such, multithreading is really not needed. In fact, for many, many applications that PHP (and other Web development languages, for that matter) are used for, it would be quite possible to have the backend just generating static HTML and updating links and such as needed; it would be just as “labor intensive” the first time just doing it once and letting the Web server efficiently dish out the static HTML.

    However, for a true “application” (subtly differentiated from a Web site, or even a “dynamic Web site”), multithreading is the way of the future, and even the way of the present. Take a look at your CPU history. Up until a year or two ago, Moore’s law made multithreading fairly irrelevant outside of the base OS, and allowing your “Cancel” button to work. No more. Moore’s law continues, but only if you add up the speeds of all of the cores on a physical CPU. The speed per core has barely budged in quite some time! And it will be a while yet before we break 4 GHz. In other words, the current (and future) crops of CPUs are great multithreaders, but are really not so hot for non-multithreaded work.

    Finally, there is the matter of concurrency. Without support for multithreaded concurrency devices like mutexes, semaphores, critical sections, monitors, and other locking mechanisms, a highly intensive application blows up quite easily. Without them, too often a database sequence or database constraint is the only arbiter of data integrity. Take the following line of code, for instance (pardon my syntax please):

    $session(“Some integer”) = $session(“Some integer”) + $request(“Some numeric type”);

    Looks good, right? Wrong! No locking! You have *zero* guarantee that $session(“Some integer”) does not change between the read to perform the addition, and the assignment. While the chances of hitting this are next to zero, it illustrates exactly why *every* Web developer needs to know and use locking. Period. A language that cannot support locking is generating applications that are prone to failure under heavy loads in which concurrency problems become quite possible. Doomsday scenario? Maybe. But the applications I have worked on in the past provided zero room for failure. Or, to put it another way, would you want the radar system guiding your plane or the health devices used to deliver your baby to have a failure of this variety, even if it is at the “snowball in hell” level of probability? Most likely not.
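
For shared data that lives outside the session (where, as other comments in this thread note, the default handler already serializes access), a read-modify-write like the one above can be guarded with an advisory file lock. This is only a sketch: `locked_add()` and the counter file are hypothetical names, and `flock()` protects only against other code that takes the same lock.

```php
<?php
// Hypothetical helper: add $delta to an integer stored in $path while holding
// an exclusive advisory lock, so the read and the write cannot be interleaved
// with another process doing the same thing.
function locked_add(string $path, int $delta): int
{
    $handle = fopen($path, 'c+');     // create if missing, do not truncate
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        if (!flock($handle, LOCK_EX)) {
            throw new RuntimeException("Cannot lock $path");
        }
        $current = (int) stream_get_contents($handle);
        $updated = $current + $delta;
        ftruncate($handle, 0);        // replace the old value
        rewind($handle);
        fwrite($handle, (string) $updated);
        fflush($handle);
        flock($handle, LOCK_UN);
        return $updated;
    } finally {
        fclose($handle);
    }
}

$counterFile = tempnam(sys_get_temp_dir(), 'counter');
locked_add($counterFile, 5);
$total = locked_add($counterFile, 7);
echo $total, "\n";                    // prints 12
unlink($counterFile);
```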

    This is why I think that PHP’s chances in the enterprise are really limited. Is it a good system? To be frankly honest, I think PHP is pretty bad. Is it better than J2EE or .Net? Absolutely, for some tasks, and no way for others. I do not like J2EE or .Net that much either. At the end of the day, most shops can only afford to support one, maybe two systems. That means .Net and J2EE, with PHP as the “odd man out”. I think a search through Monster will confirm that PHP developers are not in high demand. It is a good system for something in which user input is overwhelmingly only requests for data (a “dynamic Web site”) but for “applications”, PHP misses the mark. Multithreading is a good start. I could go on and on about PHP’s shortcomings, but I think that the lack of multithreading support immediately rules it out as a viable option for “applications” at this point.

    J.Ja

  17. I created an IRC/gaming “bot” in PHP which connects to 4 different TCP connections (and when needed, sends stuff out on a 5th UDP socket) using a sort-of pseudo-multithreading environment built on socket_select, which works insanely nicely. I experimented with using ticks as well, but they didn’t end up being needed: the timeout of socket_select in a loop took care of that for me, at least for what I am doing with it.
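
A rough sketch of that select-loop idea follows. It swaps in stream_select() (part of core PHP) for socket_select() (from the sockets extension), and uses in-process socket pairs as stand-ins for real network connections. It assumes a Unix-like system because of STREAM_PF_UNIX, and the variable names are illustrative.

```php
<?php
// Two in-process socket pairs stand in for the bot's network connections.
$pairA = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$pairB = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

// Simulate both peers sending one line each.
fwrite($pairA[1], "hello from A\n");
fwrite($pairB[1], "hello from B\n");

// Map resource ids to friendly names so ready streams can be identified.
$names = [(int) $pairA[0] => 'A', (int) $pairB[0] => 'B'];
$received = [];

// Wait on both connections at once; the attempt cap keeps the sketch finite.
for ($attempt = 0; $attempt < 10 && count($received) < 2; $attempt++) {
    $read = [$pairA[0], $pairB[0]];
    $write = null;
    $except = null;
    $ready = stream_select($read, $write, $except, 1);   // 1 second timeout
    if ($ready === false) {
        break;                        // select failed
    }
    if ($ready === 0) {
        continue;                     // timeout: a natural place for periodic work
    }
    foreach ($read as $stream) {      // $read now holds only the readable streams
        $received[$names[(int) $stream]] = rtrim(fgets($stream));
    }
}

ksort($received);
print_r($received);
```

The timeout branch is what makes ticks unnecessary here: anything periodic can run whenever the select call returns without data.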

  18. A few months back I attended a presentation by Zeev and he demoed the PHP Java bridge. One example showed how you can use the Java bridge to parallelize the retrieval of multiple web pages for processing instead of doing this sequentially using normal PHP code. It showed a nice performance improvement (I don’t know the exact numbers anymore but I believe a 5x speed improvement). It nicely demoed that if you want to parallelize some stuff you could use the Java bridge, but it would be nice if something like this could be done natively inside PHP. The job queue is not an option because using it means your code might be executed once your main script is already finished. The “ticks” don’t seem very usable for this kind of thing either. So yeah, I would like it if PHP got threading support, but there are a lot of tasks which really don’t need it.
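
For the specific case of retrieving several pages at once, one option inside PHP itself is the curl extension's multi interface. The sketch below assumes that extension is installed; `fetch_all()` is a hypothetical helper and the URLs in the usage comment are placeholders. It is not the Java bridge demo mentioned above.

```php
<?php
// Hypothetical helper: fetch every URL concurrently and return the bodies
// keyed like the input array. Assumes the curl extension is available.
function fetch_all(array $urls): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);   // capture the body
        curl_setopt($handle, CURLOPT_TIMEOUT, 10);
        curl_multi_add_handle($multi, $handle);
        $handles[$key] = $handle;
    }

    // Drive all transfers together; wait for activity instead of busy-looping.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $handle) {
        $bodies[$key] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }
    return $bodies;
}

// Usage with placeholder URLs (not executed here):
// $pages = fetch_all(['a' => 'https://example.com/', 'b' => 'https://example.org/']);
```

The total wait is then roughly that of the slowest request rather than the sum of all of them, which is the effect the demo was after.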

  19. I believe that multithreading would be useful for PHP. Sometimes, when I work on a “LARGE” project, I start thinking about creating an application server… And I miss 2 things in PHP which would let me do it the way I like: threads and proper FastCGI support.

  20. Well, usually, if PHP needs to wait for something, it’s either an exception and doesn’t matter or there’s something wrong with your application. So as you said, PHP is not meant to have them.. :-)
