Sunday, January 13, 2008

Three Flavors of mod_*

Ian Bicking talks about the conceptual flavors of web deployment. Go read that because this post is only tangential to his point.

Under the Apache 1.x series I have used mod_perl, mod_php, and mod_python. All three do things slightly differently with regard to loading code. Keep in mind that Apache 1.x only uses the pre-forking model, so each request-handling process starts life as an exact copy of the root process. The root process's main job is to spawn child processes. [Everything below was true in 2000 and for Apache 1.x. Don't make decisions based on my recollections unless you plan on running some ancient servers.]

mod_perl allows you to preload scripts when the root process starts. This means that no matter how large your code base is, you only have to compile and execute it once, and then every child process starts life with the same initialized modules already in place. At a Perl-based company where I worked the preload script was a single line:
    use Everything;
The 'Everything' package loaded, well, everything (is this a standard Perl-ism nowadays?). Everything.pm was the most delicate module in a large code base. It imported all the packages in the code base in a way that avoided cycles. As a downside, the server couldn't use Apache's "graceful" restart because the root server didn't have a way to unload the Perl runtime and reload Everything. Besides, reloading everything was slooow on the 400MHz CPUs of the day. The workaround was to have a proxy server that could point to one of two local ports, each running mod_perl. To switch gracefully, the unused app server was stopped and restarted, and then the proxy was repointed to its port.
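The preload idea is easy to see outside of Apache. Here is a minimal sketch in Python (not mod_perl's actual code) of a parent that does the expensive load once and then forks children that inherit the result. The names `expensive_load`, `spawn_child`, and `demo` are made up for illustration, and it assumes a Unix-like system where `os.fork` is available.

```python
import os
import time


def expensive_load():
    """Stand-in for compiling and importing a large code base (hypothetical)."""
    time.sleep(0.01)
    # Record which process actually paid for the load.
    return {"loaded_by_pid": os.getpid()}


def spawn_child(state):
    """Fork a child and have it report which pid loaded the state it sees."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the state is already in memory, copied from the parent.
        os.close(read_fd)
        os.write(write_fd, str(state["loaded_by_pid"]).encode())
        os._exit(0)
    # Parent: read the child's report and reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return int(data)


def demo(children=3):
    state = expensive_load()  # done once, in the parent
    parent_pid = os.getpid()
    reported = [spawn_child(state) for _ in range(children)]
    return parent_pid, reported


if __name__ == "__main__":
    parent_pid, reported = demo()
    print(parent_pid, reported)
```

Every child reports the parent's pid as the loader, which is the whole point: none of them had to redo the work.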

mod_php is a bit of yuck by comparison. Or a joy, for the reasons Bicking states. The yuck is that mod_php compiles and executes scripts anew on every request. This is great for shared hosts, as it acts like CGI but with less overhead because it doesn't have to fork on every request. This sucks for application servers that know they want to run the same code over and over again. It gets worse when the codebase is large and there is a package like Everything.pm that loads the entire codebase of a good-sized application on every request [yes, this post is somewhat autobiographical].

mod_python is somewhere between mod_perl and mod_php. The first time a child handles a request it pays the price of loading and executing a script. After that, all requests handled by the child are quick because sys.modules still contains all the modules loaded by previous runs. mod_python could support preloading modules in the parent process like mod_perl with a trivial patch [see the mod_python archives circa 2001 for the patch I submitted], but it was decided that the security risk (because the parent process usually starts as root) outweighed the benefit of shorter startup time for child processes.
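The sys.modules behavior is plain Python and doesn't need mod_python to demonstrate. The sketch below writes a throwaway module whose body appends a mark to a log file every time it executes, then imports it several times as if each import were a request. The module name `handler_demo_mod` and the function `count_module_executions` are hypothetical, invented for this example.

```python
import importlib
import os
import sys
import tempfile

MODULE_NAME = "handler_demo_mod"  # hypothetical module name


def count_module_executions(requests=3):
    """Import the same module once per 'request'; count how often its body runs."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "executions.log")
        # The module body appends one character to the log each time it executes.
        with open(os.path.join(tmp, MODULE_NAME + ".py"), "w") as f:
            f.write("with open(%r, 'a') as log:\n    log.write('x')\n" % log_path)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            for _ in range(requests):
                # Only the first import executes the body; later ones
                # are served from sys.modules.
                importlib.import_module(MODULE_NAME)
            with open(log_path) as log:
                return len(log.read())
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)


if __name__ == "__main__":
    print(count_module_executions(3))
```

However many "requests" you run, the module body executes once per process. That is the first-request tax each mod_python child pays, and what mod_php, by contrast, pays on every request.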

In conclusion, I would urge you to read the source of the mod_whatever for the language you use. They are small and all work more or less the same way. Reading one also gives you some idea of how the internals of your favorite language work. My first trip was into mod_perl, and I found it readable but with a lot of baggage. When I read mod_php I thought "what a hack! Is this broken on purpose?" Finally I read mod_python, and with my Perl background I was aghast at how simple and clean the interpreter-related code was.

See you at PyCon!
