PHP started as a template engine but has grown into a full-featured language, so much so that for some, keeping logic and presentation separate is a real challenge. That's led to an entire class of "template engines" being implemented in PHP. On one extreme, these engines provide a wholly distinct template markup language that in turn is compiled into PHP - e.g. Smarty. On the other extreme are contenders like Savant - but the unanswered question is, how do they perform?
Savant, the most notable example, and similar template engines aren't really template engines at all - they are PHP classes designed to guide the programmer toward keeping logic and presentation separate. The templates themselves are PHP, so the template author still has access to all things PHP - including the power to put things in the template that probably don't belong there.
If you're designing for high performance, the selection of templating engine is important. Most of your requests will probably encounter the template engine in some regard, and it's not hard to imagine instances in which the time spent in the template engine is more than we think it should be.
So, I've decided to do some very simple analysis of how the template engines behave, in order to figure out whether the convenience comes at a cost - and if so, how much. My hypothesis is that APC opcode cache settings will have a significant impact. Keeping with APC best practices, all pathnames (wherever possible) are absolute. In the one case that wasn't possible (Savant), I provided it with enough information to be able to construct an absolute pathname internally.
For a control, I wrote a simple PHP script that echoed 'Hello world' to the user. In addition, I had it issue a getcwd() call, because any PHP script of moderate complexity written for apc.stat=0 will call getcwd() once to build its absolute pathnames - it's a one-time expense that all of the scripts under test share.
<?php getcwd(); echo "Hello world!";
And a script in Smarty:
<?php
$cwd = getcwd();
include("$cwd/libs/Smarty.class.php");
$tpl = new Smarty();
$tpl->template_dir = "$cwd";
$tpl->compile_dir = "$cwd/compiled";
$tpl->cache_dir = "$cwd/cached";
$tpl->assign('var', 'Hello world!');
echo $tpl->fetch("$cwd/smarty.tpl");
With the template:
{$var}
And a corresponding script with Savant:
<?php
include("/usr/local/share/pear/Savant3.php");
$tpl = new Savant3(array('template_path' => getcwd()));
$tpl->assign('value',"Hello world!");
$tpl->display("savant.tpl");
With a template savant.tpl:
<?php echo $this->value ?>
My method of testing is to modify php.ini, restart apache, and then run the testing script twice, using the second's results. The first run allows APC to cache it (a thousand times over, but hey) and takes care of other one-time or fixed costs (making sure the relevant apache pages are resident, etc.). For each script, I placed a single url that referred to it in {scriptbasename}.url. My testing script was then:
#!/bin/sh
for i in *.url
do
    printf '%s:\n' "$i" && http_load -fetches 1000 -parallel 1 "$i"
    printf '\n\n'
done
I then proceeded to run and record the output for three different apache configurations: apc off, apc on with stat on, and apc on with stat off. For each, I assumed a normal distribution of msecs/first-response and used the standard deviation provided by my hacked http_load to plot probability density functions for comparison.
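To make the plotting step concrete, here's a minimal sketch of evaluating a normal density from a mean and a standard deviation. The function name normal_pdf and the sample numbers are made up for illustration; they are not measurements from these tests.

```php
<?php
// Hypothetical helper: density of a normal distribution at $x,
// given a mean and standard deviation (msecs/first-response here).
function normal_pdf($x, $mean, $stddev)
{
    if ($stddev <= 0) {
        throw new InvalidArgumentException('stddev must be positive');
    }
    $z = ($x - $mean) / $stddev;
    return exp(-0.5 * $z * $z) / ($stddev * sqrt(2.0 * M_PI));
}

// Illustrative values only, NOT results from the benchmark.
$mean = 5.0;
$stddev = 1.0;
for ($x = 2.0; $x <= 8.0; $x += 1.0) {
    // One tab-separated (x, density) pair per line, ready for a plotting tool.
    printf("%.1f\t%.5f\n", $x, normal_pdf($x, $mean, $stddev));
}
```

Feeding each script's measured mean and standard deviation through something like this gives one curve per script on the same axes.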
First, with apc off:

As you'll notice, with APC off there's a significant difference between all three approaches. It seems the expense of using a template engine is in the range of 6-10 milliseconds. In a busier application or on a more loaded server, don't underestimate the value of this - that's 6 to 10 milliseconds that your user is waiting, and 6 to 10 milliseconds that the apache thread can't serve anyone else.
There's more variance in the slower populations. I'm going to guess that the main reason for this is that every time we have a blocking syscall (e.g. disk IO), we're at the mercy of the kernel in deciding when our process next gets a turn running.
Let's take a look at the data with APC on and stat on:

And with stat turned off:

This one was a bit surprising - hardly any significant difference.
Now, I'm going to go out on a limb here - in favor of savant and smarty - and assume that template complexity and length affect all template approaches equally. I do not actually believe this to be the case - especially as you nest templates, running a stat() (or, heaven forbid, an open()) on each nested template on each pageview... well, it's not attractive. But I make this assumption because it allows me to compare something useful: the constant overhead for each approach, as measured by milliseconds until first response. So by looking at the differences in msecs/first-response, we're able to figure out how many milliseconds per request we're giving up in order to gain the convenience of a template engine. It looks like we're giving up more than 1 and less than 2 milliseconds - not the end of the world.
But let's question that assumption. Where is the time being spent? To answer the question, I used ktrace and httpd -X to watch what syscalls apache had to perform with regards to savant/smarty and the templates themselves. Obviously with apc off, we're going to have to parse every included file, and with it on and stat on, we're going to have to stat them all. But what about with apc enabled and apc.stat off?
What I would hope to see for each test is that, after the first request, there are no open calls per request, and the only stat is on the script I'm requesting (apache, making sure that the script is readable). This would be indicative of apc being able to cache the entire script in one cache entry, which is our ideal.
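Since every run depends on php.ini having been changed and apache restarted, it's worth confirming from inside PHP which configuration is actually live before tracing anything. Here's a rough sketch, assuming the APC extension registers its usual apc.enabled and apc.stat directives; the function name apc_test_mode is made up for illustration.

```php
<?php
// Hypothetical helper: report which of the three tested configurations
// this PHP process appears to be running under.
function apc_test_mode()
{
    // ini_get() returns false for directives that are not registered,
    // which is what happens when the APC extension is not loaded.
    $enabled = ini_get('apc.enabled');
    if ($enabled === false || !filter_var($enabled, FILTER_VALIDATE_BOOLEAN)) {
        return 'apc off';
    }
    // Assumption: apc.stat is registered whenever apc.enabled is.
    $stat = ini_get('apc.stat');
    return filter_var($stat, FILTER_VALIDATE_BOOLEAN)
        ? 'apc on, stat on'
        : 'apc on, stat off';
}

echo apc_test_mode(), "\n";
```

Request this through the web server rather than the command line, since the CLI can be configured differently from the apache module.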
What I found:
Requests for direct.php had to stat direct.php on each request. The ideal.
Requests for smarty.php had to stat smarty.php, smarty.tpl, and the compiled smarty.tpl.php on each request. It seems that the smarty.tpl and smarty.tpl.php are either not included with an absolute path, or are included conditionally, or both. Either way, three times as many filesystem syscalls to our docroot per request is not fun, and I suspect it's a good part of the reason why direct.php was able to pull roughly three times as many fetches per second.
Requests for savant.php had to stat savant.php, and actually open (gasp) Filter.php and Plugin.php on each request. Does this mean that the PHP compiler had to do any lifting on these files? Maybe, maybe not - but looking into that isn't worth doing when it's easy enough to fix: have Savant ditch the load-time plugins, or just don't use Savant. It's important to note here that because we avoided a stat on the template itself, we can feel a bit more confident that using nested templates won't make us regret living.
Next, I used a very small PHP object template system that I borrowed from massassi.com, named it sft for 'small and fast template,' and benchmarked it in the last testbed setup.
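For anyone who hasn't seen this style of template class, here's a minimal sketch of the general idea: keep the variables in an array, then extract() them and include a plain PHP template inside an output buffer. This is only an illustration of the approach, not the massassi.com code or the sft class I benchmarked; the class name SketchTemplate and its methods are made up.

```php
<?php
// Hypothetical minimal template class; NOT the code benchmarked here.
class SketchTemplate
{
    private $vars = array();
    private $file;

    public function __construct($file)
    {
        // Pass an absolute path so the include below is unambiguous.
        $this->file = $file;
    }

    public function set($name, $value)
    {
        $this->vars[$name] = $value;
    }

    public function fetch()
    {
        // Expose the assigned values as local variables; EXTR_SKIP
        // refuses to overwrite anything already in scope.
        extract($this->vars, EXTR_SKIP);
        ob_start();
        include $this->file;
        return ob_get_clean();
    }
}

// Usage: write a throwaway template file, render it, clean up.
$path = sys_get_temp_dir() . '/sketch_' . uniqid() . '.tpl.php';
file_put_contents($path, '<?php echo $value; ?>');
$tpl = new SketchTemplate($path);
$tpl->set('value', 'Hello world!');
echo $tpl->fetch();
unlink($path);
```

The appeal of this shape is that the class lives in one file and is included unconditionally, which is the kind of thing an opcode cache can handle without extra filesystem calls.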

Tests with ktrace confirmed that the only stat/open in the docroot per request was the stat of the requested script. We're rewarded with a msec/first-response that's about half of savant or smarty's. It's still a bit slower than a simple echo from inside the script we're running, but I think it's worth it.