2010.08.17

PHP's preg functions don't release memory??

We were writing some parsing code for a client today. It takes a long string of HTML and parses it into array items. It walks the string recursively and runs a few preg_replace() calls on every pass. When we ran it, we got "out of memory" errors. After adding some basic memory stats, we found that memory usage climbed by about 400 KB after each block of preg_replace() calls, and that block ran on every pass (around 600 passes in all). That memory just grew and grew, even though the recursion never went more than 6 levels deep. It was never released.
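A minimal sketch of the kind of per-pass memory stats described above. It assumes memory_get_usage() is the measurement being watched. The helper name trackMemory(), the stand-in HTML, and the two patterns are hypothetical placeholders, not the client code.

```php
<?php
// Hypothetical instrumentation: run one "parsing pass" $passes times and
// record memory_get_usage() after each pass so growth between passes shows up.
function trackMemory(callable $pass, int $passes): array
{
    $samples = [];
    for ($i = 0; $i < $passes; $i++) {
        $pass();
        $samples[] = memory_get_usage();
    }
    return $samples;
}

// Stand-in input; the real client HTML is not shown in the post.
$html = str_repeat('<p>some html</p>', 10000);

$samples = trackMemory(function () use ($html): void {
    // A block of preg_replace() calls standing in for one parsing pass.
    $tmp = preg_replace('/^\s*<p>/', '', $html);
    $tmp = preg_replace('/<\/p>\s*$/', '', $tmp);
}, 600);

$last = $samples[count($samples) - 1];
printf(
    "first pass: %d bytes, last pass: %d bytes, growth: %d bytes\n",
    $samples[0],
    $last,
    $last - $samples[0]
);
```

If the "growth" number keeps rising as you raise the pass count, something is holding on to memory between passes.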

I did some reading and found that the preg* functions cache up to 4096 regex results per request. This is the problem... and a pretty stupid one, too. It would be nice if they made this a configurable option, or at least let you turn it off when, say, you are running a regex on a different string every time. (Why the hell would I run the same regex on the same string twice? Isn't that what variables are for?) Maybe I'm misunderstanding and PHP caches only the compiled regex (not its results), but either way, memory was climbing with the length of the string.

Since the regex only looked at the beginning of the string and ignored the rest (thank god), the fix was easy, although a bit of a hack:

$val = preg_replace('/.../', '', $long_string);

Becomes:

$short_string = substr($long_string, 0, 128);
$val = preg_replace('/.../', '', $short_string);
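One side effect of the hack as written: $val is built from the 128-character prefix only, so the rest of $long_string is gone. If the parser needs the remainder afterwards, a variant like the sketch below puts it back. The function name replacePrefix(), the 128-byte default, and the sample pattern are hypothetical. It also inherits the hack's limitation: a match that would run past the cut-off is missed.

```php
<?php
// Hypothetical helper (not from the original code): run preg_replace() on only
// the first $limit bytes of $subject, then re-attach the untouched remainder.
// Caveat: a match that would extend past $limit bytes is cut off, so this is
// only safe for patterns that match short text at the very start of the string.
function replacePrefix(string $pattern, string $replacement, string $subject, int $limit = 128): string
{
    $head = substr($subject, 0, $limit);
    // (string) cast: before PHP 8, substr() returns false when $limit is past the end.
    $tail = (string) substr($subject, $limit);
    $result = preg_replace($pattern, $replacement, $head);
    if ($result === null) {
        throw new RuntimeException('preg_replace failed: ' . preg_last_error());
    }
    return $result . $tail;
}

$long_string = '<div class="item">' . str_repeat('lorem ipsum ', 1000);
$val = replacePrefix('/^<div[^>]*>/', '', $long_string);
echo substr($val, 0, 11), "\n"; // prints "lorem ipsum"
```

Only the short head string is handed to preg_replace(), as in the hack above, while the caller still gets the whole string back.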

PHP guys: how about an option to make preg* NOT have memory leaks =).