I recently (read: today) had an obnoxious problem: I'm writing some code for creating an ATOM feed, and kept getting errors about entity-escaped values. Namely, things like &rsquo;, &bull;, etc. Even written as named entities, Opera and IE7 did not recognize them. I read somewhere that it was necessary to convert the named entities to numbered entities. Great.
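For the curious, here is a rough way to see the problem without a browser. XML itself only predefines a handful of named entities (&amp;, &lt;, &gt;, &quot;, &apos;), so a strict parser should choke on anything else. This is only a sketch: `is_well_formed_xml` is a made-up helper name, and it assumes the SimpleXML extension is available.

```php
<?php
// Hypothetical helper (not part of PHP): true if the string parses as XML.
function is_well_formed_xml(string $xml): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $parsed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $parsed !== false;
}

// Named HTML entity: not declared in plain XML, so this should be rejected.
var_dump(is_well_formed_xml('<title>It&rsquo;s here</title>'));
// Numeric character reference: needs no declaration, so this should parse.
var_dump(is_well_formed_xml('<title>It&#8217;s here</title>'));
```

A feed reader that parses strictly behaves like the first call, which would explain the errors.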
Well, PHP doesn't have a native function for this. Why, I do not know... there seem to be functions for many other things, and adding an argument to htmlentities that returns numbered entities would seem easy enough. Either way, I wrote a quick function that takes the htmlentities translation table, adds the entities that are missing from it, and converts everything to numbered entities. Check it:
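function htmlentities_numbered($string)
{
    // Ask for the Latin-1 table explicitly: each character in it is a single
    // byte whose value equals its Unicode code point, so ord() is safe here.
    $table = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');
    $trans = array();
    foreach ($table as $char => $ent)
    {
        $trans[$ent] = '&#' . ord($char) . ';';
    }
    // Entities missing from the Latin-1 table, mapped to Unicode code points.
    $trans['&euro;'] = '&#8364;';
    $trans['&sbquo;'] = '&#8218;';
    $trans['&fnof;'] = '&#402;';
    $trans['&bdquo;'] = '&#8222;';
    $trans['&hellip;'] = '&#8230;';
    $trans['&dagger;'] = '&#8224;';
    $trans['&Dagger;'] = '&#8225;';
    $trans['&circ;'] = '&#710;';
    $trans['&permil;'] = '&#8240;';
    $trans['&Scaron;'] = '&#352;';
    $trans['&lsaquo;'] = '&#8249;';
    $trans['&OElig;'] = '&#338;';
    $trans['&lsquo;'] = '&#8216;';
    $trans['&rsquo;'] = '&#8217;';
    $trans['&ldquo;'] = '&#8220;';
    $trans['&rdquo;'] = '&#8221;';
    $trans['&bull;'] = '&#8226;';
    $trans['&ndash;'] = '&#8211;';
    $trans['&mdash;'] = '&#8212;';
    $trans['&tilde;'] = '&#732;';
    $trans['&trade;'] = '&#8482;';
    $trans['&scaron;'] = '&#353;';
    $trans['&rsaquo;'] = '&#8250;';
    $trans['&oelig;'] = '&#339;';
    $trans['&Yuml;'] = '&#376;';
    // Swap every named entity in the string for its numbered equivalent.
    $string = strtr($string, $trans);
    return $string;
}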
Hope it's helpful.
UPDATE - apparently, even the numbered entities I first used for the extra characters (the Windows-1252 range, 0x80 - 0x9F) are not valid XML. Fair enough, I've converted them all to their Unicode code points. All my ATOM feeds validate now (through feedvalidator.org).
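If you would rather not maintain a table by hand, one alternative is to let PHP decode everything to real UTF-8 characters first and then number whatever is not plain ASCII. This is a different technique from the function above, not a drop-in replacement: `xml_numeric_entities` is a name I made up, and the sketch assumes PHP 7.2+ with the mbstring extension (for `mb_ord`).

```php
<?php
// Hypothetical alternative: decode, re-escape for XML, then number non-ASCII.
function xml_numeric_entities(string $html): string
{
    // 1. Turn every HTML entity (named or numbered) into the actual character.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // 2. Re-escape the characters that are special in XML (&, <, >, ", ').
    $text = htmlspecialchars($text, ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8');
    // 3. Replace each remaining non-ASCII character with its code point.
    $numbered = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $text
    );
    // preg_replace_callback returns null on failure; fall back to step 2.
    return $numbered ?? $text;
}

echo xml_numeric_entities('It&rsquo;s 5 &euro; &amp; up'), "\n";
// prints: It&#8217;s 5 &#8364; &amp; up
```

One caveat: step 2 escapes literal markup too, so a `<b>` in the input comes out as `&lt;b&gt;`. That makes this better suited to plain-text fields (titles, summaries) than to HTML content you want to keep as markup.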