I recently (read: today) had an obnoxious problem: I'm writing some code for creating an Atom feed, and kept getting errors about entity-escaped values, namely things like &rsquo;, &bull;, etc. Even written as entities, Opera and IE7 did not recognize them. I read somewhere that it was necessary to convert the named entities to numbered entities. Great.
Well, PHP doesn't have a native function for this. Why, I do not know... there seem to be functions for many other things, and adding an argument to htmlentities that returns numbered entities would seem easy enough. Either way, I wrote a quick function that takes the htmlentities translation table, adds any entities missing from it, and runs the conversion to numbered entities. Check it:
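function htmlentities_numbered($string)
{
    // Ask for the Latin-1 table explicitly so every key is a single byte and
    // ord() returns its code point (PHP 5.4+ defaults to multi-byte UTF-8 keys).
    $table = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');
    $trans = array();
    foreach ($table as $char => $ent) {
        $trans[$ent] = '&#' . ord($char) . ';';
    }

    // Entities missing from the Latin-1 table, mapped to their Unicode code points.
    $trans['&euro;']   = '&#8364;';
    $trans['&sbquo;']  = '&#8218;';
    $trans['&fnof;']   = '&#402;';
    $trans['&bdquo;']  = '&#8222;';
    $trans['&hellip;'] = '&#8230;';
    $trans['&dagger;'] = '&#8224;';
    $trans['&Dagger;'] = '&#8225;';
    $trans['&circ;']   = '&#710;';
    $trans['&permil;'] = '&#8240;';
    $trans['&Scaron;'] = '&#352;';
    $trans['&lsaquo;'] = '&#8249;';
    $trans['&OElig;']  = '&#338;';
    $trans['&lsquo;']  = '&#8216;';
    $trans['&rsquo;']  = '&#8217;';
    $trans['&ldquo;']  = '&#8220;';
    $trans['&rdquo;']  = '&#8221;';
    $trans['&bull;']   = '&#8226;';
    $trans['&ndash;']  = '&#8211;';
    $trans['&mdash;']  = '&#8212;';
    $trans['&tilde;']  = '&#732;';
    $trans['&trade;']  = '&#8482;';
    $trans['&scaron;'] = '&#353;';
    $trans['&rsaquo;'] = '&#8250;';
    $trans['&oelig;']  = '&#339;';
    $trans['&Yuml;']   = '&#376;';

    // Swap each named entity in the (already entity-encoded) string for its numbered form.
    return strtr($string, $trans);
}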
Hope it's helpful.
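UPDATE - apparently, even the numbered entities don't validate when they use the Windows-1252 positions (0x80 - 0x9F), since in Unicode that range holds control characters rather than curly quotes and bullets. Fair enough, I've converted them all to their real Unicode code points. All my Atom feeds validate now (through feedvalidator.org).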
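For anyone who would rather not keep a lookup table by hand, here is a rough sketch of the same idea that lets PHP work out the Unicode code point for each named entity. It is not the function from this post: the name named_to_numeric_entities is made up for illustration, and it assumes the iconv extension is available and that the input is already entity-encoded. It leaves the five entities XML defines itself (amp, lt, gt, quot, apos) alone, along with any name PHP does not recognize.

```php
<?php
// Hypothetical helper, not part of the original post: rewrite every named
// HTML entity as a decimal numeric reference using its Unicode code point.
function named_to_numeric_entities($string)
{
    // The five entities XML predefines can stay as they are.
    $xmlSafe = array('amp', 'lt', 'gt', 'quot', 'apos');

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function ($m) use ($xmlSafe) {
            if (in_array($m[1], $xmlSafe, true)) {
                return $m[0];
            }

            // Decode the entity to the actual UTF-8 character.
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
            if ($char === $m[0]) {
                return $m[0]; // unknown entity name: leave untouched
            }

            // UTF-32BE stores one code point per 4 bytes, so unpack('N*')
            // yields the code point(s) directly. Assumes ext/iconv.
            $out = '';
            foreach (unpack('N*', iconv('UTF-8', 'UTF-32BE', $char)) as $codePoint) {
                $out .= '&#' . $codePoint . ';';
            }
            return $out;
        },
        $string
    );
}
```

Called as named_to_numeric_entities('It&rsquo;s'), this should hand back It&#8217;s, which is the same value the table above uses for &rsquo;. The trade-off is a regex callback per entity instead of a single strtr pass, which is unlikely to matter for feed-sized strings.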