Special characters in XML using PHP
Question
I am trying to generate an XML file in which some of the values contain special characters such as μmol/l, x10³ cells/µl and many more. I also need a way to put in superscripts.
Using an ordutf8 function from php.net, I encoded the text μmol/l into something like this:
&#956&#109&#111&#108&#47&#108
function ords_to_unistr($ords, $encoding = 'UTF-8') {
    // Turns an array of ordinal values (code points) into a string of Unicode characters
    $str = '';
    foreach ($ords as $v) {
        // Pack each code point into a 4-byte big-endian (UCS-4BE) unit
        $str .= pack('N', $v);
    }
    return mb_convert_encoding($str, $encoding, 'UCS-4BE');
}

function unistr_to_ords($str, $encoding = 'UTF-8') {
    // Turns a string of Unicode characters into an array of ordinal values,
    // even if some of those characters are multibyte.
    $str = mb_convert_encoding($str, 'UCS-4BE', $encoding);
    $ords = array();
    // In UCS-4BE every character is exactly 4 bytes
    $len = mb_strlen($str, 'UCS-4BE');
    for ($i = 0; $i < $len; $i++) {
        // Read one 4-byte character as a big-endian unsigned integer
        $val = unpack('N', mb_substr($str, $i, 1, 'UCS-4BE'));
        $ords[] = $val[1];
    }
    return $ords;
}
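The question does not show how the ordinals are turned into the &# text. Assuming each code point is simply written as "&#" plus its decimal value with no terminating ";", a hypothetical custom_encode() (not from the original post) could look like the sketch below. Note that XML requires a character reference to end with ";" (for example &#956;), so these sequences are not valid references, which may explain the addChild() error quoted below.

```php
<?php
// Hypothetical helper, assuming the custom format is "&#" + decimal code
// point for every character, with no terminating ";".
// Requires the mbstring extension (mb_str_split() needs PHP 7.4+).
function custom_encode(string $text): string
{
    $out = '';
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        // mb_ord() returns the Unicode code point of a single character
        $out .= '&#' . mb_ord($char, 'UTF-8');
    }
    return $out;
}

echo custom_encode("\u{3BC}mol/l"), "\n"; // &#956&#109&#111&#108&#47&#108
echo custom_encode('x10<su'), "\n";       // &#120&#49&#48&#60&#115&#117
```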
I have successfully converted this encoding back to rich text using PHPExcel to generate Excel documents and PDFs, but now I need to put it into an XML file.
If I use the &# sequences as they are, I get an error message saying:
SimpleXMLElement::addChild(): invalid decimal character value
Here are more values in the database that need to be made XML-friendly:
x10<sup>6</sup>&#32cells/µl
Converted from x10³ cells/µl
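If plain Unicode characters are acceptable in the target XML (an assumption; the question does not say how the file is consumed), one option is to replace <sup> markup around digits with the Unicode superscript digits, so the value becomes ordinary UTF-8 text. Unicode only has superscript forms for digits and a handful of other characters, so this hypothetical sup_to_unicode() leaves any other <sup> content untouched.

```php
<?php
// Hypothetical helper: turns <sup>digits</sup> into Unicode superscript
// digits (U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079).
// <sup> elements containing anything other than digits are left as they are.
function sup_to_unicode(string $value): string
{
    $digits = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
    $supers = ["\u{2070}", "\u{B9}", "\u{B2}", "\u{B3}", "\u{2074}",
               "\u{2075}", "\u{2076}", "\u{2077}", "\u{2078}", "\u{2079}"];

    return preg_replace_callback(
        '#<sup>(\d+)</sup>#',
        function (array $m) use ($digits, $supers): string {
            // The replacements are multibyte UTF-8 and contain no ASCII
            // digits, so str_replace() never re-replaces its own output
            return str_replace($digits, $supers, $m[1]);
        },
        $value
    );
}

echo sup_to_unicode('x10<sup>6</sup> cells/µl'), "\n"; // x10⁶ cells/µl
```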
Answer
There is no need to encode these characters. An XML document can use UTF-8 or another encoding, and the serializer escapes characters as necessary for the chosen encoding.
$foo = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><foo/>');
$foo->addChild('bar', 'μmol/l, x10³ cells/µl');
echo $foo->asXml();
Output (special characters not encoded):
<?xml version="1.0" encoding="UTF-8"?>
<foo><bar>μmol/l, x10³ cells/µl</bar></foo>
To force character references for the special characters, change the document encoding to one that cannot represent them, such as ASCII:
$foo = new SimpleXMLElement('<?xml version="1.0" encoding="ASCII"?><foo/>');
$foo->addChild('bar', 'μmol/l, x10³ cells/µl');
echo $foo->asXml();
Output (special characters written as numeric character references):
<?xml version="1.0" encoding="ASCII"?>
<foo><bar>μmol/l, x10³ cells/µl</bar></foo>
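To check this behaviour yourself, the sketch below serializes the same text with encoding="ASCII" and verifies that the result contains only 7-bit bytes and still parses back to the original string. Whether the references come out in decimal or hexadecimal form is up to libxml2, so the check deliberately does not depend on it.

```php
<?php
// Same text as above, built from code points to avoid source-encoding mix-ups:
// U+03BC GREEK SMALL LETTER MU, U+00B3 SUPERSCRIPT THREE, U+00B5 MICRO SIGN.
$text = "\u{3BC}mol/l, x10\u{B3} cells/\u{B5}l";

$foo = new SimpleXMLElement('<?xml version="1.0" encoding="ASCII"?><foo/>');
$foo->addChild('bar', $text);
$xml = $foo->asXML();

// Expect no byte above 0x7F: every special character became a reference
var_dump(preg_match('/[\x80-\xFF]/', $xml));

// Expect the parsed value to equal the original UTF-8 string
var_dump((string) simplexml_load_string($xml)->bar === $text);
```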
I suggest converting your custom encoding back to UTF-8. That way the XML API can take care of the escaping. If you want to store the strings in your custom encoding, you need to work around a bug.
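A minimal sketch of that conversion, assuming the stored format is "&#" plus a decimal code point with no ";": the hypothetical custom_decode() below turns the stored value back into UTF-8 before it is handed to SimpleXML. It needs PHP 7.2+ with mbstring for mb_chr().

```php
<?php
// Hypothetical helper: undoes the custom "&#<decimal>" encoding (no ";").
// Decoding is only unambiguous if a literal digit never directly follows an
// encoded character, e.g. when every character of the value was encoded.
function custom_decode(string $encoded): string
{
    return preg_replace_callback(
        '/&#(\d+)/',
        function (array $m): string {
            $char = mb_chr((int) $m[1], 'UTF-8');
            // Keep invalid code points as they are instead of dropping them
            return $char === false ? $m[0] : $char;
        },
        $encoded
    );
}

$foo = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><foo/>');
// Pass plain UTF-8 text and let the XML API decide what needs escaping.
// addChild() still mishandles a literal "&", so decoded text that can
// contain "&" should be added as a DOM text node instead.
$foo->addChild('bar', custom_decode('&#956&#109&#111&#108&#47&#108'));
echo $foo->asXML();
```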
A string like &#120&#49&#48&#60&#115&#117 triggers a bug in SimpleXML/DOM: the second argument of SimpleXMLElement::addChild() and DOMDocument::createElement() is not escaped properly, because an & in it is treated as the start of an entity or character reference. You need to create the content as a text node and append it.
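For code that uses DOM directly, the same workaround is to create the element without a value and append a text node. A minimal sketch; the element names are just the ones from the example:

```php
<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$foo = $doc->appendChild($doc->createElement('foo'));

// Don't pass the value to createElement(): an "&" in it would be read as the
// start of an entity reference. Append a text node instead; the serializer
// escapes "&" and "<" in text nodes on output.
$bar = $foo->appendChild($doc->createElement('bar'));
$bar->appendChild($doc->createTextNode('&#120&#49&#48&#60&#115&#117'));

$xml = $doc->saveXML();
echo $xml;
```

The text node keeps the value literally, and in the serialized document each & appears as &amp;.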
Here is a small class that extends SimpleXMLElement and adds a workaround:
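class MySimpleXMLElement extends SimpleXMLElement {
    // Same parameters as SimpleXMLElement::addChild(), so the override stays
    // signature-compatible (an incompatible override is a fatal error in PHP 8)
    public function addChild($qualifiedName, $value = null, $namespace = null): ?SimpleXMLElement {
        // Create the element without content, so nothing is parsed as entities
        $child = parent::addChild($qualifiedName, null, $namespace);
        if (isset($value)) {
            // Append the value as a text node; it is escaped on output
            $node = dom_import_simplexml($child);
            $node->appendChild($node->ownerDocument->createTextNode($value));
        }
        return $child;
    }
}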
$foo = new MySimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><foo/>');
$foo->addChild('bar', '&#120&#49&#48&#60&#115&#117');
echo $foo->asXML();
Output:
<?xml version="1.0" encoding="UTF-8"?>
<foo><bar>&amp;#120&amp;#49&amp;#48&amp;#60&amp;#115&amp;#117</bar></foo>
The & characters from your custom encoding get escaped as the entity &amp; because & is a special character in XML. An XML parser will decode it again:
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<foo><bar>&amp;#120&amp;#49&amp;#48&amp;#60&amp;#115&amp;#117</bar></foo>
XML;
$foo = new SimpleXMLElement($xml);
var_dump((string)$foo->bar);
Output:
string(27) "&#120&#49&#48&#60&#115&#117"