Special Characters in XML Using PHP


Question

I am trying to generate an XML file in which some of the values contain special characters such as μmol/l, x10³ cells/µl, and many more. I also need to be able to include superscripts.

I encoded the text μmol/l into the following form using an ordutf8 function from php.net:

&#956&#109&#111&#108&#47&#108

function ords_to_unistr($ords, $encoding = 'UTF-8'){
    // Turns an array of ordinal values into a string of unicode characters
    $str = '';
    for($i = 0; $i < sizeof($ords); $i++){
        // Pack this number into a 4-byte string
        // (Or multiple one-byte strings, depending on context.)               
        $v = $ords[$i];
        $str .= pack("N",$v);
    }
    $str = mb_convert_encoding($str,$encoding,"UCS-4BE");
    return($str);           
}

function unistr_to_ords($str, $encoding = 'UTF-8'){       
    // Turns a string of unicode characters into an array of ordinal values,
    // Even if some of those characters are multibyte.
    $str = mb_convert_encoding($str,"UCS-4BE",$encoding);
    $ords = array();

    // Visit each unicode character
    for($i = 0; $i < mb_strlen($str,"UCS-4BE"); $i++){       
        // Now we have 4 bytes. Find their total
        // numeric value.
        $s2 = mb_substr($str,$i,1,"UCS-4BE");                   
        $val = unpack("N",$s2);           
        $ords[] = $val[1];               
    }       
    return($ords);
}

I have successfully converted this encoding back to rich text using PHPExcel to generate Excel documents and PDFs, but I now need to put it into XML.

If I use the &# sequences as they are, I get this error message:

SimpleXMLElement::addChild(): invalid decimal character value

Here are more values from the database that need to be made XML-friendly:

&#120&#49&#48&#60&#115&#117&#112&#62&#54&#60&#47&#115&#117&#112&#62&#32&#99&#101&#108&#108&#115&#47&#181&#108

Converted from x10³ cells/µl

Recommended answer

There is no need to encode these characters. XML strings can use UTF-8 or another encoding, and the serializer will encode characters as necessary for the document's encoding.

$foo = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><foo/>');
$foo->addChild('bar', 'μmol/l, x10³ cells/µl');
echo $foo->asXML();

Output (special characters not encoded):

<?xml version="1.0" encoding="UTF-8"?>
<foo><bar>μmol/l, x10³ cells/µl</bar></foo>

To force entities for the special characters, change the document encoding:

$foo = new SimpleXMLElement('<?xml version="1.0" encoding="ASCII"?><foo/>');
$foo->addChild('bar', 'μmol/l, x10³ cells/µl');
echo $foo->asXML();

Output (special characters encoded):

<?xml version="1.0" encoding="ASCII"?>
<foo><bar>&#956;mol/l, x10&#179; cells/&#181;l</bar></foo>
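The two outputs differ only in serialization. As a small sketch (the variable names are illustrative), parsing the ASCII-encoded document gives back the same UTF-8 string, because SimpleXML returns text content as UTF-8 whatever the document's declared encoding is:

```php
<?php
// The ASCII-encoded document from the example above.
$xml = '<?xml version="1.0" encoding="ASCII"?>'
     . '<foo><bar>&#956;mol/l, x10&#179; cells/&#181;l</bar></foo>';

$foo = new SimpleXMLElement($xml);

// The parser resolves the character references; the value is plain UTF-8 again.
$value = (string) $foo->bar;

// U+03BC (Greek mu), U+00B3 (superscript three), U+00B5 (micro sign).
var_dump($value === "\u{03BC}mol/l, x10\u{00B3} cells/\u{00B5}l");
```

So the choice of document encoding affects how the file looks on disk, not what a consumer reads from it.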

I suggest you convert your custom encoding back to UTF-8. That way the XML API can take care of it. If you want to store strings in the custom encoding, you need to work around a bug.
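Converting the custom encoding back to UTF-8 could look like the sketch below. The helper name decode_numeric_refs() is hypothetical, and the sketch assumes every stored character is written as a decimal &#NNN sequence (with or without a trailing semicolon), as in the samples from the question:

```php
<?php
// Hypothetical helper: turn "&#NNN" sequences back into UTF-8 characters.
// Assumes decimal code points only; the trailing ";" is optional.
function decode_numeric_refs(string $encoded): string
{
    return preg_replace_callback(
        '/&#(\d+);?/',
        function (array $match): string {
            // pack('N') produces the code point as 4 big-endian bytes (UCS-4BE),
            // which mb_convert_encoding() then converts to UTF-8.
            return mb_convert_encoding(pack('N', (int) $match[1]), 'UTF-8', 'UCS-4BE');
        },
        $encoded
    );
}

$text = decode_numeric_refs('&#956&#109&#111&#108&#47&#108');

$foo = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><foo/>');
$foo->addChild('bar', $text);
echo $foo->asXML();
```

The decoded value is ordinary UTF-8 text, so the serializer decides on its own whether to write the characters directly or as entities.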

A string like &#120&#49&#48&#60&#115&#117 triggers a bug in SimpleXML/DOM. The second argument of SimpleXMLElement::addChild() and DOMDocument::createElement() has broken escaping: the & is not escaped, so &#120 with no terminating semicolon is read as a malformed character reference. You need to create the content as a text node and append it.
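With plain DOM, the text-node approach might look like the following sketch (the element names foo and bar are just the ones used in this answer). The content is never passed to createElement(); it is appended as a text node, which escapes the & on output:

```php
<?php
$document = new DOMDocument('1.0', 'UTF-8');

// Create the elements without content.
$foo = $document->appendChild($document->createElement('foo'));
$bar = $foo->appendChild($document->createElement('bar'));

// Append the raw string as a text node so "&" is escaped on serialization.
$raw = '&#120&#49&#48&#60&#115&#117';
$bar->appendChild($document->createTextNode($raw));

echo $document->saveXML();
```

Reading $bar->textContent afterwards returns the original string unchanged, since the escaping only exists in the serialized form.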

Here is a small class that extends SimpleXMLElement and adds a workaround:

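class MySimpleXMLElement extends SimpleXMLElement {

  // The parameter list mirrors SimpleXMLElement::addChild() so the override
  // stays signature-compatible on PHP 8 and later.
  #[\ReturnTypeWillChange]
  public function addChild($qualifiedName, $value = null, $namespace = null) {
    // Create the element without content so the value is never parsed.
    $child = parent::addChild($qualifiedName, null, $namespace);
    if (isset($value)) {
      // Append the value as a DOM text node, which escapes "&" correctly.
      $node = dom_import_simplexml($child);
      $node->appendChild($node->ownerDocument->createTextNode($value));
    }
    return $child;
  }
}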

$foo = new MySimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><foo/>');
$foo->addChild('bar', '&#120&#49&#48&#60&#115&#117');
echo $foo->asXML();

Output:

<?xml version="1.0" encoding="UTF-8"?>
<foo><bar>&amp;#120&amp;#49&amp;#48&amp;#60&amp;#115&amp;#117</bar></foo>

The & characters from your custom encoding are escaped as the entity &amp; because & is a special character in XML. The XML parser will decode it again when the document is read:

$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<foo><bar>&amp;#120&amp;#49&amp;#48&amp;#60&amp;#115&amp;#117</bar></foo>
XML;

$foo = new SimpleXMLElement($xml);
var_dump((string)$foo->bar);

Output:

string(27) "&#120&#49&#48&#60&#115&#117"
