How do I tell DOMDocument->load() what encoding I want it to use?


Question

I search for and process XML files from elsewhere, and need to transform them with some XSLTs. No problem. Using PHP5 and the DOM library, everything's a snap. Worked fine, up till now. Today, funky characters showed up in the XML file: "smart" quotes from Word, it looks like. Anyway, DOMDocument->load() complained about them, saying they weren't UTF-8 and that I should specify the encoding.

Lo and behold, the encoding is not specified in these XML files. If I add in 'encoding="iso-8859-1"' to the header, it works fine. The rub is I have no control over these XML files.

Reading the file into a string, modifying its header and writing it back out to another location seems to be my only option, but I'd prefer to do it without having to use temporary copies of the XML files at all. Is there any way to simply tell the parser to parse them as if they were iso-8859-1?
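The header-rewriting workaround does not strictly need a temporary file. DOMDocument::loadXML() parses a string rather than a path, so the declaration can be patched in memory before parsing. Below is a minimal sketch. `load_xml_as()` is a hypothetical helper, not part of PHP's DOM API, and the sketch assumes any existing declaration sits at the very start of the data. If the stray characters really are Word smart quotes, 'windows-1252' may be a more accurate label than 'iso-8859-1', provided your libxml2 build supports it. In ISO-8859-1, bytes 0x93 and 0x94 are control characters rather than quotes.

```php
<?php
// Hypothetical helper (not part of PHP's DOM API): parse XML bytes as if their
// declaration named $encoding, patching the declaration in memory so that no
// temporary copy of the file is written.
function load_xml_as(string $xml, string $encoding): DOMDocument
{
    $declaration = '<?xml version="1.0" encoding="' . $encoding . '"?>';
    $pattern = '/^<\?xml[^>]*\?>/';

    if (preg_match($pattern, $xml)) {
        // Replace the existing declaration, which lacks (or misstates) the encoding.
        // Note: any version or standalone attribute in it is replaced as well.
        $xml = preg_replace($pattern, $declaration, $xml, 1);
    } else {
        // No declaration at all: prepend one.
        $xml = $declaration . "\n" . $xml;
    }

    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Could not parse the XML string');
    }
    return $doc;
}

// Usage: read the source into memory and parse it as ISO-8859-1.
// $doc = load_xml_as(file_get_contents($xmlPath), 'iso-8859-1');
```

The parsed DOM stores text as UTF-8 internally, so the XSLT step afterwards works on correctly decoded characters.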

Solution

Does this work for you?

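$doc = new DOMDocument('1.0', 'iso-8859-1');
// The second argument is the encoding written in the new document's XML declaration;
// load() replaces the document with the parsed file and does not use it to decode the input.
$doc->load($xmlPath);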

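Edit: Since it appears that this doesn't work, you could do something similar to your existing method but without the temp file. Read the XML file from your source with standard I/O operations (file_get_contents() or similar), convert the contents to UTF-8 with iconv(), and then parse the string with loadXML(). UTF-8 is what the parser assumes when no encoding is declared. Note that utf8_decode() converts in the opposite direction, from UTF-8 to ISO-8859-1, so it does not help here.

$myXMLString = file_get_contents($xmlPath);
// Windows-1252 matches ISO-8859-1 for all printable characters and also covers Word's smart quotes.
$myXMLString = iconv('Windows-1252', 'UTF-8', $myXMLString);
// No encoding argument needed: loadXML() treats the undeclared input as UTF-8.
$doc = new DOMDocument();
$doc->loadXML($myXMLString);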
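Because the files come from elsewhere, some of them may already be UTF-8. Converting one of those again from a single-byte encoding would double-encode it. The sketch below converts only input that is not already valid UTF-8. It assumes the mbstring extension is available and that any non-UTF-8 input is Windows-1252. The `to_utf8()` name and its `$fallback` parameter are hypothetical, chosen for illustration.

```php
<?php
// Sketch: convert to UTF-8 only when the input is not already valid UTF-8,
// so files that are already UTF-8 are not double-encoded.
// Assumes the mbstring extension and that non-UTF-8 input is Windows-1252.
function to_utf8(string $bytes, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8 (this includes pure ASCII), leave untouched
    }
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

// Usage (assumes the file has no XML declaration naming another encoding):
// $doc = new DOMDocument();
// $doc->loadXML(to_utf8(file_get_contents($xmlPath)));
```

This is a heuristic, not a guarantee. A single-byte file can occasionally happen to be valid UTF-8 byte for byte. The sketch also assumes the data carries no declaration such as encoding="iso-8859-1", because that declaration would be wrong once the bytes are converted.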
