PHP simplexml_load_file fails
Problem description
I have been able to get a PubMed results page in XML format and write its contents to a local file, "Publications.xml". The problem is that simplexml_load_file("Publications.xml") fails, and I can't figure out why.
<?php
$feed = 'http://www.ncbi.nlm.nih.gov/pubmed?term=carl&sort=pubdate&report=xml';
$local = 'Publications.xml';

// Re-download the feed if there is no local copy or it is older than one day (86400 seconds).
if (!file_exists($local) || (time() - filemtime($local)) > 86400)
{
    $contents = file_get_contents($feed);
    $fp = fopen($local, "w");
    fwrite($fp, $contents);
    fclose($fp);
}

// Parse the cached file; stop with a message if parsing fails.
$xml = simplexml_load_file($local) or die("Can't");
?>
On the second-to-last line the parser fails and I get the message "Can't". I have double-checked the XML file and it appears to be in good shape.
If anyone can suggest a workaround, I would be very grateful. Here's a copy of the XML file the PHP script above tries to read (http://pastebin.com/U0fEKmZL):
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<pre>
<PubmedArticle>
<MedlineCitation Status="Publisher" Owner="NLM">
<PMID Version="1">23314841</PMID>
<DateCreated>
<Year>2013</Year>
<Month>1</Month>
<Day>14</Day>
</DateCreated>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1432-0932</ISSN>
<JournalIssue CitedMedium="Internet">
<PubDate>
<Year>2013</Year>
<Month>Jan</Month>
<Day>12</Day>
</PubDate>
... (too long, see link)
For some reason, the PubMed server is returning that entire XML file as an HTML file with a single <pre> tag containing the XML. It also contains multiple XML fragments (there are several <PubmedArticle> elements and no container around them). Clearly this is intended to be processed by some wacky custom code.
You could "unwrap" the XML by calling SimpleXML twice, like so:
$outer_xml = simplexml_load_file($local);
$inner_xml = simplexml_load_string('<dummyContainer>' . (string)$outer_xml . '</dummyContainer>');
foreach ( $inner_xml->PubmedArticle as $article )
{
// etc
}
To explain:
- the outer "XML document" is the HTML, which has a single outer element of <pre>
- casting that to string (which I've done explicitly with (string) for clarity and good habit) will give you the contents of that <pre> tag, i.e. all the <PubmedArticle> elements
- wrapping that content in a <dummyContainer> tag will give you a valid XML document, with each of the <PubmedArticle> elements as a top-level child in the document
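The steps above can be tried without touching the network. The following is only a minimal sketch. It assumes, as the string cast implies, that the XML inside <pre> is stored as entity-escaped text, and it assumes PHP 7 or later for the type declarations. The function names extractPmids and describeLibxmlErrors are hypothetical helpers, the sample document is a cut-down stand-in for Publications.xml, and the second PMID is made up for illustration. It also collects libxml's own error messages, which say more about a failed parse than a bare "Can't".

```php
<?php
// Stand-in for Publications.xml (assumption: the inner XML is entity-escaped
// text inside <pre>, and the fragments have no common root element).
$sample = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<pre>
&lt;PubmedArticle&gt;&lt;MedlineCitation&gt;&lt;PMID Version="1"&gt;23314841&lt;/PMID&gt;&lt;/MedlineCitation&gt;&lt;/PubmedArticle&gt;
&lt;PubmedArticle&gt;&lt;MedlineCitation&gt;&lt;PMID Version="1"&gt;00000001&lt;/PMID&gt;&lt;/MedlineCitation&gt;&lt;/PubmedArticle&gt;
</pre>
XML;

// Hypothetical helper: turn the errors libxml has collected into one string.
function describeLibxmlErrors(): string
{
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }
    libxml_clear_errors();
    return implode('; ', $messages);
}

// Hypothetical helper: unwrap the <pre> text and return the PMID of each article.
function extractPmids(string $outerDocument): array
{
    // Collect parse errors instead of emitting PHP warnings.
    libxml_use_internal_errors(true);

    $outer = simplexml_load_string($outerDocument);
    if ($outer === false) {
        throw new RuntimeException('Outer parse failed: ' . describeLibxmlErrors());
    }

    // The string cast returns the text content of <pre> with entities decoded,
    // i.e. the inner XML fragments; the dummy element gives them a single root.
    $inner = simplexml_load_string('<dummyContainer>' . (string)$outer . '</dummyContainer>');
    if ($inner === false) {
        throw new RuntimeException('Inner parse failed: ' . describeLibxmlErrors());
    }

    $pmids = [];
    foreach ($inner->PubmedArticle as $article) {
        $pmids[] = (string)$article->MedlineCitation->PMID;
    }
    return $pmids;
}

$pmids = extractPmids($sample);
print_r($pmids);
```

To run the same logic against the cached file, pass file_get_contents($local) to extractPmids. If either parse fails, the exception message carries libxml's description of what went wrong and on which line.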