Extract CDATA from RSS XML using JavaScript


Question

I have extracted RSS feed content using JavaScript; however, the 'description' node contains CDATA and I want to split it out.

For example, for each description node under item, I would like to extract only the content from <b>Brief Description:</b> to the first </div>.

Is this possible? Below is an example of what I have so far, followed by the XML from the RSS feed.

Hope someone can help :)

Script example

<SCRIPT type=text/javascript>
if (window.XMLHttpRequest)
  {// code for IE7+, Firefox, Chrome, Opera, Safari
  xmlhttp=new XMLHttpRequest();
  }
else
  {// code for IE6, IE5
  xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }

xmlhttp.open("GET","help/Sandbox/XML%20Playground/_layouts/listfeed.aspx?List=%7B1D503F3E%2D4BFF%2D4248%2D848D%2DE12B5B67DAEC%7D",false);
xmlhttp.send();
xmlDoc=xmlhttp.responseXML;

function media(){
  description=xmlDoc.getElementsByTagName('description');
  a=2;
  b=1;

  for (i=0;i<18;i++)
  {
    document.write('<p>' + description[b].childNodes[0].nodeValue + '</p>');
    b++;
    a++;
  };
};

</SCRIPT>

RSS XML FEED

<?xml version="1.0" encoding="UTF-8"?>
<!--RSS generated by Windows SharePoint Services V3 RSS Generator on 8/03/2011 10:51:51 AM-->
<?xml-stylesheet type="text/xsl" href="/help/Sandbox/XML Playground/_layouts/RssXslt.aspx?List=1d503f3e-4bff-4248-848d-e12b5b67daec" version="1.0"?>
<rss version="2.0">
  <channel>
    <title>XML Playground: Media News</title>
    <link>/help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
    <description>RSS feed for the Media News list.</description>
    <lastBuildDate>Mon, 07 Mar 2011 23:51:51 GMT</lastBuildDate>
    <generator>Windows SharePoint Services V3 RSS Generator</generator>
    <ttl>60</ttl>
    <image>
      <title>XML Playground: Media News</title>
      <url>/help/Sandbox/XML Playground/_layouts/images/homepage.gif</url>
      <link>help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
    </image>
    <item>
      <title>new Item</title>
      <link>/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</link>
      <description><![CDATA[<div><b>Brief Description:</b> <div>bla blah blah ablkahgohoihjsdofsdf dfhfgh</div></div>
<div><b>Thumbnail:</b> <a href="/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif">test image</a></div>
]]></description>
      <author>WALKER,Andrew</author>
      <pubDate>Mon, 07 Mar 2011 05:43:19 GMT</pubDate>
      <guid isPermaLink="true">http:/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</guid>
    </item>
    <item>
      <title>My School 2.0 launched</title>
      <link>http://dnet.hosts.network/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</link>
      <description><![CDATA[<div><b>Brief Description:</b> <div>On Friday 4 March 2011 the Minister for School Education, Peter Garrett, launched My School 2.0.</div></div>
<div><b>Thumbnail:</b> <a href="http://dnet.hosts.network/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif"></a></div>
<div><b>Release Date:</b> 16/03/2011</div>
]]></description>
      <pubDate>Fri, 04 Mar 2011 04:34:11 GMT</pubDate>
      <guid isPermaLink="true">/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</guid>
    </item>
  </channel>
</rss>


Recommended answer

CDATA section content is just text, so the DOM will not give you elements for the markup inside it. You can either use DOMParser() to parse the string contents of the CDATA section back into XML and use DOM methods from there, or use regular expressions.

To use the latter approach, change your document.write() line to this:

// Slice off 5 characters to get rid of the parent <div> and use [\s\S] to mean
//   any character including newlines up until the first closing div tag
document.write('<p>' + description[b].childNodes[0].nodeValue.slice(5).match(/[\s\S]*?<\/div>/) + '</p>');
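The slice(5) step assumes every description starts with exactly "<div>". A slightly more defensive variant, sketched below, anchors the match on the <b>Brief Description:</b> label itself, which is what the question literally asks for. The helper name extractBriefDescription is hypothetical, and the sketch assumes the label text appears exactly as it does in the feed above.

```javascript
// Hypothetical helper: returns the text from "<b>Brief Description:</b>" up to
// and including the first "</div>" after it, or null if the label is absent.
function extractBriefDescription(cdataText) {
  // [\s\S]*? matches any character (newlines included), as few as possible,
  // so the match stops at the first closing div tag after the label.
  const match = cdataText.match(/<b>Brief Description:<\/b>[\s\S]*?<\/div>/);
  return match ? match[0] : null;
}

// Sample CDATA content copied from the second <item> in the feed.
const sample =
  '<div><b>Brief Description:</b> <div>On Friday 4 March 2011 the Minister for School Education, Peter Garrett, launched My School 2.0.</div></div>\n' +
  '<div><b>Thumbnail:</b> <a href="http://dnet.hosts.network/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif"></a></div>\n' +
  '<div><b>Release Date:</b> 16/03/2011</div>\n';

console.log(extractBriefDescription(sample));
```

In the page itself you would pass description[b].childNodes[0].nodeValue instead of sample. Returning null rather than letting match() fall through avoids writing the literal text "null" into the page when an item has no brief description.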

To use the former approach, which is less than ideal in this case but could be helpful in other situations, you could do this inside the for loop:

var cdataContent = new DOMParser().parseFromString('<div xmlns="http://www.w3.org/1999/xhtml">'+description[b].childNodes[0].nodeValue+'</div>', 'text/xml').documentElement;
document.body.appendChild(cdataContent.firstChild);

...but be sure to invoke media() only after the DOM content has loaded.

And maybe you have some good reason for it, but based on the code you supplied, it'd be a lot simpler just to do this:

for (i=1; i<description.length; i++) {

...and forget about a and b (i.e., change b to i)
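Combining the simplified loop with the regex extraction gives something like the sketch below. It is written as a plain function over strings so it runs outside a browser; the name briefDescriptions and the texts array are hypothetical stand-ins for description[i].childNodes[0].nodeValue. The loop starts at 1 because the first description element in the feed belongs to the channel, not to an item, and the bound comes from the length so the loop cannot run past the last element the way a hard-coded 18 does (with only two items, description[3] is undefined and reading childNodes from it throws).

```javascript
// Hypothetical sketch: `texts` holds the string content of each <description>
// element in document order. Index 0 is the channel's own description.
function briefDescriptions(texts) {
  const results = [];
  for (let i = 1; i < texts.length; i++) {
    // Same extraction as the answer: drop the leading "<div>", then take
    // everything up to and including the first closing div tag.
    const match = texts[i].slice(5).match(/[\s\S]*?<\/div>/);
    if (match) results.push(match[0]);
  }
  return results;
}

const texts = [
  'RSS feed for the Media News list.',
  '<div><b>Brief Description:</b> <div>bla blah blah ablkahgohoihjsdofsdf dfhfgh</div></div>\n' +
    '<div><b>Thumbnail:</b> <a href="/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif">test image</a></div>\n',
  '<div><b>Brief Description:</b> <div>On Friday 4 March 2011 the Minister for School Education, Peter Garrett, launched My School 2.0.</div></div>\n' +
    '<div><b>Release Date:</b> 16/03/2011</div>\n',
];

console.log(briefDescriptions(texts));
```

Declaring the counter with let also keeps i from becoming an implicit global, as it does in the original script.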

And one tip: if you construct the RSS yourself, note that you can't nest a CDATA section inside another CDATA section, because the first ]]> ends the enclosing section.
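If the content you embed might itself contain a CDATA section (or any literal ]]>), a common workaround is to split each ]]> across two adjacent CDATA sections; an XML parser then concatenates their text back together. The sketch below shows that technique; the helper name wrapInCdata is hypothetical.

```javascript
// Hypothetical helper for feed generators: wraps arbitrary text in CDATA.
// A CDATA section ends at the first "]]>", so each "]]>" in the text is
// rewritten so that the current section keeps "]]" as content and closes,
// and a new section opens whose content starts with ">".
function wrapInCdata(text) {
  return '<![CDATA[' + text.split(']]>').join(']]]]><![CDATA[>') + ']]>';
}

console.log(wrapInCdata('<div><![CDATA[inner]]></div>'));
```

When parsed, the resulting sections yield the original text unchanged, including the inner "<![CDATA[inner]]>".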
