How to read CDATA from an XML file in Node Sax

Question

I have an XML structure like this:

<?xml version="1.0" encoding="utf-8"?>
<videos>
    <video>
        <id>47288</id>
        <thumbs>
            <thumb><![CDATA[http://foo.com/bar.jpg]]></thumb>
        </thumbs>
        <link><![CDATA[http://foo.com/bar.html]]></link>
        <title><![CDATA[Sample Title Here]]></title>
        <categories>
            <category><![CDATA[Cat1]]></category>
            <category><![CDATA[Cat2]]></category>
        </categories>
        <tags>
            <tag><![CDATA[Tag1]]></tag>
            <tag><![CDATA[Tag2]]></tag>
            <tag><![CDATA[Tag3]]></tag>
            <tag><![CDATA[Tag4]]></tag>
            <tag><![CDATA[Tag5]]></tag>
            <tag><![CDATA[Tag6]]></tag>
        </tags>
        <duration><![CDATA[9:57]]></duration>
        <pubDate><![CDATA[2013-12-17]]></pubDate>
    </video>
    <!-- insert 200,000 more <video> entries here -->
</videos>

No idea why this is all written as CDATA, but there's not much I can do about it; it's the data I've been given. To read this massive (1.5 GB) XML file, I stream it with fs into sax and then into saxpath, like so:

var saxpath = require('saxpath')
var fs = require('fs')
var sax = require('sax')
var parseString = require('xml2js').parseString;
var util = require('util');

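// Strict-mode sax stream that saxpath listens to
var saxParser = sax.createStream(true)
// Emit a 'match' event for every <video> element under <videos>
var streamer = new saxpath.SaXPath(saxParser, '/videos/video')

streamer.on('match', function(xml) {
    // xml is the matched <video> fragment as a string
    console.log(xml);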
    parseString(xml, function (err, result) {
        var json1 = JSON.stringify(result);
        var json = JSON.parse(json1);
        console.log(util.inspect(json, false, null));
    });

});

fs.createReadStream('./xml/big_data_file.xml').pipe(saxParser)

However, when I get to the console.log(xml), it shows this:

<video>
    <id>620339</id>
    <thumbs>
        <thumb></thumb>
    </thumbs>
    <link></link>
    <title></title>
    <categories>
        <category></category>
        <category></category>
    </categories>
    <tags>
        <tag></tag>
        <tag></tag>
        <tag></tag>
        <tag></tag>
        <tag></tag>
        <tag></tag>
        <tag></tag>
    </tags>
    <duration></duration>
    <pubDate></pubDate>
</video>

There is no data inside whatsoever. The Saxpath docs (https://github.com/StevenLooman/saxpath) don't mention CDATA, and I'm not sure whether this is an issue with Saxpath or with Sax itself.

Any ideas how I can remedy this?

Cheers!

Recommended answer

That's a limitation of SaXPath 0.5.4. Version 0.5.5, which was just pushed to npm, now handles CDATA as you would expect (see commit).
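
To confirm which SaXPath version a project actually resolves, a check along these lines may help. This is only a minimal sketch: `isAtLeast` is a hypothetical helper, not part of SaXPath, and it assumes plain `major.minor.patch` version strings with no pre-release or build suffixes.

```javascript
// Hypothetical helper: compares two plain "major.minor.patch" strings.
// Assumes numeric parts only (no pre-release or build suffixes).
function isAtLeast(installed, required) {
    var a = installed.split('.').map(Number);
    var b = required.split('.').map(Number);
    for (var i = 0; i < Math.max(a.length, b.length); i++) {
        var x = a[i] || 0; // a missing part counts as 0
        var y = b[i] || 0;
        if (x !== y) {
            return x > y;
        }
    }
    return true; // the versions are equal
}

// Usage (assumes saxpath is installed and its package.json can be required):
// var version = require('saxpath/package.json').version;
// if (!isAtLeast(version, '0.5.5')) {
//     console.warn('saxpath ' + version + ' predates the CDATA fix in 0.5.5');
// }
```

The parts are compared as numbers rather than as strings, so a version such as 0.10.0 is correctly treated as newer than 0.5.5.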

With the exact same code and the latest version of SaXPath:

<video>
        <id>47288</id>
        <thumbs>
            <thumb><![CDATA[http://foo.com/bar.jpg]]></thumb>
        </thumbs>
        <link><![CDATA[http://foo.com/bar.html]]></link>
        <title><![CDATA[Sample Title Here]]></title>
        <categories>
            <category><![CDATA[Cat1]]></category>
            <category><![CDATA[Cat2]]></category>
        </categories>
        <tags>
            <tag><![CDATA[Tag1]]></tag>
            <tag><![CDATA[Tag2]]></tag>
            <tag><![CDATA[Tag3]]></tag>
            <tag><![CDATA[Tag4]]></tag>
            <tag><![CDATA[Tag5]]></tag>
            <tag><![CDATA[Tag6]]></tag>
        </tags>
        <duration><![CDATA[9:57]]></duration>
        <pubDate><![CDATA[2013-12-17]]></pubDate>
</video>

And the result of parsing it with xml2js:

{ video: 
   { id: [ '47288' ],
     thumbs: [ { thumb: [ 'http://foo.com/bar.jpg' ] } ],
     link: [ 'http://foo.com/bar.html' ],
     title: [ 'Sample Title Here' ],
     categories: [ { category: [ 'Cat1', 'Cat2' ] } ],
     tags: [ { tag: [ 'Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5', 'Tag6' ] } ],
     duration: [ '9:57' ],
     pubDate: [ '2013-12-17' ] } }
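
As the output above shows, xml2js wraps each child element in an array here, so even single values such as `id` arrive as one-element arrays. If plain values are easier to work with, the result can be unwrapped after parsing. The following is a minimal sketch, not part of xml2js: `unwrapSingles` is a hypothetical helper, and it assumes the result contains only strings, arrays, and plain objects, as in the output above.

```javascript
// Hypothetical helper: recursively replaces one-element arrays with their
// single item. Arrays with several items (e.g. tags) stay arrays.
function unwrapSingles(value) {
    if (Array.isArray(value)) {
        var items = value.map(unwrapSingles);
        return items.length === 1 ? items[0] : items;
    }
    if (value !== null && typeof value === 'object') {
        var out = {};
        Object.keys(value).forEach(function (key) {
            out[key] = unwrapSingles(value[key]);
        });
        return out;
    }
    return value; // strings and other primitives pass through unchanged
}

// A trimmed copy of the parsed result shown above.
var parsed = {
    video: {
        id: ['47288'],
        thumbs: [{ thumb: ['http://foo.com/bar.jpg'] }],
        categories: [{ category: ['Cat1', 'Cat2'] }],
        duration: ['9:57']
    }
};

var flat = unwrapSingles(parsed);
console.log(flat.video.id);                  // 47288
console.log(flat.video.thumbs.thumb);        // http://foo.com/bar.jpg
console.log(flat.video.categories.category); // [ 'Cat1', 'Cat2' ]
```

One trade-off to keep in mind: a video with a single tag or category would end up with a string where other videos have an array, so code that consumes the flattened result has to handle both shapes.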
