Node.js example to convert XML to JSON for large XML files

Question

I'm relatively new to Node.js. I'm trying to convert 83 XML files that are each around 400MB in size into JSON.

Each file contains data like this (except each element has a large number of additional statements):

<case-file>
  <serial-number>75563140</serial-number>
  <registration-number>0000000</registration-number>
  <transaction-date>20130101</transaction-date>
  <case-file-header>
     <filing-date>19981002</filing-date>
     <status-code>686</status-code>
     <status-date>20130101</status-date>
  </case-file-header>
  <case-file-statements>
     <case-file-statement>
        <type-code>D10000</type-code>
        <text>"MUSIC"</text>
     </case-file-statement>
     <case-file-statement>
        <type-code>GS0351</type-code>
        <text>compact discs</text>
     </case-file-statement>
  </case-file-statements>
  <case-file-event-statements>
     <case-file-event-statement>
        <code>PUBO</code>
        <type>A</type>
        <description-text>PUBLISHED FOR OPPOSITION</description-text>
        <date>20130101</date>
        <number>28</number>
     </case-file-event-statement>
     <case-file-event-statement>
        <code>NPUB</code>
        <type>O</type>
        <description-text>NOTICE OF PUBLICATION</description-text>
        <date>20121212</date>
        <number>27</number>
     </case-file-event-statement>
  </case-file-event-statements>
</case-file>

I have tried a lot of different Node modules, including sax, node-xml, node-expat and xml2json. Obviously, I need to stream the data from the file and pipe it through an XML parser and then convert it to JSON.

I have also tried reading a number of blogs that attempt to explain, albeit superficially, how to parse XML.

In the Node universe, I tried sax first, but I can't figure out how to extract the data in a format that I can convert to JSON. xml2json won't work on streams. node-xml looks encouraging, but I can't figure out how it parses chunks in any manner that makes sense. node-expat points to the libexpat documentation, which appears to require a Ph.D. node-elementtree does the same, pointing to the Python implementation, but doesn't explain what has been implemented or how to use it.

Can someone point me to an example that I could use to get started?

Recommended answer

Although this question is quite old, I am sharing my problem and solution, which might be helpful to anyone trying to convert XML to JSON.

The actual problem here is not the conversion but processing huge XML files without having to hold them in memory at once.

Working with almost all of the widely used packages, I came across the following problems:

  • A lot of packages support XML to JSON conversion covering all scenarios but they don't work well with large files.

  • Very few packages (like xml-flow and xml-stream) support large XML file conversion, but the conversion process misses a few corner cases in which it either fails or produces an unpredictable JSON structure (explained in this SO question).

The ideal solution would be to combine the advantages of both approaches, which is exactly what I did when I came up with the xtreamer node package.

In simple terms, xtreamer accepts the name of a repeating node, just like xml-flow / xml-stream, but emits the repeating XML nodes themselves instead of converted JSON. This provides the following advantages:
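To make "emitting the repeating XML nodes" concrete, here is a minimal TypeScript sketch of the buffering idea. It is not xtreamer's source: `RepeatingNodeSplitter` is a hypothetical name, and it assumes the repeating node (`case-file` here) never nests inside itself, is never self-closing, and always closes with exactly `</case-file>`. In a real pipeline, the `push` logic would sit inside a Node Transform stream's `_transform` method, with each returned string pushed downstream.

```typescript
// Hypothetical helper illustrating the idea; not xtreamer's implementation.
// Assumes the repeating node never nests inside itself, is never self-closing,
// and always closes with exactly "</nodeName>".
class RepeatingNodeSplitter {
  private buffer = "";
  private readonly openTag: string;
  private readonly closeTag: string;

  constructor(nodeName: string) {
    this.openTag = `<${nodeName}`;
    this.closeTag = `</${nodeName}>`;
  }

  // Feeds one chunk of text and returns every node completed by it.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const nodes: string[] = [];
    for (;;) {
      const start = this.findOpenTag();
      if (start === -1) {
        // No opening tag yet: keep only a tail that may hold a partial "<nodeName".
        this.buffer = this.buffer.slice(Math.max(0, this.buffer.length - this.openTag.length));
        return nodes;
      }
      const end = this.buffer.indexOf(this.closeTag, start);
      if (end === -1) {
        // Node still incomplete: drop the text before it and wait for more data.
        this.buffer = this.buffer.slice(start);
        return nodes;
      }
      const stop = end + this.closeTag.length;
      nodes.push(this.buffer.slice(start, stop));
      this.buffer = this.buffer.slice(stop);
    }
  }

  // Finds "<nodeName" followed by ">" or whitespace, so that <case-file>
  // is not confused with <case-file-header> and similar names.
  private findOpenTag(): number {
    let i = this.buffer.indexOf(this.openTag);
    while (i !== -1) {
      const next = this.buffer.charAt(i + this.openTag.length);
      if (next === ">" || /\s/.test(next)) return i;
      i = this.buffer.indexOf(this.openTag, i + 1);
    }
    return -1;
  }
}

// Usage: simulate a file stream by feeding 7-character chunks.
const xml =
  "<root><case-file><serial-number>1</serial-number></case-file>" +
  "<case-file><serial-number>2</serial-number></case-file></root>";
const splitter = new RepeatingNodeSplitter("case-file");
const found: string[] = [];
for (let i = 0; i < xml.length; i += 7) {
  found.push(...splitter.push(xml.slice(i, i + 7)));
}
console.log(found.length); // 2
```

Only the unfinished node (or a short tail that could hold a partial opening tag) stays in the buffer, so memory use is bounded roughly by the largest single node plus one chunk rather than by the 400MB file.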

  • We can pipe any readable stream into xtreamer, as it extends the Transform stream.
  • The emitted XML nodes can be passed to any XML-to-JSON parser to get the desired JSON.
  • We can go one step further and hook the JSON parser up to xtreamer, which will then invoke it and emit JSON accordingly.
  • xtreamer has stream as its only dependency and, being a Transform stream extension, it can be flexibly piped with other streams.
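Once a single node has been isolated, it is small enough to hand to any in-memory XML-to-JSON converter. As a dependency-free illustration only, the hypothetical `xmlNodeToJson` below handles the simple shape shown in the question; it assumes no attributes, comments, CDATA, or mixed text/element content, and decodes only the five predefined XML entities. A real project would more likely use an established parser.

```typescript
// Minimal sketch, assuming element-only XML: no attributes, comments, CDATA,
// processing instructions or mixed text/element content.
type JsonValue = string | JsonObject | JsonValue[];
interface JsonObject {
  [key: string]: JsonValue;
}

// Decodes the five predefined XML entities (&amp; last, to avoid double decoding).
function decodeEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Hypothetical helper: converts one complete XML node (e.g. a <case-file>) to JSON.
function xmlNodeToJson(xml: string): JsonObject {
  let pos = 0;

  // Parses one element starting at `pos` and returns [tagName, value].
  function parseElement(): [string, JsonValue] {
    while (pos < xml.length && /\s/.test(xml.charAt(pos))) pos++;
    if (xml.charAt(pos) !== "<") throw new Error(`Expected "<" at offset ${pos}`);
    const tagEnd = xml.indexOf(">", pos);
    if (tagEnd === -1) throw new Error(`Unterminated tag at offset ${pos}`);
    const tag = xml.slice(pos + 1, tagEnd).trim();
    pos = tagEnd + 1;
    // A self-closing element such as <registration-number/> becomes "".
    if (tag.charAt(tag.length - 1) === "/") return [tag.slice(0, -1).trim(), ""];

    const children: JsonObject = {};
    let hasChildren = false;
    let text = "";
    for (;;) {
      const lt = xml.indexOf("<", pos);
      if (lt === -1) throw new Error(`Missing </${tag}>`);
      text += xml.slice(pos, lt);
      pos = lt;
      if (xml.slice(pos, pos + 2) === "</") {
        const closeEnd = xml.indexOf(">", pos);
        if (closeEnd === -1) throw new Error(`Unterminated </${tag}>`);
        const closeName = xml.slice(pos + 2, closeEnd).trim();
        if (closeName !== tag) throw new Error(`Expected </${tag}>, found </${closeName}>`);
        pos = closeEnd + 1;
        // Leaf elements become strings; elements with children become objects.
        return [tag, hasChildren ? children : decodeEntities(text.trim())];
      }
      const [childName, childValue] = parseElement();
      hasChildren = true;
      if (!Object.prototype.hasOwnProperty.call(children, childName)) {
        children[childName] = childValue;
      } else {
        // Repeated sibling names are collected into an array.
        const existing = children[childName];
        if (Array.isArray(existing)) existing.push(childValue);
        else children[childName] = [existing, childValue];
      }
    }
  }

  const [rootName, rootValue] = parseElement();
  return { [rootName]: rootValue };
}

const node =
  "<case-file><serial-number>75563140</serial-number>" +
  "<case-file-statements>" +
  '<case-file-statement><type-code>D10000</type-code><text>"MUSIC"</text></case-file-statement>' +
  "<case-file-statement><type-code>GS0351</type-code><text>compact discs</text></case-file-statement>" +
  "</case-file-statements></case-file>";
console.log(JSON.stringify(xmlNodeToJson(node), null, 2));
```

Note that `case-file-statement` becomes an array only because it occurs twice; a record with a single statement would yield a plain object instead. This is the kind of unpredictable JSON structure mentioned above, and full converters such as xml2js offer an option (`explicitArray`) to always produce arrays.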

What if the XML structure is not fixed?

I managed to come up with another sax-based node package, xtagger, which reads the XML file and provides the structure of the file in the following format:

structure: { [name: string]: { [hierarchy: number]: number } };

This package lets you figure out the repeating node name, which can then be passed to xtreamer for parsing.
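The text does not spell out what the numbers in that structure mean. Assuming `hierarchy` is the nesting depth and the value is the number of times the tag opens at that depth, a rough TypeScript sketch of the idea follows. `tagStructure` and `guessRepeatingNode` are hypothetical helpers, not xtagger's API, and unlike a sax-based reader this regex scanner works on a whole string and does not account for comments, CDATA, or `>` inside attribute values.

```typescript
// Same shape as the structure above; the meaning of keys and values is an assumption.
type Structure = { [name: string]: { [hierarchy: number]: number } };

// Hypothetical helper: counts how often each tag name opens at each depth.
function tagStructure(xml: string): Structure {
  const structure: Structure = {};
  const tagPattern = /<(\/?)([A-Za-z_][\w.:-]*)[^>]*?(\/?)>/g;
  let depth = 0;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(xml)) !== null) {
    const closing = match[1];
    const tagName = match[2];
    const selfClosing = match[3];
    if (closing) {
      depth--;
      continue;
    }
    if (!Object.prototype.hasOwnProperty.call(structure, tagName)) structure[tagName] = {};
    const byDepth = structure[tagName];
    byDepth[depth] = (byDepth[depth] || 0) + 1;
    // Self-closing tags such as <x/> do not open a new level.
    if (!selfClosing) depth++;
  }
  return structure;
}

// Picks the shallowest tag name that opens more than once at the same depth,
// a plausible candidate for the repeating node to pass to xtreamer.
function guessRepeatingNode(structure: Structure): string | undefined {
  let best: string | undefined = undefined;
  let bestDepth = Infinity;
  for (const tagName of Object.keys(structure)) {
    for (const depthKey of Object.keys(structure[tagName])) {
      const depth = Number(depthKey);
      if (structure[tagName][depth] > 1 && depth < bestDepth) {
        best = tagName;
        bestDepth = depth;
      }
    }
  }
  return best;
}

const sample =
  "<case-files>" +
  "<case-file><serial-number>1</serial-number><case-file-statements/></case-file>" +
  "<case-file><serial-number>2</serial-number><case-file-statements/></case-file>" +
  "</case-files>";
const fileStructure = tagStructure(sample);
console.log(JSON.stringify(fileStructure));
console.log(guessRepeatingNode(fileStructure)); // case-file
```

The "shallowest repeated name" rule is just one heuristic; for the question's data, the wrapper element opens once at depth 0 while `case-file` repeats at depth 1, which is why it is selected.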

I hope this helps. :)