Converting large XML file to relational database

Views: 110 | Published: 2019/6/6 10:48:20 | Tags: javascript, python, xml, node.js, relational-database

Question

I'm trying to figure out the best way to accomplish the following:

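- Download a large XML file (1GB) on a daily basis from a third-party website
- Convert that XML file to a relational database on my server
- Add functionality to search the database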

For the first part, is this something that would need to be done manually, or could it be accomplished with a cron?

Most of the questions and answers related to XML and relational databases refer to Python or PHP. Could this be done with javascript/nodejs as well?

If this question is better suited for a different StackExchange forum, please let me know and I will move it there instead.

Below is a sample of the xml code:

<case-file>
  <serial-number>123456789</serial-number>
  <transaction-date>20150101</transaction-date>
  <case-file-header>
    <filing-date>20140101</filing-date>
  </case-file-header>
  <case-file-statements>
    <case-file-statement>
      <code>AQ123</code>
      <text>Case file statement text</text>
    </case-file-statement>
    <case-file-statement>
      <code>BC345</code>
      <text>Case file statement text</text>
    </case-file-statement>
  </case-file-statements>
  <classifications>
    <classification>
      <international-code-total-no>1</international-code-total-no>
      <primary-code>025</primary-code>
    </classification>
  </classifications>
</case-file>

Here is some more information about how these files will be used:

All XML files will be in the same format. There are probably a few dozen elements within each record. The files are updated by a third party on a daily basis (and are available as zipped files on the third-party website). Each day's file represents new case files as well as updated case files.

The goal is to allow a user to search for information and organize those search results on the page (or in a generated pdf/excel file). For example, a user might want to see all case files that include a particular word within the <text> element. Or a user might want to see all case files that include primary code 025 (<primary-code> element) and that were filed after a particular date (<filing-date> element).
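As a rough illustration, the two example searches above could be expressed as parameterized SQL along these lines. This is only a sketch: the table and column names (`case_files`, `statements`, `classifications`, and their columns) are hypothetical and not part of the original post, `ILIKE` is PostgreSQL-specific, and dates are assumed to be stored as `DATE` values.

```javascript
// Hypothetical schema (an assumption, not from the original post):
//   case_files(serial_number, filing_date)
//   statements(serial_number, code, text)
//   classifications(serial_number, primary_code)

// Builds a parameterized query for "primary code X, filed after date Y".
function byPrimaryCodeFiledAfter(primaryCode, filedAfter) {
  return {
    text:
      'SELECT DISTINCT cf.serial_number, cf.filing_date ' +
      'FROM case_files cf ' +
      'JOIN classifications c ON c.serial_number = cf.serial_number ' +
      'WHERE c.primary_code = $1 AND cf.filing_date > $2 ' +
      'ORDER BY cf.filing_date',
    values: [primaryCode, filedAfter]
  };
}

// Builds a parameterized query for "statement text contains a given word".
function byStatementWord(word) {
  // Escape LIKE wildcards so the user's input is matched literally.
  var escaped = word.replace(/[\\%_]/g, '\\$&');
  return {
    text: "SELECT DISTINCT serial_number FROM statements " +
          "WHERE text ILIKE $1 ESCAPE '\\'",
    values: ['%' + escaped + '%']
  };
}
```

Each function returns an object with `text` and `values`, which is the query-config shape that node-postgres accepts in `client.query(...)`. Passing user input as a parameter rather than concatenating it into the SQL string avoids SQL injection.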

The only data entered into the database will be from the XML files--users won't be adding any of their own information to the database.

Recommended answer

All steps could certainly be accomplished using node.js. There are modules available that will help you with each of these tasks:

- node-cron: lets you easily set up cron tasks in your node program. Another option would be to set up a cron task on your operating system (lots of resources available for your favourite OS).
- download: module to easily download files from a URL.
- xml-stream: allows you to stream a file and register events that fire when the parser encounters certain XML elements. I have successfully used this module to parse KML files (granted, they were significantly smaller than your files).
- node-postgres: node client for PostgreSQL (I am sure there are clients for many other common RDBMS; PG is the only one I have used so far).
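For the scheduling step, here is a minimal sketch of a daily trigger that uses only Node's built-in timers. It is meant as a dependency-free alternative to node-cron or an OS-level cron job, not as a description of how node-cron works. The function names are made up for illustration, and `downloadAndImport` stands for a function you would write yourself.

```javascript
// Returns the next Date at hour:minute (local time) strictly after `after`.
function nextRunAfter(after, hour, minute) {
  var next = new Date(after.getTime());
  next.setHours(hour, minute, 0, 0);
  if (next <= after) {
    next.setDate(next.getDate() + 1); // today's slot has passed, use tomorrow
  }
  return next;
}

// Runs `task` once a day at hour:minute and re-arms the timer after each run.
// `after` is only used internally to step from one slot to the next.
function scheduleDaily(task, hour, minute, after) {
  var target = nextRunAfter(after || new Date(), hour, minute);
  setTimeout(function () {
    task();
    // Step from the slot that just ran, so timer jitter cannot repeat it.
    scheduleDaily(task, hour, minute, target);
  }, Math.max(0, target - Date.now()));
}

// Example usage (downloadAndImport is hypothetical):
// scheduleDaily(downloadAndImport, 3, 0); // every day at 03:00 local time
```

An in-process timer only fires while the Node process is running, so an OS cron job may be the more robust choice if the server restarts often.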

Most of these modules have pretty great examples that will get you started. Here's how you would probably set up the XML streaming part:

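var fs = require('fs');
var XmlStream = require('xml-stream');

var xml = fs.createReadStream('path/to/file/on/disk'); // or stream directly from your online source
var xmlStream = new XmlStream(xml);

// keep every repeated child element as an array instead of only the last one
xmlStream.collect('case-file-statement');
xmlStream.collect('classification');

// fires once per record, after the closing </case-file> tag has been read
xmlStream.on('endElement: case-file', function(element) {
    // create and execute SQL query/queries here for this element
});
xmlStream.on('end', function() {
    // done reading elements
    // do further processing / query database, etc.
});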
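To make the "create and execute SQL" step more concrete, here is a sketch of how one parsed record might be turned into rows. It assumes the element arrives as a plain object keyed by tag name, with repeated children either as an array or as a single object. The table and column names are hypothetical, and the upsert uses PostgreSQL's `ON CONFLICT` syntax with `serial_number` assumed to be the primary key.

```javascript
// Converts 'YYYYMMDD' (the format in the sample XML) to ISO 'YYYY-MM-DD'.
function toIsoDate(yyyymmdd) {
  return yyyymmdd.slice(0, 4) + '-' + yyyymmdd.slice(4, 6) + '-' + yyyymmdd.slice(6, 8);
}

// Wraps a single object in an array so one child and many children look alike.
function asArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Maps one parsed <case-file> object to rows for three hypothetical tables.
function toRows(caseFile) {
  var serial = caseFile['serial-number'];
  var statements = (caseFile['case-file-statements'] || {})['case-file-statement'];
  var classes = (caseFile['classifications'] || {})['classification'];
  return {
    caseFile: {
      serial_number: serial,
      transaction_date: toIsoDate(caseFile['transaction-date']),
      filing_date: toIsoDate(caseFile['case-file-header']['filing-date'])
    },
    statements: asArray(statements).map(function (s) {
      return { serial_number: serial, code: s.code, text: s.text };
    }),
    classifications: asArray(classes).map(function (c) {
      // Codes stay strings so leading zeros such as '025' are preserved.
      return { serial_number: serial, primary_code: c['primary-code'] };
    })
  };
}

// Upsert for the parent row: a daily file holds both new and updated case files.
var UPSERT_CASE_FILE =
  'INSERT INTO case_files (serial_number, transaction_date, filing_date) ' +
  'VALUES ($1, $2, $3) ' +
  'ON CONFLICT (serial_number) DO UPDATE SET ' +
  'transaction_date = EXCLUDED.transaction_date, ' +
  'filing_date = EXCLUDED.filing_date';
```

For the child tables, one simple option is to delete the existing rows for a serial number and re-insert them inside the same transaction, so that an updated case file replaces its old statements and classifications.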
