How to import XBRL data to MySQL?

Question

I am working on a project that involves processing a large volume of XBRL documents (more than 1 million separate files). I am totally new to XBRL and feeling quite lost at the moment.

I have data relating to those XBRL documents in a separate MySQL database, and I would like to add the XBRL data to MySQL so that everything is stored in one database.

What is the best way to transfer data from the XBRL documents into MySQL?

Are there any bulk processing libraries available for this?

I've been looking for tutorials on these issues, but I couldn't find anything that provides a basic introduction, just a lot of high-level information.

Recommended answer

In theory, the natural paradigm for storing XBRL in a database is OLAP, because XBRL is about data cubes. OLAP on top of a relational database is called ROLAP.

This is not a trivial problem, for two reasons. First, facts taken from a large number of taxonomies can form a very large and sparse cube (for SEC filings, it has more than 10,000 dimensions). Second, creating an SQL schema requires knowing the taxonomies before any import, so if new taxonomies come up, everything has to be re-ETLed. This makes relational databases unsuitable as a general solution.

If the filings share the same taxonomy and that taxonomy is very simple (that is, it does not have too many dimensions), it is possible to come up with an ad-hoc mapping that stores all facts in a single table with many rows, in the ROLAP sense (facts to rows, aspects to columns). Some vendors specialize in storing non-dimensional XBRL facts, in which case traditional SQL offerings (or "post-SQL" offerings that scale with the number of rows) work well.
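The facts-to-rows, aspects-to-columns mapping can be sketched in a few lines of Python. This is a minimal sketch rather than a full XBRL processor: it assumes a simple, non-dimensional instance document; the `ex:` taxonomy namespace, the `facts` table layout, and the sample values are all made up for illustration; and it uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a server.

```python
import sqlite3
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"

# Hypothetical, hand-written instance document used only for illustration.
SAMPLE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="c1">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/id">ACME</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2020-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="usd"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:Assets contextRef="c1" unitRef="usd" decimals="0">1000</ex:Assets>
  <ex:Liabilities contextRef="c1" unitRef="usd" decimals="0">400</ex:Liabilities>
</xbrli:xbrl>"""


def extract_facts(xml_text):
    """Return one (concept, entity, period, unit, value) tuple per fact."""
    root = ET.fromstring(xml_text)

    # Index contexts by id: each one carries the entity and period aspects.
    contexts = {}
    for ctx in root.findall(f"{{{XBRLI}}}context"):
        entity = ctx.findtext(f"{{{XBRLI}}}entity/{{{XBRLI}}}identifier")
        period = ctx.find(f"{{{XBRLI}}}period")
        instant = period.findtext(f"{{{XBRLI}}}instant")
        end = period.findtext(f"{{{XBRLI}}}endDate")
        contexts[ctx.get("id")] = (entity.strip(), instant or end)

    # Index simple units by id (units with a divide element are not handled).
    units = {}
    for unit in root.findall(f"{{{XBRLI}}}unit"):
        units[unit.get("id")] = unit.findtext(f"{{{XBRLI}}}measure")

    # Top-level elements that carry a contextRef attribute are treated as facts.
    rows = []
    for el in root:
        ctx_id = el.get("contextRef")
        if ctx_id is None:
            continue
        entity, period = contexts[ctx_id]
        rows.append((el.tag, entity, period, units.get(el.get("unitRef")), el.text))
    return rows


# sqlite3 stands in for MySQL here; the table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (concept TEXT, entity TEXT, period TEXT, unit TEXT, value TEXT)"
)
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?)", extract_facts(SAMPLE))
conn.commit()
```

With a MySQL driver the same parse-then-`executemany` pattern should carry over, although the placeholder style and connection setup depend on the driver. For a million files, inserting in batches per file or per group of files is likely to matter more than the parsing itself.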

Some vendors create a table for each XBRL hypercube in the taxonomy, with a schema that is derived from the definition network and therefore differs for each hypercube. This can lead to a lot of tables in the database, and queries that involve multiple hypercubes require a lot of joins.
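To make the table-per-hypercube idea concrete, here is a small sketch that generates one `CREATE TABLE` statement per hypercube. The `HYPERCUBES` dictionary is a hand-written stand-in for what would really be read from a taxonomy's definition network, and the table names, column names, and column types are all assumptions chosen for illustration.

```python
import re

# Hypothetical hypercubes: table name -> list of dimension (axis) names.
HYPERCUBES = {
    "balance_sheet": ["legal_entity_axis"],
    "segment_revenue": ["segment_axis", "region_axis"],
}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def hypercube_ddl(table_name, dimensions):
    """Build a CREATE TABLE statement with one column per dimension."""
    # Identifiers are interpolated into SQL, so reject anything unusual.
    for name in [table_name, *dimensions]:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    # Columns shared by every hypercube table in this sketch.
    columns = [
        "concept VARCHAR(255) NOT NULL",
        "entity VARCHAR(64) NOT NULL",
        "period VARCHAR(32) NOT NULL",
    ]
    # One nullable column per dimension; this part differs per hypercube.
    columns += [f"{dim} VARCHAR(255)" for dim in dimensions]
    columns.append("value TEXT")
    return f"CREATE TABLE {table_name} (\n    " + ",\n    ".join(columns) + "\n)"


ddl = {name: hypercube_ddl(name, dims) for name, dims in HYPERCUBES.items()}
for statement in ddl.values():
    print(statement + ";")
```

Each hypercube gets its own column set, which is why the number of tables grows with the taxonomy and why a query that spans `balance_sheet` and `segment_revenue` needs a join.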

Other vendors make assumptions about the underlying XBRL structure, or about the kinds of queries that their users need to run. Restricting the scope of the problem in this way makes it possible to find specific architectures or SQL schemas that do the job for those specific needs.

To import large numbers of filings (for example, all SEC filings), we (my employer) built a generic mapping on top of NoSQL data stores rather than relational databases. Large numbers of facts with a varying number of dimensions fit well in large collections of semi-structured documents, and networks fit well in a hierarchical format.
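As a rough illustration of why semi-structured documents suit this data, the sketch below represents each fact as a JSON document whose `dimensions` object can hold any number of entries, and a network as a nested tree. The field names, the `ex:` concepts, and the values are invented for the example; this is not the mapping the answer's author built, only a minimal sketch of the general idea.

```python
import json


def fact_document(concept, entity, period, value, dimensions=None, unit=None):
    """Build one fact as a dict; the set of dimensions may differ per fact."""
    doc = {
        "concept": concept,
        "entity": entity,
        "period": period,
        "value": value,
        "dimensions": dict(dimensions or {}),
    }
    if unit is not None:
        doc["unit"] = unit
    return doc


# Two hypothetical facts: one with no dimensions, one with two.
facts = [
    fact_document("ex:Assets", "ACME", "2020-12-31", 1000, unit="iso4217:USD"),
    fact_document(
        "ex:Revenue", "ACME", "2020", 250,
        {"ex:SegmentAxis": "ex:Retail", "ex:RegionAxis": "ex:Europe"},
        "iso4217:USD",
    ),
]

# A hypothetical network stored as a nested (hierarchical) document.
network = {
    "concept": "ex:BalanceSheet",
    "children": [
        {"concept": "ex:Assets", "children": []},
        {"concept": "ex:Liabilities", "children": []},
    ],
}

# One JSON document per line, a common bulk-load shape for document stores.
json_lines = "\n".join(json.dumps(doc, sort_keys=True) for doc in facts)

# Filtering on a dimension needs no schema change when new axes appear.
retail = [f for f in facts if f["dimensions"].get("ex:SegmentAxis") == "ex:Retail"]
```

Adding a fact with a previously unseen dimension only adds a key to that one document, whereas the relational layouts above would need a new column or table.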
