How to convert a JSON file to an SQLite database


Problem Description

If I have some sample data, how do I put it into SQLite (preferably fully automated)?

{"uri":"/","user_agent":"example1"}
{"uri":"/foobar","user_agent":"example1"}
{"uri":"/","user_agent":"example2"}
{"uri":"/foobar","user_agent":"example3"}

Solution

I found the easiest way to do this is by using jq and CSV as an intermediate format.

Edit: As pointed out (thanks @Leo), the original question did show newline-delimited JSON objects, each of which conforms to RFC 4627 on its own, though not all of them together in that format. jq can handle a single JSON array of objects in much the same way by preprocessing the file with jq -c '.[]' <input.json >preprocessed.json (the -c keeps each object on a single line, which the head -1 trick below relies on). If you happen to be dealing with JSON text sequences (RFC 7464), jq luckily has your back too with the --seq parameter.
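For illustration, unwrapping such an array looks like this (a toy input of my own, not the data from the question):

% echo '[{"a":1},{"a":2}]' | jq -c '.[]'
{"a":1}
{"a":2}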

Edit 2: Both newline-delimited JSON and JSON text sequences have one important advantage: they reduce the memory requirement to O(1), meaning your total memory use depends only on the longest line of input. Putting the entire input into a single array instead requires one of the following of the parser: it can handle late errors (i.e. a syntax error after the first 100k elements), which to my knowledge is generally not the case; it parses the entire file twice (first validating the syntax, then parsing while discarding earlier elements, as jq --stream does), which to my knowledge also rarely happens; or it parses the whole input at once and returns the result in a single step (think of receiving a Python dict containing the entirety of your, say, 50 GB of input data plus overhead), which is usually memory-backed and therefore raises your memory footprint by roughly your total data size.
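If a single huge top-level array is exactly what you are stuck with, jq's streaming parser can unwrap it without holding the whole array in memory. A sketch using the idiom from the jq manual (big_array.json is a placeholder name; requires jq 1.5 or newer):

% jq -cn --stream 'fromstream(1|truncate_stream(inputs))' < big_array.json > preprocessed.json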

Edit 3: If you hit any obstacles, try using keys_unsorted instead of keys. I haven't tested this myself (I assume my columns were already sorted), but @Kyle Barron reports that it was needed.
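To see why this can matter, consider a toy object of my own making: keys sorts the key names alphabetically, while .[] yields the values in their original order, so a sorted header can end up misaligned with the data rows:

% echo '{"b":1,"a":2}' | jq -r 'keys | @csv'
"a","b"

% echo '{"b":1,"a":2}' | jq -r 'keys_unsorted | @csv'
"b","a"

% echo '{"b":1,"a":2}' | jq -r '[.[]] | @csv'
1,2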

Getting the CSV

First, write your data to a file. I will assume data.json here.

Then construct the header using jq:

% head -1 data.json | jq -r 'keys | @csv'
"uri","user_agent"

The head -1 is there because we only want one line. jq's -r makes the output a plain string instead of a JSON string wrapping the CSV. We then call the built-in function keys to get the keys of the input as an array, and send that to the @csv formatter, which outputs a single string with the headers in quoted CSV format.

Next we need to build the data rows.

% jq -r '[.[]] | @csv' < data.json
"/","example1"
"/foobar","example1"
"/","example2"
"/foobar","example3"

We now take the whole input, deconstruct each object (map) with .[], and put the results back into a plain array […]. This essentially converts each object into an array of its values. Sent to the @csv formatter, we again get CSV.

Putting it all together, we get a single one-liner of the form:

% (head -1 data.json | jq -r 'keys | @csv' && jq -r '[.[]] | @csv' < data.json) > data.csv

If you need to convert the data on the fly, i.e. without an intermediate file, try this:

% cat data.json | (read -r first && jq -r '(keys | @csv),( [.[]] | @csv)' <<<"${first}" && jq -r '[.[]] | @csv')

Loading it into SQLite

Open an SQLite database:

sqlite3 somedb.sqlite

Now, in the interactive shell, do the following (assuming you wrote the CSV to data.csv and want it in a table called my_table):

.mode csv
.import data.csv my_table
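If you would rather script this step than type it interactively, the same dot-commands can be fed to sqlite3 through a heredoc (a sketch, using the file and table names from above):

% sqlite3 somedb.sqlite <<'EOF'
.mode csv
.import data.csv my_table
EOF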

Now close the shell and open it again for a clean environment. You can now easily SELECT from the database and do whatever you want.
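For example, with the sample data from the question you could count requests per user agent (a query of my own, assuming the table and columns created above); the output should look roughly like this:

% sqlite3 somedb.sqlite 'SELECT user_agent, COUNT(*) FROM my_table GROUP BY user_agent;'
example1|2
example2|1
example3|1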

Putting it all together

An asciinema recording of the whole process accompanies the original answer.
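For a textual version, here is a minimal sketch of a script that chains all the steps together; the script name, argument handling, and the use of keys_unsorted are my own additions, assuming newline-delimited JSON input:

#!/bin/sh
# json2sqlite.sh -- illustrative helper combining the steps above.
# Usage: ./json2sqlite.sh data.json somedb.sqlite my_table
# Assumes newline-delimited JSON objects that all share the same keys.
set -eu

json_file=$1
db_file=$2
table=$3
csv_file=$(mktemp)

# Header from the first object, then one CSV row per object.
# keys_unsorted keeps the column order aligned with the value order.
head -1 "$json_file" | jq -r 'keys_unsorted | @csv' > "$csv_file"
jq -r '[.[]] | @csv' < "$json_file" >> "$csv_file"

# Import non-interactively via sqlite3 dot-commands.
sqlite3 "$db_file" <<EOF
.mode csv
.import $csv_file $table
EOF

rm -f "$csv_file"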

