如何将 json 文件复制到 postgres 中? [英] How do I COPY IMPORT a json file into postgres?

查看:89
本文介绍了如何将 json 文件复制到 postgres 中?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想将 json 数据导入 postgres.我拥有的数据有 100 万行,大小至少为 700 MB,可扩展到 3 GB.

这是我根据我拥有的数据结构创建的示例数据.我尝试将其导入 postgres,但出现错误.

样本(1)数据

{offers":{offer":[{url":https://some1-value.com",嵌套":{id":4,value":some1 text value"},quotes":5"边"},{url":https://some2-value.com",嵌套":{id":5,value":some2 text value"},quotes":6"边"},{url":https://some3-value.com",嵌套":{id":6,value":some3 文本值"},quotes":7"一边"}]}}

我使用的命令和我得到的错误

# copy contrial from '/home/ubuntu/sample-data.json';错误:类型 json 的输入语法无效详细信息:输入字符串意外结束.上下文:JSON 数据,第 1 行:{offers":{offer":[COPY contrial,第 1 行,列信息:{"offers":{offer":[";

我修改了文件以删除前两个键,并且只有如下所示的 json 列表,但我仍然收到错误消息.

样本(2)数据

<预><代码>[{url":https://some1-value.com",嵌套":{id":4,value":some1 text value"},quotes":5"边"},{url":https://some2-value.com",嵌套":{id":5,value":some2 text value"},quotes":6"边"},{url":https://some3-value.com",嵌套":{id":6,value":some3 文本值"},quotes":7"边"}]

错误

# copy contrial from '/home/ubuntu/sample2-data.json';错误:类型 json 的输入语法无效详细信息:输入字符串意外结束.上下文:JSON 数据,第 1 行:[COPY contrial,第 1 行,列信息:[";

我进一步修改的示例(3)数据

[{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\";边"},{url":https://some2-value.com",嵌套":{id":5,value":some2 text value"},quotes":6"边"},{url":https://some3-value.com",嵌套":{id":6,value":some3 文本值"},quotes":7"边"}]

不同的错误

# copy contrial from '/home/ubuntu/sample2-data.json';错误:类型 json 的输入语法无效详细信息:令牌侧面"是无效的.上下文:JSON 数据,第 1 行:...,value":some1 text value"},quotes":5";边...COPY contrial, line 1, column info: "[{"url": "https://some1-value.com", "nested": {"id":4,"value":";some1 text value"}, quotes": 5";你……"

创建表语句

CREATE TABLE public.contrial (信息 json 不为空);

最终目标是创建一个以键为列、以值为记录的表.嵌套的键需要展平.

+-------------------------+-----------+------------------+------------+|网址 |nested_id |嵌套值|报价 |+-------------------------+-----------+------------------+------------+|https://some1-value.com |4 |some1 文本值 |5\"边|+-------------------------+-----------+------------------+------------+|https://some2-value.com |5 |some2 文本值 |6\"边|+-------------------------+-----------+------------------+------------+|https://some3-value.com |6 |some3 文本值 |7\"边|+-------------------------+-----------+------------------+------------+

解决方案

我最终使用了 Andre Dunstan 的博客this SO answer 说以特定方式格式化 json使用复制命令.

由于我的结构是为我正在解析的文件定义的,所以我最终得到了以下脚本.

def file_len(fname):# 查找文件中的行数.# 即使对于数百万条记录也非常有效使用 open(fname) 作为 f:对于 i, l 在 enumerate(f) 中:经过返回 i + 1LEN = file_len('/path/to/input.json')与 open('/path/to/output.json.csv', 'w') 作为 fo:使用 open('23819_part003.json', 'r') 作为 fi:对于 i, l 在 enumerate(fi) 中:# 我跳过第一行如果我 == 0:继续# 去掉末尾的']}}'elif i+1 == LEN: _ = fo.write(l[:-3])# 去掉末尾的','# 并添加 \n 因为 write 不会自行添加换行符否则:_ = fo.write(l[:-2]+'\n')#加载语句导入 sqlalchemyPOSTGRESQL = f'postgresql+psycopg2://{USERNAME}:{PASSWORD}@{HOSTNAME}/{DB}'引擎 = sqlalchemy.create_engine(POSTGRESQL, echo=True)con = engine.connect()trans = con.begin()LOAD_SQL = "COPY tablename from '/path/to/output.json.csv' with csv delimiter E'\x01' quote E'\x02' null as '';"尝试:con.execute(LOAD_SQL)trans.commit()除了作为 e 的例外:trans.rollback()最后:关闭()

I would like to import json data into postgres. The data I have is of the orders of a million rows, sizes are a minimum of 700 MB and stretches till 3 GB.

Here's a sample data I created based on the structure of data I have. I tried importing this into postgres but I get an error.

Sample(1) data

{"offers":{"offer":[
{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\"  side"},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\"  side"},
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\"  side"}]}}

Command I used and the error I got

# copy contrial from '/home/ubuntu/sample-data.json';
ERROR:  invalid input syntax for type json
DETAIL:  The input string ended unexpectedly.
CONTEXT:  JSON data, line 1: {"offers":{"offer":[
COPY contrial, line 1, column info: "{"offers":{"offer":["

I modified the file to remove the first two keys and have just a list of jsons like below, but I still get an error.

Sample(2) data

[
{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\"  side"},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\"  side"},
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\"  side"}]

Error

# copy contrial from '/home/ubuntu/sample2-data.json';
ERROR:  invalid input syntax for type json
DETAIL:  The input string ended unexpectedly.
CONTEXT:  JSON data, line 1: [
COPY contrial, line 1, column info: "["

Sample(3) data I modified further

[{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\"  side"},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\"  side"},
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\"  side"}]

Different Error

# copy contrial from '/home/ubuntu/sample2-data.json';
ERROR:  invalid input syntax for type json
DETAIL:  Token "side" is invalid.
CONTEXT:  JSON data, line 1: ...,"value":"some1 text value"}, "quotes": "5"  side...
COPY contrial, line 1, column info: "[{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5"  si..."

Create table statement

CREATE TABLE public.contrial (
    info json NOT NULL
);

The end goal is to create a table with keys as columns and values as records. Nested keys would need to be flattened.

+-------------------------+-----------+------------------+----------+
| url                     | nested_id | nested_value     | quotes   |
+-------------------------+-----------+------------------+----------+
| https://some1-value.com | 4         | some1 text value | 5\" side |
+-------------------------+-----------+------------------+----------+
| https://some2-value.com | 5         | some2 text value | 6\" side |
+-------------------------+-----------+------------------+----------+
| https://some3-value.com | 6         | some3 text value | 7\" side |
+-------------------------+-----------+------------------+----------+

解决方案

I ended up using Andre Dunstan's blog and this SO answer which says to format the json in a specific way to use the copy command.

Since my structure is pretty defined for the files I'm parsing, I ended up with the following script.

def file_len(fname):
    # to find the number of lines in the file.
    # Has been pretty efficient even for millions of records
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

LEN = file_len('/path/to/input.json')
with open('/path/to/output.json.csv', 'w') as fo:
    with open('23819_part003.json', 'r') as fi:
        for i, l in enumerate(fi):
            # I skip the first line
            if i == 0: continue 
            
            # To remove the ']}}' from the end
            elif i+1 == LEN: _ = fo.write(l[:-3])
            
            # To remove the ',' from the end 
            # and add \n since write does not add newline on its own
            else: _ = fo.write(l[:-2]+'\n') 

# load statement

import sqlalchemy
POSTGRESQL = f'postgresql+psycopg2://{USERNAME}:{PASSWORD}@{HOSTNAME}/{DB}'
engine = sqlalchemy.create_engine(POSTGRESQL, echo=True)
            
con = engine.connect()
trans = con.begin()
LOAD_SQL = "COPY tablename from '/path/to/output.json.csv' with csv delimiter E'\x01' quote E'\x02' null as '';"
try:
    con.execute(LOAD_SQL)
    trans.commit()
except Exception as e:
    trans.rollback()
finally:
    con.close()

这篇关于如何将 json 文件复制到 postgres 中?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆