What is the proper formatting for a jsonb array in Python for Postgres?
Question
I have a table that looks like
   Column   |            Type             |
------------+-----------------------------+
 message_id | integer                     |
 user_id    | integer                     |
 body       | text                        |
 created_at | timestamp without time zone |
 source     | jsonb                       |
 symbols    | jsonb[]                     |
I am trying to use psycopg2 to insert data via psycopg2.Cursor.copy_from(), but I am running into numerous issues trying to figure out how a jsonb[] object should be formatted. When I pass a straight list of JSON objects, I get an error that looks like
psycopg2.errors.InvalidTextRepresentation: malformed array literal: "[{'id': 13016, 'symbol':
....
DETAIL: "[" must introduce explicitly-specified array dimensions.
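As background (my reading of the error, not stated in the original): copy_from sends column values as text, so a jsonb[] column must arrive in Postgres array-literal syntax ({...}) with each element quoted, not as a bracketed JSON list. A sketch of the difference, using a made-up symbol value:

```python
import json

# Hypothetical symbol data, shaped like the error message suggests.
symbols = [{'id': 13016, 'symbol': 'AAPL'}]

# Stringifying a Python list starts with '[' -- exactly what Postgres
# rejects with: "[" must introduce explicitly-specified array dimensions.
print(str(symbols))

# A Postgres array literal wraps elements in braces, with each jsonb
# element double-quoted and its inner quotes/backslashes escaped:
array_literal = '{' + ','.join(
    '"' + json.dumps(s).replace('\\', '\\\\').replace('"', '\\"') + '"'
    for s in symbols
) + '}'
print(array_literal)
```

This only illustrates the expected text format; hand-rolling the escaping is what the question is struggling with, and letting the driver adapt a Python list (shown in the answer below) avoids it entirely.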
I've tried numerous different escapes on the double quotes and curly braces. If I do a json.dumps() on my data, I get the below error.
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type json
DETAIL: Token "'" is invalid.
This error comes from the following snippet:
messageData = []
symbols = messageObject["symbols"]
newSymbols = []
for symbol in symbols:
    toAppend = refineJSON(json.dumps(symbol))
    toAppend = re.sub("{", "\\{", toAppend)
    toAppend = re.sub("}", "\\}", toAppend)
    toAppend = re.sub('"', '\\"', toAppend)
    newSymbols.append(toAppend)
messageData.append(set(newSymbols))
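For what it's worth, the Token "'" error usually means Postgres received Python's repr of a dict (single-quoted) rather than real JSON; a minimal contrast, using a hypothetical symbol value:

```python
import json

# Hypothetical symbol object, matching the shape in the error message.
symbol = {'id': 13016, 'symbol': 'AAPL'}

# str()/repr() of a dict uses single quotes -- not valid JSON, which is
# why Postgres complains: Token "'" is invalid.
print(str(symbol))

# json.dumps() emits double-quoted, valid JSON.
print(json.dumps(symbol))
```

So json.dumps() is the right starting point; the extra re.sub escaping layered on top in the snippet above is what mangles it again.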
I'm also open to defining the column as a different type (e.g., text) and then attempting a conversion but I haven't been able to do that either.
messageData is the input to a helper function that calls psycopg2.Cursor.copy_from()
def copy_string_iterator_messages(connection, messages, size: int = 8192) -> None:
    with connection.cursor() as cursor:
        messages_string_iterator = StringIteratorIO(
            '|'.join(map(clean_csv_value, (
                messageData[0], messageData[1], messageData[2], messageData[3],
                messageData[4], messageData[5], messageData[6], messageData[7],
                messageData[8], messageData[9], messageData[10], messageData[11],
            ))) + '\n'
            for messageData in messages
        )
        # pp.pprint(messages_string_iterator.read())
        cursor.copy_from(messages_string_iterator, 'test', sep='|', size=size)
        connection.commit()
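The helpers clean_csv_value and StringIteratorIO are referenced but not shown. As a point of reference, here is a minimal sketch of what such helpers commonly look like in this streaming copy_from pattern (my assumption, not the asker's actual code):

```python
import io
from typing import Iterator, Optional

def clean_csv_value(value) -> str:
    # Hypothetical sketch: render None as the Postgres NULL marker (\N)
    # and escape newlines so each record stays on one line.
    if value is None:
        return r'\N'
    return str(value).replace('\n', '\\n')

class StringIteratorIO(io.TextIOBase):
    """File-like wrapper over an iterator of strings, so copy_from can
    stream rows without materializing them all in memory."""

    def __init__(self, it: Iterator[str]):
        self._it = it
        self._buf = ''

    def readable(self) -> bool:
        return True

    def read(self, n: Optional[int] = None) -> str:
        # Pull from the iterator until we have n characters (or it ends).
        while n is None or len(self._buf) < n:
            try:
                self._buf += next(self._it)
            except StopIteration:
                break
        if n is None:
            out, self._buf = self._buf, ''
        else:
            out, self._buf = self._buf[:n], self._buf[n:]
        return out
```

Note that even with correct helpers, the jsonb[] column still has to be serialized into Postgres array-literal text by hand for copy_from, which is the hard part.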
Edit: Based on the input from Mike, I updated the code to use execute_batch(), where messages is a list containing messageData for each message.
def insert_execute_batch_iterator_messages(connection, messages, page_size: int = 1000) -> None:
    with connection.cursor() as cursor:
        iter_messages = ({**message} for message in messages)
        psycopg2.extras.execute_batch(cursor, """
            INSERT INTO test VALUES (
                %(message_id)s,
                %(user_id)s,
                %(body)s,
                %(created_at)s,
                %(source)s::jsonb,
                %(symbols)s::jsonb[]
            );
        """, iter_messages, page_size=page_size)
        connection.commit()
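One detail worth spelling out (my inference from the ::jsonb[] cast above): psycopg2 adapts a Python list to a Postgres array, so the symbols value should be a list of JSON strings, each of which the cast then converts to jsonb. A sketch of preparing one message dict (messageObject and the column values here are made up):

```python
import json

# Hypothetical input object; only the "symbols" handling is the point.
messageObject = {'symbols': [{'id': 13016, 'symbol': 'AAPL'}]}

message = {
    # ... message_id, user_id, body, created_at ...
    'source': json.dumps({'via': 'api'}),  # one jsonb value: a JSON string
    # jsonb[]: a Python *list* of JSON strings; psycopg2 adapts the list
    # to a Postgres array and ::jsonb[] casts each element.
    'symbols': [json.dumps(s) for s in messageObject['symbols']],
}
```

No manual brace or quote escaping is needed in this path; the driver handles array quoting.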
Answer
Your question made me curious. The below works for me. I have doubts whether the escaping going to/from CSV can be resolved.
My table:
=# \d jbarray
                            Table "public.jbarray"
 Column  |  Type   | Collation | Nullable |               Default
---------+---------+-----------+----------+-------------------------------------
 id      | integer |           | not null | nextval('jbarray_id_seq'::regclass)
 symbols | jsonb[] |           |          |
Indexes:
    "jbarray_pkey" PRIMARY KEY, btree (id)
Completely self-contained Python code:
import json
import psycopg2
con = psycopg2.connect('dbname=<my database>')
some_objects = [{'id': x, 'array': [x, x+1, x+2, {'inside': x+3}]} for x in range(5)]
insert_array = [json.dumps(x) for x in some_objects]
print(insert_array)
c = con.cursor()
c.execute("insert into jbarray (symbols) values (%s::jsonb[])", (insert_array,))
con.commit()
Result:
=# select * from jbarray;
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id | 1
symbols | {"{\"id\": 0, \"array\": [0, 1, 2, {\"inside\": 3}]}","{\"id\": 1, \"array\": [1, 2, 3, {\"inside\": 4}]}","{\"id\": 2, \"array\": [2, 3, 4, {\"inside\": 5}]}","{\"id\": 3, \"array\": [3, 4, 5, {\"inside\": 6}]}","{\"id\": 4, \"array\": [4, 5, 6, {\"inside\": 7}]}"}
=# select id, unnest(symbols) from jbarray;
-[ RECORD 1 ]----------------------------------------
id | 1
unnest | {"id": 0, "array": [0, 1, 2, {"inside": 3}]}
-[ RECORD 2 ]----------------------------------------
id | 1
unnest | {"id": 1, "array": [1, 2, 3, {"inside": 4}]}
-[ RECORD 3 ]----------------------------------------
id | 1
unnest | {"id": 2, "array": [2, 3, 4, {"inside": 5}]}
-[ RECORD 4 ]----------------------------------------
id | 1
unnest | {"id": 3, "array": [3, 4, 5, {"inside": 6}]}
-[ RECORD 5 ]----------------------------------------
id | 1
unnest | {"id": 4, "array": [4, 5, 6, {"inside": 7}]}
If the insert performance is too slow for you, then you can use a prepared statement with execute_batch() as documented here. I have used that combination, and it was very fast.
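As a sketch of the prepared-statement-plus-execute_batch combination mentioned above (the statement name, table, and column count are my assumptions, mirroring the earlier INSERT):

```python
# PREPARE once per session; EXECUTE is then re-sent cheaply per row.
PREPARE = """
    PREPARE insert_message AS
    INSERT INTO test VALUES ($1, $2, $3, $4, $5::jsonb, $6::jsonb[]);
"""
EXECUTE = "EXECUTE insert_message (%s, %s, %s, %s, %s, %s);"

def insert_prepared(connection, rows, page_size: int = 1000) -> None:
    # Deferred import so the SQL sketch above can be read standalone.
    import psycopg2.extras
    with connection.cursor() as cursor:
        cursor.execute(PREPARE)
        # rows: an iterable of 6-tuples matching the $1..$6 parameters.
        psycopg2.extras.execute_batch(cursor, EXECUTE, rows, page_size=page_size)
    connection.commit()
```

execute_batch() reduces round trips by sending many EXECUTE statements per network call, and the server-side prepared statement skips re-planning each INSERT.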