Create/Insert JSON in Postgres with requests and psycopg2
Problem description
I just started a project with PostgreSQL. I would like to make the leap from Excel to a database, and I am stuck on create and insert. Once this runs, I believe I will have to switch it to an update so that I don't keep writing over the current data. I know my connection is working, but I get the following error.
My error is: TypeError: not all arguments converted during string formatting
#!/usr/bin/env python
import requests
import psycopg2
conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')
req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018')
data = req.json()['data']
my_data = []
for item in data:
    season = item['seasonId']
    player = item['playerName']
    first_name = item['playerFirstName']
    last_Name = item['playerLastName']
    playerId = item['playerId']
    height = item['playerHeight']
    pos = item['playerPositionCode']
    handed = item['playerShootsCatches']
    city = item['playerBirthCity']
    country = item['playerBirthCountry']
    state = item['playerBirthStateProvince']
    dob = item['playerBirthDate']
    draft_year = item['playerDraftYear']
    draft_round = item['playerDraftRoundNo']
    draft_overall = item['playerDraftOverallPickNo']
    my_data.append([playerId, player, first_name, last_Name, height, pos, handed, city, country, state, dob, draft_year, draft_round, draft_overall, season])
cur = conn.cursor()
cur.execute("CREATE TABLE t_skaters (data json);")
cur.executemany("INSERT INTO t_skaters VALUES (%s)", (my_data,))
Sample data:
[[8468493, 'Ron Hainsey', 'Ron', 'Hainsey', 75, 'D', 'L', 'Bolton', 'USA', 'CT', '1981-03-24', 2000, 1, 13, 20172018], [8471339, 'Ryan Callahan', 'Ryan', 'Callahan', 70, 'R', 'R', 'Rochester', 'USA', 'NY', '1985-03-21', 2004, 4, 127, 20172018]]
Recommended answer
It seems like you want to create a table with one column named "data". The type of this column is JSON. (I would recommend creating one column per field, but it's up to you.)
In this case, the variable data (which is read from the request) is a list of dicts. As I mentioned in my comment, you can loop over data and do the inserts one at a time, since executemany() is not faster than multiple calls to execute().
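For context on the error in the question: executemany(query, (my_data,)) runs the query once, with the whole of my_data as its parameters, so a single %s placeholder is offered one value per row. The sketch below reproduces the same message with Python's own % operator and no database. It is only an analogy, on the assumption that psycopg2 applies the same one-value-per-placeholder rule when it binds parameters; try_format is a hypothetical helper, the rows are shortened from the sample data, and real queries should always go through execute() rather than % formatting.

```python
# Shortened stand-in for the question's my_data: a list of rows.
my_data = [
    [8468493, 'Ron Hainsey', 'Ron', 'Hainsey'],
    [8471339, 'Ryan Callahan', 'Ryan', 'Callahan'],
]

query = "INSERT INTO t_skaters VALUES (%s)"


def try_format(template, params):
    """Hypothetical helper: format the template, or return the TypeError text."""
    try:
        return template % tuple(params)
    except TypeError as exc:
        return "TypeError: " + str(exc)


# Two rows offered to one placeholder: the same shape of mismatch as
# executemany(query, (my_data,)) in the question.
mismatch = try_format(query, my_data)

# Exactly one value for the one placeholder formats without error.
match = try_format(query, ["one value"])

print(mismatch)
print(match)
```

The point of the comparison is that the number of parameters has to equal the number of placeholders, which is why the code further down passes a one-element tuple to execute().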
I did the following:
- Create a list of fields that you care about.
- Loop over the elements of data.
- For each item in data, extract the fields into my_data.
- Call execute() and pass in json.dumps(my_data) (this converts my_data from a dict into a JSON string).
Try this:
#!/usr/bin/env python
import requests
import psycopg2
import json
conn = psycopg2.connect(database='NHL', user='postgres', password='postgres', host='localhost', port='5432')
req = requests.get('http://www.nhl.com/stats/rest/skaters?isAggregate=false&reportType=basic&isGame=false&reportName=skatersummary&sort=[{%22property%22:%22playerName%22,%22direction%22:%22ASC%22},{%22property%22:%22goals%22,%22direction%22:%22DESC%22},{%22property%22:%22assists%22,%22direction%22:%22DESC%22}]&cayenneExp=gameTypeId=2%20and%20seasonId%3E=20172018%20and%20seasonId%3C=20172018')
# data here is a list of dicts
data = req.json()['data']
cur = conn.cursor()
# create a table with one column of type JSON
cur.execute("CREATE TABLE t_skaters (data json);")
fields = [
    'seasonId',
    'playerName',
    'playerFirstName',
    'playerLastName',
    'playerId',
    'playerHeight',
    'playerPositionCode',
    'playerShootsCatches',
    'playerBirthCity',
    'playerBirthCountry',
    'playerBirthStateProvince',
    'playerBirthDate',
    'playerDraftYear',
    'playerDraftRoundNo',
    'playerDraftOverallPickNo'
]
for item in data:
    # keep only the fields of interest, then store them as one JSON string
    my_data = {field: item[field] for field in fields}
    cur.execute("INSERT INTO t_skaters VALUES (%s)", (json.dumps(my_data),))
# commit changes
conn.commit()
# Close the connection
conn.close()
I am not 100% sure that all of the Postgres syntax here is correct (I don't have access to a PG database to test), but I believe this logic should work for what you are trying to do.
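Since the code above could not be tested against a database, the dict-to-JSON step can at least be checked on its own. The following is a minimal sketch using only the standard library. The item dict is a hypothetical record shaped like one element of data: its first three values are taken from the sample data in the question, and the someOtherStat key is a made-up placeholder for the fields that are not selected.

```python
import json

# Hypothetical record shaped like one element of req.json()['data'].
item = {
    'seasonId': 20172018,
    'playerName': 'Ron Hainsey',
    'playerId': 8468493,
    'someOtherStat': 0,  # placeholder key, not real data
}

# Shortened field list for illustration.
fields = ['seasonId', 'playerName', 'playerId']

# Same dict comprehension as in the answer: keep only the chosen fields.
my_data = {field: item[field] for field in fields}

# The string that would be passed to execute() as the single parameter.
payload = json.dumps(my_data)

# Round-trip check: the JSON string decodes back to the same dict.
decoded = json.loads(payload)
print(payload)
```

Note that item[field] raises a KeyError if a record lacks one of the fields; item.get(field) would store null for that field instead, if that suits your data better.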
Update: for separate columns
You can modify your create statement to handle multiple columns, but that requires knowing the data type of each column. Here's some pseudocode you can follow:
# same boilerplate code from above
cur = conn.cursor()
# create a table with one column per field
cur.execute(
    """CREATE TABLE t_skaters (seasonId INTEGER, playerName VARCHAR, ...);"""
)
fields = [
    'seasonId',
    'playerName',
    'playerFirstName',
    'playerLastName',
    'playerId',
    'playerHeight',
    'playerPositionCode',
    'playerShootsCatches',
    'playerBirthCity',
    'playerBirthCountry',
    'playerBirthStateProvince',
    'playerBirthDate',
    'playerDraftYear',
    'playerDraftRoundNo',
    'playerDraftOverallPickNo'
]
for item in data:
    my_data = [item[field] for field in fields]
    # need a placeholder (%s) for each variable
    # refer to postgres docs on INSERT statement on how to specify order
    cur.execute("INSERT INTO t_skaters VALUES (%s, %s, ...)", tuple(my_data))
# commit changes
conn.commit()
# Close the connection
conn.close()
Replace the ... with the appropriate values for your data.
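Rather than typing one %s per column by hand, the placeholder list can be generated from fields so that the counts always match. This is a minimal sketch, not part of the original answer: build_insert is a hypothetical helper, the field list is shortened, and the sample item reuses values from the question. It assumes the table's columns are named exactly like the fields; the column types in the CREATE TABLE statement still have to be chosen by you.

```python
# Shortened field list for illustration.
fields = ['seasonId', 'playerName', 'playerId']


def build_insert(table, columns):
    """Hypothetical helper: INSERT with a column list and one %s per column."""
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO {} ({}) VALUES ({})".format(table, column_list, placeholders)


insert_sql = build_insert("t_skaters", fields)

# Hypothetical record with values taken from the question's sample data.
item = {'seasonId': 20172018, 'playerName': 'Ron Hainsey', 'playerId': 8468493}

# Values in the same order as the column list.
row = tuple(item[field] for field in fields)

print(insert_sql)
print(row)
# With an open psycopg2 cursor this would be run as:
# cur.execute(insert_sql, row)
```

Naming the columns in the INSERT makes the order explicit, so it no longer depends on the order in which the table was created. The table and column names are interpolated with str.format() only because they come from a fixed list in the script; the values themselves still go through the %s placeholders.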