Parsing API with Python - how to handle JSON with BOM
I'm using Python 2.7.11 on Windows to get JSON data from an API (data on trees in Warsaw, Poland, but never mind that). I want to generate an output CSV file with all the data provided by the API, for further analysis. I started with a script I used for another project (also discussed here on Stack Overflow and corrected for me by @Martin Taylor). That script didn't work, so I tried to modify it using my very basic understanding, googling around, and applying the pdb debugger. At the moment, the result looks like this:
import pdb
import json
import urllib2
import csv
pdb.set_trace()
url = "https://api.um.warszawa.pl/api/action/datastore_search/?resource_id=ed6217dd-c8d0-4f7b-8bed-3b7eb81a95ba"
myfile = 'C:/dane/drzewa.csv'
csv_myfile = csv.writer(open(myfile, 'wb'))
cols = ['numer_adres', 'stan_zdrowia', 'y_wgs84', 'dzielnica', 'adres', 'lokalizacja', 'wiek_w_dni', 'srednica_k', 'pnie_obwod', 'miasto', 'jednostka', 'x_pl2000', 'wysokosc', 'y_pl2000', 'numer_inw', 'x_wgs84', '_id', 'gatunek_1', 'gatunek', 'data_wyk_pom']
csv_myfile.writerow(cols)
def api_iterate(myfile):
    while True:
        global url
        print url
        json_page = urllib2.urlopen(url)
        data = json.load(json_page)
        json_page.close()
        for data_object in data['result']['records']:
            csv_myfile.writerow([data_object[col] for col in cols])
        try:
            url = data['_links']['next']
        except KeyError as e:
            break
with open(myfile, 'wb'):
    api_iterate(myfile)
I'm very new to Python, so I get confused all the time. I have now reached the point where, while reading the objects in the JSON dictionary, I get a KeyError associated with the 'x_wgs84' element. I suppose it has something to do with the fact that in the response from the source URL this element is preceded by a U+FEFF Unicode character. I tried to get around this, but I got stuck and would appreciate assistance.
I suspect the code may be broken in several other ways; as I mentioned, I'm not a very skilled programmer (yet).
You need to write the key with the Unicode character included.
An easy way to find out exactly how to write it is to print the keys:
>>> import requests
>>> res = requests.get('https://api.um.warszawa.pl/api/action/datastore_search/?resource_id=ed6217dd-c8d0-4f7b-8bed-3b7eb81a95ba')
>>> data = res.json()
>>> records = data['result']['records']
>>> records[0]
{u'numer_adres': u'', u'stan_zdrowia': u'dobry', u'y_wgs84': u'52.21865', u'y_pl2000': u'5787241.04475524', u'adres': u'ul. ALPEJSKA', u'x_pl2000': u'7511793.96937063', u'lokalizacja': u'Ulica ALPEJSKA', u'wiek_w_dni': u'60', u'miasto': u'Warszawa', u'jednostka': u'Dzielnica Wawer', u'pnie_obwod': u'73', u'wysokosc': u'14', u'data_wyk_pom': u'20130709', u'dzielnica': u'Wawer', u'\ufeffx_wgs84': u'21.172584', u'numer_inw': u'D386200', u'_id': 125435, u'gatunek_1': u'Quercus robur', u'gatunek': u'd\u0105b szypu\u0142kowy', u'srednica_k': u'7'}
>>> records[0].keys()
[u'numer_adres', u'stan_zdrowia', u'y_wgs84', u'y_pl2000', u'adres', u'x_pl2000', u'lokalizacja', u'wiek_w_dni', u'miasto', u'jednostka', u'pnie_obwod', u'wysokosc', u'data_wyk_pom', u'dzielnica', u'\ufeffx_wgs84', u'numer_inw', u'_id', u'gatunek_1', u'gatunek', u'srednica_k']
>>> records[0][u'\ufeffx_wgs84']
u'21.172584'
As you can see, to get your key you need to write it as '\ufeffx_wgs84', including the Unicode character that is causing the trouble.
Note: I don't know whether you are using Python 2 or 3, but in Python 2 you might need to put a u before the string literal (u'\ufeffx_wgs84') to declare it as a unicode string.
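If you would rather keep the plain 'x_wgs84' name in your cols list, another option is to strip the U+FEFF character from the keys of each record before looking anything up. The following is only a minimal sketch of that idea, not something taken from the API's documentation: the names BOM and strip_bom_keys are made up for illustration, and the sample record is a shortened copy of the output shown above.

```python
# -*- coding: utf-8 -*-
# Sketch: remove U+FEFF (the byte order mark) from dictionary keys.
# BOM and strip_bom_keys are hypothetical names used only for this example.

BOM = u'\ufeff'


def strip_bom_keys(record):
    """Return a copy of record with U+FEFF removed from every key."""
    return {key.replace(BOM, u''): value for key, value in record.items()}


# Shortened sample record, copied from the output printed above.
record = {u'\ufeffx_wgs84': u'21.172584', u'y_wgs84': u'52.21865'}

clean = strip_bom_keys(record)

# The plain column names now work as lookup keys.
cols = ['x_wgs84', 'y_wgs84']
row = [clean[col] for col in cols]
```

In the loop from the question, this would mean calling strip_bom_keys(data_object) first and building the CSV row from the cleaned dictionary, assuming no other key in the record differs only by that character.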