Extract data from JSON from an API with Python

Problem description

The data under consideration comes from an API, which means it is highly inconsistent: sometimes it returns unexpected content, sometimes it returns nothing, and so on.

What I'm interested in is the data associated with ISO 3166-2 for each record.

The data (when it doesn't encounter an error) generally looks something like this:

{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}
{"countryCode": "RO", "adminCode1": "10", "countryName": "Romania", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "10"}, {"type": "ISO3166-2", "code": "B"}], "adminName1": "Bucure\u015fti"}
{"countryCode": "DE", "adminCode1": "07", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "07"}, {"type": "ISO3166-2", "code": "NW"}], "adminName1": "North Rhine-Westphalia"}
{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}
{"countryCode": "DE", "adminCode1": "02", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "02"}, {"type": "ISO3166-2", "code": "BY"}], "adminName1": "Bavaria"}

Let's take one record for example:

{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}

From this I want to extract the ISO 3166-2 representation, i.e. DE-BW.

I've been trying different ways of extracting this information with Python. One attempt looked like this:

coord = response.get('codes', {}).get('type', {}).get('ISO3166-2', None)

Another attempt looked like this:

print(json.dumps(response["codes"]["ISO3166-2"]))

However, neither of those methods worked.
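For reference, here is a minimal sketch of why both attempts fail, assuming the response has the shape of the records shown above. The `response` dictionary is a trimmed copy of one of those records, and the names `first_error` and `second_error` exist only for illustration.

```python
# A trimmed record shaped like the examples in the question.
response = {
    "countryCode": "DE",
    "codes": [
        {"type": "FIPS10-4", "code": "01"},
        {"type": "ISO3166-2", "code": "BW"},
    ],
}

# 'codes' maps to a list of dictionaries, not to a dictionary.
codes = response.get("codes", {})
print(type(codes).__name__)

# First attempt: a list has no .get() method.
try:
    codes.get("type", {})
except AttributeError as exc:
    first_error = exc
    print("AttributeError:", exc)

# Second attempt: a list cannot be indexed with a string key.
try:
    response["codes"]["ISO3166-2"]
except TypeError as exc:
    second_error = exc
    print("TypeError:", exc)
```

In both cases the chain breaks at the list: 'ISO3166-2' appears only as the value stored under the 'type' key of one list element.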

How can I take a record such as:

{"countryCode": "DE", "adminCode1": "01", "countryName": "Germany", "distance": 0, "codes": [{"type": "FIPS10-4", "code": "01"}, {"type": "ISO3166-2", "code": "BW"}], "adminName1": "Baden-W\u00fcrttemberg"}

and extract only DE-BW using Python, while also handling records that don't look exactly like that, for instance extracting GB-ENG from:

{"countryCode": "GB", "adminCode1": "ENG", "countryName": "United Kingdom", "distance": 0, "codes": [{"type": "ISO3166-2", "code": "ENG"}], "adminName1": "England"}

and, of course, not crash when it receives something that looks like neither of those (i.e. with exception handling)?


FULL FILE

import json
import requests
from collections import defaultdict
from pprint import pprint

# open up the output of 'data-processing.py'
with open('job-numbers-by-location.txt') as data_file:

    for line in data_file:
        identifier, name, coords, number_of_jobs = line.split("|")
        coords = coords[1:-1]
        lat, lng = coords.split(",")
        # print("lat: " + lat, "lng: " + lng)
        response = requests.get("http://api.geonames.org/countrySubdivisionJSON?lat="+lat+"&lng="+lng+"&username=s.matthew.english").json()


        codes = response.get('codes', [])
        for code in codes:
            if code.get('type') == 'ISO3166-2':
                print('{}-{}'.format(response.get('countryCode', 'UNKNOWN'), code.get('code', 'UNKNOWN')))
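The input handling in the script can fail before any JSON is touched: a malformed line makes the tuple unpacking raise ValueError, and concatenating raw strings into the URL skips query escaping. Below is a rough sketch of a more defensive version, assuming each line has the layout `identifier|name|(lat,lng)|number_of_jobs` (inferred from the split calls above, not confirmed). The helper names `parse_location_line` and `build_url`, the sample line, and the `demo` username are all hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.geonames.org/countrySubdivisionJSON"


def parse_location_line(line):
    """Return (identifier, name, lat, lng, number_of_jobs), or None if malformed.

    Assumes the layout 'identifier|name|(lat,lng)|number_of_jobs'.
    """
    parts = line.rstrip("\n").split("|")
    if len(parts) != 4:
        return None
    identifier, name, coords, number_of_jobs = parts
    # Drop the first and last character, as the original script does.
    lat_lng = coords[1:-1].split(",")
    if len(lat_lng) != 2:
        return None
    lat, lng = (value.strip() for value in lat_lng)
    return identifier, name, lat, lng, number_of_jobs


def build_url(lat, lng, username):
    """Build the request URL with escaped query parameters."""
    query = urlencode({"lat": lat, "lng": lng, "username": username})
    return BASE_URL + "?" + query


# Hypothetical sample line, for illustration only.
parsed = parse_location_line("42|Stuttgart|(48.78,9.18)|17\n")
print(parsed)
print(build_url(parsed[2], parsed[3], "demo"))
print(parse_location_line("not a valid line"))
```

Returning None for a malformed line lets the calling loop skip it with a simple `if parsed is None: continue` instead of crashing partway through the file.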

Solution

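'ISO3166-2' is a dictionary value, not a key. response['codes'] is a list of dictionaries, so loop over it and pick the entry whose 'type' equals 'ISO3166-2':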

codes = response.get('codes', [])
for code in codes:
    if code.get('type') == 'ISO3166-2':
        print('{}-{}'.format(response.get('countryCode', 'UNKNOWN'), code.get('code', 'UNKNOWN')))
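To cover the "don't crash on odd input" part of the question, the loop above can be wrapped in a small helper that returns None whenever the record does not have the expected shape. This is only a sketch: the function name `extract_iso3166_2` is hypothetical, and returning None (rather than printing 'UNKNOWN') is a design choice made here, not something taken from the answer.

```python
def extract_iso3166_2(record):
    """Return 'COUNTRY-SUBDIVISION' (e.g. 'DE-BW'), or None if unavailable."""
    if not isinstance(record, dict):
        return None
    country = record.get("countryCode")
    codes = record.get("codes")
    if not country or not isinstance(codes, list):
        return None
    for entry in codes:
        # Skip list elements that are not dictionaries.
        if isinstance(entry, dict) and entry.get("type") == "ISO3166-2":
            code = entry.get("code")
            if code:
                return "{}-{}".format(country, code)
    return None


de_record = {
    "countryCode": "DE",
    "codes": [
        {"type": "FIPS10-4", "code": "01"},
        {"type": "ISO3166-2", "code": "BW"},
    ],
}
gb_record = {
    "countryCode": "GB",
    "codes": [{"type": "ISO3166-2", "code": "ENG"}],
}

print(extract_iso3166_2(de_record))          # DE-BW
print(extract_iso3166_2(gb_record))          # GB-ENG
print(extract_iso3166_2({}))                 # None
print(extract_iso3166_2({"codes": "oops"}))  # None
```

Because every access is guarded by a type check, no try/except is needed for the parsing itself. Network and JSON decoding errors from the request would still need their own handling around the requests call.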
