'Expected String or Unicode' when reading JSON with Pandas

Question

I am trying to read the JSON output of an OpenStreetMap (Overpass) API request. The returned JSON is valid.

I am using the following code:

import pandas as pd
import requests

# Bottom left
minLat = 50.9549
minLon = 13.55232

# Top right
maxLat = 51.1390
maxLon = 13.89873

osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)

osmdata = osm.json()

osmdataframe = pd.read_json(osmdata)

This raises the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-304b7fbfb645> in <module>()
----> 1 osmdataframe = pd.read_json(osmdata)

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
    196         obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
    197                           keep_default_dates, numpy, precise_float,
--> 198                           date_unit).parse()
    199 
    200     if typ == 'series' or obj is None:

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
    264 
    265         else:
--> 266             self._parse_no_numpy()
    267 
    268         if self.obj is None:

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
    481         if orient == "columns":
    482             self.obj = DataFrame(
--> 483                 loads(json, precise_float=self.precise_float), dtype=None)
    484         elif orient == "split":
    485             decoded = dict((str(k), v)

TypeError: Expected String or Unicode

How can I modify the request or the Pandas read_json call to avoid this error? And what is the problem in the first place?

Recommended answer

If you write the raw JSON string to a file,

import io

# A requests Response has no read() method; the body text is in .text
content = osm.text
with io.open('/tmp/out', 'w', encoding='utf-8') as f:
    f.write(content)

you'll see something like this:

{
  "version": 0.6,
  "generator": "Overpass API",
  "osm3s": {
    "timestamp_osm_base": "2014-07-20T07:52:02Z",
    "copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
  },
  "elements": [

{
  "type": "node",
  "id": 536694,
  "lat": 50.9849256,
  "lon": 13.6821776,
  "tags": {
    "highway": "bus_stop",
    "name": "Niederhäslich Bergmannsweg"
  }
},
...]}

If the JSON string were converted to a Python object, it would be a dict whose elements key holds a list of dicts. The vast majority of the data is inside this list of dicts. This is in fact what osm.json() returns, so osmdata is already a dict rather than a string, which is why pd.read_json raises the TypeError.
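The structure described here can be checked offline. The following is a minimal sketch that assumes an abbreviated sample string (sample_text is a hypothetical stand-in for the real response body) and parses it with the standard json module, which should give an object of the same shape as osm.json():

```python
import json

# Hypothetical, abbreviated sample shaped like the Overpass response above.
sample_text = '''
{
  "version": 0.6,
  "generator": "Overpass API",
  "elements": [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}}
  ]
}
'''

# Parsing the string gives a plain Python object, not a string.
parsed = json.loads(sample_text)

print(type(parsed).__name__)                 # type of the top-level object
print(sorted(parsed.keys()))                 # top-level keys
print(type(parsed['elements']).__name__)     # container holding the data
print(type(parsed['elements'][0]).__name__)  # type of each entry
```

Inspecting the parsed object this way makes it easier to decide which part of it should become the DataFrame.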

This JSON is not directly convertible to a Pandas object. What would be the index, and what would be the columns? Surely you don't want [u'elements', u'version', u'osm3s', u'generator'] to be the columns, since almost all the information is in the elements list of dicts.

But if you want the DataFrame to consist only of the data in the elements list of dicts, then you have to specify that yourself, since Pandas can't make that assumption for you.
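As a rough illustration of specifying this explicitly, the sketch below passes only the list of dicts to pd.DataFrame. The elements list is hand-written sample data: the first entry is copied from the output above, and the second is a made-up entry added only so the frame has more than one row.

```python
import pandas as pd

# Sample data standing in for osmdata['elements'].
elements = [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}},
    # Hypothetical entry, not real OpenStreetMap data.
    {"type": "node", "id": 1, "lat": 51.0, "lon": 13.7,
     "tags": {"highway": "bus_stop", "name": "Hypothetical Stop"}},
]

# Each dict becomes one row; each top-level key becomes one column.
df = pd.DataFrame(elements)

print(df.columns.tolist())
print(type(df['tags'].iloc[0]).__name__)  # each tags cell is still a dict
```

Each top-level key becomes a column, but the nested tags values are left as dicts inside a single column.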

Further complicating things, each dict in elements is itself a nested dict. Consider the first dict in elements:

{
  "type": "node",
  "id": 536694,
  "lat": 50.9849256,
  "lon": 13.6821776,
  "tags": {
    "highway": "bus_stop",
    "name": "Niederhäslich Bergmannsweg"
  }
}

Should ['lat', 'lon', 'type', 'id', 'tags'] be the columns? That seems plausible, except that the tags column would end up being a column of dicts, which is usually not very useful. It would be nicer if the keys inside the tags dict became columns of their own. We can do that, but again we have to code it ourselves, since Pandas has no way of knowing that's what we want.

import pandas as pd
import requests
# Bottom left
minLat = 50.9549
minLon = 13.55232

# Top right
maxLat = 51.1390
maxLon = 13.89873

osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)

osmdata = osm.json()
osmdata = osmdata['elements']
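for dct in osmdata:
    # Promote each tag to a top-level key so it becomes its own column.
    # items() works on both Python 2 and 3; iteritems() is Python 2 only.
    for key, val in dct['tags'].items():
        dct[key] = val
    # Drop the nested dict now that its contents have been copied.
    del dct['tags']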

osmdataframe = pd.DataFrame(osmdata)
print(osmdataframe[['lat', 'lon', 'name']].head())

which yields

         lat        lon                        name
0  50.984926  13.682178  Niederhäslich Bergmannsweg
1  51.123623  13.782789                Sagarder Weg
2  51.065752  13.895734     Weißig, Einkaufszentrum
3  51.007140  13.698498          Stuttgarter Straße
4  51.010199  13.701411          Heilbronner Straße
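As an alternative to the manual loop above (not part of the original answer), pandas also provides pd.json_normalize, which flattens nested dicts into dotted column names such as tags.name. The sketch below assumes pandas 1.0 or later, where json_normalize is available at the top level, and uses a hand-written elements list in place of the live API response.

```python
import pandas as pd

# Sample data standing in for osmdata['elements'].
elements = [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}},
]

# Nested keys are flattened into columns named 'tags.highway', 'tags.name'.
flat = pd.json_normalize(elements)

# Optionally strip the 'tags.' prefix so the column names match the loop version.
prefix = 'tags.'
flat = flat.rename(
    columns=lambda c: c[len(prefix):] if c.startswith(prefix) else c
)

print(flat[['lat', 'lon', 'name']].head())
```

Stripping the prefix has the same caveat as the loop: a tag whose key matches an existing column name such as id would collide with it, so keeping the tags. prefix may be the safer choice for arbitrary data.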
