Accessing nested JSON data as dataframes in Pandas
Question
I have the following data:
{ "results": [
    {
      "company": "XYZ",
      "createdAt": "2014-03-27T23:21:48.758Z",
      "email": "abc@gmail.com",
      "firstName": "abc",
      "lastName": "xyz",
      "linkedinAccount": "",
      "location": "",
      "profilePicture": {
        "__type": "File",
        "name": "ab0e-profilePicture",
        "url": "url.url.com"
      },
      "registrationGate": "normal",
      "telephone": "",
      "title": "AA",
      "updatedAt": "2014-03-27T23:24:20.220Z",
      "username": "abc@gmail.com",
      "zipcode": "00000"
    }
  ]
}
I import the JSON data using the following code:
import pandas as pd

# read_json yields a frame with a single "results" column of dicts
df = pd.read_json('data.json')
print(df[:2])
This prints:
results
0 {u'linkedinAccount': u'', u'username': u'abc...
1 {u'linkedinAccount': u'zxcflcnv', u'username...
[2 rows x 1 columns]
When I try to access a column with
print(df['linkedinAccount'])
I get the following error:
KeyError: u'no item named linkedinAccount'
How do I access data in the dataframe based on column names?
Recommended answer
I am not sure how your multiple observations are organized in the JSON, but the problem is clearly the nesting: the "profilePicture" field is itself a dictionary, so each observation ends up as a nested dictionary in the results column. You need to convert each observation to a dataframe and concat them into the final dataframe, as in this solution:
In [3]:
print(df)
results
0 {u'linkedinAccount': u'', u'username': u'abc@g...
1 {u'linkedinAccount': u'', u'username': u'abc@g...
[2 rows x 1 columns]
In [4]:
print(pd.concat([pd.DataFrame.from_dict(item, orient='index').T for item in df.results]))
linkedinAccount username registrationGate firstName title lastName \
0 abc@gmail.com normal abc AA xyz
0 abc@gmail.com normal abc AA xyz
company telephone profilePicture \
0 XYZ {u'url': u'url.url.com', u'__type': u'File', u...
0 ABC {u'url': u'url.url.com', u'__type': u'File', u...
location updatedAt email createdAt \
0 2014-03-27T23:24:20.220Z abc@gmail.com 2014-03-27T23:21:48.758Z
0 2014-03-27T23:24:20.220Z abc@gmail.com 2014-03-27T23:21:48.758Z
zipcode
0 00000
0 00000
[2 rows x 14 columns]
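For anyone trying this on a current Python 3 and pandas install, here is a minimal self-contained sketch of the same idea. It makes a few assumptions that are not in the question: the inline raw string stands in for data.json and is trimmed to a handful of fields, the file is parsed with the standard json module rather than pd.read_json, and the list of records is handed straight to the DataFrame constructor instead of being concatenated item by item. Treat it as one possible shortcut, not as what the session above ran.

```python
import json
import pandas as pd

# Stand-in for the contents of data.json, trimmed to a few fields.
raw = """
{"results": [
  {"company": "XYZ",
   "email": "abc@gmail.com",
   "linkedinAccount": "",
   "profilePicture": {"__type": "File",
                      "name": "ab0e-profilePicture",
                      "url": "url.url.com"},
   "zipcode": "00000"}
]}
"""

# Parse the text ourselves and keep only the list stored under "results".
records = json.loads(raw)["results"]

# A list of dicts becomes one row per observation, one column per key.
df = pd.DataFrame(records)

print(df.columns.tolist())
print(df["linkedinAccount"])               # lookup by column name now succeeds
print(type(df.loc[0, "profilePicture"]))   # the nested field is still a dict
```

Either way the result is the same shape as the concat output: the top-level keys are real columns, and profilePicture is left as a column of dictionaries.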
Then you may want to think about how to deal with the profilePicture column. You can do what @U2EF1 suggested in the link, but I would probably just break that column into three columns: pfPIC_url, pfPIC_type, and pfPIC_name.
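A rough sketch of that split, assuming a frame whose profilePicture column holds one dictionary per row. The two-row frame here is made up for illustration (the second row's values are hypothetical), and the mapping from dictionary keys to the pfPIC_ names is my reading of the suggestion above.

```python
import pandas as pd

# Hypothetical two-row frame shaped like the concat result, with a clean 0..n-1 index.
df = pd.DataFrame([
    {"company": "XYZ",
     "profilePicture": {"__type": "File",
                        "name": "ab0e-profilePicture",
                        "url": "url.url.com"}},
    {"company": "ABC",
     "profilePicture": {"__type": "File",
                        "name": "hypothetical-name",
                        "url": "url.url.com"}},
])

# Expand the column of dicts into its own frame, one column per dict key,
# reusing df's index so the rows line up when joined back.
pic = pd.DataFrame(df["profilePicture"].tolist(), index=df.index)

# Rename the keys to the three column names proposed above.
pic = pic.rename(columns={"url": "pfPIC_url",
                          "__type": "pfPIC_type",
                          "name": "pfPIC_name"})

# Swap the nested column for the three flat ones.
flat = df.drop(columns="profilePicture").join(pic)
print(flat)
```

Note that the join aligns on the index. The concat output earlier labels every row 0, so on that frame you would want to call reset_index(drop=True) first (or pass ignore_index=True to concat), otherwise the join pairs every row with every other row that shares the label.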