Digging down into a JSON file
Problem description
I have been trying in many ways (and by following many Stack Overflow questions) to normalize a deeply nested JSON file. I tried .apply(pd.Series), but it does not work well with many levels of dictionaries.
I am currently trying json_normalize, and it has given some results. I think I understand how the function works; my problem is that I don't know how to navigate through the nested dictionaries.
So far, I have been able to dig down 2 levels.
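import json
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated
# Load the whole file into nested Python dicts and lists
with open('authors.json') as f:
    raw = json.load(f)
# Flatten each hit; nested dict keys become dotted column names such as _source.title
raw2 = json_normalize(raw['hits']['hits'])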
This gives me what I need (at least for the first levels), but I don't know how to go deeper.
I have tried:
raw2 = json_normalize(raw['hits']['hits'][0])
raw2 = json_normalize(raw['hits']['hits']['_source.authors'])
TypeError: string indices must be integers
And many more, but randomly trying things without understanding them is not the right approach. I guess my questions are:
- How do I know how to include the next level ({} vs [] in the JSON)?
- Is there any visual way to represent this?
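One possible way to make the nesting visible is to print an outline of the keys and container types. The sketch below is only an illustration: `outline` is a hypothetical helper (it is not part of pandas or the json module), and `sample` is a trimmed, hand-written stand-in for the file shown further down.

```python
def outline(node, indent=0, max_items=1):
    """Return one line per key, marking dicts as {} and lists as [].

    Only the first `max_items` elements of each list are expanded,
    on the assumption that list elements share the same shape.
    """
    pad = "  " * indent
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, dict):
                lines.append(f"{pad}{key}: {{}}")
                lines.extend(outline(value, indent + 1, max_items))
            elif isinstance(value, list):
                lines.append(f"{pad}{key}: [] ({len(value)} items)")
                lines.extend(outline(value, indent + 1, max_items))
            else:
                lines.append(f"{pad}{key}: {type(value).__name__}")
    elif isinstance(node, list):
        for i, item in enumerate(node[:max_items]):
            lines.append(f"{pad}[{i}]: {type(item).__name__}")
            lines.extend(outline(item, indent + 1, max_items))
    return lines


# Trimmed stand-in for the real file (assumption: same nesting as the excerpt)
sample = {
    "hits": {
        "hits": [
            {
                "_id": "7538108B",
                "_source": {
                    "journal": "The American Historical Review",
                    "authors": [
                        {
                            "affiliations": ["Pennsylvania State University"],
                            "author_id": "7E15BDFA",
                            "author_name": "roger l geiger",
                        }
                    ],
                },
            }
        ]
    }
}

structure = outline(sample)
print("\n".join(structure))
```

Reading the outline from top to bottom gives the path: every {} level is entered with a string key and every [] level with an integer index, so the authors of the first hit in this sample would be sample['hits']['hits'][0]['_source']['authors'].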
It is odd that this topic is not covered in more depth online. I work with JSON data more and more every day.
_id _index _score _source.authors _source.deleted _source.description _source.doi _source.is_valid _source.issue _source.journal ... _source.rating_versatility_weighted _source.review_count _source.tag _source.title _source.userAvg _source.user_id _source.venue_name _source.views_count _source.volume _type
0 7CB3F2AD scibase_listings 1 None 0 None 1 None Physical Review Letters ... 0 0 [mass spectra, elementary particles, bound sta... Evidence for a new meson: A quasinuclear NN-ba... 0 None Physical Review Letters 0 None listing
1 7AF8EBC3 scibase_listings 1 [{'affiliations': ['Punjabi University'], 'aut... 0 None 1 None Journal of Industrial Microbiology & Biotechno... ... 0 0 [flow rate, operant conditioning, packed bed r... Development of a stable continuous flow immobi... 0 None Journal of Industrial Microbiology & Biotechno... 0 None listing
2 7521A721 scibase_listings 1 [{'author_id': '7FF872BC', 'author_name': 'bar... 0 None 1 None The American Historical Review ... 0 0 [social movements] Feminism and the women's movement : dynamics o... 0 None The American Historical Review 0 None listing
This is a chunk of the file (this is level 3; levels 1 and 2 are hits, hits).
{'_shards': {'failed': 0, 'successful': 5, 'total': 5},
'hits': {'hits': [{'_id': '7CB3F2AD',
'_index': 'scibase_listings',
"_type": "listing",
"_id": "7FDFEB02",
"_score": 1,
"_source": {
"userAvg": 0,
"meta_keywords": null,
"views_count": 0,
"rating_reproducability": 0,
"rating_versatility": 0,
"rating_innovation": 0,
"tag": null,
"rating_reproducibility_weighted": 0,
"meta_description": null,
"review_count": 0,
"rating_avg_weighted": 0,
"venue_name": "The American Historical Review",
"rating_num_weighted": 0,
"is_valid": 1,
"normalized_venue_name": "american historical review",
"rating_clarity": 0,
"description": null,
"deleted": 0,
"journal": "The American Historical Review",
"volume": null,
"link": null,
"authors": [
{
"author_id": "166468F4",
"author_name": "a bowdoin van riper"
},
{
"author_id": "81070854",
"author_name": "jeffrey h schwartz"
}
],
"user_id": null,
"pub_date": "1994-01-01 00:00:00",
"pages": null,
"doi": "",
"issue": null,
"rating_versatility_weighted": 0,
"pubtype": null,
"title": "Men Among the Mammoths: Victorian Science and the Discovery of Human Prehistory",
"rating_clarity_weighted": 0,
"rating_innovation_weighted": 0
}
},
{
"_index": "scibase_listings",
"_type": "listing",
"_id": "7538108B",
"_score": 1,
"_source": {
"userAvg": 0,
"meta_keywords": null,
"views_count": 0,
"rating_reproducability": 0,
"rating_versatility": 0,
"rating_innovation": 0,
"tag": null,
"rating_reproducibility_weighted": 0,
"meta_description": null,
"review_count": 0,
"rating_avg_weighted": 0,
"venue_name": "The American Historical Review",
"rating_num_weighted": 0,
"is_valid": 1,
"normalized_venue_name": "american historical review",
"rating_clarity": 0,
"description": null,
"deleted": 0,
"journal": "The American Historical Review",
"volume": null,
"link": null,
"authors": [
{
"affiliations": [
"Pennsylvania State University"
],
"author_id": "7E15BDFA",
"author_name": "roger l geiger"
}
],
"user_id": null,
"pub_date": "2013-06-01 00:00:00",
"pages": null,
"doi": "10.1093/ahr/118.3.896a",
"issue": null,
"rating_versatility_weighted": 0,
"pubtype": null,
"title": "Elizabeth Popp Berman. Creating the Market University: How Academic Science Became an Economic Engine.",
"rating_clarity_weighted": 0,
"rating_innovation_weighted": 0
}
}
]
Recommended answer
You can try this:
json_normalize(raw['hits']['hits'], record_path=['_source', 'authors'], meta=['_id'])
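As a minimal sketch of how the record_path and meta arguments of json_normalize could be used to reach the authors level, the example below assumes pandas 1.0 or later (where pd.json_normalize is available) and uses a small inline `raw` dict trimmed by hand from the excerpt in the question instead of the real authors.json. The choice of `_id` and `_source.journal` as meta columns is only for illustration.

```python
import pandas as pd

# Hand-trimmed stand-in for authors.json (assumption: same nesting as the excerpt)
raw = {
    "hits": {
        "hits": [
            {
                "_id": "7FDFEB02",
                "_source": {
                    "journal": "The American Historical Review",
                    "authors": [
                        {"author_id": "166468F4", "author_name": "a bowdoin van riper"},
                        {"author_id": "81070854", "author_name": "jeffrey h schwartz"},
                    ],
                },
            },
            {
                "_id": "7538108B",
                "_source": {
                    "journal": "The American Historical Review",
                    "authors": [
                        {
                            "affiliations": ["Pennsylvania State University"],
                            "author_id": "7E15BDFA",
                            "author_name": "roger l geiger",
                        }
                    ],
                },
            },
        ]
    }
}

# record_path: the chain of keys leading to the list to expand (one row per author).
# meta: fields from the enclosing levels to repeat on every author row;
#       a nested field is given as a list of keys.
authors = pd.json_normalize(
    raw["hits"]["hits"],
    record_path=["_source", "authors"],
    meta=["_id", ["_source", "journal"]],
)

print(authors)
```

Each author becomes one row, with the parent listing's `_id` and `_source.journal` repeated alongside it. Inner lists such as `affiliations` stay as Python lists in their column. How records whose `authors` value is null are handled may depend on the pandas version, so filtering them out first is a cautious option.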