Digging down into a JSON file

Problem description

I have tried many approaches (and read many Stack Overflow questions) to normalize a deeply nested JSON file. I tried .apply(pd.Series), but it does not cope well with many levels of dictionaries.

I am currently trying json_normalize, and it has given some results. I think I understand how the function works; my problem is that I don't know how to navigate through a nested dictionary.

So far, I have been able to dig down two levels.

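import json
import pandas as pd
from pandas import json_normalize  # top-level in pandas >= 1.0; the pandas.io.json import is deprecated

# Parse the file into nested dicts and lists, closing it afterwards.
with open('authors.json') as f:
    raw = json.load(f)

# Flatten each hit; nested dict keys become dotted columns such as _source.title.
raw2 = json_normalize(raw['hits']['hits'])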

This gives me what I need (at least for the first levels), but I don't know how to go deeper.

I have tried:

raw2 = json_normalize(raw['hits']['hits'][0])
raw2 = json_normalize(raw['hits']['hits']['_source.authors'])
TypeError: string indices must be integers
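
One possible reading of the second attempt, as a hedged sketch: `raw['hits']['hits']` is a list, so it needs an integer position before any key, and `_source.authors` is a column label that json_normalize produces rather than a key in the raw data. The miniature `raw` dictionary below is made up to mirror the shape of the file; its values are illustrative only.

```python
# Hypothetical miniature of the file's shape (values are illustrative).
raw = {
    "hits": {
        "hits": [
            {"_id": "A1", "_source": {"title": "t1", "authors": [
                {"author_id": "X1", "author_name": "alice"}]}},
            {"_id": "B2", "_source": {"title": "t2", "authors": None}},
        ]
    }
}

hits = raw["hits"]["hits"]               # two dict keys lead to a list
first = hits[0]                          # a list takes an integer position
authors = first["_source"]["authors"]    # a dict takes a string key
first_author_name = authors[0]["author_name"]

# Indexing the list with a string key is the step that fails.
try:
    hits["_source.authors"]
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__

print(first_author_name, error_type)
```

The rule of thumb this illustrates: after a `{` use a string key, after a `[` use an integer index (or loop over the elements).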

And many more, but randomly trying things without understanding them is not the right way. I guess my questions are:

  • How do I know how to include the next level ({} vs [] in the JSON)?
  • Is there any visual way to represent this?
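
As a rough illustration of a "visual" representation, the sketch below walks a parsed JSON value and builds an indented outline that marks every dict as {} and every list as []. The helper name `outline` and the `sample` data are hypothetical, not part of pandas or of the original file; only the first element of each list is inspected, on the assumption that list elements share one shape.

```python
def outline(node, name="root", depth=0, lines=None):
    """Collect an indented outline of a parsed JSON value.

    Dicts are shown as {} and recursed key by key; lists are shown as []
    with their length, and only the first element is recursed into.
    """
    if lines is None:
        lines = []
    pad = "  " * depth
    if isinstance(node, dict):
        lines.append(f"{pad}{name} {{}}")
        for key, value in node.items():
            outline(value, key, depth + 1, lines)
    elif isinstance(node, list):
        lines.append(f"{pad}{name} [] (len={len(node)})")
        if node:
            outline(node[0], "[0]", depth + 1, lines)
    else:
        # Leaf value: report its Python type (str, int, NoneType, ...).
        lines.append(f"{pad}{name}: {type(node).__name__}")
    return lines


# Hypothetical sample shaped like the file in the question.
sample = {"hits": {"hits": [
    {"_id": "A1", "_source": {"authors": [{"author_name": "alice"}]}}
]}}
tree = outline(sample)
print("\n".join(tree))
```

Reading the outline top to bottom gives the navigation path: each {} line is followed by a string key, each [] line by an integer index.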

It is odd that this topic is not covered more online. I work with JSON data more and more every day.

_id _index  _score  _source.authors _source.deleted _source.description _source.doi _source.is_valid    _source.issue   _source.journal ... _source.rating_versatility_weighted _source.review_count    _source.tag _source.title   _source.userAvg _source.user_id _source.venue_name  _source.views_count _source.volume  _type   
0   7CB3F2AD    scibase_listings    1   None    0   None        1   None    Physical Review Letters ... 0   0   [mass spectra, elementary particles, bound sta...   Evidence for a new meson: A quasinuclear NN-ba...   0   None    Physical Review Letters 0   None    listing
1   7AF8EBC3    scibase_listings    1   [{'affiliations': ['Punjabi University'], 'aut...   0   None        1   None    Journal of Industrial Microbiology & Biotechno...   ... 0   0   [flow rate, operant conditioning, packed bed r...   Development of a stable continuous flow immobi...   0   None    Journal of Industrial Microbiology & Biotechno...   0   None    listing
2   7521A721    scibase_listings    1   [{'author_id': '7FF872BC', 'author_name': 'bar...   0   None        1   None    The American Historical Review  ... 0   0   [social movements]  Feminism and the women's movement : dynamics o...   0   None    The American Historical Review  0   None    listing

This is a chunk of the file (this is level 3; levels 1 and 2 are hits and hits).

{'_shards': {'failed': 0, 'successful': 5, 'total': 5},
 'hits': {'hits': [{'_id': '7CB3F2AD',
    '_index': 'scibase_listings',
            "_type": "listing",
            "_id": "7FDFEB02",
            "_score": 1,
            "_source": {
                "userAvg": 0,
                "meta_keywords": null,
                "views_count": 0,
                "rating_reproducability": 0,
                "rating_versatility": 0,
                "rating_innovation": 0,
                "tag": null,
                "rating_reproducibility_weighted": 0,
                "meta_description": null,
                "review_count": 0,
                "rating_avg_weighted": 0,
                "venue_name": "The American Historical Review",
                "rating_num_weighted": 0,
                "is_valid": 1,
                "normalized_venue_name": "american historical review",
                "rating_clarity": 0,
                "description": null,
                "deleted": 0,
                "journal": "The American Historical Review",
                "volume": null,
                "link": null,
                "authors": [
                    {
                        "author_id": "166468F4",
                        "author_name": "a bowdoin van riper"
                    },
                    {
                        "author_id": "81070854",
                        "author_name": "jeffrey h schwartz"
                    }
                ],
                "user_id": null,
                "pub_date": "1994-01-01 00:00:00",
                "pages": null,
                "doi": "",
                "issue": null,
                "rating_versatility_weighted": 0,
                "pubtype": null,
                "title": "Men Among the Mammoths: Victorian Science and the Discovery of Human Prehistory",
                "rating_clarity_weighted": 0,
                "rating_innovation_weighted": 0
            }
        },
        {
            "_index": "scibase_listings",
            "_type": "listing",
            "_id": "7538108B",
            "_score": 1,
            "_source": {
                "userAvg": 0,
                "meta_keywords": null,
                "views_count": 0,
                "rating_reproducability": 0,
                "rating_versatility": 0,
                "rating_innovation": 0,
                "tag": null,
                "rating_reproducibility_weighted": 0,
                "meta_description": null,
                "review_count": 0,
                "rating_avg_weighted": 0,
                "venue_name": "The American Historical Review",
                "rating_num_weighted": 0,
                "is_valid": 1,
                "normalized_venue_name": "american historical review",
                "rating_clarity": 0,
                "description": null,
                "deleted": 0,
                "journal": "The American Historical Review",
                "volume": null,
                "link": null,
                "authors": [
                    {
                        "affiliations": [
                            "Pennsylvania State University"
                        ],
                        "author_id": "7E15BDFA",
                        "author_name": "roger l geiger"
                    }
                ],
                "user_id": null,
                "pub_date": "2013-06-01 00:00:00",
                "pages": null,
                "doi": "10.1093/ahr/118.3.896a",
                "issue": null,
                "rating_versatility_weighted": 0,
                "pubtype": null,
                "title": "Elizabeth Popp Berman. Creating the Market University: How Academic Science Became an Economic Engine.",
                "rating_clarity_weighted": 0,
                "rating_innovation_weighted": 0
            }
        }
    ]

Recommended answer

You could try this:

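# Pass the nested path by name: positional extras would be read as meta, meta_prefix and record_prefix.
json_normalize(raw['hits']['hits'], record_path=['_source', 'authors'], meta=['_id'])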
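
A minimal sketch of how `record_path` and `meta` behave, assuming pandas 1.0 or later (where `pd.json_normalize` is available at the top level). The `hits` list below is invented to resemble `raw['hits']['hits']`; it is not taken from the real file. Records whose `authors` value is null may need to be filtered out first, depending on the pandas version.

```python
import pandas as pd

# Hypothetical records shaped like raw['hits']['hits'] (values are illustrative).
hits = [
    {"_id": "A1", "_source": {"title": "t1", "authors": [
        {"author_id": "X1", "author_name": "alice", "affiliations": ["Uni A"]},
        {"author_id": "X2", "author_name": "bob"},
    ]}},
    {"_id": "B2", "_source": {"title": "t2", "authors": [
        {"author_id": "X3", "author_name": "carol"},
    ]}},
]

# record_path walks dict keys down to a list; each list element becomes a row.
# meta copies fields from the enclosing hit onto every one of those rows.
authors = pd.json_normalize(
    hits,
    record_path=["_source", "authors"],
    meta=["_id", ["_source", "title"]],
)
print(authors)
```

This yields one row per author rather than one row per hit, with `_id` and `_source.title` repeated on each row so the authors can be joined back to their listing. The `affiliations` column still holds lists; flattening that further would need a second pass.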