Parsing JSON files

Question

Below is the code I use for visual analysis of a set of tweets stored in a .json file. When I run it, an error is raised at the map() call. Is there any way to fix it?

import json
import pandas as pd
import matplotlib.pyplot as plt


tweets_data_path = 'import_requests.txt'

tweets_data = []
tweets_file = open(tweets_data_path, "r")

for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except:
        continue

print(len(tweets_data))

tweets = pd.DataFrame()

tweets['text'] = map(lambda tweet: tweet['text'], tweets_data)

This is the full traceback of the 'ValueError' I get when running the code above:

Traceback (most recent call last):
  File "tweet_len.py", line 21, in <module>
    tweets['text'] = map(lambda tweet: tweet['text'], tweets_data)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 1887, in __setitem__
    self._set_item(key, value)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 1966, in _set_item
    self._ensure_valid_index(value)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 1943, in _ensure_valid_index
    raise ValueError('Cannot set a frame with no defined index '
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

I am using Python 3.
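The Python 3 detail matters: there, map() returns a lazy map iterator rather than a list, and the error message says pandas could not convert that value into a Series while the DataFrame still had no index (the pandas version in the traceback appears not to accept iterators at that point). A minimal sketch of the usual fix, assuming tweets_data is the list of tweet dicts built by the json.loads loop; the two tweets below are made-up placeholders. The idea is to turn the iterator into a list before assigning it, or to build the DataFrame in one step.

```python
import pandas as pd

# Hypothetical stand-in for the list built by the json.loads loop above
tweets_data = [
    {"id": 1, "text": "first tweet"},
    {"id": 2, "text": "second tweet"},
]

# In Python 3, map() is lazy: this is a 'map' iterator object, not a list.
lazy = map(lambda tweet: tweet["text"], tweets_data)
print(type(lazy).__name__)  # map

tweets = pd.DataFrame()
# Materializing the iterator hands pandas a real list; on an empty DataFrame
# the length of that list then defines the new index.
tweets["text"] = list(map(lambda tweet: tweet["text"], tweets_data))

# Equivalent, and arguably more idiomatic: build the frame in one step.
tweets2 = pd.DataFrame({"text": [tweet["text"] for tweet in tweets_data]})

print(tweets)
print(tweets2)
```

The list-comprehension form avoids the lambda entirely and behaves the same on Python 2, where map() already returned a list.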

EDIT: Below is a sample of the collected Twitter data (.json format).

{
    "created_at": "Sat Mar 05 05:47:23 +0000 2016",
    "id": 705993088574033920,
    "id_str": "705993088574033920",
    "text": "Tumi Inc. civil war: Staff manning US ceasefire hotline 'can't speak Arabic' #fakeheadlinebot #learntocode #makeatwitterbot #javascript",
    "source": "\u003ca href=\"http://javascriptiseasy.com\" rel=\"nofollow\"\u003eJavaScript is Easy\u003c/a\u003e",
    "truncated": false,
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "in_reply_to_screen_name": null,
    "user": {
        "id": 4382400263,
        "id_str": "4382400263",
        "name": "JavaScript is Easy",
        "screen_name": "javascriptisez",
        "location": "Your Console",
        "url": "http://javascriptiseasy.com",
        "description": "Get learning!",
        "protected": false,
        "verified": false,
        "followers_count": 167,
        "friends_count": 68,
        "listed_count": 212,
        "favourites_count": 11,
        "statuses_count": 55501,
        "created_at": "Sat Dec 05 11:18:00 +0000 2015",
        "utc_offset": null,
        "time_zone": null,
        "geo_enabled": false,
        "lang": "en",
        "contributors_enabled": false,
        "is_translator": false,
        "profile_background_color": "000000",
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
        "profile_background_tile": false,
        "profile_link_color": "FFCC4D",
        "profile_sidebar_border_color": "000000",
        "profile_sidebar_fill_color": "000000",
        "profile_text_color": "000000",
        "profile_use_background_image": false,
        "profile_image_url": "http://pbs.twimg.com/profile_images/673099606348070912/xNxp4zOt_normal.jpg",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/673099606348070912/xNxp4zOt_normal.jpg",
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/4382400263/1449314370",
        "default_profile": false,
        "default_profile_image": false,
        "following": null,
        "follow_request_sent": null,
        "notifications": null
    },
    "geo": null,
    "coordinates": null,
    "place": null,
    "contributors": null,
    "is_quote_status": false,
    "retweet_count": 0,
    "favorite_count": 0,
    "entities": {
        "hashtags": [{
            "text": "fakeheadlinebot",
            "indices": [77, 93]
        }, {
            "text": "learntocode",
            "indices": [94, 106]
        }, {
            "text": "makeatwitterbot",
            "indices": [107, 123]
        }, {
            "text": "javascript",
            "indices": [124, 135]
        }],
        "urls": [],
        "user_mentions": [],
        "symbols": []
    },
    "favorited": false,
    "retweeted": false,
    "filter_level": "low",
    "lang": "en",
    "timestamp_ms": "1457156843690"
}

Recommended answer

I think you can use read_json:

import pandas as pd

df = pd.read_json('file.json')
print(df.head())  # print is a function in Python 3

                       contributors  coordinates          created_at entities  \
contributors_enabled            NaN          NaN 2016-03-05 05:47:23      NaN   
created_at                      NaN          NaN 2016-03-05 05:47:23      NaN   
default_profile                 NaN          NaN 2016-03-05 05:47:23      NaN   
default_profile_image           NaN          NaN 2016-03-05 05:47:23      NaN   
description                     NaN          NaN 2016-03-05 05:47:23      NaN   

                       favorite_count favorited filter_level  geo  \
contributors_enabled                0     False          low  NaN   
created_at                          0     False          low  NaN   
default_profile                     0     False          low  NaN   
default_profile_image               0     False          low  NaN   
description                         0     False          low  NaN   

                                       id              id_str  \
contributors_enabled   705993088574033920  705993088574033920   
created_at             705993088574033920  705993088574033920   
default_profile        705993088574033920  705993088574033920   
default_profile_image  705993088574033920  705993088574033920   
description            705993088574033920  705993088574033920   

                                    ...                is_quote_status  lang  \
contributors_enabled                ...                          False    en   
created_at                          ...                          False    en   
default_profile                     ...                          False    en   
default_profile_image               ...                          False    en   
description                         ...                          False    en   

                       place  retweet_count  retweeted  \
contributors_enabled     NaN              0      False   
created_at               NaN              0      False   
default_profile          NaN              0      False   
default_profile_image    NaN              0      False   
description              NaN              0      False   

                                                                  source  \
contributors_enabled   <a href="http://javascriptiseasy.com" rel="nof...   
created_at             <a href="http://javascriptiseasy.com" rel="nof...   
default_profile        <a href="http://javascriptiseasy.com" rel="nof...   
default_profile_image  <a href="http://javascriptiseasy.com" rel="nof...   
description            <a href="http://javascriptiseasy.com" rel="nof...   

                                                                    text  \
contributors_enabled   Tumi Inc. civil war: Staff manning US ceasefir...   
created_at             Tumi Inc. civil war: Staff manning US ceasefir...   
default_profile        Tumi Inc. civil war: Staff manning US ceasefir...   
default_profile_image  Tumi Inc. civil war: Staff manning US ceasefir...   
description            Tumi Inc. civil war: Staff manning US ceasefir...   

                                 timestamp_ms  truncated  \
contributors_enabled  2016-03-05 05:47:23.690      False   
created_at            2016-03-05 05:47:23.690      False   
default_profile       2016-03-05 05:47:23.690      False   
default_profile_image 2016-03-05 05:47:23.690      False   
description           2016-03-05 05:47:23.690      False   

                                                 user  
contributors_enabled                            False  
created_at             Sat Dec 05 11:18:00 +0000 2015  
default_profile                                 False  
default_profile_image                           False  
description                             Get learning!  

[5 rows x 25 columns]
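Reading a single nested tweet this way puts the keys of the nested "user" object on the index, as the output above shows. The question's loop calls json.loads(line) per line, which suggests the real file holds one tweet per line (JSON Lines); that is an assumption about the file. For such a file, read_json has a lines=True option that yields one row per tweet, and pd.json_normalize (pandas 1.0 and later) can flatten nested objects. A minimal sketch with made-up tweets, using io.StringIO in place of the real path:

```python
import io
import json

import pandas as pd

# Two made-up tweets in JSON Lines form (one object per line), mirroring
# the question's file, which is parsed line by line with json.loads(line).
raw = "\n".join([
    json.dumps({"id": 1, "text": "first tweet", "user": {"screen_name": "alice"}}),
    json.dumps({"id": 2, "text": "second tweet", "user": {"screen_name": "bob"}}),
])

# lines=True parses each line as a separate record -> one row per tweet.
# StringIO stands in for the real file path (e.g. 'import_requests.txt').
df = pd.read_json(io.StringIO(raw), lines=True)
print(df["text"].tolist())

# Nested objects such as "user" stay as dicts in a single column;
# json_normalize flattens them into dotted names like "user.screen_name".
records = [json.loads(line) for line in raw.splitlines()]
flat = pd.json_normalize(records)
print(flat[["id", "text", "user.screen_name"]])
```

Unlike the question's try/except loop, read_json does not silently skip lines that fail to parse, so a file with malformed lines may still need the manual loop (followed by pd.json_normalize on the collected list).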
