Unable to parse JSON file, keep getting ValueError: Extra Data


Question

Following on from my earlier question [found here][1], I'm trying to parse a JSON file that I managed to download with @SiHa's help. The JSON is structured like this:

{"properties": [{"property": "name", "value": "A random company name"}, {"property": "companyId", "value": 123456789}]}{"properties": [{"property": "name", "value": "Another random company name"}, {"property": "companyId", "value": 31415999}]}{"properties": [{"property": "name", "value": "Yet another random company"}, {"property": "companyId", "value": 10101010}]}

I was able to produce this by slightly modifying @SiHa's code:

def get_companies():
            create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
            headers = {'content-type': 'application/json'}
            create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
            if create_get_recent_companies_response.status_code == 200:
                while True:
                    for i in create_get_recent_companies_response.json()[u'companies']:

                        all_the_companies = { "properties": [
                                                    { "property": "name", "value": i[u'properties'][u'name'][u'value'] },
                                                    { "property": "companyId", "value": i[u'companyId'] }
                                                ]
                                            }

                        with open("all_the_companies.json", "a") as myfile:
                            myfile.write(json.dumps(all_the_companies))
                        #print(companyProperties)
                    offset = create_get_recent_companies_response.json()[u'offset']
                    hasMore = create_get_recent_companies_response.json()[u'has-more']
                    if not hasMore:
                        break
                    else:
                        create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)
                        create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)


            else:
                print("Something went wrong, check the supplied field values.\n")
                print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

So that was part one. Now I'm trying to use the code below to extract two things: 1) the name and 2) the companyId.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
import os.path
import requests
import json
import csv
import glob2
import shutil
import time
import time as howLong
from time import sleep
from time import gmtime, strftime

# Local Testing Version
findCSV = glob2.glob('*contact*.csv')

theDate = time=strftime("%Y-%m-%d", gmtime())
theTime = time=strftime("%H:%M:%S", gmtime())

# Exception handling
try:
    testData = findCSV[0]
except IndexError:
    print ("\nSyncronisation attempted on {date} at {time}: There are no \"contact\" CSVs, please upload one and try again.\n").format(date=theDate, time=theTime)
    print("====================================================================================================================\n")
    sys.exit()

for theCSV in findCSV:

    def process_companies():
        with open('all_the_companies.json') as data_file:
            data = json.load(data_file)
            for i in data:
                company_name = data[i][u'name']
                #print(company_name)
                if row[0].lower() == company_name.lower():
                    contact_company_id = data[i][u'companyId']
                    #print(contact_company_id)
                    return contact_company_id

                else:
                    print("Something went wrong, check the \"get_companies()\" function.\n")
                    print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

    if __name__ == "__main__":
        start_time = howLong.time()
        process_companies()
        print("This operation took %s seconds.\n" % (howLong.time() - start_time))
        sys.exit()

Unfortunately, it's not working. I'm getting the following traceback:

Traceback (most recent call last):
  File "wta_parse_json.py", line 62, in <module>
    process_companies()
  File "wta_parse_json.py", line 47, in process_companies
    data = json.load(data_file)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 130 - line 1 column 1455831 (char 129 - 1455830)

I've made sure that I'm using json.dumps rather than json.dump when writing the file, but it's still not working. :(

I've now given up on JSON and am trying to export a simple CSV with the code below:

    def get_companies():
            create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
            headers = {'content-type': 'application/json'}
            create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
            if create_get_recent_companies_response.status_code == 200:
                while True:
                    for i in create_get_recent_companies_response.json()[u'companies']:

                        all_the_companies = "{name},{id}\n".format(name=i[u'properties'][u'name'][u'value'], id=i[u'companyId'])
                        all_the_companies.encode('utf-8')

                        with open("all_the_companies.csv", "a") as myfile:
                            myfile.write(all_the_companies)
                        #print(companyProperties)
                    offset = create_get_recent_companies_response.json()[u'offset']
                    hasMore = create_get_recent_companies_response.json()[u'has-more']
                    if not hasMore:
                        break
                    else:
                        create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)
                        create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)

  [1]: http://stackoverflow.com/questions/36148346/unable-to-loop-through-paged-api-responses-with-python

But it looks like this isn't right either, even though I've read up on the formatting issues and added the .encode('utf-8') call. I still end up with the following traceback:

Traceback (most recent call last):
  File "wta_get_companies.py", line 78, in <module>
    get_companies()
  File "wta_get_companies.py", line 57, in get_companies
    all_the_companies = "{name},{id}\n".format(name=i[u'properties'][u'name'][u'value'], id=i[u'companyId'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 3: ordinal not in range(128)

Recommended answer

The JSON data contains three objects, one after the other; simplified:

{ .. }{ .. }{ .. }

The JSON standard does not support that. How is Python supposed to parse it? Automatically wrap it in an array? Assign it to three different variables? Just use the first one?

You probably want to wrap it in an array; simplified:

[{ .. },{ .. },{ .. }]

Or in full:

[{"properties": [{"property": "name", "value": "A random company name"}, {"property": "companyId", "value": 123456789}]},{"properties": [{"property": "name", "value": "Another random company name"}, {"property": "companyId", "value": 31415999}]},{"properties": [{"property": "name", "value": "Yet another random company"}, {"property": "companyId", "value": 10101010}]}]
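If the file has already been written as back-to-back objects, one possible way to recover it is to decode the values one at a time with the standard library's `json.JSONDecoder.raw_decode` and then collect them into a list, which serializes as the array shown above. This is only a sketch under assumptions: the helper names `iter_concatenated_json` and `company_lookup` are hypothetical, and `sample` stands in for the contents of `all_the_companies.json`.

```python
import json


def iter_concatenated_json(text):
    """Yield each top-level JSON value from text shaped like '{..}{..}{..}'."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the decoded value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def company_lookup(records):
    """Map each lower-cased company name to its companyId (hypothetical helper)."""
    lookup = {}
    for record in records:
        # Flatten [{"property": k, "value": v}, ...] into {k: v}.
        props = {p["property"]: p["value"] for p in record["properties"]}
        lookup[props["name"].lower()] = props["companyId"]
    return lookup


# Stand-in for the file contents: three objects with no separator between them.
sample = (
    '{"properties": [{"property": "name", "value": "A random company name"}, '
    '{"property": "companyId", "value": 123456789}]}'
    '{"properties": [{"property": "name", "value": "Another random company name"}, '
    '{"property": "companyId", "value": 31415999}]}'
    '{"properties": [{"property": "name", "value": "Yet another random company"}, '
    '{"property": "companyId", "value": 10101010}]}'
)

companies = list(iter_concatenated_json(sample))  # a list of three dicts
as_array = json.dumps(companies)                  # valid JSON: [{..}, {..}, {..}]
lookup = company_lookup(companies)
```

When generating the file in the first place, the same idea applies in reverse: append each company dict to a list inside the loop and call `json.dump` on that list once at the end, instead of appending a separate `json.dumps` result per company.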
