How to convert response from request.get to DataFrame?

Question

I have the following code:

def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            out[name[:-1]] = x
        else:
            out[name[:-1]] = x
    flatten(y)
    return out 

def importdata(data):
    responsedata = requests.get(urlApi, data=data, headers=hed, verify=False)
    return responsedata


def generatejson(response):
    # Generate flat json file
    sample_object = pd.DataFrame(response.json())['results'].to_dict()
    flat = {k: flatten_json(v) for k, v in sample_object.items()}
    return json.dumps(flat, sort_keys=True)

response = importdata(data)
flat_json = generatejson(response)

Example of what importdata(data) returns: https://textuploader.com/dz30p

This code sends a GET request to the API, parses the result, and generates a JSON file.

This works fine.

Now I want to modify the importdata function to support pagination (multiple calls whose results are merged together).

So I wrote this code:

def impordatatnew():
    ...
    is_valid = True
    value_offset = 0
    value_limit = 100
    datarALL = []
    while is_valid:
        is_valid = False
        urlApi = 'http://....?offset={1}&limit={2}&startDate={0}'.format(
            requestedDate,value_offset,value_limit)
        responsedata = requests.get(urlApi, data=data, headers=hed, verify=False)
        if responsedata.status_code == 200:  # Use status code to check request status, 200 for successful call
            responsedata = responsedata.text
            value_offset = value_offset + value_limit
            # to do: merge the result of the get request
            jsondata = json.loads(responsedata)
            if "results" in jsondata:
                if jsondata["results"]:
                    is_valid = True
            if is_valid:
                # concat array by + operand
                datarALL = datarALL + jsondata["results"]
        else:
            #TODO handle other codes
            print responsedata.status_code
    return datarALL

This code uses pagination. It connects to the API, gets the results page by page, and combines them into one list. If I do:

print json.dumps(datarALL) I see the combined JSON, so this works great. Example of the dump: https://jsonblob.com/707ead1c-9891-11e8-b651-496f6b276e89

Example of what return datarALL gives:

https://textuploader.com/dz39d

My question:

I can't seem to make the return value of impordatatnew() work with generatejson(). How can I make the return value of impordatatnew() compatible with generatejson()? I tried modifying it as follows:

def generatejsonnew(response):
    #Generate flat json file
    sample_object = pd.DataFrame(response.json()).to_dict()
    flat = {k: flatten_json(v) for k, v in sample_object.items()}
    return json.dumps(flat, sort_keys=True)

It gives:

sample_object = pd.DataFrame(response.json()).to_dict()
AttributeError: 'list' object has no attribute 'json'

I understand the error, but I don't know how to solve it. I can't seem to make this conversion work.

Recommended answer

It's not working because you do this:

responsedata = responsedata.text   
jsondata = json.loads(responsedata)
datarALL = datarALL + jsondata["results"]

It seems like what you're doing here is incrementally building a list. You could simplify it to:

datarALL += responsedata.json()["results"]

The problem comes later:

pd.DataFrame(response.json())

This is because you are calling json() on something that has already been parsed from JSON into a Python list. Hence the error message.

But the real head-scratcher is why you're doing this:

sample_object = pd.DataFrame(response.json()).to_dict()

This isn't really "using Pandas" other than to reformulate a list into a dict. Surely there is a more direct way of doing that, such as using a for loop to build the dict (exactly how, we can't tell without sample data).
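
A minimal sketch of that for-loop approach, assuming each element of the combined list is one result dict. The sample records and the name generatejson_from_list are made up for illustration; flatten_json is a condensed version of the function from the question.

```python
import json


def flatten_json(y):
    """Flatten nested dicts into one level, joining keys with '_'."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        else:
            # Lists and scalar values are stored unchanged.
            out[name[:-1]] = x

    flatten(y)
    return out


def generatejson_from_list(records):
    """Hypothetical replacement for generatejson() that takes a plain list."""
    flat = {}
    for i, record in enumerate(records):
        flat[i] = flatten_json(record)
    return json.dumps(flat, sort_keys=True)


# Made-up sample data standing in for the list returned by impordatatnew().
sample_records = [
    {"id": 1, "user": {"name": "a", "tags": ["x", "y"]}},
    {"id": 2, "user": {"name": "b", "tags": []}},
]
flat_output = generatejson_from_list(sample_records)
```

Keying the result by list position mirrors what the pandas round trip in the original generatejson() produces, without building a DataFrame at all.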

Anyway, if you want to populate a DataFrame, simply remove the .json() part and it should work similarly to your original non-paginating code.

But a far more efficient way is to simply construct a DataFrame per page using your original code, and then call pd.concat(pages), where pages is the list of those DataFrames. Then there is no need to build datarALL.
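
A minimal sketch of the per-page idea, assuming each page's parsed JSON has a "results" key holding a list of flat records. The payloads below are made-up stand-ins for what responsedata.json() might return, so the snippet runs without calling the API.

```python
import pandas as pd

# Hypothetical parsed JSON for two pages; real pages would come from
# responsedata.json() inside the pagination loop.
page_payloads = [
    {"results": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]},
    {"results": [{"id": 3, "name": "c"}]},
]

# One DataFrame per page.
pages = [pd.DataFrame(payload["results"]) for payload in page_payloads]

# Stack the pages; ignore_index=True renumbers the rows 0..n-1.
df = pd.concat(pages, ignore_index=True)
```

If the records are nested, the columns will hold dicts, so flattening would still be needed before or after this step.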

Ultimately your code can be simplified much further, to end up like this:

pd.concat(pd.read_json(url, ...) for url in all_page_urls)

That is, first you use a for loop to build all_page_urls, then you use the above one-liner to collect all the data into a single DataFrame.
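
A sketch of those two steps, under the assumption that the number of pages is known in advance. The base URL, date, and page count are placeholders, and build_page_urls and load_all_pages are hypothetical helper names. load_all_pages is defined but not called here, since it would hit the network.

```python
import pandas as pd


def build_page_urls(base_url, start_date, limit, n_pages):
    """Return one URL per page, stepping the offset by `limit`."""
    urls = []
    for page in range(n_pages):
        urls.append(
            "{0}?offset={1}&limit={2}&startDate={3}".format(
                base_url, page * limit, limit, start_date))
    return urls


def load_all_pages(urls):
    """Read every page with pd.read_json and stack them into one DataFrame."""
    return pd.concat((pd.read_json(u) for u in urls), ignore_index=True)


# Placeholder endpoint and parameters, for illustration only.
all_page_urls = build_page_urls("http://example.invalid/api", "2018-08-01", 100, 3)
```

Two caveats: this does not pass the custom headers or verify=False used in the question, and if the API wraps its records in a "results" key, the frames returned by pd.read_json would need unpacking before they are useful.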

Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html#pandas.read_json
