How to convert a response from requests.get to a DataFrame?
Problem description
I have the following code:
import json
import pandas as pd
import requests

def flatten_json(y):
    # Flatten nested dicts into a single dict with '_'-joined keys
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            # Lists are stored as-is rather than expanded
            out[name[:-1]] = x
        else:
            out[name[:-1]] = x
    flatten(y)
    return out

def importdata(data):
    # urlApi and hed are defined elsewhere
    responsedata = requests.get(urlApi, data=data, headers=hed, verify=False)
    return responsedata

def generatejson(response):
    # Generate flat json file
    sample_object = pd.DataFrame(response.json())['results'].to_dict()
    flat = {k: flatten_json(v) for k, v in sample_object.items()}
    return json.dumps(flat, sort_keys=True)

response = importdata(data)
flat_json = generatejson(response)
Example of what importdata(data) returns:
https://textuploader.com/dz30p
This code sends a GET request to the API, parses the result, and generates a JSON file.
This works fine.
Now I want to modify the importdata function to support pagination (multiple calls whose results are merged together).
So I wrote this code:
def impordatatnew():
    ...
    is_valid = True
    value_offset = 0
    value_limit = 100
    datarALL = []
    while is_valid:
        is_valid = False
        urlApi = 'http://....?offset={1}&limit={2}&startDate={0}'.format(
            requestedDate, value_offset, value_limit)
        responsedata = requests.get(urlApi, data=data, headers=hed, verify=False)
        if responsedata.status_code == 200:  # Use status code to check request status, 200 for successful call
            responsedata = responsedata.text
            value_offset = value_offset + value_limit
            # to do: merge the result of the get request
            jsondata = json.loads(responsedata)
            if "results" in jsondata:
                if jsondata["results"]:
                    is_valid = True
            if is_valid:
                # concat array by + operand
                datarALL = datarALL + jsondata["results"]
        else:
            # TODO handle other codes
            print(responsedata.status_code)
    return datarALL
This code uses pagination. It connects to the API, fetches the results page by page, and combines them into a single list. If I do:
print(json.dumps(datarALL))
I see the combined JSON, so this works great.
Example of the dump:
https://jsonblob.com/707ead1c-9891-11e8-b651-496f6b276e89
Example of what return datarALL gives:
https://textuploader.com/dz39d
My question:
I can't seem to make the return value of impordatatnew() work with generatejson(). How can I make the return value of impordatatnew() compatible with generatejson()? I tried to modify it as follows:
def generatejsonnew(response):
    # Generate flat json file
    sample_object = pd.DataFrame(response.json()).to_dict()
    flat = {k: flatten_json(v) for k, v in sample_object.items()}
    return json.dumps(flat, sort_keys=True)
It gives:
sample_object = pd.DataFrame(response.json()).to_dict()
AttributeError: 'list' object has no attribute 'json'
I understand the error, but I don't know how to solve it. I can't seem to make this conversion work.
Recommended answer
It's not working because you do this:
responsedata = responsedata.text
jsondata = json.loads(responsedata)
datarALL = datarALL + jsondata["results"]
It seems that what you're doing here is incrementally building a list. You could simplify it to:
datarALL += responsedata.json()["results"]
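To make the list-building step concrete, here is a minimal sketch of the whole paging loop. It assumes each page parses to a dict with a "results" list, as in the question. The names collect_all_results, fetch_page, and fake_fetch_page are hypothetical; fetch_page stands in for a function that performs requests.get(...) and returns response.json().

```python
def collect_all_results(fetch_page, limit=100):
    """Gather the "results" lists of consecutive pages into one list."""
    all_results = []
    offset = 0
    while True:
        # fetch_page is assumed to return parsed JSON, e.g. response.json()
        payload = fetch_page(offset, limit)
        page_results = payload.get("results") or []
        if not page_results:
            break  # a missing or empty page ends the loop
        all_results += page_results
        offset += limit
    return all_results


# Hypothetical stand-in for the real API call, so the sketch runs offline
fake_rows = [{"id": i} for i in range(5)]

def fake_fetch_page(offset, limit):
    return {"results": fake_rows[offset:offset + limit]}


print(len(collect_all_results(fake_fetch_page, limit=2)))  # prints 5
```

The return value is a plain Python list of dicts, which is why calling .json() on it later fails.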
The problem comes later:
pd.DataFrame(response.json())
This is because you are calling json() again on something that has already been parsed from JSON into a Python list. Hence the error message.
But the real head-scratcher is why you're doing this:
sample_object = pd.DataFrame(response.json()).to_dict()
This isn't really "using Pandas" other than to reformulate a list into a dict. Surely there is a more direct way of doing that, such as using a for loop to build the dict (exactly how, we can't tell without sample data).
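One possible shape for such a loop is sketched below. It assumes the goal is the same as in the original generatejson(): a dict keyed by row position whose values are flattened records. The function name generate_json_from_records and the sample records are invented for illustration; flatten_json is a condensed copy of the question's helper.

```python
import json


def flatten_json(y):
    """Flatten nested dicts into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        else:
            # Lists and scalar values are stored as-is
            out[name[:-1]] = x

    flatten(y)
    return out


def generate_json_from_records(records):
    """Build {row position: flattened record} with a plain for loop."""
    flat = {}
    for index, record in enumerate(records):
        flat[index] = flatten_json(record)
    return json.dumps(flat, sort_keys=True)


# Invented sample data; the real records come from the API
sample_records = [
    {"id": 1, "owner": {"name": "a"}},
    {"id": 2, "owner": {"name": "b"}},
]
print(generate_json_from_records(sample_records))
```

This takes the combined list directly, so no Response object and no pandas round trip are needed.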
Anyway, if you want to populate a DataFrame, simply remove the .json() part, so that the list itself is passed to pd.DataFrame, and it should work similarly to your original non-paginating code.
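As a rough illustration of passing the list straight to pandas, the sketch below builds a DataFrame from a list of dicts in two ways. The sample records are invented. pd.json_normalize (a top-level function in pandas 1.0 and later) is a swapped-in alternative to the question's flatten_json helper, not something the original code uses.

```python
import pandas as pd

# Invented stand-in for the list returned by impordatatnew()
records = [
    {"id": 1, "owner": {"name": "a", "city": "x"}},
    {"id": 2, "owner": {"name": "b", "city": "y"}},
]

# Plain construction: each nested dict stays as one object in the "owner" column
df_plain = pd.DataFrame(records)

# Flattened construction: nested keys become separate columns joined with "_"
df_flat = pd.json_normalize(records, sep="_")

print(df_plain.columns.tolist())
print(df_flat.columns.tolist())
```

Whether the flattened or the plain form is preferable depends on how deeply nested the real records are.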
But a far more efficient way is to simply construct a DataFrame per page using your original code, and then call pd.concat(pages), where pages is the list of those DataFrames. There is then no need to build datarALL.
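A minimal sketch of the per-page approach follows. The page contents are invented in-memory data standing in for the "results" list of each API response; in real code each inner list would come from one requests.get call.

```python
import pandas as pd

# Invented stand-ins for the "results" list of two consecutive pages
page_results = [
    [{"id": 1, "value": 10}, {"id": 2, "value": 20}],
    [{"id": 3, "value": 30}],
]

# One DataFrame per page
pages = [pd.DataFrame(results) for results in page_results]

# Stack the pages; ignore_index=True renumbers the rows 0..n-1
combined = pd.concat(pages, ignore_index=True)
print(combined)
```

Without ignore_index=True each page would keep its own row labels starting at 0, so the combined index would contain duplicates.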
Ultimately your code can be simplified much further, to end up like this:
pd.concat(pd.read_json(url, ...) for url in all_page_urls)
That is, first you use a for loop to build all_page_urls, then you use the above one-liner to collect all the data into a single DataFrame.
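The URL-building loop might look like the sketch below. It assumes the number of pages is known up front, which the question's code does not guarantee (it pages until an empty result). The function build_page_urls, the page_count parameter, and the base URL are all hypothetical; the query parameter names are taken from the question.

```python
from urllib.parse import urlencode


def build_page_urls(base_url, start_date, limit, page_count):
    """Return one URL per page, advancing the offset by `limit` each time."""
    urls = []
    for page in range(page_count):
        query = urlencode({
            "offset": page * limit,
            "limit": limit,
            "startDate": start_date,
        })
        urls.append(base_url + "?" + query)
    return urls


# Hypothetical base URL and date; replace with the real API values
all_page_urls = build_page_urls(
    "http://api.example.invalid/items", "2018-08-01", 100, 3)
for url in all_page_urls:
    print(url)
```

With real, reachable URLs, all_page_urls can then be fed to the one-liner above.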
Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html#pandas.read_json