Unable to loop through paged API responses with Python


Problem description

So, I'm scratching my head with this one. Using HubSpot's API, I need to get a list of ALL the companies in my client's "portal" (account). Sadly, the standard API call only returns 100 companies at a time. When it does return a response, it includes two parameters that make paging through responses possible.

One of those is "has-more": True (this lets you know whether to expect any more pages), and the other is "offset": 12345678 (the timestamp to offset the next request by).

These two parameters are things you can pass back into the next API call to get the next page. So, for example, the initial API call might look like:

"https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)

A subsequent call might look like:

"https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)

So this is what I've tried so far:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
import os.path
import requests
import json
import csv
import glob2
import shutil
import time
import time as howLong
from time import sleep
from time import gmtime, strftime

HubSpot_Customer_Portal_ID = "XXXXXX"

wta_hubspot_api_key = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

findCSV = glob2.glob('*contact*.csv')

theDate = strftime("%Y-%m-%d", gmtime())
theTime = strftime("%H:%M:%S", gmtime())

try:
    testData = findCSV[0]
except IndexError:
    print("\nSyncronisation attempted on {date} at {time}: There are no \"contact\" CSVs, please upload one and try again.\n".format(date=theDate, time=theTime))
    print("====================================================================================================================\n")
    sys.exit()

for theCSV in findCSV:

    def get_companies():
        create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
        headers = {'content-type': 'application/json'}
        create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
        if create_get_recent_companies_response.status_code == 200:

            offset = create_get_recent_companies_response.json()[u'offset']
            hasMore = create_get_recent_companies_response.json()[u'has-more']

            while hasMore == True:
                for i in create_get_recent_companies_response.json()[u'companies']:
                    get_more_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)
                    get_more_companies_call_response = requests.get(get_more_companies_call, headers=headers)
                    companyName = i[u'properties'][u'name'][u'value']
                    print("{companyName}".format(companyName=companyName))


        else:
            print("Something went wrong, check the supplied field values.\n")
            print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

    if __name__ == "__main__":
        get_companies()
        sys.exit()

The problem is that it just keeps returning the same initial 100 results; this is happening because the parameter "has-more": True is true on the initial call, so it'll just keep returning the same ones...

My ideal scenario is that I'm able to parse ALL the companies across approximately 120 response pages (there are around 12,000 companies). As I pass through each page, I'd like to append its JSON content to a list, so that eventually I have a list containing the JSON responses of all 120 pages, which I can then parse for use in a different function.
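Once the offset is updated on every iteration, accumulating the pages into a list is straightforward. A minimal sketch, with a hypothetical fetch_page() standing in for the real requests.get call so it runs without network access:

```python
def fetch_page(offset):
    """Stand-in for the real API call; returns one parsed JSON page.

    A real implementation would instead do something like:
        requests.get(url, params={"hapikey": key, "offset": offset}).json()
    """
    pages = {
        0: {"companies": [{"id": 1}, {"id": 2}], "has-more": True, "offset": 100},
        100: {"companies": [{"id": 3}], "has-more": False, "offset": 200},
    }
    return pages[offset]


def get_all_pages():
    all_pages = []  # will hold one JSON dict per response page
    offset = 0
    while True:
        page = fetch_page(offset)
        all_pages.append(page)
        if not page["has-more"]:
            break
        offset = page["offset"]  # use the NEW offset for the next call
    return all_pages


pages = get_all_pages()
print(len(pages))  # → 2
```

The resulting list can then be handed to whatever function parses the companies.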

I am in desperate need of a solution :(

This is the function I'm looking to replace in my main script:

def get_companies():
    create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/recent/modified?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
    headers = {'content-type': 'application/json'}
    create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
    if create_get_recent_companies_response.status_code == 200:
        for i in create_get_recent_companies_response.json()[u'results']:
            company_name = i[u'properties'][u'name'][u'value']
            #print(company_name)
            if row[0].lower() == str(company_name).lower():
                contact_company_id = i[u'companyId']
                #print(contact_company_id)
                return contact_company_id
    else:
        print("Something went wrong, check the supplied field values.\n")
        #print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

Answer

The problem seems to be:

  • You get the offset in your first call, but you don't do anything with the actual company data that call returns.
  • You then use that same offset inside your while loop; you never pick up the new one returned by the subsequent calls. That's why you get the same companies every time.

I think this code for get_companies() should work for you. I can't test it, obviously, but hopefully it is OK:

def get_companies():
    create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
    headers = {'content-type': 'application/json'}
    create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
    if create_get_recent_companies_response.status_code == 200:
        while True:
            for i in create_get_recent_companies_response.json()[u'companies']:
                companyName = i[u'properties'][u'name'][u'value']
                print("{companyName}".format(companyName=companyName))
            offset = create_get_recent_companies_response.json()[u'offset']
            hasMore = create_get_recent_companies_response.json()[u'has-more']
            if not hasMore:
                break
            else:
                create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)
                create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
    else:
        print("Something went wrong, check the supplied field values.\n")
        print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

Strictly, the else after the break isn't required, but it is in keeping with the Zen of Python's "Explicit is better than implicit".

Note that you are only checking for a 200 response code once; if something goes wrong inside your loop, you will miss it. You should probably put all your calls inside the loop and check for a proper response every time.
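One way to do that (a sketch, not tested against the live API) is to wrap each request in a small helper that raises on any non-2xx status, so a failure mid-pagination stops the loop instead of passing silently:

```python
import requests


def get_page_json(url, params):
    """Fetch one page; raise requests.HTTPError on any 4xx/5xx response."""
    response = requests.get(url, params=params)
    response.raise_for_status()  # checked on EVERY call, not just the first
    return response.json()


# raise_for_status() behaviour, demonstrated on hand-built Response
# objects so this example runs without network access:
ok = requests.Response()
ok.status_code = 200
ok.raise_for_status()  # a 2xx code raises nothing

bad = requests.Response()
bad.status_code = 502
try:
    bad.raise_for_status()
    raised = False
except requests.HTTPError:
    raised = True
print(raised)  # → True
```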
