Update the input of a loop with a result from the previous iteration of the loop

Problem description

(I've added the google-analytics api tags but I suspect that my issue is more a fundamental flaw in my approach to a loop, detailed below)

I'm using Python to query the Google Analytics API (V4). Having already successfully connected to the API with my credentials, I'm trying to loop over each 10k result set returned by the API to get the full results set.

When querying the API you pass a dict that looks something like this:

{'reportRequests':[{'viewId': '1234567', # my actual view id goes here of course
    'pageToken': 'go', # can be any string initially (I think?)
    'pageSize': 10000,
    'samplingLevel': 'LARGE',
    'dateRanges': [{'startDate': '2018-06-01', 'endDate': '2018-07-13'}],
    'dimensions': [{'name': 'ga:date'}, {'name': 'ga:dimension1'}, {'name': 'ga:dimension2'}, {'name': 'ga:userType'}, {'name': 'ga:landingpagePath'}, {'name': 'ga:deviceCategory'}],
    'metrics': [{'expression': 'ga:sessions'}, {'expression': 'ga:bounces'}, {'expression': 'ga:goal1Completions'}]}]}

According to the documentation on Google Analytics API V4 on the pageToken parameter:

"A continuation token to get the next page of the results. Adding this to the request will return the rows after the pageToken. The pageToken should be the value returned in the nextPageToken parameter in the response to the reports.batchGet request. "

My understanding is that I need to query the API in chunks of 10,000 (the maximum query result size allowed), and that to do this I must pass the value of the nextPageToken field returned in each query result into the next query.

From my research, it sounds like the nextPageToken field will be an empty string when all the results have been returned.

So, I tried a while loop. To get to the loop stage I built some functions:

## generates the dimensions in the right format to use in the query
def generate_dims(dims):
    dims_ar = []
    for i in dims:
        d = {'name': i}
        dims_ar.append(d)
    return(dims_ar)

## generates the metrics in the right format to use in the query
def generate_metrics(mets):
    mets_ar = []
    for i in mets:
        m = {'expression': i}
        mets_ar.append(m)
    return(mets_ar)

## generate the query dict
def query(pToken, dimensions, metrics, start, end):
    api_query = {
            'reportRequests': [
                    {'viewId': VIEW_ID,
                     'pageToken': pToken,          
                     'pageSize': 10000,
                     'samplingLevel': 'LARGE',
                     'dateRanges': [{'startDate': start, 'endDate': end}],
                     'dimensions': generate_dims(dimensions),
                     'metrics': generate_metrics(metrics)
                     }]
    }
    return(api_query)

Example use of the above three functions:

sessions1_qr = query(pToken = pageToken,
                     dimensions = ['ga:date', 'ga:dimension1', 'ga:dimension2',
                                   'ga:userType', 'ga:landingpagePath',
                                   'ga:deviceCategory'],
                     metrics = ['ga:sessions', 'ga:bounces', 'ga:goal1Completions'],
                     start = '2018-06-01',
                     end = '2018-07-13')

The results of this look like the first code block in this post.

So far so good. Here's the loop I attempted:

def main(query):
    global pageToken, store_response

    # debugging, was hoping to see print output on each iteration (I didn't)
    print(pageToken)

    while pageToken != "":
        analytics = initialize_analyticsreporting()
        response = get_report(analytics, query)
        pageToken = response['reports'][0]['nextPageToken'] # < IT ALL COMES DOWN TO THIS LINE HERE
        store_response['pageToken'] = response

    return(False) # don't actually need the function to return anything, just append to global store_response.

Then I tried to run it:

pageToken = "go" # can be any string to get started
store_response = {}
sessions1 = main(sessions1_qr)

The following happens:

  • The console remains busy
  • The line print(pageToken) prints once to the console, showing the initial value of pageToken
  • store_response dict has one item in it, not many as was hoped for

So, it looks like my loop runs once only.

Having stared at the code, I suspect it has something to do with the value of the query parameter that I pass to main(). When I initially call main(), the value of query is the same as the first code block above (the variable sessions1_qr, the dict with all the API call parameters). On each loop iteration this is supposed to update so that the value of pageToken is replaced with the response's nextPageToken value.

Put another way, and in short: I need to update the input of the loop with a result from the previous iteration of the loop. My logic is clearly flawed, so any help is very much appreciated.

Adding some screen shots per comments discussion:

Solution

This is the approach I would take to solve this:

def main(query):
    global pageToken, store_response

    while pageToken != "":
        # debugging, was hoping to see print output on each iteration (I didn't)
        print(pageToken)
        analytics = initialize_analyticsreporting()
        response = get_report(analytics, query)

        # note that this has changed -- you were using 'pageToken' as a key
        # which would overwrite each response
        store_response[pageToken] = response

        pageToken = response['reports'][0]['nextPageToken'] # update the pageToken
        query['reportRequests'][0]['pageToken'] = pageToken # update the query


    return(False) # don't actually need the function to return anything, just append to global store_response.

i.e. update the query data structure manually, and store each of the responses with the pageToken as the dictionary key.
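
To see why the key matters, here is a minimal sketch using made-up page tokens and placeholder responses (none of these values come from the API). A quoted 'pageToken' is one fixed key, so each assignment replaces the previous one, while the variable pageToken gives a new key per page. The last part shows that assigning into the nested query dict changes the same object that the next get_report call would receive.

```python
# Hypothetical tokens and responses, for illustration only.
tokens = ["go", "10000", "20000"]

overwritten = {}
keyed = {}
for pageToken in tokens:
    response = {"rows_from": pageToken}   # stand-in for an API response
    overwritten["pageToken"] = response   # literal string: always the same key
    keyed[pageToken] = response           # variable: one key per page

# Updating the nested dict in place changes the query object itself.
query = {"reportRequests": [{"pageToken": "go", "pageSize": 10000}]}
query["reportRequests"][0]["pageToken"] = "10000"

print(len(overwritten), len(keyed))
print(query["reportRequests"][0]["pageToken"])
```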

Presumably the last page has '' as the nextPageToken so your loop will stop.
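
If the last response turns out not to contain a nextPageToken key at all (an assumption worth checking against a real response), indexing it directly would raise a KeyError instead of ending the loop. Below is a sketch that handles both cases and avoids globals. It assumes a hypothetical fetch_page callable standing in for get_report(analytics, query); fake_fetch is made-up test data, not the real API.

```python
import copy


def fetch_all_pages(query, fetch_page):
    """Collect every page for `query`; `fetch_page` stands in for the API call."""
    query = copy.deepcopy(query)        # leave the caller's dict untouched
    request = query["reportRequests"][0]
    responses = []
    while True:
        response = fetch_page(query)
        responses.append(response)
        # .get() covers both a missing key and an empty string
        token = response["reports"][0].get("nextPageToken")
        if not token:
            break
        request["pageToken"] = token    # feed the result into the next request
    return responses


def fake_fetch(query):
    """Made-up API: serves 25 rows in pages and omits the token on the last page."""
    request = query["reportRequests"][0]
    start = int(request.get("pageToken", "0"))
    size = request["pageSize"]
    report = {"rows": list(range(start, min(start + size, 25)))}
    if start + size < 25:
        report["nextPageToken"] = str(start + size)
    return {"reports": [report]}


pages = fetch_all_pages({"reportRequests": [{"pageSize": 10}]}, fake_fetch)
print([len(p["reports"][0]["rows"]) for p in pages])
```

Returning a list keeps the pages in request order. Leaving pageToken out of the first request is a choice made for this sketch; the code above starts from 'go' instead.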
