YouTube Data API Page Token Question (Python)

Question

I am trying to download the video metadata for the year 2019. Every time I run my code, it exceeds the quota limit, even though I have fewer than 100 videos for that period. Can anyone show me a better way to write this code?

try:
    request = youtube.search().list(
        part = 'id, snippet',
        type = 'video',
        publishedAfter = '2018-12-31T23:59:59Z',
        publishedBefore = '2020-01-01T00:00:00Z',
        order = 'date',
        fields = 'nextPageToken,items(id,snippet)',
        pageToken = None,
        maxResults = 50
    )
    response = request.execute()
    nextPageToken = None

    while True:
        request = youtube.search().list(
            pageToken = nextPageToken,
            part = 'id, snippet',
            type = 'video',
            fields = 'nextPageToken,items(id,snippet)',
            maxResults = 50
        )

        response = request.execute()
        nextPageToken = response['nextPageToken']
        items = response['items']
        if response['nextPageToken'] == None:
            break
        for each_item in items:
            video_id = each_item['id']['videoId']
            sub_items = each_item['snippet']
            for sub_item in sub_items:
                video_item[sub_item] = sub_items[sub_item]

            video_data[video_id] = video_item
except Exception as e:
    print('Error in get_video_data: {0}'.format(e))

Thanks!

Recommended Answer

Please note that your call to the Search.list endpoint runs against the entire set of YouTube videos published during that one-year period. The call does not specify any other filtering criteria, so your query (once paginated) could potentially return millions of video entries.
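To see why an unfiltered search uses up the quota so quickly, here is a rough back-of-the-envelope sketch. It assumes the figures published in the YouTube Data API quota documentation at the time of writing (100 units per search.list call and a default of 10,000 units per project per day); both numbers are assumptions that may change, and the helper names are hypothetical.

```python
import math

# Assumed figures from the YouTube Data API quota documentation;
# verify them against the current docs before relying on them.
SEARCH_LIST_COST = 100      # quota units per search.list call (assumption)
DAILY_QUOTA = 10_000        # default daily quota per project (assumption)
MAX_RESULTS_PER_PAGE = 50   # largest maxResults accepted by search.list


def search_quota_cost(total_results, page_size=MAX_RESULTS_PER_PAGE):
    """Estimate the quota units needed to page through total_results items."""
    # Even an empty result set costs one call.
    pages = max(1, math.ceil(total_results / page_size))
    return pages * SEARCH_LIST_COST


def max_search_calls_per_day():
    """Number of search.list calls that fit into the assumed daily quota."""
    return DAILY_QUOTA // SEARCH_LIST_COST
```

Under these assumptions, paging through fewer than 100 videos of a single channel costs about 200 units, whereas a loop that keeps paging through an unfiltered result set reaches the daily limit after roughly 100 calls.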

If you are in fact looking for your own videos, then your Search.list call should include either the forMine or the channelId request parameter:

  • When you have constructed your youtube object with the discovery.build method using its credentials parameter (that is, you are issuing an authorized request), use the request parameter forMine as shown below:
request = youtube.search().list(
    forMine = True,
    part = 'id,snippet',
    type = 'video',
    publishedAfter = '2018-12-31T23:59:59Z',
    publishedBefore = '2020-01-01T00:00:00Z',
    order = 'date',
    fields = 'nextPageToken,items(id,snippet)',
    maxResults = 50
)

Note that, according to the findings recorded in the update at the end of this answer, this alternative proved not to be viable.

  • When you have constructed your youtube object with the discovery.build method using its developerKey parameter (that is, you are not issuing an authorized request), use the request parameter channelId as shown below:
request = youtube.search().list(
    channelId = CHANNEL_ID,
    part = 'id,snippet',
    type = 'video',
    publishedAfter = '2018-12-31T23:59:59Z',
    publishedBefore = '2020-01-01T00:00:00Z',
    order = 'date',
    fields = 'nextPageToken,items(id,snippet)',
    maxResults = 50
)

Note that CHANNEL_ID is the ID of your channel (or of any other channel, for that matter).

The difference between the two kinds of API calls above is the following. When issuing an authorized request (first bullet above), you get all videos of your channel, including the non-public ones (that is, those with privacyStatus set to private or unlisted). When using an API key (second bullet above), you get only the public videos (those with privacyStatus set to public), even if CHANNEL_ID is the ID of your own channel.

Unfortunately, your code above has another issue: your two Search.list calls are not identical apart from the pageToken request parameter, because the second call omits the request parameters publishedAfter and publishedBefore (as well as order).

This difference means that you are not correctly paginating the result set of your first API call, even though you pass the pageToken parameter to the second call.
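One way to keep every page request identical apart from pageToken is to build the parameters once and reuse them for each call. The sketch below is a minimal illustration rather than the answerer's code: fetch_all_pages is a hypothetical helper, youtube is assumed to be the service object built elsewhere with discovery.build, and nextPageToken is read with dict.get because the last page of a result set carries no such key.

```python
def fetch_all_pages(youtube, **search_params):
    """Collect the items from every page of a search().list() query.

    `youtube` is assumed to be a service object created with
    googleapiclient.discovery.build. `search_params` are passed unchanged
    to each call, so all calls differ only in pageToken.
    """
    items = []
    page_token = None
    while True:
        params = dict(search_params)
        if page_token is not None:
            # Only follow-up pages carry a pageToken.
            params['pageToken'] = page_token
        response = youtube.search().list(**params).execute()
        items.extend(response.get('items', []))
        # The final page has no nextPageToken key, so use .get().
        page_token = response.get('nextPageToken')
        if page_token is None:
            break
    return items
```

A call might look like fetch_all_pages(youtube, channelId=CHANNEL_ID, part='id,snippet', type='video', maxResults=50), with the date filters added as further keyword arguments.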

Fortunately, the Google API Client Library for Python that you are using implements result set pagination in a simple, Pythonic way (the example below covers the case of the second bullet above):

request = youtube.search().list(
    channelId = CHANNEL_ID,
    part = 'id,snippet',
    type = 'video',
    publishedAfter = '2018-12-31T23:59:59Z',
    publishedBefore = '2020-01-01T00:00:00Z',
    order = 'date',
    fields = 'nextPageToken,items(id,snippet)',
    maxResults = 50
)
video_data = {}

while request:
    response = request.execute()

    for item in response['items']:
        video_id = item['id']['videoId']
        video_item = item['snippet']
        video_data[video_id] = video_item

    request = youtube.search().list_next(
        request, response)

The code above shows that it is not necessary to repeat the first API call in its entirety with an added pageToken parameter; the following simpler statement suffices:

    request = youtube.search().list_next(
        request, response)

This statement uses the value of the nextPageToken property of the response object to construct, from the old request object, a new one with its pageToken property set properly. When the response has no nextPageToken, list_next returns None, which ends the while loop.
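The same list_next pattern can be wrapped in a generator so that the paging logic is written only once. This is a sketch under stated assumptions: iter_search_items and collect_snippets are hypothetical helper names, and youtube and request are assumed to be the service and request objects from the example above.

```python
def iter_search_items(youtube, request):
    """Yield every item of a paginated search().list() request.

    Relies on list_next() returning None once the response carries
    no nextPageToken, which ends the loop.
    """
    while request is not None:
        response = request.execute()
        yield from response.get('items', [])
        request = youtube.search().list_next(request, response)


def collect_snippets(youtube, request):
    """Map each video ID to its snippet, as the loop in the answer does."""
    return {
        item['id']['videoId']: item['snippet']
        for item in iter_search_items(youtube, request)
    }
```

With these helpers, the loop in the answer reduces to video_data = collect_snippets(youtube, request).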

Update and Fix

After further tests and investigation of the Search.list invocation with the request parameters forMine, publishedAfter and publishedBefore as shown above, I came to the following conclusion:

  • the parameter forMine=True, given without either of the parameters publishedAfter and publishedBefore, makes the API call work as expected;

  • the parameter forMine=True, given together with either or both of the parameters publishedAfter and publishedBefore, produces the HTTP error 400 Bad Request along with the following JSON error response:

{
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "errors": [
      {
        "message": "Request contains an invalid argument.",
        "domain": "global",
        "reason": "badRequest"
      }
    ],
    "status": "INVALID_ARGUMENT"
  }
}
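When handling this failure in code, it can help to pull the relevant fields out of the error body. The sketch below uses only the standard json module; summarize_api_error is a hypothetical helper and assumes the body has the shape shown above. With the Google client library, the raw body is typically available as the content attribute of the raised googleapiclient.errors.HttpError, but check that against the version you use.

```python
import json


def summarize_api_error(body):
    """Return (code, status, reasons) from a JSON error body like the one above."""
    if isinstance(body, bytes):
        body = body.decode('utf-8')
    error = json.loads(body).get('error', {})
    # Each entry of "errors" may carry its own "reason".
    reasons = [entry.get('reason') for entry in error.get('errors', [])]
    return error.get('code'), error.get('status'), reasons
```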

Google's own issue tracker records a very recent bug report that describes precisely the behavior above. The official response from Google's staff was the following:

Status: Won't Fix (Intended Behavior)

This is working as intended. Basically you can only set one of the resource filters if it's a for_content_owner request, but both channel ID and published after are resource filters. This requirement doesn't seem to be specified on the developer website: https://developers.google.com/youtube/v3/docs/search/list.
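Given that restriction, one possible workaround (not part of the original answer) is to call Search.list with forMine=True and no date parameters, and then filter the results on the client side by the publishedAt field of each snippet. The sketch below assumes that video_data maps video IDs to snippets as in the pagination example above and that publishedAt is a timestamp such as 2019-05-01T12:00:00Z; parse_timestamp and filter_by_year are hypothetical helper names.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a timestamp such as '2019-05-01T12:00:00Z' into an aware datetime."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace('Z', '+00:00'))


def filter_by_year(video_data, year):
    """Keep the entries of video_data whose snippet was published in `year` (UTC)."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return {
        video_id: snippet
        for video_id, snippet in video_data.items()
        if start <= parse_timestamp(snippet['publishedAt']) < end
    }
```

This trades extra page requests (all of the channel's videos are fetched) for compatibility with forMine, which is usually acceptable for a channel with a modest number of videos.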
