YouTube Data API Page Token Question (Python)
Question
I'm trying to download the video metadata for the year 2019, but the quota limit is exceeded every time I run my code. I have fewer than 100 videos in that period. Can anyone show me a better way to write the code?
try:
    request = youtube.search().list(
        part = 'id, snippet',
        type = 'video',
        publishedAfter = '2018-12-31T23:59:59Z',
        publishedBefore = '2020-01-01T00:00:00Z',
        order = 'date',
        fields = 'nextPageToken,items(id,snippet)',
        pageToken = None,
        maxResults = 50
    )
    response = request.execute()
    nextPageToken = None
    while True:
        request = youtube.search().list(
            pageToken = nextPageToken,
            part = 'id, snippet',
            type = 'video',
            fields = 'nextPageToken,items(id,snippet)',
            maxResults = 50
        )
        response = request.execute()
        nextPageToken = response['nextPageToken']
        items = response['items']
        if response['nextPageToken'] == None:
            break
        for each_item in items:
            video_id = each_item['id']['videoId']
            sub_items = each_item['snippet']
            for sub_item in sub_items:
                video_item[sub_item] = sub_items[sub_item]
            video_data[video_id] = video_item
except Exception as e:
    print('Error in get_video_data: {0}'.format(e))
Thanks!
Recommended answer
Please note that your API call to the Search.list endpoint runs against the whole set of YouTube videos published in that one-year period; it doesn't specify any other filtering criteria, which means that your query (upon pagination) could potentially return millions of video entries.
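To see why an unfiltered search exhausts the quota so quickly, here is a back-of-the-envelope sketch. It assumes the YouTube Data API's default allocation of 10,000 quota units per day and a cost of 100 units per Search.list call; check the quota page of your own Google Cloud project, since your numbers may differ.

```python
# Back-of-the-envelope quota arithmetic for paginated Search.list calls.
# Assumed values: the default daily quota and the per-call cost of search.list.
DAILY_QUOTA_UNITS = 10_000
SEARCH_LIST_COST = 100
MAX_RESULTS_PER_PAGE = 50  # the maxResults value used in the question


def max_search_calls(daily_quota: int = DAILY_QUOTA_UNITS,
                     cost_per_call: int = SEARCH_LIST_COST) -> int:
    """Number of Search.list calls that fit into one day's quota."""
    return daily_quota // cost_per_call


def max_results_per_day(page_size: int = MAX_RESULTS_PER_PAGE) -> int:
    """Upper bound on search results retrievable per day at that page size."""
    return max_search_calls() * page_size


print(max_search_calls())     # 100
print(max_results_per_day())  # 5000
```

Under these assumptions a query matching millions of videos cannot be paged through in a day, whereas fewer than 100 videos of a single channel fit into two pages (about 200 units).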
If in fact you're looking for your own videos, then your Search.list endpoint call should include either the forMine or the channelId request parameter:
- When you've constructed your youtube object from the discovery.build method using its parameter credentials (that is, you're issuing an authorized request), use the request parameter forMine as shown below:
request = youtube.search().list(
    forMine = True,
    part = 'id,snippet',
    type = 'video',
    publishedAfter = '2018-12-31T23:59:59Z',
    publishedBefore = '2020-01-01T00:00:00Z',
    order = 'date',
    fields = 'nextPageToken,items(id,snippet)',
    maxResults = 50
)
Note that, according to the findings documented in the Update and fixes section below, this alternative turned out not to be workable.
- When you've constructed your youtube object from the discovery.build method using its parameter developerKey (that is, you're not issuing an authorized request), use the request parameter channelId as shown below:
request = youtube.search().list(
    channelId = CHANNEL_ID,
    part = 'id,snippet',
    type = 'video',
    publishedAfter = '2018-12-31T23:59:59Z',
    publishedBefore = '2020-01-01T00:00:00Z',
    order = 'date',
    fields = 'nextPageToken,items(id,snippet)',
    maxResults = 50
)
Note that CHANNEL_ID is the ID of your channel (or of any other channel, for that matter).
The difference between the two kinds of API calls above is the following: when issuing an authorized request (first bullet above), you'll get all videos of your channel, including the non-public ones (i.e. those that have their privacyStatus set to private or unlisted); on the other hand, when using an API key (second bullet above), you'll get only the public videos (i.e. those that have their privacyStatus set to public), even if CHANNEL_ID is the ID of your own channel.
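Because the same filters must accompany every page request, it can help to build them in one place. A minimal sketch, using a hypothetical helper search_params (not part of any library) that reproduces the parameter values of the channelId call above for a given channel and calendar year:

```python
def search_params(channel_id: str, year: int) -> dict:
    """Build the Search.list keyword arguments used above for one calendar year.

    Hypothetical helper: the date bounds mirror the answer's values
    (publishedAfter one second before the year starts, publishedBefore
    at the start of the following year).
    """
    return {
        'channelId': channel_id,
        'part': 'id,snippet',
        'type': 'video',
        'publishedAfter': f'{year - 1}-12-31T23:59:59Z',
        'publishedBefore': f'{year + 1}-01-01T00:00:00Z',
        'order': 'date',
        'fields': 'nextPageToken,items(id,snippet)',
        'maxResults': 50,
    }
```

With the real client this would be used as request = youtube.search().list(**search_params(CHANNEL_ID, 2019)).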
Now, unfortunately, your code above has another issue: your two Search.list endpoint calls are not identical modulo the pageToken request parameter, because the second call does not pass the request parameters publishedAfter and publishedBefore (nor order).
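This difference means that you're not correctly paginating the result set of your first API call, even though you do pass the parameter pageToken to the second API call. Moreover, nextPageToken is reset to None right after the first call, so the loop actually starts a new, unfiltered search instead of continuing the first one, and the items of the first response are never stored.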
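If you prefer to handle pageToken yourself, the fix is to pass exactly the same parameters on every call and change only pageToken, to process each page's items before deciding whether to stop, and to read nextPageToken with .get(), since the key is absent from the last page. The sketch below assumes search is the object returned by youtube.search(); the FakeSearch and _Request classes are purely hypothetical stand-ins so the example runs offline.

```python
class _Request:
    """Minimal request object whose execute() returns a canned response."""

    def __init__(self, response):
        self.response = response

    def execute(self):
        return self.response


class FakeSearch:
    """Hypothetical offline stand-in for youtube.search(); not a real API."""

    def __init__(self, pages):
        self.pages = pages  # canned responses; pageToken 'N' selects pages[N]
        self.calls = []     # keyword arguments of every list() call

    def list(self, **kwargs):
        self.calls.append(kwargs)
        return _Request(self.pages[int(kwargs.get('pageToken') or 0)])


def fetch_all(search, params):
    """Collect snippets keyed by video ID, paging with pageToken."""
    video_data = {}
    page_token = None
    while True:
        # Identical filters on every call; only pageToken changes.
        response = search.list(pageToken=page_token, **params).execute()
        # Process this page's items before deciding whether to stop.
        for item in response.get('items', []):
            video_data[item['id']['videoId']] = item['snippet']
        # The last page has no nextPageToken key, hence .get().
        page_token = response.get('nextPageToken')
        if not page_token:
            break
    return video_data


pages = [
    {'items': [{'id': {'videoId': 'a'}, 'snippet': {'title': 'A'}}],
     'nextPageToken': '1'},
    {'items': [{'id': {'videoId': 'b'}, 'snippet': {'title': 'B'}}]},
]
print(fetch_all(FakeSearch(pages), {'type': 'video'}))
# {'a': {'title': 'A'}, 'b': {'title': 'B'}}
```

With the real client you would call fetch_all(youtube.search(), params), where params holds the keyword arguments of your first call (without pageToken).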
Fortunately, the Google API Client Library for Python that you're using implements API result-set pagination in a simple, pythonic way (I'll illustrate it below for the case of the second bullet above):
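from googleapiclient.discovery import build

API_KEY = 'YOUR_API_KEY'        # placeholder: your API key
CHANNEL_ID = 'YOUR_CHANNEL_ID'  # placeholder: the channel to list

# Unauthorized (API key) client, as in the second bullet above.
youtube = build('youtube', 'v3', developerKey=API_KEY)

request = youtube.search().list(
    channelId = CHANNEL_ID,
    part = 'id,snippet',
    type = 'video',
    publishedAfter = '2018-12-31T23:59:59Z',
    publishedBefore = '2020-01-01T00:00:00Z',
    order = 'date',
    fields = 'nextPageToken,items(id,snippet)',
    maxResults = 50
)
video_data = {}
while request:
    response = request.execute()
    # Store each video's snippet keyed by its video ID.
    for item in response['items']:
        video_id = item['id']['videoId']
        video_item = item['snippet']
        video_data[video_id] = video_item
    # Build the request for the next page (None after the last page).
    request = youtube.search().list_next(
        request, response)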
The code above shows that it is not necessary to repeat the first API call in its entirety with an added pageToken parameter; the simpler statement suffices:
request = youtube.search().list_next(request, response)
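This statement uses the value of the nextPageToken property of the response object to construct, from the old request object, a new one with a properly set pageToken property. When response has no nextPageToken (i.e. on the last page), list_next returns None, which is what ends the while request: loop.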
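Conceptually, list_next behaves roughly like the hypothetical function below, written over plain parameter dicts: no next page means None, otherwise the same parameters with the new pageToken. This is only an illustration; the real client library works on the previous HttpRequest object (setting the pageToken query parameter on a copy of it) rather than on a dict.

```python
from typing import Optional


def list_next_sketch(previous_params: dict, previous_response: dict) -> Optional[dict]:
    """Illustrative stand-in for list_next, working on plain parameter dicts.

    Returns None when there is no further page; otherwise a copy of the
    previous parameters with pageToken set to the response's nextPageToken.
    """
    token = previous_response.get('nextPageToken')
    if not token:
        return None
    next_params = dict(previous_params)  # leave the previous request untouched
    next_params['pageToken'] = token
    return next_params


params = {'channelId': 'UC_example', 'maxResults': 50}
print(list_next_sketch(params, {'nextPageToken': 'TOKEN_2'}))
# {'channelId': 'UC_example', 'maxResults': 50, 'pageToken': 'TOKEN_2'}
print(list_next_sketch(params, {'items': []}))
# None
```

Copying the parameters rather than mutating them is what guarantees that consecutive requests differ only in pageToken.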
Update and fixes
Upon further tests and investigations of Search.list invocations with the request parameters forMine, publishedAfter and publishedBefore as above, I came to the following conclusions:
- the parameter forMine=True given without either of the parameters publishedAfter and publishedBefore makes the API call work as expected;
- the parameter forMine=True given along with either of the parameters publishedAfter and publishedBefore, or with both, produces the HTTP error 400 Bad Request along with the JSON error response:
{
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "errors": [
      {
        "message": "Request contains an invalid argument.",
        "domain": "global",
        "reason": "badRequest"
      }
    ],
    "status": "INVALID_ARGUMENT"
  }
}
Google's own issue tracker records a very recent bug report that describes precisely the behavior above. The official response from Google's staff was the following:
Status: Won't Fix (Intended Behavior)
This is working as intended. Basically you can only set one of the resource filters if it's a for_content_owner request, but both channel ID and published after are resource filters. This requirement doesn't seem to be specified on the developer website: https://developers.google.com/youtube/v3/docs/search/list.
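Since forMine cannot be combined with publishedAfter or publishedBefore, one possible workaround (a suggestion, not something confirmed by the answer above) is to page through forMine=True results without the date parameters and keep only the 2019 items client-side, using each result's snippet.publishedAt timestamp. A minimal sketch of that filtering step, assuming items shaped like Search.list results; published_in_year is a hypothetical helper:

```python
from datetime import datetime, timezone


def published_in_year(items: list, year: int) -> list:
    """Keep search results whose snippet.publishedAt falls in the given UTC year."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    kept = []
    for item in items:
        # publishedAt is an ISO 8601 timestamp such as '2019-05-01T12:00:00Z';
        # fromisoformat (Python 3.7+) needs '+00:00' instead of the 'Z' suffix.
        stamp = item['snippet']['publishedAt'].replace('Z', '+00:00')
        if start <= datetime.fromisoformat(stamp) < end:
            kept.append(item)
    return kept


items = [
    {'id': {'videoId': 'a'}, 'snippet': {'publishedAt': '2019-03-10T08:00:00Z'}},
    {'id': {'videoId': 'b'}, 'snippet': {'publishedAt': '2020-01-01T00:00:00Z'}},
    {'id': {'videoId': 'c'}, 'snippet': {'publishedAt': '2018-12-31T23:59:59Z'}},
]
print([i['id']['videoId'] for i in published_in_year(items, 2019)])  # ['a']
```

Every forMine page still costs a full Search.list call, so this is only economical when the channel's total number of uploads is modest.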