How to avoid omissions in video information acquisition when using the YouTube Data API?
Problem description
Premise / what I want to achieve
I want to use the YouTube Data API v3 to get video IDs without any omissions, and to find out whether the cause of the trouble lies in my code or in the video settings on the YouTube (API) side.
Problem
The following code gets video information from the YouTube Data API, but the number of IDs I got does not match the number of videos that are actually posted.
from apiclient.discovery import build

# Assumption: the question does not show how `youtube` was created;
# "YOUR_API_KEY" is a placeholder.
youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")

id = "UCD-miitqNY3nyukJ4Fnf4_A"  # sample ID
token_check = None
nextPageToken = None
id_info = []
while True:
    if token_check is not None:
        nextPageToken = token_check
    Search_Video = youtube.search().list(
        part = "id",
        channelId = id,
        maxResults = 50,
        order = 'date',
        safeSearch = "none",
        pageToken = nextPageToken
    ).execute()
    for ID_check in Search_Video.get("items", []):
        if ID_check["id"]["kind"] == "youtube#video":
            id_info.append(ID_check["id"]["videoId"])
    try:
        token_check = Search_Video["nextPageToken"]
    except KeyError:
        # No further page: report the number of IDs and stop.
        print(len(id_info))
        break
I also used the YouTube Data API to get the videoCount information of the channel, and noticed that the value of videoCount did not match the number of IDs obtained by the code above, which is why I posted this.
According to the channels() API, this channel has 440 videos, but the above code gets only 412 videos (as of 10:30 a.m. JST).
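The comparison described above could be sketched as follows. This is only an illustration, not the asker's actual code: the helper names `get_channel_video_count` and `count_gap` are hypothetical, and `youtube` is assumed to be a service object created with `build("youtube", "v3", ...)`.

```python
def get_channel_video_count(youtube, channel_id):
    """Return statistics.videoCount of a channel as an int, or None.

    Hypothetical helper; `youtube` is assumed to be a YouTube Data API v3
    service object.
    """
    response = youtube.channels().list(
        part='statistics',
        id=channel_id,
        fields='items/statistics/videoCount',
    ).execute()
    items = response.get('items')
    if not items:
        return None
    # The API delivers videoCount as a string, so convert it.
    return int(items[0]['statistics']['videoCount'])


def count_gap(video_count, collected_ids):
    """Return how many videos are unaccounted for.

    Duplicate IDs in `collected_ids` are counted only once.
    """
    return video_count - len(set(collected_ids))
```

With the numbers from the question, `count_gap(440, id_info)` would report a gap of 28, provided `id_info` holds 412 distinct IDs.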
Supplementary information
・Python 3.9.0
・YouTube Data API v3
Recommended answer
You have to acknowledge that the Search.list API endpoint does not behave precisely, so you should not expect exact results from it. Google does not document this behavior as such, but this forum has many posts from users who have run into it.
If you want to obtain the IDs of all videos uploaded by a given channel, you should employ the following two-step procedure:
Step 1: Obtain the ID of the channel's uploads playlist.
Invoke the Channels.list API endpoint with its request parameter id set to the ID of the channel you are interested in (or, alternatively, with its request parameter mine set to true) to obtain that channel's uploads playlist ID, contentDetails.relatedPlaylists.uploads.
def get_channel_uploads_playlist_id(youtube, channel_id):
    response = youtube.channels().list(
        fields = 'items/contentDetails/relatedPlaylists/uploads',
        part = 'contentDetails',
        id = channel_id,
        maxResults = 1
    ).execute()
    items = response.get('items')
    if items:
        return items[0] \
            ['contentDetails'] \
            ['relatedPlaylists'] \
            .get('uploads')
    else:
        return None
Do note that the function get_channel_uploads_playlist_id needs to be called only once to obtain the uploads playlist ID of a given channel; afterwards, use that ID as many times as needed.
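One way to make sure the lookup happens only once per channel is to wrap it in a small in-memory cache. The sketch below is an assumption of mine rather than part of the answer; `make_cached_lookup` is a hypothetical helper that works with any single-argument fetch function.

```python
def make_cached_lookup(fetch):
    """Wrap `fetch(channel_id)` so each channel ID is fetched at most once.

    Hypothetical helper: results (including None) are kept in a dict for
    the lifetime of the returned function.
    """
    cache = {}

    def lookup(channel_id):
        if channel_id not in cache:
            # First request for this channel: call the real function.
            cache[channel_id] = fetch(channel_id)
        return cache[channel_id]

    return lookup
```

It could be used as `lookup = make_cached_lookup(lambda cid: get_channel_uploads_playlist_id(youtube, cid))`, after which repeated calls to `lookup(channel_id)` cost only one API request per channel.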
Step 2: Retrieve all video IDs of the playlist.
Invoke the PlaylistItems.list API endpoint with its request parameter playlistId set to the ID obtained from get_channel_uploads_playlist_id:
def get_playlist_video_ids(youtube, playlist_id):
    request = youtube.playlistItems().list(
        fields = 'nextPageToken,items/snippet/resourceId',
        playlistId = playlist_id,
        part = 'snippet',
        maxResults = 50
    )
    videos = []
    is_video = lambda item: \
        item['snippet']['resourceId']['kind'] == 'youtube#video'
    video_id = lambda item: \
        item['snippet']['resourceId']['videoId']
    while request:
        response = request.execute()
        items = response.get('items', [])
        assert len(items) <= 50
        videos.extend(map(video_id, filter(is_video, items)))
        request = youtube.playlistItems().list_next(
            request, response)
    return videos
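To see exactly which videos the Search.list loop missed, the two ID lists can be compared directly. The following is a minimal sketch; `diff_video_ids` is a hypothetical helper, not something from the answer, and it assumes both inputs are plain iterables of video ID strings.

```python
def diff_video_ids(search_ids, playlist_ids):
    """Compare IDs from Search.list with IDs from the uploads playlist."""
    search_list = list(search_ids)
    search_set = set(search_list)
    playlist_set = set(playlist_ids)
    return {
        # Uploaded videos that the search loop never returned.
        'missing_from_search': sorted(playlist_set - search_set),
        # IDs the search returned that are not in the uploads playlist.
        'only_in_search': sorted(search_set - playlist_set),
        # How many times the search loop repeated an ID it already had.
        'duplicates_in_search': len(search_list) - len(search_set),
    }
```

Here `playlist_ids` would come from `get_playlist_video_ids(youtube, get_channel_uploads_playlist_id(youtube, channel_id))` and `search_ids` from the `id_info` list in the question.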
Do note that, when using Google's APIs Client Library for Python (as you do), paginating API result sets is trivially simple: just use the list_next method of the Python API object corresponding to the respective paginated API endpoint (as shown above):
request = API_OBJECT.list(...)
while request:
    response = request.execute()
    ...
    request = API_OBJECT.list_next(
        request, response)
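The same pattern can be wrapped in a reusable generator. This is a sketch under the assumption that `collection` is a paginated collection object such as `youtube.playlistItems()`, exposing the `list` and `list_next` methods used above; `iter_items` is a hypothetical name.

```python
def iter_items(collection, **list_kwargs):
    """Yield every item from all pages of a paginated list() call.

    `collection` is assumed to offer list(**kwargs) and
    list_next(request, response), as the client library's collection
    objects do.
    """
    request = collection.list(**list_kwargs)
    while request is not None:
        response = request.execute()
        yield from response.get('items', [])
        # list_next returns None once there is no further page.
        request = collection.list_next(request, response)
```

For example, `iter_items(youtube.playlistItems(), part='snippet', playlistId=playlist_id, maxResults=50)` would walk through every page of a playlist without any manual handling of page tokens.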
Also note that I used the fields request parameter twice above. This is good practice: ask the API only for the information that is actually used.
Another important note: the PlaylistItems.list endpoint does not return items that correspond to a channel's private videos when invoked with an API key. This is the case when your youtube object was constructed by calling the function apiclient.discovery.build with the parameter developerKey.
PlaylistItems.list returns items corresponding to private videos only to the channel owner. This is the case when the youtube object is constructed by calling the function apiclient.discovery.build with the parameter credentials, and the credentials refer to the channel that owns the respective playlist.
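The two ways of constructing the youtube object might look like the sketch below. It makes several assumptions that are not in the answer: the `google-api-python-client` and `google-auth-oauthlib` packages are installed, the OAuth flow is the desktop-app flow, the read-only scope is sufficient, and the function names, API key and client secrets file are placeholders.

```python
# Assumption: read-only access is enough for listing playlist items.
SCOPES = ['https://www.googleapis.com/auth/youtube.readonly']


def build_with_api_key(api_key):
    """Service object for public data only (private videos are not listed)."""
    from googleapiclient.discovery import build
    return build('youtube', 'v3', developerKey=api_key)


def build_as_channel_owner(client_secrets_file):
    """Service object authorized via OAuth 2.0 as the signed-in user.

    `client_secrets_file` is a placeholder for the OAuth client file
    downloaded from the Google Cloud console.
    """
    from googleapiclient.discovery import build
    from google_auth_oauthlib.flow import InstalledAppFlow
    flow = InstalledAppFlow.from_client_secrets_file(
        client_secrets_file, scopes=SCOPES)
    # Opens a browser window so the channel owner can grant access.
    credentials = flow.run_local_server(port=0)
    return build('youtube', 'v3', credentials=credentials)
```

Private videos would only show up with the second variant, and only if the account that grants access owns the channel in question.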
An additional important note: according to Google staff, there is an upper limit of 20000, set by design, on the number of items returned by the PlaylistItems.list endpoint when it is queried for a given channel's uploads playlist. This is unfortunate, but it is a fact.