How to avoid missing videos when retrieving video information with the YouTube Data API?

Problem description

What I want to achieve

I want to use the YouTube Data API v3 to get the IDs of a channel's videos without any omissions, and to find out whether the cause of the discrepancy lies in my code or in YouTube's video settings (the API side).

Problem

I use the following code to get video information from the YouTube Data API, but the number of IDs it returns does not match the number of videos actually posted on the channel.

from apiclient.discovery import build

# Assumption: the client is built with an API key; YOUR_API_KEY is a placeholder.
API_KEY = "YOUR_API_KEY"
youtube = build("youtube", "v3", developerKey=API_KEY)

channel_id = "UCD-miitqNY3nyukJ4Fnf4_A"  # sample ID

next_page_token = None
id_info = []

while True:
    search_video = youtube.search().list(
        part="id",
        channelId=channel_id,
        maxResults=50,
        order="date",
        safeSearch="none",
        pageToken=next_page_token
    ).execute()

    # Keep only video results (search can also return channels and playlists).
    for id_check in search_video.get("items", []):
        if id_check["id"]["kind"] == "youtube#video":
            id_info.append(id_check["id"]["videoId"])

    # Stop when the response carries no further page token.
    next_page_token = search_video.get("nextPageToken")
    if next_page_token is None:
        print(len(id_info))  # check number of IDs
        break

I also used the YouTube Data API to get the channel's videoCount, and noticed that its value does not match the number of IDs obtained by the code above, which is why I am posting this question.

According to the channels() API, this channel has 440 videos, but the code above gets only 412 (as of 10:30 a.m. JST).
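For reference, the channel-level count mentioned above can be read from the statistics part of a Channels.list response. This is a minimal sketch, assuming `youtube` is a client object built with `apiclient.discovery.build` as in the question; the helper name `get_channel_video_count` is hypothetical. The count may arrive as a JSON string, hence the `int()` conversion.

```python
def get_channel_video_count(youtube, channel_id):
    """Return statistics.videoCount for a channel, or None if no channel matches."""
    response = youtube.channels().list(
        part="statistics",
        id=channel_id,
        fields="items/statistics/videoCount",  # ask only for the field we use
    ).execute()
    items = response.get("items")
    if not items:
        return None
    # videoCount may be serialized as a string, so normalize it to an int.
    return int(items[0]["statistics"]["videoCount"])


# Hypothetical usage, to compare against len(id_info) from the search loop:
# print(get_channel_video_count(youtube, "UCD-miitqNY3nyukJ4Fnf4_A"))
```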

Supplementary information

・Python 3.9.0

・YouTube Data API v3

Recommended answer

You have to accept that the Search.list API endpoint does not have well-defined behavior, which means you should not expect exact, exhaustive results from it. Google does not document this behavior as such, but this forum has many posts from users who have run into it.

If you want to obtain the IDs of all videos uploaded by a given channel, use the following two-step procedure:

Step 1: Obtain the ID of the channel's uploads playlist.

Invoke the Channels.list API endpoint with its request parameter id set to the ID of the channel you are interested in (or, alternatively, with its request parameter mine set to true) to obtain that channel's uploads playlist ID, contentDetails.relatedPlaylists.uploads.

def get_channel_uploads_playlist_id(youtube, channel_id):
    # Request only the uploads playlist ID from the channel's contentDetails.
    response = youtube.channels().list(
        fields = 'items/contentDetails/relatedPlaylists/uploads',
        part = 'contentDetails',
        id = channel_id,
        maxResults = 1
    ).execute()

    items = response.get('items')
    if items:
        return items[0]['contentDetails']['relatedPlaylists'].get('uploads')
    # No channel matched the given ID.
    return None
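When the request is authorized as the channel owner, the same lookup can use mine=True instead of id, as mentioned above. A minimal sketch, assuming `youtube` was built with OAuth credentials (the mine parameter requires an authorized request, not just an API key); `get_my_uploads_playlist_id` is a hypothetical name.

```python
def get_my_uploads_playlist_id(youtube):
    """Return the uploads playlist ID of the authorized user's channel, or None."""
    response = youtube.channels().list(
        fields="items/contentDetails/relatedPlaylists/uploads",
        part="contentDetails",
        mine=True,  # the channel of the OAuth-authorized user
    ).execute()
    items = response.get("items")
    if not items:
        return None
    return items[0]["contentDetails"]["relatedPlaylists"].get("uploads")
```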

Note that get_channel_uploads_playlist_id should be called only once to obtain the uploads playlist ID of a given channel; afterwards, reuse that ID as many times as needed.
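In a longer-running script, one way to follow this advice is to memoize the lookup per channel. A minimal sketch, not from the original answer; `make_cached_lookup` is a hypothetical helper, and it also caches a None result for unknown channels.

```python
def make_cached_lookup(lookup):
    """Wrap lookup(youtube, channel_id) so each channel ID triggers at most one call."""
    cache = {}

    def cached_lookup(youtube, channel_id):
        if channel_id not in cache:
            # First request for this channel: perform the real API lookup.
            cache[channel_id] = lookup(youtube, channel_id)
        return cache[channel_id]

    return cached_lookup


# Hypothetical usage with the function defined above:
# uploads_id = make_cached_lookup(get_channel_uploads_playlist_id)
# playlist_id = uploads_id(youtube, "UCD-miitqNY3nyukJ4Fnf4_A")
```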

Step 2: Retrieve all video IDs of the playlist.

Invoke the PlaylistItems.list API endpoint with its request parameter playlistId set to the ID obtained from get_channel_uploads_playlist_id:

def get_playlist_video_ids(youtube, playlist_id):
    request = youtube.playlistItems().list(
        fields = 'nextPageToken,items/snippet/resourceId',
        playlistId = playlist_id,
        part = 'snippet',
        maxResults = 50
    )
    videos = []

    is_video = lambda item: \
        item['snippet']['resourceId']['kind'] == 'youtube#video'
    video_id = lambda item: \
        item['snippet']['resourceId']['videoId']

    while request:
        response = request.execute()

        items = response.get('items', [])
        assert len(items) <= 50

        videos.extend(map(video_id, filter(is_video, items)))

        request = youtube.playlistItems().list_next(
            request, response)

    return videos
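To address the second part of the question (whether the gap comes from the code or from the API), one can compare the IDs collected by the Search.list loop with those returned by get_playlist_video_ids. A minimal sketch; `compare_video_ids` is a hypothetical helper, and both inputs are assumed to be plain lists of video ID strings.

```python
def compare_video_ids(search_ids, playlist_ids):
    """Return (missing_from_search, extra_in_search) as sorted lists of video IDs."""
    search_set = set(search_ids)      # sets also drop any duplicate IDs
    playlist_set = set(playlist_ids)
    # Uploaded videos that the Search.list loop never returned.
    missing_from_search = sorted(playlist_set - search_set)
    # IDs returned by search but absent from the uploads playlist, if any.
    extra_in_search = sorted(search_set - playlist_set)
    return missing_from_search, extra_in_search


# Hypothetical usage:
# missing, extra = compare_video_ids(id_info, get_playlist_video_ids(youtube, playlist_id))
```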

Note that when using Google's APIs Client Library for Python (as you do), paginating API result sets is trivially simple: just use the list_next method of the Python API object corresponding to the paginated API endpoint, as shown above:

request = API_OBJECT.list(...)

while request:
    response = request.execute()
    ...
    request = API_OBJECT.list_next(
        request, response)
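For comparison, list_next essentially automates passing the previous response's nextPageToken as the pageToken of the next request. A hand-written equivalent for PlaylistItems.list might look like this sketch; `iter_playlist_items` is a hypothetical name, and `youtube` is assumed to be a client built with `apiclient.discovery.build`.

```python
def iter_playlist_items(youtube, playlist_id):
    """Yield every item of a playlist by following nextPageToken manually."""
    page_token = None
    while True:
        response = youtube.playlistItems().list(
            part="snippet",
            playlistId=playlist_id,
            maxResults=50,
            pageToken=page_token,  # None on the first request
        ).execute()
        yield from response.get("items", [])
        page_token = response.get("nextPageToken")
        if not page_token:
            # No further pages: the point where list_next would return None.
            break
```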

Also note that I used the fields request parameter in both calls above. This is good practice: request from the API only the information you actually use.

Another important note: when invoked with an API key, the PlaylistItems.list endpoint does not return items corresponding to a channel's private videos. This is the case when your youtube object was constructed by calling apiclient.discovery.build with the developerKey parameter.

PlaylistItems.list returns items corresponding to private videos only to the channel owner, that is, when the youtube object was constructed by calling apiclient.discovery.build with the credentials parameter, and those credentials belong to the channel that owns the playlist.
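The two ways of constructing the youtube object can be sketched as follows. This assumes the google-api-python-client and google-auth-oauthlib packages are installed; `googleapiclient.discovery.build` is the function the older `apiclient.discovery` import aliases. The helper names, the `client_secrets.json` path, and the choice of the `youtube.readonly` scope are assumptions. Imports sit inside the functions so the module loads even where those packages are absent.

```python
def build_with_api_key(api_key):
    """API-key client: PlaylistItems.list will not return private videos."""
    from googleapiclient.discovery import build
    return build("youtube", "v3", developerKey=api_key)


def build_as_channel_owner(client_secrets_file="client_secrets.json"):
    """OAuth client: private uploads are listed only if the signed-in account owns the channel."""
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    flow = InstalledAppFlow.from_client_secrets_file(
        client_secrets_file,
        scopes=["https://www.googleapis.com/auth/youtube.readonly"],  # assumed scope
    )
    # Opens a browser so the user can sign in and grant access.
    credentials = flow.run_local_server(port=0)
    return build("youtube", "v3", credentials=credentials)
```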

One more important note: according to Google staff, there is by design an upper limit of 20000 on the number of items the PlaylistItems.list endpoint returns when queried for a given channel's uploads playlist. This is unfortunate, but a fact.
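Because of this cap, it can be worth checking a retrieved list against the channel's statistics.videoCount before trusting it. A minimal sketch, assuming `video_count` came from Channels.list (part statistics) and `playlist_ids` from get_playlist_video_ids; `check_uploads_listing` and its messages are hypothetical.

```python
UPLOADS_ITEM_CAP = 20000  # limit reported by Google staff, per the note above


def check_uploads_listing(video_count, playlist_ids, cap=UPLOADS_ITEM_CAP):
    """Compare a channel's videoCount with the IDs retrieved from its uploads playlist."""
    retrieved = len(set(playlist_ids))
    if retrieved >= cap:
        # Reaching the cap means the listing may have been cut off by design.
        return f"{retrieved} IDs retrieved: reached the {cap}-item cap, list may be incomplete"
    if retrieved < video_count:
        # With an API key, private videos are omitted; one possible cause of a gap.
        return f"{retrieved} of {video_count} IDs retrieved: {video_count - retrieved} missing"
    if retrieved > video_count:
        return f"{retrieved} IDs retrieved: more than videoCount {video_count}"
    return f"{retrieved} IDs retrieved: matches videoCount {video_count}"
```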