Can't download video captions using YouTube API v3 in Python

Problem description

I am trying to download closed captions for this public YouTube video (just for testing): https://www.youtube.com/watch?v=Txvud7wPbv4

I am using the code sample (captions.py) below, which I got from this link: https://developers.google.com/youtube/v3/docs/captions/download

I have already stored client-secrets.json (OAuth2 authentication) and youtube-v3-api-captions.json in the same directory, as the sample code requires.

I run this command in cmd: python captions.py --videoid='Txvud7wPbv4' --action='download'

I get an error, and I don't know why it doesn't recognise the video ID of this public video.

Has anyone had a similar issue?

Thanks, everyone.

Code sample:

# Usage example:
# python captions.py --videoid='<video_id>' --name='<name>' --file='<file>' --language='<language>' --action='action'

import httplib2
import os
import sys

from apiclient.discovery import build_from_document
from apiclient.errors import HttpError
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client.tools import argparser, run_flow


# The CLIENT_SECRETS_FILE variable specifies the name of a file that contains
# the OAuth 2.0 information for this application, including its client_id and
# client_secret. You can acquire an OAuth 2.0 client ID and client secret from
# the {{ Google Cloud Console }} at
# {{ https://cloud.google.com/console }}.
# Please ensure that you have enabled the YouTube Data API for your project.
# For more information about using OAuth2 to access the YouTube Data API, see:
#   https://developers.google.com/youtube/v3/guides/authentication
# For more information about the client_secrets.json file format, see:
#   https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
CLIENT_SECRETS_FILE = "client_secrets.json"

# This OAuth 2.0 access scope allows for full read/write access to the
# authenticated user's account and requires requests to use an SSL connection.
YOUTUBE_READ_WRITE_SSL_SCOPE = "https://www.googleapis.com/auth/youtube.force-ssl"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

# This variable defines a message to display if the CLIENT_SECRETS_FILE is
# missing.
MISSING_CLIENT_SECRETS_MESSAGE = """
WARNING: Please configure OAuth 2.0

To make this sample run you will need to populate the client_secrets.json file
found at:
   %s
with information from the APIs Console
https://console.developers.google.com

For more information about the client_secrets.json file format, please visit:
https://developers.google.com/api-client-library/python/guide/aaa_client_secrets
""" % os.path.abspath(os.path.join(os.path.dirname(__file__),
                                   CLIENT_SECRETS_FILE))

# Authorize the request and store authorization credentials.
def get_authenticated_service(args):
  flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE, scope=YOUTUBE_READ_WRITE_SSL_SCOPE,
    message=MISSING_CLIENT_SECRETS_MESSAGE)

  storage = Storage("%s-oauth2.json" % sys.argv[0])
  credentials = storage.get()

  if credentials is None or credentials.invalid:
    credentials = run_flow(flow, storage, args)

  # Trusted testers can download this discovery document from the developers page
  # and it should be in the same directory with the code.
  with open("youtube-v3-api-captions.json", "r") as f:
    doc = f.read()
    return build_from_document(doc, http=credentials.authorize(httplib2.Http()))


# Call the API's captions.list method to list the existing caption tracks.
def list_captions(youtube, video_id):
  results = youtube.captions().list(
    part="snippet",
    videoId=video_id
  ).execute()

  for item in results["items"]:
    id = item["id"]
    name = item["snippet"]["name"]
    language = item["snippet"]["language"]
    print "Caption track '%s(%s)' in '%s' language." % (name, id, language)

  return results["items"]


# Call the API's captions.insert method to upload a caption track in draft status.
def upload_caption(youtube, video_id, language, name, file):
  insert_result = youtube.captions().insert(
    part="snippet",
    body=dict(
      snippet=dict(
        videoId=video_id,
        language=language,
        name=name,
        isDraft=True
      )
    ),
    media_body=file
  ).execute()

  id = insert_result["id"]
  name = insert_result["snippet"]["name"]
  language = insert_result["snippet"]["language"]
  status = insert_result["snippet"]["status"]
  print "Uploaded caption track '%s(%s)' in '%s' language, '%s' status." % (name,
      id, language, status)


# Call the API's captions.update method to update an existing caption track's draft status
# and publish it. If a new binary file is present, update the track with the file as well.
def update_caption(youtube, caption_id, file):
  update_result = youtube.captions().update(
    part="snippet",
    body=dict(
      id=caption_id,
      snippet=dict(
        isDraft=False
      )
    ),
    media_body=file
  ).execute()

  name = update_result["snippet"]["name"]
  isDraft = update_result["snippet"]["isDraft"]
  print "Updated caption track '%s' draft status to be: '%s'" % (name, isDraft)
  if file:
    print "and updated the track with the new uploaded file."


# Call the API's captions.download method to download an existing caption track.
def download_caption(youtube, caption_id, tfmt):
  subtitle = youtube.captions().download(
    id=caption_id,
    tfmt=tfmt
  ).execute()

  print "First line of caption track: %s" % (subtitle)

# Call the API's captions.delete method to delete an existing caption track.
def delete_caption(youtube, caption_id):
  youtube.captions().delete(
    id=caption_id
  ).execute()

  print "caption track '%s' deleted successfully" % (caption_id)


if __name__ == "__main__":
  # The "videoid" option specifies the YouTube video ID that uniquely
  # identifies the video for which the caption track will be uploaded.
  argparser.add_argument("--videoid",
    help="Required; ID for video for which the caption track will be uploaded.")
  # The "name" option specifies the name of the caption track to be used.
  argparser.add_argument("--name", help="Caption track name", default="YouTube for Developers")
  # The "file" option specifies the binary file to be uploaded as a caption track.
  argparser.add_argument("--file", help="Captions track file to upload")
  # The "language" option specifies the language of the caption track to be uploaded.
  argparser.add_argument("--language", help="Caption track language", default="en")
  # The "captionid" option specifies the ID of the caption track to be processed.
  argparser.add_argument("--captionid", help="Required; ID of the caption track to be processed")
  # The "action" option specifies the action to be processed.
  argparser.add_argument("--action", help="Action", default="all")


  args = argparser.parse_args()

  if (args.action in ('upload', 'list', 'all')):
    if not args.videoid:
          exit("Please specify videoid using the --videoid= parameter.")

  if (args.action in ('update', 'download', 'delete')):
    if not args.captionid:
          exit("Please specify captionid using the --captionid= parameter.")

  if (args.action in ('upload', 'all')):
    if not args.file:
      exit("Please specify a caption track file using the --file= parameter.")
    if not os.path.exists(args.file):
      exit("Please specify a valid file using the --file= parameter.")

  youtube = get_authenticated_service(args)
  try:
    if args.action == 'upload':
      upload_caption(youtube, args.videoid, args.language, args.name, args.file)
    elif args.action == 'list':
      list_captions(youtube, args.videoid)
    elif args.action == 'update':
      update_caption(youtube, args.captionid, args.file);
    elif args.action == 'download':
      download_caption(youtube, args.captionid, 'srt')
    elif args.action == 'delete':
      delete_caption(youtube, args.captionid);
    else:
      # All the available methods are used in sequence just for the sake of an example.
      upload_caption(youtube, args.videoid, args.language, args.name, args.file)
      captions = list_captions(youtube, args.videoid)

      if captions:
        first_caption_id = captions[0]['id'];
        update_caption(youtube, first_caption_id, None);
        download_caption(youtube, first_caption_id, 'srt')
        delete_caption(youtube, first_caption_id);
  except HttpError, e:
    print "An HTTP error %d occurred:\n%s" % (e.resp.status, e.content)
  else:
    print "Created and managed caption tracks."

Recommended answer

Your app seems overly complex... it's structured to do everything that can be done with captions, not just download them. That makes it harder to debug, so I wrote an abridged (Python 2 or 3) version that just downloads captions:

# Usage example: $ python captions-download.py Txvud7wPbv4

from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/youtube.force-ssl'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
YOUTUBE = discovery.build('youtube', 'v3', http=creds.authorize(Http()))

def process(vid):
    caption_info = YOUTUBE.captions().list(
            part='id', videoId=vid).execute().get('items', [])
    caption_str = YOUTUBE.captions().download(
            id=caption_info[0]['id'], tfmt='srt').execute()
    caption_data = caption_str.split('\n\n')
    for line in caption_data:
        if line.count('\n') > 1:
            i, cap_time, caption = line.split('\n', 2)
            print('%02d) [%s] %s' % (
                    int(i), cap_time, ' '.join(caption.split())))

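if __name__ == '__main__':
    import sys
    # Require exactly one argument (the video ID) so VID is never undefined.
    if len(sys.argv) != 2:
        sys.exit('Usage: python captions-download.py VIDEO_ID')
    process(sys.argv[1])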

The way it works is this:

  1. You pass in the video ID (VID) as the only argument (sys.argv[1])
  2. It uses that VID to look up the caption IDs with YOUTUBE.captions().list()
  3. Assuming the video has (at least) one caption track, I grab its ID (caption_info[0]['id'])
  4. Then it calls YOUTUBE.captions().download() with that caption ID, requesting the srt track format
  5. All individual captions are delimited by double NEWLINEs, so split on them
  6. Loop through each caption; there's data if there are at least 2 NEWLINEs in the entry, so split() only on the first two
  7. Display the caption number, the timeline of when it appears, then the caption itself, changing all remaining NEWLINEs to spaces
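
Steps 3 and 5 to 7 can be tried without any API access. The following is only a sketch under assumptions: first_caption_id and parse_srt are hypothetical helper names (not part of the script above or of the Google client library), the list response is assumed to be a dict with an 'items' list as the script already relies on, and SAMPLE_SRT is made-up input that reuses caption text from my own output. It also returns None instead of raising when a video has no caption track.

```python
def first_caption_id(list_response):
    """Return the ID of the first caption track, or None if there is none.

    Assumes a captions().list()-style dict with an optional 'items' list.
    """
    items = list_response.get('items', [])
    if not items:
        return None
    return items[0]['id']


def parse_srt(srt_text):
    """Split SRT text into (number, time_range, caption) tuples."""
    # Normalize Windows line endings so the blank-line split works.
    srt_text = srt_text.replace('\r\n', '\n')
    entries = []
    # Individual captions are separated by a blank line (two NEWLINEs).
    for block in srt_text.split('\n\n'):
        block = block.strip('\n')
        # A real entry has a number line, a time line and caption text,
        # so it contains at least 2 NEWLINEs.
        if block.count('\n') < 2:
            continue
        # Split only on the first two NEWLINEs; the rest is caption text.
        number, time_range, caption = block.split('\n', 2)
        # Collapse remaining NEWLINEs and extra whitespace into single spaces.
        entries.append((int(number), time_range, ' '.join(caption.split())))
    return entries


# Made-up sample input for illustration only.
SAMPLE_SRT = (
    "1\n00:00:06,390 --> 00:00:09,280\niterator cool\nbut that's cool\n\n"
    "2\n00:00:09,280 --> 00:00:12,280\nyour the moment\n"
)

if __name__ == '__main__':
    for number, time_range, caption in parse_srt(SAMPLE_SRT):
        print('%02d) [%s] %s' % (number, time_range, caption))
```

Keeping the parsing separate from the API calls makes it possible to test the formatting logic on a saved .srt file before worrying about OAuth.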

When I run it, I get the expected result... here on a video I own:

$ python captions-download.py MY_VIDEO_ID
01) [00:00:06,390 --> 00:00:09,280] iterator cool but that's cool
02) [00:00:09,280 --> 00:00:12,280] your the moment
03) [00:00:13,380 --> 00:00:16,380] and sellers very thrilled
    :

A combination of things...

  1. I think you need to be the owner of the video you're trying to download the captions for.
    • I tried my script on your video, and I got a 403 HTTP Forbidden error
    • There are other errors you may get from the API as well
  • It thinks you're giving it <code> and </code> (notice the hex 0x3c and 0x3e values)... rich text?
  • Anyway, this is why I wrote my own, shorter version... so I have a more controlled environment to experiment in.
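
Stray markup like that can be caught before any request is sent. Below is a minimal sketch, assuming video IDs are 11 characters drawn from letters, digits, '-' and '_'. That is the shape commonly seen in YouTube URLs, not something I can point to as a documented guarantee, and check_video_id is a hypothetical helper name, not part of the sample or the client library.

```python
import re

# Assumption: IDs look like 'Txvud7wPbv4' (11 chars from A-Z, a-z, 0-9, '-', '_').
VIDEO_ID_RE = re.compile(r'^[A-Za-z0-9_-]{11}\Z')
ALLOWED_CHAR_RE = re.compile(r'[A-Za-z0-9_-]')


def check_video_id(raw):
    """Return (ok, offending), where offending lists unexpected chars as hex."""
    offending = ['0x%02x' % ord(ch) for ch in raw
                 if not ALLOWED_CHAR_RE.match(ch)]
    return bool(VIDEO_ID_RE.match(raw)), offending


if __name__ == '__main__':
    for candidate in ('Txvud7wPbv4', '<code>Txvud7wPbv4</code>'):
        ok, offending = check_video_id(candidate)
        print('%r ok=%s offending=%s' % (candidate, ok, offending))
```

If the check fails, printing the offending characters as hex makes it easy to see whether quotes or markup were passed along with the ID.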

FWIW, since you're new to using Google APIs, I've made a couple of intro videos to get developers on-boarded with using Google APIs in this playlist. The auth code is the toughest part, so focus on videos 3 and 4 in that playlist to help get you acclimated.

I don't really have any videos that cover YouTube APIs (as I focus more on G Suite APIs), although I do have one Google Apps Script example (video 22 in the playlist); if you're new to Apps Script, review your JavaScript, then check out video 5 first. Hope this helps!
