Obtaining the number of comments for a list of YouTube videos

Question

I have been writing a simple Python script to obtain the number of views and the number of comments for a list of videos. Using csv, I converted a tab-separated table into a list of lists, and then tried to obtain both values. For the number of views, the element is "div", {"class":"watch-view-count"}, and this works as intended:

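import re
import requests
from bs4 import BeautifulSoup

r = requests.get(list_youtube_reading[n][0])  # each video URL comes from the csv
soup = BeautifulSoup(r.text, "html.parser")
for element in soup.findAll("div", {"class": "watch-view-count"}):
    # Keep only the text before the first space, i.e. the number itself.
    patternviews = re.compile(r'^(.*?) .*')
    scissorviews = patternviews.match(element.text)
    # Strip the dots used as thousands separators.
    views = re.sub(r'\.', '', scissorviews.group(1))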
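The CSV step described above is not shown in the question, so here is a minimal sketch of how it might look. The helper names `read_video_urls` and `parse_view_count` are hypothetical, the sample row is made up for illustration, and stripping both "." and "," as thousands separators is an assumption (the original snippet strips only dots).

```python
import csv
import io
import re


def read_video_urls(tsv_file):
    """Return the first column of every non-empty row in a tab-separated file."""
    reader = csv.reader(tsv_file, delimiter="\t")
    return [row[0] for row in reader if row and row[0].strip()]


def parse_view_count(text):
    """Extract the leading number from text such as '3.555 views'.

    Returns None when the text does not start with a number.
    """
    match = re.match(r"^(\S+)", text.strip())
    if match is None:
        return None
    # Assumption: '.' and ',' only ever appear as thousands separators here.
    digits = re.sub(r"[.,]", "", match.group(1))
    return int(digits) if digits.isdigit() else None


if __name__ == "__main__":
    # Hypothetical sample data: one URL column followed by an arbitrary second column.
    sample = io.StringIO("https://www.youtube.com/watch?v=NP189MPfR7Q\tsome title\n")
    print(read_video_urls(sample))
    print(parse_view_count("3.555 views"))
```

In a real run, `read_video_urls` would be given an open file object instead of the `io.StringIO` sample, and each returned URL would be passed to `requests.get` as in the snippet above.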

The element for the number of comments, however, is
<h2 class="comment-section-header-renderer" tabindex="0"> <b>Comments</b> " • 6" <span class="alternate-content-link"></span>
</h2>

When I try to retrieve it,

for element in soup.findAll("h2", {"class": "comment-section-header-renderer"}):
    comments = element.text
    print(comments)

nothing happens; in fact, soup does not contain any <h2 class="comment-section-header-renderer" tabindex="0"> tag.

What can I do to retrieve the number of comments? I tried using the YouTube Data API v3, but to no avail.

Thanks in advance.

Recommended answer

One simple way is to use the Selenium WebDriver to simulate a web browser. I have observed that YouTube loads the comments section only when we scroll down. So my solution is to make the web driver scroll down and wait until the desired element is found. Once it has been located, the following script grabs it and gets the value.

To use Selenium, we need to download one of the third-party drivers from this page. I used Mozilla GeckoDriver. The directory containing the driver executable also needs to be on the system PATH environment variable. On my Ubuntu machine, I put the downloaded file (after extracting it) in /usr/local/bin/ and did not need anything more. We also need to install Selenium itself; the instructions are here. Once the path is set up properly, we can run the following script to get the desired values.

# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

video_url = 'https://www.youtube.com/watch?v=NP189MPfR7Q'
driver = webdriver.Firefox()
driver.set_page_load_timeout(30)
driver.get(video_url)
# Scroll to the bottom so that YouTube loads the comments section.
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
for view_num in driver.find_elements(By.CLASS_NAME, "watch-view-count"):
    print(u'Number of views: ' + view_num.text.replace(u' views', u''))

try:
    # Wait up to 30 seconds for the comments header to be present in the DOM.
    element = WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, "comment-section-header-renderer")))
    for comment_num in driver.find_elements(By.CLASS_NAME, "comment-section-header-renderer"):
        print(u'Number of comments: ' + comment_num.text.replace(u'COMMENTS • ', u''))
finally:
    # Always close the browser, even if the wait times out.
    driver.quit()

Output:

Number of views: 3,555
Number of comments: 3

NOTE: Since the DOM element that contains the comment count has a non-ASCII character inside, I needed to put the encoding declaration (# -*- coding: utf-8 -*-) on the very first line of the script.

If you would rather not have Selenium show the browser GUI, follow these instructions. I did not do this myself, but the instructions should be enough.
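The linked instructions are not reproduced here, so the following is only a sketch of one way to run the same scrape without a visible window, using Firefox's `-headless` command-line flag through Selenium's Firefox `Options`. The function names `parse_comment_count`, `make_headless_firefox`, and `fetch_comment_count` are hypothetical, and the class name `comment-section-header-renderer` is taken from the question as-is; it may not match YouTube's current markup.

```python
import re


def parse_comment_count(header_text):
    """Return the trailing number in text such as 'COMMENTS • 3', or None."""
    match = re.search(r"(\d[\d.,]*)\s*$", header_text.strip())
    if match is None:
        return None
    # Assumption: '.' and ',' only ever appear as thousands separators here.
    return int(re.sub(r"[.,]", "", match.group(1)))


def make_headless_firefox():
    """Create a Firefox driver that runs without opening a window."""
    # Imported lazily so the parsing helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # Firefox flag for headless mode
    return webdriver.Firefox(options=options)


def fetch_comment_count(video_url, timeout=30):
    """Load the page headlessly, scroll down, and read the comments header."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait

    driver = make_headless_firefox()
    try:
        driver.get(video_url)
        # Scroll to the bottom so that the comments section gets loaded.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        header = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CLASS_NAME, "comment-section-header-renderer")))
        return parse_comment_count(header.text)
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling `fetch_comment_count('https://www.youtube.com/watch?v=NP189MPfR7Q')` would then return the count as an integer, provided GeckoDriver is on the PATH and the page still exposes that class name. Parsing the trailing number with a regular expression avoids depending on the exact "COMMENTS • " prefix and its capitalization.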
