如何使用Selenium在youtube中获得所有评论? [英] How to get all comments in youtube with selenium?

查看:70
本文介绍了如何使用Selenium在youtube中获得所有评论?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

网页显示有702条评论.

我写了一个函数 get_total_youtube_comments(url),很多代码是从github上的项目复制过来的.

The webpage shows that there are 702 Comments.
target youtube sample

I write a function get_total_youtube_comments(url) ,many codes copied from the project on github.

在github上的项目

def get_total_youtube_comments(url):
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    import time
    options = webdriver.ChromeOptions()
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options,executable_path='/usr/bin/chromedriver')
    wait = WebDriverWait(driver,60)
    driver.get(url)
    SCROLL_PAUSE_TIME = 2
    CYCLES = 7
    html = driver.find_element_by_tag_name('html')
    html.send_keys(Keys.PAGE_DOWN)   
    html.send_keys(Keys.PAGE_DOWN)   
    time.sleep(SCROLL_PAUSE_TIME * 3)
    for i in range(CYCLES):
        html.send_keys(Keys.END)
        time.sleep(SCROLL_PAUSE_TIME)
    comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
    all_comments = [elem.text for elem in comment_elems]
    return  all_comments

尝试解析示例网页 https://www.youtube.com/watch?v=N0lxfilGfak 上的所有评论.

Try to parse all comments on a sample webpage https://www.youtube.com/watch?v=N0lxfilGfak.

url='https://www.youtube.com/watch?v=N0lxfilGfak'
list = get_total_youtube_comments(url)

它可以得到一些评论,只有所有评论的一小部分.

It can get some comments ,only small party of all comments.

len(list)
60

60 远少于 702 ,如何在YouTube中使用硒获取所有评论?
@supputuri,我可以用您的代码提取所有注释.

60 is much less than 702,how to get all comments in youtube with selenium?
@supputuri,i can extract all comments with your code.

comments_list = driver.find_elements_by_xpath("//*[@id='content-text']")
len(comments_list)
709
print(driver.find_element_by_xpath("//h2[@id='count']").text)
717 Comments
comments_list[-1].text
'mistake at 23:11 \nin NOT it should return false if x is true.'
comments_list[0].text
'Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. For Edureka Python Course curriculum, Visit our Website:  Use code "YOUTUBE20" to get Flat 20% off on this training.'

为什么评论号是709,而不是页面上显示的717?

Why the comments number is 709 instead of 717 shown in page?

推荐答案

您收到的评论数量有限,因为当您向下滚动时YouTube会加载这些评论.该视频上还有大约394条评论,您必须首先确保所有评论都已加载,然后再展开所有 View Replies ,这样您才能达到评论总数上限.

You are getting a limited number of comments as YouTube will load the comments as you keep scrolling down. There are around 394 comments left on that video you have to first make sure all the comments are loaded and then also expand all View Replies so that you will reach the max comments count.

注意:使用下面的代码行,我可以获得700条注释.

Note: I was able to get 700 comments using the below lines of code.

# get the last comment
lastEle = driver.find_element_by_xpath("(//*[@id='content-text'])[last()]")
# scroll to the last comment currently loaded
lastEle.location_once_scrolled_into_view
# wait until the comments loading is done
WebDriverWait(driver,30).until(EC.invisibility_of_element((By.CSS_SELECTOR,"div.active.style-scope.paper-spinner")))

# load all comments
while lastEle != driver.find_element_by_xpath("(//*[@id='content-text'])[last()]"):
    lastEle = driver.find_element_by_xpath("(//*[@id='content-text'])[last()]")
    driver.find_element_by_xpath("(//*[@id='content-text'])[last()]").location_once_scrolled_into_view
    time.sleep(2)
    WebDriverWait(driver,30).until(EC.invisibility_of_element((By.CSS_SELECTOR,"div.active.style-scope.paper-spinner")))

# open all replies
for reply in driver.find_elements_by_xpath("//*[@id='replies']//paper-button[@class='style-scope ytd-button-renderer'][contains(.,'View')]"):
    reply.location_once_scrolled_into_view
    driver.execute_script("arguments[0].click()",reply)
time.sleep(5)
WebDriverWait(driver, 30).until(
        EC.invisibility_of_element((By.CSS_SELECTOR, "div.active.style-scope.paper-spinner")))
# print the total number of comments
print(len(driver.find_elements_by_xpath("//*[@id='content-text']")))

这篇关于如何使用Selenium在youtube中获得所有评论?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆