How to scrape more than one page of critic reviews from Rotten Tomatoes?

Problem description

I've been scraping critic reviews from https://www.rottentomatoes.com/m/avengers_endgame/reviews with the scraper below. However, it currently scrapes only the first page of critic reviews, and I'm struggling to work out how to go through the additional pages. Does anyone know how I would go about this?

from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd

driver = webdriver.Chrome()
driver.get("https://www.rottentomatoes.com/m/avengers_endgame/reviews")

dates = []
reviews = []
scores = []
newscores = []
names = []
sites = []

# Each critic review on the page is rendered inside a "review_area" element.
results = driver.find_elements(By.CLASS_NAME, "review_area")

# Walk the reviews once; reviewnum is the 1-based row index used in the XPath.
for reviewnum, r in enumerate(results, start=1):
    dates.append(r.find_element(By.CLASS_NAME, 'subtle').text)
    reviews.append(r.find_element(By.CLASS_NAME, 'the_review').text)
    revs = r.find_element(By.CLASS_NAME, 'review_desc')
    scores.append(revs.find_element(By.CLASS_NAME, 'subtle').text)

    # Reviewer name and publication are looked up by absolute XPath for this row.
    row = '//*[@id="reviews"]/div[2]/div[4]/div[' + str(reviewnum) + ']/div[1]/div[3]'
    names.append(driver.find_element(By.XPATH, row + '/a[1]').text)
    sites.append(driver.find_element(By.XPATH, row + '/a[2]/em').text)

for score in scores:
    if score == 'Full Review':
        newscores.append('no score')
    else:
        # Drop the first 14 characters, the length of the "Full Review | " prefix.
        newscores.append(score[14:])

review_1df = pd.DataFrame({
    'Date': dates,
    'Reviewer': names,
    'Website': sites,
    'Review': reviews,
    'Score': newscores,
})

Recommended answer

You can use URL parameters to get to the next page of reviews and then repeat the same steps for each page. For example, the following URL takes you to the second page of reviews:

https://www.rottentomatoes.com/m/avengers_endgame/reviews?type=&sort=&page=2

Note that the parameters are type=&sort=&page=2, where type and sort can also be set to specify the review type and the sorting. Change page=2 to page=3 to get to the third page.
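
As a minimal sketch of building these URLs in code, the helper below assembles the query string with the standard library. The function name review_page_url and its keyword arguments are hypothetical, chosen only for illustration; the parameter names type, sort, and page come from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.rottentomatoes.com/m/avengers_endgame/reviews"


def review_page_url(page, base_url=BASE_URL, review_type="", sort=""):
    """Return the reviews URL for a 1-based page number.

    Hypothetical helper: review_type and sort are left empty by default,
    matching the type=&sort= part of the example URL.
    """
    query = urlencode({"type": review_type, "sort": sort, "page": page})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Print the URLs for the first three pages.
    for page in range(1, 4):
        print(review_page_url(page))
```

Each URL produced this way can be passed to driver.get() before running the same extraction steps as on the first page.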

You'll also need to add a check for whether the requested page actually exists. For example, this URL returns no reviews:

https://www.rottentomatoes.com/m/avengers_endgame/reviews?type=&sort=&page=200000
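
One way to handle this, sketched under assumptions, is to keep requesting pages until one comes back with no reviews. In the code below, scrape_all_pages, make_selenium_scraper, and the max_pages safety cap are hypothetical names introduced for illustration; the class names review_area, subtle, and the_review are taken from the question's code and may not match the site's current markup.

```python
def scrape_all_pages(scrape_page, max_pages=500):
    """Collect rows from page 1 onward until a page yields no reviews.

    scrape_page is a hypothetical callable: it takes a 1-based page number
    and returns a list of review rows (an empty list if the page has none).
    max_pages is an assumed safety cap so the loop always terminates.
    """
    all_rows = []
    for page in range(1, max_pages + 1):
        rows = scrape_page(page)
        if not rows:  # a page past the last one has no reviews, so stop
            break
        all_rows.extend(rows)
    return all_rows


def make_selenium_scraper(driver, base_url):
    """Build a scrape_page callable backed by a Selenium 4 driver."""
    # Imported lazily so scrape_all_pages can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    def scrape_page(page):
        driver.get(f"{base_url}?type=&sort=&page={page}")
        rows = []
        # Class names below come from the question's code (an assumption here).
        for r in driver.find_elements(By.CLASS_NAME, "review_area"):
            rows.append({
                "Date": r.find_element(By.CLASS_NAME, "subtle").text,
                "Review": r.find_element(By.CLASS_NAME, "the_review").text,
            })
        return rows

    return scrape_page
```

With a running driver, this could be used as scrape_all_pages(make_selenium_scraper(driver, "https://www.rottentomatoes.com/m/avengers_endgame/reviews")), and the resulting list of dictionaries passed to pd.DataFrame. Passing the page fetcher in as a callable keeps the stopping logic separate from the browser code, so it can be checked with a fake fetcher.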
