selenium python webscrape fails after first iteration
Problem description
I'm iterating through TripAdvisor to save the original (untranslated) comments and the translated comments (Portuguese to English). The scraper first selects Portuguese comments to be displayed, then converts them into English one by one and saves the translated comments in com_, while the expanded untranslated comments are saved in expanded_comments.
The code works fine on the first page, but from the second page onward it fails to save the translated comments. Strangely, it translates only the first comment on each of those pages and doesn't even save it.
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
com_=[]
expanded_comments=[]
date_=[]
driver = webdriver.Chrome("C:\Users\shalini\Downloads\chromedriver_win32\chromedriver.exe")
driver.maximize_window()
from bs4 import BeautifulSoup
def expand_reviews(driver):
    # TRYING TO EXPAND REVIEWS (& CLOSE A POPUP)
    try:
        driver.find_element_by_class_name("moreLink").click()
    except:
        print "err"
    try:
        driver.find_element_by_class_name("ui_close_x").click()
    except:
        print "err2"
    try:
        driver.find_element_by_class_name("moreLink").click()
    except:
        print "err3"
def save_comments(driver):
    expand_reviews(driver)
    # SELECTING ALL EXPANDED COMMENTS
    #xpanded_com_elements=driver.find_elements_by_class_name("entry")
    time.sleep(3)
    #or i in expanded_com_elements:
    #    expanded_comments.append(i.text)
    spi=driver.page_source
    sp=BeautifulSoup(spi)
    for t in sp.findAll("div",{"class":"entry"}):
        if not t.findAll("p",{"class":"partial_entry"}):
            #print t
            expanded_comments.append(t.getText())
    # Saving review date
    for d in sp.findAll("span",{"class":"recommend-titleInline"}) :
        date=d.text
        date_.append(date_)
    # SELECTING ALL GOOGLE-TRANSLATOR links
    gt= driver.find_elements(By.CSS_SELECTOR,".googleTranslation>.link")
    # NOW PRINTING TRANSLATED COMMENTS
    for i in gt:
        try:
            driver.execute_script("arguments[0].click()",i)
            #com=driver.find_element_by_class_name("ui_overlay").text
            com= driver.find_element_by_xpath(".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")
            com_.append(com.text)
            time.sleep(5)
            driver.find_element_by_class_name("ui_close_x").click().perform()
            time.sleep(5)
        except Exception as e:
            pass
# ITERATING THROUGH ALL 200 tripadvisor webpages and saving comments & translated comments
for i in range(200):
    page=i*10
    url="https://www.tripadvisor.com/Airline_Review-d8729164-Reviews-Cheap-Flights-or"+str(page)+"-TAP-Portugal#REVIEWS"
    driver.get(url)
    wait = WebDriverWait(driver, 10)
    if i==0:
        # SELECTING PORTUGUESE COMMENTS ONLY # Run for one time then iterate over pages
        try:
            langselction = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.sprite-date_picker-triangle")))
            langselction.click()
            driver.find_element_by_xpath("//div[@class='languageList']//li[normalize-space(.)='Portuguese first']").click()
            time.sleep(5)
        except Exception as e:
            print e
    save_comments(driver)
Recommended answer
Your code has 3 problems:
- Inside the method save_comments(), at the line driver.find_element_by_class_name("ui_close_x").click().perform(): the click() method of a WebElement is not an ActionChain, so you cannot call perform() on its result. That line should be:
driver.find_element_by_class_name("ui_close_x").click()
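To see why the chained call fails, here is a minimal sketch that needs no browser. FakeElement is a hypothetical stand-in, not Selenium's class; it only mimics the fact that a WebElement's click() returns None. It is written in Python 3, unlike the Python 2 code in the question.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (not the real class)."""

    def __init__(self):
        self.clicked = False

    def click(self):
        # Like WebElement.click(), this does the click and returns None.
        self.clicked = True


close_button = FakeElement()

try:
    # Same shape as the failing line: click() returns None,
    # and None has no perform() method.
    close_button.click().perform()
    error = None
except AttributeError as exc:
    error = str(exc)

print(close_button.clicked)  # the click itself did happen
print(error)                 # the AttributeError raised by .perform()
```

In the question's loop this error is swallowed by the `except Exception as e: pass` block, which is probably why nothing was reported; printing the exception there would have exposed it.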
- Inside the method save_comments(), at the line com= driver.find_element_by_xpath(".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']"): you look up the element before it has appeared. You have to add a wait before this line, so the code should look like this:
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.XPATH, ".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")))
com= driver.find_element_by_xpath(".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")
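As a rough illustration of what an explicit wait does, the sketch below polls a condition until it succeeds or a timeout expires. wait_until and find_overlay are hypothetical names invented for this example; this is a simplified stand-in for the idea behind WebDriverWait.until, not Selenium's actual implementation (which, among other things, also ignores "element not found" errors while polling).

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Hypothetical overlay lookup that only succeeds on the third attempt,
# imitating a modal that takes a moment to render after the click.
attempts = {"count": 0}


def find_overlay():
    attempts["count"] += 1
    return "translated text" if attempts["count"] >= 3 else None


text = wait_until(find_overlay, timeout=2.0, poll=0.01)
print(text, attempts["count"])
```

A single immediate lookup corresponds to calling find_overlay() once, which returns nothing here; that mirrors the question's code reading the overlay right after the click.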
- There are 2 buttons that can open each review: one is displayed and one is hidden. You have to skip the hidden button:
if not i.is_displayed():
    continue
driver.execute_script("arguments[0].click()",i)
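The same filter can be applied once, before the loop, instead of with `continue` inside it. The sketch below assumes only that each element has an is_displayed() method, as Selenium WebElements do. visible_only and FakeLink are hypothetical names used for illustration, and the sample data is made up.

```python
def visible_only(elements):
    """Return only the elements whose is_displayed() is true."""
    return [el for el in elements if el.is_displayed()]


class FakeLink:
    """Hypothetical stand-in for a WebElement; only is_displayed() is modelled."""

    def __init__(self, name, displayed):
        self.name = name
        self._displayed = displayed

    def is_displayed(self):
        return self._displayed


# One hidden duplicate sits next to a visible translation link.
links = [
    FakeLink("review-1", True),
    FakeLink("review-1-hidden", False),
    FakeLink("review-2", True),
]

clickable = visible_only(links)
names = [link.name for link in clickable]
print(names)
```

With a real driver, the list passed in would be the `gt` result of find_elements from the question, and the loop would iterate over the filtered list.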