Selenium Python web scrape fails after first iteration


Problem description

I'm iterating through TripAdvisor pages to save the original (untranslated) comments and their translations (from Portuguese to English). The scraper first selects Portuguese comments to be displayed, then translates them into English one by one and saves the translated comments in com_, while the expanded untranslated comments go into expanded_comments.

The code works fine on the first page, but from the second page onward it fails to save the translated comments. Strangely, it translates only the first comment on each of those pages and doesn't even save it.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
com_=[]
expanded_comments=[]
date_=[]
driver = webdriver.Chrome("C:\Users\shalini\Downloads\chromedriver_win32\chromedriver.exe")
driver.maximize_window()
from bs4 import BeautifulSoup

def expand_reviews(driver):
    # TRYING TO EXPAND REVIEWS (& CLOSE A POPUP)    
    try:
        driver.find_element_by_class_name("moreLink").click()
    except:
        print "err"
    try:
        driver.find_element_by_class_name("ui_close_x").click()
    except:
        print "err2"
    try:
        driver.find_element_by_class_name("moreLink").click()
    except:
        print "err3"




def save_comments(driver):
    expand_reviews(driver)
    # SELECTING ALL EXPANDED COMMENTS
    #xpanded_com_elements=driver.find_elements_by_class_name("entry")
    time.sleep(3)
    #or i in expanded_com_elements:
    #   expanded_comments.append(i.text)
    spi=driver.page_source
    sp=BeautifulSoup(spi)
    for t in sp.findAll("div",{"class":"entry"}):
        if not t.findAll("p",{"class":"partial_entry"}):
            #print t
            expanded_comments.append(t.getText())
    # Saving review date
    for d in sp.findAll("span",{"class":"recommend-titleInline"}) :
        date=d.text
        date_.append(date)


    # SELECTING ALL GOOGLE-TRANSLATOR links
    gt= driver.find_elements(By.CSS_SELECTOR,".googleTranslation>.link")

    # NOW PRINTING TRANSLATED COMMENTS
    for i in gt:
        try:
            driver.execute_script("arguments[0].click()",i)

            #com=driver.find_element_by_class_name("ui_overlay").text
            com= driver.find_element_by_xpath(".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")
            com_.append(com.text)
            time.sleep(5)
            driver.find_element_by_class_name("ui_close_x").click().perform()
            time.sleep(5)
        except Exception as e:
            pass

# ITERATING THROUGH ALL 200 tripadvisor webpages and saving comments & translated comments
for i in range(200):
    page=i*10
    url="https://www.tripadvisor.com/Airline_Review-d8729164-Reviews-Cheap-Flights-or"+str(page)+"-TAP-Portugal#REVIEWS"
    driver.get(url)
    wait = WebDriverWait(driver, 10)
    if i==0:
        # SELECTING PORTUGUESE COMMENTS ONLY # Run for one time then iterate over pages
        try:
            langselction = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "span.sprite-date_picker-triangle")))
            langselction.click()
            driver.find_element_by_xpath("//div[@class='languageList']//li[normalize-space(.)='Portuguese first']").click()
            time.sleep(5)
        except Exception as e:
            print e

    save_comments(driver)

Recommended answer

Your code has 3 problems:

  1. Inside save_comments(), in the line driver.find_element_by_class_name("ui_close_x").click().perform(): the click() method of a WebElement returns None, not an ActionChains object, so you cannot call perform() on it. That line should read:

driver.find_element_by_class_name("ui_close_x").click()
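The effect of the chained call can be reproduced without a browser. The sketch below is only an illustration: FakeElement and close_then_continue are hypothetical names, not part of selenium, and the fake mirrors just one property of the real WebElement, namely that click() returns None. It shows that the click itself still happens, the AttributeError is then swallowed by the broad except, and whatever follows in the try block is skipped.

```python
class FakeElement:
    """Hypothetical stand-in for a selenium WebElement (assumption for illustration)."""

    def __init__(self):
        self.clicks = 0

    def click(self):
        # Like WebElement.click(), this performs the click and returns None.
        self.clicks += 1
        return None


def close_then_continue(element, chained):
    """Return the steps that ran; chained=True reproduces .click().perform()."""
    steps = []
    try:
        if chained:
            # click() runs first, then None.perform() raises AttributeError.
            element.click().perform()
        else:
            element.click()
        # Stands in for the time.sleep(5) that follows in the question's code.
        steps.append("post-close wait")
    except Exception as exc:
        # The question uses "except Exception: pass", which hides this error.
        steps.append(type(exc).__name__)
    return steps
```

With chained=True the element is still clicked once, but the pause after closing never runs, so the next loop iteration may start while the overlay is still closing.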

  2. Inside save_comments(), the line com= driver.find_element_by_xpath(".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']") looks up the element before it has appeared. You have to add a wait before this line, like this:

wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.XPATH, ".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")))
com= driver.find_element_by_xpath(".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")
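Conceptually, an explicit wait is a polling loop: call a condition repeatedly until it returns something truthy or a deadline passes. The sketch below illustrates that idea in plain Python; wait_until is a hypothetical helper, not selenium's implementation. The real WebDriverWait behaves similarly, polling every 0.5 seconds by default, but it passes the driver to the condition and raises its own TimeoutException.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value, then return that value.

    Raises TimeoutError if the deadline passes first. Hypothetical helper
    for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

In the scraper, the condition would be "the overlay's entry element exists and is visible", which is what the WebDriverWait call above expresses.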

  3. There are two buttons that open the review: one is displayed and one is hidden. So you have to skip the hidden one:

if not i.is_displayed():
    continue
driver.execute_script("arguments[0].click()",i)
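Putting the three fixes together, the translation loop might look like the sketch below. It is written for Python 3 and assumes the Selenium 4 style find_element(by, value), since the find_element_by_* helpers used in the question were removed in later Selenium 4 releases. collect_translations and its wait_for_entry parameter are hypothetical names introduced here; the selectors are the ones from the question and may no longer match the live site.

```python
def collect_translations(driver, links, wait_for_entry):
    """Click each visible translation link and return the translated texts.

    driver         -- a selenium WebDriver (or any object with the same methods)
    links          -- elements found with ".googleTranslation>.link"
    wait_for_entry -- callable taking the driver and returning the overlay's
                      entry element once it is visible (hypothetical parameter)
    """
    translated = []
    for link in links:
        # Fix 3: each review has a hidden duplicate link; skip it.
        if not link.is_displayed():
            continue
        driver.execute_script("arguments[0].click()", link)
        # Fix 2: wait for the overlay instead of looking it up immediately.
        entry = wait_for_entry(driver)
        translated.append(entry.text)
        # Fix 1: plain click(), no .perform().
        # "class name" is the string value of selenium's By.CLASS_NAME.
        driver.find_element("class name", "ui_close_x").click()
    return translated


# With real selenium, wait_for_entry could be (assuming the usual imports of
# By, WebDriverWait and expected_conditions as EC):
#   lambda d: WebDriverWait(d, 10).until(
#       EC.visibility_of_element_located(
#           (By.XPATH, ".//span[@class = 'ui_overlay ui_modal ']//div[@class='entry']")))
```

Passing the wait in as a callable keeps the loop independent of how the wait is implemented, so it can be exercised with stub objects before running it against the real page.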
