Python click 'More' button is not working


Problem description

I tried to click the "More" button for each review so that I can expand these text reviews to their full contents and then scrape them. Without clicking the "More" button, what I end up retrieving is something like

"This room was nice and clean. The location...More".

I tried a few different approaches to figure it out, such as Selenium's button click and ActionChains, but I guess I'm not using them properly. Could someone help me out with this issue?

Below is my current code. I didn't upload the whole code to avoid some unnecessary output (tried to keep it simple).

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
import time

#Incognito Mode
option=webdriver.ChromeOptions()
option.add_argument("--incognito")

#Open Chrome
driver=webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",chrome_options=option)

#url I want to visit.
lists=['https://www.tripadvisor.com/VacationRentalReview-g30196-d6386734-Hot_51st_St_Walk_to_Mueller_2BDR_Modern_sleeps_7-Austin_Texas.html']

for k in lists:

    driver.get(k)
    html =driver.page_source
    soup=BeautifulSoup(html,"html.parser")
    time.sleep(3)
    listing=soup.find_all("div", class_="review-container")

    for i in range(len(listing)):

        try:
            #First, I tried this but didn't work.
            #link = driver.find_element_by_link_text('More')
            #driver.execute_script("arguments[0].click();", link)

            #Second, I tried ActionChains but it didn't work either.
            ActionChains(driver).move_to_element(i).click().perform()
        except:
            pass

        text_review=soup.find_all("div", class_="prw_rup prw_reviews_text_summary_hsx")
        text_review_inside=text_review[i].find("p", class_="partial_entry")
        review_text=text_review_inside.text

        print (review_text)


Solution

The biggest mistake in all this code is except: pass. Without it you would have resolved the problem long ago. The code raises an error with all the information, but you can't see it. You could at least use

except Exception as ex:
    print(ex)

The problem is that move_to_element() will not work with BeautifulSoup elements. It has to be a Selenium element, like

link = driver.find_element_by_link_text('More')

ActionChains(driver).move_to_element(link).perform()

But after executing some functions Selenium needs some time to finish them, so Python has to wait a while.
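As an alternative to fixed time.sleep() calls, Selenium also offers explicit waits that poll until a condition holds. A minimal sketch, assuming the 'More' link can be located by its link text and that driver is the WebDriver used below:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Poll for up to 10 seconds until the 'More' link is present and clickable
wait = WebDriverWait(driver, 10)
more_link = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, 'More')))
more_link.click()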

I don't use BeautifulSoup to get the data, but if you want to use it, then get driver.page_source after clicking all the links. Otherwise you will have to fetch driver.page_source again after every click.
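A minimal sketch of that approach, assuming you still want BeautifulSoup and the same partial_entry class used elsewhere in this answer:

from bs4 import BeautifulSoup

# Take one snapshot of the rendered HTML after all 'More' links have been clicked
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Parse the already-expanded review texts from that snapshot
for entry in soup.find_all('p', class_='partial_entry'):
    print(entry.get_text(strip=True))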

Sometimes after clicking you may even have to get the Selenium elements again - so I first get the entry to click More and only later get partial_entry to read the reviews.
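A rough sketch of that re-query pattern, assuming the same class names as in the code below; a reference obtained before the click may go stale once the DOM updates:

import time
from selenium.common.exceptions import StaleElementReferenceException

# Click the 'More' link inside the first review entry
first_entry = driver.find_element_by_class_name('entry')
first_entry.find_element_by_tag_name('span').click()
time.sleep(1)  # give the page time to update

try:
    # Reusing the old reference may fail after the DOM has changed ...
    print(first_entry.text)
except StaleElementReferenceException:
    # ... so ask the driver again for a fresh element
    first_entry = driver.find_element_by_class_name('entry')
    print(first_entry.text)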

I found that clicking More in the first review shows the full text for all reviews, so there is no need to click every More.

Tested with Firefox 69, Linux Mint 19.2, Python 3.7.5, Selenium 3.141.

#from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
import time

#Incognito Mode
option = webdriver.ChromeOptions()
option.add_argument("--incognito")

#Open Chrome
#driver = webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",chrome_options=option)

driver = webdriver.Firefox()

#url I want to visit.
lists = ['https://www.tripadvisor.com/VacationRentalReview-g30196-d6386734-Hot_51st_St_Walk_to_Mueller_2BDR_Modern_sleeps_7-Austin_Texas.html']

for url in lists:

    driver.get(url)
    time.sleep(3)

    link = driver.find_element_by_link_text('More')

    try:
        ActionChains(driver).move_to_element(link).perform()
        time.sleep(1) # time to move to link

        link.click()
        time.sleep(1) # time to update HTML
    except Exception as ex:
        print(ex)

    description = driver.find_element_by_class_name('vr-overview-Overview__propertyDescription--1lhgd')
    print('--- description ---')
    print(description.text)
    print('--- end ---')

    # first "More" shows text in all reviews - there is no need to search other "More"
    first_entry = driver.find_element_by_class_name('entry')
    more = first_entry.find_element_by_tag_name('span')

    try:
        ActionChains(driver).move_to_element(more).perform()
        time.sleep(1) # time to move to link

        more.click()
        time.sleep(1) # time to update HTML
    except Exception as ex:
        print(ex)

    all_reviews = driver.find_elements_by_class_name('partial_entry')
    print('all_reviews:', len(all_reviews))

    for i, review in enumerate(all_reviews, 1):
        print('--- review', i, '---')
        print(review.text)
        print('--- end ---')

EDIT:

To skip owner responses I search for all elements with class="wrap" and then inside every wrap I search for class="partial_entry". Every wrap can contain only one review and possibly one response; the review always has index [0]. Some wraps don't hold a review, so they give an empty list, and I have to check for that before I can take element [0] from the list.

all_reviews = driver.find_elements_by_class_name('wrap')
#print('all_reviews:', len(all_reviews))

for review in all_reviews:
    all_entries = review.find_elements_by_class_name('partial_entry')
    if all_entries:
        print('--- review ---')
        print(all_entries[0].text)
        print('--- end ---')

