Python click 'More' button is not working
Question
I tried to click the "More" button for each review so that the text reviews expand to their full contents, and then scrape those reviews. Without clicking "More", what I end up retrieving is something like
"This room was nice and clean. The location...More".
I tried a few different approaches, such as a Selenium button click and ActionChains, but I guess I'm not using them properly. Could someone help me out with this issue?
Below is my current code. I didn't upload the whole code, to avoid some unnecessary output (I tried to keep it simple).
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
import time

#Incognito Mode
option=webdriver.ChromeOptions()
option.add_argument("--incognito")

#Open Chrome
driver=webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",chrome_options=option)

#url I want to visit.
lists=['https://www.tripadvisor.com/VacationRentalReview-g30196-d6386734-Hot_51st_St_Walk_to_Mueller_2BDR_Modern_sleeps_7-Austin_Texas.html']

for k in lists:
    driver.get(k)
    html =driver.page_source
    soup=BeautifulSoup(html,"html.parser")
    time.sleep(3)
    listing=soup.find_all("div", class_="review-container")

    for i in range(len(listing)):
        try:
            #First, I tried this but didn't work.
            #link = driver.find_element_by_link_text('More')
            #driver.execute_script("arguments[0].click();", link)

            #Second, I tried ActionChains but didn't work.
            ActionChains(driver).move_to_element(i).click().perform()
        except:
            pass

        text_review=soup.find_all("div", class_="prw_rup prw_reviews_text_summary_hsx")
        text_review_inside=text_review[i].find("p", class_="partial_entry")
        review_text=text_review_inside.text
        print (review_text)
Recommended answer
Your biggest mistake in all this code is except: pass.
Without it you would have resolved the problem long ago. The code raises an error message with all the information, but you can't see it. You could at least use
except Exception as ex:
    print(ex)
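To make the difference concrete, here is a minimal sketch that needs no browser. The function risky_step is a hypothetical stand-in for any scraping step that fails, and its error message is invented for illustration. The first variant hides the failure completely; the second reports it, and traceback.print_exc() additionally shows where it happened.

```python
import traceback


def risky_step():
    # Hypothetical stand-in for a scraping step that fails.
    raise ValueError("element is not a Selenium WebElement")


def silent():
    # Bare "except: pass": the failure disappears without a trace.
    try:
        risky_step()
    except:
        pass
    return "no error reported"


def reported():
    # Catch the exception, print the full traceback, and return a summary.
    try:
        risky_step()
    except Exception as ex:
        traceback.print_exc()
        return f"{type(ex).__name__}: {ex}"


silent_result = silent()
reported_result = reported()
print(silent_result)
print(reported_result)
```

With the second variant the error text tells you which call failed and why, which is usually enough to find the wrong argument.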
The problem is that move_to_element() will not work with BeautifulSoup elements (and in your loop, i is only an integer index). It has to be a Selenium element, like
link = driver.find_element_by_link_text('More')
ActionChains(driver).move_to_element(link)
But after executing some functions Selenium needs some time to finish them, and Python has to wait a while.
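A fixed time.sleep() is the simplest way to wait. As an alternative, here is a minimal polling sketch. The helper wait_until and the page_updated condition are hypothetical names invented for this example and are not part of Selenium; Selenium ships its own explicit-wait class, WebDriverWait, which serves the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if the condition is still falsy after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Demo without a browser: the condition becomes true on the third check.
attempts = []


def page_updated():
    attempts.append(1)
    return len(attempts) >= 3


waited = wait_until(page_updated, timeout=2.0, poll=0.01)
print(waited, len(attempts))
```

With a real driver, the condition could be something like lambda: driver.find_elements_by_class_name('partial_entry'), which returns an empty (falsy) list until matching elements exist.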
I don't use BeautifulSoup to get the data, but if you want to use it, then get driver.page_source after clicking all the links. Otherwise you will have to get driver.page_source again after every click.
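The reason is that page_source is a snapshot: HTML read before the click does not change when the page does. The toy class below is a hypothetical stand-in for a browser driver, written only to illustrate this; it is not Selenium, and its markup is made up.

```python
class FakeDriver:
    """Toy stand-in for a browser driver (hypothetical, not part of Selenium)."""

    def __init__(self):
        self._expanded = False

    @property
    def page_source(self):
        # Returns the HTML as it is at the moment of the call.
        if self._expanded:
            return "<p class='partial_entry'>full review text</p>"
        return "<p class='partial_entry'>This room was nice and clean. The location...More</p>"

    def click_more(self):
        # Simulates the page expanding the review after a click.
        self._expanded = True


driver = FakeDriver()
before = driver.page_source   # snapshot taken before the click
driver.click_more()
after = driver.page_source    # fresh snapshot taken after the click

print(before)
print(after)
```

Parsing before with BeautifulSoup would still give the truncated text, so the soup has to be built from a snapshot taken after the click.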
Sometimes after clicking you may even have to get the Selenium elements again, so I first get entry to click More and only later get partial_entry to read the reviews.
I found that clicking More in the first review shows the full text for all reviews, so there is no need to click every More.
Tested with Firefox 69, Linux Mint 19.2, Python 3.7.5, Selenium 3.141
#from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import ActionChains
import time

#Incognito Mode
option = webdriver.ChromeOptions()
option.add_argument("--incognito")

#Open Chrome
#driver = webdriver.Chrome(executable_path="C:/Users/chromedriver.exe",chrome_options=option)
driver = webdriver.Firefox()

#url I want to visit.
lists = ['https://www.tripadvisor.com/VacationRentalReview-g30196-d6386734-Hot_51st_St_Walk_to_Mueller_2BDR_Modern_sleeps_7-Austin_Texas.html']

for url in lists:
    driver.get(url)
    time.sleep(3)

    link = driver.find_element_by_link_text('More')

    try:
        # note: this only queues the move; an ActionChains sequence runs when .perform() is called
        ActionChains(driver).move_to_element(link)
        time.sleep(1) # time to move to link

        link.click()
        time.sleep(1) # time to update HTML
    except Exception as ex:
        print(ex)

    description = driver.find_element_by_class_name('vr-overview-Overview__propertyDescription--1lhgd')
    print('--- description ---')
    print(description.text)
    print('--- end ---')

    # first "More" shows text in all reviews - there is no need to search other "More"
    first_entry = driver.find_element_by_class_name('entry')
    more = first_entry.find_element_by_tag_name('span')

    try:
        ActionChains(driver).move_to_element(more)
        time.sleep(1) # time to move to link

        more.click()
        time.sleep(1) # time to update HTML
    except Exception as ex:
        print(ex)

    all_reviews = driver.find_elements_by_class_name('partial_entry')
    print('all_reviews:', len(all_reviews))

    for i, review in enumerate(all_reviews, 1):
        print('--- review', i, '---')
        print(review.text)
        print('--- end ---')
EDIT:
To skip responses, I search for all class="wrap" elements and then, inside every wrap, I search for class="partial_entry". Every wrap can hold only one review and possibly one response. The review always has index [0]. Some wraps don't hold a review, so they give an empty list, and I have to check for that before I can get element [0] from the list.
all_reviews = driver.find_elements_by_class_name('wrap')
#print('all_reviews:', len(all_reviews))

for review in all_reviews:
    all_entries = review.find_elements_by_class_name('partial_entry')
    if all_entries:
        print('--- review ---')
        print(all_entries[0].text)
        print('--- end ---')