Python Webscraping Selenium and BeautifulSoup (Modal window content)


Problem Description


I am trying to learn web scraping (I am a total novice). I noticed that on some websites (e.g. Quora), when I click a button, a new element comes up on screen, and I cannot seem to get the page source of that new element. I want to be able to get the page source of the new popup and all of its elements. Note that you need a Quora account to reproduce my problem.

Here is part of the code, using BeautifulSoup, Selenium, and chromedriver:

from selenium import webdriver
from selenium.webdriver.common.by import By  # needed for the By.XPATH lookups below
from bs4 import BeautifulSoup
from unidecode import unidecode
import time

sleep = 10
USER_NAME = 'Insert Account name' #Insert Account name here
PASS_WORD = 'Insert Account Password' #Insert Account Password here
url = 'Insert url' 
url2 = ['insert url']
#Logging in to your account
driver = webdriver.Chrome('INSERT PATH TO CHROME DRIVER')
driver.get(url)
page_source = driver.page_source
if 'Continue With Email' in page_source:
    try:
        username = driver.find_element(By.XPATH, '//input[@placeholder="Email"]')
        password = driver.find_element(By.XPATH, '//input[@placeholder="Password"]')
        login= driver.find_element(By.XPATH, '//input[@value="Login"]')
        username.send_keys(USER_NAME)
        password.send_keys(PASS_WORD)
        time.sleep(sleep)
        login.click()
        time.sleep(sleep)
    except Exception:
        print ('Did not work :( .. Try again')
else:
    print ('Did not work :( .. Try different page')


The next part goes to the relevant webpage and ("tries to") collect information about the followers of a particular question.

for url1 in url2:
    driver.get(url1)
    source = driver.page_source
    soup1 = BeautifulSoup(source, "lxml")
    Follower_button = soup1.find('a', {'class': 'FollowerListModalLink QuestionFollowerListModalLink'})
    Follower_button2 = unidecode(Follower_button.text)
    driver.find_element(By.LINK_TEXT, Follower_button2).click()

    #### Does not give me the correct page source on the next line ####
    source2 = driver.page_source
    soup2 = BeautifulSoup(source2, "lxml")

    follower_list = soup2.findAll('div', {'class': 'FollowerListModal QuestionFollowerListModal Modal'})
    if len(follower_list) > 0:
        print('It worked :)')
    else:
        print('Did not work :(')

However, when I try to get the page source of the followers element, I end up getting the page source of the main page rather than the follower element. Can anyone help me get the page source of the follower element that pops up? What am I not getting here?

NOTE: Another way of recreating or looking at my problem is to log in to your Quora account (if you have one) and then go to any question with followers. If you click the followers button on the lower right side of the screen, a popup will appear. My problem is essentially to get the elements of this popup.


Update - Okay, so I have been reading a bit, and it seems the window is a modal window. Can anyone help me with getting the contents of a modal window?

Recommended Answer

Problem resolved. All I had to do was add one line:

time.sleep(sleep)

after generating the click. The problem was that, with no wait time initially, the page source was not getting updated. With a sufficiently long time.sleep (how long may vary from website to website), the page source finally got updated and I was able to get the required elements. :) Lesson learned: patience is the key to web scraping. I spent the entire day trying to figure this out.
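
As an aside, the same idea can be made more robust with Selenium's explicit waits, which poll for a condition instead of pausing for a fixed duration. Below is a minimal sketch, assuming the modal container still carries the FollowerListModal class used in the question's code (Quora's markup may have changed since):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Click the followers link, then block (up to 10 seconds) until the modal's
# container div is actually present in the DOM before reading the page source.
driver.find_element(By.LINK_TEXT, Follower_button2).click()
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'div.FollowerListModal'))  # assumed selector
)
source2 = driver.page_source  # now includes the rendered modal content

This waits only as long as needed and raises a TimeoutException if the modal never appears, which is easier to diagnose than a silently too-short sleep.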
