Unable to get dynamically generated content from a webpage


Problem description

I have written a Python script using Selenium to fetch the business summary (inside a p tag) located at the bottom right of a webpage, under the header Company profile. The page is heavily dynamic, so I decided to use a browser simulator. I created a CSS selector that parses the summary correctly when I copy the HTML elements from that page and test against them locally. However, when I use the same selector in the script below, it fails and throws a TimeoutException instead. How can I fetch the summary?

This is my attempt:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

link = "https://in.finance.yahoo.com/quote/AAPL?p=AAPL"

def get_information(driver, url):
    driver.get(url)
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "[id$='-QuoteModule'] p[class^='businessSummary']")))
    driver.execute_script("arguments[0].scrollIntoView();", item)
    print(item.text)

if __name__ == "__main__":
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 20)
    try:
        get_information(driver,link)
    finally:
        driver.quit()

Recommended answer

It seems that the Business Summary block is not present when the page first loads; it is generated only after you scroll down the page. Try the solution below:

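from selenium.webdriver.common.keys import Keys

def get_information(driver, url):
    driver.get(url)
    # Press END on the page body to scroll to the bottom, which triggers the lazily loaded block.
    # find_element(By.TAG_NAME, ...) is used because Selenium 4 removed find_element_by_tag_name.
    driver.find_element(By.TAG_NAME, "body").send_keys(Keys.END)
    # By, EC and wait are the names already defined in the question's script.
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "[id$='-QuoteModule'] p[class^='businessSummary']")))
    print(item.text)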
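
If a single jump to the end of the page is not enough (some pages load content only as each section scrolls into view), scrolling in steps may help. The following is a minimal sketch, not part of the original answer: the helper name scroll_until_found and the step, max_scrolls and pause defaults are hypothetical choices, and it assumes the block is rendered once the viewport gets close to it. It calls only find_elements and execute_script on the driver, so it can be tried with any Selenium WebDriver instance.

```python
import time


def scroll_until_found(driver, css_selector, step=800, max_scrolls=20, pause=0.5):
    """Scroll down in fixed steps until css_selector matches an element.

    Hypothetical helper: returns the first matching element, or None if
    nothing matched after max_scrolls scroll attempts.
    """
    for _ in range(max_scrolls):
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        matches = driver.find_elements("css selector", css_selector)
        if matches:
            return matches[0]
        # Scroll the window down by `step` pixels and give the page time to render.
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        time.sleep(pause)
    # One last check after the final scroll.
    matches = driver.find_elements("css selector", css_selector)
    return matches[0] if matches else None
```

In the script above, this could be called as item = scroll_until_found(driver, "[id$='-QuoteModule'] p[class^='businessSummary']") in place of the Keys.END line and the explicit wait, checking for None before reading item.text.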
