Unable to get dynamically generated content from a webpage
Question
I have written a Python script using Selenium to fetch the business summary (which is inside a p tag) located at the bottom right corner of a webpage, under the Company profile header. The webpage is heavily dynamic, so I decided to use a browser simulator. I have created a CSS selector that parses the summary correctly if I copy the HTML elements directly from that webpage and test against them locally. For some reason, when I use the same selector in the script below, it does not work and throws a TimeoutException instead. How can I fetch it?
This is my attempt:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

link = "https://in.finance.yahoo.com/quote/AAPL?p=AAPL"

def get_information(driver, url):
    driver.get(url)
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "[id$='-QuoteModule'] p[class^='businessSummary']")))
    driver.execute_script("arguments[0].scrollIntoView();", item)
    print(item.text)

if __name__ == "__main__":
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 20)
    try:
        get_information(driver, link)
    finally:
        driver.quit()
Recommended answer
It seems that there is no Business Summary block in the page initially; it is generated only after you scroll down. Try the solution below:
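from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

def get_information(driver, url):
    driver.get(url)
    # Press END on the body to scroll to the bottom so the summary block gets generated.
    # find_element(By.TAG_NAME, ...) is used because find_element_by_tag_name is gone from recent Selenium 4 releases.
    driver.find_element(By.TAG_NAME, "body").send_keys(Keys.END)
    # `wait` is the WebDriverWait instance created in the question's script.
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "[id$='-QuoteModule'] p[class^='businessSummary']")))
    print(item.text)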
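If a single jump to the bottom of the page is not enough to trigger the block, a more gradual approach may help. The following is only a minimal sketch, assuming the summary is lazy-loaded as the page scrolls; scroll_until_present and demo are hypothetical names introduced here for illustration, not part of Selenium. The helper scrolls one viewport at a time and checks for the element after each step, using only driver.find_elements and driver.execute_script.

```python
import time


def scroll_until_present(driver, locator, timeout=20.0, pause=0.5):
    """Scroll down step by step until an element matching `locator` exists.

    `locator` is a (by, value) tuple such as (By.CSS_SELECTOR, "p.foo").
    Returns the first matching element, or None once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns an empty list instead of raising when nothing matches.
        matches = driver.find_elements(*locator)
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            return None
        # Scroll by one viewport height so scroll-triggered loading can fire.
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        time.sleep(pause)


def demo():
    # Hypothetical usage; Selenium is imported here so the helper above
    # stays importable without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://in.finance.yahoo.com/quote/AAPL?p=AAPL")
        locator = (By.CSS_SELECTOR, "[id$='-QuoteModule'] p[class^='businessSummary']")
        item = scroll_until_present(driver, locator)
        print(item.text if item is not None else "Business summary not found")
    finally:
        driver.quit()
```

Calling demo() runs the whole flow against the page from the question. Returning None on timeout, rather than raising, is a design choice of this sketch so the caller can decide how to handle a missing block.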