How to get data from inspect element of a webpage using Python
Question
I'd like to get the data from inspect element using Python. I'm able to download the source code and parse it with BeautifulSoup, but now I need the text shown in inspect element for a webpage. I'd truly appreciate it if you could advise me on how to do it.
Edit: By inspect element I mean that, in Google Chrome, right-clicking gives an option called Inspect Element, which shows the code related to each element of that particular page. I'd like to extract that code, or just its text strings.
Solution
If you want to automatically fetch a web page from Python in a way that runs JavaScript, you should look into Selenium. It can automatically drive a web browser (even a headless web browser such as PhantomJS, so you don't have to have a window open).
In order to get the HTML, you'll need to evaluate some JavaScript. Simple sample code, alter to suit:
    from selenium import webdriver

    # Note: PhantomJS support was removed in Selenium 4; on newer versions,
    # use a headless Chrome or Firefox driver instead.
    driver = webdriver.PhantomJS()
    driver.get("http://google.com")

    # This will get the initial html - before javascript
    html1 = driver.page_source

    # This will get the html after on-load javascript
    html2 = driver.execute_script("return document.documentElement.innerHTML;")
Note 1: If you want a specific element or elements, you actually have a couple of options: parse the HTML in Python, or write more specific JavaScript that returns what you want.
Note 2: If you actually need specific information from Chrome's tools that is not just dynamically generated HTML, you'll need a way to hook into Chrome itself. No way around that.
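To illustrate the first option in Note 1 (parsing the HTML in Python), here is a minimal sketch that pulls the text out of one element by its id. It uses only the standard library's html.parser so it runs without extra packages; BeautifulSoup, which the question already mentions, would do the same job with less code. The names ElementTextExtractor, element_text and sample_html, and the id "price", are made up for this example. In practice the input would be a string such as html2 from the Selenium sample above.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementTextExtractor(HTMLParser):
    """Collect the text inside the first element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while the parser is inside the target element
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def element_text(html, target_id):
    """Return the whitespace-normalised text of the element with this id."""
    parser = ElementTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical stand-in for the HTML string returned by the browser.
sample_html = (
    '<html><body><div id="price">Total: <b>42</b> USD<br>incl. tax</div>'
    "<p>other</p></body></html>"
)
print(element_text(sample_html, "price"))
```

If no element has the requested id, element_text returns an empty string. This sketch assumes reasonably well-formed HTML; for messy real-world pages a more forgiving parser such as BeautifulSoup is likely the better choice.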