How to get data from Inspect Element of a webpage using Python

Question

I'd like to get data from Inspect Element using Python. I can download a page's source and parse it with BeautifulSoup, but now I need the text that Inspect Element shows for a webpage. I'd truly appreciate it if you could advise me how to do it.

Edit: By "inspect element" I mean that in Google Chrome, right-clicking gives an option called Inspect Element, which shows the code for each element on the page. I'd like to extract that code, or just its text strings.

Solution

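If you want to automatically fetch a web page from Python in a way that runs JavaScript, you should look into Selenium. It can automatically drive a web browser, even a headless one, so you don't have to have a window open. (The headless browser originally suggested here, PhantomJS, is no longer maintained, and Selenium 4 removed support for it; headless Chrome or Firefox fills the same role.)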

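Inspect Element shows the live DOM, that is, the page after its JavaScript has run, so that is the HTML you want. The dependable way to get it is to evaluate a little JavaScript in the page. Simple sample code, alter to suit: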

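from selenium import webdriver

# Headless Chrome stands in for PhantomJS, which Selenium 4 no longer supports
options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
driver.get("http://google.com")

# The page source as reported by the driver. Selenium's docs warn that, depending
# on the driver, this may not reflect changes JavaScript made after loading.
html1 = driver.page_source

# The live DOM, serialized by the browser after on-load JavaScript has run
html2 = driver.execute_script("return document.documentElement.innerHTML;")

driver.quit()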

Note 1: If you only want a specific element or elements, you have a couple of options: parse the HTML in Python, or write more specific JavaScript that returns just what you want.
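Both options from Note 1 can be sketched roughly as follows. This is a hedged sketch, not part of the original answer. Option 1 uses the standard library's html.parser so it needs no extra dependencies (with BeautifulSoup, `BeautifulSoup(html, "html.parser").get_text()` does much the same job). Option 2 passes a CSS selector to Selenium's `execute_script` as `arguments[0]`, so the selector needs no manual quoting. The names `visible_text`, `texts_by_selector` and `TEXTS_BY_SELECTOR_JS` are illustrative, not from any library.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects non-empty text nodes, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def visible_text(html):
    """Option 1 (illustrative helper): parse rendered HTML, return its text strings."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any buffered text
    return collector.parts


# Option 2 (illustrative): let the browser pick out only the elements you want.
# Selenium passes extra execute_script arguments to the script as arguments[0], ...
TEXTS_BY_SELECTOR_JS = (
    "return Array.from(document.querySelectorAll(arguments[0]),"
    " el => el.innerText);"
)


def texts_by_selector(driver, css_selector):
    """Return the innerText of every element matching css_selector."""
    return driver.execute_script(TEXTS_BY_SELECTOR_JS, css_selector)
```

With the driver from the sample code above, `visible_text(html2)` gives the page's text strings, while `texts_by_selector(driver, "h3")` returns only the text of the `h3` headings. The second approach moves less data back to Python, at the cost of writing a bit of JavaScript.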

Note 2: If you actually need information from Chrome's developer tools that is not just dynamically generated HTML, you'll need a way to hook into Chrome itself, for example through the Chrome DevTools Protocol. There is no way around that.
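As one hedged illustration of hooking into Chrome: Selenium 4's Chrome and Edge drivers provide `execute_cdp_cmd`, which sends a raw Chrome DevTools Protocol command and returns its result. The sketch below assumes such a Chromium-based driver and reads the page's runtime metrics (for example DOM node count or JS heap size), information that is not in the HTML at all. The helper name `page_metrics` is hypothetical.

```python
def page_metrics(driver):
    """Hypothetical helper: return Chrome's runtime metrics for the page as a dict.

    Assumes a Chromium-based Selenium 4 driver (Chrome or Edge), which
    provides execute_cdp_cmd for sending Chrome DevTools Protocol commands.
    """
    # The DevTools Performance domain must be enabled before metrics are read.
    driver.execute_cdp_cmd("Performance.enable", {})
    result = driver.execute_cdp_cmd("Performance.getMetrics", {})
    # result["metrics"] is a list of {"name": ..., "value": ...} entries,
    # e.g. "Nodes" (DOM node count) or "JSHeapUsedSize" (bytes).
    return {metric["name"]: metric["value"] for metric in result["metrics"]}
```

Called on a headless Chrome driver like the one in the sample code above, the returned dict typically includes entries such as `"Nodes"`; other DevTools Protocol domains can be reached the same way, one command name at a time.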
