如何使用python以编程方式测量HTML源代码中元素的大小? [英] How to programmatically measure the elements' sizes in HTML source code using python?

查看:31
本文介绍了如何使用python以编程方式测量HTML源代码中元素的大小?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在用python做网页布局分析.一个基本任务是在给定 HTML 源代码的情况下以编程方式测量元素的大小,以便我们可以获得网页语料库的内容/广告比例、广告块位置、广告块大小的统计数据.

一个明显的方法是使用宽度/高度属性,但它们并不总是可用.此外,width: 50% 之类的东西需要在加载到 DOM 后计算.所以我想将 HTML 源代码加载到窗口大小预定义的浏览器中(比如 mechanize 虽然我'我不确定是否可以设置窗口的大小)是一个很好的尝试方法,但无论如何 mechanize 不支持返回元素大小.

有没有通用的方法(没有宽度/高度属性)在 python 中做到这一点,最好是使用一些库?

谢谢!

解决方案

我建议你看看 Ghost - 用 python 编写的 webkit web 客户端.它具有 JavaScript 支持,因此您可以轻松调用 JavaScript 函数并获取其返回值.示例显示如何找出谷歌文本框的宽度:

<预><代码>>>>从幽灵进口幽灵>>>鬼 = 鬼()>>>ghost.open('https://google.lt')>>>width, resources = ghost.evaluate("document.getElementById('gbqfq').offsetWidth;")>>>宽度541.0 # 谷歌文本框宽度 541px

I'm doing webpage layout analysis in python. A fundamental task is to programmatically measure the elements' sizes given HTML source codes, so that we could obtain statistical data of content/ad ratio, ad block position, ad block size for the webpage corpus.

An obvious approach is to use the width/height attributes, but they're not always available. Besides, things like width: 50% needs to be calculated after loading into DOM. So I guess loading the HTML source code into a window-size-predefined-browser (like mechanize although I'm not sure if window's size could be set) is a good way to try, but mechanize doesn't support the return of an element size anyway.

Is there any universal way (without width/height attributes) to do it in python, preferably with some library?

Thanks!

解决方案

I suggest You to take a look at Ghost - webkit web client written in python. It has JavaScript support so you can easily call JavaScript functions and get its return value. Example shows how to find out google text box width:

>>> from ghost import Ghost
>>> ghost = Ghost()
>>> ghost.open('https://google.lt')
>>> width, resources = ghost.evaluate("document.getElementById('gbqfq').offsetWidth;")
>>> width
541.0  # google text box width 541px

这篇关于如何使用python以编程方式测量HTML源代码中元素的大小?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆