Compute the average height and the average width of div tags
Problem description
I need to get the average div height and width of an HTML document.
I have tried this solution, but it doesn't work:
import numpy as np
average_width = np.mean([div.attrs['width'] for div in my_doc.get_div() if 'width' in div.attrs])
average_height = np.mean([div.attrs['height'] for div in my_doc.get_div() if 'height' in div.attrs])
print average_height,average_width
The get_div method returns the list of all div tags retrieved by BeautifulSoup's find_all method.
Here is an example:
print my_doc.get_div()[1]
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;">
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">Journal of Infection (2015)
</span>
<span style="font-family: EICMDB+AdvTrebu-B; font-size:8px">xx</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">, 1</span>
<span style="font-family: EICMDD+AdvPS44A44B; font-size:7px">e</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">4
<br/>
</span>
</div>
When I get the attributes, it works perfectly:
print my_doc.get_div()[1].attrs
{u'style': u'position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;'}
But when I try to get the value:
print my_doc.get_div()[1].attrs['width']
I get an error:
KeyError: 'width'
But I don't understand, because when I check the type:
print type(my_doc.get_div()[1].attrs)
it is a dictionary:
<type 'dict'>
Recommended answer
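There may be a better way, but first note why the lookup fails: width and height are not attributes of the div. As the attrs output shows, the tag has exactly one attribute, style, and width and height are CSS declarations inside that string, so attrs['width'] raises KeyError even though attrs is a dict. The style string has to be parsed instead.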
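A quick way to confirm this without BeautifulSoup is the standard library's html.parser, which hands over the raw (name, value) attribute pairs of each tag. This is a minimal sketch for illustration only; the class name AttrDumper is hypothetical, and the HTML is the question's opening div tag trimmed to its layout declarations.

```python
from html.parser import HTMLParser

# The question's <div> opening tag, trimmed to the declarations that matter.
HTML = ('<div style="position:absolute; left:45px; top:81px; '
        'width:127px; height:9px;">text</div>')


class AttrDumper(HTMLParser):
    """Record the attributes of every <div> as a dict (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.div_attrs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs exactly as written in the tag
        if tag == "div":
            self.div_attrs.append(dict(attrs))


parser = AttrDumper()
parser.feed(HTML)
attrs = parser.div_attrs[0]
print(list(attrs))        # ['style']  -> the only attribute
print("width" in attrs)   # False      -> hence KeyError on attrs['width']
print(attrs["style"])     # the width/height declarations live in this string
```

The same holds for BeautifulSoup's Tag.attrs: it mirrors the attributes written in the markup, so CSS properties only ever appear inside the style value.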
Way 1
Below is my tested code to extract the width and height.
from bs4 import BeautifulSoup
html_doc = '''<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;">
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">Journal of Infection (2015)
</span>
<span style="font-family: EICMDB+AdvTrebu-B; font-size:8px">xx</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">, 1</span>
<span style="font-family: EICMDD+AdvPS44A44B; font-size:7px">e</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">4
<br/>
</span>
</div>'''
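soup = BeautifulSoup(html_doc, 'html.parser')
# width/height live inside the single 'style' attribute; '' for divs without one
my_att = [i.get('style', '') for i in soup.find_all("div")]
# Join with ';' so declarations from different divs cannot run together
dd = ';'.join(my_att).split(";")
# Strip whitespace first, then drop empty fragments (e.g. after a trailing ';')
dd_cln = [i.strip() for i in dd if i.strip()]
# Split on the first ':' only, so values that contain ':' stay intact
my_dict = dict((k.strip(), v.strip()) for k, v in (i.split(':', 1) for i in dd_cln))
# Note: with several divs, later values overwrite earlier ones in this dict
print(my_dict['width'])  # prints 127px for the sample div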
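Way 1 builds one dict for the whole document, so it only yields a single width. A minimal sketch of how the same parsing idea could be generalized to average over many divs, assuming the sizes are inline and expressed in px (other units, and sizes set in external stylesheets, are ignored). The helper names parse_style, px_to_float and average_px are hypothetical, and the second and third style strings are made-up sample data.

```python
from statistics import mean


def parse_style(style):
    """Turn an inline CSS string into a {property: value} dict.

    Splits on ';' and on the FIRST ':' only, so values that themselves
    contain ':' (e.g. url(http://...)) survive intact.
    """
    declarations = {}
    for decl in style.split(";"):
        if ":" not in decl:
            continue  # skip empty fragments such as the one after a trailing ';'
        prop, value = decl.split(":", 1)
        declarations[prop.strip().lower()] = value.strip()
    return declarations


def px_to_float(value):
    """Convert a value like '127px' to 127.0; return None for anything else."""
    value = value.strip().lower()
    if value.endswith("px"):
        try:
            return float(value[:-2])
        except ValueError:
            return None
    return None


def average_px(styles, prop):
    """Average a px-valued property over the style strings that define it."""
    numbers = []
    for style in styles:
        raw = parse_style(style).get(prop)
        if raw is not None:
            number = px_to_float(raw)
            if number is not None:
                numbers.append(number)
    return mean(numbers) if numbers else None


styles = [
    "position:absolute; left:45px; top:81px; width:127px; height:9px;",
    "position:absolute; left:45px; top:95px; width:73px; height:11px;",  # made up
    "position:absolute; left:45px; top:110px;",  # made up: no width/height
]
print(average_px(styles, "width"))   # 100.0
print(average_px(styles, "height"))  # 10.0
```

With BeautifulSoup, the input list could be built as [div.get('style', '') for div in soup.find_all('div')]. Divs that do not declare the property are skipped rather than counted as zero; whether that is the right average depends on what the document's layout means.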
Way 2
Use a regular expression, as described here: http://stackoverflow.com/questions/9271365/how-to-pull-out-css-attributes-from-inline-styles-with-beautifulsoup
Working code:
import numpy as np
import re
from bs4 import BeautifulSoup
html_doc = '''<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;">
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">Journal of Infection (2015)
</span>
<span style="font-family: EICMDB+AdvTrebu-B; font-size:8px">xx</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">, 1</span>
<span style="font-family: EICMDD+AdvPS44A44B; font-size:7px">e</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">4
<br/>
</span>
</div>'''
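soup = BeautifulSoup(html_doc, 'html.parser')
my_att = [i.get('style', '') for i in soup.find_all("div")]
# Join with ';' so the last declaration of one div cannot merge with the next
css = ';'.join(my_att)
print(css)
# (?<![\w-]) keeps 'border-width' or 'line-height' from matching; \s* tolerates
# 'width: 127px'; not requiring a trailing ';' handles a final declaration
width_list = list(map(float, re.findall(r'(?<![\w-])width\s*:\s*(\d+(?:\.\d+)?)px', css)))
height_list = list(map(float, re.findall(r'(?<![\w-])height\s*:\s*(\d+(?:\.\d+)?)px', css)))
# list(...) is needed on Python 3, where map() returns an iterator
print(np.mean(height_list))
print(np.mean(width_list))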
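If BeautifulSoup and NumPy are not available, the Way 2 idea can be sketched with the standard library alone: html.parser collects each div's style string, a regular expression pulls out the px value per div, and statistics.mean averages them. This is an illustrative sketch, not the answer's code; DivStyleCollector, px_pattern and average_div_size are hypothetical names, the sample HTML is made up, and, as above, only inline px values are considered.

```python
import re
from html.parser import HTMLParser
from statistics import mean


def px_pattern(prop):
    """Match e.g. 'width:127px' or 'width: 127.5px' for the given property.

    The lookbehind rejects 'border-width' or 'max-width' when prop is 'width'.
    """
    return re.compile(r"(?<![\w-])" + re.escape(prop)
                      + r"\s*:\s*(\d+(?:\.\d+)?)px", re.IGNORECASE)


class DivStyleCollector(HTMLParser):
    """Collect the inline style string of every <div> in a document."""

    def __init__(self):
        super().__init__()
        self.styles = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # A valueless or missing style attribute becomes ''
            self.styles.append(dict(attrs).get("style") or "")


def average_div_size(html):
    """Return {'width': mean_px, 'height': mean_px}; None if no div has it."""
    collector = DivStyleCollector()
    collector.feed(html)
    sizes = {}
    for prop in ("width", "height"):
        pattern = px_pattern(prop)
        values = []
        for style in collector.styles:
            match = pattern.search(style)  # first matching declaration per div
            if match:
                values.append(float(match.group(1)))
        sizes[prop] = mean(values) if values else None
    return sizes


html = (
    '<div style="position:absolute; left:45px; width:127px; height:9px;">a</div>'
    '<div style="border-width:2px; width:73px; height:11px">b</div>'
    '<div>no inline style</div>'
)
print(average_div_size(html))  # {'width': 100.0, 'height': 10.0}
```

Searching each div's style separately, rather than one joined string, keeps the per-div semantics explicit: a div contributes at most one width and one height, and divs without them are simply left out of the average.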