Find all the span styles with a font size larger than the most common one via Beautiful Soup (Python)

Question

I understand how to obtain the text from a specific div or span style from this question: How to find the most common span styles

Now the difficulty is: how do I find all the span styles whose font size is larger than the most common one?

I suspect I should use regular expressions, but first I need to extract the most common font size. How do I do that?
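Extracting the most common size can be sketched with `re.search` and `collections.Counter`. The style strings below are made-up stand-ins for `span['style']` values, and the pattern assumes sizes are written in px:

```python
import re
from collections import Counter

# Made-up inline styles standing in for span['style'] values
styles = [
    "font-size:12px",
    "font-size: 12px; color:red",
    "font-size:18px",
    "font-weight:bold",  # no font size, so it is skipped
]

sizes = []
for style in styles:
    # Assumes the size is written in px; \s* tolerates "font-size: 12px"
    match = re.search(r"font-size:\s*(\d+)px", style)
    if match:
        sizes.append(int(match.group(1)))

# most_common(1) returns a one-element list of (value, count) pairs
most_common_size = Counter(sizes).most_common(1)[0][0]
print(most_common_size)  # 12
```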

Also, how do you determine "larger than" when the condition is a string?
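Comparing the raw strings does not work, because Python compares strings character by character. A sketch that converts the value to a number first, assuming every size is in px (`px_value` is a hypothetical helper, not part of Beautiful Soup):

```python
def px_value(size):
    """Convert a CSS length like '14px' to a float (hypothetical helper, px only)."""
    size = size.strip()
    if not size.endswith("px"):
        raise ValueError(f"expected a px value, got {size!r}")
    return float(size[:-2])


# String comparison is lexicographic: '9' sorts after '1', so this prints True
print("9px" > "14px")
# Numeric comparison gives the intended result: 9 is not larger than 14
print(px_value("9px") > px_value("14px"))
```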

Recommended answer

This may help you. It collects the pixel font size of every styled span and counts how often each size occurs:

    from collections import Counter
    import re

    from bs4 import BeautifulSoup

    # `soup` is assumed to already exist, e.g.
    # soup = BeautifulSoup(html, "html.parser")

    usedFontSize = []  # every font size (in px) used by a styled span

    # Find all the spans that have a style attribute
    spans = soup.find_all('span', style=True)
    for span in spans:
        styleTag = span['style']
        # \s* also matches "font-size: 12px"
        fontSize = re.findall(r"font-size:\s*(\d+)px", styleTag)
        if fontSize:  # skip styles without a px font size
            usedFontSize.append(int(fontSize[0]))

    # Count how often each font size is used
    count = Counter(usedFontSize)
    # Print every font size with its number of occurrences, most common first
    print(count.most_common())
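The counting step can be extended to the "larger than" filter the question asks for. A minimal sketch, assuming px sizes; `spans_larger_than_most_common` is a hypothetical helper name. It only needs `span['style']`, so it should work on the Tag objects returned by `soup.find_all('span', style=True)` and, for testing, on plain dicts:

```python
import re
from collections import Counter

# Assumes font sizes are written in px inside the inline style attribute
FONT_SIZE_RE = re.compile(r"font-size:\s*(\d+(?:\.\d+)?)px")


def spans_larger_than_most_common(spans):
    """Return the spans whose font size exceeds the most common font size.

    `spans` can be any sequence of objects that support span['style'].
    """
    sized = []  # (span, font size) pairs for spans that declare a px size
    for span in spans:
        match = FONT_SIZE_RE.search(span['style'])
        if match:
            sized.append((span, float(match.group(1))))
    if not sized:
        return []

    # most_common(1) returns [(size, count)]; ties go to the size seen first
    most_common_size = Counter(size for _, size in sized).most_common(1)[0][0]
    # Compare numbers, not strings, so 9 < 14 as expected
    return [span for span, size in sized if size > most_common_size]
```

For example, `[span.get_text() for span in spans_larger_than_most_common(soup.find_all('span', style=True))]` would give the text of the larger spans. If several sizes tie for most common, `Counter.most_common` keeps the one encountered first (Python 3.7+), so the threshold then depends on document order.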
