Python BS: Fetching rows with and without color attribute

Question

I have some HTML that looks like this (it represents rows of data in a table, i.e. the data between tr and /tr is one row in the table):

<tr bgcolor="#f4f4f4">
<td height="25" nowrap="NOWRAP">&nbsp;CME_ES&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;07:58:46&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;Connected&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;07:58:00&nbsp;</td>
**<td height="25" nowrap="NOWRAP" bgcolor="#55aa2a">&nbsp;--:--:--&nbsp;</td>**
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;01:25:00 &nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp; 22:00:00&nbsp;</td>
</tr>
.
.
.
<tr bgcolor="#ffffff">
<td height="25" nowrap="NOWRAP">&nbsp;CME_NQ&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;07:58:46&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;Connected&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;191&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;07:58:01&nbsp;</td>
**<td height="25" nowrap="NOWRAP">&nbsp;--:--:--&nbsp;</td>**
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;0&nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp;01:25:00 &nbsp;</td>
<td height="25" nowrap="NOWRAP">&nbsp; 22:00:00&nbsp;</td>
</tr>

I have code that grabs the color from each row set:

mrkt_stat = []
for td in site.findAll('td'):
    if 'bgcolor' in td.attrs:
        mrkt_stat.append(td.attrs['bgcolor'])

The issue is that when a row set has no bgcolor attribute, nothing is added to the mrkt_stat list.

How do I scrape this so that even if a row has no bgcolor attribute, an entry is still added to the list as NULL or N/A?

It is useful to know that the bgcolor attribute (which may or may not be present) always appears on the 9th line of a row set, i.e. the 9th td cell, whether or not that row has the attribute (see the HTML lines enclosed in **).

The output should look like the following (a list of the color attributes from line 9 of each row set, with 'N/A' where no color attribute is present):

['#55aa2a',...,'N/A'] 

Recommended answer

I figured out how to solve this. The approach is rather long, but it solves the issue nonetheless:

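# Collect every <tr> once, in document order. The original outer loop over
# <br> tags only repeated this scan, so it is dropped here.
keys = []
for tr in site.find_all('tr'):
    if tr not in keys:
        keys.append(tr)
del keys[:4]  # skip the first four rows (presumably non-data rows on this page)

check = []  # created once, so it ends up with one entry per row

for g in keys:
    # Direct children of the <tr>: whitespace strings and <td> tags interleaved
    color = list(g)

    # With a newline before each <td>, as in the HTML above, index 17 is the 9th <td>
    del color[:17]

    h = color[0]
    if 'bgcolor' in h.attrs:
        check.append(h['bgcolor'])
    else:
        check.append('N/A')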

To summarize: up to the line h = color[0], the code stores the 9th line of the row set in the variable h. It then checks whether bgcolor is among that tag's attributes. If it is, the value is added to the check list; if not, 'N/A' is added instead.
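
The offset of 17 may look arbitrary. It comes from the fact that iterating over a tr tag yields all of its direct children, and in the HTML above every td is preceded by a newline text node. Here is a small sketch of that layout, assuming the built-in html.parser and a made-up three-cell row (the sample markup and variable names are illustrative, not taken from the real page):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical miniature row, laid out like the rows in the question.
html = """<table><tr bgcolor="#f4f4f4">
<td>a</td>
<td bgcolor="#55aa2a">b</td>
<td>c</td>
</tr></table>"""

row = BeautifulSoup(html, "html.parser").find("tr")

children = list(row)  # newline strings and <td> tags, interleaved
cells = [c for c in children if isinstance(c, Tag)]  # only the <td> tags

# Positions of the <td> tags among the raw children: 1, 3, 5, ...
td_positions = [i for i, c in enumerate(children) if isinstance(c, Tag)]
print(td_positions)
```

The n-th td (counting from 1) sits at index 2*n - 1, so the 9th td is at index 17. This only holds while the whitespace between cells stays exactly as shown; if the page is served minified, the offset changes.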

It would be GREATLY APPRECIATED if someone could figure out how to shorten this approach, though :)!
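
One possible way to shorten it, offered as a sketch rather than a tested drop-in replacement: select the td cells of each row directly and use the tag's get method with a default value. The function name ninth_cell_colors, its parameters, and the sample rows are hypothetical; the sketch assumes site is a BeautifulSoup object and that every data row has at least nine td cells.

```python
from bs4 import BeautifulSoup


def ninth_cell_colors(site, cell_index=8, default="N/A"):
    """Return the bgcolor of the 9th <td> in each row, or `default` if absent."""
    colors = []
    for tr in site.find_all("tr"):
        # Only direct <td> children, so nested tables are not counted.
        cells = tr.find_all("td", recursive=False)
        if len(cells) <= cell_index:
            continue  # row too short to be a data row (assumption)
        colors.append(cells[cell_index].get("bgcolor", default))
    return colors


def make_row(color=None):
    """Build a hypothetical nine-cell row; only the last cell may be colored."""
    cells = ["<td>x</td>"] * 8
    attr = f' bgcolor="{color}"' if color else ""
    cells.append(f"<td{attr}>--:--:--</td>")
    return "<tr>\n" + "\n".join(cells) + "\n</tr>"


sample = "<table>" + make_row("#55aa2a") + make_row() + "</table>"
site = BeautifulSoup(sample, "html.parser")
print(ninth_cell_colors(site))  # ['#55aa2a', 'N/A']
```

Because it indexes td tags rather than raw children, this version does not depend on the whitespace between cells. It skips rows with fewer than nine cells instead of deleting the first four rows; if the rows to skip on the real page also have nine or more cells, the result would still need to be sliced, for example with ninth_cell_colors(site)[4:].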
