BeautifulSoup: find th with text 'price', then get price from next th
Question
My HTML looks like this:
<td>
<table ..>
<tr>
<th ..>price</th>
<th>$99.99</th>
</tr>
</table>
</td>
So I am in the current table cell; how would I get the 99.99 value?
So far I have:
td[3].findChild('th')
But I need to do:
Find the th with text 'price', then get the string value of the next th tag.
Recommended answer
Think about it in "steps"... given that some x is the root of the subtree you're considering,
x.findAll(text='price')
is the list of all items in that subtree containing text 'price'. The parents of those items then of course will be:
[t.parent for t in x.findAll(text='price')]
If you only want to keep those whose name (tag) is 'th', then of course
[t.parent for t in x.findAll(text='price') if t.parent.name=='th']
and you want the "next siblings" of those (but only if they're also 'th's), so
[t.parent.nextSibling for t in x.findAll(text='price')
if t.parent.name=='th' and t.parent.nextSibling and t.parent.nextSibling.name=='th']
Here you see the problem with using a list comprehension: too much repetition, since we can't assign intermediate results to simple names. Let's therefore switch to a good old loop:
Edit: added tolerance for a string of text between the parent th and the "next sibling", as well as tolerance for the latter being a td instead, per OP's comment.
for t in x.findAll(text='price'):
    p = t.parent
    if p.name != 'th': continue
    ns = p.nextSibling
    # skip a text node (e.g. whitespace) sitting between the two cells
    if ns and not ns.name: ns = ns.nextSibling
    if not ns or ns.name not in ('td', 'th'): continue
    print(ns.string)
I've added ns.string, which will give the next sibling's contents if and only if they're just text (no further nested tags). Of course you can instead analyze further at this point, depending on your application's needs. Similarly, I imagine you won't be doing just print but something smarter, but I'm giving you the structure.
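For reference, here is a minimal runnable sketch of the same loop against the HTML from the question. It makes a few assumptions that are not in the original answer: it uses the `bs4` package (Beautiful Soup 4) with its `find_all`, `string=` and `next_sibling` spellings rather than the older `findAll`, `text=` and `nextSibling` names, it parses with the standard-library `html.parser`, and it leaves out the attributes elided as `..` in the question. The function name `values_after_label` is made up for illustration.

```python
from bs4 import BeautifulSoup, Tag

# HTML from the question, with the elided attributes ("..") left out.
HTML = """
<td>
<table>
<tr>
<th>price</th>
<th>$99.99</th>
</tr>
</table>
</td>
"""


def values_after_label(root, label="price"):
    """Collect the .string of the th/td cell following each <th>label</th>."""
    found = []
    for t in root.find_all(string=label):
        p = t.parent
        if p.name != "th":
            continue
        ns = p.next_sibling
        # Skip one text node (e.g. a newline) between the two cells.
        if ns is not None and not isinstance(ns, Tag):
            ns = ns.next_sibling
        if not isinstance(ns, Tag) or ns.name not in ("td", "th"):
            continue
        # .string is None when the cell holds more than a single text node.
        found.append(ns.string)
    return found


soup = BeautifulSoup(HTML, "html.parser")
print(values_after_label(soup))
```

The loop collects values into a list instead of printing them, so the caller decides what to do with them. If the value cell may contain nested tags, `ns.get_text()` is one possible replacement for `ns.string`.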
Talking about the structure, notice that twice I use if...: continue: this reduces nesting compared to the alternative of inverting the if's condition and indenting all the following statements in the loop, and "flat is better than nested" is one of the koans in the Zen of Python (import this at an interactive prompt to see them all and meditate ;-).
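To make the comparison concrete, the sketch below writes the same search twice, once with inverted conditions and nested blocks, once with early `continue` statements. It is only an illustration under assumptions: it uses the `bs4` package with `html.parser`, and the names `next_cell`, `prices_nested` and `prices_flat` are hypothetical helpers, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup, Tag


def next_cell(tag):
    """Return the th/td right after tag, skipping one text node, else None."""
    ns = tag.next_sibling
    if ns is not None and not isinstance(ns, Tag):
        ns = ns.next_sibling
    if isinstance(ns, Tag) and ns.name in ("td", "th"):
        return ns
    return None


def prices_nested(root):
    """Nested style: each check adds one more indentation level."""
    out = []
    for t in root.find_all(string="price"):
        p = t.parent
        if p.name == "th":
            ns = next_cell(p)
            if ns is not None:
                out.append(ns.string)
    return out


def prices_flat(root):
    """Flat style: bail out early so the main path stays at one level."""
    out = []
    for t in root.find_all(string="price"):
        p = t.parent
        if p.name != "th":
            continue
        ns = next_cell(p)
        if ns is None:
            continue
        out.append(ns.string)
    return out


soup = BeautifulSoup(
    "<table><tr><th>price</th> <td>$99.99</td></tr></table>", "html.parser"
)
print(prices_nested(soup) == prices_flat(soup))
```

Both functions return the same list; the difference is only in how deep the interesting statement ends up being indented as more conditions are added.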