BeautifulSoup: find the th with text 'price', then get the price from the next th

Question

My HTML looks like this:

<td>
   <table ..>
      <tr>
         <th ..>price</th>
         <th>$99.99</th>
      </tr>
   </table>
</td>

So I am in the current table cell. How would I get the 99.99 value?

What I have so far:

td[3].findChild('th')

But what I need to do is:

Find the th with the text 'price', then get the string value of the next th tag.

Recommended answer

Think about it in "steps": given that some x is the root of the subtree you're considering,

x.findAll(text='price')

is the list of all text nodes in that subtree whose text is exactly 'price'. The parents of those items will then of course be:

[t.parent for t in x.findAll(text='price')]

If you only want to keep those whose name (tag) is 'th', then of course:

[t.parent for t in x.findAll(text='price') if t.parent.name=='th']

and you want the "next siblings" of those (but only if they're also 'th's), so:

[t.parent.nextSibling for t in x.findAll(text='price')
 if t.parent.name=='th' and t.parent.nextSibling and t.parent.nextSibling.name=='th']
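
The comprehension above only looks at the immediate next sibling, so it is sensitive to whitespace between the two cells. The sketch below illustrates that. It assumes the bs4 package with the built-in html.parser and uses bs4's snake_case names (find_all, next_sibling, string=) in place of the older spellings; next_th_cells and the two sample strings are made up for the illustration.

```python
from bs4 import BeautifulSoup

def next_th_cells(x):
    """Hypothetical helper: the comprehension above, in bs4's snake_case spelling."""
    return [
        t.parent.next_sibling
        for t in x.find_all(string="price")
        if t.parent.name == "th"
        # getattr covers both a missing sibling (None) and a plain text node
        and getattr(t.parent.next_sibling, "name", None) == "th"
    ]

# Same row twice: once with the cells adjacent, once with a newline between them.
compact = "<table><tr><th>price</th><th>$99.99</th></tr></table>"
indented = "<table><tr>\n<th>price</th>\n<th>$99.99</th>\n</tr></table>"

compact_cells = next_th_cells(BeautifulSoup(compact, "html.parser"))
indented_cells = next_th_cells(BeautifulSoup(indented, "html.parser"))

print([c.string for c in compact_cells])  # the adjacent th is found
print(indented_cells)                     # empty: the next sibling is the "\n" text node
```

With indented markup like the snippet in the question, the immediate sibling is a text node rather than a tag, which is the case the loop that follows is meant to tolerate.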

Here you see the problem with using a list comprehension: too much repetition, since (short of the := operator added in Python 3.8) we can't assign intermediate results to simple names. Let's therefore switch to a good old loop.

Edit: added tolerance for a string of text between the parent th and the "next sibling", as well as tolerance for the latter being a td instead, per the OP's comment.

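for t in x.findAll(text='price'):
  p = t.parent
  # keep only text nodes that sit directly inside a th
  if p.name != 'th': continue
  ns = p.nextSibling
  # tolerate one text node (e.g. whitespace) between the two cells
  if ns and not ns.name: ns = ns.nextSibling
  # the following cell may be a td or a th
  if not ns or ns.name not in ('td', 'th'): continue
  print(ns.string)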
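
For readers who want to run the loop end to end, here is a minimal sketch against the markup from the question. It assumes the bs4 package and the built-in html.parser, leaves out the elided attributes (".."), and uses the snake_case names find_all, next_sibling and string= instead of findAll, nextSibling and text=. The function name values_after_label and its label parameter are not from the original answer.

```python
from bs4 import BeautifulSoup, Tag

# Markup from the question, with the elided attributes ("..") omitted.
HTML = """
<td>
   <table>
      <tr>
         <th>price</th>
         <th>$99.99</th>
      </tr>
   </table>
</td>
"""

def values_after_label(root, label="price"):
    """Hypothetical helper: collect .string of the td/th that follows each th equal to `label`."""
    results = []
    for t in root.find_all(string=label):
        p = t.parent
        if p.name != "th":
            continue
        ns = p.next_sibling
        # Tolerate one text node (typically whitespace) between the two cells.
        if ns is not None and not isinstance(ns, Tag):
            ns = ns.next_sibling
        if not isinstance(ns, Tag) or ns.name not in ("td", "th"):
            continue
        results.append(ns.string)
    return results

soup = BeautifulSoup(HTML, "html.parser")
prices = values_after_label(soup)
print(prices)
```

Collecting the values in a list instead of printing them is one way to do "something smarter" with the results; the control flow is otherwise the same as in the loop above.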

I've added ns.string, which gives the next sibling's contents if and only if they're just text (no further nested tags). Of course, you can instead analyze further at this point, depending on your application's needs. Similarly, I imagine you won't be doing just a print but something smarter; I'm only giving you the structure.
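
To make the caveat about nested tags concrete, the following sketch compares .string with get_text() on two hand-written cells. It assumes bs4 with html.parser; the two sample cells are invented for the illustration. One detail that goes slightly beyond the answer: in bs4, .string also resolves when a tag's only child is another tag that itself has a .string, so the None case arises when a cell has several children.

```python
from bs4 import BeautifulSoup

# A cell holding only text, and a cell mixing a nested tag with text.
plain = BeautifulSoup("<th>$99.99</th>", "html.parser").th
nested = BeautifulSoup("<th><b>$99</b>.99</th>", "html.parser").th

print(plain.string)       # the single text child
print(nested.string)      # None, because the cell has two children
print(nested.get_text())  # all text inside the cell, concatenated
```

If the cells in your pages may contain markup, get_text() is probably the safer call; .string is stricter and signals the unexpected case by returning None.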

Talking about the structure, notice that twice I use if...: continue. This reduces nesting compared to the alternative of inverting the if's condition and indenting all the following statements in the loop, and "flat is better than nested" is one of the koans in the Zen of Python (import this at an interactive prompt to see them all and meditate).
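
For comparison, this is roughly what the inverted, nested alternative mentioned above could look like. It is only a sketch under the same assumptions as before (bs4, html.parser, snake_case method names); nested_version and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup, Tag

def nested_version(x):
    """Hypothetical nested form: each inverted test adds one level of indentation."""
    found = []
    for t in x.find_all(string="price"):
        p = t.parent
        if p.name == "th":
            ns = p.next_sibling
            # Skip one text node between the cells, as the flat loop does.
            if ns is not None and not isinstance(ns, Tag):
                ns = ns.next_sibling
            if isinstance(ns, Tag) and ns.name in ("td", "th"):
                found.append(ns.string)
    return found

soup = BeautifulSoup(
    "<table><tr>\n<th>price</th>\n<th>$99.99</th>\n</tr></table>", "html.parser"
)
result = nested_version(soup)
print(result)
```

The logic is the same, but the statement that does the real work ends up three levels deep inside the loop, whereas the if...: continue form keeps it at the loop's top level.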
