How to split the tags from an HTML tree

Views: 154 | Posted: 2016/8/5 19:11:14 | Tags: python, beautifulsoup, lxml

Question

This is my HTML tree:

 <li class="taf"><h3><a href="26eOfferCode%3DGSONESTP-----------" id="pa1">
    Citibank <b>Credit Card</b> - Save over 5% on fuel | Citibank.co.in</a>
   </h3>Get the IndianOil Citibank <b>Card</b>. Apply Now! 
   <br />
   <a href="e%253DGOOGLE ------">Get 10X Rewards On Shopping</a> -
   <a href="S%2526eOfferCode%253DGSCCSLEX ------">Save Over 5% On Fuel</a>
   <br />
   <cite>www.citibank.co.in/<b>CreditCards</b></cite>
</li>

From this HTML I need to extract the lines before each <br> tag:

line 1: Get the IndianOil Citibank Card. Apply Now!

line 2: Get 10X Rewards On Shopping - Save Over 5% On Fuel

How should this be done in Python?

Recommended answer

I think you are asking for the text that precedes each <br/>.

The following code does this for the sample you provided by stripping out the <b> and <a> tags and printing the .tail of each element that has a following-sibling <br/>:

from lxml import etree

doc = etree.HTML("""
<li class="taf"><h3><a href="26eOfferCode%3DGSONESTP-----------" id="pa1">
    Citibank <b>Credit Card</b> - Save over 5% on fuel | Citibank.co.in</a>
   </h3>Get the IndianOil Citibank <b>Card</b>. Apply Now! 
   <br />
   <a href="e%253DGOOGLE ------">Get 10X Rewards On Shopping</a> -
   <a href="S%2526eOfferCode%253DGSCCSLEX ------">Save Over 5% On Fuel</a>
   <br />
   <cite>www.citibank.co.in/<b>CreditCards</b></cite>
</li>""")

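# Remove the <a> and <b> tags but keep their text, merging it into the surrounding text/tail
etree.strip_tags(doc, 'a', 'b')

# Select every element that has a following <br> sibling and print the text that comes after it
for element in doc.xpath('//*[following-sibling::*[name()="br"]]'):
    print(repr(element.tail.strip()))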

Output:

'Get the IndianOil Citibank Card. Apply Now!'
'Get 10X Rewards On Shopping -\n   Save Over 5% On Fuel'
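
The second result above still contains the newline and indentation from the source HTML. As a possible variation on the same idea, the sketch below walks over the <br> elements directly and collapses whitespace so that each result is a single line. The function name `lines_before_br` and the `inline_tags` parameter are made up for this illustration and are not part of lxml; it assumes that the text of interest sits directly before each <br>, once the inline tags have been stripped.

```python
from lxml import etree


def lines_before_br(html, inline_tags=("a", "b")):
    """Return the text immediately preceding each <br>, one string per <br>.

    Hypothetical helper: the name and the inline_tags parameter are
    illustrative assumptions, not an lxml API.
    """
    root = etree.HTML(html)
    # Drop the inline tags but keep their text, so it merges into the
    # .text / .tail of the neighbouring nodes.
    etree.strip_tags(root, *inline_tags)

    lines = []
    for br in root.iter("br"):
        prev = br.getprevious()
        # Text before a <br> lives in the previous sibling's .tail, or in
        # the parent's .text when the <br> is the first child.
        raw = prev.tail if prev is not None else br.getparent().text
        # Collapse runs of whitespace (including newlines) to single spaces.
        text = " ".join((raw or "").split())
        if text:
            lines.append(text)
    return lines


if __name__ == "__main__":
    sample = """
    <li class="taf"><h3><a href="#" id="pa1">Citibank <b>Credit Card</b></a>
       </h3>Get the IndianOil Citibank <b>Card</b>. Apply Now!
       <br />
       <a href="#">Get 10X Rewards On Shopping</a> -
       <a href="#">Save Over 5% On Fuel</a>
       <br />
       <cite>www.citibank.co.in/<b>CreditCards</b></cite>
    </li>"""
    for line in lines_before_br(sample):
        print(line)
```

Unlike the XPath version, this only looks at the node directly before each <br>, and it tolerates a missing .tail (which is None when there is no text), so an empty line between two consecutive <br> tags is skipped instead of raising an error.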
