BeautifulSoup - combine consecutive tags

Views: 183 Published: 2020/9/20 6:25:14 Tags: python html beautifulsoup

Problem description

I have to work with very messy HTML in which individual words are split across separate tags, as in the following example:

<b style="mso-bidi-font-weight:normal"><span style='font-size:14.0pt;mso-bidi-font-size:11.0pt;line-height:107%;font-family:"Times New Roman",serif;mso-fareast-font-family:"Times New Roman"'>I</span></b><b style="mso-bidi-font-weight:normal"><span style='font-family:"Times New Roman",serif;mso-fareast-font-family:"Times New Roman"'>NTRODUCTION</span></b>

That is hard to read, but essentially the word "INTRODUCTION" is split into

<b><span>I</span></b>

and

<b><span>NTRODUCTION</span></b>

with the same inline properties on both the span and b tags.

What is a good way to combine these? I figured I would loop through the document to find consecutive b tags, as below, but I am stuck on how to actually merge them.

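from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup(html, 'html.parser')  # html: the messy markup shown above

for b in soup.find_all('b'):
    nxt = b.next_sibling
    # next_sibling can be None (last child) or a text node, so check the type first
    if isinstance(nxt, Tag) and nxt.name == 'b':
        pass  # combine them here??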

Any ideas?

The expected output is as follows:

<b style="mso-bidi-font-weight:normal"><span style='font-family:"Times New Roman",serif;mso-fareast-font-family:"Times New Roman"'>INTRODUCTION</span></b>
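One possible way to get from the split markup to that output is sketched below. It is not taken from the original thread. It assumes that each b tag wraps exactly one span holding only text, that the two b tags are directly adjacent with no whitespace between them, and that the attributes of the second tag are the ones to keep (as in the expected output). The helper name merge_consecutive_b and the shortened style values are made up for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def merge_consecutive_b(soup):
    """Fold each <b><span>text</span></b> into an immediately following one."""
    for b in soup.find_all("b"):
        nxt = b.next_sibling
        # Only merge when the very next node is another <b> tag;
        # whitespace or other text between the tags blocks the merge.
        if not isinstance(nxt, Tag) or nxt.name != "b":
            continue
        span, nxt_span = b.find("span"), nxt.find("span")
        if span is None or nxt_span is None:
            continue
        # Assumption: keep the second tag's attributes and prepend the first tag's text.
        nxt_span.string = span.get_text() + nxt_span.get_text()
        # Remove the now redundant first tag from the tree.
        b.decompose()
    return soup


# Shortened stand-in for the markup in the question (style values abbreviated).
html = (
    '<p><b style="mso-bidi-font-weight:normal"><span style="font-size:14.0pt">I</span></b>'
    '<b style="mso-bidi-font-weight:normal"><span style="font-family:serif">NTRODUCTION</span></b></p>'
)
soup = merge_consecutive_b(BeautifulSoup(html, "html.parser"))
print(soup)
```

Because find_all returns a list collected up front, removing tags inside the loop is safe, and a run of three or more adjacent b tags collapses step by step into the last one. If the attributes of the first tag should win instead, the same idea works in reverse: append the next span's text to the current one and decompose the next tag.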
