How to remove content in nested tags with BeautifulSoup?

Problem description
How can I remove the content inside nested tags with BeautifulSoup? These posts showed the reverse, retrieving the content in nested tags: How to get contents of nested tag using BeautifulSoup, and BeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s?

I have tried .text, but it only removes the tags, not the text inside them:

>>> from bs4 import BeautifulSoup as bs
>>> html = "<foo>Something something <bar> blah blah</bar> something else</foo>"
>>> bs(html, 'html.parser').find_all('foo')[0]
<foo>Something something <bar> blah blah</bar> something else</foo>
>>> bs(html, 'html.parser').find_all('foo')[0].text
'Something something  blah blah something else'

Desired output: Something something something else
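The .text behaviour above comes from the fact that it concatenates every string underneath the tag, not just the ones sitting directly inside it. A minimal sketch that compares the two, assuming bs4 with Python's built-in html.parser (the variable names are illustrative):

```python
from bs4 import BeautifulSoup

html = "<foo>Something something <bar> blah blah</bar> something else</foo>"
foo = BeautifulSoup(html, "html.parser").find("foo")

# .children yields only the direct children: two strings and the <bar> Tag
direct = [type(child).__name__ for child in foo.children]
print(direct)  # ['NavigableString', 'Tag', 'NavigableString']

# .strings walks every descendant string, including the one inside <bar>,
# which is why .text / get_text() still contains " blah blah"
all_strings = [str(s) for s in foo.strings]
print(all_strings)  # ['Something something ', ' blah blah', ' something else']
print(foo.get_text() == "".join(all_strings))  # True
```

So the fix is to look at the direct children and keep only the strings among them.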
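Recommended answer

You can check for bs4.element.NavigableString on the children. This keeps only the text sitting directly inside <foo> and skips every nested tag together with its text:

from bs4 import BeautifulSoup as bs
import bs4

html = "<foo>Something something <bar> blah blah</bar> something <bar2>GONE!</bar2> else</foo>"

def get_only_text(elem):
    # Yield only the direct text children; nested tags (and their text) are skipped
    for item in elem.children:
        if isinstance(item, bs4.element.NavigableString):
            yield item

text = ''.join(get_only_text(bs(html, 'html.parser').find_all('foo')[0]))
# Collapse the double spaces left where the nested tags used to be
print(' '.join(text.split()))

Output:

Something something something else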
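The answer above skips the nested tags while reading the text. If you want to remove them from the parse tree itself, as the title asks, Tag.decompose() deletes a tag together with everything inside it, and find_all(string=True, recursive=False) is a shorter way to collect just the direct text children. A minimal sketch, assuming bs4 4.4 or later (where find_all accepts string=) and the built-in html.parser; option1 and option2 are illustrative names:

```python
from bs4 import BeautifulSoup

html = "<foo>Something something <bar> blah blah</bar> something <bar2>GONE!</bar2> else</foo>"

# Option 1: collect only the direct text children (the tree is not modified)
foo = BeautifulSoup(html, "html.parser").find("foo")
direct_text = foo.find_all(string=True, recursive=False)
option1 = " ".join("".join(direct_text).split())

# Option 2: delete every nested tag from the tree, then read the remaining text
foo = BeautifulSoup(html, "html.parser").find("foo")
for child in foo.find_all(True, recursive=False):
    child.decompose()  # removes the tag and everything inside it
option2 = " ".join(foo.get_text().split())

print(option1)  # Something something something else
print(option2)  # Something something something else
print(foo)      # <foo>Something something  something  else</foo>
```

decompose() destroys the removed tags; if you still need them afterwards, extract() detaches a tag from the tree and returns it instead.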