Why does BeautifulSoup .children contain nameless elements as well as the expected tag(s)
Question
#!/usr/bin/env python3
from bs4 import BeautifulSoup
test="""<!DOCTYPE html>
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
<title>Test</title>
</head>
<body>
<table>
<tbody>
<tr>
<td>
<div>
<b>
Icon
</b>
</div>
</td>
</tr>
</tbody>
</table>
</body>
</html>"""
soup = BeautifulSoup(test, 'html.parser')
rows = soup.findAll('tr')
for r in rows:
    print(r.name)
    for c in r.children:
        print('>', c.name)
Output
tr
> None
> td
> None
Why are there nameless elements in the list of the row's children?
This occurs running Python 3.3.1 64-bit on Windows 8, with html.parser
(that's Python's built-in one).
Recommended answer
The elements of .children can be NavigableStrings as well as Tags. In the case of your example, they're the whitespace before and after the td element.
This variation on your code hopefully makes it clear:
>>> rows = soup.findAll('tr')
>>> for r in rows:
...     print("row:", r.name)
...     for c in r.children:
...         print("---")
...         print(type(c))
...         print(repr(c))
...
row: tr
---
<class 'bs4.element.NavigableString'>
'\n'
---
<class 'bs4.element.Tag'>
<td>
<div>
<b>
Icon
</b>
</div>
</td>
---
<class 'bs4.element.NavigableString'>
'\n'
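If only the tag children are wanted, the whitespace strings can be skipped. The following is a minimal sketch of two possible ways to do that. The shortened sample markup and the variable names (html, row, tag_children, direct_tags) are assumptions made for illustration and are not part of the original question.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened, hypothetical markup: a row whose td is surrounded by newlines
html = "<table><tr>\n<td>Icon</td>\n</tr></table>"
soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# The raw children include the whitespace strings around the td
child_types = [type(c).__name__ for c in row.children]

# Option 1: keep only Tag instances, skipping NavigableString children
tag_children = [c for c in row.children if isinstance(c, Tag)]

# Option 2: find_all(True, recursive=False) returns only direct child tags
direct_tags = row.find_all(True, recursive=False)

print(child_types)
print([t.name for t in tag_children])
print([t.name for t in direct_tags])
```

Both options yield the same tags here. The isinstance filter keeps the loop shape of the original code, while find_all with recursive=False avoids the explicit type check.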