Difference between .string and .text in BeautifulSoup
Question
I noticed something odd while working with BeautifulSoup and couldn't find any documentation that explains it, so I wanted to ask here.
Say we have tags like these, which we have parsed with BeautifulSoup:
<td>Some Table Data</td>
<td></td>
The officially documented way to extract the data is soup.string. However, this returned None (a NoneType) for the second <td> tag. So I tried soup.text (because why not?) and it returned an empty string, exactly as I wanted.
However, I couldn't find any reference to this in the documentation and am worried that something is amiss. Can anyone let me know whether this is acceptable to use, or will it cause problems later?
BTW, I am scraping table data from a web page and mean to create CSVs from the data, so I do actually need empty strings rather than NoneTypes.
Recommended answer
.string on a Tag object returns a NavigableString object. .text, on the other hand, gets all the child strings and returns them concatenated using the given separator. The return type of .text is a unicode object (str in Python 3).
From the documentation: a NavigableString is just like a Python Unicode string, except that it also supports some of the features described in Navigating the tree and Searching the tree.
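To make that concrete, here is a small sketch (assuming Python 3 with the bs4 package installed and the built-in "html.parser" backend; the variable names are only illustrative). It checks that the value returned by .string is a NavigableString, that it still behaves like an ordinary string, and that it keeps a link back into the tree.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Parse a single cell with the standard-library parser backend.
soup = BeautifulSoup("<td>Some Table Data</td>", "html.parser")
value = soup.td.string

# The value is a NavigableString, which is also a str subclass on Python 3.
print(isinstance(value, NavigableString))
print(isinstance(value, str))

# Ordinary string methods work on it.
print(value.upper())

# Unlike a plain str, it still knows which tag it came from.
print(value.parent.name)

# Convert to a plain str if you do not want to keep the tree reference.
plain = str(value)
```

Converting with str() is worth considering if you store many of these values, since a NavigableString keeps a reference to the parsed tree.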
From the documentation on .string, we can see that if the HTML is like this:
<td>Some Table Data</td>
<td></td>
Then .string on the second td will return None, but .text will return an empty string, which is a unicode object (str in Python 3).
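A quick way to check this yourself, as a sketch assuming Python 3, the bs4 package, and the "html.parser" backend (the names html, cells, strings and texts are just for illustration):

```python
from bs4 import BeautifulSoup

html = "<td>Some Table Data</td><td></td>"
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# .string: the empty cell has no children, so its entry is None.
strings = [cell.string for cell in cells]

# .text: every entry is a string, and the empty cell gives "".
texts = [cell.text for cell in cells]

print(strings)
print(texts)
```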
For quick reference:
string
- Convenience property of a tag to get the single string within this tag.
- If the tag has a single string child, the return value is that string.
- If the tag has no children or more than one child, the return value is None.
- If the tag has exactly one child tag, the return value is the .string attribute of that child tag, recursively.
text
- Gets all the child strings and returns them concatenated using the given separator.
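The "given separator" belongs to the get_text() method; .text is the same call with the default empty separator. A minimal sketch, assuming Python 3, the bs4 package, and the "html.parser" backend (variable names are illustrative):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<td>even <p>more text</p></td>", "html.parser")
cell = soup.td

# Default separator is the empty string, so this matches cell.text.
plain = cell.get_text()

# A custom separator is placed between the child strings.
piped = cell.get_text(separator="|")

# strip=True trims whitespace from each child string before joining.
clean = cell.get_text(separator=" ", strip=True)

print(repr(plain))
print(repr(piped))
print(repr(clean))
```

For table scraping, strip=True can be handy when cells contain stray newlines or indentation from the page source.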
If the HTML is like this:
<td>some text</td>
<td></td>
<td><p>more text</p></td>
<td>even <p>more text</p></td>
.string on the four td tags will return:
some text
None
more text
None
.text will give this result (the second line is the empty string returned for the empty td):
some text

more text
even more text
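Since the goal in the question is a CSV, here is a rough sketch of the four cells above going through both attributes and then into a CSV row. It assumes Python 3, the bs4 package, and the "html.parser" backend; the in-memory buffer stands in for a real output file and all variable names are illustrative.

```python
import csv
import io

from bs4 import BeautifulSoup

html = (
    "<td>some text</td>"
    "<td></td>"
    "<td><p>more text</p></td>"
    "<td>even <p>more text</p></td>"
)
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# .string gives None for the empty cell and for the cell with two children.
string_values = [cell.string for cell in cells]

# .text gives a string for every cell, so no None checks are needed.
text_values = [cell.text for cell in cells]

# Write the .text values as a single CSV row into an in-memory buffer.
buffer = io.StringIO()
csv.writer(buffer).writerow(text_values)
csv_line = buffer.getvalue()

print(string_values)
print(text_values)
print(csv_line)
```

Note that with .string the fourth cell would lose its content entirely, which is a stronger reason to prefer .text here than the empty cell alone.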