Difference between .string and .text in BeautifulSoup


Question

I noticed something odd when working with BeautifulSoup and couldn't find any documentation to support it, so I wanted to ask here.

Say we have tags like these that we have parsed with BS:

<td>Some Table Data</td>
<td></td>

The officially documented way to extract the data is soup.string. However, this returned None (a NoneType) for the second <td> tag. So I tried soup.text (because why not?) and it returned an empty string, exactly as I wanted.

However, I couldn't find any reference to this in the documentation and am worried that something is amiss. Can anyone tell me whether this is acceptable to use, or whether it will cause problems later?

BTW, I am scraping table data from a web page and mean to create CSVs from the data, so I really do need empty strings rather than None values.

Answer

.string on a Tag object returns a NavigableString object. .text, on the other hand, gets all the child strings and returns them concatenated using the given separator. The return type of .text is a Unicode string (unicode in Python 2, str in Python 3).
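To make "the given separator" concrete, here is a minimal sketch. It assumes Beautiful Soup 4 with the built-in "html.parser" backend, and the variable names are only illustrative. .text uses an empty separator, while get_text() lets you pass your own:

```python
from bs4 import BeautifulSoup

# Assumption: bs4 is installed; "html.parser" ships with Python.
soup = BeautifulSoup("<td>even <p>more text</p></td>", "html.parser")
td = soup.td

# .text joins every child string with an empty separator.
default_text = td.text

# get_text() accepts an explicit separator between the child strings.
piped_text = td.get_text(separator="|")

# strip=True strips each child string before joining.
stripped_text = td.get_text(separator=" ", strip=True)

print(repr(default_text))
print(repr(piped_text))
print(repr(stripped_text))
```

If you only need the plain concatenation, .text and get_text() with no arguments should be interchangeable.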

From the documentation: a NavigableString is just like a Python Unicode string, except that it also supports some of the features described in Navigating the tree and Searching the tree.

From the documentation on .string, we can see that if the HTML is like this,

<td>Some Table Data</td>
<td></td>

then .string on the second td will return None, but .text will return an empty string, which is a Unicode string object.
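A quick way to check this yourself is the sketch below. It assumes Beautiful Soup 4 with the "html.parser" backend, and wraps the two cells in a table row only so that the markup is well formed:

```python
from bs4 import BeautifulSoup

# Assumption: bs4 is installed; the table/tr wrapper is added for well-formedness.
html = "<table><tr><td>Some Table Data</td><td></td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("td")

first_string = first.string    # a NavigableString holding the cell text
second_string = second.string  # None, because the tag has no children

first_text = first.text        # a plain string with the cell text
second_text = second.text      # an empty string, never None

print(repr(first_string), repr(second_string))
print(repr(first_text), repr(second_text))
```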

For convenience, here is a summary of the two:

string


  • Convenience property of a tag to get the single string within this tag.
  • If the tag has a single string child, the return value is that string.
  • If the tag has no children or more than one child, the return value is None.
  • If the tag has one child tag, the return value is the .string attribute of that child tag, recursively.

text

  • Gets all the child strings and returns them concatenated using the given separator.

If the HTML is like this:

<td>some text</td>
<td></td>
<td><p>more text</p></td>
<td>even <p>more text</p></td>

then .string on the four td tags returns:

some text
None
more text
None

and .text gives (the second line is the empty string):

some text

more text
even more text
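Since the goal in the question is a CSV, here is a rough sketch of how the four cells above could be turned into one CSV row. It assumes Beautiful Soup 4 with "html.parser"; the in-memory buffer stands in for a real output file and is only for illustration:

```python
import csv
import io

from bs4 import BeautifulSoup

# Assumption: bs4 is installed; the cells are wrapped in a table row for well-formedness.
html = (
    "<table><tr>"
    "<td>some text</td>"
    "<td></td>"
    "<td><p>more text</p></td>"
    "<td>even <p>more text</p></td>"
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# .string is None for the empty cell and for the cell with two children.
string_values = [td.string for td in cells]

# get_text() (what .text calls) always returns a string, so it is safe for CSV.
text_values = [td.get_text() for td in cells]

# Write one row to an in-memory buffer instead of a file.
buffer = io.StringIO()
csv.writer(buffer).writerow(text_values)
csv_line = buffer.getvalue()

print(string_values)
print(text_values)
print(repr(csv_line))
```

So for scraping table cells into a CSV, .text (or get_text()) looks like the safer choice, since every cell yields a string.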
      
