Python Beautiful Soup .contents Property
Question
What does BeautifulSoup's .contents do? I am working through the crummy.com tutorial and I don't really understand what .contents does. I have looked through the forums and have not found an answer. Consider the code below:
from BeautifulSoup import BeautifulSoup
import re
doc = ['<html><head><title>Page title</title></head>',
'<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
'<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
'</html>']
soup = BeautifulSoup(''.join(doc))
print soup.contents[0].contents[0].contents[0].contents[0].name
I would expect the last line of the code to print 'body', but instead I get:
File "pe_ratio.py", line 29, in <module>
print soup.contents[0].contents[0].contents[0].contents[0].name
File "C:\Python27\lib\BeautifulSoup.py", line 473, in __getattr__
raise AttributeError, "'%s' object has no attribute '%s'" % (self.__class__.__name__, attr)
AttributeError: 'NavigableString' object has no attribute 'name'
Is .contents only concerned with html, head and title? If so, why is that?
Thanks in advance for your help.
Recommended answer
It just gives you what is inside the tag, as a list of that tag's direct children. Let me demonstrate with an example:
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
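from bs4 import BeautifulSoup
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html_doc, "html.parser")
head = soup.head
# .contents is the list of the tag's direct children
print(head.contents)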
The above code gives me a list, [<title>The Dormouse's story</title>], because that is what is inside the head tag. Indexing it with [0] gives you the first item in the list.
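The list in .contents can mix tags and plain text. Here is a minimal sketch of that, assuming Python 3 with the bs4 package and its built-in html.parser; the shortened snippet string is made up for illustration and is not part of the original answer:

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened paragraph used only for this illustration
snippet = '<p class="story">Once upon a time <a id="link1">Elsie</a> and <a id="link2">Lacie</a>.</p>'
soup = BeautifulSoup(snippet, "html.parser")

# Direct children of <p>: text nodes and <a> tags, in document order
children = soup.p.contents
kinds = [type(child).__name__ for child in children]

print(children)
print(kinds)
```

Text between tags shows up as NavigableString items, while each a element shows up as a Tag.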
The reason you get an error is that soup.contents[0].contents[0].contents[0].contents[0] returns a NavigableString, which is a piece of text rather than a tag, so it has no name attribute (this is exactly what the AttributeError in your traceback says). In your code it returns the text Page title: the first contents[0] gives you the html tag, the second gives you the head tag, the third leads to the title tag, and the fourth gives you the text inside the title. So when you ask for name on it, there is no tag left to give you one.
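To watch that chain one step at a time, here is a rough sketch. It assumes Python 3 with bs4 and html.parser rather than the BeautifulSoup 3 import used in the question, and it checks the node type instead of reading name on the text node:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']
soup = BeautifulSoup(''.join(doc), 'html.parser')

node = soup
steps = []
for _ in range(4):
    # Descend into the first child, as each contents[0] in the question does
    node = node.contents[0]
    # Only Tag objects carry a tag name; for text, show the text itself
    label = node.name if isinstance(node, Tag) else repr(str(node))
    steps.append((type(node).__name__, label))
    print(type(node).__name__, label)
```

The first three steps land on the html, head and title tags; the fourth lands on the text node, which is where the question's code fails.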
If you want the body printed, you can do the following:
soup = BeautifulSoup(''.join(doc))
print soup.body
If you want to reach body using contents only, then use the following:
soup = BeautifulSoup(''.join(doc))
print soup.contents[0].contents[1].name
You will not get it using [0] as the index, because body is the second child of html, after head.
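A hard-coded index only works when the markup has no whitespace between tags, because bs4 also stores that whitespace as children in .contents. Here is a sketch that picks body out of html's children by name instead, assuming Python 3 with bs4 and html.parser; the html string is a made-up variant of the question's document with newlines added:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical variant of the question's markup, with newlines between tags
html = "<html>\n<head><title>Page title</title></head>\n<body><p>Hi</p></body>\n</html>"
soup = BeautifulSoup(html, "html.parser")

children = soup.html.contents
# Keep only real tags; the newline text nodes are skipped
tag_names = [child.name for child in children if isinstance(child, Tag)]
# Select the child whose tag name is "body", wherever it sits in the list
body = next(child for child in children
            if isinstance(child, Tag) and child.name == "body")

print(tag_names)
print(body.name)
```

Here children[0] is a newline string, so contents[1] would be head rather than body; selecting by name avoids that shift.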