How to get inner text value of an HTML tag with BeautifulSoup bs4?
Question
When using BeautifulSoup (bs4), how do I get the text inside an HTML tag? When I run this line:
oname = soup.find("title")
I get the title tag like this:
<title>page name</title>
Now I want to get only its inner text, page name, without the tags. How do I do that?
Recommended answer
Use .text to get the text from the tag.
oname = soup.find("title")
oname.text
Or simply use soup.title.text:
In [4]: from bs4 import BeautifulSoup
In [5]: import requests
In [6]: r = requests.get("http://stackoverflow.com/questions/27934387/how-to-retrieve-information-inside-a-tag-with-python/27934403#27934387")
In [7]: BeautifulSoup(r.content, "html.parser").title.text
Out[7]: u'html - How to Retrieve information inside a tag with python - Stack Overflow'
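For a check that needs no network access, the same idea can be sketched on an inline HTML string. The sample markup and the variable names below are made up for illustration. get_text() and .string are related bs4 accessors, shown here as alternatives rather than as part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for a downloaded page.
html = (
    "<html><head><title>page name</title></head>"
    "<body><p>Hello <b>world</b></p></body></html>"
)

# Naming the parser explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup(html, "html.parser")

oname = soup.find("title")
title_text = oname.text             # inner text of the tag, as a str
same_text = soup.title.get_text()   # .text is a shorthand for get_text()
title_string = soup.title.string    # available when the tag has a single text child

# On a tag with nested children, .text joins the text of all descendants.
paragraph_text = soup.p.text

print(title_text)
print(paragraph_text)
```

Note that soup.find("title") returns None when the page has no title tag, so real code would likely check for that before reading .text.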
To open a file and use the text as its name, simply use it as you would any other string:
with open(oname.text, 'w') as f:
    pass  # write the file contents to f here
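Page titles often contain characters that are not valid in file names on some systems (for example "/" or ":"), so a slightly more defensive version may be useful. The following is only a sketch: safe_filename is a hypothetical helper, not part of bs4, and the sample markup, the ".txt" extension, and the temporary output directory are assumptions made for the example.

```python
import os
import re
import tempfile

from bs4 import BeautifulSoup


def safe_filename(text):
    """Hypothetical helper: replace characters that are unsafe in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", text).strip()
    return cleaned or "untitled"


# Made-up sample markup whose title contains unsafe characters.
html = "<title>page name: a/b test</title>"
soup = BeautifulSoup(html, "html.parser")
oname = soup.find("title")

# Build the file name from the tag's inner text.
filename = safe_filename(oname.text) + ".txt"

# Write into a temporary directory so the example leaves the working tree alone.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, filename)
with open(path, "w", encoding="utf-8") as f:
    f.write(oname.text)

print(path)
```

The set of characters replaced here is a conservative guess aimed at common file systems; adjust it to whatever the target platform actually requires.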