Extract title with BeautifulSoup
Question
I have this:
from urllib import request
url = "http://www.bbc.co.uk/news/election-us-2016-35791008"
html = request.urlopen(url).read().decode('utf8')
html[:60]
from bs4 import BeautifulSoup
raw = BeautifulSoup(html, 'html.parser').get_text()
raw.find_all('title', limit=1)
print (raw.find_all("title"))
'<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN'
I want to extract the title of the page using BeautifulSoup, but I get this error:
Traceback (most recent call last):
File "C:\Users\Passanova\AppData\Local\Programs\Python\Python35-32\test.py", line 8, in <module>
raw.find_all('title', limit=1)
AttributeError: 'str' object has no attribute 'find_all'
Any suggestions, please?
Recommended answer
To navigate the soup, you need a BeautifulSoup object, not a string. get_text() returns a plain str, which has no find_all method, so remove the get_text() call on the soup.
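To see the difference without hitting the network, here is a minimal sketch. The inline HTML string is a made-up stand-in for the downloaded page, not the real BBC markup.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the downloaded page
html = "<html><head><title>Example page</title></head><body><p>Hello</p></body></html>"

soup = BeautifulSoup(html, "html.parser")  # navigable BeautifulSoup object
text = soup.get_text()                     # plain str with the markup stripped

print(type(soup).__name__)        # BeautifulSoup
print(type(text).__name__)        # str
print(hasattr(text, "find_all"))  # False: this is why the AttributeError appears
print(soup.find("title").string)  # Example page
```

Searching has to happen on soup; call get_text() only afterwards, if you still need the page text.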
Moreover, you can replace raw.find_all('title', limit=1) with find('title'). It finds the same tag, but returns it directly instead of wrapped in a one-element list.
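A small sketch of how the two calls compare, again using a made-up HTML string rather than the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML, used only for illustration
html = "<html><head><title>Example page</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

matches = soup.find_all("title", limit=1)  # list-like result with at most one Tag
first = soup.find("title")                 # the Tag itself, or None if nothing matches

print(matches)  # [<title>Example page</title>]
print(first)    # <title>Example page</title>

# When the tag is missing, find_all gives an empty result and find gives None
print(len(soup.find_all("h1", limit=1)))  # 0
print(soup.find("h1"))                    # None
```

Since find can return None, it is worth checking the result before reading .string on pages that might lack a title.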
Try this:
from urllib import request
url = "http://www.bbc.co.uk/news/election-us-2016-35791008"
html = request.urlopen(url).read().decode('utf8')
html[:60]
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
title = soup.find('title')
print(title) # Prints the tag
print(title.string) # Prints the tag string content