How to get page title in requests

Question

What would be the simplest way to get the title of a page in Requests?

import requests

r = requests.get('http://www.imdb.com/title/tt0108778/')
# Wanted: something like r.title (no such attribute exists) that would give:
# Friends (TV Series 1994–2004) - IMDb

Recommended answer

Requests only fetches the response; it does not parse HTML. You need an HTML parser to parse the response body and get the text of the title tag.

Example using lxml.html:

>>> import requests
>>> from lxml.html import fromstring
>>> r = requests.get('http://www.imdb.com/title/tt0108778/')
>>> tree = fromstring(r.content)
>>> tree.findtext('.//title')
u'Friends (TV Series 1994\u20132004) - IMDb'
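
If installing lxml is not an option, the same idea can be sketched with the html.parser module from the Python 3 standard library. This is only a minimal sketch: TitleParser and extract_title are hypothetical names introduced for illustration, not part of Requests or lxml, and the sample string stands in for a downloaded page so the snippet runs without network access.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the first <title> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_title = False   # True while we are between <title> and </title>
        self._done = False       # True once the first title has been closed
        self._parts = []         # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Return None rather than an empty string when no title was found.
        text = "".join(self._parts).strip()
        return text or None


def extract_title(html):
    """Hypothetical helper: return the page title, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


# Stand-in for the body of a real response (assumption: no network call here).
sample = (
    "<html><head><title>Friends (TV Series 1994\u20132004) - IMDb</title>"
    "</head><body></body></html>"
)
print(extract_title(sample))
```

With a live response, r.text (the decoded body) would be passed in place of sample.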

There are other options as well, such as the mechanize library:

>>> import mechanize
>>> br = mechanize.Browser()
>>> br.open('http://www.imdb.com/title/tt0108778/')
>>> br.title()
'Friends (TV Series 1994\xe2\x80\x932004) - IMDb'

Which option to choose depends on what you are going to do next: parse the page to get more data, or perhaps interact with it (click buttons, submit forms, follow links, and so on).
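
As a rough illustration of the "parse the page to get more data" path, the sketch below collects link targets from a page using only the standard-library html.parser module. LinkCollector and collect_links are hypothetical names, and the sample markup and its paths are made up for the example; they are not taken from IMDb.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Gathers the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; anchors without href are skipped.
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def collect_links(html):
    """Hypothetical helper: return all href values in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Made-up markup standing in for a downloaded page.
sample = (
    '<html><body><a href="/episodes">Episodes</a> '
    '<a name="top">Top</a> <a href="/cast">Cast</a></body></html>'
)
print(collect_links(sample))  # ['/episodes', '/cast']
```

For interaction such as submitting forms, a browser-like library such as mechanize remains the better fit; a plain parser only reads the markup.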

Besides, you may want to use an API provided by IMDb instead of resorting to HTML parsing; see:

  • Does IMDB provide an API?
  • IMDbPY

Example usage of the IMDbPY package:

>>> from imdb import IMDb
>>> ia = IMDb()
>>> movie = ia.get_movie('0108778')
>>> movie['title']
u'Friends'
>>> movie['series years']
u'1994-2004'
