Extract Meta Keywords From Webpage?

Posted 2020/4/26 9:35:50 · Tags: python, extract, webpage, keyword, urllib


Question

I need to extract the meta keywords from a web page using Python. I was thinking this could be done with urllib or urllib2, but I'm not sure. Does anyone have any ideas?

I am using Python 2.6 on Windows XP.
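A note on the urllib part of the question: `urllib` and `urllib2` only download the page. They do not parse HTML, so a parser (lxml, BeautifulSoup, `html.parser`) or a regex is still needed to find the meta tag. In Python 3 the two modules were merged into `urllib.request`, and `read()` returns bytes rather than text. Below is a minimal Python 3 sketch of the download step; `fetch_html` is a hypothetical name, and falling back to UTF-8 when the server sends no charset is an assumption.

```python
from urllib.request import urlopen


def fetch_html(url, default_charset="utf-8"):
    """Download url and return the body as text (hypothetical helper).

    urllib.request plays the role of Python 2's urllib/urllib2 here.
    read() returns bytes, so we decode them with the charset from the
    Content-Type header, falling back to default_charset (an assumption)
    when the server does not declare one.
    """
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or default_charset
        # errors="replace" keeps going on badly encoded pages instead of raising
        return resp.read().decode(charset, errors="replace")
```

For example, `html = fetch_html("http://www.w3schools.com/XPath/xpath_syntax.asp")`. The pages used in the answer may have changed since it was written.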

Recommended Answer

lxml is faster than BeautifulSoup (I think) and has much richer functionality, while remaining relatively easy to use. Example:

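# Python 2 (the asker's environment); needs the third-party lxml package
from urllib import urlopen
from lxml import etree

f = urlopen("http://www.google.com").read()
tree = etree.HTML(f)        # lenient HTML parser, copes with real-world markup
m = tree.xpath("//meta")    # every <meta> element in the document

for i in m:
    print etree.tostring(i)

# Output at the time of the answer:
# <meta http-equiv="content-type" content="text/html; charset=ISO-8859-2"/>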
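If installing lxml is not an option, Python's built-in `html.parser` can do the same `//meta` sweep. A rough Python 3 sketch (the names `MetaCollector` and `meta_tags` are hypothetical): it takes care of attribute order and quoting for you, but offers nothing like XPath's query language.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Stdlib stand-in for the XPath query //meta (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute *names*; values keep their case.
        # A self-closing <meta ... /> also lands here, because the default
        # handle_startendtag calls handle_starttag.
        if tag == "meta":
            self.metas.append(dict(attrs))


def meta_tags(html):
    """Return one attribute dict per <meta> tag, in document order."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.metas
```

To pull out just the keywords, filter on `name` and lowercase it first, e.g. `[m.get("content") for m in meta_tags(html) if (m.get("name") or "").lower() == "keywords"]`.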

Another lxml example, this time reading the keywords themselves:

# Python 2; reuses the urlopen and etree imports from the previous example
f = urlopen("http://www.w3schools.com/XPath/xpath_syntax.asp").read()
tree = etree.HTML(f)
# The attribute test is case-sensitive: this matches name="Keywords" only,
# and [0] raises IndexError if the page has no such tag.
print tree.xpath("//meta[@name='Keywords']")[0].get("content")

# Output at the time of the answer:
# xml,tutorial,html,dhtml,css,xsl,xhtml,javascript,asp,ado,vbscript,dom,sql,colors,soap,php,authoring,programming,training,learning,beginner's guide,primer,lessons,school,howto,reference,examples,samples,source code,tags,demos,tips,links,FAQ,tag list,forms,frames,color table,w3c,cascading style sheets,active server pages,dynamic html,internet,database,development,Web building,Webmaster,html guide
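The `content` value comes back as one comma-separated string. A small Python 3 sketch (the helper name `split_keywords` is hypothetical) that turns it into a list, trimming whitespace and keeping multi-word entries such as "beginner's guide" intact:

```python
def split_keywords(content):
    """Turn a meta keywords string like "xml,tutorial, html" into a list.

    Splits on commas, strips surrounding whitespace and drops empty
    entries; returns [] when content is None or empty.
    """
    if not content:
        return []
    return [kw.strip() for kw in content.split(",") if kw.strip()]
```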

BTW: XPath is worth knowing.
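As a quick taste of that XPath syntax without lxml: Python's built-in `xml.etree.ElementTree` supports a limited XPath subset, including `.//tag` and `[@attr='value']` predicates. It only accepts well-formed XML, so real-world HTML usually needs a lenient parser such as lxml's `etree.HTML`. A sketch on a made-up, well-formed snippet:

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed XHTML-like snippet (no xmlns, so tags are not namespaced).
XHTML = """<html>
  <head>
    <meta name="Keywords" content="xml,tutorial,html"/>
    <meta name="Description" content="XPath syntax"/>
  </head>
  <body/>
</html>"""

root = ET.fromstring(XHTML)

# './/meta' searches all descendants, like '//meta' in the lxml example.
all_meta = root.findall(".//meta")

# '[@name='Keywords']' filters on an attribute value; XPath functions such as
# translate() or contains() are not supported by ElementTree.
keywords = root.find(".//meta[@name='Keywords']").get("content")
```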

Alternatively, you can just use a regexp:

# Python 2, standard library only
import re
from urllib import urlopen

f = urlopen("http://www.w3schools.com/XPath/xpath_syntax.asp").read()
print re.search("<meta name=\"Keywords\".*?content=\"([^\"]*)\"", f).group(1)
# Output at the time of the answer:
# xml,tutorial,html,dhtml,css,xsl,xhtml,javascript,asp,ado,vbscript,dom,sql, ...etc...

...but I find it less readable and more error-prone (though it uses only the standard library and still fits on one line).
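To make that fragility concrete, here is a Python 3 sketch comparing the answer's pattern with a somewhat more tolerant one. The names `STRICT`, `TOLERANT` and `keywords_by_regex` are hypothetical, and even the tolerant pattern still assumes `name` appears before `content`.

```python
import re

# The answer's pattern, ported to Python 3 str: it only matches when name comes
# before content, both use double quotes, and the name is spelled "Keywords".
STRICT = re.compile(r'<meta name="Keywords".*?content="([^"]*)"')

# A somewhat more tolerant (still fragile) variant: case-insensitive, single or
# double quotes, extra attributes and whitespace allowed. The backreferences
# \1 and \2 make each value end at the same kind of quote that opened it, so an
# apostrophe inside a double-quoted value ("beginner's guide") is kept.
TOLERANT = re.compile(
    r"""<meta\b[^>]*?\bname\s*=\s*(["'])keywords\1[^>]*?\bcontent\s*=\s*(["'])(.*?)\2""",
    re.IGNORECASE | re.DOTALL,
)


def keywords_by_regex(html):
    """Return the keywords content string, or None if TOLERANT finds nothing."""
    match = TOLERANT.search(html)
    return match.group(3) if match else None
```

A parser-based approach avoids these attribute-order and quoting assumptions entirely, which is the trade-off the answer describes.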
