How to get an HTML file using Python?

Views: 778 Published: 2018/6/15 10:25:57 Tags: python html webclient

Question

I am not very familiar with Python. I am trying to extract the artist names (for a start :)) from the following page: http://www.infolanka.com/miyuru_gee/art/art.html.

How do I retrieve the page? My two main concerns are which functions to use and how to filter out useless links from the page.

Solution

Example using urllib and lxml.html:

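# Python 3: urlopen lives in urllib.request (Python 2 used urllib.urlopen)
from urllib.request import urlopen
from lxml import html

url = "http://www.infolanka.com/miyuru_gee/art/art.html"
# Download the raw bytes and parse them into an element tree
page = html.fromstring(urlopen(url).read())

# Print the text and target of every <a> element on the page
for link in page.xpath("//a"):
    print("Name", link.text, "URL", link.get("href"))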

Output (as name, href pairs):
    [('Aathma Liyanage', 'athma.html'),
     ('Abewardhana Balasuriya', 'abewardhana.html'),
     ('Aelian Thilakeratne', 'aelian_thi.html'),
     ('Ahamed Mohideen', 'ahamed.html'),
    ]
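
The question also asks how to filter out useless links, which the example above does not cover. The following is only a minimal sketch of one possible approach, not part of the original answer. It swaps lxml for the standard library's html.parser so that it runs without extra dependencies, and it parses a made-up SAMPLE_HTML string instead of the live page. The names LinkCollector and is_artist_link, and the filtering rule itself (keep named links to .html pages in the same directory), are assumptions for illustration; the real page may need a different rule.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical markup standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
<a href="../index.html">Home</a>
<a href="athma.html">Aathma Liyanage</a>
<a href="abewardhana.html">Abewardhana Balasuriya</a>
<a href="http://example.com/ad">Sponsor</a>
<a name="top"></a>
</body></html>
"""

BASE_URL = "http://www.infolanka.com/miyuru_gee/art/art.html"


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def is_artist_link(name, href):
    """Assumed rule: a named link to an .html page in the same directory."""
    return bool(name) and "/" not in href and href.endswith(".html")


collector = LinkCollector()
collector.feed(SAMPLE_HTML)

# Keep only the links that pass the filter and make their URLs absolute.
artists = [(name, urljoin(BASE_URL, href))
           for name, href in collector.links
           if is_artist_link(name, href)]

for name, url in artists:
    print(name, url)
```

With lxml, the same filter could be applied inside the loop to link.text and link.get("href"), and urljoin would turn relative targets such as 'athma.html' into full URLs in the same way.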
