Scraping with Python?


Question

I'd like to grab all the index words and their definitions from here. Is it possible to scrape web content with Python?

Exploring with Firebug shows that the following URL returns the content I want, including both the index entry and its definition for 'a'.

http://pali.hum.ku.dk/cgi-bin/cpd/pali?acti=xart&arid=14179&sphra=undefined

Which modules should I use? Is there a tutorial available?

I do not know how many words are indexed in the dictionary. I'm an absolute beginner in programming.

Recommended answer

You should use urllib2 for getting the URL contents and BeautifulSoup for parsing the HTML/XML.
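Note that urllib2 exists only on Python 2; on Python 3 its functionality lives in urllib.request. As a minimal sketch of the fetching step, assuming Python 3 and assuming that the arid parameter in the URL from the question selects one dictionary article (the question does not confirm this), the helper names build_article_url and fetch below are hypothetical:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address taken from the URL quoted in the question.
BASE_URL = "http://pali.hum.ku.dk/cgi-bin/cpd/pali"


def build_article_url(arid):
    """Build the query URL for one article id.

    Assumption: 'arid' identifies a single dictionary article; the other
    parameters are copied unchanged from the URL in the question.
    """
    query = urlencode({"acti": "xart", "arid": arid, "sphra": "undefined"})
    return BASE_URL + "?" + query


def fetch(url, timeout=10):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch(build_article_url(14179)) would request the page mentioned in the question. Which other article ids are valid is not known from the question, so any loop over ids would need to be checked against the site first.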

Example - retrieving all questions from the StackOverflow.com main page:

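# Python 2 only: urllib2 and the BeautifulSoup 3 package
import urllib2
from BeautifulSoup import BeautifulSoup

# Fetch the page and parse it
page = urllib2.urlopen("http://stackoverflow.com")
soup = BeautifulSoup(page)

# soup('h3') is shorthand for soup.findAll('h3')
for incident in soup('h3'):
    print [i.decode('utf8') for i in incident.contents]
    print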

This code sample is adapted from the BeautifulSoup documentation.
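For readers on Python 3, the same idea (collect the text of every h3 element) can be sketched with only the standard library's html.parser. This is a swapped-in technique rather than the BeautifulSoup approach described above, and the names HeadingCollector and extract_headings are hypothetical. With the third-party bs4 package, BeautifulSoup(html, "html.parser").find_all("h3") plays the same role as soup('h3').

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text found inside each <h3> element."""

    def __init__(self):
        super().__init__()
        self.headings = []    # finished heading texts, in document order
        self._in_h3 = False   # True while the parser is inside an <h3>
        self._parts = []      # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_h3:
            self._in_h3 = False
            self.headings.append("".join(self._parts).strip())

    def handle_data(self, data):
        # Text in nested tags (for example a link) also arrives here.
        if self._in_h3:
            self._parts.append(data)


def extract_headings(html):
    """Return the text of every <h3> element in an HTML string."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings
```

The function takes an HTML string, so it can be tried on a small literal before being pointed at a downloaded page.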
