Web scraping data using Python?

Views: 207 | Published: 2016/8/5 19:06:51 | Tags: python html web-scraping beautifulsoup


Question

I just started learning web scraping using Python. However, I've already run into some problems.

My goal is to scrape the names of the different tuna species from fishbase.org (http://www.fishbase.org/ComNames/CommonNameSearchList.php?CommonName=salmon)

The problem: I'm unable to extract all of the species names.

This is what I have so far:

import urllib2
from bs4 import BeautifulSoup

fish_url = 'http://www.fishbase.org/ComNames/CommonNameSearchList.php?CommonName=Tuna'
page = urllib2.urlopen(fish_url)

soup = BeautifulSoup(page)

spans = soup.find_all(

From here, I don't know how to go about extracting the species names. I've thought of using a regex (i.e. soup.find_all("a", text=re.compile("\d+\s+\d+"))) to capture the text inside the tags...

Any input will be highly appreciated!

Recommended answer

What jozek suggests is the correct approach, but I couldn't get his snippet to work (though that may be because I'm not running the BeautifulSoup 4 beta). What worked for me was:

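# Python 2 with BeautifulSoup 3 (the older, pre-bs4 package)
import urllib2
from BeautifulSoup import BeautifulSoup

fish_url = 'http://www.fishbase.org/ComNames/CommonNameSearchList.php?CommonName=Tuna'
page = urllib2.urlopen(fish_url)

# Parse the HTTP response (a file-like object) into a document tree
soup = BeautifulSoup(page)

# Collect the text of every <i> tag inside the first table on the page
scientific_names = [it.text for it in soup.table.findAll('i')]

print scientific_names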
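
For readers on Python 3, the same idea can be sketched with the `bs4` package. This is only an illustration, not the answerer's code: `SAMPLE_HTML` is a made-up stand-in for the FishBase results page, and it assumes (as the snippet above implies) that the scientific names sit in `<i>` tags inside the first table. The helper name `scientific_names` is also hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the FishBase results page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Common Name</th><th>Scientific Name</th></tr>
  <tr><td><a href="#">Albacore tuna</a></td>
      <td><a href="#"><i>Thunnus alalunga</i></a></td></tr>
  <tr><td><a href="#">Yellowfin tuna</a></td>
      <td><a href="#"><i>Thunnus albacares</i></a></td></tr>
</table>
</body></html>
"""


def scientific_names(html):
    """Return the text of every <i> tag inside the first <table> of html."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        # No table on the page: nothing to extract.
        return []
    return [tag.get_text(strip=True) for tag in table.find_all("i")]


names = scientific_names(SAMPLE_HTML)
print(names)
```

To try it against the live page, the HTML could be fetched with `urllib.request.urlopen(fish_url).read()` and passed to `scientific_names`, assuming the site still serves a similar layout.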
