Data scraping with Scrapy

Question

I want to make a new betting tool, but I need a database of odds and results and can't find anything on the web. I found this site, which has a great archive: OddsPortal

All I want to do is scrape the results and the odds from a page like the one above. I found a tool called Scrapy that can supposedly do it. Is that true? Can someone help me with some hints?

Recommended answer

I don't know about Scrapy, but jsoup should help you get started.

http://jsoup.org/

Download the .jar file and add it to your project (the menu names here are Eclipse's): right-click your project folder > Properties > Java Build Path > Libraries > Add External JARs, then find the jar and select it.

It's a nice little HTML parser.

Here is an example.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HtmlParser {

    public static void main(String[] args) throws IOException {
        String url = "http://stackoverflow.com/questions/16794913/data-scraping-with-scrapy";
        // Fetch the page and parse it into a DOM.
        Document document = Jsoup.connect(url).get();

        // CSS selector: elements with class "postcell" inside an element with class "question".
        String question = document.select(".question .postcell").text();

        System.out.println(question);
    }
}

This will print your question :P

Right-click this web page and choose Inspect Element.

Then find the element you want and put its class name (or ID) in the document.select() call, in this case ".question .postcell", which follows the pattern .parentClass .childClass.
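To make the selector pattern concrete, here is a minimal sketch that runs the same kind of query against an inline HTML string instead of a live page. The markup, the class name SelectorDemo, and the helper selectText are made up for illustration; only Jsoup.parse, select, and text come from jsoup, and it assumes the jsoup jar is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class SelectorDemo {

    // Hypothetical helper: returns the combined text of every element
    // that matches the given CSS query.
    static String selectText(String html, String cssQuery) {
        Document document = Jsoup.parse(html);
        return document.select(cssQuery).text();
    }

    public static void main(String[] args) {
        // Made-up markup, standing in for what Inspect Element shows you.
        String html =
            "<div class=\"question\"><div class=\"postcell\">Can Scrapy do this?</div></div>"
          + "<table id=\"results\"><tr><td>AC Milan - Brescia</td></tr></table>";

        // Class selector: .parentClass .childClass
        System.out.println(selectText(html, ".question .postcell"));

        // ID selector: #id followed by a descendant tag
        System.out.println(selectText(html, "#results td"));
    }
}
```

A leading dot selects by class and a leading # selects by ID; a space between two parts means "anywhere inside".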

If you need extra help, there is a guide on the jsoup website.

Hope this helps!

Edit:

I was bored, so I threw together a little something that fetches all the Italian Serie A soccer scores from the 2003-2004 season. Enjoy :D Link: http://www.oddsportal.com/soccer/italy/serie-a-2003-2004/results/

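import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class HtmlParser {

    public static void main(String[] args) throws IOException {
        String url = "http://www.oddsportal.com/soccer/italy/serie-a-2003-2004/results/";
        Document document = Jsoup.connect(url).get();

        // Select every row of the results table.
        Elements stats = document.select("#tournamentTable tbody tr");

        // Print the text content of each row, one row per line.
        for (int i = 0; i < stats.size(); i++) {
            System.out.println(stats.get(i).text());
        }
    }
}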

Output:

Soccer» Italy»Serie A 2003/2004

1 X 2 B's


AC Milan - Brescia 4:2 - - - 6

Chievo - Bologna 2:1 - - - 5

Empoli - Inter 2:3 - - - 5

Parma - Udinese 4:3 - - - 5

Lazio - Modena 2:1 - - - 4

Lecce - Reggina 2:1 - - - 5

Perugia - Ancona 1:0 - - - 1

Sampdoria - AS Roma 0:0 - - - 4

Siena - Juventus 1:3 - - - 5

1 X 2 B's



Ancona - Empoli 2:1 - - - 1

AS Roma - Perugia 1:3 - - - 3

Bologna - Lecce 1:1 - - - 7

Brescia - Lazio 2:1 - - - 1

Inter - Parma 1:0 - - - 7

Juventus - Sampdoria 2:0 - - - 7

Modena - Siena 1:3 - - - 7

Reggina - AC Milan 2:1 - - - 1

Udinese - Chievo 1:1 - - - 3

1 X 2 B's



AC Milan - AS Roma 1:0 - - - 6

Parma - Ancona 3:1 - - - 3

Lazio - Reggina 1:1 - - - 6

Lecce - Inter 2:1 - - - 6

Perugia - Juventus 1:0 - - - 4

Sampdoria - Udinese 1:3 - - - 5

Siena - Brescia 0:1 - - - 3

1 X 2 B's



Ancona - Chievo 0:2 - - - 3

AS Roma - Empoli 3:0 - - - 6

Inter - Lazio 0:0 - - - 6

Juventus - Lecce 3:4 - - - 6

Modena - Sampdoria 1:0 - - - 5

Reggina - Parma 1:1 - - - 5

Udinese - AC Milan 0:0 - - - 6

1 X 2 B's



Lazio - AS Roma 1:1 - - - 7

1 X 2 B's



Modena - AS Roma 0:1 - - - 6

Chievo - Reggina 0:0 - - - 4

Empoli - Brescia 1:1 - - - 5

Parma - Juventus 2:2 - - - 6

Inter - Bologna 4:2 - - - 6

Lazio - Ancona 4:2 - - - 5

Sampdoria - Perugia 3:2 - - - 6

1 X 2 B's



Lecce - Udinese 2:1 - - - 6

Siena - AC Milan 1:2 - - - 5

1 X 2 B's



Perugia - Inter 2:3 - - - 7

1 X 2 B's



Juventus - Lazio 1:0 - - - 7

AC Milan - Empoli 1:0 - - - 7

Ancona - Bologna 3:2 - - - 7

AS Roma - Chievo 3:1 - - - 7

Brescia - Modena 0:0 - - - 7

Reggina - Udinese 0:1 - - - 7



Siena - Sampdoria 0:0 - - - 7

So cool!
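Since the goal is a database of results, the printed rows still need to be split into fields. Below is a rough sketch of one way to do that with a regular expression, assuming every match row has the "Home - Away H:A" layout seen in the output above. The names ResultRowParser and GameResult are hypothetical, and header rows such as "1 X 2 B's" are simply skipped.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResultRowParser {

    // Hypothetical value type holding one parsed match row.
    static final class GameResult {
        final String home;
        final String away;
        final int homeGoals;
        final int awayGoals;

        GameResult(String home, String away, int homeGoals, int awayGoals) {
            this.home = home;
            this.away = away;
            this.homeGoals = homeGoals;
            this.awayGoals = awayGoals;
        }
    }

    // Assumed layout: "<home> - <away> <homeGoals>:<awayGoals>", then
    // optional trailing columns (odds, bookmaker count) that are ignored.
    private static final Pattern ROW =
        Pattern.compile("^(.+?) - (.+?) (\\d+):(\\d+)(?: .*)?$");

    // Returns the parsed row, or an empty Optional for headers and other
    // lines that do not look like a match result.
    static Optional<GameResult> parse(String row) {
        Matcher m = ROW.matcher(row.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new GameResult(
            m.group(1),
            m.group(2),
            Integer.parseInt(m.group(3)),
            Integer.parseInt(m.group(4))));
    }

    public static void main(String[] args) {
        String[] rows = {
            "1 X 2 B's",
            "AC Milan - Brescia 4:2 - - - 6",
            "Sampdoria - AS Roma 0:0 - - - 4"
        };
        for (String row : rows) {
            parse(row).ifPresent(r -> System.out.println(
                r.home + " | " + r.away + " | " + r.homeGoals + " | " + r.awayGoals));
        }
    }
}
```

Each stats.get(i).text() value from the loop above could be passed to parse, and the resulting fields written to whatever storage you choose. A team name that itself contains " - " would break this pattern, so treat it as a starting point.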
