Using Java to pull data from the web

Question

I was wondering if there is a way to pull specific data from a website using Java (in Eclipse), for example stock information from Yahoo Finance or Bloomberg. I've looked around and found some resources, but I haven't been able to get them to work; perhaps I'm missing something, or they're outdated. If possible, I also want to avoid downloading any external resources. I've read up on JSoup and will consider it more seriously if all else fails.

Thanks for your help.

Recommended answer

The answer is yes: there are many different ways to pull data from websites.

There are essentially two alternatives, no matter the programming language (Java, .NET, Perl...):

  1. The website has an API. In this case it will be a REST or SOAP API, or perhaps a custom one (REST and SOAP probably account for the vast majority). Check out that website's API documentation, if any. Also check out Programmable Web for references.
  2. The website doesn't have an API. You then need to do what is called screen scraping. Essentially, you send a series of HTTP GET or HTTP POST requests, as your browser would. The server replies with a response that contains HTML code. From there, you need to "parse" the HTML to extract the information you need. This will require heavy-duty XPath (if the content is XML), or regular expressions or an HTML parser such as the JSoup library mentioned in the question (if the content is HTML or text).
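The second alternative can be sketched with the JDK alone, which matches the wish to avoid external downloads. This is a minimal sketch, not a definitive implementation: it assumes Java 11 or later (for `java.net.http.HttpClient`), the class name `PageScraper`, the sample HTML, and the `User-Agent` value are all made up for illustration, and the regular expression only pulls out the page `<title>`. Run without arguments, it parses a built-in sample instead of touching the network.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageScraper {

    // Matches the text of the first <title> element, ignoring case and
    // allowing attributes on the tag and line breaks inside the text.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Made-up page content so the example runs without network access.
    private static final String SAMPLE_HTML =
            "<html><head><title>Acme Corp (ACME) 12.34</title></head><body></body></html>";

    /** Sends an HTTP GET, as a browser would, and returns the response body. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                // Illustrative value; some sites reject requests without a User-Agent.
                .header("User-Agent", "PageScraper-example/1.0")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    /** Extracts the trimmed <title> text, or an empty Optional if there is none. */
    static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // With a URL argument the page is fetched; otherwise the sample is parsed.
        String html = args.length > 0 ? fetch(args[0]) : SAMPLE_HTML;
        System.out.println(extractTitle(html).orElse("(no title found)"));
    }
}
```

A regular expression like this breaks easily when the page markup changes, which is the usual reason to move to a real HTML parser once the extraction gets more involved.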

Check out Apache HttpComponents to get started.

If all you want is finance information, Google has a JSON/REST API for that, and there's a question on SO that will help you: How can I get stock quotes using Google Finance API?

Yahoo also has one, and there is already a question about it on SO: Yahoo Finance All Currencies quote API Documentation
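When a site does offer a JSON/REST API, the client side usually amounts to building a URL with query parameters, sending a GET, and reading fields out of the JSON reply. The sketch below shows that shape using only the JDK (Java 11 or later assumed). The endpoint `https://api.example.com/v1/quote`, the `symbol` parameter, and the `price` field are hypothetical and do not describe the Google or Yahoo APIs; check the provider's documentation for the real ones. Because the JDK ships no JSON parser, a regular expression stands in for one here, which is only reasonable for flat, simple responses.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteApiClient {

    // Hypothetical endpoint, used only for illustration.
    static final String BASE_URL = "https://api.example.com/v1/quote";

    /** Builds the request URI, percent-encoding the ticker symbol. */
    static URI buildQuoteUri(String baseUrl, String symbol) {
        String encoded = URLEncoder.encode(symbol, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "?symbol=" + encoded);
    }

    /** Sends a GET asking for JSON and returns the raw response body. */
    static String get(URI uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    /** Reads a numeric field from flat JSON; a shortcut, not a JSON parser. */
    static OptionalDouble extractNumber(String json, String field) {
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(field) + "\"\\s*:\\s*(-?\\d+(?:\\.\\d+)?)");
        Matcher m = p.matcher(json);
        return m.find()
                ? OptionalDouble.of(Double.parseDouble(m.group(1)))
                : OptionalDouble.empty();
    }

    public static void main(String[] args) throws Exception {
        // With a symbol argument the hypothetical endpoint is called;
        // otherwise a made-up response is parsed so the example runs offline.
        String json = args.length > 0
                ? get(buildQuoteUri(BASE_URL, args[0]))
                : "{\"symbol\": \"ACME\", \"price\": 12.34}";
        System.out.println(extractNumber(json, "price").orElse(Double.NaN));
    }
}
```

For nested or large responses, a proper JSON library is the safer choice, at the cost of adding an external dependency.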
