Selenium jSoup: get data from a JavaScript webpage
Question
I have asked a few questions around this recently, but haven't really found what I'm looking for.
I am trying to get all of the matches from http://www.futbol24.com/Live/?__igp=1&LiveDate=20141106 to print out, with time, home team and away team. I understand that the content is loaded after the page itself has loaded.
I have been told to use Selenium and then use jSoup on the result to get the data I want. Does anybody have a tutorial or some sample code they could show me, for how to do it on the website above?
Any examples would be greatly appreciated, thanks.
Recommended answer
If you are going to scrape / datamine someone's site, here are some considerations:
- Get permission from the site's owner! If you do not, you will piss off the owner and get blacklisted in the best case, or be served with a lawsuit in the worst case.
- Find out if the site exposes an API. This is always a better way to get the data than scraping the site's pages.
- Research tools / libraries that are more appropriate for this task. Some of these include curl, wget, httpbuilder, and so on. Depending on your level of comfort / knowledge, you may need to research the underlying technologies: HTTP, REST, and so on.
- Selenium is a functional test library for browser applications, which makes it a poor choice for this task.
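As a rough illustration of the plain-HTTP route suggested above, here is a minimal sketch that uses only the JDK's built-in `java.net.http.HttpClient` (Java 11+). The URL and its `__igp` / `LiveDate` parameters are taken from the question; the class name `LivePageFetcher`, the `--fetch` flag and the User-Agent string are hypothetical names made up for this example. It is an assumption, not something confirmed here, that the response body contains the match data at all: if the page fills it in via JavaScript, you would still have to find the request the page itself makes, or an official API.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class; not part of any library.
class LivePageFetcher {
    // Base URL copied from the question.
    private static final String BASE = "http://www.futbol24.com/Live/";

    /** Builds the live-page URL for a given date (LiveDate is formatted as yyyyMMdd). */
    static String buildLiveUrl(LocalDate date) {
        String liveDate = date.format(DateTimeFormatter.BASIC_ISO_DATE);
        return BASE + "?__igp=1&LiveDate=" + liveDate;
    }

    /** Fetches the raw response body over plain HTTP, without driving a browser. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                // Hypothetical User-Agent; identify yourself honestly to the site owner.
                .header("User-Agent", "example-fetcher/0.1")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String url = buildLiveUrl(LocalDate.of(2014, 11, 6));
        System.out.println(url);
        // Only touch the network when explicitly asked to (and with permission).
        if (args.length > 0 && args[0].equals("--fetch")) {
            System.out.println(fetch(url).length() + " characters received");
        }
    }
}
```

Running it without arguments only prints the URL; passing `--fetch` performs the request. Whatever comes back could then be handed to an HTML or JSON parser of your choice.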
PS: I fully expect this to get downvoted / closed, because discussions / opinions are off-topic for SO.