Scraping a website generated by JavaScript
Question
I think this is a really challenging one!
I run a website for my local football league, www.rdyfl.co.uk, and include JavaScript code snippets from the F.A.'s Full-Time system, where we generate our fixtures, to link in tables, fixtures, recent results, etc.
For another feature I want to add to the site, I need to scrape the 'Upcoming Fixtures' for each age group and division, but when I examine the source I have two problems:
- The fixtures content is generated by JavaScript, so I need to see the generated source and not just the original source.
- When I view the generated source using Firefox, the team names are actually further JavaScript links and not the names themselves.
I basically want to somehow download the fixtures on a regular basis and then write them to a MySQL database.
I have asked the F.A., and they have no other options available for accessing the data.
Having never written a scraper before, can anyone point me to a simple solution, or does anyone fancy the challenge?
Recommended answer
and include javascript code snippets
=> Use a web browser that renders the JavaScript. This approach works with all websites.
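Once a browser has run the JavaScript, the fixtures are ordinary HTML, and the team-name links from the question can be handled by keeping each link's visible text and ignoring its `javascript:` href. A minimal sketch with Python's standard-library `html.parser`, assuming (hypothetically) one fixture per `<tr>` with plain `<td>` cells such as date, home team and away team; the real Full-Time markup may differ, so check the generated source first.

```python
from html.parser import HTMLParser


class FixtureTableParser(HTMLParser):
    """Collects the visible text of each <td> cell, row by row.

    Assumption: one fixture per <tr>. Team names wrapped in
    <a href="javascript:..."> links are captured as the link text,
    so the JavaScript inside the href is simply ignored.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # completed rows, each a list of cell strings
        self._row = None  # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows that only contain <th>
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_fixtures(rendered_html):
    """Return a list of rows (lists of cell text) from rendered HTML."""
    parser = FixtureTableParser()
    parser.feed(rendered_html)
    parser.close()
    return parser.rows
```

The parser deliberately works on text rather than on the link targets, which is why the JavaScript team links stop being a problem once the page has been rendered.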
You can also reverse-engineer the JS and extract the data from it, but this only makes sense if you need the data from very few websites or need very high performance. Otherwise it is too much work.
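As a rough illustration of the reverse-engineering route: many embeddable widgets build their markup with `document.write("...")` calls, and if the Full-Time snippet does too (an assumption; inspect the actual script), the HTML can be recovered from the string literals without a browser. This sketch only handles calls whose argument is a single quoted literal and only simple escapes; string concatenation or `\uXXXX` escapes would need more work.

```python
import re

# Matches document.write("...") or document.write('...') whose argument is a
# single string literal. Escaped characters (\" \' \\ ...) are allowed inside.
WRITE_CALL = re.compile(
    r"""document\.write\(\s*(?P<q>["'])"""
    r"""(?P<body>(?:\\.|(?!(?P=q))[^\\])*)(?P=q)\s*\)""",
    re.DOTALL,
)


def _unescape_js(body):
    """Undo simple JS escapes such as \\" \\' \\\\ \\/ \\n \\t (not \\uXXXX)."""
    return re.sub(
        r"\\(.)",
        lambda m: {"n": "\n", "t": "\t"}.get(m.group(1), m.group(1)),
        body,
        flags=re.DOTALL,
    )


def html_from_snippet(js_source):
    """Concatenate the HTML written by every document.write(...) call."""
    return "".join(
        _unescape_js(m.group("body")) for m in WRITE_CALL.finditer(js_source)
    )
```

The recovered HTML can then be parsed like any other page. The trade-off is the one the answer describes: this breaks as soon as the snippet's structure changes, whereas a rendering browser does not care how the markup is produced.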
Good solutions for browser-based scraping are Watir, WatiN, Selenium and iMacros.
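For the question's goal (fetch the fixtures regularly and store them), a minimal Python sketch with Selenium could look like the following. Assumptions: Selenium 4 with Firefox and geckodriver installed, the fixtures page contains a `<table>`, and rows are `(date, home, away)` tuples. `sqlite3` stands in for MySQL so the sketch runs anywhere; with a MySQL DB-API driver the SQL would use `%s` placeholders and `INSERT IGNORE` instead of `?` and `INSERT OR IGNORE`. The table and column names are hypothetical.

```python
import sqlite3


def fetch_rendered_html(url, wait_seconds=10):
    """Load url in headless Firefox and return the JavaScript-generated source."""
    # Imported lazily so the storage code below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Wait until the scripts have inserted a table (assumed page structure).
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()


def save_fixtures(conn, rows):
    """Store (date, home, away) tuples, skipping fixtures already saved."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fixtures ("
        " fixture_date TEXT, home_team TEXT, away_team TEXT,"
        " UNIQUE (fixture_date, home_team, away_team))"
    )
    conn.executemany("INSERT OR IGNORE INTO fixtures VALUES (?, ?, ?)", rows)
    conn.commit()
```

The unique constraint makes repeated runs idempotent, so the script can simply be scheduled (for example with cron) to download the fixtures on a regular basis without creating duplicates.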