Get data from a website


Problem description

How can I scrape (get) the data from a website?

Example: I have a site, say www.getfinancialdata.com.

Now I want to grab the data from this website by running a script (or requesting a URL) from my system, then sort the data and save it in a spreadsheet.

I have done this for a simple website where I can see the HTML content in the body of the page when I view the source. My current problem is a bit more complex: when I view the source, there is no simple HTML content. Instead, jQuery functions populate the data into the DOM. How can I grab the data from the DOM (jQuery)?

Solution

I've had success using Selenium to scrape sites that use a lot of JavaScript. If it shows up in a browser, you can get it with Selenium. It's Java, but there are bindings to drive it from your favorite scripting language; I use Python.

You may also want to look into headless browsers like Crowbar and PhantomJS. The thing I like about Selenium is that being able to watch it drive the browser helps my debugging. There is also a Firefox plugin (the IDE) that can generate some basic code to get you started: you just click along and it records what you've done. That code will always need massaging or heavy editing, but it's helpful while you're learning how to do this.
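As a rough illustration of the Selenium-from-Python approach, here is a minimal sketch. It assumes Selenium 4 with Firefox and geckodriver installed, and it assumes the data ends up in an HTML table once the page's scripts have run. The function names, the `table tr` selector, and the table layout are hypothetical choices for this example, not something taken from the site in the question.

```python
import csv
import io


def fetch_rendered_rows(url, row_selector="table tr", timeout=10, headless=False):
    """Load `url` in a real browser and return table rows as lists of cell text.

    Assumption: the data appears in elements matching `row_selector` after
    the page's JavaScript has populated the DOM.
    """
    # Imported here so the CSV helpers below can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    if headless:
        # Run without a visible window; leave it off to watch the browser.
        options.add_argument("-headless")

    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Wait until at least one matching row exists in the rendered DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, row_selector))
        )
        rows = []
        for tr in driver.find_elements(By.CSS_SELECTOR, row_selector):
            cells = [c.text.strip() for c in tr.find_elements(By.CSS_SELECTOR, "th, td")]
            if cells:
                rows.append(cells)
        return rows
    finally:
        driver.quit()


def rows_to_csv(rows, sort_column=0):
    """Return CSV text: first row kept as header, the rest sorted as text."""
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    body_sorted = sorted(body, key=lambda row: row[sort_column])
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body_sorted)
    return buffer.getvalue()


def save_csv(rows, path, sort_column=0):
    """Write the sorted rows to `path`; spreadsheet programs can open the file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(rows_to_csv(rows, sort_column))
```

A typical call would be `save_csv(fetch_rendered_rows("http://www.getfinancialdata.com"), "data.csv")`. Note that the sort compares cell text, so numeric columns would need converting first, and the selector would have to be adapted to the actual page.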

Note that this is a surprisingly hard thing to do, especially on a large scale. Websites are messy, they differ from one another, and they change over time. This makes scraping either infuriating or a fun challenge, depending on your attitude.
