How to Programmatically Take Snapshots of Crawled Webpages (in Ruby)?

Question

What is the best solution to programmatically take a snapshot of a webpage?

The situation is this: I would like to crawl a bunch of webpages and take thumbnail snapshots of them periodically, say once every few months, without having to manually go to each one. I would also like to be able to take jpg/png snapshots of websites that might be completely Flash/Flex, so I'd somehow have to wait until the page has loaded before taking the snapshot.

It would be nice if there was no limit to the number of thumbnails I could generate (within reason, say 1000 per day).

Any ideas how to do this in Ruby? Seems pretty tough.

Browsers to do this in: Safari or Firefox, preferably Safari.

Thanks very much.

Recommended Answer

This really depends on your operating system. What you need is a way to hook into a web browser and save the rendered page as an image.

If you are on a Mac - I would imagine your best bet would be to use MacRuby (or RubyCocoa - although I believe this is going to be deprecated in the near future) and then to use the WebKit framework to load the page and render it as an image.
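To make that concrete, here is a rough, untested sketch of how that approach might look in MacRuby against the legacy WebKit WebView API (essentially what webkit2png does via PyObjC). The snapshot_page method, its arguments, and the output path are illustrative names of my own, not any library's API, and the exact calls should be checked against the WebKit documentation:

    #!/usr/bin/env macruby
    # Rough sketch, not tested -- assumes MacRuby on OS X with the legacy
    # WebKit WebView API. snapshot_page is an illustrative name of my own.
    framework 'Cocoa'
    framework 'WebKit'

    def snapshot_page(url, out_path, width = 1024, height = 768)
      NSApplication.sharedApplication  # WebKit needs an app context to render

      view = WebView.alloc.initWithFrame(NSMakeRect(0, 0, width, height))
      # Host the view in an offscreen window so layout and drawing happen.
      window = NSWindow.alloc.initWithContentRect(NSMakeRect(0, 0, width, height),
                 styleMask: NSBorderlessWindowMask,
                 backing: NSBackingStoreBuffered,
                 defer: false)
      window.contentView = view

      view.mainFrame.loadRequest(
        NSURLRequest.requestWithURL(NSURL.URLWithString(url)))

      # Pump the run loop until loading finishes; for Flash/Flex pages you
      # would also want to wait a few extra seconds here before capturing.
      while view.isLoading
        NSRunLoop.currentRunLoop.runUntilDate(
          NSDate.dateWithTimeIntervalSinceNow(0.1))
      end

      # Render the loaded document view into a bitmap and write it as PNG.
      doc  = view.mainFrame.frameView.documentView
      rect = doc.bounds
      rep  = doc.bitmapImageRepForCachingDisplayInRect(rect)
      doc.cacheDisplayInRect(rect, toBitmapImageRep: rep)

      png = rep.representationUsingType(NSPNGFileType, properties: {})
      png.writeToFile(out_path, atomically: true)
    end

    snapshot_page('http://www.example.com/', 'example.png')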

This is definitely possible; for inspiration, you may wish to look at the Paparazzi! and webkit2png projects.
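For the periodic, batch side of the question, a thin Ruby driver that shells out to webkit2png for each URL may be enough, since webkit2png can also emit thumbnail-sized output. The sketch below is illustrative only: it assumes webkit2png is installed and on the PATH, urls.txt and the snapshots/ directory are names I made up, and webkit2png's own flags (delay, sizes, output names) are left to its --help output rather than guessed here:

    #!/usr/bin/env ruby
    # Illustrative batch driver -- run it from cron (or launchd) every few
    # months. Assumes the webkit2png script is on the PATH; urls.txt and the
    # snapshots/ directory are made-up names for this example.
    require 'fileutils'

    urls    = File.readlines('urls.txt').map(&:strip).reject(&:empty?)
    out_dir = File.join('snapshots', Time.now.strftime('%Y-%m'))
    FileUtils.mkdir_p(out_dir)

    urls.each do |url|
      # webkit2png writes its images relative to the working directory
      # (see --help for directory/naming options), so run it from inside
      # the dated folder for this run.
      ok = Dir.chdir(out_dir) { system('webkit2png', url) }
      warn "snapshot failed for #{url}" unless ok
      sleep 2  # stays well under ~1000 pages/day and is polite to the sites
    end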

Another option, which isn't dependent on the OS, might be to use the BrowserShots API.
