Headless, scriptable Firefox/WebKit on Linux?


Question


I'm looking to automate some web interactions, namely periodic download of files from a secure website. This basically involves entering my username/password and navigating to the appropriate URL.

I tried simple scripting in Python, followed by more sophisticated scripting, only to discover that this particular website uses an obnoxious JavaScript- and Flash-based login mechanism, rendering my methods useless.

I then tried HTMLUnit, but that doesn't seem to want to work either. I suspect use of Flash is the issue.

I don't really want to think about it any more, so I'm leaning towards scripting an actual browser to log in and grab the file I need.

Requirements are:

  • Run on a Linux server (i.e., no X running). If I really need to have X I can make that happen, but I won't be happy.
  • Be reliable. I want to start this thing and never think about it again.
  • Be scriptable. Nothing too sophisticated, but I should be able to tell the browser the various steps to take and pages to visit.

Are there any good toolkits for a headless, X-less scriptable browser? Have you tried something like this and if so do you have any words of wisdom?

Solution

I did a related task with an embedded IE browser (although it was a GUI application with a hidden browser component panel). In fact, you can take any layout engine and cut out the output logic. Navigation should be done by firing script-like events.

You can use Crowbar. It is a headless version of Firefox (Gecko engine). It turns the browser into a RESTful server that accepts requests ("fetch URL"). It parses the HTML, represents it as a DOM, and waits a defined delay so that all scripts have run.

It works on Linux. I suppose you can easily extend it for your goal using JS and the rich XULRunner capabilities.
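As a rough illustration of the "fetch URL" idea, here is a minimal Python sketch of a client for such a server. It assumes a Crowbar instance is already running locally, and that it listens on port 10000 and takes form parameters named `url` and `delay` (in milliseconds); those details are my assumptions and should be checked against the Crowbar version you install. The function names are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Crowbar listens on localhost:10000. Adjust to your setup.
CROWBAR_ENDPOINT = "http://127.0.0.1:10000/"


def build_crowbar_request(target_url, delay_ms=3000, endpoint=CROWBAR_ENDPOINT):
    """Build (but do not send) a POST asking the server to fetch target_url.

    The parameter names `url` and `delay` are assumptions, not confirmed
    by the answer above.
    """
    body = urlencode({"url": target_url, "delay": delay_ms}).encode("ascii")
    return Request(endpoint, data=body, method="POST")


def fetch_rendered_html(target_url, delay_ms=3000,
                        endpoint=CROWBAR_ENDPOINT, timeout=60):
    """Send the request and return the page source after scripts have run."""
    request = build_crowbar_request(target_url, delay_ms, endpoint)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the server running, `fetch_rendered_html("https://example.com/")` would return the serialized DOM as text. Logging in and downloading a file would still need extra scripting on the browser side, as the answer suggests.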
