Scraping dynamically generated HTML inside an Android app

Question

I am currently writing an Android app that, among other things, uses text information from websites which I do not own. In addition, some of the pages require authentication.

For some pages I have been able to log in and retrieve the HTML using BasicNameValuePair objects and an HttpClient with its associated objects.
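To make the login step concrete, here is a minimal sketch of what a list of name/value pairs turns into when it is posted as a login form: an `application/x-www-form-urlencoded` body. It uses only the Java standard library rather than HttpClient, and the class name `FormBodySketch`, the field names `username` and `password`, and the sample values are all hypothetical; a real site defines its own field names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library: builds a form-encoded POST body.
final class FormBodySketch {

    // Joins the pairs as name=value&name=value, URL-encoding each part as UTF-8.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"));
                body.append('=');
                body.append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> login = new LinkedHashMap<>();
        login.put("username", "alice");          // assumed field name and value
        login.put("password", "p@ss word&1");    // assumed field name and value
        System.out.println(encode(login));
    }
}
```

A body built this way only reproduces what the browser sends when the form is submitted. The response is still the raw page source, so any text that the page loads through scripts will be missing from it.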

Unfortunately, these methods retrieve the webpage source without running any of the JavaScript that a browser (or even an Android WebView) would normally run. I need the text that some of these scripts retrieve.

I've done my research, but everything I've found is guesswork and extremely confusing. I'm okay with ignoring pages that require login for now. Also, I am willing to post any code that may be useful for constructing a solution; it is an independent project.

Are there any concrete solutions for scraping the HTML that results from JavaScript calls? An example would be absolutely top-notch.

Recommended answer

The aforementioned solutions are very slow and restrict you to one URL (well, not really, but I dare you to scrape 10 URLs with Rhino while your user is impatiently waiting for results).

An alternative is to use a cloud scraping solution. You get the benefit of not wasting phone bandwidth on downloading content you won't use.

Try this solution: Bobik's Java SDK

It gives you the ability to scrape up to hundreds of sites in a matter of seconds.
