Scraping dynamically generated HTML inside an Android app

Question

I am currently writing an Android app that, among other things, uses text information from websites which I do not own. In addition, some of the pages require authentication.

For some pages I have been able to log in and retrieve the HTML using BasicNameValuePair objects and an HttpClient with its associated objects.

Unfortunately, these methods retrieve the page source without running any of the JavaScript that a browser (or even an Android WebView) would normally run. I need the text that some of these scripts load.

I've done my research, but everything I've found is guesswork and extremely confusing. I'm okay with ignoring pages that require login for now. Also, I am willing to post any code that may be useful for constructing a solution; it is an independent project.

Are there any concrete solutions for scraping the HTML that results from JavaScript calls? An example would be absolutely top-notch.

Recommended answer

The aforementioned solutions are very slow and restrict you to one URL (well, not really, but I dare you to scrape 10 URLs with Rhino while your user is impatiently waiting for results).
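For comparison, here is a rough sketch of one on-device approach: let a WebView run the page's scripts, then read the rendered DOM back with evaluateJavascript (API 19+). Whether this matches the solutions the answer refers to is an assumption, since they are not quoted here. The Android wiring is shown only in comments; `handleHtml` and `pageUrl` are hypothetical names. The compiled part covers one step that often trips people up: the callback receives the result as a JSON-encoded value, so the HTML arrives quoted and escaped and has to be decoded.

```java
/**
 * Decodes the value a WebView hands to the evaluateJavascript callback.
 *
 * Android wiring, shown as comments because it needs the Android SDK:
 *
 *   webView.getSettings().setJavaScriptEnabled(true);
 *   webView.setWebViewClient(new WebViewClient() {
 *       @Override public void onPageFinished(WebView view, String url) {
 *           // Scripts that load data asynchronously may still be running here.
 *           view.evaluateJavascript("document.documentElement.outerHTML",
 *               value -> handleHtml(JsResultDecoder.decode(value))); // handleHtml: hypothetical
 *       }
 *   });
 *   webView.loadUrl(pageUrl); // pageUrl: hypothetical
 */
final class JsResultDecoder {
    private JsResultDecoder() {}

    /** Turns a JSON string literal into plain text; other JSON values pass through unchanged. */
    static String decode(String jsonValue) {
        if (jsonValue == null || jsonValue.equals("null")) {
            return null;
        }
        int end = jsonValue.length() - 1;
        if (end < 1 || jsonValue.charAt(0) != '"' || jsonValue.charAt(end) != '"') {
            return jsonValue; // number, boolean, object or array: leave as-is
        }
        StringBuilder out = new StringBuilder(jsonValue.length());
        for (int i = 1; i < end; i++) {
            char c = jsonValue.charAt(i);
            if (c != '\\' || i + 1 >= end) {
                out.append(c);
                continue;
            }
            char esc = jsonValue.charAt(++i);
            switch (esc) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    if (i + 4 < end) {
                        // Four hex digits follow the backslash and the letter u.
                        out.append((char) Integer.parseInt(jsonValue.substring(i + 1, i + 5), 16));
                        i += 4;
                    } else {
                        out.append(esc); // truncated escape: keep what is there
                    }
                    break;
                default:
                    out.append(esc); // covers escaped quote, backslash and slash
            }
        }
        return out.toString();
    }
}
```

On Android itself, the platform's JSON classes can do this decoding instead of a hand-written loop. Note that onPageFinished does not guarantee that scripts fetching data asynchronously have finished, so a real app may need to wait or poll before reading the DOM.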

An alternative is to use a cloud scraping solution. You get the benefit of not wasting phone bandwidth on downloading content you won't use.
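To make the bandwidth point concrete, here is a minimal sketch of what the phone's side of a cloud-scraping setup might look like. It is not the API of any particular service: the endpoint `https://scraper.example.com/v1/jobs`, the JSON field names `urls` and `selector`, and the class name are all made up for illustration. The idea is that the app sends a small job description and gets back only the extracted text, while a server does the page rendering.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class CloudScrapeClient {
    // Hypothetical endpoint; a real service defines its own URL and request format.
    static final String ENDPOINT = "https://scraper.example.com/v1/jobs";

    private CloudScrapeClient() {}

    /** Builds a small JSON job: which pages to render and which elements to return. */
    static String buildRequestBody(List<String> urls, String cssSelector) {
        StringBuilder sb = new StringBuilder("{\"urls\":[");
        for (int i = 0; i < urls.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(quote(urls.get(i)));
        }
        return sb.append("],\"selector\":").append(quote(cssSelector)).append('}').toString();
    }

    /** Minimal JSON string escaping for quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Posts the job and returns the raw response body. On Android, call this off the main thread. */
    static String submit(String jsonBody) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(ENDPOINT).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(jsonBody.getBytes(StandardCharsets.UTF_8));
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

With this shape, ten URLs cost the phone one small request and one small response instead of ten full page loads, which is the trade-off the answer is pointing at.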

Try this solution: Bobik Java SDK

It gives you the ability to scrape up to hundreds of sites in a matter of seconds.
