Retrieve HTML content from an infinite scroll page (Facebook)


Question

I would like to retrieve HTML data from a dynamic web page, for example a public Facebook page: https://www.facebook.com/bbcnews/ (public content, no login required).

This page uses infinite scroll: more posts are loaded only when you scroll to the bottom of the page.

Here is my current code:

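import java.io.*;
import java.net.URL;

URL url = new URL("https://www.facebook.com/bbcnews/");

// try-with-resources closes both streams, even if an exception is thrown
try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
     BufferedWriter writer = new BufferedWriter(new FileWriter("path"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        writer.write(line);
        writer.newLine(); // readLine() strips the line terminator
    }
}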

This code retrieves only the first part of the page.

How can I retrieve the additional content that the infinite scroll loads?

Thanks.

Recommended answer

You won't get that with a simple BufferedReader reading an HTTP stream. Open your browser console, then scroll to the end of the page. You'll see that an XHR call (an asynchronous request) is fired at this URL:

https://www.facebook.com/pages_reaction_units

The request carries a lot of cryptic parameters. To load more posts, your Java code would need to perform the same kind of call, but the parameters are obfuscated for a reason, so building the request from scratch does not seem to be a good approach.

You are better off using the API that Facebook provides (perhaps the Graph API).
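As a rough illustration of the Graph API route, here is a minimal sketch that requests one page of posts as JSON instead of scraping HTML. Several things here are assumptions rather than part of the answer: the class name `GraphApiSketch`, the API version `v19.0`, the use of the `/posts` edge, and the access token, which you would have to obtain yourself (reading a public page's posts generally requires an app with the appropriate permission). It uses `java.net.http.HttpClient`, so it needs Java 11 or later.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class GraphApiSketch {

    // Assumption: the API version is illustrative; check the current Graph API docs.
    static final String BASE = "https://graph.facebook.com/v19.0";

    /** Builds the URL for one page of posts; pageId and accessToken are placeholders. */
    static String buildPostsUrl(String pageId, String accessToken, int limit) {
        return BASE + "/" + URLEncoder.encode(pageId, StandardCharsets.UTF_8)
                + "/posts?limit=" + limit
                + "&access_token=" + URLEncoder.encode(accessToken, StandardCharsets.UTF_8);
    }

    /** Performs a GET request and returns the raw response body (JSON on success). */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java GraphApiSketch <access-token>");
            return; // no token supplied, so no request is made
        }
        // "bbcnews" is the page from the question; 25 posts per request is arbitrary.
        System.out.println(fetch(buildPostsUrl("bbcnews", args[0], 25)));
    }
}
```

The response for such an edge normally includes a `paging` object whose `next` field is the URL of the following page, so "scrolling" amounts to parsing the JSON with a library of your choice and calling `fetch` on `next` until it is absent.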
