How to limit download size with jsoup?

Question

I'm trying to limit the size of a downloaded page/link with JSoup, given something like the following (Scala code):

val document = Jsoup.connect(theURL).get();

I'd like to get only the first few KB of a given page and stop downloading beyond that. If the page is really large (or theURL points to a large file that isn't HTML), I don't want to spend time downloading the rest.

My use case is a page title snarfer for an IRC bot.

Bonus question:

Is there any reason why Jsoup.connect(theURL).timeout(3000).get(); isn't timing out on large files? The bot ends up pinging out if someone pastes something like a never-ending audio stream or a large ISO. I could work around this by fetching URL titles in a different thread, or by using Scala actors and timing out there, but that seems like overkill for a very simple bot when I think timeout() is supposed to accomplish the same end result.

Recommended answer

As of version 1.7.2, you can limit the maximum body size with the maxBodySize() method: http://jsoup.org/apidocs/org/jsoup/Connection.Request.html#maxBodySize() In that version the body is limited to 1 MB by default, which protects against running out of memory on very large responses.
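
As a minimal sketch of how this could look for the title-snarfing use case from the question: the object name TitleFetcher, the helper fetchTitle, and the 8 KB default are assumptions made for illustration, not part of jsoup or of the original answer. Only Jsoup.connect, maxBodySize, timeout, get, and title are jsoup calls.

```scala
import org.jsoup.Jsoup
import scala.util.Try

object TitleFetcher {
  // Hypothetical helper: read at most maxBytes of the response body,
  // parse whatever was read, and return the page title if there is one.
  def fetchTitle(url: String, maxBytes: Int = 8 * 1024): Option[String] =
    Try {
      Jsoup.connect(url)
        .maxBodySize(maxBytes) // cap the number of body bytes jsoup reads
        .timeout(3000)         // same 3-second timeout as in the question
        .get()
        .title()
    }.toOption               // any non-fatal failure becomes None
      .filter(_.nonEmpty)    // a page without a <title> yields None too
}
```

Wrapping the call in Try means that a malformed URL, an I/O error, or a non-HTML content type (which jsoup rejects by default) comes back as None instead of crashing the bot. The title normally sits near the top of the document, so a small limit is usually enough, but the right value depends on the pages being fetched.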
