Recursively downloading a remote HTTP directory through Java

Problem description

I want to create a function that downloads a remote directory (Ex: "https://server.net/production/current/") via HTTP to a local folder. I don't have control over the remote directory, so I can't just create a convenient tarball. I was able to find lots of questions about retrieving individual files, but I couldn't find one that matched my use case.

To give you an idea of what I am referring to, here is a sample of what the directory looks like in a browser.

In other words, I want to create a function equivalent to this wget command, where Y is the local destination folder and X is the remote directory to retrieve. I would call wget directly, but I want a cross-platform solution that works on Windows without additional setup.

wget -r -np -R "index.html*" -P Y X

The end goal is a Java function like the one shown below.

/**
 * Recursively downloads all of the files in a remote HTTPS directory to the local destination
 * folder.
 * @param remoteFolder a folder URL (Ex: "https://server.net/production/current/")
 * @param destination a local folder (Ex: "C:\Users\Home\project\production")
 */
public static void downloadDirectory(String remoteFolder, String destination) {}

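The function can assume that the remote directory contains no circular dependencies and that the destination folder exists and is empty.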

Recommended answer

I was hoping there was some magic function or best practice in java.io, or maybe Apache commons-io, to do this. Since it sounds like none exists, I wrote my own version that manually goes through the HTML index page and follows its links.

I'm leaving this answer here in case someone else has the same question, or someone knows a way to improve my version.

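import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.io.FileUtils;

// The class name is arbitrary; it only makes the snippet compile on its own.
public class DirectoryDownloader {

    // Captures the value of every href="..." attribute in the index page.
    private static final Pattern HREF_PATTERN = Pattern.compile("href=\"(.*?)\"");

    /**
     * Recursively downloads all of the files in a remote HTTPS directory to a local
     * destination folder. This implementation requires that both the source URL and
     * the destination string end in a "/" (or file delimiter). If you don't know
     * whether they do, append "/" to the end just to be safe.
     *
     * @param src remote folder URL (Ex: "https://server.net/production/current/")
     * @param dst local folder to copy into (Ex: "C:\Users\Home\project\production\")
     */
    public static void downloadDirectory(String src, String dst) throws IOException {
        List<String> hrefs = new ArrayList<>(8);

        // Read the index page line by line and collect every link on it.
        try (Scanner in = new Scanner(new URL(src).openStream(), "UTF-8").useDelimiter("\n")) {
            while (in.hasNext()) {
                Matcher match = HREF_PATTERN.matcher(in.next());

                while (match.find())
                    hrefs.add(match.group(1));
            }
        }

        for (String next : hrefs) {
            // Never climb back up to the parent directory.
            if (next.equals("../"))
                continue;

            if (next.endsWith("/"))
                downloadDirectory(src + next, dst + next); // subdirectory: recurse
            else
                FileUtils.copyURLToFile(new URL(src + next), new File(dst + next));
        }
    }
}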
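
One possible improvement, since the version above only skips "../": some servers also emit sort links (for example "?C=N;O=D" in Apache-style listings) or absolute links that point outside the folder. The sketch below is not part of the original answer. The class name ListingLinks and the method childLinks are hypothetical, and it assumes the index page percent-encodes its hrefs. It resolves every href against the folder URL and keeps only the links that stay strictly below that folder, which is roughly what wget's -np flag does.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the original answer): filters the links of an index page.
final class ListingLinks {

    private static final Pattern HREF_PATTERN = Pattern.compile("href=\"(.*?)\"");

    /**
     * Returns the absolute URLs of all links in {@code html} that point strictly
     * below {@code base}. {@code base} is assumed to end in "/".
     */
    static List<String> childLinks(String base, String html) {
        URI baseUri = URI.create(base);
        List<String> children = new ArrayList<>();
        Matcher match = HREF_PATTERN.matcher(html);

        while (match.find()) {
            String href = match.group(1);

            // Skip empty links, column-sort links and in-page anchors.
            if (href.isEmpty() || href.startsWith("?") || href.startsWith("#"))
                continue;

            String url;
            try {
                // Resolving turns "sub/" into an absolute URL and collapses "../".
                url = baseUri.resolve(href).normalize().toString();
            } catch (IllegalArgumentException e) {
                continue; // not a valid URI (e.g. an unencoded space), so skip it
            }

            // Keep only strict descendants of the folder being downloaded.
            if (url.startsWith(base) && url.length() > base.length())
                children.add(url);
        }
        return children;
    }
}
```

With a helper like this, the download loop could recurse into every returned URL that ends in "/" and pass the rest to FileUtils.copyURLToFile, deriving the local path from the part of the URL that follows the base.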
