Create a zip file on S3 from files on S3 in Java

Problem description

I have a lot of files on S3 that I need to zip and then provide the zip via S3. Currently I zip them from a stream into a local file and then upload that file again. This takes up a lot of disk space: each file is around 3-10 MB and I have to zip up to 100,000 files, so a single zip can reach 1 TB or more. So I would like a solution along these lines:

Create a zip file on S3 from files on S3 using Lambda Node

Here it seems the zip is created directly on S3 without taking up local disk space. But I am just not smart enough to port that solution to Java. I am also finding conflicting information about the AWS SDK for Java, saying that they planned to change the stream behavior in 2017.

Not sure if this will help, but here's what I've been doing so far (Upload is my local model that holds the S3 information). I removed logging and similar code for readability. I don't think the download itself takes up disk space, since I "pipe" each InputStream directly into the zip. But as I said, I would also like to avoid the local zip file and create it directly on S3. That would probably require creating the ZipOutputStream with S3 as its target instead of a FileOutputStream, and I'm not sure how that can be done.

public File zipUploadsToNewTemp(List<Upload> uploads) {
    byte[] buffer = new byte[8192];
    File tempZipFile;
    try {
      tempZipFile = File.createTempFile(UUID.randomUUID().toString(), ".zip");
    } catch (Exception e) {
      throw new ApiException(e, BaseErrorCode.FILE_ERROR, "Could not create Zip file");
    }
    try (
        FileOutputStream fileOutputStream = new FileOutputStream(tempZipFile);
        ZipOutputStream zipOutputStream = new ZipOutputStream(fileOutputStream)) {

      for (Upload upload : uploads) {
        // try-with-resources releases the S3 stream even if zipping this entry fails
        try (InputStream inputStream = getStreamFromS3(upload)) {
          zipOutputStream.putNextEntry(new ZipEntry(upload.getFileName()));
          writeStreamToZip(buffer, zipOutputStream, inputStream);
          zipOutputStream.closeEntry();
        }
      }
      // no explicit close(): try-with-resources finishes the zip and closes both streams
    } catch (IOException e) {
      logError(type, e);
      if (tempZipFile.exists()) {
        FileUtils.delete(tempZipFile);
      }
      throw new ApiException(e, BaseErrorCode.IO_ERROR,
          "Error zipping files: " + e.getMessage());
    }
    return tempZipFile;
}

// I am not even sure, but I think this takes up memory and not disk space
private InputStream getStreamFromS3(Upload upload) {
    try {
      String filename = upload.getId() + "." + upload.getFileType();
      InputStream inputStream = s3FileService
          .getObject(upload.getBucketName(), filename, upload.getPath());
      return inputStream;
    } catch (ApiException e) {
      throw e;
    } catch (Exception e) {
      logError(type, e);
      throw new ApiException(e, BaseErrorCode.UNKOWN_ERROR,
          "Unknown error communicating with S3 for file: " + upload.getFileName());
    }
}


private void writeStreamToZip(byte[] buffer, ZipOutputStream zipOutputStream,
      InputStream inputStream) {
    try {
      int len;
      while ((len = inputStream.read(buffer)) > 0) {
        zipOutputStream.write(buffer, 0, len);
      }
    } catch (IOException e) {
      throw new ApiException(e, BaseErrorCode.IO_ERROR, "Could not write stream to zip");
    }
}
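On Java 9 and later, InputStream.transferTo(OutputStream) does the same job as the loop in writeStreamToZip: it reads into a buffer until end of stream and writes each chunk to the target. A minimal sketch of a zip-entry helper built on it; the ZipHelper.addEntry name is hypothetical, and in the check below in-memory streams stand in for the S3 objects:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipHelper {
    /** Adds one entry named entryName containing everything read from in, then closes in. */
    static void addEntry(ZipOutputStream zip, String entryName, InputStream in) throws IOException {
        try (InputStream source = in) {
            zip.putNextEntry(new ZipEntry(entryName));
            source.transferTo(zip); // Java 9+: replaces the manual read/write loop
            zip.closeEntry();
        }
    }
}
```

Because addEntry closes the source stream itself, the caller no longer needs a separate inputStream.close().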

And finally, the upload source code. The InputStream is created from the temp zip file.

public PutObjectResult upload(InputStream inputStream, String bucketName, String filename, String folder) {
    String uploadKey = StringUtils.isEmpty(folder) ? "" : (folder + "/");
    uploadKey += filename;

    ObjectMetadata metaData = new ObjectMetadata();

    byte[] bytes;
    try {
      // Reads the entire zip into memory just to learn its length; this cannot scale to ~1 TB archives
      bytes = IOUtils.toByteArray(inputStream);
    } catch (IOException e) {
      throw new ApiException(e, BaseErrorCode.IO_ERROR, e.getMessage());
    }
    metaData.setContentLength(bytes.length);
    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);

    PutObjectRequest putObjectRequest = new PutObjectRequest(bucketPrefix + bucketName, uploadKey, byteArrayInputStream, metaData);
    putObjectRequest.setCannedAcl(CannedAccessControlList.PublicRead);

    try {
      return getS3Client().putObject(putObjectRequest);
    } catch (SdkClientException se) {
      throw s3Exception(se);
    } finally {
      IOUtils.closeQuietly(inputStream);
    }
}
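With archives of roughly 1 TB, a single putObject call like the one above cannot work even without the in-memory copy: S3 limits a single PUT to 5 GB, and larger objects (up to 5 TB) have to be sent as a multipart upload with at most 10,000 parts, each at least 5 MiB except the last and at most 5 GiB. The smallest usable part size therefore grows with the archive. A small sketch of that arithmetic; PartSizing and minPartSize are hypothetical names, not SDK API:

```java
class PartSizing {
    static final long MIB = 1024L * 1024L;
    static final long MIN_PART_SIZE = 5 * MIB;        // S3 minimum for every part except the last
    static final long MAX_PART_SIZE = 5 * 1024 * MIB; // S3 maximum part size (5 GiB)
    static final int MAX_PARTS = 10_000;              // S3 maximum number of parts per upload

    /** Smallest part size, rounded up to whole MiB, that fits totalBytes into at most MAX_PARTS parts. */
    static long minPartSize(long totalBytes) {
        long perPart = (totalBytes + MAX_PARTS - 1) / MAX_PARTS; // ceil(total / 10,000)
        long rounded = ((perPart + MIB - 1) / MIB) * MIB;        // round up to a whole MiB
        long size = Math.max(MIN_PART_SIZE, rounded);
        if (size > MAX_PART_SIZE) {
            throw new IllegalArgumentException("too large for 10,000 parts of at most 5 GiB");
        }
        return size;
    }
}
```

For a 1 TiB zip this gives 105 MiB parts (96 MiB for 10^12 bytes). Since the compressed size is not known in advance, the input would have to be an upper bound, such as the total size of the source files plus some margin (deflate can slightly enlarge incompressible data).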

I also just found a similar question that has no answer either:

Upload ZipOutputStream to S3 without saving the (large) zip file temporarily to disk using AWS S3 Java
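One way to give ZipOutputStream an S3 target, as both questions ask, is an OutputStream that cuts the compressed bytes into fixed-size parts and hands each full part to an S3 multipart upload, so only one part is held in memory and nothing touches the disk. The sketch below is an assumption-laden illustration, not SDK code: the S3 calls sit behind a hypothetical PartUploader interface so it runs without the SDK. With the AWS SDK for Java v1, an implementation would presumably call initiateMultipartUpload once, uploadPart per part (keeping each result's getPartETag()), completeMultipartUpload in complete(), and abortMultipartUpload in abort().

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical abstraction over the S3 multipart-upload calls (not an SDK type). */
interface PartUploader {
    /** Uploads one part; partNumber starts at 1. Must be done with data before returning. */
    void uploadPart(int partNumber, byte[] data, int length) throws IOException;
    void complete() throws IOException;
    void abort();
}

/** OutputStream that cuts everything written to it into parts of partSize bytes. */
class MultipartOutputStream extends OutputStream {
    private final PartUploader uploader;
    private final byte[] buffer; // the part currently being filled
    private int filled = 0;
    private int nextPartNumber = 1;
    private boolean closed = false;

    MultipartOutputStream(PartUploader uploader, int partSize) {
        this.uploader = uploader;
        this.buffer = new byte[partSize];
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (closed) throw new IOException("stream closed");
        while (len > 0) {
            int n = Math.min(len, buffer.length - filled);
            System.arraycopy(b, off, buffer, filled, n);
            filled += n;
            off += n;
            len -= n;
            if (filled == buffer.length) uploadBufferedPart(); // full part: send it
        }
    }

    private void uploadBufferedPart() throws IOException {
        try {
            uploader.uploadPart(nextPartNumber++, buffer, filled);
            filled = 0;
        } catch (IOException | RuntimeException e) {
            closed = true;
            uploader.abort(); // don't leave an unfinished multipart upload behind
            throw e;
        }
    }

    // flush() is deliberately not overridden: S3 parts other than the last must be at least
    // 5 MiB, so bytes are only sent once a full part has accumulated or the stream is closed.

    @Override
    public void close() throws IOException {
        if (closed) return;
        if (filled > 0 || nextPartNumber == 1) uploadBufferedPart(); // final, possibly shorter, part
        closed = true;
        try {
            uploader.complete();
        } catch (IOException | RuntimeException e) {
            uploader.abort();
            throw e;
        }
    }
}
```

Wrapped as new ZipOutputStream(new MultipartOutputStream(uploader, partSize)), closing the zip also completes the upload, and no content length is needed up front. For a real upload, partSize must respect the S3 part limits (at least 5 MiB); the small size in the check below only works because the uploader there is an in-memory fake.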

Recommended answer

You can get an input stream from your S3 data, zip those bytes, and stream the result back to S3:

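        // Assumes the AWS SDK for Java v1 (com.amazonaws.services.s3.*), java.io.* and java.util.zip.*,
        // and an AmazonS3 client in amazonS3Client.
        long numBytes = expectedZipSize;  // hypothetical: the exact size of the finished zip must be known
                                          // up front; without setContentLength the v1 SDK buffers the
                                          // whole stream in memory before uploading
        PipedOutputStream os = new PipedOutputStream();
        PipedInputStream is = new PipedInputStream(os, 1024 * 1024);  // 1 MiB pipe instead of the 1 KiB default
        ObjectMetadata meta = new ObjectMetadata();
        meta.setContentLength(numBytes);

        new Thread(() -> {
            // Write to os here; closing the ZipOutputStream also closes os, which ends the upload stream
            try (ZipOutputStream zipOutputStream = new ZipOutputStream(os);
                 S3ObjectInputStream objectContent =
                         amazonS3Client.getObject("myBucket", "myKey").getObjectContent()) {
                zipOutputStream.putNextEntry(new ZipEntry("myKey"));
                byte[] bytes = new byte[8192];
                int length;
                while ((length = objectContent.read(bytes)) != -1) {
                    zipOutputStream.write(bytes, 0, length);
                }
                zipOutputStream.closeEntry();
            } catch (IOException e) {
                e.printStackTrace();  // only logged: the uploading thread is not told that zipping failed
            }
        }).start();
        // Write the zip under a different key; reusing "myKey" would overwrite the source object
        amazonS3Client.putObject("myBucket", "myKey.zip", is, meta);
        is.close();  // always close your streams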
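The answer's pattern (zip on one thread into a PipedOutputStream, upload from the connected PipedInputStream on another) can be tried without S3. In this sketch a hypothetical StreamConsumer stands in for the putObject call, and the zipping runs as a Callable on an executor, so an exception in the zipping thread is rethrown by Future.get() instead of only being printed. Note that the failure surfaces only after the consumer has finished; with a real upload the object may already be stored, so a caller would still have to delete it (or use a multipart upload that can be aborted).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class PipedZip {
    /** Hypothetical stand-in for the upload call, e.g. putObject(bucket, key, in, meta). */
    interface StreamConsumer {
        long consume(InputStream in) throws IOException;
    }

    /** Zips entries on a background thread and feeds the bytes to consumer through a pipe. */
    static long zipThroughPipe(Map<String, byte[]> entries, StreamConsumer consumer) throws Exception {
        PipedOutputStream os = new PipedOutputStream();
        PipedInputStream is = new PipedInputStream(os, 64 * 1024); // larger than the 1 KiB default
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // A Callable (it returns a value), so an IOException here is captured by the Future
            Future<?> producer = executor.submit(() -> {
                // Closing the zip writes the central directory and closes os: the consumer then sees EOF
                try (ZipOutputStream zip = new ZipOutputStream(os)) {
                    for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                        zip.putNextEntry(new ZipEntry(e.getKey()));
                        zip.write(e.getValue());
                        zip.closeEntry();
                    }
                }
                return null;
            });
            long consumed;
            try (InputStream in = is) {
                consumed = consumer.consume(in); // reads on this thread while the producer writes
            }
            producer.get(); // rethrows a zipping failure, wrapped in ExecutionException
            return consumed;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

As in the answer, writing and reading must happen on different threads: with both on one thread, the writer blocks as soon as the pipe buffer is full and nothing ever drains it.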
