How to read data from nested zip files in Java without using temporary files?
Problem description
I am trying to extract files from a nested zip archive and process them in memory.
What this question is not about:
How to read a zip file in Java: NO, the question is how to read a zip file within a zip file within a zip file, and so on (that is, nested zip files).
Writing temporary results to disk: NO, I am asking about doing it all in memory. I found many answers that use the not-so-efficient technique of temporarily writing results to disk, but that is not what I want to do.
Example:
Zipfile -> Zipfile1 -> Zipfile2 -> Zipfile3
Goal: extract the data found in each of the nested zip files, all in memory and using Java.
ZipFile is the answer, you say? NO, it is not. It works for the first iteration, that is, for:
Zipfile -> Zipfile1
But once you get to Zipfile2 and perform a:
ZipInputStream z = new ZipInputStream(zipFile.getInputStream(zipEntry));
you get a NullPointerException.
My code:
public class ZipHandler {
    String findings = new String();
    ZipFile zipFile = null;

    public void init(String fileName) throws AppException{
        try {
            //read file into stream
            zipFile = new ZipFile(fileName);
            Enumeration<?> enu = zipFile.entries();
            exctractInfoFromZip(enu);
            zipFile.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    //The idea was recursively extract entries using ZipFile
    public void exctractInfoFromZip(Enumeration<?> enu) throws IOException, AppException{
        try {
            while (enu.hasMoreElements()) {
                ZipEntry zipEntry = (ZipEntry) enu.nextElement();
                String name = zipEntry.getName();
                long size = zipEntry.getSize();
                long compressedSize = zipEntry.getCompressedSize();
                System.out.printf("name: %-20s | size: %6d | compressed size: %6d\n",
                        name, size, compressedSize);
                // directory ?
                if (zipEntry.isDirectory()) {
                    System.out.println("dir found:" + name);
                    findings+=", " + name;
                    continue;
                }
                if (name.toUpperCase().endsWith(".ZIP") || name.toUpperCase().endsWith(".GZ")) {
                    String fileType = name.substring(
                            name.lastIndexOf(".")+1, name.length());
                    System.out.println("File type:" + fileType);
                    System.out.println("zipEntry: " + zipEntry);
                    if (fileType.equalsIgnoreCase("ZIP")) {
                        //ZipFile here returns a NULL pointer when you try to get the first nested zip
                        ZipInputStream z = new ZipInputStream(zipFile.getInputStream(zipEntry) ) ;
                        System.out.println("Opening ZIP as stream: " + name);
                        findings+=", " + name;
                        exctractInfoFromZip(zipInputStreamToEnum(z));
                    } else if (fileType.equalsIgnoreCase("GZ")) {
                        //ZipFile here returns a NULL pointer when you try to get the first nested zip
                        GZIPInputStream z = new GZIPInputStream(zipFile.getInputStream(zipEntry) ) ;
                        System.out.println("Opening ZIP as stream: " + name);
                        findings+=", " + name;
                        exctractInfoFromZip(gZipInputStreamToEnum(z));
                    } else
                        throw new AppException("extension not recognized!");
                } else {
                    System.out.println(name);
                    findings+=", " + name;
                }
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        System.out.println("Findings " + findings);
    }

    public Enumeration<?> zipInputStreamToEnum(ZipInputStream zStream) throws IOException{
        List<ZipEntry> list = new ArrayList<ZipEntry>();
        while (zStream.available() != 0) {
            list.add(zStream.getNextEntry());
        }
        return Collections.enumeration(list);
    }
}
Recommended answer
I have not tried it, but with ZipInputStream you can read any InputStream that contains a ZIP file as its data. Iterate through the entries, and when you find the right entry, use that ZipInputStream to create another, nested ZipInputStream.
The following code demonstrates this. Imagine we have a readme.txt inside 0.zip, which is itself zipped inside 1.zip, which in turn is zipped inside 2.zip. Now we read some text from readme.txt:
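import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// The class name is only a wrapper so that the snippet compiles.
public class NestedZipDemo {
    public static void main(String[] args) throws IOException {
        try (FileInputStream fin = new FileInputStream("D:/2.zip")) {
            ZipInputStream firstZip = new ZipInputStream(fin);
            ZipInputStream zippedZip = new ZipInputStream(findEntry(firstZip, "1.zip"));
            ZipInputStream zippedZippedZip = new ZipInputStream(findEntry(zippedZip, "0.zip"));
            ZipInputStream zippedZippedZippedReadme = findEntry(zippedZippedZip, "readme.txt");
            InputStreamReader reader = new InputStreamReader(zippedZippedZippedReadme);
            char[] cbuf = new char[1024];
            int read = reader.read(cbuf);
            System.out.println(new String(cbuf, 0, read));
        }
    }

    // Advances the stream to the named entry and returns the same stream,
    // now positioned at that entry's data; returns null if no entry matches.
    public static ZipInputStream findEntry(ZipInputStream in, String name) throws IOException {
        ZipEntry entry = null;
        while ((entry = in.getNextEntry()) != null) {
            if (entry.getName().equals(name)) {
                return in;
            }
        }
        return null;
    }
}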
Note that the code is really ugly: it does not close the nested streams, nor does it check for errors. It is just a minimal version that demonstrates how it works.
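For comparison, a tidier variant might look like the sketch below. It is not part of the original answer: the class NestedZipReader and the methods readNestedText and positionAt are hypothetical names, and the UTF-8 decoding is an assumption about the file's encoding. It follows a list of entry names down through the nested archives, fails with a FileNotFoundException when an entry is missing (instead of passing null to the next ZipInputStream), and closes the whole chain when done.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class NestedZipReader {

    /**
     * Follows entryPath through nested zips (every name except the last must
     * itself be a zip) and returns the last entry's content decoded as UTF-8.
     */
    public static String readNestedText(InputStream in, String... entryPath) throws IOException {
        if (entryPath.length == 0) {
            throw new IllegalArgumentException("entryPath must name at least one entry");
        }
        ZipInputStream current = new ZipInputStream(in);
        try {
            for (int i = 0; i < entryPath.length; i++) {
                positionAt(current, entryPath[i]);
                if (i < entryPath.length - 1) {
                    // The entry is itself a zip: read its data as another zip stream.
                    current = new ZipInputStream(current);
                }
            }
            // Copy the final entry's bytes; read() returns -1 at the end of the entry.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = current.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            // Closing the innermost stream closes every stream it wraps, including in.
            current.close();
        }
    }

    /** Advances zip to the entry called name, or throws if no such entry exists. */
    private static void positionAt(ZipInputStream zip, String name) throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.getName().equals(name)) {
                return;
            }
        }
        throw new FileNotFoundException("No entry named " + name);
    }
}
```

With the archive from the example, the call would be readNestedText(new FileInputStream("D:/2.zip"), "1.zip", "0.zip", "readme.txt"). Only the final entry is buffered in memory; the intermediate archives are streamed.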
In theory there is no limit to how many ZipInputStreams you can cascade into one another. The data is never written to a temporary file. Decompression is performed only on demand, as you read from each InputStream.
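Since the depth is not bounded, the cascade can also be driven recursively rather than written out level by level. The following is a minimal sketch under some assumptions: the class NestedZipWalker, its methods, and the "!/" separator used to display nested paths are all made up for illustration, and nested archives are recognized purely by a ".zip" name suffix. The zipOf helper exists only to build a small nested archive in memory for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class NestedZipWalker {

    /** Lists every entry reachable from the given zip data, descending into nested .zip entries. */
    public static List<String> listEntries(InputStream in) throws IOException {
        List<String> paths = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            walk(zip, "", paths);
        }
        return paths;
    }

    private static void walk(ZipInputStream zip, String prefix, List<String> paths) throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            String path = prefix + entry.getName();
            paths.add(path);
            boolean nestedZip = !entry.isDirectory()
                    && entry.getName().toLowerCase(Locale.ROOT).endsWith(".zip");
            if (nestedZip) {
                // Wrap the current entry's data in another ZipInputStream.
                // The nested stream is deliberately not closed here: closing it
                // would also close the parent stream we are still iterating.
                walk(new ZipInputStream(zip), path + "!/", paths);
            }
        }
    }

    /** Demo helper: builds a zip archive in memory that holds a single entry. */
    public static byte[] zipOf(String name, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry(name));
            out.write(content);
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Recreate the layout from the example: readme.txt in 0.zip in 1.zip in 2.zip.
        byte[] zip0 = zipOf("readme.txt", "hello".getBytes(StandardCharsets.UTF_8));
        byte[] zip1 = zipOf("0.zip", zip0);
        byte[] zip2 = zipOf("1.zip", zip1);
        for (String path : listEntries(new ByteArrayInputStream(zip2))) {
            System.out.println(path);
        }
    }
}
```

Running main prints 1.zip, 1.zip!/0.zip and 1.zip!/0.zip!/readme.txt, one per line. Nothing touches the disk; each level is decompressed only while the walker is reading it.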