从Zip文件中的文件中读取内容 [英] Read Content from Files which are inside Zip file
问题描述
我正在尝试创建一个简单的java程序,它从zip文件中的文件中读取和提取内容。 Zip文件包含3个文件(txt,pdf,docx)。我需要阅读所有这些文件的内容,并且我正在使用 Apache Tika 来实现此目的。
I am trying to create a simple java program which reads and extracts the content from the file(s) inside zip file. Zip file contains 3 files (txt, pdf, docx). I need to read the contents of all these files and I am using Apache Tika for this purpose.
有人可以帮助我实现这一功能。到目前为止我已经尝试过但没有成功
Can somebody help me out here to achieve the functionality. I have tried this so far but no success
代码段
public class SampleZipExtract {
public static void main(String[] args) {
List<String> tempString = new ArrayList<String>();
StringBuffer sbf = new StringBuffer();
File file = new File("C:\\Users\\xxx\\Desktop\\abc.zip");
InputStream input;
try {
input = new FileInputStream(file);
ZipInputStream zip = new ZipInputStream(input);
ZipEntry entry = zip.getNextEntry();
BodyContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
Parser parser = new AutoDetectParser();
while (entry!= null){
if(entry.getName().endsWith(".txt") ||
entry.getName().endsWith(".pdf")||
entry.getName().endsWith(".docx")){
System.out.println("entry=" + entry.getName() + " " + entry.getSize());
parser.parse(input, textHandler, metadata, new ParseContext());
tempString.add(textHandler.toString());
}
}
zip.close();
input.close();
for (String text : tempString) {
System.out.println("Apache Tika - Converted input string : " + text);
sbf.append(text);
System.out.println("Final text from all the three files " + sbf.toString());
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (TikaException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
推荐答案
如果您想知道如何从每个 ZipEntry
获取文件内容,它实际上非常简单。下面是一个示例代码:
If you're wondering how to get the file content from each ZipEntry
it's actually quite simple. Here's a sample code:
public static void main(String[] args) throws IOException {
ZipFile zipFile = new ZipFile("C:/test.zip");
Enumeration<? extends ZipEntry> entries = zipFile.entries();
while(entries.hasMoreElements()){
ZipEntry entry = entries.nextElement();
InputStream stream = zipFile.getInputStream(entry);
}
}
一旦你有了InputStream,你可以阅读它但是你想。
Once you have the InputStream you can read it however you want.
这篇关于从Zip文件中的文件中读取内容的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!