Why can't I seem to read an entire compressed file from a URL stream?

Views: 118; Published: 2020/6/7 18:58:25; Tags: java stream inputstream bzip2 wikidata


Problem description

I'm trying to parse Wiktionary dumps on the fly, directly from the URL, in Java. The dumps are distributed as BZIP2-compressed files, and I am using the following approach to parse them:

String fileURL = "https://dumps.wikimedia.org/cswiktionary/20171120/cswiktionary-20171120-pages-articles-multistream.xml.bz2";
URL bz2 = new URL(fileURL);
BufferedInputStream bis = new BufferedInputStream(bz2.openStream());
CompressorInputStream input = new CompressorStreamFactory().createCompressorInputStream(bis);
BufferedReader br2 = new BufferedReader(new InputStreamReader(input));
System.out.println(br2.lines().count());

However, the reported line count is only 36, a small fraction of the file, which is over 20 MB in size. Printing the stream line by line likewise produced only a few lines of XML:

String line = br2.readLine();
while(line != null) {
  System.out.println(line);
  line = br2.readLine();
}

Is there something I am missing here? I copied my implementation almost line for line from other snippets I found online, which others claimed worked. Why isn't the entire stream being read? Thanks in advance.
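
A likely cause, offered here as an assumption and not as the accepted answer: the file name contains `multistream`, which suggests the dump is a sequence of concatenated bzip2 streams, and Apache Commons Compress stops after the first stream unless told to continue. The sketch below passes `true` for the `decompressConcatenated` argument of `BZip2CompressorInputStream`. The class name `MultistreamLineCounter` and the helper `countLines` are hypothetical names chosen for illustration; the sketch also assumes Commons Compress is on the classpath and that the dump is UTF-8 text.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

// Hypothetical example class; not part of the original question.
public class MultistreamLineCounter {

    // Counts the lines in bzip2-compressed input, decompressing every
    // concatenated bzip2 stream instead of stopping after the first one.
    static long countLines(InputStream compressed) throws IOException {
        // The second constructor argument (decompressConcatenated = true)
        // tells Commons Compress to keep reading until the end of the input.
        InputStream decompressed =
                new BZip2CompressorInputStream(new BufferedInputStream(compressed), true);
        // Assumption: the content is UTF-8 text.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(decompressed, StandardCharsets.UTF_8))) {
            return reader.lines().count();
        }
    }

    public static void main(String[] args) throws IOException {
        // URL copied from the question; whether it is still hosted is not checked here.
        String fileURL = "https://dumps.wikimedia.org/cswiktionary/20171120/cswiktionary-20171120-pages-articles-multistream.xml.bz2";
        try (InputStream in = new URL(fileURL).openStream()) {
            System.out.println(countLines(in));
        }
    }
}
```

If the factory from the question is preferred, constructing it as `new CompressorStreamFactory(true)` should have the same effect in Commons Compress versions that offer that constructor.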
