GZIPInputStream reading line by line

I have a file in .gz format. The Java class for reading this file is GZIPInputStream. However, this class doesn't extend Java's BufferedReader class, so I can't read the file line by line. I need something like this:

reader  = new MyGZInputStream( some constructor of GZInputStream) 
reader.readLine()...

I thought of creating my own class that extends Java's Reader or BufferedReader class and uses a GZIPInputStream as one of its fields:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.Reader;
import java.util.zip.GZIPInputStream;

public class MyGZFilReader extends Reader {

    private GZIPInputStream gzipInputStream = null;
    char[] buf = new char[1024];

    @Override
    public void close() throws IOException {
        gzipInputStream.close();
    }

    public MyGZFilReader(String filename)
               throws FileNotFoundException, IOException {
        gzipInputStream = new GZIPInputStream(new FileInputStream(filename));
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        // TODO Auto-generated method stub
        return gzipInputStream.read((byte[])buf, off, len);
    }

}

But this doesn't work when I use:

BufferedReader in = new BufferedReader(
    new MyGZFilReader("F:/gawiki-20090614-stub-meta-history.xml.gz"));
System.out.println(in.readLine());

Can someone advise how to proceed?

The basic setup of decorators is like this:

InputStream fileStream = new FileInputStream(filename);
InputStream gzipStream = new GZIPInputStream(fileStream);
Reader decoder = new InputStreamReader(gzipStream, encoding);
BufferedReader buffered = new BufferedReader(decoder);

The key issue in this snippet is the value of encoding. This is the character encoding of the text in the file. Is it "US-ASCII", "UTF-8", "SHIFT-JIS", "ISO-8859-9", …? There are hundreds of possibilities, and the correct choice usually cannot be determined from the file itself. It must be specified through some out-of-band channel.

For example, maybe it's the platform default. In a networked environment, however, this is extremely fragile. The machine that wrote the file might sit in the neighboring cubicle, but have a different default file encoding.

Most network protocols use a header or other metadata to explicitly note the character encoding.
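Putting the decorator chain together with an explicitly chosen encoding might look like the sketch below. The class name GzipLines, the helper readLines, and the choice of UTF-8 in main are assumptions made for illustration; the real encoding has to come from whatever out-of-band source describes your file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLines {

    /** Reads every line of a gzip-compressed text file, decoding with the given charset. */
    public static List<String> readLines(Path gzFile, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the three resources in reverse order of declaration.
        try (InputStream fileStream = Files.newInputStream(gzFile);
             InputStream gzipStream = new GZIPInputStream(fileStream);
             BufferedReader buffered =
                     new BufferedReader(new InputStreamReader(gzipStream, charset))) {
            String line;
            while ((line = buffered.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small gzip file so the example is self-contained (UTF-8 is an assumption).
        Path tmp = Files.createTempFile("sample", ".txt.gz");
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(tmp)), StandardCharsets.UTF_8)) {
            w.write("first line\nsecond line\n");
        }
        for (String line : readLines(tmp, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
        Files.delete(tmp);
    }
}
```

For a large dump you would probably process each line inside the loop rather than collecting everything into a list; the list is used here only to keep the sketch easy to check.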

In this case, it appears from the file extension that the content is XML. XML includes the "encoding" attribute in the XML declaration for this purpose. Furthermore, XML should really be processed with an XML parser, not as text. Reading XML line-by-line seems like a fragile, special case.
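One way to follow that advice is to hand the decompressed byte stream straight to a streaming XML parser, which reads the encoding from the XML declaration itself. The sketch below uses the JDK's StAX API; the class name GzipXmlTitles, the helper names, and the "title" element are assumptions chosen for illustration, not something confirmed about the actual dump file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class GzipXmlTitles {

    /** Returns the text of every <title> element in a gzip-compressed XML stream. */
    public static List<String> readTitles(InputStream gzipped)
            throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Hardening: this sketch does not need DTD processing.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        List<String> titles = new ArrayList<>();
        try (InputStream in = new GZIPInputStream(gzipped)) {
            // Passing bytes (not a Reader) lets the parser detect the declared encoding.
            XMLStreamReader xml = factory.createXMLStreamReader(in);
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(xml.getLocalName())) {
                    titles.add(xml.getElementText());
                }
            }
            xml.close();
        }
        return titles;
    }

    /** Compresses a byte array in memory so the example needs no external file. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<pages><page><title>Caf\u00e9</title></page></pages>";
        // The bytes are written in the same encoding the declaration names.
        byte[] compressed = gzip(doc.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readTitles(new ByteArrayInputStream(compressed)));
    }
}
```

Because the parser consumes the stream event by event, this approach should also cope with files too large to hold in memory, with no charset argument in the calling code.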

Failing to explicitly specify the encoding is against the second commandment. Use the default encoding at your peril!
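To make the risk concrete, here is a small sketch showing the same bytes decoded under two different charsets. The class name DefaultCharsetPeril and the helper decode are hypothetical; the point is only that the result depends entirely on which charset the reader assumes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetPeril {

    /** Decodes the bytes with an explicitly named charset. */
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "\u00e9" (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] written = "\u00e9".getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(written, StandardCharsets.UTF_8);
        String asLatin1 = decode(written, StandardCharsets.ISO_8859_1);

        // Decoded correctly the text is one character; decoded as ISO-8859-1 it is two.
        System.out.println("UTF-8 length: " + asUtf8.length());
        System.out.println("ISO-8859-1 length: " + asLatin1.length());

        // What an InputStreamReader without a charset argument would use on this machine.
        System.out.println("Platform default: " + Charset.defaultCharset());
    }
}
```

If the reading machine's default happens to differ from the writer's, the second outcome is what you get silently, with no exception to warn you.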
