How to handle a file with a different line separator in Java?


Problem description

I have a huge file (more than 3 GB) that contains a single long line in the following format: "1243@818@9287@543"

The data I want to analyze is separated by "@". My idea is to change the default end-of-line character used by Java and set it to "@".

I'm trying to do this with System.setProperty("line.separator", "@") in the code below, but it does not work: it prints the complete line. For this test I'd like the output to be:

1243
818
9287
543

How can I change the default line separator to "@"?

package test;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class Test {
    public static void main(String[] args) throws FileNotFoundException, IOException {
        System.setProperty("line.separator", "@");

        File testFile = new File("./Mypath/myfile");
        BufferedReader br = new BufferedReader(new FileReader(testFile));
        for(String line; (line = br.readLine()) != null; ) {
        // Process each line.
            System.out.println(line); 
        }
    }

}

Thanks in advance for your help.

Recommended answer

Then the data I want to analyze is separated by "@". My idea is to change the default end-of-line character used by Java and set it to "@".

I wouldn't do that, as it might break God knows what else that depends on line.separator.

As for why this doesn't work, I'm sorry to say this is a case of not reading the manual (RTFM). This is what the Javadocs for BufferedReader.readLine have to say:

public String readLine()
                throws IOException
Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
Returns: A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
Throws: IOException - If an I/O error occurs

The API docs for the readLine() method clearly say that it looks for '\n' or '\r'. They do not say that it depends on line.separator.
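A quick way to see this for yourself is a small self-contained demo (the class name ReadLineDemo and the helper readLines are made up for illustration). It sets line.separator to "@" exactly as the question does, then reads text containing both '@' and '\n' through a BufferedReader; only the '\n' ends a line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Hypothetical helper: collects every line that BufferedReader.readLine() returns.
    static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            for (String line; (line = br.readLine()) != null; ) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Same trick as in the question; readLine() ignores this property.
        System.setProperty("line.separator", "@");
        // Only '\n', '\r' and "\r\n" terminate a line; '@' is ordinary data.
        System.out.println(readLines("1243@818\n9287@543")); // prints [1243@818, 9287@543]
    }
}
```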

The line.separator property exists only so that code, such as an API that writes text, has a portable, platform-independent way to find out what the line separator is. That is all. This system property is not meant for controlling the internal mechanisms of Java's IO classes.

I think you are over-complicating things. Just do it the old-fashioned way: read n characters (say 1024 KB) into a buffer and scan it for each '@' delimiter. That does introduce complications, though, such as the common case where the data between two '@' delimiters gets split across two buffers.
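A minimal sketch of that chunked approach, including one way to deal with a value split across two buffers: the unfinished tail of each chunk is kept in a StringBuilder and completed by the next chunk. The class name ChunkedSplitter, the method forEachToken and its callback parameter are hypothetical, and the tiny chunk size in main is only there to force values to straddle chunk boundaries.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedSplitter {
    // Reads `in` in chunks of up to `chunkSize` chars and passes each value
    // between `delim` characters to `sink`.
    static void forEachToken(Reader in, char delim, int chunkSize,
                             Consumer<String> sink) throws IOException {
        char[] chunk = new char[chunkSize];
        StringBuilder pending = new StringBuilder(); // value not yet terminated
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            int start = 0;
            for (int i = 0; i < n; i++) {
                if (chunk[i] == delim) {
                    // Complete the pending value with this chunk's piece and emit it.
                    pending.append(chunk, start, i - start);
                    sink.accept(pending.toString());
                    pending.setLength(0);
                    start = i + 1;
                }
            }
            // Carry the unfinished tail of this chunk over to the next one.
            pending.append(chunk, start, n - start);
        }
        // Emit the last value, which has no trailing delimiter
        // (nothing is emitted if the input ended with a delimiter).
        if (pending.length() > 0) {
            sink.accept(pending.toString());
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> out = new ArrayList<>();
        forEachToken(new StringReader("1243@818@9287@543"), '@', 3, out::add);
        System.out.println(out); // prints [1243, 818, 9287, 543]
    }
}
```

For the real file you would pass something like a FileReader for "./Mypath/myfile" with a chunk size of 1024 * 1024 chars, and a sink such as System.out::println instead of collecting everything into a list.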

So I would suggest simply reading one character at a time from the BufferedReader. This is not as bad as it sounds and does not typically hit IO excessively, since the buffered reader does... tada... the buffering for you.

Append each character to a StringBuilder, and every time you find an '@' delimiter, flush the contents of the StringBuilder to standard output or wherever you need it (since that represents one datum from your '@'-delimited file), then clear the builder for the next datum.

Get the algorithm to work correctly first, and optimize later. Here is a rough version below, with no guarantee that it is free of compilation errors; you should be able to flesh it out into syntactically correct Java easily:

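import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class SplitOnAt {
    public static void main(String[] args) throws IOException {
        File testFile = new File("./Mypath/myfile");
        int bufferSize = 1024 * 1024; // read-ahead buffer of 1M chars
        try (BufferedReader br = new BufferedReader(new FileReader(testFile), bufferSize)) {
            StringBuilder bld = new StringBuilder();
            int c;
            // read() returns one char per call; BufferedReader refills its buffer as needed
            while ((c = br.read()) != -1) {
                char z = (char) c;
                if (z == '@') {
                    System.out.println(bld); // one datum is complete
                    bld.setLength(0);        // clear the builder for the next datum
                } else {
                    bld.append(z);
                }
            }
            // The last datum has no trailing '@', so flush whatever is left
            if (bld.length() > 0) {
                System.out.println(bld);
            }
        }
    }
}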
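As an alternative the answer does not mention, java.util.Scanner can be told to split on '@' instead of whitespace via useDelimiter, which is close to what the question was really after. This is a sketch: the class name ScannerSplit, the helper forEachToken and the UTF-8 charset are assumptions, and because Scanner matches delimiters with regular expressions it is likely slower than the hand-written loop, so measure it on the 3 GB file before relying on it.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.function.Consumer;

public class ScannerSplit {
    // Hypothetical helper: treat '@' as the token separator instead of whitespace.
    // useDelimiter takes a regex; '@' has no special meaning in regex syntax.
    static void forEachToken(Scanner sc, Consumer<String> sink) {
        sc.useDelimiter("@");
        while (sc.hasNext()) {
            sink.accept(sc.next());
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Path taken from the question; the charset is an assumption.
        try (Scanner sc = new Scanner(new File("./Mypath/myfile"), "UTF-8")) {
            forEachToken(sc, System.out::println);
        }
    }
}
```

Two caveats: if the file ends with a line break, the last token will include it, and Scanner does not throw on read errors (check sc.ioException() after the loop if that matters).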
