Java中文件中的行数 [英] Number of lines in a file in Java

查看:196
本文介绍了Java中文件中的行数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用大量数据文件,有时我只需要知道这些文件中的行数,通常我打开它们并逐行读取它们直到我到达文件末尾

I use huge data files, sometimes I only need to know the number of lines in these files, usually I open them up and read them line by line until I reach the end of the file

我想知道是否有更聪明的方法

I was wondering if there is a smarter way to do that

推荐答案

这是我最快的版本到目前为止已发现,比readLines快约6倍。在150MB的日志文件上,这需要0.35秒,而使用readLines()时需要2.40秒。只是为了好玩,linux'wc -l命令需要0.15秒。

This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file this takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, linux' wc -l command takes 0.15 seconds.

public static int countLines(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];
        int count = 0;
        int readChars = 0;
        boolean empty = true;
        while ((readChars = is.read(c)) != -1) {
            empty = false;
            for (int i = 0; i < readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
        }
        return (count == 0 && !empty) ? 1 : count;
    } finally {
        is.close();
    }
}

编辑,9年半后:我有几乎没有java经验,但无论如何我试图在下面的 LineNumberReader 解决方案中对此代码进行基准测试,因为它让我感到困扰,没有人这样做。看来特别是对于大文件我的解决方案更快。虽然在优化器完成一项体面的工作之前似乎需要几次运行。我已经玩了一些代码,并且已经生成了一个最快的新版本:

EDIT, 9 1/2 years later: I have practically no java experience, but anyways I have tried to benchmark this code against the LineNumberReader solution below since it bothered me that nobody did it. It seems that especially for large files my solution is faster. Although it seems to take a few runs until the optimizer does a decent job. I've played a bit with the code, and have produced a new version that is consistently fastest:

public static int countLinesNew(String filename) throws IOException {
    InputStream is = new BufferedInputStream(new FileInputStream(filename));
    try {
        byte[] c = new byte[1024];

        int readChars = is.read(c);
        if (readChars == -1) {
            // bail out if nothing to read
            return 0;
        }

        // make it easy for the optimizer to tune this loop
        int count = 0;
        while (readChars == 1024) {
            for (int i=0; i<1024;) {
                if (c[i++] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }

        // count remaining characters
        while (readChars != -1) {
            System.out.println(readChars);
            for (int i=0; i<readChars; ++i) {
                if (c[i] == '\n') {
                    ++count;
                }
            }
            readChars = is.read(c);
        }

        return count == 0 ? 1 : count;
    } finally {
        is.close();
    }
}

1.3GB文本文件的基准测试结果,y轴很快。我使用相同的文件执行了100次运行,并使用 System.nanoTime()测量每次运行。您可以看到 countLines 有一些异常值, countLinesNew 没有,似乎也稍快一些。 LineNumberReader 显然较慢。

Benchmark resuls for a 1.3GB text file, y axis in seconds. I've performed 100 runs with the same file, and measured each run with System.nanoTime(). You can see that countLines has a few outliers, and countLinesNew has none and seems to be also slightly faster. LineNumberReader is clearly slower.

这篇关于Java中文件中的行数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆