How do I create a Java string from the contents of a file?

Question

I've been using the idiom below for some time now, and it seems to be the most widespread, at least on the sites I've visited.

Is there a better/different way to read a file into a string in Java?

private String readFile(String file) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader (file));
    String         line = null;
    StringBuilder  stringBuilder = new StringBuilder();
    String         ls = System.getProperty("line.separator");

    try {
        while((line = reader.readLine()) != null) {
            stringBuilder.append(line);
            stringBuilder.append(ls);
        }

        return stringBuilder.toString();
    } finally {
        reader.close();
    }
}

Recommended answer

Read all text from a file

Java 11 added the Files.readString() method to read small files as a String, preserving line terminators:

String content = Files.readString(path, StandardCharsets.US_ASCII);
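As a minimal sketch of what "preserving line terminators" means in practice, the example below writes text with mixed line endings to a temporary file and reads it back with Files.readString(). The class name ReadStringDemo and the helper roundTrip are hypothetical names chosen for illustration, and UTF-8 is an assumed encoding; Java 11 or later is required.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadStringDemo {
    // Hypothetical helper: writes the text to a temporary file, then reads
    // the whole file back as a single String.
    static String roundTrip(String text) throws IOException {
        Path path = Files.createTempFile("read-string-demo", ".txt");
        try {
            Files.writeString(path, text, StandardCharsets.UTF_8);
            return Files.readString(path, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(path);
        }
    }

    public static void main(String[] args) throws IOException {
        // Mixed Windows (\r\n) and Unix (\n) line endings.
        String original = "first line\r\nsecond line\n";
        String content = roundTrip(original);
        // The terminators come back exactly as written.
        System.out.println(content.equals(original));
    }
}
```

Because the file is decoded as a whole, nothing is normalized or stripped along the way.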

For versions between Java 7 and 11, here's a compact, robust idiom, wrapped up in a utility method:

static String readFile(String path, Charset encoding)
  throws IOException
{
  byte[] encoded = Files.readAllBytes(Paths.get(path));
  return new String(encoded, encoding);
}

Read lines of text from a file

Java 7 added a convenience method to read a file as lines of text, represented as a List<String>. This approach is "lossy" because the line separators are stripped from the end of each line.

List<String> lines = Files.readAllLines(Paths.get(path), encoding);
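To illustrate the "lossy" behaviour, here is a small sketch that reads a file containing both \r\n and \n terminators. The class name ReadAllLinesDemo and the helper linesOf are hypothetical, UTF-8 is an assumed encoding, and String.join() needs Java 8 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadAllLinesDemo {
    // Hypothetical helper: writes the text to a temporary file and reads it
    // back as a list of lines.
    static List<String> linesOf(String text) throws IOException {
        Path path = Files.createTempFile("read-all-lines-demo", ".txt");
        try {
            Files.write(path, text.getBytes(StandardCharsets.UTF_8));
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(path);
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "alpha\r\nbeta\ngamma";
        List<String> lines = linesOf(original);
        System.out.println(lines.size());

        // Rejoining with a single separator cannot restore the original text,
        // because the information about which terminator was used is gone.
        String rejoined = String.join("\n", lines);
        System.out.println(rejoined.equals(original));
    }
}
```

If the exact bytes of the file matter (for example, when computing a checksum of the text), reading the whole file as a String is the safer choice.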

Java 8 added the Files.lines() method to produce a Stream<String>. Again, this method is lossy because line separators are stripped. If an IOException is encountered while reading the file, it is wrapped in an UncheckedIOException, since Stream doesn't accept lambdas that throw checked exceptions.

try (Stream<String> lines = Files.lines(path, encoding)) {
  lines.forEach(System.out::println);
}

This Stream does need a close() call; this is poorly documented in the API, and I suspect many people don't even notice Stream has a close() method. Be sure to use a try-with-resources block (also known as an ARM block), as shown.

If you are working with a source other than a file, you can use the lines() method in BufferedReader instead.
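For a non-file source, a sketch along these lines may help. It uses a StringReader as a stand-in for any Reader (a socket, a classpath resource, and so on); the class name ReaderLinesDemo and the helper readLines are hypothetical. The catch block unwraps the UncheckedIOException that the stream can raise during reading.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

public class ReaderLinesDemo {
    // Hypothetical helper: collects every line from an arbitrary Reader.
    // Closing the BufferedReader also closes the underlying source.
    static List<String> readLines(Reader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.lines().collect(Collectors.toList());
        } catch (UncheckedIOException e) {
            // Restore the checked exception for callers that expect IOException.
            throw e.getCause();
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = readLines(new StringReader("one\ntwo\r\nthree"));
        lines.forEach(System.out::println);
    }
}
```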

The first method, which preserves line breaks, can temporarily require memory several times the size of the file, because for a short time the raw file contents (a byte array) and the decoded characters (each of which is 16 bits even if encoded as 8 bits in the file) reside in memory at once. It is safest to apply to files that you know to be small relative to the available memory.

The second method, reading lines, is usually more memory efficient, because the input byte buffer for decoding doesn't need to contain the entire file. However, it's still not suitable for files that are very large relative to available memory.

For reading large files, you need a different design for your program, one that reads a chunk of text from a stream, processes it, and then moves on to the next, reusing the same fixed-size memory block. Here, "large" depends on the computer specs. Nowadays, this threshold might be many gigabytes of RAM. The third method, using a Stream<String>, is one way to do this if your input "records" happen to be individual lines. (Using the readLine() method of BufferedReader is the procedural equivalent of this approach.)
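When the records are not lines, the chunked design can be sketched with a plain Reader and one reusable buffer. This is only an illustration under stated assumptions: the class name ChunkedReadDemo, the helper countChar, the 8192-character buffer size, and UTF-8 are all choices made for the example, and the "processing" is just counting one character.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ChunkedReadDemo {
    // Hypothetical helper: counts occurrences of a character while holding
    // only one fixed-size chunk of the input in memory at a time.
    static long countChar(Reader source, char target) throws IOException {
        char[] buffer = new char[8192]; // reused for every chunk
        long count = 0;
        int read;
        while ((read = source.read(buffer)) != -1) {
            // Only the first 'read' characters of the buffer are valid.
            for (int i = 0; i < read; i++) {
                if (buffer[i] == target) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Count line feeds in the file named by the first argument.
            try (Reader reader = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
                System.out.println(countChar(reader, '\n'));
            }
        } else {
            // Fallback so the sketch runs without any arguments.
            System.out.println(countChar(new StringReader("a,b,,c"), ','));
        }
    }
}
```

Memory use stays bounded by the buffer size regardless of how large the input is, which is the property the whole-file approaches lack.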

One thing that is missing from the sample in the original post is the character encoding. There are some special cases where the platform default is what you want, but they are rare, and you should be able to justify your choice.

The StandardCharsets class defines some constants for the encodings required of all Java runtimes:

String content = readFile("test.txt", StandardCharsets.UTF_8);

The platform default is available from the Charset class itself:

String content = readFile("test.txt", Charset.defaultCharset());
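To show why the encoding choice matters, the sketch below decodes the same bytes with two different charsets. The class name CharsetMismatchDemo and the helper decode are hypothetical; the sample text is written with a Unicode escape so the example does not depend on the encoding of the source file itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Hypothetical helper: decodes raw bytes with the given charset, the same
    // step that a readFile(path, encoding) utility performs after reading.
    static String decode(byte[] bytes, Charset encoding) {
        return new String(bytes, encoding);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the final letter.
        // In UTF-8 that accented letter occupies two bytes.
        String original = "caf\u00e9";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(bytes, StandardCharsets.UTF_8);
        String wrong = decode(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // same text
        System.out.println(wrong.equals(original)); // garbled: each byte became one char
        System.out.println(right.length() + " vs " + wrong.length());
    }
}
```

A program that relies on the platform default can behave like either branch depending on the machine it runs on, which is why an explicit charset is usually the safer choice.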


Note: This answer largely replaces my Java 6 version. The Java 7 utility safely simplifies the code, and the old answer, which used a mapped byte buffer, prevented the file that was read from being deleted until the mapped buffer was garbage collected. You can view the old version via the "edited" link on this answer.
