How to save Chinese characters to a file with Java?


Question

I use the following code to save Chinese characters into a .txt file, but when I opened it with Wordpad, I couldn't read it.

StringBuffer Shanghai_StrBuf = new StringBuffer("\u4E0A\u6D77");
boolean Append = true;

FileOutputStream fos;
fos = new FileOutputStream(FileName, Append);
for (int i = 0;i < Shanghai_StrBuf.length(); i++) {
    fos.write(Shanghai_StrBuf.charAt(i));
}
fos.close();

What can I do? I know that if I cut and paste Chinese characters into Wordpad, I can save them to a .txt file. How do I do that in Java?

Recommended answer

There are several factors at work here:


  • Text files have no intrinsic metadata for describing their encoding (for all the talk of angle-bracket taxes, there are reasons XML is popular)
  • The default encoding for Windows is still an 8-bit (or double-byte) "ANSI" character set with a limited range of values - text files written in this format are not portable
  • To tell a Unicode file from an ANSI file, Windows apps rely on the presence of a byte order mark (BOM) at the start of the file (not strictly true - Raymond Chen explains). In theory, the BOM is there to tell you the endianness (byte order) of the data. For UTF-8, even though there is only one byte order, Windows apps rely on the marker bytes to automatically figure out that the file is Unicode (though you'll note that Notepad has an encoding option on its open/save dialogs).
  • It is wrong to say that Java is broken because it does not write a UTF-8 BOM automatically. On Unix systems, it would be an error to write a BOM to a script file, for example, and many Unix systems use UTF-8 as their default encoding. There are times when you don't want it on Windows, either, like when you're appending data to an existing file: fos = new FileOutputStream(FileName, Append);
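
To make the points above concrete, here is a small sketch of the bytes involved. It assumes Java 7 or later for StandardCharsets; the class and method names (EncodingBytesDemo, hex, writeCharByChar) are made up for illustration. It shows that the question's loop ends up writing only the low byte of each char, because OutputStream.write(int) keeps just the eight low-order bits, and it prints the byte order mark as it appears in each UTF encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingBytesDemo {

  // Formats bytes as space-separated upper-case hex, e.g. "EF BB BF".
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02X", b & 0xFF));
    }
    return sb.toString();
  }

  // Mimics the question's loop using an in-memory stream:
  // write(int) stores only the low 8 bits of each char.
  static byte[] writeCharByChar(String text) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < text.length(); i++) {
      out.write(text.charAt(i));
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    String shanghai = "\u4E0A\u6D77";
    // prints "0A 77": the high byte of each char is lost
    System.out.println(hex(writeCharByChar(shanghai)));
    // prints "E4 B8 8A E6 B5 B7": three UTF-8 bytes per character
    System.out.println(hex(shanghai.getBytes(StandardCharsets.UTF_8)));

    Charset[] charsets = { StandardCharsets.UTF_8,
        StandardCharsets.UTF_16LE, StandardCharsets.UTF_16BE };
    for (Charset cs : charsets) {
      // prints EF BB BF, then FF FE, then FE FF
      System.out.println(cs.name() + " BOM: " + hex("\uFEFF".getBytes(cs)));
    }
  }
}
```

The two bytes 0A 77 are a line feed followed by the letter "w", which is why the file from the question is unreadable in any editor.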

Here is a method of reliably appending UTF-8 data to a file:

  private static void writeUtf8ToFile(File file, boolean append, String data)
      throws IOException {
    // Skip the BOM when appending to a file that already has content.
    boolean skipBOM = append && file.isFile() && (file.length() > 0);
    Closer res = new Closer();
    try {
      OutputStream out = res.using(new FileOutputStream(file, append));
      Writer writer = res.using(new OutputStreamWriter(out, Charset
          .forName("UTF-8")));
      if (!skipBOM) {
        // U+FEFF is encoded as the bytes EF BB BF in UTF-8.
        writer.write('\uFEFF');
      }
      writer.write(data);
    } finally {
      res.close();
    }
  }

Usage:

  public static void main(String[] args) throws IOException {
    String chinese = "\u4E0A\u6D77";
    boolean append = true;
    writeUtf8ToFile(new File("chinese.txt"), append, chinese);
  }



Note: if the file already exists, you choose to append, and the existing data isn't UTF-8 encoded, the only thing this code will create is a mess.

Here is the Closer type used in this code:

public class Closer implements Closeable {
  private Closeable closeable;

  public <T extends Closeable> T using(T t) {
    closeable = t;
    return t;
  }

  @Override public void close() throws IOException {
    if (closeable != null) {
      closeable.close();
    }
  }
}
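
On Java 7 and later, a try-with-resources statement can stand in for the Closer helper. The following is a minimal sketch of the same append-with-BOM idea using java.nio.file, not a drop-in replacement for the method above; the names Utf8Appender and appendUtf8 are hypothetical, and unlike the original it always appends.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class Utf8Appender {

  // Appends data as UTF-8, writing a BOM only when the file is new or empty.
  static void appendUtf8(Path file, String data) throws IOException {
    boolean needsBom = !Files.exists(file) || Files.size(file) == 0;
    // The writer is closed automatically, even if a write fails.
    try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
        StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
      if (needsBom) {
        writer.write('\uFEFF');
      }
      writer.write(data);
    }
  }

  public static void main(String[] args) throws IOException {
    appendUtf8(Paths.get("chinese.txt"), "\u4E0A\u6D77");
  }
}
```

The same caveat applies: if the existing file is not UTF-8 encoded, appending UTF-8 data to it produces mixed encodings.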

This code makes a Windows-style best guess about how to read the file based on byte order marks:

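  // Candidate encodings, each identified by the bytes that U+FEFF encodes to.
  private static final Charset[] UTF_ENCODINGS = { Charset.forName("UTF-8"),
      Charset.forName("UTF-16LE"), Charset.forName("UTF-16BE") };

  // Returns the charset whose BOM starts the stream, leaving the stream
  // positioned just past the BOM. The stream must support mark/reset
  // (for example, a BufferedInputStream).
  private static Charset getEncoding(InputStream in) throws IOException {
    charsetLoop: for (Charset encoding : UTF_ENCODINGS) {
      byte[] bom = "\uFEFF".getBytes(encoding);
      in.mark(bom.length);
      for (byte b : bom) {
        if ((0xFF & b) != in.read()) {
          // Not this BOM: rewind and try the next candidate.
          in.reset();
          continue charsetLoop;
        }
      }
      return encoding;
    }
    // No BOM found: fall back to the platform default encoding.
    return Charset.defaultCharset();
  }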

  private static String readText(File file) throws IOException {
    Closer res = new Closer();
    try {
      InputStream in = res.using(new FileInputStream(file));
      InputStream bin = res.using(new BufferedInputStream(in));
      Reader reader = res.using(new InputStreamReader(bin, getEncoding(bin)));
      StringBuilder out = new StringBuilder();
      for (int ch = reader.read(); ch != -1; ch = reader.read())
        out.append((char) ch);
      return out.toString();
    } finally {
      res.close();
    }
  }

Usage:

  public static void main(String[] args) throws IOException {
    System.out.println(readText(new File("chinese.txt")));
  }

(System.out uses the default encoding, so whether it prints anything sensible depends on your platform and configuration.)
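
If the default encoding is the obstacle, one option is to wrap System.out in a PrintStream with an explicit encoding. This is only a sketch with hypothetical names (Utf8Console, utf8); it controls the bytes Java emits, but whether the characters display correctly still depends on the console interpreting UTF-8 and having a suitable font.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {

  // Wraps a stream so that text is encoded as UTF-8 regardless of the
  // platform default; autoflush is enabled for println calls.
  static PrintStream utf8(OutputStream target)
      throws UnsupportedEncodingException {
    return new PrintStream(target, true, "UTF-8");
  }

  public static void main(String[] args) throws UnsupportedEncodingException {
    PrintStream out = utf8(System.out);
    out.println("\u4E0A\u6D77");
  }
}
```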
