Creating UTF-8 files in Java from a runnable Jar


Question

I have a small Java project in which I've set the encoding of the class files to UTF-8 (I use a lot of foreign characters that are not available in the default CP1252).

The goal is to create a text file (on Windows) containing a list of items. When I run the class files from Eclipse itself (Ctrl+F11), the file is created flawlessly, and when I open it in another editor (I'm using Notepad++) I see the characters I wanted:

┌──────────────────────────────────────────────────┐
│                          Universidade2010 (18/18)│
│                                         hidden: 0│
├──────────────────────────────────────────────────┤

But when I export the project (using Eclipse) as a runnable Jar and run it with 'javaw -jar project.jar', the new file is a mess of question marks:

????????????????????????????????????????????????????
?                          Universidade2010 (19/19)?
?                                         hidden: 0?
????????????????????????????????????????????????????

I've followed some tips on how to use UTF-8 (which seems to be broken by default in Java) to try to correct this, so now I'm using

Writer w = new OutputStreamWriter(fos, "UTF-8");

and writing the BOM header to the file, as in this already-answered question, but still without luck when exporting to a Jar.

Am I missing some property or command-line option so that Java knows I want to create UTF-8 files by default?

The problem is not in creating the file itself, because during development the file is output correctly (with the Unicode characters).

The class that creates the file now looks like this (following the suggestion to use the Charset class):

public class Printer {

    File f;
    FileOutputStream fos;
    Writer w;
    final byte[] utf8_bom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

    public Printer(String filename){
        f = new File(filename);
        try {
            fos = new FileOutputStream(f);
            w = new OutputStreamWriter(fos, Charset.forName("UTF-8"));
            fos.write(utf8_bom);
        } catch (FileNotFoundException e) {
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void print(String s) {
        if(fos != null){
            try {
                fos.write(s.getBytes());
                fos.flush();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }

}

All the characters used are defined like this:

private final char pipe = '\u2502';         /* │ */
private final char line = '\u2500';         /* ─ */
private final char pipeleft = '\u251c';     /* ├ */
private final char piperight = '\u2524';    /* ┤ */
private final char cupleft = '\u250c';      /* ┌ */
private final char cupright = '\u2510';     /* ┐ */
private final char cdownleft = '\u2514';    /* └ */
private final char cdownright = '\u2518';   /* ┘ */

The problem remains: when I output the file simply by running the project in Eclipse, the file comes out perfect, but after deploying the project to a Jar and running it, the formatting of the output file is destroyed (I've found that the characters are replaced by the '?' char).

I've come to think this is not a problem with the code but with deploying it into a Jar file. I think Eclipse is compiling the source files as CP1252 or something, but even replacing all Unicode chars with their code constants didn't help.

Answer


"I've followed some tips on how to use UTF-8 (which seems to be broken by default on Java)"

For historical reasons, Java's encoding defaults to the system encoding (something that made more sense back on Windows 95). This behaviour isn't likely to change. To my knowledge, there isn't anything broken about Java's encoder implementation.

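import java.io.Closeable;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// The class name is only a placeholder so that the snippet compiles on its own.
public class Utf8Demo {
  private static final String BOM = "\ufeff";

  public static void main(String[] args) throws IOException {
    String data = "\u250c\u2500\u2500\u2510\r\n\u251c\u2500\u2500\u2524";
    OutputStream out = new FileOutputStream("data.txt");
    Closeable resource = out;
    try {
      // Encode explicitly as UTF-8 instead of relying on the platform default.
      Writer writer = new OutputStreamWriter(out, Charset.forName("UTF-8"));
      resource = writer; // closing the writer also closes the underlying stream
      writer.write(BOM);
      writer.write(data);
    } finally {
      resource.close();
    }
  }
}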

The above code emits the following text, prefixed with a byte order mark:

┌──┐
├──┤

Windows apps like Notepad can infer the encoding from the BOM and decode the file correctly.
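
As a rough illustration of what such an editor looks for, the sketch below checks whether a byte array starts with the three bytes EF BB BF, which is how U+FEFF is encoded in UTF-8. The class and method names (BomCheck, hasUtf8Bom) are made up for this example and are not part of any library.

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {
    // U+FEFF encoded as UTF-8.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true if the content starts with the UTF-8 byte order mark.
    public static boolean hasUtf8Bom(byte[] content) {
        return content.length >= UTF8_BOM.length
                && content[0] == UTF8_BOM[0]
                && content[1] == UTF8_BOM[1]
                && content[2] == UTF8_BOM[2];
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFF\u250c\u2500\u2510".getBytes(StandardCharsets.UTF_8);
        byte[] withoutBom = "\u250c\u2500\u2510".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasUtf8Bom(withBom));    // true
        System.out.println(hasUtf8Bom(withoutBom)); // false
    }
}
```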

Without code, it isn't possible to spot any errors.
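
For illustration only: the Printer class posted in the question calls s.getBytes() with no charset argument, which encodes with the platform default charset rather than with the UTF-8 writer it created. The sketch below shows what a charset that cannot represent the box-drawing characters does to them. ISO-8859-1 stands in for the Windows default here because it is guaranteed to be available on every JVM; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Encodes the text with the given charset and decodes the bytes again,
    // showing what survives the round trip.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String box = "\u250c\u2500\u2510"; // three box-drawing characters

        // The charset that getBytes() with no argument would use on this JVM.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Unmappable characters are replaced by '?' during encoding.
        System.out.println(roundTrip(box, StandardCharsets.ISO_8859_1)); // ???

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(box, StandardCharsets.UTF_8).equals(box)); // true
    }
}
```

If the default charset differs between the Eclipse launch and the javaw launch, that would be consistent with the symptoms described, though this is an assumption about the asker's setup rather than something the question confirms.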


"Am I missing some property or command-line command so Java knows I want to create UTF-8 files by default?"

No - there is no such setting. Some might suggest setting file.encoding on the command line, but this is a bad idea.
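
A minimal sketch of the alternative, assuming Java 7 or later: pass the charset at the point where the file is written, so the result does not depend on any JVM-wide default. The class name, method name and temp-file prefix are placeholders chosen for this example.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetWrite {
    // Writes a BOM followed by the text, always encoded as UTF-8.
    public static void writeUtf8(Path target, String text) throws IOException {
        try (Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write('\uFEFF'); // byte order mark
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("boxes", ".txt");
        writeUtf8(target, "\u250c\u2500\u2500\u2510");
        // 3 bytes for the BOM plus 3 bytes for each of the 4 characters.
        System.out.println(Files.readAllBytes(target).length); // 15
    }
}
```

Because the charset is fixed in the code, the same bytes are produced whether the program is launched from an IDE or from a Jar.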

I wrote a more comprehensive blog post on the subject here.

Here is a reworked version of your code:

import java.io.Closeable;
import java.io.IOException;
import java.io.PrintWriter;

public class Printer implements Closeable {
  private PrintWriter pw;
  private boolean error;

  public Printer(String name) {
    try {
      pw = new PrintWriter(name, "UTF-8");
      pw.print('\uFEFF'); // BOM
      error = false;
    } catch (IOException e) {
      error = true;
    }
  }

  public void print(String s) {
    if (pw == null) return;
    pw.print(s);
    pw.flush();
  }

  public boolean checkError() { return error || pw.checkError(); }

  @Override public void close() { if (pw != null) pw.close(); }
}

Most of the functionality you want already exists in PrintWriter. Note that you should provide some mechanism to check for underlying errors and close the stream (or you risk leaking file handles).
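
One way to cover both points, sketched here under the assumption of Java 7 or later, is to open the PrintWriter in a try-with-resources block and consult checkError() before leaving it. The class and method names (PrintWriterUsage, writeBox) are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;

public class PrintWriterUsage {
    // Writes a BOM and a small box fragment as UTF-8.
    // Returns true if PrintWriter reported no error.
    public static boolean writeBox(File target) throws IOException {
        // try-with-resources closes the writer (and the file handle) even on failure.
        try (PrintWriter pw = new PrintWriter(target, "UTF-8")) {
            pw.print('\uFEFF'); // byte order mark
            pw.print("\u250c\u2500\u2510");
            // PrintWriter swallows IOExceptions; checkError() flushes and reports them.
            return !pw.checkError();
        }
    }

    public static void main(String[] args) throws IOException {
        File target = File.createTempFile("printer", ".txt");
        System.out.println(writeBox(target));
        System.out.println(Files.readAllBytes(target.toPath()).length); // 12
    }
}
```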
