Java file encoding conversion from ANSI to UTF8

Problem description

I need to change the encoding of a file from ANSI (windows-1252) to UTF-8, and I wrote the program below to do it in Java. The program converts the characters to UTF-8, but when I open the output file in Notepad++ the encoding is displayed as "ANSI as UTF-8". Importing this file into an Access database then fails with an error. I need a file that is plainly UTF-8 encoded. The conversion also has to happen without opening the file in any editor.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConvertFromAnsiToUtf8 {

    private static final char BYTE_ORDER_MARK = '\uFEFF';
    private static final Charset ANSI_CHARSET = Charset.forName("windows-1252");

    public static void main(String[] args) {
        File inputFolder = new File(args[0]).getAbsoluteFile();
        if (!inputFolder.isDirectory()) {
            return;
        }
        // Sibling output folder named "<input>_converted". File(parent, child)
        // avoids hard-coding the Windows path separator.
        File outputFolder = new File(inputFolder.getParentFile(),
                    inputFolder.getName() + "_converted");
        // Stop if the output folder already exists or cannot be created.
        if (outputFolder.exists() || !outputFolder.mkdir()) {
            return;
        }

        File[] files = inputFolder.listFiles();
        if (files == null) {
            return;
        }

        for (File file : files) {
            if (!file.isFile()) {
                continue; // skip sub-directories
            }
            // try-with-resources closes both streams for every file,
            // not only for the last one processed.
            try (Reader reader = new InputStreamReader(
                        new FileInputStream(file), ANSI_CHARSET);
                 Writer writer = new OutputStreamWriter(
                        new FileOutputStream(new File(outputFolder, file.getName())),
                        StandardCharsets.UTF_8)) {
                writer.write(BYTE_ORDER_MARK);
                char[] buffer = new char[1024];
                int read;
                while ((read = reader.read(buffer)) != -1) {
                    writer.write(buffer, 0, read);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

Any pointers will be helpful.

Thanks, Ashish

Recommended answer

About the posted code:

The Notepad++ message is confusing because "ANSI as UTF-8" has no obvious meaning; it appears to be an open defect in Notepad++. I believe Notepad++ means UTF-8 without BOM (see the Encoding menu).

Microsoft Access, being a Windows program, probably expects UTF-8 files to start with a byte order mark (BOM).
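
One way to confirm whether a given file actually starts with a BOM, without opening it in an editor, is to inspect its first three bytes. The sketch below is not part of the original answer: the class name `Utf8BomCheck` and its methods are hypothetical, and it relies only on the fact that U+FEFF encodes in UTF-8 as the bytes EF BB BF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration; not taken from the original answer.
final class Utf8BomCheck {
  // U+FEFF encoded as UTF-8 is the three bytes EF BB BF.
  private static final byte[] UTF8_BOM = "\uFEFF".getBytes(StandardCharsets.UTF_8);

  // True if the given leading bytes begin with the UTF-8 BOM.
  static boolean startsWithUtf8Bom(byte[] head) {
    return head.length >= UTF8_BOM.length
        && Arrays.equals(Arrays.copyOf(head, UTF8_BOM.length), UTF8_BOM);
  }

  // Reads at most three bytes from the file and checks them against the BOM.
  static boolean hasUtf8Bom(Path file) throws IOException {
    try (InputStream in = Files.newInputStream(file)) {
      byte[] head = new byte[UTF8_BOM.length];
      int total = 0;
      int n;
      while (total < head.length
          && (n = in.read(head, total, head.length - total)) != -1) {
        total += n;
      }
      return total == head.length && startsWithUtf8Bom(head);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(hasUtf8Bom(Paths.get(args[0])));
  }
}
```

Running it against a converted file should print `true` if the BOM was written; a `false` result would suggest the file is UTF-8 without BOM, which matches the reading of the Notepad++ label given above.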

You can inject a BOM into the document by writing the code point U+FEFF at the start of the file:

import java.io.*;
import java.nio.charset.*;

public class Ansi1252ToUtf8 {
  private static final char BYTE_ORDER_MARK = '\uFEFF';

  public static void main(String[] args) throws IOException {
    Charset windows1252 = Charset.forName("windows-1252");
    try (InputStream in = new FileInputStream(args[0]);
        Reader reader = new InputStreamReader(in, windows1252);
        OutputStream out = new FileOutputStream(args[1]);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
      writer.write(BYTE_ORDER_MARK);
      char[] buffer = new char[1024];
      int read;
      while ((read = reader.read(buffer)) != -1) {
        writer.write(buffer, 0, read);
      }
    }
  }
}
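
For comparison, here is a minimal sketch of the same conversion written against `java.nio.file.Files`. It is an alternative formulation, not the answer's own code; the class name `NioAnsiToUtf8` and the `convert` method are made up for illustration. One behavioural difference to be aware of: `Files.newBufferedReader` is documented to throw an `IOException` on malformed or unmappable input, whereas `InputStreamReader` substitutes a replacement character.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical NIO-based variant for illustration; not from the original answer.
final class NioAnsiToUtf8 {
  private static final char BYTE_ORDER_MARK = '\uFEFF';

  // Decodes source as windows-1252 and writes target as UTF-8 with a leading BOM.
  static void convert(Path source, Path target) throws IOException {
    Charset windows1252 = Charset.forName("windows-1252");
    try (BufferedReader reader = Files.newBufferedReader(source, windows1252);
        BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
      writer.write(BYTE_ORDER_MARK);
      char[] buffer = new char[8192];
      int read;
      while ((read = reader.read(buffer)) != -1) {
        writer.write(buffer, 0, read);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    convert(Paths.get(args[0]), Paths.get(args[1]));
  }
}
```

As with the answer's program, the first argument is the windows-1252 input file and the second is the UTF-8 output file, which is created or overwritten.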
