Java file encoding conversion
Question
I have a requirement to change the encoding of a file from ANSI (windows-1252) to UTF-8. I wrote the program below to do this in Java. The program converts the characters to UTF-8, but when I open the file in Notepad++, the encoding is displayed as "ANSI as UTF-8". This causes an error when I import the file into Access DB. I need a file that is encoded as plain UTF-8. The conversion must also happen without opening the file in any editor.
import java.io.*;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class ConvertFromAnsiToUtf8 {
    private static final char BYTE_ORDER_MARK = '\uFEFF';
    private static final String ANSI_CODE = "windows-1252";
    private static final String UTF_CODE = "UTF8";
    private static final Charset ANSI_CHARSET = Charset.forName(ANSI_CODE);

    public static void main(String[] args) {
        List<File> fileList;
        File inputFolder = new File(args[0]);
        if (!inputFolder.isDirectory()) {
            return;
        }
        File parentDir = new File(inputFolder.getParent() + "\\"
                + inputFolder.getName() + "_converted");
        if (parentDir.exists()) {
            return;
        }
        if (!parentDir.mkdir()) {
            return;
        }
        fileList = new ArrayList<File>();
        for (final File fileEntry : inputFolder.listFiles()) {
            fileList.add(fileEntry);
        }
        try {
            for (File file : fileList) {
                // Open and close a reader/writer pair per file, so every
                // output file gets flushed (not just the last one).
                try (InputStream in = new FileInputStream(file.getAbsoluteFile());
                     Reader reader = new InputStreamReader(in, ANSI_CHARSET);
                     OutputStream out = new FileOutputStream(
                             parentDir.getAbsoluteFile() + "\\" + file.getName());
                     Writer writer = new OutputStreamWriter(out, UTF_CODE)) {
                    writer.write(BYTE_ORDER_MARK);
                    char[] buffer = new char[10];
                    int read;
                    while ((read = reader.read(buffer)) != -1) {
                        System.out.println(read);
                        writer.write(buffer, 0, read);
                    }
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
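One point is independent of the encoding: an output file is only complete after its writer has been flushed or closed. The `OutputStreamWriter` javadoc says that encoded bytes accumulate in a buffer before they are written to the underlying stream. A writer that is still open when the loop moves to the next file can therefore leave that file empty or truncated. A small sketch against an in-memory sink shows the effect (the class and method names are illustrative and do not come from the question):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriterFlushDemo {
    // Writes text through an OutputStreamWriter and reports how many bytes
    // reached the underlying stream before and after close().
    static int[] bytesBeforeAndAfterClose(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8);
        writer.write(text);
        int before = sink.size(); // encoded bytes are still held in the writer's buffer
        writer.close();           // close() flushes the buffer into the sink
        int after = sink.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = bytesBeforeAndAfterClose("caf\u00e9");
        System.out.println("before close: " + sizes[0] + ", after close: " + sizes[1]);
    }
}
```

On a standard JDK this should print `before close: 0, after close: 5`, because "café" takes five bytes in UTF-8.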
Any pointers will be helpful.
Thanks,
Ashish
Recommended answer
The posted code correctly transcodes from windows-1252 to UTF-8.
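Here is what "transcodes" means in practice. The `InputStreamReader`/`OutputStreamWriter` chain decodes bytes as windows-1252 into Java `char`s and then re-encodes those chars as UTF-8. The sketch below performs the same decode/encode pair on a byte array and prints the result in hex (the class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {
    // Decode bytes as windows-1252, then re-encode the resulting chars as UTF-8.
    static byte[] windows1252ToUtf8(byte[] ansiBytes) {
        String text = new String(ansiBytes, Charset.forName("windows-1252"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In windows-1252: 0x41 is 'A', 0xE9 is 'é', 0x80 is '€'
        byte[] ansi = {(byte) 0x41, (byte) 0xE9, (byte) 0x80};
        System.out.println(hex(windows1252ToUtf8(ansi))); // prints "41 C3 A9 E2 82 AC"
    }
}
```

ASCII bytes pass through unchanged. Characters above 0x7F expand to multi-byte UTF-8 sequences.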
The Notepad++ message is confusing because "ANSI as UTF-8" has no obvious meaning; it appears to be an open defect in Notepad++. I believe Notepad++ means UTF-8 without BOM (see the Encoding menu).
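"UTF-8" and "UTF-8 without BOM" differ only at the byte level. The first form begins with U+FEFF encoded as UTF-8, which is the three bytes `EF BB BF`. The minimal sketch below checks a converted file for this signature without opening it in an editor. It requires Java 9+ for `InputStream.readNBytes`, and the class name is hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BomCheck {
    // Returns true if the file begins with the UTF-8 encoding of U+FEFF (EF BB BF).
    static boolean startsWithUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[3];
            int n = in.readNBytes(head, 0, 3); // fewer than 3 bytes means no BOM
            return n == 3
                    && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB
                    && (head[2] & 0xFF) == 0xBF;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(startsWithUtf8Bom(Paths.get(args[0])));
    }
}
```

For example, `java BomCheck converted.txt` prints `true` if the file starts with a BOM and `false` otherwise.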
Microsoft Access, being a Windows program, probably expects UTF-8 files to start with a byte-order mark (BOM).
You can inject a BOM into the document by writing the code point U+FEFF at the start of the file:
import java.io.*;
import java.nio.charset.*;

public class Ansi1252ToUtf8 {
    private static final char BYTE_ORDER_MARK = '\uFEFF';

    // Usage: java Ansi1252ToUtf8 <input windows-1252 file> <output UTF-8 file>
    public static void main(String[] args) throws IOException {
        Charset windows1252 = Charset.forName("windows-1252");
        try (InputStream in = new FileInputStream(args[0]);
             Reader reader = new InputStreamReader(in, windows1252);
             OutputStream out = new FileOutputStream(args[1]);
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            // U+FEFF is encoded by the UTF-8 writer as the bytes EF BB BF
            writer.write(BYTE_ORDER_MARK);
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }
}
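You can confirm what the program above writes by running the same reader/writer chain over in-memory streams. In this sketch, the class name and the in-memory setup are assumptions made for illustration. A single windows-1252 byte `0xE9` ('é') should come out as the BOM `EF BB BF` followed by `C3 A9`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomTranscodeCheck {
    // Same reader/writer chain as Ansi1252ToUtf8, but over in-memory streams.
    static byte[] convert(byte[] ansiBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(
                     new ByteArrayInputStream(ansiBytes), Charset.forName("windows-1252"));
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write('\uFEFF'); // BOM, encoded as EF BB BF
            char[] buffer = new char[1024];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        } // closing the writer flushes its buffer into 'out'
        return out.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = convert(new byte[] {(byte) 0xE9}); // 'é' in windows-1252
        System.out.println(hex(utf8)); // prints "EF BB BF C3 A9"
    }
}
```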