What is an example of a non-Unicode character set for -Dfile.encoding=?
Question
I have a JVM whose character set is configured with "-Dfile.encoding=UTF-8". That is how UTF-8 is set. I would like to set it to a non-Unicode character set instead.
Is there an example value for a non-Unicode character set that I can pass to -Dfile.encoding=?
Recommended answer
[ TL;DR: Application encoding is a confusing issue, but this document from Oracle should help. ]
First, a few important general points about specifying the encoding by setting the system property file.encoding at run time:
- Its use is not formally supported, and never has been. From a Java bug report in 1998:
The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution.
- There is a draft JEP (JDK Enhancement Proposal), JDK-8187041 "Use UTF-8 as default Charset", which proposes:
Use UTF-8 as the Java virtual machine's default charset so that APIs that depend on the default charset behave consistently across all platforms.
- It doesn't necessarily make sense to claim that "This application uses encoding {x}", since there may be multiple encodings associated with an application, which can be addressed in different ways, including:
  - File encoding for console output.
  - File encoding for the application's source files.
  - File encoding for file I/O.
  - File encoding for file paths.
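To illustrate how one of these encodings (file I/O) can be addressed on its own, here is a minimal sketch that passes the charset explicitly instead of relying on file.encoding. The class and method names (ExplicitCharsetDemo, roundTrip, encodedLength) are made up for this example; only standard JDK calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetDemo {

    /** Writes text to a temp file in the given charset and reads it back with the same charset. */
    static String roundTrip(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(file, text.getBytes(charset));
                return new String(Files.readAllBytes(file), charset);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of bytes the text occupies when encoded in the given charset. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "caf\u00e9" is written with an escape so the source file encoding does not matter.
        String text = "caf\u00e9";
        // ISO-8859-1 is a non-Unicode, single-byte charset that every JVM must provide.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.ISO_8859_1)));
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1)); // 4
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));      // 5
    }
}
```

Because the charset is passed to every call, the result does not depend on the value of file.encoding. Console output and file paths are governed by other settings and are not covered by this sketch.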
All that said, Oracle specify all encodings supported by Java SE 8. I can't find a corresponding document for more recent JDK versions. Note that:
- Encodings can be environment specific, based on locale, operating system, Java version, etc.
- Almost every encoding has at least one alias. For example, the encoding name for Simplified Chinese is GBK, but you could also use CP936 or windows-936.
- Most of the listed encodings are non-Unicode; the Unicode encodings can typically be recognized because their names contain the string "UTF".
- An encoding name can vary depending on how the application is processing files (java.nio APIs vs. java.io/java.lang APIs). For example, if performing some I/O on Turkish files on Windows:
  - If the java.nio.* classes are used, specify -Dfile.encoding=windows-1254 at runtime.
  - If the java.lang.* and java.io.* classes are used, specify -Dfile.encoding=Cp1254 at runtime.
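The points about aliases and names can be explored on a given JVM with a short sketch like the following. The class and method names (CharsetAliases, canonicalName, namesWithoutUtf) are made up for this example, and the "does not contain UTF" filter is only the rule of thumb from the list above, not an authoritative test for Unicode encodings.

```java
import java.nio.charset.Charset;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class CharsetAliases {

    /** Resolves a charset name or alias to its canonical java.nio name. */
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    /** Canonical names of the charsets available in this JVM whose name lacks "UTF". */
    static List<String> namesWithoutUtf() {
        return Charset.availableCharsets().keySet().stream()
                .filter(name -> !name.toUpperCase(Locale.ROOT).contains("UTF"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // An alias resolves to the canonical name.
        System.out.println(canonicalName("latin1")); // ISO-8859-1
        // All aliases this JVM knows for one charset (the set varies by JDK).
        System.out.println(Charset.forName("ISO-8859-1").aliases());
        // Candidate non-Unicode values; the list depends on the JDK and platform.
        namesWithoutUtf().forEach(System.out::println);
    }
}
```

The list printed by namesWithoutUtf() differs between JDK builds and platforms, which is one reason to check it on each target environment.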
This DZone article provides a useful piece of code to show how setting -Dfile.encoding at runtime can impact various settings:
    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.Charset;
    import java.util.Locale;

    import static java.lang.System.out;

    /**
     * Demonstrate default Charset-related details.
     */
    public class CharsetDemo
    {
       /**
        * Supplies the default encoding without using Charset.defaultCharset()
        * and without accessing System.getProperty("file.encoding").
        *
        * @return Default encoding (default charset).
        */
       public static String getEncoding()
       {
          final byte [] bytes = {'D'};
          final InputStream inputStream = new ByteArrayInputStream(bytes);
          // No charset argument: the reader falls back to the default charset.
          final InputStreamReader reader = new InputStreamReader(inputStream);
          final String encoding = reader.getEncoding();
          return encoding;
       }

       public static void main(final String[] arguments)
       {
          out.println("Default Locale:   " + Locale.getDefault());
          out.println("Default Charset:  " + Charset.defaultCharset());
          out.println("file.encoding:    " + System.getProperty("file.encoding"));
          out.println("sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding"));
          out.println("Default Encoding: " + getEncoding());
       }
    }
Here's some sample output when specifying -Dfile.encoding=860 (an alias for MS-DOS Portuguese) using Java 12 on Windows 10:
    run:
    Default Locale:   en_US
    Default Charset:  IBM860
    file.encoding:    860
    sun.jnu.encoding: Cp1252
    Default Encoding: Cp860
    BUILD SUCCESSFUL (total time: 0 seconds)
Test the encoding you plan to specify at run time on all target platforms. You may get unexpected results. For example, when I run the code above on Windows 10 with -Dfile.encoding=IBM864 (PC Arabic) it works, but fails with -Dfile.encoding=IBM420 (IBM Arabic).
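As a first step before such a test, a charset name can be checked from inside a running JVM. The sketch below is only a pre-flight check under that assumption: the class and method names (CharsetPreflight, usable, canRepresent) are made up for this example, and a name that passes here is not proof that it works as a -Dfile.encoding value at startup, so the platform test above is still needed.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetPreflight {

    /** True if this JVM resolves the name to a charset that also supports encoding. */
    static boolean usable(String name) {
        try {
            return Charset.isSupported(name) && Charset.forName(name).canEncode();
        } catch (IllegalCharsetNameException e) {
            return false; // the name is not even syntactically legal
        }
    }

    /** True if every character of the sample text can be encoded in the named charset. */
    static boolean canRepresent(String name, String sample) {
        return usable(name) && Charset.forName(name).newEncoder().canEncode(sample);
    }

    public static void main(String[] args) {
        // Pass candidate names on the command line, e.g.: java CharsetPreflight IBM864 IBM420
        for (String name : args) {
            System.out.println(name + " usable in this JVM: " + usable(name));
        }
    }
}
```

For example, canRepresent("US-ASCII", "caf\u00e9") is false because US-ASCII has no mapping for the accented character, while the same text fits in ISO-8859-1.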