Reading special characters from File - Java

Problem description

I am reading data from a text file with the following properties:

Encoding: ANSI
File Type: PC

Now, the file contains lots of special characters, such as the degree-like symbol º. I am reading this file using the following code:

File file = new File("C:\\X\\Y\\SpecialCharacter.txt");
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));

If the file encoding is ANSI, the above code does not read the special characters properly. For example, given this line in the file:
"Lower heat and simmer until product reaches internal temperature of 165ºF", reader.readLine() outputs:
"Lower heat and simmer until product reaches internal temperature of 165�F"

When I changed the file's encoding to UTF-8, the line was read exactly as it appears in the file, with the special characters intact.

My question: at what point does the data get messed up? When storing the data in the file, or when reading it from the file? Opening the file in Notepad displays all the special characters properly. How does that happen?

Hexdump output:

          -0 -1 -2 -3  -4 -5 -6 -7  -8 -9 -A -B  -C -D -E -F

00000000- 4C 6F 77 65  72 20 68 65  61 74 20 61  6E 64 20 73 [Lower heat and s]
00000001- 69 6D 6D 65  72 20 75 6E  74 69 6C 20  70 72 6F 64 [immer until prod]
00000002- 75 63 74 20  72 65 61 63  68 65 73 20  69 6E 74 65 [uct reaches inte]
00000003- 72 6E 61 6C  20 74 65 6D  70 65 72 61  74 75 72 65 [rnal temperature]
00000004- 20 6F 66 20  31 36 35 BA  46                       [ of 165.F       ]

Recommended answer

"ANSI" is not a particular encoding - it's a whole collection of encodings. You need to use the right encoding when reading the file. For example, it's entirely possible that you're using the Windows-1252 encoding, which means you may want to try passing in "Cp1252" as the encoding name.
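
To see why the encoding name matters, here is a small sketch that decodes the last five bytes of the hexdump, 31 36 35 BA 46 ("165", 0xBA, "F"), once as UTF-8 and once as Windows-1252 ("Cp1252" is Java's alias for it). The class and method names are made up for illustration. A lone 0xBA byte is not a valid UTF-8 sequence, so Java's decoder substitutes U+FFFD (the � in the question), whereas Windows-1252 maps 0xBA to º.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Last bytes of the hexdump: "165", then 0xBA, then "F".
    public static final byte[] BYTES = {0x31, 0x36, 0x35, (byte) 0xBA, 0x46};

    // Decodes raw bytes with the given charset (malformed input is replaced, not reported).
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // 0xBA on its own is malformed UTF-8, so it becomes U+FFFD.
        String asUtf8 = decode(BYTES, StandardCharsets.UTF_8);
        // In Windows-1252, 0xBA is U+00BA (º).
        String asCp1252 = decode(BYTES, Charset.forName("Cp1252"));
        System.out.println(asUtf8);
        System.out.println(asCp1252);
    }
}
```

Same bytes, two different strings: nothing in the file changed, only the decoding.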

In fact, you're passing in "UTF-8" which isn't one of the encodings typically referred to as ANSI. You need to find out the exact encoding that the file uses, and then specify that in the InputStreamReader parameter.
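
A minimal sketch of the question's reader with the encoding passed explicitly, assuming the file really is Windows-1252 (confirm that before relying on it). `ReadWithCharset`, `FILE_CHARSET` and `readLines` are hypothetical names. Passing a `Charset` object instead of a string name also avoids the checked `UnsupportedEncodingException` that the `InputStreamReader(InputStream, String)` constructor declares.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ReadWithCharset {
    // Assumption: the file is Windows-1252 ("Cp1252" is Java's alias for it).
    // Replace this with whatever encoding the file actually uses.
    public static final Charset FILE_CHARSET = Charset.forName("Cp1252");

    // Reads all lines, decoding the file's bytes with the given charset.
    public static List<String> readLines(Path path, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader and the underlying FileInputStream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path.toFile()), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the question.
        Path path = Paths.get("C:\\X\\Y\\SpecialCharacter.txt");
        for (String line : readLines(path, FILE_CHARSET)) {
            System.out.println(line);
        }
    }
}
```

`java.nio.file.Files.newBufferedReader(path, charset)` is a shorter alternative. Unlike `InputStreamReader`, its reader throws a `MalformedInputException` on byte sequences that are invalid in the chosen charset instead of silently substituting �, so a wrong guess fails loudly. Whether the printed º looks right also depends on the console's own encoding.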

My question: at what point does the data get messed up? When storing the data in the file, or when reading it from the file?

Assuming the encoding is capable of representing all the characters you're interested in, the data only gets messed up when you read the file. Basically, you're trying to read it as if it's in one encoding, when it's actually in another. Notepad is either performing some sort of heuristic encoding detection, or it happens to use the right default for this particular situation.
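
Notepad's actual detection logic isn't described here, so the following is only one simple heuristic of the kind the answer mentions: attempt a strict UTF-8 decode, and fall back to Windows-1252 if any byte sequence is malformed. `GuessEncoding` and `guess` are hypothetical names. Pure-ASCII input is reported as UTF-8, which is harmless because the two encodings agree on ASCII; the heuristic can still misfire on Windows-1252 text that happens to form valid UTF-8 sequences.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class GuessEncoding {
    // Returns UTF-8 if the bytes decode cleanly as UTF-8, otherwise Windows-1252.
    // A crude heuristic for illustration, not Notepad's real algorithm.
    public static Charset guess(byte[] bytes) {
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // Throws CharacterCodingException on the first malformed sequence.
            utf8.decode(ByteBuffer.wrap(bytes));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return Charset.forName("windows-1252");
        }
    }

    public static void main(String[] args) {
        byte[] ansi = {0x31, 0x36, 0x35, (byte) 0xBA, 0x46};        // "165ºF" in Windows-1252
        byte[] utf8 = "165\u00BAF".getBytes(StandardCharsets.UTF_8); // "165ºF" in UTF-8 (C2 BA)
        System.out.println(guess(ansi)); // windows-1252
        System.out.println(guess(utf8)); // UTF-8
    }
}
```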