Why is my String returning "\ufffd\ufffdN a m e"?


Problem description

This is my method:

public void readFile3() throws IOException
{
    try
    {
        FileReader fr = new FileReader(Path3);
        BufferedReader br = new BufferedReader(fr);
        String s = br.readLine();
        int a = 1;
        while (a != 2)
        {
            s = br.readLine();
            a++;
        }
        Storage.add(s);

        br.close();
    }
    catch (IOException e)
    {
        System.out.println(e.getMessage());
    }
}

For some reason I am unable to read the file, which contains only this: "Name Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz"

When I debug the code, the String s comes back as "\ufffd\ufffdN a m e", and I have no clue where those extra characters are coming from. This is preventing me from reading the file properly.

Solution

\ufffd is the Unicode replacement character (U+FFFD). A decoder emits it when it meets bytes that cannot be mapped to a character in the charset it was told to use. I suppose you are on a Windows platform (or at least the file you are reading was created on Windows). Windows supports several encodings for text files. The most common is ANSI, where each character is represented by its code in the system code page.

But Windows can also use UTF-16 directly, where each character is represented by its Unicode code as a 16-bit integer, so 2 bytes per character (characters outside the Basic Multilingual Plane take two such 16-bit units). Such files start with a special marker, the byte order mark (BOM), which says:

  • that the file is encoded with 2 (or even 4) bytes per character
  • whether the encoding is little-endian or big-endian

(Reference: Using Byte Order Marks on MSDN)
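
To make the role of the marker concrete, here is a minimal sketch of how the first bytes of a file could be checked for a BOM. The class name `BomSniffer` and the method `detect` are made up for this illustration, and the returned strings are just charset names chosen for readability; this is not how any particular tool does it.

```java
public class BomSniffer {
    /** Returns a charset name guessed from the leading bytes, or null if no BOM is found. */
    public static String detect(byte[] head) {
        // Check the 4-byte UTF-32 marks first: the UTF-32LE mark begins with the UTF-16LE one.
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        return null;
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // First bytes of a little-endian UTF-16 file that starts with "Na".
        byte[] head = {(byte) 0xFF, (byte) 0xFE, 'N', 0, 'a', 0};
        System.out.println(detect(head)); // prints UTF-16LE
    }
}
```

Note that a BOM is only a hint: a UTF-16LE file whose first character is U+0000 would be mistaken for UTF-32LE by this check, and files without any BOM return null.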

Since what you see after the first two replacement characters is N a m e rather than Name, I suppose you have a UTF-16 encoded text file. The two replacement characters are most likely the two BOM bytes, and the gaps between the letters are the zero high bytes of each 16-bit character. Notepad can edit such files transparently (without even telling you the actual format), but other tools do have problems with them. The excellent vim can read files with different encodings and convert between them.
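
The symptom can be reproduced without the original file. The sketch below assumes the file is little-endian UTF-16 with a BOM and that the platform default charset used by FileReader was UTF-8; both are guesses based on the output in the question. `MojibakeDemo` and `misread` are names invented for this example.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    /** Encodes text as UTF-16LE with a BOM, then decodes those bytes with the wrong charset (UTF-8). */
    public static String misread(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] file = new byte[body.length + 2];
        file[0] = (byte) 0xFF; // little-endian BOM, first byte
        file[1] = (byte) 0xFE; // little-endian BOM, second byte
        System.arraycopy(body, 0, file, 2, body.length);
        // Neither 0xFF nor 0xFE is valid in UTF-8, so each becomes U+FFFD.
        return new String(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Print every decoded character as a code point to make the NULs visible.
        for (char c : misread("Name").toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

The decoded string is two U+FFFD characters followed by N, a, m, e, each followed by a U+0000 character, which a debugger may well show as a blank.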

If you want to read this kind of file directly in Java, you have to use the UTF-16 charset. From the Java SE 7 javadoc on Charset: "UTF-16: Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark".
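
As a rough sketch of what that could look like, the reader below passes the charset explicitly instead of relying on the platform default that FileReader uses. `Utf16LineReader` and `readLine(Path, int)` are hypothetical names, and the temporary file in `main` merely stands in for the file from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf16LineReader {
    /** Returns the line at the given 1-based index, or null if the file has fewer lines. */
    public static String readLine(Path file, int lineNumber) throws IOException {
        // When decoding, UTF_16 consumes the BOM and uses it to pick the byte order.
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_16)) {
            String line = null;
            for (int i = 0; i < lineNumber; i++) {
                line = br.readLine();
                if (line == null) {
                    break;
                }
            }
            return line;
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a sample little-endian UTF-16 file; "\uFEFF" encodes to the BOM bytes FF FE.
        Path tmp = Files.createTempFile("cpu", ".txt");
        String content = "\uFEFFName\r\nIntel(R) Core(TM) i5-2500 CPU @ 3.30GHz\r\n";
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_16LE));

        System.out.println(readLine(tmp, 2)); // prints Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz
        Files.delete(tmp);
    }
}
```

The try-with-resources block also closes the reader when an exception is thrown, which the original method does not do.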
