Why is Java BufferedReader() not reading Arabic and Chinese characters correctly?


Question

I'm trying to read a file that contains English and Arabic characters on each line, and another file that contains English and Chinese characters on each line. However, the Arabic and Chinese characters are not displayed correctly; they just appear as question marks. Any idea how I can solve this problem?

Here is the code I use for reading:

try {
    String sCurrentLine;
    BufferedReader br = new BufferedReader(new FileReader(directionOfTargetFile));
    int counter = 0;

    while ((sCurrentLine = br.readLine()) != null) {
        String lineFixedHolder = converter.fixParsedParagraph(sCurrentLine);
        System.out.println("The line number " + counter
                           + " contain : " + sCurrentLine);
        counter++;
    }
} catch (IOException e) {
    e.printStackTrace();
}


Edit 1

After reading a line and extracting the Arabic or Chinese word, I translate it with a function that simply searches for the given Arabic text in an ArrayList containing all expected words (using the indexOf() method). Once the word's index is found, it is used to fetch the English word stored at the same index in another ArrayList. However, this search never finds a match, because it ends up searching for question marks instead of the Arabic and Chinese characters. So my System.out.println output shows nulls, one for each failed translation.

*I'm using the NetBeans 6.8 IDE (Mac version).

Edit 2

Here is the code that searches for the translation:

        int testColor = dbColorArb.indexOf(wordToTranslate);
        int testBrand = -1;
        if ( testColor != -1 ) {
            String result = (String)dbColorEng.get(testColor);
            return result;
        } else {
            testBrand = dbBrandArb.indexOf(wordToTranslate);
        }
        //System.out.println ("The testBrand is : " + testBrand);
        if ( testBrand != -1 ) {
            String result = (String)dbBrandEng.get(testBrand);
            return result;
        } else {
            //System.out.println ("The first null");
            return null;
        }

I'm actually searching two ArrayLists that might contain the word to translate. If the word is found in neither ArrayList, null is returned.

Edit 3

While debugging, I found that the lines being read are stored in my String variable like this:

 "3;0000000000;0000001001;1996-06-22;;2010-01-27;����;;01989;������;"


Edit 3

The file I'm reading was given to me after being modified by another program (about which I know nothing, other than that it was written in VB). That program is what made the Arabic letters, which had not been displaying correctly, appear. When I checked the file's encoding in Notepad++, it reported ANSI. However, when I convert the file to UTF-8 (which replaces the Arabic letters with other English ones) and then convert it back to ANSI, the Arabic turns into question marks!

Recommended answer

From the FileReader javadoc:

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

So:

Reader reader = new InputStreamReader(new FileInputStream(fileName), "utf-8");
BufferedReader br = new BufferedReader(reader);
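
As a fuller illustration of the two lines above, here is a minimal, self-contained sketch that reads lines with an explicitly chosen charset. The class name ExplicitCharsetRead, the helper readLines, and the sample data are made up for this example and are not part of the original question; it also assumes the file really is UTF-8 encoded.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ExplicitCharsetRead {

    // Hypothetical helper: decodes the stream with the given charset
    // instead of relying on the platform default, as FileReader does.
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: the word "red" in Arabic and in Chinese,
        // written as Unicode escapes so the source file itself stays ASCII.
        String sample = "red;\u0623\u062D\u0645\u0631\nred;\u7EA2\u8272\n";
        Path file = Files.createTempFile("sample", ".txt");
        try {
            Files.write(file, sample.getBytes(StandardCharsets.UTF_8));
            try (InputStream in = new FileInputStream(file.toFile())) {
                for (String line : readLines(in, StandardCharsets.UTF_8)) {
                    System.out.println(line);
                }
            }
        } finally {
            Files.delete(file);
        }
    }
}
```

Passing a Charset object rather than a name string avoids the checked UnsupportedEncodingException that the String-based constructor declares.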

If this still doesn't work, your console may not be set up to display UTF-8 characters properly. How to configure that depends on the IDE used, and it is fairly simple.
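
To illustrate the output side, the sketch below shows how a PrintStream whose encoding cannot represent Arabic substitutes a question mark for each letter, while a UTF-8 PrintStream emits the real bytes. The class ConsoleEncodingDemo and the helper printedBytes are hypothetical names chosen for this example; whether your console then renders the UTF-8 bytes correctly still depends on the IDE or terminal settings.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleEncodingDemo {

    // Hypothetical helper: returns the bytes that a PrintStream using the
    // named encoding writes for the given text.
    static byte[] printedBytes(String text, String encoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream ps = new PrintStream(sink, true, encoding);
            ps.print(text);
            ps.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String arabic = "\u0623\u062D\u0645\u0631"; // assumed sample word

        // US-ASCII cannot encode Arabic, so every letter becomes '?'.
        System.out.println(new String(printedBytes(arabic, "US-ASCII"), "US-ASCII"));

        // Wrapping System.out forces UTF-8 output regardless of the default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(arabic);
    }
}
```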

Update: in the code above, replace utf-8 with cp1256. This works fine for me (WinXP, JDK6).

But I'd recommend that you insist on the file being generated in UTF-8, because cp1256 won't work for Chinese and you'll run into similar problems again.
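
The following sketch illustrates both points under the assumption that the "ANSI" file is really windows-1256 (the canonical Java name for cp1256): Arabic bytes in that encoding decode to replacement characters when read as UTF-8, and the charset cannot encode Chinese at all. The class Cp1256Demo, its helpers, and the sample words are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1256Demo {

    // "cp1256" is an alias of this charset in the JDK.
    static final Charset CP1256 = Charset.forName("windows-1256");

    // True if every character of the text has a windows-1256 representation.
    static boolean canEncode(String text) {
        return CP1256.newEncoder().canEncode(text);
    }

    // Decodes raw bytes with whichever charset the caller claims they use.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String arabic = "\u0623\u062D\u0645\u0631"; // assumed sample word
        String chinese = "\u7EA2\u8272";            // assumed sample word

        byte[] arabicBytes = arabic.getBytes(CP1256);

        // Decoding with the matching charset restores the original text.
        System.out.println(decode(arabicBytes, CP1256).equals(arabic));

        // Decoding the same bytes as UTF-8 yields U+FFFD replacement characters.
        System.out.println(decode(arabicBytes, StandardCharsets.UTF_8).indexOf('\uFFFD') >= 0);

        // windows-1256 covers Arabic but has no mapping for Chinese characters.
        System.out.println(canEncode(arabic) + " " + canEncode(chinese));
    }
}
```

UTF-8 can represent both scripts, which is why a single UTF-8 file format avoids having to pick a different legacy code page per language.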
