Why is Java BufferedReader() not reading Arabic and Chinese characters correctly?

Question

I'm trying to read a file that contains English and Arabic characters on each line, and another file that contains English and Chinese characters on each line. However, the Arabic and Chinese characters fail to show correctly: they just appear as question marks. Any idea how I can solve this problem?

Here is the code I use for reading:

try {
    String sCurrentLine;
    BufferedReader br = new BufferedReader(new FileReader(directionOfTargetFile));
    int counter = 0;

    while ((sCurrentLine = br.readLine()) != null) {
        String lineFixedHolder = converter.fixParsedParagraph(sCurrentLine);
        System.out.println("The line number " + counter
                           + " contains: " + sCurrentLine);
        counter++;
    }
} catch (IOException e) {
    e.printStackTrace();
}


Edition 01

After reading a line and extracting the Arabic or Chinese word, I translate it by searching for the given Arabic text in an ArrayList that contains all expected words, using the indexOf() method. When the word's index is found, it is used to fetch the English word at the same index in another ArrayList. However, this search always fails (indexOf() returns -1), because it is searching for question marks instead of the Arabic and Chinese characters. So my System.out.println output shows nulls, one for each failed translation.

*I'm using the NetBeans 6.8 IDE, Mac version.


Edition 02

Here is the code that searches for the translation:

        int testColor = dbColorArb.indexOf(wordToTranslate);
        int testBrand = -1;
        if ( testColor != -1 ) {
            String result = (String)dbColorEng.get(testColor);
            return result;
        } else {
            testBrand = dbBrandArb.indexOf(wordToTranslate);
        }
        //System.out.println ("The testBrand is : " + testBrand);
        if ( testBrand != -1 ) {
            String result = (String)dbBrandEng.get(testBrand);
            return result;
        } else {
            //System.out.println ("The first null");
            return null;
        }

I'm actually searching two ArrayLists that might contain the word to translate. If the word is not found in either ArrayList, null is returned.
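
The lookup described above can be reduced to a small self-contained sketch. Everything in it is hypothetical and only for illustration: the class name `ParallelListLookup`, the `translate` helper, and the two sample words (written as Unicode escapes so the source file's own encoding does not matter). It shows that `indexOf` finds a correctly decoded word, while a string of question marks is not in the list and yields null.

```java
import java.util.Arrays;
import java.util.List;

public class ParallelListLookup {
    // Hypothetical sample data: the word at index i in the Arabic list
    // is translated by the word at index i in the English list.
    static final List<String> COLORS_ARABIC = Arrays.asList(
            "\u0623\u062D\u0645\u0631",   // Arabic word for "red"
            "\u0623\u0632\u0631\u0642");  // Arabic word for "blue"
    static final List<String> COLORS_ENGLISH = Arrays.asList("red", "blue");

    // Returns the English word for the given Arabic word, or null if absent.
    static String translate(String word) {
        int index = COLORS_ARABIC.indexOf(word);
        return index != -1 ? COLORS_ENGLISH.get(index) : null;
    }

    public static void main(String[] args) {
        // A correctly decoded word is found in the list.
        System.out.println(translate("\u0623\u0632\u0631\u0642"));
        // A wrongly decoded word is just question marks and is not found.
        System.out.println(translate("????"));
    }
}
```

So the null results are a symptom of the decoding step, not of the lookup itself.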


Edition 03

When I debugged, I found that the lines being read are stored in my String variable like this:

 "3;0000000000;0000001001;1996-06-22;;2010-01-27;����;;01989;������;"


Edition 04

The file I'm reading was given to me after it had been modified by another program (about which I know nothing except that it was written in VB); that program made the Arabic letters, which had not been appearing correctly, appear. When I checked the encoding of the file in Notepad++, it showed ANSI. However, when I converted it to UTF-8 (which replaced the Arabic letters with other English ones) and then converted it back to ANSI, the Arabic became question marks!
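
The question-mark effect can be reproduced in isolation. The sketch below is an illustration under assumptions, not a diagnosis of this particular file: it uses ISO-8859-1 as a stand-in for an encoding that has no Arabic letters, and the class and method names are hypothetical. When Java encodes a character that the target charset cannot represent, `String.getBytes` substitutes the charset's replacement byte, which for ISO-8859-1 is `?`.

```java
import java.nio.charset.StandardCharsets;

public class LossyConversion {
    // Encodes text as ISO-8859-1 and decodes it again. Characters with no
    // ISO-8859-1 mapping are replaced by '?' during the encoding step.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String arabic = "\u0623\u062D\u0645\u0631"; // Arabic word for "red"
        System.out.println(roundTripLatin1("red"));  // ASCII text survives
        System.out.println(roundTripLatin1(arabic)); // each letter becomes '?'
    }
}
```

Once the letters have been replaced this way, the original text cannot be recovered from the converted file.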

Solution

FileReader javadoc:

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

So:

Reader reader = new InputStreamReader(new FileInputStream(fileName), "utf-8");
BufferedReader br = new BufferedReader(reader);
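
The two lines above can be expanded into a runnable sketch. It assumes Java 7 or later (for `StandardCharsets` and try-with-resources); the class name `ExplicitCharsetRead`, the `readLines` helper, and the sample file content are hypothetical. The sample writes a small UTF-8 file with one Arabic and one Chinese word, reads it back with an explicit charset, and compares the result with the expected strings rather than relying on what the console displays.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ExplicitCharsetRead {
    // Reads all lines of a file, decoding its bytes with the given charset
    // instead of the platform default that FileReader would use.
    static List<String> readLines(String fileName, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: an Arabic and a Chinese word, saved as UTF-8.
        String content = "red;\u0623\u062D\u0645\u0631\nred;\u7EA2\u8272\n";
        Path file = Files.createTempFile("sample", ".txt");
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));

        List<String> lines = readLines(file.toString(), StandardCharsets.UTF_8);
        // Compare against the expected text instead of trusting the console.
        System.out.println(lines.get(0).equals("red;\u0623\u062D\u0645\u0631"));
        System.out.println(lines.get(1).equals("red;\u7EA2\u8272"));
        Files.delete(file);
    }
}
```

Checking with `equals` separates the two possible failure points: if the comparison succeeds but the printed text still looks wrong, the file was decoded correctly and only the display is at fault.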

If this still doesn't work, then perhaps your console is not set to properly display UTF-8 characters. Configuration depends on the IDE used and is rather simple.
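
On the program side, one way to control what is sent to the console is to wrap `System.out` in a `PrintStream` with an explicit encoding. This is only a sketch: the class name `Utf8Console` and the `utf8Stream` helper are hypothetical, and it covers only half of the problem, since the terminal or IDE console must also be configured to interpret the bytes as UTF-8 and to use a font that has the glyphs.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Wraps an output stream in an auto-flushing PrintStream that encodes
    // text as UTF-8 instead of the platform default encoding.
    static PrintStream utf8Stream(OutputStream out) throws UnsupportedEncodingException {
        return new PrintStream(out, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8Stream(System.out);
        // Arabic and Chinese words for "red", written as Unicode escapes.
        out.println("\u0623\u062D\u0645\u0631 / \u7EA2\u8272");
    }
}
```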

Update: In the above code, replace utf-8 with cp1256. This works fine for me (WinXP, JDK6).
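
Why the charset name matters can be shown with a short sketch. It assumes the JDK in use ships the `windows-1256` charset (the canonical `java.nio` name for cp1256; it is part of the extended charsets and may be missing from trimmed-down runtimes). The class name `Cp1256Decode` and the `decode` helper are hypothetical. The same bytes give the original Arabic text when decoded as windows-1256 and something else when decoded as UTF-8.

```java
import java.nio.charset.Charset;

public class Cp1256Decode {
    // Decodes raw bytes with the named charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String arabic = "\u0623\u062D\u0645\u0631"; // Arabic word for "red"
        // Simulate the bytes an "ANSI" (windows-1256) file would contain.
        byte[] ansiBytes = arabic.getBytes(Charset.forName("windows-1256"));

        // Decoding with the charset the file was written in restores the text.
        System.out.println(decode(ansiBytes, "windows-1256").equals(arabic));
        // Decoding the same bytes as UTF-8 does not.
        System.out.println(decode(ansiBytes, "UTF-8").equals(arabic));
    }
}
```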

But I'd recommend that you insist on the file being generated in UTF-8, because cp1256 won't work for Chinese and you'll have similar problems again.
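
The limitation can be checked directly with `CharsetEncoder.canEncode`. As before, this is a sketch that assumes the `windows-1256` charset is available in the JDK at hand; the class name `EncodableCheck` and the sample words are hypothetical.

```java
import java.nio.charset.Charset;

public class EncodableCheck {
    // Reports whether every character of the text has a mapping in the charset.
    static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String arabic = "\u0623\u062D\u0645\u0631"; // Arabic word for "red"
        String chinese = "\u7EA2\u8272";            // Chinese word for "red"
        for (String name : new String[] {"windows-1256", "UTF-8"}) {
            System.out.println(name + ": Arabic=" + canEncode(arabic, name)
                    + ", Chinese=" + canEncode(chinese, name));
        }
    }
}
```

A single-byte code page such as windows-1256 has room for Arabic but not for Chinese, whereas UTF-8 can represent both, which is why one UTF-8 pipeline covers both files.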
