Java code reads UTF-8 text incorrectly

Problem description

I'm having a problem reading UTF-8 characters in my code (running in Eclipse).

I have a text file with a few lines in it, for example:

אך  1234

NOTE: There is a \t before the word. The word should appear on the left and the number on the right... I don't know how to reverse them here, sorry.

That is, a Hebrew word and then a number.

I need to separate the word from the number somehow. I tried this:

        BufferedReader br = new BufferedReader(new FileReader(text));
        String content;

        while ((content = br.readLine()) != null) 
        {
            String delims = "[ ]+";
            String[] tokens = content.split(delims);
        }

The problem is that, for some reason, the code reads content (the first line of the file) as follows:

אך\t1234

...meaning that the space isn't in its correct place.

I suppose I could tokenize the text on the \t, but I'm not sure I should, since the file doesn't seem to be read correctly...
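If you do tokenize on the \t as suggested above, a minimal sketch could look like the following. It assumes each line holds one word, a tab, and one number; the class TabSplit and the method splitAtTab are hypothetical names for illustration. The Hebrew word is written with Unicode escapes so the example does not depend on how the .java file itself is encoded.

```java
public class TabSplit {

    // Hypothetical helper: splits a "word<TAB>number" line at its first tab.
    static String[] splitAtTab(String line) {
        // trim() drops leading/trailing whitespace, including a leading tab;
        // limit 2 splits only at the first remaining tab.
        String[] parts = line.trim().split("\t", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("No tab found in: " + line);
        }
        return new String[] { parts[0].trim(), parts[1].trim() };
    }

    public static void main(String[] args) {
        String[] parts = splitAtTab("\u05D0\u05DA\t1234");
        int number = Integer.parseInt(parts[1]); // 1234
        System.out.println("word has " + parts[0].length() + " chars, number = " + number);
    }
}
```

Using a split limit of 2 keeps everything after the first tab as the second field, so a stray extra tab ends up in the number field (where Integer.parseInt will reject it) instead of silently producing a third token.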

Does anyone have any idea why this happens?

Thanks a lot :-)

Recommended answer

I think you are matching a space when there is actually a tab there?
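One way to check this is to print every code point of the line in the order it is stored in memory, rather than trusting how the editor or console draws it. Hebrew is a right-to-left script, so tools that render mixed Hebrew and digits reorder them for display (per the Unicode Bidirectional Algorithm), which can make a separator look like it is "in the wrong place". The following is a minimal sketch; the class DumpChars and the method describe are hypothetical names, and the sample line is assumed to match the one from the question.

```java
public class DumpChars {

    // Hypothetical helper: one output line per code point, "U+XXXX <char>",
    // with TAB shown as <TAB> so the invisible separator becomes visible.
    static String describe(String line) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.length(); ) {
            int cp = line.codePointAt(i);
            String shown = (cp == '\t') ? "<TAB>" : new String(Character.toChars(cp));
            sb.append(String.format("U+%04X %s", cp, shown)).append('\n');
            i += Character.charCount(cp); // advance past surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escapes avoid depending on the encoding of the .java source file.
        String line = "\u05D0\u05DA\t1234";
        System.out.print(describe(line));
    }
}
```

If the dump shows U+0009 between the Hebrew letters and the digits, the file was read in the right order and the separator is a tab, which the original pattern "[ ]+" (spaces only) cannot match.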

Can you try this:

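BufferedReader br = new BufferedReader(new FileReader(text));
String content;

while ((content = br.readLine()) != null) 
{
    // "\\s+" matches one or more whitespace characters (spaces, tabs, ...),
    // whereas "[ ]+" matches only spaces.
    String delims = "\\s+";
    String[] tokens = content.split(delims);
}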
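Separately from the tab issue: new FileReader(text) decodes with the platform default charset, and before Java 18 that default is not necessarily UTF-8 (on Windows it is often a legacy code page such as windows-1252). The output in the question suggests the Hebrew was decoded fine, but given the title it may be worth decoding explicitly. A minimal, self-contained sketch under those assumptions follows; the class ReadWordNumbers, the method parse, and taking the file path from the command line are all hypothetical choices for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ReadWordNumbers {

    // Hypothetical helper: splits each non-blank line on runs of whitespace.
    static List<String[]> parse(BufferedReader br) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String content;
        while ((content = br.readLine()) != null) {
            String trimmed = content.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // "\\s+" treats a tab, a space, or several of them as one separator.
            rows.add(trimmed.split("\\s+"));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReadWordNumbers <file>");
            return;
        }
        Path path = Paths.get(args[0]);
        // Decode explicitly as UTF-8 instead of relying on the platform default;
        // try-with-resources closes the reader when done.
        try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            for (String[] tokens : parse(br)) {
                System.out.println(String.join(" | ", tokens));
            }
        }
    }
}
```

Keeping parse independent of where the reader comes from means it can be exercised with an in-memory StringReader as well as a real file. Note that the Eclipse console has its own output encoding, so Hebrew printed there may still look garbled even when it was read correctly.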
