Interpret a string from one encoding to another in Java
Question
I've looked around for answers to this (I'm sure they're out there), and I'm not sure it's possible.
So, I got a HUGE file that contains the word "för". I'm using RandomAccessFile because I know where it is (kind of) and can therefore use the seek() function to get there.
To know that I've found it, I have a String "för" in my program that I check for equality. Here's the problem: I ran the debugger, and when I get to "för" in the file, what I actually get to compare is "fÃ¶r".
So my program terminates without finding any "för".
This is the code I use to get a word:
    private static String getWord(RandomAccessFile file) throws IOException {
        StringBuilder stb = new StringBuilder();
        String word;
        char c;
        c = (char)file.read();
        int end;
        do {
            stb.append(c);
            end = file.read();
            if(end==-1)
                return "-1";
            c = (char)end;
        } while (c != ' ');
        word = stb.toString();
        word.trim();
        return word;
    }
So basically I return all the characters from the current point in the file up to the first ' ' character. I get the word, but since (char)file.read() reads a single byte (I think), the UTF-8 'ö' becomes the two characters 'Ã' and '¶'?
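That guess can be checked with a small sketch. The class and method names below (ByteCastDemo, castBytesToChars) are made up for illustration; the method only mimics what the loop in getWord does, namely widening each byte to a char on its own. The string literals use Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

class ByteCastDemo {
    // Mimics the loop in getWord: each byte is widened to a char on its own.
    static String castBytesToChars(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // file.read() yields the byte as an int in 0..255;
            // the cast then maps it to the char U+0000..U+00FF.
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "f\u00F6r" is "för".
        byte[] utf8 = "f\u00F6r".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);      // 4 bytes: 0x66 0xC3 0xB6 0x72

        String garbled = castBytesToChars(utf8);
        System.out.println(garbled.length()); // 4 chars instead of 3
        // U+00C3 is 'Ã' and U+00B6 is '¶', so this prints true.
        System.out.println(garbled.equals("f\u00C3\u00B6r"));
    }
}
```

So 'ö' is stored as the two bytes 0xC3 0xB6 in UTF-8, and casting them one at a time produces exactly the two characters seen in the debugger.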
One reason for this guess is that if I open my file with encoding UTF-8 it shows "för", but if I open the file with ISO-8859-15, the same place shows exactly what my getWord method returns: "fÃ¶r".
So my question:
When I'm sitting with a "för" and a "fÃ¶r", is there any way to fix this? Like saying "read "fÃ¶r" as if it were a UTF-8 string" to get "för"?
Recommended answer
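import java.nio.charset.Charset;

// originalString is the garbled text, for example "fÃ¶r" as returned by getWord.
// Turn it back into its original bytes, then decode those bytes as UTF-8.
String encodedString = new String(originalString.getBytes("ISO-8859-15"), Charset.forName("UTF-8"));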
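For completeness, here is a self-contained sketch of that round trip, plus an alternative that avoids the garbling in the first place. The names Utf8WordReader, repair and readWord are hypothetical, and two choices are assumptions rather than part of the answer above. First, repair uses ISO-8859-1 instead of ISO-8859-15, because casting a byte to char maps byte n to U+00nn, which is ISO-8859-1's mapping; the two charsets differ only in a handful of positions (0xA4 among them), so for "ö" either one works. Second, readWord returns null at end of file and returns a trailing word that is not followed by a space, which differs from getWord's "-1" convention.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class Utf8WordReader {
    // Option 1: repair a string that was built by casting each byte to a char.
    // Assumption: ISO-8859-1 is used because it maps byte n back from U+00nn.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Option 2: collect the raw bytes up to the next space and decode them
    // once as UTF-8, so multi-byte characters are never split.
    // Returns null when the file is already at its end (hypothetical convention).
    static String readWord(RandomAccessFile file) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = file.read()) != -1 && b != ' ') {
            buf.write(b);
        }
        if (b == -1 && buf.size() == 0) {
            return null;
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Splitting on the byte 0x20 is safe for UTF-8 input because every byte of a multi-byte UTF-8 sequence has its high bit set, so a space byte can never appear inside an encoded character. With Option 2 the word read from the file can be compared directly against "för" using equals.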