How to read non-English text in Java? It is represented in the wrong encoding
Problem description
I use Apache HttpClient. When I try to read a site, all non-English content is represented wrongly.
Actually, it's represented in windows-1252, but it should be in UTF-8. How can I fix this?
I tried to use InputStreamReader(inputStream, Charset.forName("UTF-8")), but it didn't help (the wrong symbols were transformed into ????????).
If the file is in Windows-1252, then telling it to use UTF-8 isn't going to work. Give it Windows-1252 as the charset name, and then you can read the correct data. Knowing what format data should be in isn't nearly as useful as knowing what format it's actually in :)
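As a minimal sketch of that point, the snippet below decodes the same bytes twice: once with the charset they are actually in, and once with UTF-8. The class name CharsetReadSketch, the helper readAll, and the sample bytes are made up for illustration; in the question's setup the InputStream would come from the HTTP response instead of a byte array.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetReadSketch {

    // Hypothetical helper: reads the whole stream as text, decoding with the given charset.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: "cafe" with an acute accent on the e, encoded as windows-1252.
        byte[] body = {'c', 'a', 'f', (byte) 0xE9};

        // Decoding with the charset the bytes are actually in gives the intended text.
        Charset actual = Charset.forName("windows-1252");
        System.out.println(readAll(new ByteArrayInputStream(body), actual));

        // A lone 0xE9 byte is not valid UTF-8, so the decoder substitutes U+FFFD for it.
        System.out.println(readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8));
    }
}
```

Whether the second line prints as a question mark or a replacement glyph depends on the console, which is likely where the ???????? in the question came from.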
It's up to you whether you then rewrite it in UTF-8...
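If you do decide to rewrite the data in UTF-8, one possible approach is to decode with the actual charset first and then encode the resulting String again. This is only a sketch: TranscodeSketch and toUtf8 are hypothetical names, and the input is the same made-up sample as before.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TranscodeSketch {

    // Hypothetical helper: decodes bytes using the charset they are actually in,
    // then re-encodes the resulting text as UTF-8.
    public static byte[] toUtf8(byte[] input, Charset actual) {
        String text = new String(input, actual);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed sample data: "cafe" with an acute accent on the e, encoded as windows-1252.
        byte[] windows1252 = {'c', 'a', 'f', (byte) 0xE9};
        byte[] utf8 = toUtf8(windows1252, Charset.forName("windows-1252"));

        // U+00E9 takes two bytes in UTF-8 (0xC3 0xA9), so the array grows from 4 to 5 bytes.
        System.out.println(Arrays.toString(utf8));
    }
}
```

The important part is the order: the bytes have to be decoded with the charset they really use before any conversion, otherwise the wrong characters are simply re-encoded.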