How to read non-English texts in Java? They are represented in the wrong encoding


Problem description


I use Apache HttpClient, and when I try to "read a site", all non-English content is displayed incorrectly.

The text is actually encoded in Windows-1252, but it should be UTF-8. How can I fix this?
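To find out which encoding the server claims to be sending, one place to look is the `charset` parameter of the response's `Content-Type` header (e.g. `text/html; charset=windows-1252`). Below is a minimal stdlib-only sketch; `charsetFromContentType` is a hypothetical helper, not part of HttpClient, and it ignores encodings declared inside the HTML itself (such as a `<meta charset>` tag).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {

    /**
     * Hypothetical helper: extracts the charset parameter from a Content-Type value
     * such as "text/html; charset=windows-1252". Returns the fallback when the header
     * is missing, has no charset parameter, or names a charset the JVM doesn't support.
     */
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String[] kv = param.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("charset")) {
                // Strip optional quotes, as in charset="UTF-8"
                String name = kv[1].trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both IllegalCharsetNameException and UnsupportedCharsetException
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=windows-1252", StandardCharsets.UTF_8));
        System.out.println(charsetFromContentType("text/html", StandardCharsets.UTF_8));
    }
}
```

Running `main` should print `windows-1252` and then `UTF-8` (the fallback, since the second header has no charset). If the header is absent or wrong, you still have to find out the real encoding some other way.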

I tried `new InputStreamReader(inputStream, Charset.forName("UTF-8"))`, but it didn't help: the wrong characters were turned into ????????.
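The ???????? symptom is what you would expect when Windows-1252 bytes are decoded as UTF-8: a byte such as 0xFC (ü in Windows-1252) is not valid UTF-8, so `InputStreamReader` replaces it with U+FFFD, and a console that cannot display that character may print it as `?`. A self-contained sketch reproducing this, with an in-memory byte array standing in for the HTTP response stream and `readAll` as a hypothetical helper:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {

    /** Reads the whole stream as text using the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulated response body: "Grüße" encoded in Windows-1252 (ü = 0xFC, ß = 0xDF)
        byte[] body = "Gr\u00FC\u00DFe".getBytes(Charset.forName("windows-1252"));

        // Decoding as UTF-8: the invalid bytes become U+FFFD replacement characters
        String wrong = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        System.out.println(wrong.contains("\uFFFD")); // true
    }
}
```

The important point is that the information is lost at decode time: once the characters have become U+FFFD, no later conversion can bring the original letters back.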

Solution

If the data is in Windows-1252, then telling the reader to decode it as UTF-8 isn't going to work. Give it Windows-1252 as the charset name instead, and you will read the correct data. Knowing what encoding the data should be in isn't nearly as useful as knowing what encoding it's actually in :)
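A minimal sketch of that fix, assuming the data really is Windows-1252 as described in the question. With HttpClient the stream would typically come from the response entity's `getContent()`; here a byte array stands in for it, and `readText` is a hypothetical helper:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.stream.Collectors;

public class ReadAsWindows1252 {

    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    /**
     * Decodes the stream with the charset the data is actually in.
     * Lines are rejoined with "\n", so the original line endings are normalized.
     */
    static String readText(InputStream in, Charset actualCharset) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, actualCharset))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP response body: "Grüße" encoded in Windows-1252
        byte[] body = "Gr\u00FC\u00DFe".getBytes(WINDOWS_1252);
        String text = readText(new ByteArrayInputStream(body), WINDOWS_1252);
        System.out.println(text.equals("Gr\u00FC\u00DFe")); // true
    }
}
```

The only change from the failing attempt is the charset passed to `InputStreamReader`; the reading code itself stays the same.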

Whether you then write it back out as UTF-8 is up to you.
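Once the text has been decoded into a Java `String`, writing it back out as UTF-8 is just a matter of naming the target charset when encoding. A sketch with a hypothetical `transcode` helper:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {

    /** Decodes bytes from one charset and re-encodes the same text in another. */
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from); // decode: bytes -> String
        return text.getBytes(to);              // encode: String -> bytes in the target charset
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] original = "Gr\u00FC\u00DFe".getBytes(cp1252);
        byte[] utf8 = transcode(original, cp1252, StandardCharsets.UTF_8);
        System.out.println(original.length + " -> " + utf8.length); // 5 -> 7
    }
}
```

The byte count grows because ü and ß take one byte each in Windows-1252 but two bytes each in UTF-8. This only works if `from` is the encoding the bytes are really in; transcoding from the wrong source charset just bakes the corruption into the UTF-8 output.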
