Need help determining encoding of the text

Views: 108 Published: 2016/11/19 17:08:57 Tag: character-encoding


Question


This is Cyrillic text of unknown encoding, displayed in Windows-1251. I'm pretty sure it's not UTF-8, ISO-8859-5 or KOI8. I couldn't determine the actual encoding; does anyone have a clue?



ГђВџГђВѕГђВ»ГђВЅГ‘В‹ГђВ№ ГђВєГђВ°ГђВґГ‘ВЂ
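One way to check the candidates the question rules out is to recover the bytes behind the displayed text and decode them once with each charset. The sketch below assumes, as the question states, that the text is being rendered as Windows-1251; the class name `CandidateDecodings`, the method `tryCandidates`, and the extra candidates `KOI8-U` and `IBM866` are illustrative choices, not part of the original post. None of these single decodings recovers readable text. The recovered bytes also alternate between 0xC2/0xC3 and a second byte, which is the UTF-8 pattern for characters in the U+0080 to U+00FF range, a common sign of Latin-1 text that was itself encoded as UTF-8.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class CandidateDecodings {
    // The garbled text exactly as shown in the question (rendered as Windows-1251).
    static final String SHOWN = "ГђВџГђВѕГђВ»ГђВЅГ‘В‹ГђВ№ ГђВєГђВ°ГђВґГ‘ВЂ";

    // Recover the bytes behind the displayed text, then decode them once
    // with each candidate charset that this JVM supports.
    static Map<String, String> tryCandidates(String shown, String... candidates) {
        byte[] bytes = shown.getBytes(Charset.forName("windows-1251"));
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : candidates) {
            if (Charset.isSupported(name)) {
                results.put(name, new String(bytes, Charset.forName(name)));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> results =
            tryCandidates(SHOWN, "UTF-8", "ISO-8859-5", "KOI8-R", "KOI8-U", "IBM866");
        // Print each attempt so the output can be inspected by eye.
        results.forEach((name, text) -> System.out.println(name + ": " + text));
    }
}
```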
Solution
So the original string was first encoded as UTF-8, the resulting bytes were then interpreted as ISO-8859-1, and that result was encoded as UTF-8 again; the question shows those final bytes rendered as Windows-1251. The solution is given in Java. It assumes you have access to the raw bytes; otherwise more code is required to get them.
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) {
        // The underlying bytes, based on the characters being displayed in Windows-1251
        byte[] rawBytes = {(byte)0xc3,(byte)0x90,(byte)0xc2,(byte)0x9f,(byte)0xc3,(byte)0x90,(byte)0xc2,
                           (byte)0xbe,(byte)0xc3,(byte)0x90,(byte)0xc2,(byte)0xbb,(byte)0xc3,(byte)0x90,
                           (byte)0xc2,(byte)0xbd,(byte)0xc3,(byte)0x91,(byte)0xc2,(byte)0x8b,(byte)0xc3,
                           (byte)0x90,(byte)0xc2,(byte)0xb9,(byte)0x20,(byte)0xc3,(byte)0x90,(byte)0xc2,
                           (byte)0xba,(byte)0xc3,(byte)0x90,(byte)0xc2,(byte)0xb0,(byte)0xc3,(byte)0x90,
                           (byte)0xc2,(byte)0xb4,(byte)0xc3,(byte)0x91,(byte)0xc2,(byte)0x80};

        // Alternatively, this works just as well:
        // byte[] rawBytes = "ГђВџГђВѕГђВ»ГђВЅГ‘В‹ГђВ№ ГђВєГђВ°ГђВґГ‘ВЂ".getBytes(java.nio.charset.Charset.forName("windows-1251"));

        // Undo the last UTF-8 encoding step.
        String asUTF8 = new String(rawBytes, StandardCharsets.UTF_8);

        // Intermediate step required to convert the intermediate string to byte[] again.
        // ISO-8859-1 is used because it maps the first 256 Unicode code points
        // exactly to the byte values 0-255. getBytes() returns an array of exactly
        // the encoded length, unlike ByteBuffer.array(), which exposes the whole backing array.
        byte[] bytes = asUTF8.getBytes(StandardCharsets.ISO_8859_1);

        // Decode the recovered UTF-8 bytes to get the original text.
        String finalResult = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(finalResult);
        // Полный кадр
    }
}
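To confirm the diagnosis, the corruption chain can also be run forward and then undone. For example, П (U+041F) becomes D0 9F in UTF-8; read as ISO-8859-1 those bytes are Ð (U+00D0) plus the control character U+009F, which re-encode as C3 90 C2 9F and display in Windows-1251 as ГђВџ. The sketch below packages both directions as methods; the names `DoubleEncodingDemo`, `doubleEncode`, `repair` and `latin1RoundTripsAllBytes` are hypothetical, and `repair` is simply the answer's three steps in reusable form.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DoubleEncodingDemo {
    static final Charset WINDOWS_1251 = Charset.forName("windows-1251");

    // Forward: UTF-8 encode, misread the bytes as ISO-8859-1, UTF-8 encode again.
    static byte[] doubleEncode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    // Reverse: the same steps as the answer's solution.
    static String repair(byte[] doubleEncoded) {
        String asUTF8 = new String(doubleEncoded, StandardCharsets.UTF_8);
        byte[] bytes = asUTF8.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // ISO-8859-1 maps bytes 0x00-0xFF to U+0000-U+00FF and back,
    // so the intermediate step in repair() cannot lose information.
    static boolean latin1RoundTripsAllBytes() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String s = new String(all, StandardCharsets.ISO_8859_1);
        return Arrays.equals(all, s.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] stored = doubleEncode("Полный кадр");
        // What a Windows-1251 viewer displays for these bytes:
        System.out.println(new String(stored, WINDOWS_1251));
        System.out.println(repair(stored));              // Полный кадр
        System.out.println(latin1RoundTripsAllBytes());  // true
    }
}
```

Note that `repair` only makes sense when every character of the intermediate string is at or below U+00FF; anything else is replaced with `?` by the ISO-8859-1 encoder, which would signal that the text was damaged in some other way.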


                        