Why is my implementation of Simplified DES working fine under Cp1252 encoding but not under UTF-8?


Problem description

I asked the following question yesterday, but it didn't get much attention because I didn't include any details about my actual problem.

Eclipse:Using UTF-8 encoding in the text editor make the Strings not work properly, how can I fix that?

I will try to analyze my problem as much as possible in order to give you a clear insight on what's going on.

I have a university project where I am supposed to implement the Simplified DES algorithm for educational purposes. It is an encryption algorithm that uses a 10-bit key to encrypt 8 bits of data at a time.

In the implementation I wanted to support encrypting any String.

So I wrote the code for the encryption of 8 bits and it worked perfectly fine for all kinds of inputs. In order to include String encryption support I used the function String.getBytes(), saved all the bytes of the String inside a variable byte[] data

and then I followed this logic:

int i;
for(i=0; i< data.length; i++)
    data[i] = encrypt(data[i]);

and for decryption I followed this logic:

int i;
for(i=0; i< data.length; i++)
    data[i] = decrypt(data[i]);

Here is the actual code in the main function

public static void main(String[] args){

    short K = (short) Integer.parseInt("1010010001",2);
    SDEncryption sdes = new SDEncryption(K); //K is the 10 bit key

    String test = "INFO BOB 57674";

    //let's encrypt the String test
    String enc = sdes.encrypt(test.getBytes());

    //let's decrypt the encrypted String of the initial String
    String dec = sdes.decrypt(enc.getBytes());
}

Using the default encoding, which is Cp1252, I tried to encrypt the String and got the following results:

Initial Text: INFO BOB 57674
Encrypted Text: ÅO [áa[aá»j×jt
Decrypted Text: INFO BOB 57674

To see the actual bit representation each time I encrypt and decrypt the data, I wrote the following function, which prints every byte of the data in binary:

public void show(byte[] data){
    //εμφάνιση των 
    //note how the Greek letters aren't displayed at all under Cp1252

    int i;
    for(i=0;i<data.length;i++){

        short mask = (short) (1<<7); //10000000
        while(mask>0){
            if((data[i]&mask) == 0)
                System.out.print("0");
            else
                System.out.print("1");

            mask = (short) (mask >> 1);
        }
        if(i < data.length - 1){

            System.out.print(" ");
        }
    }
    System.out.println();
}

So I got the following results:

Initial Text(binary): 01001001 01001110 01000110 01001111 00100000 01000010 01001111 01000010 00100000 00110101 00110111 00110110 00110111 00110100
Encrypted Text(binary): 11000101 01001111 00100000 01011011 11100001 01100001 01011011 01100001 11100001 10111011 01101010 11010111 01101010 01110100
Decrypted Text(binary): 01001001 01001110 01000110 01001111 00100000 01000010 01001111 01000010 00100000 00110101 00110111 00110110 00110111 00110100

Seems like everything is working as expected. In order to support Greek letters in the code editor though, I had to change the encoding to be UTF-8.

After running everything again, I got the following results:

Initial Text: INFO BOB 57674
Encrypted Text: �O [�a[a�j�jt
Decrypted Text: ���NFO���BOB���7���74

Notice how some words of the decrypted text are displayed correctly, for example NFO and BOB. It seems to me as if there is some kind of problem with the bit manipulation, as if Eclipse doesn't recognize a sequence of bits that follows the rules of UTF-8.

Here are the results in binary form:

Initial Text(binary): 01001001 01001110 01000110 01001111 00100000 01000010 01001111 01000010 00100000 00110101 00110111 00110110 00110111 00110100
Encrypted Text(binary): 11101111 10111111 10111101 01001111 00100000 01011011 11101111 10111111 10111101 01100001 01011011 01100001 11101111 10111111 10111101 01101010 11101111 10111111 10111101 01101010 01110100
Decrypted Text(binary): 11101111 10111111 10111101 11101111 10111111 10111101 11101111 10111111 10111101 01001110 01000110 01001111 11101111 10111111 10111101 11101111 10111111 10111101 11101111 10111111 10111101 01000010 01001111 01000010 11101111 10111111 10111101 11101111 10111111 10111101 11101111 10111111 10111101 00110111 11101111 10111111 10111101 11101111 10111111 10111101 11101111 10111111 10111101 00110111 00110100

Now I can see the problem more clearly. It seems like UTF-8 adds more bytes to the String. However, I'm not sure why. The Initial Text seems to have the same number of bytes, so why do these bytes get added after the encryption, and why are even more added after the decryption?

I would appreciate any help. Thank you in advance!

Solution

Every time you do String.getBytes(), you implicitly use your platform default encoding to transform chars to bytes. If the String contains characters that can't be represented using your platform's default encoding, you lose information. So use an explicit encoding supporting every character on earth, like UTF8: string.getBytes("UTF8").
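As a rough illustration of that first kind of loss, here is a minimal sketch. The class and method names are made up for this example, and ISO-8859-1 is only a stand-in for a single-byte encoding that has no Greek letters: the encoder silently substitutes '?' for each character it cannot represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeLossDemo {
    // Encode text to bytes and decode it again with the same charset.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Greek alpha and beta, written as escapes so the source file
        // encoding does not matter.
        String text = "\u03b1\u03b2";

        // UTF-8 can represent both characters, so the text survives.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true

        // ISO-8859-1 cannot, so each character is replaced by '?'.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1)); // ??
    }
}
```

Passing the charset explicitly also means the result no longer depends on the platform default or on the Eclipse setting.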

Similarly, when you do new String(bytes), you use your platform's default encoding to transform the bytes into chars. If the bytes actually are text encoded using another encoding, or aren't chars at all, but purely binary information, you'll also lose information.

Encryption is a binary operation. It takes bytes and returns other bytes. You can't blindly transform bytes into chars, whatever the encoding is, because not all bytes represent a valid character. If you want to transform binary information (like encrypted text) to a String, use Hex or Base64 encoding.
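This is what the binary dumps in the question show. A minimal sketch, using the first two ciphertext bytes from the question (11000101 01001111); the class and method names are hypothetical, and ISO-8859-1 stands in for a single-byte encoding such as Cp1252, where this byte happens to map to a character. Under UTF-8, 0xC5 announces a two-byte sequence, but the next byte is not a continuation byte, so the decoder replaces it with U+FFFD, which re-encodes as the three bytes 11101111 10111111 10111101.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyDecodeDemo {
    // Decode bytes as text in the given charset, then encode the text back.
    static byte[] roundTrip(byte[] data, Charset cs) {
        return new String(data, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // First two ciphertext bytes from the question.
        byte[] cipher = { (byte) 0xC5, 0x4F };

        // Single-byte charset: every byte maps to a character and back.
        byte[] viaLatin1 = roundTrip(cipher, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(cipher, viaLatin1)); // true

        // UTF-8: 0xC5 is malformed here and becomes U+FFFD (EF BF BD).
        byte[] viaUtf8 = roundTrip(cipher, StandardCharsets.UTF_8);
        System.out.println(viaUtf8.length); // 4
    }
}
```

Each such replacement turns one ciphertext byte into three unrelated bytes, which is why the encrypted text grows and why decrypting it produces even more garbage.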

So the encryption process should be:

String clearText = ...;
byte[] clearTextAsBytes = clearText.getBytes("UTF8");
byte[] encryptedBinary = encrypt(clearTextAsBytes);
String encryptedBinaryAsPrintableChars = toBase64(encryptedBinary);

And the decryption process should be symmetric:

String encryptedBinaryAsPrintableChars = ...;
byte[] encryptedBinary = fromBase64(encryptedBinaryAsPrintableChars);
byte[] decryptedTextAsBytes = decrypt(encryptedBinary);
String decryptedText = new String(decryptedTextAsBytes, "UTF8");
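The toBase64 and fromBase64 calls above are placeholders. One possible way to fill them in, assuming Java 8 or later, is java.util.Base64. In the sketch below the XOR cipher is only a hypothetical stand-in for the S-DES byte routine (it is not the asker's SDEncryption class), and the class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64RoundTrip {
    // Hypothetical stand-in for the S-DES byte cipher: XOR with a constant.
    // XOR is its own inverse, so the same method also decrypts.
    static byte[] xorCipher(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] ^ 0x5A);
        }
        return out;
    }

    // Text -> UTF-8 bytes -> cipher bytes -> printable Base64 text.
    static String encryptToText(String clearText) {
        byte[] clearBytes = clearText.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(xorCipher(clearBytes));
    }

    // Printable Base64 text -> cipher bytes -> UTF-8 bytes -> text.
    static String decryptFromText(String printable) {
        byte[] encrypted = Base64.getDecoder().decode(printable);
        return new String(xorCipher(encrypted), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String enc = encryptToText("INFO BOB 57674");
        System.out.println(enc);                  // printable ASCII only
        System.out.println(decryptFromText(enc)); // INFO BOB 57674
    }
}
```

The ciphertext is never passed through new String(bytes), so the result is the same whichever default encoding the platform or the IDE uses.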
