Translate UTF-8 character encoding function from PHP to Java

Problem description

I am trying to translate a PHP encoding function into an Android Java method. Java's String.length() handles a UTF-8 string differently from PHP's strlen(), and I have failed to make the translated Java code consistent with the PHP code when converting the second, UTF-8 string, str2. The first, non-UTF-8 string does work.

The original PHP code is:

 function myhash_php($string,$key) {
    $strLen = strlen($string);
    $keyLen = strlen($key);
    $j=0 ; $hash = "" ; 
    for ($i = 0; $i < $strLen; $i++) {
        $ordStr = ord(substr($string,$i,1));
        if ($j == $keyLen) { $j = 0; }
        $ordKey = ord(substr($key,$j,1));
        $j++;
        $hash .= strrev(base_convert(dechex($ordStr + $ordKey),16,36));

    }
    return $hash;  
}
$str1 = "good friend" ;
$str2 = "好友" ;    //  strlen($str2) == 6
$key  = "iuyhjf476" ;
echo "php encode str1 '". $str1 ."'=".myhash_php($str1, $key)."<br>";
echo "php encode str2 '". $str2 ."'=".myhash_php($str2, $key)."<br>";

The PHP output is:

    php encode str1 'good friend'=s5c6g6o5u3o5m4g4b4z516
    php encode str2 '好友'=a9u7m899x6p6

The current translated Java code, which produces the wrong result, is:

    public static String   hash_java(String  string, String  key) {
        //Integer strLen  = byteLenUTF8(string) ; // consistent with php strlen("好友")==6
        //Integer keyLen  = byteLenUTF8(key) ;    //   byteLenUTF8("好友") == 6
        Integer strLen  = string.length() ;      //     "好友".length()  ==  2
        Integer keyLen  = key.length() ;
        int j=0 ;
        String  hash = "" ;
        int ordStr, ordKey ;
        for (int i = 0; i < strLen; i++) {
            ordStr = ord_java(string.substring(i,i+1));  //string is String,  php  substr($string,$i,$n)  ==  java string.substring(i, i+n)
            // ordStr = ord_java(string[i]);  //string is byte[], php  substr($string,$i,$n)  ==  java string.substring(i, i+n)
            if (j == keyLen) { j = 0; }
            ordKey = ord_java(key.substring(j,j+1));
            j++;
            hash += strrev(base_convert(dechex(ordStr + ordKey),16,36));
        }
        return hash;
    }
    // return the ASCII code of the first character of str
    public static int      ord_java( String str){
        return( (int)  str.charAt(0)  ) ;
    }
    public static String   dechex(int input  ) {
        String hex  = Integer.toHexString(input ) ;
        return hex ;
    }
    public static String   strrev(String str){
        return  new StringBuilder(str).reverse().toString() ;
    }
    public static String   base_convert(String str, int fromBase, int toBase) {
        return Integer.toString(Integer.parseInt(str, fromBase), toBase);
    }

    String  str1 = "good friend" ;
    String  str2 = "好友" ;
    String  key  = "iuyhjf476" ;
    Log.d(LogTag,"java encode str1 '"+ str1  +"'="+hash_java(str1, key)) ;
    Log.d(LogTag,"java encode str2 '"+ str2  +"'="+hash_java(str2, key)) ;

The Java output is:

java encode str1 'good friend'=s5c6g6o5u3o5m4g4b4z516
java encode str2 '好友'=arh4ng

The encoded output for the UTF-8 string str2 in the Java method is not correct. How can I fix the problem?

Recommended answer

In Java, convert the string to a byte array using UTF-8 character encoding. Then apply your encoding algorithm to this byte array instead of to the string.

Your PHP program seems to do the same thing implicitly: it treats e.g. each character of str2 as three individual byte values, according to UTF-8 encoding.

In the comments, you say you receive the string from the user entering it on Android. So you start with a Java String coming from some UI widget.

And you need that Java String to give the same result that the given PHP function produces when fed the same text as a UTF-8 string. The resulting string will only use ASCII characters, so its character encoding is less problematic (it doesn't matter whether it's e.g. ISO-8859-1 or UTF-8).

The PHP string datatype is ignorant of the encoding and just stores a sequence of bytes. In general it might contain ISO-8859-1 bytes, where one byte represents one character, or UTF-8 byte sequences, where characters often occupy multiple bytes, or any other encoding. The PHP string does not know how the bytes are meant to be interpreted as characters; it just sees and counts bytes.

So what your PHP string calls "characters" are effectively the bytes of the UTF-8 encoding, and the Java side must emulate this behaviour when running its algorithm.

Java has a String data type very different from PHP's: it is not based on byte sequences but (mainly) sees a string as a sequence of characters. So if you work with the characters of the Java String, you will not see the same sequence of elements that PHP sees.

When Java iterates over a String like "好友", there are two steps, one for each of the two characters (seeing the character's Unicode code point number), while PHP has six steps, one for each byte of the UTF-8 representation, seeing the byte value.
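The difference can be made visible with a small sketch. The class and method names here are illustrative only and not part of the original question; the string is written with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

// Illustrative class name; not part of the original question or answer.
class CharsVersusBytes {

    // What Java iterates over by default: one entry per char (UTF-16 code unit).
    static String charUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // What PHP's strlen/substr/ord see: one entry per byte of the UTF-8 encoding.
    static String utf8Bytes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            // Java bytes are signed, so mask to get the 0..255 value PHP's ord() returns.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u597D\u53CB"; // "好友"
        System.out.println(charUnits(s)); // U+597D U+53CB        (two steps)
        System.out.println(utf8Bytes(s)); // E5 A5 BD E5 8F 8B    (six steps)
    }
}
```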

So, to emulate PHP, in Java you have to convert the String to a byte[] array using UTF-8 encoding. This way, one Java byte will correspond to one PHP character.
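Applied to the function from the question, this could look like the following sketch. It is one possible port rather than the only way to write it: the class and method names are made up for illustration, the key is assumed to be non-empty, and the `dechex` plus `base_convert` pair is collapsed into a single `Integer.toString(value, 36)` call, which yields the same base-36 digits.

```java
import java.nio.charset.StandardCharsets;

// Illustrative class name; a sketch of a byte-based port of myhash_php.
class PhpCompatibleHash {

    // Assumes a non-empty key; an empty key would fail on keyBytes[j].
    static String hash(String text, String key) {
        // Work on UTF-8 bytes, because that is what PHP's strlen/substr/ord operate on.
        byte[] textBytes = text.getBytes(StandardCharsets.UTF_8);
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);

        StringBuilder out = new StringBuilder();
        int j = 0;
        for (byte b : textBytes) {
            if (j == keyBytes.length) {
                j = 0; // wrap around to the start of the key
            }
            // Java bytes are signed (-128..127); mask to match ord()'s 0..255.
            int ordStr = b & 0xFF;
            int ordKey = keyBytes[j] & 0xFF;
            j++;
            // Base-36 digits of the sum, reversed like strrev() in the PHP code.
            String base36 = Integer.toString(ordStr + ordKey, 36);
            out.append(new StringBuilder(base36).reverse());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String key = "iuyhjf476";
        System.out.println(hash("good friend", key));  // s5c6g6o5u3o5m4g4b4z516
        System.out.println(hash("\u597D\u53CB", key)); // "好友" -> a9u7m899x6p6
    }
}
```

Both outputs match the PHP results quoted in the question. The mask `& 0xFF` matters: without it, the bytes of "好友" would be negative and the sums would differ from PHP's.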

By the way, the wording "UTF-8 string" does not make sense in Java.

That is different from PHP, where e.g. "Maß" as an ISO-8859-1 string (having a length of 3) differs from "Maß" as a UTF-8 string (having a length of 4).
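In Java the same distinction only shows up once the String is encoded to bytes. A minimal sketch, with an illustrative class name and the string written with a Unicode escape for "ß":

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative class name; shows that the encoding matters only for the byte form.
class EncodingLengths {

    // Number of bytes the string occupies in the given encoding.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "Ma\u00DF"; // "Maß"
        System.out.println(s.length());                                  // 3 characters
        System.out.println(byteLength(s, StandardCharsets.ISO_8859_1));  // 3 bytes
        System.out.println(byteLength(s, StandardCharsets.UTF_8));       // 4 bytes
    }
}
```

The two byte counts correspond to what PHP's strlen() would report for the ISO-8859-1 and UTF-8 versions of the string.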

In Java, Strings are sequences of characters, which is why e.g. "好友" has a length of 2: it is just two characters that happen to come from a non-Latin script. [This is true for most Unicode characters you'll typically encounter, but there are exceptions.] In Java, terms like UTF-8 matter only when converting between strings and byte sequences.
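The answer does not name the exceptions. One case that fits is a character outside the Basic Multilingual Plane, which a Java String stores as two char values (a surrogate pair). The sketch below assumes that reading; the class name and the emoji example are illustrative.

```java
import java.nio.charset.StandardCharsets;

// Illustrative class name; compares three ways of measuring a string.
class SupplementaryLength {

    static int charCount(String s) {
        return s.length(); // UTF-16 code units
    }

    static int codePointCount(String s) {
        return s.codePointCount(0, s.length()); // Unicode code points
    }

    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length; // what PHP's strlen() counts
    }

    public static void main(String[] args) {
        String cjk = "\u597D\u53CB";   // "好友"
        String emoji = "\uD83D\uDE00"; // U+1F600, a single code point

        System.out.println(charCount(cjk) + " " + codePointCount(cjk) + " " + utf8ByteCount(cjk));       // 2 2 6
        System.out.println(charCount(emoji) + " " + codePointCount(emoji) + " " + utf8ByteCount(emoji)); // 2 1 4
    }
}
```

None of this changes the byte-based approach: `getBytes(StandardCharsets.UTF_8)` encodes a surrogate pair as one four-byte sequence, which is the same byte sequence PHP would hold for that character.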
