Android, problem with file name comparison in Japanese characters

Views: 302 | Published: 2016/3/11 19:59:26 | Tags: android string unicode utf-8 character

Problem description

I'm trying to match a search string against file names during a recursive directory search on Android. The problem is that the characters are Japanese, and in some cases the match fails. For example, the search string I'm trying to match at the start of the file name is "呼ぶ". When I print the file names returned by file.getName(), they look right, e.g. the file name printed to the console starts with "呼ぶ". But when I test against the search string, e.g. fileName.startsWith("呼ぶ"), it doesn't match.

It turns out that when I print the substring of the file name being compared, the second character is different: the word is "呼ふ" instead of "呼ぶ". If I extract the bytes and print them in hex, the last byte is off by one, presumably the difference between "ぶ" and "ふ".
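A raw byte dump is hard to read character by character, so one way to see what a string really holds is to print its Unicode code points instead. The following is a minimal sketch, not the original code: the class name `CodePointDump` is hypothetical, and the two hard-coded sample strings stand in for `question.kanji` and for the start of the file name (the second one is an assumption based on the bytes E3 81 B5 E3 82 99 that appear in the log in this question).

```java
public class CodePointDump {
    // Formats every code point of s as U+XXXX, separated by single spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample data, written as escapes to avoid source-encoding issues.
        String composed = "\u547C\u3076";         // 呼 + ぶ as a single precomposed character
        String decomposed = "\u547C\u3075\u3099"; // 呼 + ふ + combining voiced sound mark

        System.out.println(dump(composed));
        System.out.println(dump(decomposed));
        System.out.println(composed.length() + " vs " + decomposed.length());
    }
}
```

Both sample strings render as "呼ぶ" on screen, but the second is one `char` longer, so taking `substring(0, 2)` of it yields "呼ふ" with the combining mark cut off.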

Here is the code used to show the difference:

    String name = soundFile.getName();
    String string1 = question.kanji;

    Log.d(TAG, "searching for : s1:" + question.kanji + " + " + question.hiragana + " + " + question.english);
    Log.d(TAG, "name is: " + name);

    Log.d(TAG, "question.kanaji.length(): " + question.kanji.length());
    Log.d(TAG, "question.hiragana.length(): " + question.hiragana.length());

    // Take as many chars from the start of the file name as the search string has.
    String compareStart = name.substring(0, string1.length());

    Log.d(TAG, "string1.length(): " + string1.length());
    Log.d(TAG, "compareStart.length(): " + compareStart.length());

    // getBytes() uses the platform default charset, which is UTF-8 on Android.
    byte[] nameUTF8 = name.getBytes();
    byte[] s1UTF8 = string1.getBytes();
    byte[] csUTF8 = compareStart.getBytes();

    // Note: the run logged below printed s1UTF8.length under the nameUTF8 label,
    // which is why it reports 6 for nameUTF8.length.
    Log.d(TAG, "nameUTF8.length: " + nameUTF8.length);
    Log.d(TAG, "s1UTF8.length: " + s1UTF8.length);
    Log.d(TAG, "csUTF8.length: " + csUTF8.length);

    // Dump each byte array as unsigned hex, one byte per log line.
    for (int i = 0; i < s1UTF8.length; i++) {
        Log.d(TAG, "s1UTF8[i]: " + Integer.toString(s1UTF8[i] & 0xff, 16).toUpperCase());
    }

    for (int i = 0; i < csUTF8.length; i++) {
        Log.d(TAG, "csUTF8[i]: " + Integer.toString(csUTF8[i] & 0xff, 16).toUpperCase());
    }

    for (int i = 0; i < nameUTF8.length; i++) {
        Log.d(TAG, "nameUTF8[i]: " + Integer.toString(nameUTF8[i] & 0xff, 16).toUpperCase());
    }

The partial output is as follows:

D/AnswerView(12078): searching for : s1:呼ぶ + よぶ + to call out,to invite
D/AnswerView(12078): name is: 呼ぶ                                                     よぶ                 to call out,to invite.mp3
D/AnswerView(12078): question.kanaji.length(): 2
D/AnswerView(12078): question.hiragana.length(): 2
D/AnswerView(12078): string1: 呼ぶ
D/AnswerView(12078): compareStart: 呼ふ
D/AnswerView(12078): string1.length(): 2
D/AnswerView(12078): compareStart.length(): 2
D/AnswerView(12078): string1.length(): 2
D/AnswerView(12078): compareStart.length(): 2
D/AnswerView(12078): nameUTF8.length: 6
D/AnswerView(12078): s1UTF8.length: 6
D/AnswerView(12078): csUTF8.length: 6
D/AnswerView(12078): s1UTF8[i]: E5
D/AnswerView(12078): s1UTF8[i]: 91
D/AnswerView(12078): s1UTF8[i]: BC
D/AnswerView(12078): s1UTF8[i]: E3
D/AnswerView(12078): s1UTF8[i]: 81
D/AnswerView(12078): s1UTF8[i]: B6
D/AnswerView(12078): csUTF8[i]: E5
D/AnswerView(12078): csUTF8[i]: 91
D/AnswerView(12078): csUTF8[i]: BC
D/AnswerView(12078): csUTF8[i]: E3
D/AnswerView(12078): csUTF8[i]: 81
D/AnswerView(12078): csUTF8[i]: B5
D/AnswerView(12078): nameUTF8[i]: E5
D/AnswerView(12078): nameUTF8[i]: 91
D/AnswerView(12078): nameUTF8[i]: BC
D/AnswerView(12078): nameUTF8[i]: E3
D/AnswerView(12078): nameUTF8[i]: 81
D/AnswerView(12078): nameUTF8[i]: B5
D/AnswerView(12078): nameUTF8[i]: E3
D/AnswerView(12078): nameUTF8[i]: 82
D/AnswerView(12078): nameUTF8[i]: 99
D/AnswerView(12078): nameUTF8[i]: 20
D/AnswerView(12078): nameUTF8[i]: 20
D/AnswerView(12078): nameUTF8[i]: 20
D/AnswerView(12078): nameUTF8[i]: 20

This shows that the sixth byte of the substring extracted from the file name, and of the file name itself, is "B5" instead of the "B6" found in the search string. However, the printed file name is displayed correctly. I'm stumped. Why is the file name displayed correctly on the console when the underlying characters are different? And why are there an additional 3 non-blank bytes (E3 82 99) near the beginning of the file name, which somehow aren't needed in the search string to represent the "ぶ" character?
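The bytes E3 82 99 in the dump decode to U+3099, the combining voiced sound mark, and E3 81 B5 decodes to U+3075 (ふ). This suggests the file name is stored in Unicode decomposed form (ふ followed by the combining mark), while the search string uses the single precomposed character ぶ (U+3076, E3 81 B6). The two render identically, which would explain why the console output looks correct; decomposed names can appear, for example, when files originate from a file system that stores names that way. One possible approach, sketched below under that assumption, is to normalize both strings to the same form with `java.text.Normalizer` before comparing. The class name `NormalizedMatch`, the helper `startsWithNormalized`, and the sample file name are hypothetical and not part of the original code.

```java
import java.text.Normalizer;

public class NormalizedMatch {
    // Returns true if fileName starts with prefix once both are converted to NFC,
    // so that precomposed and decomposed spellings of the same text compare equal.
    static boolean startsWithNormalized(String fileName, String prefix) {
        String normalizedName = Normalizer.normalize(fileName, Normalizer.Form.NFC);
        String normalizedPrefix = Normalizer.normalize(prefix, Normalizer.Form.NFC);
        return normalizedName.startsWith(normalizedPrefix);
    }

    public static void main(String[] args) {
        // Assumed sample data, written as escapes to avoid source-encoding issues.
        String search = "\u547C\u3076"; // 呼ぶ, precomposed
        String fileName = "\u547C\u3075\u3099   \u3088\u3075\u3099.mp3"; // decomposed

        System.out.println(fileName.startsWith(search));            // plain comparison
        System.out.println(startsWithNormalized(fileName, search)); // after normalization
    }
}
```

With this sample data the plain `startsWith` call fails and the normalized comparison succeeds. Normalizing both sides, rather than only the file name, keeps the comparison correct regardless of which form the search string happens to use.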
