How to sort non-English strings?


Problem description

I did look up existing answers, and they work well for the standard alphabet, but my situation is different.

I am programming in Java. At one point my program has a list of string items, and I would like to sort those items alphabetically.

If I sorted by the English alphabet, it would be easy: practically all code pages are compatible with ASCII (American Standard Code for Information Interchange), in which the English letters are already in alphabetical order. To sort my list, I would only have to compare the char values to determine which letter goes where.

But my problem is that I do not want to sort the list using the English alphabet. My program can be displayed in English or in several other languages. Some of those languages have an alphabet that differs from the English one, so a simple < or > comparison of char values does not work, because their letters are not in alphabetical order in the code page.

For the purposes of this question, let's say the English alphabet is as follows:

a,
b,
c,
d,
e,
f,
g.

Let's say there is a certain country named "ABC" whose alphabet goes like this:

d,
b,
g,
e,
a,
c,
f.

First of all, if a is 97 in the code page, b is 98, c is 99, et cetera, how can I sort my list using the second alphabet in this example, given that its first letter has the value 100, its second 98, its third 103, et cetera?
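One way to picture what is being asked here is a comparator that ranks each character by its position in the custom alphabet instead of by its code point. The following is only a minimal sketch: the class name CustomAlphabetSort, the constant ABC_ORDER, and the rule that unknown characters sort after all known letters are assumptions made for illustration, not part of the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CustomAlphabetSort {
    // Alphabet of the imaginary country "ABC", listed in its own order.
    static final String ALPHABET = "dbgeacf";

    // Rank of a character is its index in ALPHABET. Characters outside the
    // alphabet (an assumption of this sketch) sort after all known letters,
    // ordered among themselves by char value.
    static int rank(char ch) {
        int index = ALPHABET.indexOf(ch);
        return index >= 0 ? index : ALPHABET.length() + ch;
    }

    // Compares two strings letter by letter using the custom ranks;
    // if one string is a prefix of the other, the shorter one comes first.
    static final Comparator<String> ABC_ORDER = (left, right) -> {
        int shared = Math.min(left.length(), right.length());
        for (int i = 0; i < shared; i++) {
            int diff = Integer.compare(rank(left.charAt(i)), rank(right.charAt(i)));
            if (diff != 0) {
                return diff;
            }
        }
        return Integer.compare(left.length(), right.length());
    };

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("cab", "bad", "face", "dab", "age"));
        words.sort(ABC_ORDER);
        System.out.println(words); // prints [dab, bad, age, cab, face]
    }
}
```

The lookup table replaces any per-letter branching: supporting a different single-character alphabet only means changing the ALPHABET string.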

And my second question: unfortunately, some of the countries I am translating my program for have alphabets in which certain combinations of letters are treated as one letter. For my second example, let's say that the country "def" has the following alphabet:

d,
g,
be,
e,
fe,
c,
f.

Here:
d - the first letter in the alphabet,
g - the second letter in the alphabet,
be - the third letter in the alphabet (ONE letter: although it is written as two characters, it is considered a single letter and has its own position in the alphabet),
e - the fourth letter in the alphabet,
fe - the fifth letter in the alphabet (also written as two characters, but treated as ONE letter),
c - the sixth letter in the alphabet,
f - the seventh letter in the alphabet.
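For an alphabet with two-character letters like this one, a hand-rolled comparator first has to split each word into letters before it can compare positions. The sketch below does that with a greedy longest-match rule, which is an assumption: real languages have words in which two adjacent characters do not form a digraph, and a simple rule like this cannot tell the difference. The names DigraphAlphabetSort, toRanks, and DEF_ORDER are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DigraphAlphabetSort {
    // Alphabet of the imaginary country "def"; "be" and "fe" each count as ONE letter.
    static final List<String> ALPHABET = Arrays.asList("d", "g", "be", "e", "fe", "c", "f");

    // Splits a word into letters and returns their alphabet positions.
    // At each position the longest matching letter wins (greedy matching).
    static List<Integer> toRanks(String word) {
        List<Integer> ranks = new ArrayList<>();
        int pos = 0;
        while (pos < word.length()) {
            int bestRank = -1;
            int bestLength = 0;
            for (int i = 0; i < ALPHABET.size(); i++) {
                String letter = ALPHABET.get(i);
                if (letter.length() > bestLength && word.startsWith(letter, pos)) {
                    bestRank = i;
                    bestLength = letter.length();
                }
            }
            if (bestRank < 0) {
                // Assumption of this sketch: unknown characters sort after all known letters.
                bestRank = ALPHABET.size() + word.charAt(pos);
                bestLength = 1;
            }
            ranks.add(bestRank);
            pos += bestLength;
        }
        return ranks;
    }

    // Compares two words by their sequences of letter positions;
    // if one sequence is a prefix of the other, the shorter one comes first.
    static final Comparator<String> DEF_ORDER = (left, right) -> {
        List<Integer> a = toRanks(left);
        List<Integer> b = toRanks(right);
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            int diff = Integer.compare(a.get(i), b.get(i));
            if (diff != 0) {
                return diff;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("fed", "cd", "bed", "ed", "fd", "gd", "dd"));
        words.sort(DEF_ORDER);
        System.out.println(words); // prints [dd, gd, bed, ed, fed, cd, fd]
    }
}
```

Note that "fed" is read as the letters "fe" + "d", so it sorts before "cd" even though a plain char comparison would put it last.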

As you can see in this imaginary example number 2, the imaginary country "def" has really screwed up its alphabet. Having seen the alphabets of these two imaginary countries, you can understand why I cannot use the standard method for sorting strings.

So, can you please help me out with this sorting? I am not sure how to sort according to such a screwed up alphabet.

P.S.:
The lines below are not relevant to the question; they are just more information in case someone wonders where I found such screwed up alphabets.

I gave those examples, each consisting of 7 randomly ordered letters, just for the purpose of this question, to make it simpler. In case you wonder what my real problem is: I am trying to translate my program into Croatian. The Croatian alphabet is really screwed up, because it goes as follows:

1 |a
2 |b
3 |c
4 |č
5 |ć
6 |d
7 |dž
8 |đ
9 |e
10|f
11|g
12|h
13|i
14|j
15|k
16|l
17|lj
18|m
19|n
20|nj
21|o
22|p
23|r
24|s
25|š
26|t
27|u
28|v
29|z
30|ž

As you can see, the Croatian alphabet is somewhat similar to the English alphabet, but most of the letters are not in the same position as the English ones, several of them do not exist in the English alphabet at all, and several are single letters written as two characters. So it is really difficult to sort, and I hope someone knows a method for doing it.

Of course, there is the dumbest method for sorting, which will always work and can sort anything: a method built on switch statements. I compare two string items, and for each letter I use a switch statement with 31 + default = 32 cases, each of which contains its own switch with 32 cases. That is 1024 cases in total, and if the average case takes 4 lines of code, my sort method for a non-English alphabet would be at least 4096 lines long. That is a huge method.

This is the dumbest way of sorting, but it is the only one I can figure out at the moment. So I am asking here in the hope that someone knows a simpler way to do this, one that does not take 4k lines of code just to sort stupid strings. My method for sorting English strings takes only a bit more than 10 lines of code. I hope someone can suggest something shorter than 4k lines.

So if anyone knows a simpler solution, I would appreciate it.

Thanks.

Recommended answer

You use a Collator for that. Collators are Java's way of handling internationalized comparisons.

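import java.text.Collator;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

List<String> mylist = ...;
Locale croatian = new Locale("hr", "HR");
// Put whatever Locale you need as the argument to the getInstance method.
Collator collator = Collator.getInstance(croatian);
// Collator implements Comparator, so it can be passed straight to sort.
Collections.sort(mylist, collator);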
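To see the difference a Collator makes, here is a small self-contained sketch that sorts the same words once by plain char values and once with a Croatian Collator. The class name CroatianSortDemo, the helper sortCroatian, and the sample words are made up for illustration. Locale.forLanguageTag("hr-HR") is used here as an alternative way of obtaining the same locale as new Locale("hr", "HR").

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CroatianSortDemo {
    // Returns a new list sorted according to Croatian collation rules.
    static List<String> sortCroatian(List<String> input) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("hr-HR"));
        List<String> sorted = new ArrayList<>(input);
        sorted.sort(collator); // Collator is a Comparator, so it plugs into sort directly
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("dan", "čaj", "banana", "ćup");

        // Plain String ordering compares char values, so č and ć land after d.
        List<String> naive = new ArrayList<>(words);
        Collections.sort(naive);
        System.out.println(naive);

        // The Collator places č and ć between b and d, as the alphabet requires.
        System.out.println(sortCroatian(words));
    }
}
```

With these sample words, the first line should come out as banana, dan, ćup, čaj and the second as banana, čaj, ćup, dan. How digraphs such as lj and nj are treated depends on the collation data shipped with the Java runtime, so that is worth checking on the target JDK.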

A Locale is not just a "language"; it also covers many other conventions. The same language can be sorted differently depending on the country, region, or convention within a country. That is why a Locale is identified by up to 3 parts: "language", "country" (region), and "variant".
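As a quick illustration of those parts, the sketch below prints the components of two locales. The class name LocaleParts and the helper describe are hypothetical; getLanguage, getCountry, and getVariant are the standard java.util.Locale accessors, and a part that is not set comes back as an empty string.

```java
import java.util.Locale;

public class LocaleParts {
    // Formats the three classic components of a Locale.
    static String describe(Locale locale) {
        return "language=" + locale.getLanguage()
                + ", country=" + locale.getCountry()
                + ", variant=" + locale.getVariant();
    }

    public static void main(String[] args) {
        // Croatian as used in Croatia: language and country are both set.
        System.out.println(describe(Locale.forLanguageTag("hr-HR"))); // prints language=hr, country=HR, variant=
        // Croatian with no country: only the language part is set.
        System.out.println(describe(Locale.forLanguageTag("hr")));    // prints language=hr, country=, variant=
    }
}
```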
