What does sorting mean in non-alphabetic (i.e., Asian) languages?

Problem description

I have some code that sorts table columns by object properties. It occurred to me that in Japanese or Chinese (non-alphabetic languages), the strings sent to the sort function would be compared the same way strings in an alphabetic language are.

Take for example a list of Japanese surnames:

寿拘 (Suzuki)
松坂 (Matsuzaka)
松井 (Matsui)
山田 (Yamada)
藤本 (Fujimoto)

When I sort the above list via JavaScript, the result is:

寿拘 (Suzuki)
山田 (Yamada)
松井 (Matsui)
松坂 (Matsuzaka)
藤本 (Fujimoto)
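
The order above is what a comparison by UTF-16 code unit produces, which is how Array.prototype.sort compares strings when no compare function is supplied. A minimal sketch that reproduces it and prints the first code unit of each name (the variable names are illustrative, not from the original code):

```javascript
// Surnames from the list above, in their original order.
const names = ["寿拘", "松坂", "松井", "山田", "藤本"];

// With no compare function, sort() compares strings by UTF-16 code unit,
// not by pronunciation or by any dictionary order.
const sorted = [...names].sort();

// Print each name with the code unit of its first character,
// which makes the basis of the ordering visible.
for (const name of sorted) {
  const unit = name.charCodeAt(0).toString(16).toUpperCase();
  console.log(`${name}  U+${unit}`);
}
```

When two names share their first character, as 松井 and 松坂 do, the second character decides the order in the same way.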

This is different from the ordering of the Japanese syllabary, which would arrange the list phonetically (the way a Japanese dictionary would):

寿拘 (Suzuki)
藤本 (Fujimoto)
松井 (Matsui)
松坂 (Matsuzaka)
山田 (Yamada)

What I want to know is:

  1. Does one double-byte character really get compared against the other in a sort function?
  2. What really goes on in such a sort?
  3. (Extra credit) Does the result of such a sort mean anything at all? Does the concept of sorting really work in Asian (and other) languages? If so, what does it mean and what should one strive for in creating a compare function for those languages?


ADDENDUM TO SUMMARIZE ANSWERS AND DRAW CONCLUSIONS:

First, thanks to all who contributed to the discussion. This has been very informative and helpful. Special shout-outs to bobince, Lie Ryan, Gumbo, Jeffrey Zheng, and Larry K, for their in-depth and thoughtful analyses. I awarded the check mark to Larry K for pointing me toward a solution my question failed to foresee, but I up-ticked every answer I found useful.

The consensus appears to be that:

  1. Chinese and Japanese character strings are sorted by Unicode code points, and their ordering may be predicated on a rationale that may be in some way intelligible to knowledgeable readers but is not likely to be of much practical value in helping users to find the information they're seeking.

  2. The kind of compare function that would be required to make a sort semantically or phonetically useful is far too cumbersome to consider pursuing, especially since the results would probably be less than satisfactory, and in any case the comparison algorithms would have to be changed for each language. Best just to allow the sort to proceed without even attempting a compare function.

  3. I was probably asking the wrong question here. That is, I was thinking too much "inside the box" without considering that the real question is not how do I make sorting useful in these languages, but how do I provide the user with a useful way of finding items in a list. Westerners automatically think of sorting for this purpose, and I was guilty of that. Larry K pointed me to a Wikipedia article that suggests a filtering function might be more useful for Asian readers. This is what I plan to pursue, as it's at least as fast as sorting, client-side. I will keep the column sorting because it's well understood in Western languages, and because speakers of any language would find the sorting of dates and other numerical-based data types useful. But I will also add that filtering mechanism, which would be useful in long lists for any language.
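
A filtering mechanism of the kind described in point 3 might look something like the sketch below. It is only an illustration under stated assumptions: the row shape (name and romaji fields) and the filterNames helper are hypothetical, and a real table would filter on whichever columns it actually has.

```javascript
// Hypothetical rows; the name/romaji field names are assumptions for this sketch.
const rows = [
  { name: "寿拘", romaji: "Suzuki" },
  { name: "松坂", romaji: "Matsuzaka" },
  { name: "松井", romaji: "Matsui" },
  { name: "山田", romaji: "Yamada" },
  { name: "藤本", romaji: "Fujimoto" },
];

// Normalize so that full-width/half-width variants and letter case do not
// prevent a match.
function normalizeText(text) {
  return text.normalize("NFKC").trim().toLowerCase();
}

// Hypothetical helper: keep the rows whose name or romanization contains
// the text the user has typed so far.
function filterNames(allRows, query) {
  const needle = normalizeText(query);
  if (needle === "") {
    return allRows.slice(); // empty query shows everything
  }
  return allRows.filter(
    (row) =>
      normalizeText(row.name).includes(needle) ||
      normalizeText(row.romaji).includes(needle)
  );
}

console.log(filterNames(rows, "松").map((row) => row.romaji));
```

Because the filter matches on typed characters rather than on position in an ordering, it works the same way regardless of how the list happens to be sorted.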

Solution

You could implement the Unicode Collation Algorithm in JavaScript if you want something better than the default JS sort for strings. It might improve some things, though as the Unicode documentation states:

Collation is not uniform; it varies according to language and culture: Germans, French and Swedes sort the same characters differently. It may also vary by specific application: even within the same language, dictionaries may sort differently than phonebooks or book indices. For non-alphabetic scripts such as East Asian ideographs, collation can be either phonetic or based on the appearance of the character.
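
As a sketch of the phonetic option mentioned in the quote: JavaScript runtimes that support the ECMAScript Internationalization API expose locale-aware comparison through Intl.Collator, so it may not be necessary to implement the algorithm by hand. The reading of a kanji name cannot be derived from the characters alone, so this example assumes each record also stores a kana reading; the reading field and the sample data are assumptions for illustration, not part of the original question.

```javascript
// Hypothetical records: each surname is paired with a kana reading that
// the application is assumed to store alongside the kanji.
const people = [
  { name: "寿拘", reading: "すずき" },
  { name: "松坂", reading: "まつざか" },
  { name: "松井", reading: "まつい" },
  { name: "山田", reading: "やまだ" },
  { name: "藤本", reading: "ふじもと" },
];

// Locale-aware comparison for Japanese text.
const collator = new Intl.Collator("ja");

// Sort on the reading rather than on the kanji, leaving the input untouched.
const byReading = [...people].sort((a, b) =>
  collator.compare(a.reading, b.reading)
);

console.log(byReading.map((person) => person.name));
```

Sorting on the stored reading gives the dictionary-style order the question asks about (Suzuki, Fujimoto, Matsui, Matsuzaka, Yamada). Passing the kanji themselves to the collator would not, since the collator has no way of knowing how a name is pronounced.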

The Wikipedia article points out that because collation is so difficult in non-alphabetic scripts, the answer nowadays is to make it very easy to look up information by entering characters, rather than by looking through a list.

I suggest that you talk to truly knowledgeable end users of your application to see how they would best like it to behave. The problem of ordering Chinese characters is not unique to your application.

Also, if you don't want to implement the collation in your system, another solution would be to create an Ajax service that stores the names in MySQL or another database, then looks up the data with an ORDER BY clause.
