Character encoding independent character swap
Question
I like to use this piece of code when I want to reverse a string. [When I am not using std::string or other inbuilt functions in C.] As a beginner, when I initially thought of this, I had the ASCII table in mind. I think this can work well with Unicode too. I assumed that since the difference in values (ASCII etc.) is fixed, it works.
Is there any character encoding in which this code may not work?
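#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[11];
    int len, i;

    strcpy(a, "Particl");
    printf("%s\n", a);
    len = strlen(a);
    for (i = 0; i < (len / 2); i++)
    {
        /* Swap a[i] and a[len-1-i] without a temporary,
           using addition and subtraction. */
        a[i] += a[len-1-i];
        a[len-1-i] = a[i] - a[len-1-i];
        a[i] -= a[len-1-i];
    }
    printf("%s\n", a);
    return 0;
}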
Update:
This link is informative in association with this question.
Recommended answer
This will not work with any encoding in which some (not necessarily all) codepoints require more than one char unit to represent, because you are reversing byte-by-byte instead of codepoint-by-codepoint. For the usual 8-bit char this includes all encodings that can represent all of Unicode.
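To illustrate the byte-by-byte versus codepoint-by-codepoint distinction, here is a minimal sketch of a codepoint-aware reversal for UTF-8. It is not part of the original answer: the names reverse_bytes and utf8_reverse are hypothetical, and the code assumes the input is valid, NUL-terminated UTF-8. It first reverses the bytes inside each multi-byte sequence and then reverses the whole buffer, which puts every sequence back into its correct byte order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reverse the bytes s[lo..hi] (inclusive). */
static void reverse_bytes(char *s, size_t lo, size_t hi)
{
    while (lo < hi) {
        char t = s[lo];
        s[lo++] = s[hi];
        s[hi--] = t;
    }
}

/* Hypothetical sketch: reverse a NUL-terminated string codepoint by
   codepoint. Assumes the input is valid UTF-8. */
void utf8_reverse(char *s)
{
    size_t len = strlen(s);
    size_t i = 0;

    if (len < 2)
        return;

    /* Step 1: reverse the bytes of each encoded codepoint in place.
       Continuation bytes in UTF-8 have the bit pattern 10xxxxxx. */
    while (i < len) {
        size_t j = i + 1;
        while (j < len && ((unsigned char)s[j] & 0xC0) == 0x80)
            j++;
        reverse_bytes(s, i, j - 1);
        i = j;
    }

    /* Step 2: reverse the whole buffer. Each multi-byte sequence is
       reversed a second time, restoring its internal byte order. */
    reverse_bytes(s, 0, len - 1);
}
```

For pure ASCII input this behaves like the loop in the question; the difference only shows up once a codepoint takes more than one byte.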
For example: in UTF-16BE, the string "hello" maps to the byte sequence 00 68 00 65 00 6c 00 6c 00 6f. Your algorithm applied to this byte sequence will produce the sequence 6f 00 6c 00 6c 00 65 00 68 00, which is the UTF-16BE encoding of the string "漀氀氀攀栀".
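The UTF-16BE example can be reproduced with a short sketch. The function names reverse_bytes_n and reverse_units16 are hypothetical and not from the original answer. Both take an explicit length, on the assumption that the caller knows it, because UTF-16 text contains 0x00 bytes that would make strlen stop early. The second function reverses in 2-byte code units instead of bytes; it assumes an even length and does not handle surrogate pairs.

```c
#include <stddef.h>

/* Hypothetical sketch: byte-wise reversal, like the loop in the
   question, but with an explicit length instead of strlen. */
void reverse_bytes_n(unsigned char *buf, size_t n)
{
    size_t i;
    for (i = 0; i < n / 2; i++) {
        unsigned char t = buf[i];
        buf[i] = buf[n - 1 - i];
        buf[n - 1 - i] = t;
    }
}

/* Hypothetical sketch: reversal in 2-byte code units. Assumes n is
   even. Surrogate pairs are NOT handled, so this is only correct for
   text within the Basic Multilingual Plane. */
void reverse_units16(unsigned char *buf, size_t n)
{
    size_t units = n / 2;
    size_t i;
    for (i = 0; i < units / 2; i++) {
        size_t a = 2 * i;
        size_t b = 2 * (units - 1 - i);
        unsigned char first = buf[a];
        unsigned char second = buf[a + 1];
        buf[a] = buf[b];
        buf[a + 1] = buf[b + 1];
        buf[b] = first;
        buf[b + 1] = second;
    }
}
```

Applied to 00 68 00 65 00 6c 00 6c 00 6f, reverse_bytes_n yields 6f 00 6c 00 6c 00 65 00 68 00 as described above, while reverse_units16 yields 00 6f 00 6c 00 6c 00 65 00 68, which is "olleh" in UTF-16BE.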
It gets worse -- doing a codepoint-by-codepoint reversal of a Unicode string still won't produce the correct results in all cases, because Unicode has many codepoints that act on their surroundings rather than standing alone as characters. As a trivial example, codepoint-reversing the string "Spın̈al Tap", which contains U+0308 COMBINING DIAERESIS, will produce "paT länıpS" -- see how the diaeresis has migrated from the N to the A? The consequences of codepoint-by-codepoint reversal on a string containing bidirectional overrides or conjoining jamo would be even more dire.
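The migrating diaeresis can be shown on an array of already-decoded codepoints. The sketch below is an illustration only, with hypothetical names (is_combining, reverse_range, reverse_codepoints, reverse_keep_marks). It treats only the range U+0300 to U+036F (Combining Diacritical Marks) as combining, which is a deliberate simplification: it is not full grapheme cluster segmentation and does nothing for bidirectional overrides or conjoining jamo.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplifying assumption: only U+0300..U+036F count as combining. */
static int is_combining(uint32_t c)
{
    return c >= 0x0300 && c <= 0x036F;
}

/* Reverse cp[lo..hi] (inclusive). */
static void reverse_range(uint32_t *cp, size_t lo, size_t hi)
{
    while (lo < hi) {
        uint32_t t = cp[lo];
        cp[lo++] = cp[hi];
        cp[hi--] = t;
    }
}

/* Naive codepoint-by-codepoint reversal: combining marks end up
   after a different base character. */
void reverse_codepoints(uint32_t *cp, size_t n)
{
    if (n > 1)
        reverse_range(cp, 0, n - 1);
}

/* Reversal that keeps each base character together with the
   combining marks that follow it (under the assumption above). */
void reverse_keep_marks(uint32_t *cp, size_t n)
{
    size_t i = 0;

    if (n < 2)
        return;

    /* Reverse each base-plus-marks group in place... */
    while (i < n) {
        size_t j = i + 1;
        while (j < n && is_combining(cp[j]))
            j++;
        reverse_range(cp, i, j - 1);
        i = j;
    }
    /* ...then reverse everything, which restores the order inside
       each group while reversing the order of the groups. */
    reverse_range(cp, 0, n - 1);
}
```

With the codepoints of "Spın̈al Tap" as input, reverse_codepoints leaves U+0308 directly after the "a", as in the example above, whereas reverse_keep_marks leaves it after the "n".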