Is this case a weird UTF-8 encoding conversion?
Problem description
I am working with a remote application that seems to do some magic with the encoding. The application renders clear responses (which I'll refer to as True and False), depending on user input. I know two valid values that render 'True'; all others should render 'False'.
What I found interesting (by accident) is that submitting a corrupted value also leads to 'True'.
Example input:
USER10 //gives True
USER11 //gives True
USER12 //gives False
USER.. //gives False
OTHERTHING //gives False
So basically only the first two values render a True response.
Surprisingly, I noticed that USER˱0 (hex-wise \x55\x53\x45\x52\xC0\xB1\x30) is accepted as True. I checked other hex bytes, with no such success. This leads me to the conclusion that \xC0\xB1 is somehow being translated into 0x31 (= '1').
My question is: how could this happen? Is the application performing some weird conversion from UTF-16 (or something else) to UTF-8?
I'd appreciate any comments, ideas, or hints.
Recommended answer
C0 is an invalid start byte for a two-byte UTF-8 sequence, but if a bad UTF-8 decoder accepts it, C0 B1 would be interpreted as ASCII 31h (the character 1). A two-byte sequence has the bit pattern 110xxxxx 10yyyyyy; C0 contributes the payload bits 00000 and B1 contributes 110001, which together give 0x31.
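To make the bit arithmetic concrete, here is a minimal Python sketch of a hypothetical lenient decoder. The function names are made up for illustration, and nothing is known about how the remote application actually decodes its input; the sketch only shows that applying the two-byte formula without an overlong check turns C0 B1 into the character 1.

```python
def lenient_decode_two_byte(lead: int, cont: int) -> str:
    """Decode a two-byte UTF-8 sequence WITHOUT the overlong check.

    Hypothetical helper: mimics what a sloppy decoder might do.
    """
    # A two-byte sequence has the pattern 110xxxxx 10yyyyyy.
    if (lead & 0xE0) != 0xC0 or (cont & 0xC0) != 0x80:
        raise ValueError("not a two-byte UTF-8 pattern")
    # Keep the 5 payload bits of the lead byte and the 6 of the continuation.
    code_point = ((lead & 0x1F) << 6) | (cont & 0x3F)
    return chr(code_point)


def lenient_decode(data: bytes) -> str:
    """Decode ASCII plus two-byte sequences, accepting overlong forms."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte: one byte, one character.
            out.append(chr(b))
            i += 1
        elif (b & 0xE0) == 0xC0 and i + 1 < len(data):
            # Two-byte lead; a strict decoder would reject 0xC0 and 0xC1 here.
            out.append(lenient_decode_two_byte(b, data[i + 1]))
            i += 2
        else:
            raise ValueError(f"unsupported byte 0x{b:02X} at offset {i}")
    return "".join(out)


payload = b"\x55\x53\x45\x52\xC0\xB1\x30"
print(lenient_decode(payload))  # USER10
```

If the application compares the decoded string against its list of valid values, this payload would match USER10, which is consistent with the observed True response.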
Quoting Wikipedia:
...(C0 and C1) could only be used for an invalid "overlong encoding" of ASCII characters (i.e., trying to encode a 7-bit ASCII value between 0 and 127 using two bytes instead of one)...
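For contrast, a conforming decoder refuses these bytes. The short check below uses Python's built-in UTF-8 codec as one example of a strict decoder; it says nothing about the remote application itself, whose decoder is unknown.

```python
payload = b"\x55\x53\x45\x52\xC0\xB1\x30"

# Strict mode (the default) raises on the 0xC0 lead byte.
try:
    payload.decode("utf-8")
    rejected = False
except UnicodeDecodeError as exc:
    rejected = True
    print("strict decoder refused the input:", exc.reason)

# With errors="replace", the bad bytes become U+FFFD instead of "1".
replaced = payload.decode("utf-8", errors="replace")
print(replaced)
```

Because the strict decoder never produces the character 1 from C0 B1, the behavior seen in the question points to a decoder that skips the overlong-encoding check.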