Why am I able to use a character that's not part of a charset (windows-1252)?


Question

I'm looking for a little help in understanding how charsets work. This question is a continuation of "Anything wrong with using windows-1252 instead of UTF-8".

I have a test ColdFusion site using...

<CFHEADER NAME="Content-Type" value="text/html; charset=windows-1252">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />

and a test Oracle DB using...

NLS_CHARACTERSET: WE8MSWIN1252
NLS_NCHAR_CHARACTERSET: AL16UTF16

According to the windows-1252 charset, there is no square root symbol (Alt+251): √. Yet I can type it into a field on a webpage form, save it to the DB, query it, and show it on the screen again just fine. In the DB it's stored as: &#8730;. How can I enter, store, query, and display it if it's not even part of the charset? According to the charset, decimal 251 is this: Hex: FB | û | 00FB | LATIN SMALL LETTER U WITH CIRCUMFLEX

Answer

You're not really using characters outside of the page and database's charset.

Because the page is windows-1252 encoded, if you enter Alt+251 into a form field and then post the data, the browser says:

"Hey, this char is not a part of windows-1252, and I need to send back only data
 that is in windows-1252, so I will do the best I can and send back the
 HTML character code of the char: &#8730;. Oh well, I wish I could send back
 1 character; since I cannot, I will send back 7."

Notice that &#8730; is a sequence of 7 separate characters, each of which is in the windows-1252 charset.

Had the page been encoded with a charset that includes the square root symbol, such as the multibyte UTF-8, the browser would have sent back the symbol itself: several bytes, but still considered 1 character.
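The browser's fallback can be imitated in Python as a rough sketch. The helper name `encode_like_browser` is made up for illustration, and the codec error handler `xmlcharrefreplace` is used here as a stand-in for the browser's behaviour, not as a description of any browser's actual code:

```python
def encode_like_browser(text: str, charset: str = "cp1252") -> bytes:
    """Encode form text; characters the charset lacks become numeric references.

    Hypothetical helper: it approximates what the browser does on form
    submission, using Python's built-in 'xmlcharrefreplace' error handler.
    """
    return text.encode(charset, errors="xmlcharrefreplace")


submitted = encode_like_browser("√")
print(submitted)            # b'&#8730;'
print(len(submitted))       # 7

# A strict encode shows that windows-1252 really has no square root symbol.
try:
    "√".encode("cp1252")
except UnicodeEncodeError as exc:
    print("not in windows-1252:", exc.reason)

# In UTF-8 the same symbol is one character, encoded as three bytes.
print("√".encode("utf-8"))  # b'\xe2\x88\x9a'
```

Characters that do exist in windows-1252, such as û, pass through as a single byte, so only out-of-charset input is expanded into a reference.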

So how can you query it?

 select * from tab where field like '%&#8730;%'

What you have stored is the HTML numeric character reference for the square root symbol, not the symbol itself: https://www.google.com/#q=html+character+codes
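To see why the value still displays as √ when the page is rendered, here is a small Python sketch that decodes the stored reference. The variable name `stored` is illustrative, and `html.unescape` only approximates what a browser's HTML parser does with the reference:

```python
import html

stored = "&#8730;"                 # the text as it sits in the database column
char = html.unescape(stored)       # resolve the numeric character reference

print(char)                        # √
print(ord(char), hex(ord(char)))   # 8730 0x221a

# Every character of the stored text is plain ASCII, so it fits in windows-1252.
print(all(ord(c) < 128 for c in stored))  # True
```

The decimal number in the reference, 8730, is the Unicode code point of the symbol (hex 221A), which is why the database never needs to hold the symbol itself.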

Update:

Here is a very good article explaining what is happening: http://htmlpurifier.org/docs/enduser-utf8.html

 "...once you start adding characters outside of your encoding... 
 [the browser might] replace the character with a character entity reference...."

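Also, when you enter Alt+251 on a Windows machine, it inserts the square root symbol, which in Unicode is U+221A. Pressing Alt+251 acts like a keyboard macro for that character. The number 251 here does not refer to windows-1252: Alt codes typed without a leading zero are looked up in the OEM code page (for example CP437, where 251 is √), whereas Alt+0251 uses the Windows code page and gives û.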
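The mismatch between the number 251 and the windows-1252 table can be checked with a short Python sketch. It assumes the machine's OEM code page is CP437 (the usual US default, which the original post does not state); on other OEM code pages Alt+251 may produce a different character:

```python
ALT_CODE = 251

# Assumption: the OEM code page is CP437; the page itself uses windows-1252.
oem_char = bytes([ALT_CODE]).decode("cp437")
ansi_char = bytes([ALT_CODE]).decode("cp1252")

print(oem_char, f"U+{ord(oem_char):04X}")    # √ U+221A
print(ansi_char, f"U+{ord(ansi_char):04X}")  # û U+00FB
```

So the same byte value, 251, names √ in one code page and û in the other, which is how the symbol arrives in the form field even though windows-1252 has no slot for it.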
