How do I ensure that the text encoded in a form is utf8


Question

I have an HTML text box in which users may enter text. I would like to ensure that all text entered in the box is either encoded in UTF-8 or converted to UTF-8 when the user finishes typing. Furthermore, I don't quite understand how the various UTF encodings are chosen when text is entered into a text box.

Generally I'm curious about the following:

  • How does a browser determine which encodings to use when a user is typing into a text box?
  • How can JavaScript determine the encoding of a string value in an HTML text box?
  • Can I force the browser to only use UTF-8 encoding?
  • How can I convert arbitrary encodings to UTF-8? I assume there is a JavaScript library for this.

**Edit**

Removed some questions unnecessary to my goals.

This tutorial helped me understand JavaScript character codes better, but it is buggy and does not actually translate character codes to UTF-8 in all cases: http://www.webtoolkit.info/javascript-base64.html

Solution

  • How does a browser determine which encodings to use when a user is typing into a text box?

By default, it uses the encoding that the page was decoded as. According to the spec, you should be able to override this with the accept-charset attribute of the <form> element, but IE is buggy, so you shouldn't rely on this. (I've seen several different sources describe several different bugs, and I don't have all the relevant versions of IE in front of me to test, so I'll leave it at that.)

  • How can JavaScript determine the encoding of a string value in an HTML text box?

All strings in JavaScript are encoded in UTF-16. The browser will map everything into UTF-16 for JavaScript, and from UTF-16 into whatever the page is encoded in.
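
Since the string JavaScript sees is already UTF-16, producing UTF-8 is an explicit conversion step rather than something to detect. The following is a minimal sketch of that step, assuming an environment that provides the standard TextEncoder (current browsers and Node.js do; the browsers this answer was written for did not). The function name toUtf8Bytes is made up for illustration.

```javascript
// Convert a JavaScript (UTF-16) string into an array of UTF-8 byte values.
// Assumes the standard TextEncoder global is available.
function toUtf8Bytes(str) {
  // TextEncoder always encodes to UTF-8 and returns a Uint8Array.
  const encoded = new TextEncoder().encode(str);
  // Copy into a plain array so the bytes are easy to inspect or compare.
  return Array.from(encoded);
}

// U+00E9 (e with acute accent) is one UTF-16 code unit but two UTF-8 bytes.
console.log(toUtf8Bytes("\u00e9")); // the bytes 0xC3, 0xA9 (195, 169)
```

In a page, the argument would typically be the value property of the text box, which is a UTF-16 string regardless of the encoding the page was served in.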

UTF-16 is an encoding that grew out of UCS-2. Originally, it was thought that 65,536 code points would be enough for all of Unicode, and so a 16-bit character encoding would be sufficient. It turned out that this is not the case, and so the character set was expanded to 1,114,112 code points. In order to maintain backwards compatibility, a few unused ranges of the 16-bit character set were set aside for surrogate pairs, in which two 16-bit code units are used to encode a single character. Read up on UTF-16 and UCS-2 on Wikipedia for details.
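
The surrogate pair arithmetic can be worked through in a few lines. This is a sketch of the standard UTF-16 calculation, not code from the original answer; the function name toSurrogatePair is hypothetical.

```javascript
// Compute the UTF-16 surrogate pair for a code point outside the
// Basic Multilingual Plane (U+10000 through U+10FFFF).
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("code point does not need a surrogate pair");
  }
  // Subtracting 0x10000 leaves a 20-bit value.
  const offset = codePoint - 0x10000;
  // The top 10 bits select the high (lead) surrogate in D800..DBFF.
  const high = 0xD800 + (offset >> 10);
  // The bottom 10 bits select the low (trail) surrogate in DC00..DFFF.
  const low = 0xDC00 + (offset & 0x3FF);
  return [high, low];
}

// U+1D11E (MUSICAL SYMBOL G CLEF) is encoded as the pair D834 DD1E.
const [high, low] = toSurrogatePair(0x1D11E);
console.log(high.toString(16), low.toString(16)); // d834 dd1e
```

The two reserved ranges hold 1,024 values each, which gives 1,024 × 1,024 = 1,048,576 extra code points on top of the original 65,536.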

The upshot is that when you have a string str in JavaScript, str.length does not give you the number of characters; it gives you the number of code units, and a character that is not within the Basic Multilingual Plane takes two code units. For instance, "abc".length gives you 3, but "
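
The gap between code units and characters can be checked directly. The sketch below assumes an ES2015 or later engine, where iterating a string yields whole code points; the helper name countCodePoints and the choice of U+1D11E as the sample character are illustrative, not taken from the original answer.

```javascript
// Count code points rather than UTF-16 code units.
function countCodePoints(str) {
  // The string iterator walks code points, so a surrogate pair counts once.
  return Array.from(str).length;
}

// U+1D11E lies outside the Basic Multilingual Plane.
const clef = "\u{1D11E}";

console.log("abc".length, countCodePoints("abc")); // 3 3
console.log(clef.length, countCodePoints(clef));   // 2 1
```

Note that a code point is still not always what a user perceives as one character, since combining marks are separate code points.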
