Check if a byte sequence is a valid UTF-8 sequence in JavaScript
Question
Is there a simple way to check whether a string is a valid UTF-8 sequence in JavaScript?
I really do not want to end up with a regexp like this:
P.S.: I am receiving data from an external API, and sometimes (very rarely, but it happens) it returns data with invalid UTF-8 sequences. Trying to put them into Postgres results in an appropriate error.
Recommended answer
UTF-8 is in fact a simple encoding, but what you are asking for can't be done with a one-liner. You have to:
- Override the Content-Type of the response so that your script receives a byte array, which prevents the browser or library from interpreting the response itself.
- Loop over the bytes to build characters. Note that UTF-8 is a variable-length encoding, which is why some sequences are invalid.
- If an invalid octet is found, skip it.
- If needed, deserialize the JSON/XML/whatever string into a JavaScript object, handling failures where necessary.
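The steps above can be sketched roughly as follows, assuming an environment that provides `fetch` and `TextDecoder` (current browsers and recent Node.js versions do). The helper names `decodeUtf8Strict`, `decodeUtf8Lenient`, and `fetchJsonChecked` are hypothetical and not part of the original answer. Reading the body with `arrayBuffer()` is used here as one way to get the raw bytes without the runtime decoding them first. Note that the lenient variant replaces malformed sequences with U+FFFD rather than skipping them.

```javascript
// Strict decode: returns the decoded string, or null if the bytes are not valid UTF-8.
function decodeUtf8Strict(bytes) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed input.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    return null;
  }
}

// Lenient decode: each malformed sequence is replaced with U+FFFD.
function decodeUtf8Lenient(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// Hypothetical helper: fetch raw bytes, validate them, then parse JSON.
async function fetchJsonChecked(url) {
  const response = await fetch(url);
  // arrayBuffer() exposes the undecoded response body.
  const bytes = new Uint8Array(await response.arrayBuffer());
  const text = decodeUtf8Strict(bytes);
  if (text === null) {
    return { ok: false, reason: "invalid UTF-8" };
  }
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (e) {
    return { ok: false, reason: "invalid JSON" };
  }
}
```

Whether to reject the payload (strict) or store a repaired string (lenient) depends on how much the affected records matter; either way the database no longer sees malformed bytes.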
Deciding whether a given byte array is a valid UTF-8 sequence is quite a straightforward task (just a bunch of if statements and bit shifts), but again, it's not a one-line thing.
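As an illustration of the "if statements and bit shifts" approach, here is a minimal sketch of a hand-written validator. The function name `isValidUtf8` is hypothetical. It assumes the input is a `Uint8Array` or an array of integers in the range 0 to 255, and it rejects overlong encodings, UTF-16 surrogates, and code points above U+10FFFF, which well-formed UTF-8 does not allow.

```javascript
// Returns true if `bytes` is a well-formed UTF-8 sequence.
function isValidUtf8(bytes) {
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    let needed;    // number of continuation bytes expected after the lead byte
    let codePoint; // code point accumulated so far
    let min;       // smallest code point allowed for this length (rejects overlong forms)

    if (lead <= 0x7f) {
      i += 1; // single-byte (ASCII) character
      continue;
    } else if ((lead & 0xe0) === 0xc0) {
      needed = 1; codePoint = lead & 0x1f; min = 0x80;
    } else if ((lead & 0xf0) === 0xe0) {
      needed = 2; codePoint = lead & 0x0f; min = 0x800;
    } else if ((lead & 0xf8) === 0xf0) {
      needed = 3; codePoint = lead & 0x07; min = 0x10000;
    } else {
      return false; // stray continuation byte or invalid lead byte
    }

    if (i + needed >= bytes.length) return false; // sequence is truncated

    for (let k = 1; k <= needed; k++) {
      const b = bytes[i + k];
      if ((b & 0xc0) !== 0x80) return false; // continuation bytes look like 10xxxxxx
      codePoint = (codePoint << 6) | (b & 0x3f);
    }

    if (codePoint < min) return false;      // overlong encoding
    if (codePoint > 0x10ffff) return false; // beyond the Unicode range
    if (codePoint >= 0xd800 && codePoint <= 0xdfff) return false; // UTF-16 surrogate

    i += needed + 1;
  }
  return true;
}
```

For example, `isValidUtf8([0xc3, 0xa9])` (the encoding of "é") returns `true`, while `isValidUtf8([0xc3])` returns `false` because the two-byte sequence is cut short.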