Check if a byte sequence is a valid UTF-8 sequence in JavaScript

Question

Is there a simple way to check whether a string is a valid UTF-8 sequence in JavaScript?

I would really rather not end up with a regexp like the one in:

Regular expression for detecting invalid UTF-8 strings

P.S.: I am receiving data from an external API, and sometimes (very rarely, but it happens) it returns data with invalid UTF-8 sequences. Trying to insert that data into Postgres results in an error, as it should.

Recommended answer

UTF-8 is in fact a simple encoding, but what you are asking for can't be done with a one-liner. You have to:

  1. Override the Content-Type of the response so that your script receives a byte array and the browser/library does not interpret the response itself
  2. Loop over the bytes to build characters. Note that UTF-8 is a variable-length encoding, which is why some sequences are invalid.
  3. If an invalid octet is found, skip it
  4. If needed, deserialize the JSON/XML/whatever string into a JavaScript object, handling failures as they occur
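
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's own code: the names `decodeUtf8Lenient` and `parseJsonOrNull` are hypothetical, step 1 is shown as a comment because it depends on how the response is fetched, and "skip it" is interpreted here as dropping one offending byte at a time.

```javascript
// Step 1 (assumption: a fetch-capable runtime; `url` is a placeholder).
// Reading the body as an ArrayBuffer keeps the raw bytes untouched:
//   const bytes = new Uint8Array(await (await fetch(url)).arrayBuffer());

// Steps 2 and 3: walk the bytes, keep well-formed sequences, skip bad octets.
function decodeUtf8Lenient(bytes) {
  let out = "";
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i];
    let need; // number of continuation bytes expected after the lead byte
    let cp;   // code point being assembled
    let min;  // smallest code point allowed for this length (rejects overlongs)
    if (b0 < 0x80) { need = 0; cp = b0; min = 0; }
    else if (b0 >= 0xc2 && b0 <= 0xdf) { need = 1; cp = b0 & 0x1f; min = 0x80; }
    else if (b0 >= 0xe0 && b0 <= 0xef) { need = 2; cp = b0 & 0x0f; min = 0x800; }
    else if (b0 >= 0xf0 && b0 <= 0xf4) { need = 3; cp = b0 & 0x07; min = 0x10000; }
    else { i += 1; continue; } // stray continuation byte or illegal lead byte

    let ok = i + need < bytes.length; // false if the sequence is truncated
    for (let k = 1; ok && k <= need; k++) {
      const b = bytes[i + k];
      if ((b & 0xc0) !== 0x80) ok = false;  // not of the form 10xxxxxx
      else cp = (cp << 6) | (b & 0x3f);
    }
    const isSurrogate = cp >= 0xd800 && cp <= 0xdfff;
    if (!ok || cp < min || isSurrogate || cp > 0x10ffff) {
      i += 1; // skip only the offending lead byte and resynchronize
      continue;
    }
    out += String.fromCodePoint(cp);
    i += need + 1;
  }
  return out;
}

// Step 4: deserialize, treating a parse failure as "no usable data".
function parseJsonOrNull(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return null;
  }
}
```

Skipping a single byte after a failure, rather than the whole suspected sequence, is a design choice made here so that a valid character directly following a bad octet is not lost.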

Deciding whether a given array is a valid UTF-8 sequence is quite a straightforward task (just a bunch of if statements and bit shifts), but again, it's not a one-line thing.
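
As a minimal sketch of what that "bunch of if statements and bit shifts" might look like, the function below checks a `Uint8Array` (or a plain array of numbers 0..255) strictly. The name `isValidUtf8` is hypothetical, and the rules applied (no overlong forms, no surrogate code points, nothing above U+10FFFF) are an assumption about how strict the check should be; they are the rules of the current UTF-8 definition.

```javascript
// Returns true only if every byte belongs to a well-formed UTF-8 sequence.
function isValidUtf8(bytes) {
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i];
    let extra;     // continuation bytes expected after the lead byte
    let codePoint; // value assembled from the payload bits
    let minimum;   // smallest value that legitimately needs this length

    if (b0 >> 7 === 0b0) {              // 0xxxxxxx
      extra = 0; codePoint = b0; minimum = 0;
    } else if (b0 >> 5 === 0b110) {     // 110xxxxx
      extra = 1; codePoint = b0 & 0x1f; minimum = 0x80;
    } else if (b0 >> 4 === 0b1110) {    // 1110xxxx
      extra = 2; codePoint = b0 & 0x0f; minimum = 0x800;
    } else if (b0 >> 3 === 0b11110) {   // 11110xxx
      extra = 3; codePoint = b0 & 0x07; minimum = 0x10000;
    } else {
      return false;                     // continuation byte in lead position, or 0xF8..0xFF
    }

    if (i + extra >= bytes.length) return false; // sequence is cut off

    for (let k = 1; k <= extra; k++) {
      const b = bytes[i + k];
      if (b >> 6 !== 0b10) return false;         // must be 10xxxxxx
      codePoint = (codePoint << 6) | (b & 0x3f);
    }

    if (codePoint < minimum) return false;                          // overlong encoding
    if (codePoint >= 0xd800 && codePoint <= 0xdfff) return false;   // UTF-16 surrogate
    if (codePoint > 0x10ffff) return false;                         // beyond Unicode range

    i += extra + 1;
  }
  return true;
}
```

Runtimes that ship the standard `TextDecoder` can reach the same verdict by decoding with `{ fatal: true }` and catching the thrown `TypeError`; the hand-written version is shown because it matches the approach the answer describes.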
