Converting JSON response into correct encoding in JavaScript

Question

I am trying to use data from an API. I am using request for the API access, but have also tried axios.

const request = require('request')

request('https://remoteok.io/api', function (error, response, body) {
  if (error) return console.error(error) // network failure: nothing to parse
  const data = JSON.parse(body)
  console.log(data)
})

When I access remoteok.io/api in a browser, I can see sequences like \u00e2\u0080\u0099. This sequence should be an apostrophe (a right single quotation mark, ’), but when I log the data to the console in JavaScript, or use Express to render it with res.json(body), I get garbled characters like â€ instead.

How can I fix this encoding issue? Shouldn't JSON always just be plain UTF-8?

UPDATE: Here is a simple Glitch project that shows the behavior.

Answer

The problem is in the source data: the JSON sequence "\u00e2\u0080\u0099" does not represent a right closing quotation mark. It consists of three Unicode code points: the first represents "â", while the other two (U+0080 and U+0099) are control characters.

You can verify this in a dev console, or by running the snippet below:

// Doubled backslashes so that JSON.parse, not the JS string literal, decodes the \u escapes
console.log(JSON.parse('"\\u00e2\\u0080\\u0099"'));
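To see exactly what the parser produces, here is a minimal Node.js sketch that lists the code points of the parsed string. The variable names are illustrative only.

```javascript
// Parse the JSON text as the API sends it (backslashes doubled for the JS literal)
const parsed = JSON.parse('"\\u00e2\\u0080\\u0099"');

// Format each code point as U+XXXX
const codePoints = [...parsed].map(
  (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
);

console.log(parsed.length);          // 3
console.log(codePoints.join(' '));   // U+00E2 U+0080 U+0099
```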

Apparently the author of that JSON mixed up two things:

  • JSON is encoded in UTF
  • A \u notation represents a Unicode Code Point

The first means that the file or stream that carries the JSON text as bytes should be UTF-encoded (UTF-8 preferred). The second has nothing to do with that: JSON syntax lets you specify 16-bit Unicode code points with the \u syntax. It is not meant for spelling out a UTF-8 byte sequence as a sequence¹ of \u escapes. When writing JSON text, you should not be concerned with the lower-level UTF-8 byte encoding.
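A minimal Node.js sketch of those two layers, assuming a runtime where TextEncoder and TextDecoder are globals (Node.js 11 or later, or a browser). JSON text is produced and parsed as characters, and UTF-8 only appears when that text is turned into bytes. The object and its title field are made up for illustration.

```javascript
// Layer 1: JSON text. JSON.stringify leaves non-ASCII characters such as ’ as-is.
const value = { title: 'It\u2019s remote' };
const jsonText = JSON.stringify(value);

// Layer 2: bytes on the wire. The JSON text as a whole is encoded as UTF-8.
const bytes = new TextEncoder().encode(jsonText);

// Locate the three UTF-8 bytes that encode the single character ’
const start = new TextEncoder().encode('{"title":"It').length;
const quoteBytes = [...bytes.slice(start, start + 3)].map((b) => b.toString(16));

// The receiver reverses both layers: UTF-8 bytes -> JSON text -> value
const roundTripped = JSON.parse(new TextDecoder('utf-8').decode(bytes));

console.log(jsonText);                            // {"title":"It’s remote"}
console.log(quoteBytes.join(' '));                // e2 80 99
console.log(roundTripped.title === value.title);  // true
```

Note that no \u escape appears anywhere in this round trip: the UTF-8 bytes belong to the transport layer, not to the JSON syntax.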

¹ I should at least mention surrogate pairs, but they are unrelated to UTF-8; they concern how Unicode code points beyond the 16-bit range are encoded in JSON.
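To illustrate the footnote, a minimal sketch using U+1F600 (😀), chosen only as an example of a code point beyond the 16-bit range. JSON spells it as two \u escapes forming a UTF-16 surrogate pair, and the pair can be computed from the code point with the standard UTF-16 formula.

```javascript
// U+1F600 is above U+FFFF, so a single 4-hex-digit \u escape cannot express it.
// JSON writes it as a UTF-16 surrogate pair instead.
const fromPair = JSON.parse('"\\ud83d\\ude00"');

console.log(fromPair.codePointAt(0).toString(16)); // 1f600
console.log(fromPair.length);                      // 2 (UTF-16 code units)
console.log([...fromPair].length);                 // 1 (code point)

// Deriving the pair from the code point (standard UTF-16 formula)
const cp = 0x1f600;
const offset = cp - 0x10000;              // 20-bit value 0x0F600
const high = 0xd800 + (offset >> 10);     // top 10 bits    -> 0xD83D
const low = 0xdc00 + (offset & 0x3ff);    // bottom 10 bits -> 0xDE00

console.log(high.toString(16), low.toString(16));         // d83d de00
console.log(String.fromCharCode(high, low) === fromPair); // true
```

The surrogate arithmetic involves UTF-16 code units only; UTF-8 plays no part in it.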

So although the right closing quotation mark is encoded in UTF-8 as the bytes E2 80 99, it must not be written with one \u escape per byte.
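The bytes E2 80 99 come from the standard UTF-8 bit layout for code points in the range U+0800 to U+FFFF (1110xxxx 10xxxxxx 10xxxxxx). A minimal sketch, with a hypothetical helper utf8ThreeBytes that handles only that range, derives them by hand and then shows what goes wrong when each byte is written as its own \u escape:

```javascript
// Hypothetical helper: UTF-8 encoding for code points U+0800..U+FFFF only
function utf8ThreeBytes(cp) {
  return [
    0xe0 | (cp >> 12),          // 1110xxxx: top 4 bits
    0x80 | ((cp >> 6) & 0x3f),  // 10xxxxxx: middle 6 bits
    0x80 | (cp & 0x3f),         // 10xxxxxx: low 6 bits
  ];
}

const manual = utf8ThreeBytes(0x2019);
console.log(manual.map((b) => b.toString(16)).join(' ')); // e2 80 99

// Cross-check against the built-in UTF-8 encoder
const builtin = [...new TextEncoder().encode('\u2019')];
console.log(manual.every((b, i) => b === builtin[i]));    // true

// The bug: writing each byte as its own \u escape yields three unrelated code points
const wrongJson = '"' + manual.map((b) => '\\u' + b.toString(16).padStart(4, '0')).join('') + '"';
const wrong = JSON.parse(wrongJson);
console.log(wrongJson);                  // "\u00e2\u0080\u0099"
console.log(wrong === '\u2019');         // false
```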

The right closing quotation mark has Unicode code point U+2019. So the source JSON should contain either the escape \u2019 or simply the character ’ itself (which will of course be a UTF-8 byte sequence in the stream, but that is a level below JSON).

Compare these two possibilities:

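// The literal character ’ inside the JSON string
console.log(JSON.parse('"’"'));
// The \u2019 escape (doubled backslash so that JSON.parse, not the JS literal, decodes it)
console.log(JSON.parse('"\\u2019"'));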
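If a producer prefers escapes over literal characters, one correct way to emit ASCII-only JSON is to escape each non-ASCII UTF-16 code unit after JSON.stringify. The helper toAsciiJson below is a hypothetical sketch, not part of any library:

```javascript
// Hypothetical helper: serialize with JSON.stringify, then replace every
// non-ASCII UTF-16 code unit with a \uXXXX escape naming that code unit.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

const ascii = toAsciiJson('It\u2019s');
console.log(ascii);                              // "It\u2019s"
console.log(JSON.parse(ascii) === 'It\u2019s');  // true

// A character beyond U+FFFF becomes a surrogate pair of escapes
console.log(toAsciiJson('\u{1f600}'));           // "\ud83d\ude00"
```

Because the regular expression has no u flag, it matches UTF-16 code units, so characters beyond U+FFFF come out as surrogate pairs, exactly as the footnote describes. Non-ASCII characters can only occur inside JSON strings, so rewriting them in the serialized text is safe.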

I would advise you to contact the service provider of this particular API. They have a bug in their JSON producing service.

Whatever you do, do not try to fix this in your client by recognising such malformed sequences and replacing them as if those characters represented UTF-8 bytes. Such a fix would be hard to maintain and could even produce false positives.