Json converts & in a String to \u0026
Question
I am trying to extract text from a PDF and write it into a JSON file. While extracting Unicode characters, Json converts every & to \u0026. For example, my actual String is &#1588; (which represents ش). It prints correctly to a .txt file, to the console, etc. But when I try to print this string to a JSON file, it shows \u0026#1588;.
I am using Java, and the code is:
Gson gson = new Gson();
String json = gson.toJson(pdfDoc);
Note: pdfDoc is an object that contains all the details (position, color, font, etc.) of the characters inside the input PDF document. I am using gson-2.2.1.jar.
Recommended answer
The \u0026 form is actually a valid (but not required) encoding. Any character may be encoded using the Unicode escape in JSON, and any valid JSON parsing library must be able to interpret those escapes.
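To illustrate why the escaped form is harmless, here is a minimal sketch that decodes \uXXXX escapes by hand. The class UnicodeEscapeDemo and the helper decodeUnicodeEscapes are hypothetical names made up for this illustration. It is not a JSON parser and it ignores cases such as an escaped backslash followed by u, but it shows that \u0026#1588; and &#1588; denote the same characters once the escape is interpreted.

```java
public class UnicodeEscapeDemo {

    // Hypothetical helper for illustration only: decodes \uXXXX escapes and
    // copies everything else through unchanged. Assumes the four characters
    // after \u are valid hex digits; a real JSON parser handles far more.
    static String decodeUnicodeEscapes(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i) && i + 6 <= s.length()) {
                // Parse the four hex digits into a single UTF-16 code unit.
                int codeUnit = Integer.parseInt(s.substring(i + 2, i + 6), 16);
                out.append((char) codeUnit);
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal "\\u0026#1588;" is the 12-character text \u0026#1588;
        String escaped = "\\u0026#1588;";
        System.out.println(decodeUnicodeEscapes(escaped)); // prints &#1588;
    }
}
```

A conforming JSON parser performs this decoding (and more) automatically, so the original string comes back intact when the file is read as JSON.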
& is not among the characters that need to be escaped (see the definition of string at json.org), but a few JSON libraries are quite "aggressive" in their encoding. That's not usually a problem, unless you don't actually process the resulting JSON with a conforming JSON parser.
GsonBuilder.disableHtmlEscaping() will help you turn that feature off if you absolutely need to.
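As a rough sketch of how that might look, the example below serializes the same string with a default Gson instance and with one built through GsonBuilder.disableHtmlEscaping(). The class name HtmlEscapingDemo and its two helper methods are made up for illustration, and a plain String stands in for the pdfDoc object from the question. It assumes Gson is on the classpath.

```java
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class HtmlEscapingDemo {

    // Default Gson: HTML-sensitive characters such as & are written as
    // Unicode escapes (\u0026).
    static String toJsonDefault(Object value) {
        return new Gson().toJson(value);
    }

    // Gson with HTML escaping disabled: & is written as-is.
    static String toJsonUnescaped(Object value) {
        Gson gson = new GsonBuilder().disableHtmlEscaping().create();
        return gson.toJson(value);
    }

    public static void main(String[] args) {
        String text = "&#1588;"; // stand-in for a field of pdfDoc

        System.out.println(toJsonDefault(text));   // prints "\u0026#1588;"
        System.out.println(toJsonUnescaped(text)); // prints "&#1588;"
    }
}
```

Both outputs are valid JSON for the same string. Disabling the escaping only matters when the file is read by something other than a JSON parser.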