Parse a substring as JSON using QJsonDocument
Question
I have a string which contains (not is) JSON-encoded data, like in this example:
foo([1, 2, 3], "some more stuff")
    |        |
    start    end   (of JSON-encoded data)
The complete language we use in our application nests JSON-encoded data, while the rest of the language is trivial (just recursive stuff). When parsing strings like this from left to right in a recursive parser, I know when I encounter a JSON-encoded value, such as the [1, 2, 3] starting at index 4 here. After parsing this substring, I need to know the end position to continue parsing the rest of the string.
I'd like to pass this substring to a well-tested JSON parser like QJsonDocument in Qt5. But from reading the documentation, there is no way to parse only a substring as JSON, meaning that as soon as the parsed data ends (after consuming the ] here) control returns without reporting a parse error. Also, I need to know the end position to continue parsing my own stuff (here the remaining string is , "some more stuff")).
To do this, I used to use a custom JSON parser which takes the current position by reference and updates it after finishing parsing. But since it's a security-critical part of a business application, we don't want to stick to my self-crafted parser anymore. I mean, there is QJsonDocument, so why not use it? (We already use Qt5.)
As a work-around, I'm thinking of this approach:
- Let QJsonDocument parse the substring starting from the current position (which is not valid JSON)
- The error reports an unexpected character; this is some position beyond the JSON
- Let QJsonDocument parse again, but this time the substring with the correct end position
A second idea is to write a "JSON end scanner" which takes the whole string, a start position and returns the end position of the JSON-encoded data. This also requires parsing, as unmatched brackets / parentheses can appear in string values, but it should be much easier (and safer) to write (and use) such a class in comparison to a fully hand-crafted JSON-parser.
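The "JSON end scanner" idea could be sketched roughly as follows. This is a minimal illustration, not code from the original post: the name findJsonEnd is hypothetical, it handles only values that start with [ or {, and it tracks nesting depth and string literals without validating the JSON. The slice it finds should still be handed to a real parser such as QJsonDocument.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (an assumption for illustration, not part of the post).
// Returns the index one past the end of the JSON array/object that starts at
// `start`, or std::string::npos if no balanced value is found.
// It does NOT validate the JSON; it only balances brackets outside strings.
std::size_t findJsonEnd(const std::string& s, std::size_t start)
{
    if (start >= s.size() || (s[start] != '[' && s[start] != '{'))
        return std::string::npos;   // this sketch handles arrays and objects only

    std::size_t depth = 0;
    bool inString = false;
    for (std::size_t i = start; i < s.size(); ++i) {
        const char c = s[i];
        if (inString) {
            if (c == '\\')
                ++i;                // skip the character following a backslash
            else if (c == '"')
                inString = false;   // closing quote of the string literal
            continue;               // brackets inside strings are ignored
        }
        switch (c) {
        case '"':
            inString = true;
            break;
        case '[': case '{':
            ++depth;
            break;
        case ']': case '}':
            if (--depth == 0)
                return i + 1;       // one past the matching closing bracket
            break;
        default:
            break;
        }
    }
    return std::string::npos;       // unterminated value or string
}
```

For the example string from the question, calling findJsonEnd(input, 4) yields 13, so input.substr(4, 13 - 4) is [1, 2, 3] and the outer parser can continue at index 13. Mismatched bracket kinds such as [1, 2} are deliberately not detected here; that check is left to the real JSON parser that receives the slice.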
Does anyone have a better idea?
Recommended answer
I rolled a quick parser[*] based on http://www.ietf.org/rfc/rfc4627.txt using Spirit Qi.
It doesn't actually parse into an AST, but it parses all of the JSON payload, which is actually a bit more than required here.
The sample here (http://liveworkspace.org/code/3k4Yor$2) outputs:
Non-JSON part of input starts after valid JSON: ', "some more stuff")'
Based on the test given by the OP:
const std::string input("foo([1, 2, 3], \"some more stuff\")");

// set to start of JSON
auto f(begin(input)), l(end(input));
std::advance(f, 4);

bool ok = tryParseAsJson(f, l); // updates f to point after the end of valid JSON

if (ok)
    std::cout << "Non-JSON part of input starts after valid JSON: '" << std::string(f, l) << "'\n";
I have tested with several other more involved JSON documents (including multiline).
A few remarks:
- I made the parser iterator-based, so it will likely work easily with Qt strings(?)
- If you want to disallow multi-line fragments, change the skipper from qi::space to qi::blank
- There is a conformance shortcut regarding number parsing (see TODO) that doesn't affect validity for this answer (see comment).
[*] technically, this is more of a parser stub since it doesn't translate into something else. It is basically a lexer taking on too much work :)
// #define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <string>

namespace qi = boost::spirit::qi;

template <typename It, typename Skipper = qi::space_type>
struct parser : qi::grammar<It, Skipper>
{
    parser() : parser::base_type(json)
    {
        // 2.1 values
        value = qi::lit("false") | "null" | "true" | object | array | number | string;

        // 2.2 objects
        object = '{' >> -(member % ',') >> '}';
        member = string >> ':' >> value;

        // 2.3 Arrays
        array = '[' >> -(value % ',') >> ']';

        // 2.4. Numbers
        // Note our Spirit grammar takes a shortcut, as the RFC specification is more restrictive.
        //
        // However, none of the differences affect any structural characters (:,{}[] and double
        // quotes), so it doesn't matter for the current purpose. For full compliance, this remains TODO:
        //
        //    Numeric values that cannot be represented as sequences of digits
        //    (such as Infinity and NaN) are not permitted.
        //     number = [ minus ] int [ frac ] [ exp ]
        //     decimal-point = %x2E       ; .
        //     digit1-9 = %x31-39         ; 1-9
        //     e = %x65 / %x45            ; e E
        //     exp = e [ minus / plus ] 1*DIGIT
        //     frac = decimal-point 1*DIGIT
        //     int = zero / ( digit1-9 *DIGIT )
        //     minus = %x2D               ; -
        //     plus = %x2B                ; +
        //     zero = %x30                ; 0
        number = qi::double_; // shortcut :)

        // 2.5 Strings
        string = qi::lexeme [ '"' >> *char_ >> '"' ];

        static const qi::uint_parser<uint32_t, 16, 4, 4> _4HEXDIG;

        char_ = ~qi::char_("\"\\") |
                qi::char_("\x5C") >> (       // \ (reverse solidus)
                   qi::char_("\x22") |       // "    quotation mark  U+0022
                   qi::char_("\x5C") |       // \    reverse solidus U+005C
                   qi::char_("\x2F") |       // /    solidus         U+002F
                   qi::char_("\x62") |       // b    backspace       U+0008
                   qi::char_("\x66") |       // f    form feed       U+000C
                   qi::char_("\x6E") |       // n    line feed       U+000A
                   qi::char_("\x72") |       // r    carriage return U+000D
                   qi::char_("\x74") |       // t    tab             U+0009
                   qi::char_("\x75") >> _4HEXDIG )  // uXXXX          U+XXXX
                ;

        // entry point
        json = value;

        BOOST_SPIRIT_DEBUG_NODES(
                (json)(value)(object)(member)(array)(number)(string)(char_));
    }

  private:
    qi::rule<It, Skipper> json, value, object, member, array, number, string;
    qi::rule<It> char_;
};

template <typename It>
bool tryParseAsJson(It& f, It l) // note: first iterator gets updated
{
    static const parser<It, qi::space_type> p;

    try
    {
        return qi::phrase_parse(f,l,p,qi::space);
    } catch(const qi::expectation_failure<It>& e)
    {
        // expectation points not currently used, but we could tidy up the grammar to bail on unexpected tokens
        std::string frag(e.first, e.last);
        std::cerr << e.what() << "'" << frag << "'\n";
        return false;
    }
}

int main()
{
#if 0
    // read full stdin
    std::cin.unsetf(std::ios::skipws);
    std::istream_iterator<char> it(std::cin), pte;
    const std::string input(it, pte);

    // set up parse iterators
    auto f(begin(input)), l(end(input));
#else
    const std::string input("foo([1, 2, 3], \"some more stuff\")");

    // set to start of JSON
    auto f(begin(input)), l(end(input));
    std::advance(f, 4);
#endif

    bool ok = tryParseAsJson(f, l); // updates f to point after the end of valid JSON

    if (ok)
        std::cout << "Non-JSON part of input starts after valid JSON: '" << std::string(f, l) << "'\n";

    return ok? 0 : 255;
}