Simulate PHP array language construct or parse with regexp?
Problem description
From an external source I get
array(1,2,3)
but also larger arrays like
array("a", "b", "c", array("1", "2", array("A", "B")), array("3", "4"), "d")
I need them to be actual arrays in PHP. I know I could use eval, but since these are untrusted sources I'd rather not. I also have no control over the external sources.
Should I use regular expressions for this (if so, which ones), or is there some other way?
Recommended answer
While writing a parser using the Tokenizer, which turned out not to be as easy as I expected, I came up with another idea: why not parse the array using eval, but first validate that it contains nothing harmful?
So, what the code does: it checks the tokens of the array against a list of allowed tokens and characters, and only then executes eval. I do hope I included all possible harmless tokens; if not, simply add them. (I intentionally didn't include HEREDOC and NOWDOC, because I think they are unlikely to be used.)
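To make the whitelist check concrete, here is a small sketch (the helper name tokenNames() is hypothetical) that lists what token_get_all() reports for an array literal, skipping whitespace. It also shows that, since PHP 7, a call through a string such as "strtoupper"("abc") is built from the same kinds of tokens, so a whitelist of token types alone does not exclude function calls; where '(' appears matters as well.

```php
<?php
// Returns the token names token_get_all() produces for a code fragment,
// skipping the opening tag and whitespace. Single-character tokens such as
// '(' are returned by the tokenizer as plain strings and kept as-is.
function tokenNames(string $code): array
{
    $names = [];
    foreach (array_slice(token_get_all('<?php ' . $code), 1) as $token) {
        if (is_string($token)) {
            $names[] = $token;
        } elseif ($token[0] !== T_WHITESPACE) {
            $names[] = token_name($token[0]);
        }
    }
    return $names;
}

// An array literal: T_ARRAY, '(', T_LNUMBER, ',', T_CONSTANT_ENCAPSED_STRING, ...
print_r(tokenNames('array(1, "a", 2.5)'));

// A function call through a string (valid syntax since PHP 7) uses only
// T_CONSTANT_ENCAPSED_STRING, '(' and ')'.
print_r(tokenNames('"strtoupper"("abc")'));
```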
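```php
function parseArray($code) {
    $allowedTokens = array(
        T_ARRAY                    => true,
        T_CONSTANT_ENCAPSED_STRING => true, // strings without interpolation
        T_LNUMBER                  => true, // integers
        T_DNUMBER                  => true, // floats
        T_DOUBLE_ARROW             => true, // =>
        T_WHITESPACE               => true,
    );
    $allowedChars = array(
        '(' => true,
        ')' => true,
        ',' => true,
    );

    $tokens = token_get_all('<?php ' . $code);
    array_shift($tokens); // remove opening php tag

    $previous = null; // previous non-whitespace token (id or char)
    foreach ($tokens as $token) {
        // char token
        if (is_string($token)) {
            if (!isset($allowedChars[$token])) {
                throw new Exception('Disallowed token \'' . $token . '\' encountered.');
            }
            // Since PHP 7, "exec"("...") is a valid call made only of allowed
            // tokens, so '(' is only accepted directly after `array`.
            if ($token === '(' && $previous !== T_ARRAY) {
                throw new Exception('\'(\' is only allowed after \'array\'.');
            }
            $previous = $token;
            continue;
        }
        // token array: [id, text, line]
        // true, false and null are okay, too (case-insensitive, as in PHP)
        if ($token[0] == T_STRING && in_array(strtolower($token[1]), array('true', 'false', 'null'), true)) {
            $previous = $token[0];
            continue;
        }
        if (!isset($allowedTokens[$token[0]])) {
            throw new Exception('Disallowed token \'' . token_name($token[0]) . '\' encountered.');
        }
        if ($token[0] != T_WHITESPACE) {
            $previous = $token[0];
        }
    }

    // Since PHP 7, eval() throws a ParseError on invalid code instead of
    // returning false, so catch it rather than checking the return value.
    try {
        return eval('return ' . $code . ';');
    } catch (ParseError $e) {
        throw new Exception('Array couldn\'t be eval()\'d: ' . $e->getMessage());
    }
}

var_dump(parseArray('array("a", "b", "c", array("1", "2", array("A", "B")), array("3", "4"), "d")'));
```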
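On PHP 7 and later, eval() reports a syntax error by throwing ParseError rather than returning false, and it returns null unless the evaluated code itself contains return; both points matter for how a wrapper like parseArray() detects failure and gets its result. A minimal sketch of that behaviour (the helper name tryEval() is hypothetical, and it should only ever receive code that already passed a whitelist check):

```php
<?php
// Evaluates an expression and reports either its value or the parse error.
// Only pass code that has already been validated against a token whitelist.
function tryEval(string $expression): array
{
    try {
        return ['ok' => true, 'value' => eval('return ' . $expression . ';')];
    } catch (ParseError $e) {
        return ['ok' => false, 'error' => $e->getMessage()];
    }
}

var_dump(eval('$x = array(1, 2);')); // NULL: the evaluated code has no return
var_dump(tryEval('array(1, 2)'));     // ok => true, value => [1, 2]
var_dump(tryEval('array(1, 2'));      // ok => false, with the parse error message
```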
I think this is a good compromise between security and convenience: there is no need to write the parser yourself.
For example,
parseArray('exec("haha -i -thought -i -was -smart")');
will throw an exception:
Disallowed token 'T_STRING' encountered.
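For comparison, here is a minimal sketch of the eval-free route the answer started with: a small recursive-descent parser over token_get_all() output. It is not the answer's code; the function names are hypothetical, and it deliberately supports only a subset (array(...) with optional int or string keys, plain decimal integers, floats, non-interpolated strings, true/false/null), rejecting everything else.

```php
<?php
// Eval-free parser for array(...) literals built on token_get_all().
// Anything outside the supported subset throws InvalidArgumentException.
function parseArrayLiteral(string $code): array
{
    // Drop the opening tag, then all whitespace tokens.
    $tokens = array_values(array_filter(
        array_slice(token_get_all('<?php ' . $code), 1),
        fn($t) => !is_array($t) || $t[0] !== T_WHITESPACE
    ));
    $pos = 0;
    $value = parseValue($tokens, $pos);
    if (!is_array($value) || $pos !== count($tokens)) {
        throw new InvalidArgumentException('Input is not a single array literal.');
    }
    return $value;
}

// Parses one value starting at $pos and advances $pos past it.
function parseValue(array $tokens, int &$pos)
{
    $token = $tokens[$pos] ?? null;
    if (!is_array($token)) {
        throw new InvalidArgumentException("Unexpected token at position $pos.");
    }
    [$id, $text] = $token;
    $pos++;
    $constants = ['true' => true, 'false' => false, 'null' => null];
    if ($id === T_ARRAY) {
        return parseArrayBody($tokens, $pos);
    } elseif ($id === T_LNUMBER && preg_match('/^(0|[1-9][0-9]*)$/', $text)) {
        return (int) $text; // hex, octal, binary and 1_000 forms are rejected
    } elseif ($id === T_DNUMBER && is_numeric($text)) {
        return (float) $text;
    } elseif ($id === T_CONSTANT_ENCAPSED_STRING) {
        return decodeString($text);
    } elseif ($id === T_STRING && array_key_exists(strtolower($text), $constants)) {
        return $constants[strtolower($text)];
    }
    throw new InvalidArgumentException('Unsupported token ' . token_name($id) . '.');
}

// Parses "( item, key => item, ... )" after the `array` keyword.
function parseArrayBody(array $tokens, int &$pos): array
{
    expectChar($tokens, $pos, '(');
    $result = [];
    while (($tokens[$pos] ?? null) !== ')') {
        $value = parseValue($tokens, $pos);
        $next = $tokens[$pos] ?? null;
        if (is_array($next) && $next[0] === T_DOUBLE_ARROW) {
            $pos++;
            if (!is_int($value) && !is_string($value)) {
                throw new InvalidArgumentException('Keys must be ints or strings.');
            }
            $result[$value] = parseValue($tokens, $pos);
        } else {
            $result[] = $value;
        }
        if (($tokens[$pos] ?? null) !== ',') {
            break;
        }
        $pos++; // a trailing comma before ')' is accepted, as in PHP
    }
    expectChar($tokens, $pos, ')');
    return $result;
}

function expectChar(array $tokens, int &$pos, string $char): void
{
    if (($tokens[$pos] ?? null) !== $char) {
        throw new InvalidArgumentException("Expected '$char' at position $pos.");
    }
    $pos++;
}

// Turns a T_CONSTANT_ENCAPSED_STRING literal into its string value.
function decodeString(string $literal): string
{
    if ($literal[0] === 'b' || $literal[0] === 'B') {
        $literal = substr($literal, 1); // binary-string prefix b"..."
    }
    $body = substr($literal, 1, -1);
    if ($literal[0] === "'") {
        // Single quotes: only \\ and \' are escape sequences.
        return strtr($body, ['\\\\' => '\\', "\\'" => "'"]);
    }
    // Double quotes: stripcslashes() covers \n, \t, \", \\, octal and \x..,
    // but unlike PHP it drops the backslash of unknown escapes and has no \u{...}.
    return stripcslashes($body);
}

var_dump(parseArrayLiteral('array("a", "b", "c", array("1", "2", array("A", "B")), array("3", "4"), "d")'));
```

Because nothing is ever executed, input such as exec("...") can only be rejected, never run; the trade-off is that every construct you want to accept has to be handled explicitly.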