Regular Expressions - Matching IRC-like parameters?
Problem description
I am looking to create an IRC-like command format:
/commandname parameter1 "parameter 2" "parameter \"3\"" parameter"4 parameter\"5
Which would (ideally) give me a list of parameters:
parameter1
parameter 2
parameter "3"
parameter"4
parameter\"5
From what I have read, this isn't at all trivial with a regular expression and might as well be done some other way.
Thoughts?
Below is C# code that does the job I need:
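using System.Collections.Generic;
public List<string> ParseIrcCommand(string command)
{
    // Strip surrounding whitespace and the leading '/', then append a space
    // so that the loop below also flushes the last unquoted token.
    command = command.Trim();
    command = command.TrimStart('/');
    command += ' ';
    List<string> tokens = new List<string>();
    bool inQuotes = false;
    bool inToken = true;   // the command name starts at index 0
    string currentToken = "";
    for (int i = 0; i < command.Length; i++)
    {
        char currentChar = command[i];
        char nextChar = (i + 1 >= command.Length ? ' ' : command[i + 1]);
        // An unquoted token ends at the first space.
        if (!inQuotes && inToken && currentChar == ' ')
        {
            tokens.Add(currentToken);
            currentToken = "";
            inToken = false;
            continue;
        }
        // A quoted token ends at the closing quote; one following space is skipped.
        if (inQuotes && inToken && currentChar == '"')
        {
            tokens.Add(currentToken);
            currentToken = "";
            inQuotes = false;
            inToken = false;
            if (nextChar == ' ') i++;
            continue;
        }
        // Inside quotes, \" stands for a literal quote character.
        if (inQuotes && inToken && currentChar == '\\' && nextChar == '"')
        {
            i++;
            currentToken += nextChar;
            continue;
        }
        // Between tokens, skip extra spaces (otherwise they would end up
        // at the start of the next token).
        if (!inToken && currentChar == ' ')
        {
            continue;
        }
        // The first non-space character starts a new token. A leading quote
        // switches to quoted mode and is not itself part of the token.
        if (!inToken)
        {
            inToken = true;
            if (currentChar == '"')
            {
                inQuotes = true;
                continue;
            }
        }
        currentToken += currentChar;
    }
    // A quoted token that is never closed is silently dropped.
    return tokens;
}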
Recommended answer
You have shown your code, which is good, but it seems that you haven't considered whether it is reasonable to parse the command like that:
- Firstly, your code allows newline characters inside the command name and the parameters. It would be reasonable to assume that a newline can never appear there.
- Secondly, the backslash (\) also needs to be escapable, just like the double quote ("), since otherwise there is no way to put a single \ at the end of a quoted parameter without causing confusion.
- Thirdly, it is a bit odd to parse the command name the same way as the parameters: command names are usually predetermined and fixed, so there is no need to allow flexible ways to specify them.
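To make the second point concrete, here is a minimal sketch of the encoding side, assuming both \ and " are escaped with a backslash inside quotes. `quoteParam` and `unquoteParam` are hypothetical helpers, not part of the original answer. Because every backslash in the value is doubled, a value that ends in a backslash stays unambiguous:

```javascript
// Hypothetical helper: wrap a value in quotes, escaping \ and " with a backslash.
function quoteParam(value) {
  // In the replacement string, "\\$&" means: a backslash, then the matched character.
  return '"' + value.replace(/[\\"]/g, "\\$&") + '"';
}

// Hypothetical inverse of quoteParam: drop the quotes and undo the escapes.
function unquoteParam(quoted) {
  return quoted.slice(1, -1).replace(/\\([\\"])/g, "$1");
}

const value = 'C:\\dir\\';                               // the string C:\dir\
console.log(quoteParam(value));                          // "C:\\dir\\"
console.log(unquoteParam(quoteParam(value)) === value);  // true
```

With only \" as an escape, as in the question's code, the same value would have to be written as "C:\dir\", and the final \" would be read as an escaped quote, leaving the parameter unclosed.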
I cannot think of a general one-line solution in JavaScript. JavaScript regex lacks \G, which asserts the position where the last match ended. So my solution has to make do with the beginning-of-string assertion ^ and chop off the matched part of the string each time a token is matched.
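To see why some anchor is needed at all, here is a sketch (not part of the original answer; `tokensUnanchored` is a hypothetical name) that uses the same token pattern as `parseCommand` below, but with the `g` flag and no `^`. A global regex simply scans past text it cannot match, so an unclosed quote disappears silently instead of being reported:

```javascript
// The token pattern from parseCommand, but with the g flag and no ^ anchor.
const unanchored = / *(?:(?!")([^ ]+)|"((?:[^\\"]|\\[\\"])*)")/g;

function tokensUnanchored(str) {
  const tokens = [];
  for (const m of str.matchAll(unanchored)) {
    // Group 1: unquoted token; group 2: body of a quoted token (escapes left as-is).
    tokens.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return tokens;
}

// The stray quote before b is skipped instead of causing an error.
console.log(tokensUnanchored('a "b c'));  // logs the array ['a', 'b', 'c']
```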
(There is not much code here, mostly comments)
function parseCommand(str) {
    /*
     * Trim() in C# trims off all whitespace characters, and \s in a
     * JavaScript regex also matches any whitespace character. However, the
     * sets of characters considered whitespace might not be identical.
     * You can be sure that \r, \n, \t and space (ASCII 32) are included.
     *
     * However, allowing all those whitespace characters in the command
     * is questionable.
     */
    str = str.replace(/^\s*\//, "");
    /* The look-ahead (?!") keeps the unquoted alternative from matching
     * anything that starts with a quote. Without it, [^ ]+ would also match
     * an opening quote, so a quoted parameter would be split at its first
     * space, and one with a missing closing quote would be accepted as an
     * unquoted token. Your code commits to quoted mode once it sees a quote,
     * while the regex engine backtracks and tries the other alternative.
     * Possessive quantifiers can prevent backtracking, but JavaScript RegExp
     * does not support them.
     *
     * We emulate the effect of \G by using ^ and repeatedly chopping off
     * the matched part of the string.
     *
     * The regex matches 2 cases:
     *   (?!")([^ ]+)
     *     Matches non-quoted tokens, which are not allowed to contain
     *     spaces. The token is captured into capturing group 1.
     *
     *   "((?:[^\\"]|\\[\\"])*)"
     *     Matches quoted tokens, which consist of 0 or more of:
     *     a non-quote, non-backslash character [^\\"], OR an escaped
     *     quote \", OR an escaped backslash \\
     *     The text inside the quotes is captured into capturing group 2.
     */
    var regex = /^ *(?:(?!")([^ ]+)|"((?:[^\\"]|\\[\\"])*)")/;
    var tokens = [];
    var arr;
    while ((arr = str.match(regex)) !== null) {
        if (arr[1] !== void 0) {
            // Non-quoted token
            tokens.push(arr[1]);
        } else {
            // Quoted token: needs extra processing to convert the
            // escaped characters back
            tokens.push(arr[2].replace(/\\([\\"])/g, '$1'));
        }
        // Remove the matched text
        str = str.substring(arr[0].length);
    }
    // Check that the leftover consists of space characters only
    if (/^ *$/.test(str)) {
        return tokens;
    } else {
        // We only get here if a quoted token is never closed, or contains a
        // backslash that is not followed by \ or ". Your code returns the
        // tokens parsed so far, but I think it is better to signal an error.
        return null;
    }
}
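Since ES2015, JavaScript regexes also support the sticky flag `y`, which forces each match to start exactly at the regex's `lastIndex`, which is effectively the \G the answer was missing. Below is a sketch of the same tokenizer built on it, under the same rules as `parseCommand` above; `parseCommandSticky` is a hypothetical name, not part of the original answer. It avoids copying the remaining string on every iteration:

```javascript
function parseCommandSticky(str) {
  str = str.replace(/^\s*\//, "");
  // Sticky: each exec() must match exactly at token.lastIndex, like \G.
  // Group 1: unquoted token; group 2: body of a quoted token.
  const token = / *(?:(?!")([^ ]+)|"((?:[^\\"]|\\[\\"])*)")/y;
  const tokens = [];
  let pos = 0; // end of the last successful match
  let m;
  while ((m = token.exec(str)) !== null) {
    tokens.push(m[1] !== undefined
      ? m[1]                                 // unquoted token, taken literally
      : m[2].replace(/\\([\\"])/g, "$1"));   // quoted: decode \\ and \"
    pos = token.lastIndex;
  }
  // A failed sticky exec() resets lastIndex to 0, so check from pos instead.
  return /^ *$/.test(str.slice(pos)) ? tokens : null;
}

console.log(parseCommandSticky('/msg bob "hi \\"there\\""'));
// logs the array ['msg', 'bob', 'hi "there"']
```

The separate `pos` variable matters: when a sticky (or global) `exec()` fails, it resets `lastIndex` to 0, so reading `lastIndex` after the loop would lose the position of the unparsed leftover.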