How to remove trailing comments via regexp?
Problem description
For non-MATLAB-savvy readers: not sure what family they belong to, but the MATLAB regexes are described here in full detail. MATLAB's comment character is % (percent) and its string delimiter is ' (apostrophe). A string delimiter inside a string is written as a double apostrophe ('this is how you write "it''s" in a string.'). To complicate matters more, the matrix transpose operators are also apostrophes (A' (Hermitian) or A.' (regular)).
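For readers unfamiliar with these rules, a minimal sketch of the quoting and transpose syntax described above (the variable names are illustrative only):

```matlab
% A doubled apostrophe inside a string stands for one literal apostrophe.
s = 'this is how you write "it''s" in a string.';
disp(s)                       % displays the text with a single apostrophe in it's
nQuotes = sum(s == '''');     % number of apostrophes in the resulting text

% The same character also acts as the transpose operator after an expression.
A = [1+2i, 3];
H = A';                       % Hermitian (conjugate) transpose
T = A.';                      % regular (non-conjugate) transpose
```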
Now, for dark reasons (that I will not elaborate on :), I'm trying to interpret MATLAB code in MATLAB's own language.
Currently I'm trying to remove all trailing comments in a cell-array of strings, each containing a line of MATLAB code. At first glance, this might seem simple:
>> str = 'simpleCommand(); % simple trailing comment';
>> regexprep(str, '%.*$', '')
ans =
simpleCommand();
But of course, something like this might come along:
>> str = ' fprintf(''%d%*c%3.0f\n'', value, args{:}); % Let''s do this! ';
>> regexprep(str, '%.*$', '')
ans =
fprintf(' %// <-- WRONG!
Obviously, we need to exclude all comment characters that reside inside strings from the match, while also taking into account that a single apostrophe (or a dot-apostrophe) directly following a statement is an operator, not a string delimiter.
Based on the assumption that the number of string opening/closing characters before the comment character must be even (which I know is incomplete, because of the matrix-transpose operator), I conjured up the following dynamic regex to handle this sort of case:
>> str = {
'myFun( {''test'' ''%''}); % let''s '
'sprintf(str, ''%*8.0f%*s%c%3d\n''); % it''s '
'sprintf(str, ''%*8.0f%*s%c%3d\n''); % let''s '
'sprintf(str, ''%*8.0f%*s%c%3d\n''); '
'A = A.'';%tight trailing comment'
};
>>
>> C = regexprep(str, '(^.*)(?@mod(sum(\1==''''''''),2)==0;)(%.*$)', '$1')
This returns:
C =
'myFun( {'test' '%'}); ' %// success
'sprintf(str, '%*8.0f%*s%c%3d\n'); ' %// success
'sprintf(str, '%*8.0f%*s%c%3d\n'); ' %// success
'sprintf(str, '%*8.0f%*s%c' %// FAIL
'A = A.';' %// success (although I'm not sure why)
So I'm almost there, but not quite yet :)
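For comparison, the even-count assumption can also be written without a dynamic expression. The following is only a sketch of that assumption (variable names such as `codeLine` and `firstComment` are made up for illustration), and it shares the limitation already mentioned: a transpose apostrophe is counted as if it were a string delimiter, so a line like `A = A.';%tight trailing comment` is left unstripped.

```matlab
codeLine = 'sprintf(str, ''%*8.0f%*s%c%3d\n''); % it''s ';

pctIdx       = find(codeLine == '%');        % positions of all percent signs
quotesBefore = cumsum(codeLine == '''');     % running count of apostrophes
isOutside    = mod(quotesBefore(pctIdx), 2) == 0;   % even count: assumed outside a string

% The first percent sign that is preceded by an even number of apostrophes
% is taken to be the start of the trailing comment.
firstComment = pctIdx(find(isOutside, 1));

if isempty(firstComment)
    stripped = codeLine;                     % no comment found, keep the line
else
    stripped = codeLine(1:firstComment-1);   % keep everything before the comment
end
```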
Unfortunately I've exhausted the amount of time I can spend thinking about this and need to continue with other things, so perhaps someone else who has more time is friendly enough to think about these questions:
- Are comment characters inside strings the only exception I need to look out for?
- What is the correct and/or more efficient way to do this?
Recommended answer
This handles the conjugate-transpose case by checking which characters are allowed directly before a transpose apostrophe:
- Numbers: 2'
- Letters: A'
- Dot: A.'
- Closing parenthesis, brace and bracket: A(1)', A{1}' and [1 2 3]'
These are the only cases I can think of now.
C = regexprep(str, '^(([^'']*''[^'']*''|[^'']*[\.a-zA-Z0-9\)\}\]]''[^'']*)*[^'']*)%.*$', '$1')
On your example it returns:
>> C = regexprep(str, '^(([^'']*''[^'']*''|[^'']*[\.a-zA-Z0-9\)\}\]]''[^'']*)*[^'']*)%.*$', '$1')
C =
'myFun( {'test' '%'}); '
'sprintf(str, '%*8.0f%*s%c%3d\n'); '
'sprintf(str, '%*8.0f%*s%c%3d\n'); '
'sprintf(str, '%*8.0f%*s%c%3d\n'); '
'A = A.';'
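As an alternative to a single regular expression, the same rules can be applied with an explicit character scan. The function below is a hedged sketch, not part of the answer above: `stripTrailingComment` is a hypothetical helper name, and the set of characters that may precede a transpose follows the list given earlier, with the underscore and a preceding apostrophe (as in a double transpose `A''`) added as assumptions. It does not attempt to handle double-quoted strings, block comments, or `...` line continuations.

```matlab
function out = stripTrailingComment(codeLine)
% Hypothetical helper: remove a trailing % comment from one line of MATLAB
% code, given as a char row vector. Save as stripTrailingComment.m.
    out = codeLine;
    inString = false;
    n = numel(codeLine);
    k = 1;
    while k <= n
        c = codeLine(k);
        if inString
            if c == ''''
                if k < n && codeLine(k+1) == ''''
                    k = k + 1;            % doubled apostrophe: stay inside the string
                else
                    inString = false;     % closing string delimiter
                end
            end
        elseif c == '%'
            out = codeLine(1:k-1);        % first % outside a string starts the comment
            return
        elseif c == ''''
            % Assumption: an apostrophe right after a letter, digit, underscore,
            % dot, closing bracket or another apostrophe is a transpose operator;
            % anywhere else it opens a string.
            isTranspose = k > 1 && (isstrprop(codeLine(k-1), 'alphanum') ...
                || any(codeLine(k-1) == '_.)}]'''));
            inString = ~isTranspose;
        end
        k = k + 1;
    end
end
```

On the cell array from the question it could be applied with `C = cellfun(@stripTrailingComment, str, 'UniformOutput', false)`. A scan like this is longer than the regex, but each rule sits on its own branch, which may make it easier to extend when further exceptions turn up.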