How do I use a Python regex to match the function syntax of MATLAB?


Problem description

I am trying to find all the inputs/outputs of all MATLAB functions in our internal library. I am new to regex (this is my first time using it) and have been trying to use multiline mode in Python's re library.

The MATLAB function syntax looks like:

function output = func_name(input)

The signature can span multiple lines.

I started with the following pattern:

re.compile(r"^.*function (.*)=(.*)\([.\n]*\)$", re.M)

but I keep getting an unsupported template operator error. Any pointers are appreciated!

Now I have

pattern = re.compile(r"^\s*function (.*?)= [\w\n.]*?\(.*?\)", re.M|re.DOTALL)

and the match looks like this:

        function [fcst, spread] = ...
                VolFcstMKT(R,...
                           mktVol,...
                           calibrate,...
                           spread_init,...
                           fcstdays,...
                           tsperyear)

        if(calibrate)
            if(nargin < 6)
                tsperyear = 252;
            end
            templen = length(R)

My question is: why does it return the extra lines instead of stopping at the first )?

Recommended answer

The peculiar (internal) error you're getting is what you would see if you passed re.T instead of re.M as the second argument to re.compile. (re.template, a currently undocumented entry, is the function intended to use that flag; in brief, template REs don't support repetition or backtracking.) Can you print re.M just before you call re.compile and show its value?
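A minimal sketch of that check, assuming CPython: re.M is an alias of re.MULTILINE and has the integer value 8, so any other result would suggest the name was rebound somewhere before the call. The variable names here are only illustrative.

```python
import re

# re.M is an alias for re.MULTILINE; in CPython its integer value is 8.
flag = re.M
print(flag, int(flag))

# If either comparison fails, re.M was rebound before re.compile is reached.
flag_is_intact = (flag is re.MULTILINE) and (int(flag) == 8)
print(flag_is_intact)
```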

Once that's fixed, we can discuss the details of your desired RE. In brief: if the input part can include parentheses you're out of luck; otherwise re.DOTALL and some rewriting of your pattern should help. But fixing this weird internal error seems to take priority.
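One possible rewriting is sketched below. It is not the answerer's pattern: it assumes the argument list never contains nested parentheses, and it uses a negated character class instead of re.DOTALL so the match cannot run past the first closing parenthesis. The names SIGNATURE and parse_signatures are hypothetical.

```python
import re

# Assumption: no parentheses inside the argument list, so [^()]* is enough
# to reach the closing parenthesis even when the signature spans lines.
SIGNATURE = re.compile(
    r"""^[ \t]*function\s+
        (?:(?P<outputs>\[[^\]]*\]|\w+)\s*=\s*)?   # optional output or [outputs]
        (?:\.\.\.[^\n]*\n\s*)?                    # optional line continuation
        (?P<name>\w+)\s*
        \((?P<inputs>[^()]*)\)                    # argument list, no nesting
    """,
    re.MULTILINE | re.VERBOSE,
)


def parse_signatures(source):
    """Return (name, outputs, inputs) for each MATLAB signature found."""
    results = []
    for m in SIGNATURE.finditer(source):
        # \w+ skips commas, brackets, whitespace and the "..." continuations.
        outputs = re.findall(r"\w+", m.group("outputs") or "")
        inputs = re.findall(r"\w+", m.group("inputs"))
        results.append((m.group("name"), outputs, inputs))
    return results


sample = """\
        function [fcst, spread] = ...
                VolFcstMKT(R,...
                           mktVol,...
                           calibrate,...
                           spread_init,...
                           fcstdays,...
                           tsperyear)

        if(calibrate)
            if(nargin < 6)
                tsperyear = 252;
            end
"""

for name, outputs, inputs in parse_signatures(sample):
    print(name, outputs, inputs)
```

Because the argument list is matched with [^()]* rather than a dot, the body of the function is never pulled into the match. Signatures with no output or no parentheses at all would need further adjustment.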

Edit: with this bug diagnosed (as per the comments below this question), moving on to the OP's current question: re.DOTALL|re.MULTILINE, plus the '$' at the end of the pattern, plus greedy matches everywhere (using .* rather than the non-greedy .*?), together ensure that if the regex matches, it matches as broad a swathe as possible. That is exactly what this combination asks for. It is probably best to open another question with a specific example: what the input is, what gets matched, what you would like the regex to match instead, and so on.
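A small experiment along those lines, using the second pattern from the question on a shortened, made-up version of its input. It suggests one likely cause of the extra lines: the class [\w\n.] contains no space, so the indentation before VolFcstMKT cannot be consumed, and under re.DOTALL even the non-greedy (.*?) keeps growing across lines until a later "= name(" fits. The adjusted pattern is a hypothetical tweak, not taken from the answer.

```python
import re

source = (
    "function [fcst, spread] = ...\n"
    "        VolFcstMKT(R,...\n"
    "                   tsperyear)\n"
    "\n"
    "if(calibrate)\n"
    "    templen = length(R)\n"
)

# Pattern from the question: the class [\w\n.] has no space or tab in it.
original = re.compile(r"^\s*function (.*?)= [\w\n.]*?\(.*?\)", re.M | re.DOTALL)

# Hypothetical adjustment: allow whitespace before the function name and
# stop the argument list at the first ")".
adjusted = re.compile(r"^\s*function (.*?)= [\w\s.]*?\([^)]*\)", re.M | re.DOTALL)

overshoot = original.search(source).group()
trimmed = adjusted.search(source).group()

print(overshoot.endswith("length(R)"))   # True: the match ran into the body
print(trimmed.endswith("tsperyear)"))    # True: the match ends at the signature
```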
