Matlab regex - replace substring ONLY within angled brackets

Problem description

I would like to find a way to replace the character _ inside string s with itself surrounded by spaces. The only difficult part is that I want the replacement to happen only when the _ is found inside angled brackets. For example (spaces are intentional):

s = 'the quick <_brown _little_fox >, jumped over_the_fence .';

wantedresult = the quick < _ brown _ little _ fox>, jumped over_the_fence.

If there is already space on either side, getting extra spaces is perfectly fine.

I tried the following, but with no luck: regexprep(s, '<[\w ]+(\_)[\w ]+>', ' $1 ');

I think I understand the reason why the above doesn't work, but considering that more than one _ may be inside brackets, I'm not sure how to proceed.

Any advice would be appreciated. Thanks in advance!

Recommended answer

Instead of trying to do it in a single regex, why don't you extract all substrings that sit between < and >, surround every _ character in those substrings with spaces, and then reconstruct the final string? I would use regexp first to find the regions of your string that are enclosed in <>, then do what I mentioned above. So do this first:

[st, en, match] = regexp(s, '<.*?>', 'start', 'end', 'match')

This finds every substring that is enclosed in <>, brackets included. The 'start' and 'end' flags return the indices in s where each match begins and ends. In our case, start tells you where each < character is and end tells you where the > character that closes it is. The 'match' flag returns a cell array of the matched <> substrings themselves. These are stored in st, en and match respectively. Once we have them, let's do a regexprep on match and put spaces before and after the _ characters.

final_match = regexprep(match, '_', ' _ ');
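As a quick check of the two calls above, here is a minimal sketch run on the question's example string. It assumes the string is closed with a quote as shown (the original post leaves the quote open); the index values in the comments follow from counting characters in that exact string.

```matlab
% Example string from the question (closing quote added as an assumption).
s = 'the quick <_brown _little_fox >, jumped over_the_fence .';

% Locate every <...> group: start index, end index, and the matched text.
[st, en, match] = regexp(s, '<.*?>', 'start', 'end', 'match');
% st is 11 (position of '<'), en is 31 (position of '>'),
% match is a 1x1 cell holding '<_brown _little_fox >'.

% regexprep accepts a cell array and pads each _ in every element.
final_match = regexprep(match, '_', ' _ ');
% final_match{1} is '< _ brown  _ little _ fox >'
```

Note that the underscores in over_the_fence never reach regexprep, because only the bracketed substrings are passed to it.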

Now to reconstruct the final string, we first copy the characters from the beginning up to the first < occurrence, then write a loop that pieces everything together, and after the last > character we append all remaining characters up to the end. So something like:

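% Text before the first <...> group
final_string = s(1:st(1)-1);
for idx = 1 : numel(final_match)-1
    % Padded group, then the untouched text up to the next group
    final_string = [final_string final_match{idx}];
    final_string = [final_string s(en(idx)+1:st(idx+1)-1)];
end
% Last padded group, then the rest of the original string
final_string = [final_string final_match{end} s(en(end)+1:end)];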

The first line copies the original string up to, but not including, the first < character. Next, for each <> substring except the last (brackets included), we append the modified substring with spaces around each _ character, followed by the untouched characters that lie between the > of the current substring and the < of the next one. After the loop, we append the last modified <> substring and then the remainder of the original string. If we use the above code with your example, we get:

final_string = 

the quick < _ brown  _ little _ fox >, jumped over_the_fence.

If we modify the string s to:

s =

the quick <_brown _little_fox >, < _jumped _over _the_ fence>.

The output I get with the above code is:

final_string =

the quick < _ brown  _ little _ fox >, <  _ jumped  _ over  _ the _  fence>.

As you can see, every _ character inside the <> groups is now surrounded by spaces. However, this only works if the string contains at least one <> group. If it contains none, st and match are empty, so indexing st(1) and final_match{end} raises an error. As such, you'll need to check whether match (or st or en) has at least one element. If it does, proceed with the above code. If it doesn't, just return the original string.
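The check described above can be folded into a small function. This is only a sketch under stated assumptions: the function name spaceUnderscoresInBrackets and the variable names padded and out are hypothetical, not from the original answer, and the code would live in its own file named spaceUnderscoresInBrackets.m.

```matlab
function out = spaceUnderscoresInBrackets(s)
% Hypothetical helper: pad each _ with spaces, but only inside <...> groups.

% Find every <...> group with its start and end indices.
[st, en, match] = regexp(s, '<.*?>', 'start', 'end', 'match');

% No bracketed group at all: return the input unchanged.
if isempty(match)
    out = s;
    return;
end

% Pad the underscores inside each group.
padded = regexprep(match, '_', ' _ ');

% Rebuild: text before the first group, then each group followed by
% the untouched text up to the next group, then the tail.
out = s(1:st(1)-1);
for idx = 1:numel(padded)-1
    out = [out padded{idx} s(en(idx)+1:st(idx+1)-1)];
end
out = [out padded{end} s(en(end)+1:end)];
end
```

Calling it as spaceUnderscoresInBrackets('no_brackets here') hands back the input untouched, while a string with one or more groups goes through the same reconstruction as before.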