How to use a regular expression to get the positions of substrings?


Problem description

I have a string consisting of 0 and 1 characters. How do I get the positions (start and end index within the input string) of every substring that matches a given pattern: 1*100*11*?
(11* == 111111111..., 00* == 0000000...)

Example:
regexp pattern=1*100*11*

string str="10100011101101"
index1={0,2};//101
index2={2,8};//1000111
index3={6,11};//111011
index4={10,13};//1101

Solution

That isn't how regexes work: they find a match, skip past all the characters that form part of it, and then search again in what is left.

They don't return "all possible matches" for a single input, because successive matches never overlap.
So your example input and pattern will only produce two matches:

101
111011

because the first match is removed, leaving only "00011101101" to scan again.
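
The non-overlapping behaviour described above can be checked with a minimal sketch, assuming .NET's System.Text.RegularExpressions and C# 9+ top-level statements. The variable names (input, start, end) are illustrative and not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "10100011101101";

// Regex.Matches resumes scanning after the end of each match,
// so the matches it returns never overlap.
foreach (Match m in Regex.Matches(input, "1*100*11*"))
{
    int start = m.Index;
    int end = m.Index + m.Length - 1; // inclusive end index
    Console.WriteLine($"index={{{start},{end}}};//{m.Value}");
}
```

For this input the loop reports only {0,2} (101) and {6,11} (111011), which is why the other two spans from the question never show up.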

I suspect that you need to write your own rather more complicated code to find them all, or find the first, discard the first matched character, and then run the regex again until you run out of matches.
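
The "discard the first matched character and run again" idea could be sketched as follows. This is only one possible reading of that suggestion: FindOverlapping is a hypothetical helper, and it relies on the Regex.Match(string, int) overload to restart the search one character past the start of the previous match (C# 9+ top-level statements assumed).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns one inclusive (start, end) pair
// for every start position at which the pattern matches.
static List<(int Start, int End)> FindOverlapping(Regex rex, string input)
{
    var spans = new List<(int Start, int End)>();
    int startAt = 0;
    while (startAt <= input.Length)
    {
        Match m = rex.Match(input, startAt);
        if (!m.Success) break;
        spans.Add((m.Index, m.Index + m.Length - 1));
        startAt = m.Index + 1; // drop only the first matched character
    }
    return spans;
}

var pattern = new Regex("1*100*11*");
foreach (var (start, end) in FindOverlapping(pattern, "10100011101101"))
    Console.WriteLine($"index={{{start},{end}}}");
```

On the example string this yields seven spans: {0,2}, {2,8}, {6,11}, {7,11}, {8,11}, {10,13} and {11,13}. The four spans listed in the question are the ones whose end index lies beyond the end of the previously reported span, so an extra filter on the end index would be needed to reproduce that list exactly.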


You need to learn about RegEx: read the documentation and try some tutorials.
The documentation lists all the functions associated with RegEx.

To test an expression against a string, you can try https://www.debuggex.com/

It may be easier to get the start position and length than the end position, but you can derive the end from those two: end = start + length - 1.


The following hack produces a result similar to your examples:

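// Paste the statements into Main(), or compile as-is with C# 9+ top-level statements.
using System;
using System.Text.RegularExpressions;

string str = "10100011101101";
Regex rex = new Regex(@"1*100*11*");
Console.WriteLine("string str=\"{0}\"", str);
int lastend = -1; // end index of the most recently found match
// Retry the search from every start offset so that overlapping matches are seen.
for (int i = 0; i < str.Length; i++)
{
    Match match = rex.Match(str.Substring(i));
    if (match.Success)
    {
        int pos = match.Index + i; // translate back to an index into str
        int len = match.Length;
        int end = pos + len - 1; // may result in end < pos if len == 0
        // Report a match only if it ends beyond the previous one;
        // this skips repeats and matches that end at the same index.
        if (lastend < end)
        {
            Console.WriteLine("index={{{0},{1}}};//{2}", pos, end, match.Value);
        }
        lastend = end;
    }
}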

How will you explain to the instructor how this works? ;-)

Output:

string str="10100011101101"
index={0,2};//101
index={2,8};//1000111
index={6,11};//111011
index={10,13};//1101

Regards
Andi

