Regex to compare strings and see where the difference is


Question

I am creating a regex to check whether the copyright info at the top of all documents is formatted correctly.

The copyright is long, so my regex is long too.

Let's say that the copyright info looks like:

/*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here.

Programmer:  Tono Nam

/////////////////////////////////////////////////////////////////////////*/

Then I will use the regex:

var pattern = 

@"/\*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here.

Programmer:  (?<ProgammerName>[\w '\.]+)

/////////////////////////////////////////////////////////////////////////\*/";

If I apply the regex to the first text, it gives me a match and everything is great. The problem is when the regex does not match. Let's say a programmer placed an extra / at the top: my regex will no longer match. In this example the error is easy to spot, but the real copyright is much longer and it would be nice to know where the error is. Sometimes there are misspellings too: for example, you might encounter Programer instead of Programmer. Because of that, I have to look through the whole copyright and try to find the error. I think there should be a simpler way of doing what I need.

Edit

If the subject happens to be:

/*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here SOME_MISPELED_WORD.

Programmer:  Tono Nam

/////////////////////////////////////////////////////////////////////////*/

then the regex will not match because of SOME_MISPELED_WORD, so I would like to know the index where the error occurred so that I can look at:

/*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here <-------------- here

instead of the whole thing.

Another example would be if the copyright info is:

/*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here.

Programmer:  Tono Nam

//////////////////////////////////////////////////////////////////////////*/

I would like to get an error at the last line because there is an extra /.

Recommended answer

Finally I have the solution:

Basically we want to know where the regex fails. If we had two strings that do not change, we could compare them and find the character where they differ. In other words, if I had:

var a = "12345";
var b = "1234A";



then we could compare a[0] with b[0], then a[1] with b[1], and so on until we find a difference.
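The character-by-character comparison just described can be sketched as a small helper. This is only an illustration: the class name StringDiff and the method FirstDifference are hypothetical and not part of the original answer. It also handles the case where one string is shorter than the other.

```csharp
using System;

// Hypothetical helper, not from the original answer.
static class StringDiff
{
    // Returns the index of the first differing character, or -1 if the strings are equal.
    // If one string is a prefix of the other, the result is the shorter string's length.
    public static int FirstDifference(string expected, string actual)
    {
        int shared = Math.Min(expected.Length, actual.Length);
        for (int i = 0; i < shared; i++)
        {
            if (expected[i] != actual[i])
                return i;
        }
        return expected.Length == actual.Length ? -1 : shared;
    }
}
```

With the two strings above, StringDiff.FirstDifference("12345", "1234A") returns 4.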

Let's do that!

Let's say our copyright must look like:

/*/////

Copyright content which is a lot goes in here.

Programmer:Tono Nam

Description:This is the description of the file....

/////*/

Let's remove all the things that can vary so we can apply our first example:

/*/////

Copyright content which is a lot goes in here.

Programmer:

Description:

/////*/

Then the only complicated part is creating a regex that removes all the things that can vary, so that we end up with that string. That pattern will be:

 var regexPattern = @"(?s)(/\*/*.+Programmer:)(?<name>[^\r\n]*?)(\r.*Description:)(?<desc>[^\r\n]*)(\r.*?/*\*/)";
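To see what the two named groups capture, the pattern can be tried on its own. The sketch below is only an illustration: the class HeaderFields and the method TryExtract are hypothetical names. Because the pattern contains a literal \r, it assumes the header uses \r\n line endings.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original answer.
static class HeaderFields
{
    // Same pattern as above; the literal \r means it expects \r\n line endings.
    const string RegexPattern =
        @"(?s)(/\*/*.+Programmer:)(?<name>[^\r\n]*?)(\r.*Description:)(?<desc>[^\r\n]*)(\r.*?/*\*/)";

    // Reads the variable parts of a header. Returns false when the pattern does not match.
    public static bool TryExtract(string header, out string name, out string desc)
    {
        Match m = Regex.Match(header, RegexPattern);
        name = m.Success ? m.Groups["name"].Value : null;
        desc = m.Success ? m.Groups["desc"].Value : null;
        return m.Success;
    }
}
```

For a header whose Programmer line reads "Programmer:Tono Nam", the name group holds "Tono Nam"; the desc group holds everything after "Description:" up to the end of that line.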



With that pattern we will be able to turn:

/*/////

Copyright content which is a lot goes in here.

Programmer:Tono Nam bla bla bla

Description:THIS IS A DIFFERENT DESCRIPTION

/////*/

into

/*/////

Copyright content which is a lot goes in here.

Programmer:

Description:

/////*/

Now we have two strings to compare!

// the subject we want to test
            var subject =
@"/*/////

Copyright content which is a lot goes in here.

Programmer:Tono Nam

Description:This is the description of the file....

/////*/";

            // the expected header; in a real program this should be a constant because it should never change
            var pattern =
@"/*/////

Copyright content which is a lot goes in here.

Programmer:

Description:

/////*/";

            // we use this pattern to turn the first subject into the second if we can
            var regexPattern = @"(?s)(/\*/*.+Programmer:)(?<name>[^\r\n]*?)(\r.*Description:)(?<desc>[^\r\n]*)(\r.*?/*\*/)";

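            // note: $1 means group 1. In .NET the unnamed groups are numbered first (1, 2, 3),
            // so "$1$2$3" keeps the fixed text and drops the named groups name and desc
            var newSubject = Regex.Replace(subject, regexPattern, "$1$2$3");

            // at this point, if newSubject == pattern we know that the header is formatted correctly!

            // Let's see where they are different!
            // only index positions that exist in both strings, to avoid an IndexOutOfRangeException
            var shared = Math.Min(pattern.Length, newSubject.Length);
            for (int i = 0; i < shared; i++)
            {
                if (pattern[i] != newSubject[i])
                {
                    throw new Exception("There is a problem at index " + i);
                }
            }

            // a missing or extra tail is also a difference
            if (pattern.Length != newSubject.Length)
            {
                throw new Exception("There is a problem at index " + shared);
            }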
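A raw character index is hard to use in a long header, so it may help to convert it into a line and column before reporting it. The helper below is a hypothetical addition and not part of the original answer; the names DiffLocation and ToLineAndColumn are assumptions made for illustration.

```csharp
using System;

// Hypothetical helper, not from the original answer.
static class DiffLocation
{
    // Converts a zero-based character index into a one-based line and column.
    // Lines are counted by '\n', so both \n and \r\n line endings work.
    public static void ToLineAndColumn(string text, int index, out int line, out int column)
    {
        line = 1;
        column = 1;
        for (int i = 0; i < index && i < text.Length; i++)
        {
            if (text[i] == '\n')
            {
                line++;
                column = 1;
            }
            else
            {
                column++;
            }
        }
    }
}
```

The index found by the comparison can be passed in together with the stripped subject, and the resulting line and column included in the exception message.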

In the comparison above nothing is thrown, because my subject is formatted correctly. But if I place an extra / at the beginning, look what happens (I highlighted the 6 / characters; there should have been 5):
