Regular Expression formula problem


Problem description

I have 26 files with text inside them and I want to remove some [special groups of words] from all of them. Right now I have one specific group of text to remove.
I'm comfortable with solutions other than regex, but I would like to find a solution in this direction (if possible).
---------------------------------------
sample:
< I>(î áëþäàõ â ðåñòîðàíå)< /I>
< I>(÷åã,î-ë. — of)< /I>
< I>n< /I>
< I>áèáë.< /I>
---------------------------------------

I am thinking of using regular expressions on it, but I need a regex pattern that finds < I>, then any words inside, and stops after finding < /I>.
I know I can use @"< I>\w*", but beyond that I can't work out which combination would do it...

// Note: there is no space between < and I> in the real data;
// I added it here because the tag interferes with this HTML page.
if (line[1].Contains("< I>"))
{
    string[] segment = Regex.Split(line[1], "< I>");
}


(PS: my English is not as good as a native speaker's, and my level in C# is not so advanced. Thank you for understanding.)

--- continued:
I found a nice regex snippet that looks promising:

"[^"]*"  [solution to match any string within double quotes]



Right now I am delving into regex, and it will take some time until I am familiar with it. Until then this case will unfortunately remain open. In the end I will close it. If you find something useful in the meantime, I will look it over. Thanks.
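
The quoted-string snippet above works by allowing "any character except the closing delimiter" between the delimiters, and the same idea can be carried over to the tags in the question. The following is only a minimal sketch under stated assumptions: the real tags are written without the extra space (as the asker notes), the text between the tags never contains a '<', and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // <I>, then any run of characters other than '<', then </I>.
    // Assumption: the text between the tags never contains a '<'.
    public static readonly Regex ItalicSpan = new Regex(@"<I>[^<]*</I>");

    // Hypothetical helper: returns the text with every <I>...</I> span removed.
    public static string RemoveItalicSpans(string text)
    {
        return ItalicSpan.Replace(text, string.Empty);
    }

    public static void Main()
    {
        string line = "word <I>(note)</I> more <I>n</I> text";
        Console.WriteLine(RemoveItalicSpans(line));
    }
}
```

Because `[^<]` also matches line breaks, this sketch would remove spans that run over several lines as well. Note that the spaces surrounding a removed span stay in place.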

Recommended answer

I hope this gives you the general idea for doing the job:
string pattern = @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>";
Regex regex = new Regex(pattern, RegexOptions.Multiline);

StringBuilder sb = new StringBuilder();
sb.Append(@"<I>abc</I>");
sb.Append(@"<I>def</I>");
sb.Append(@"<I>gfi</I>");
sb.Append(@"<I>jkl</I>");
var input = sb.ToString();

// the pattern matches every tag, so the matches alternate: opening tag, closing tag, ...
var matches = regex.Matches(input);
for (int i = 0; i < matches.Count - 1; i += 2)
{
    // the text sits between the end of the opening tag and the start of the closing tag
    int start = matches[i].Index + matches[i].Length;
    int length = matches[i + 1].Index - start;
    Console.WriteLine(input.Substring(start, length));
}



Hope it helps.


Hello _q12_,

your question is a bit ambiguous. Assuming that you have a predefined list of redundant entries, Regex does not help a lot. Nonetheless, the following might help:


static void Main(string[] args)
{
    List<string> redundant = new List<string>()
    {
        "abc",
        "xyz",
        "...",
    };
    string file = "datafileX.txt";

    string data = File.ReadAllText(file);
    data = ReplaceRedundantContent(data, redundant);
    File.WriteAllText(file, data);

}

private static string ReplaceRedundantContent(string data, List<string> redundant)
{
    string result = data;
    foreach (string remove in redundant)
    {
        // all characters to be taken literally
        string pattern = Regex.Escape("<I>"+remove+"</I>");
        result = Regex.Replace(result, pattern, "");
    }
    return result;
}



If you want to search for any text between <I> and </I>, you may use the following pattern:

"<I>.*?</I>"


This matches all text between the tags while taking as little as possible, as indicated by the question mark. Without the question mark, the match would be "greedy", meaning that as much as possible is taken.
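
To make the difference concrete, here is a small sketch that removes the tagged spans once with the lazy pattern and once with a greedy variant. The class and method names are hypothetical, and RegexOptions.Singleline is an added assumption so that '.' also matches line breaks, in case a span runs over more than one line.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVersusGreedyDemo
{
    // Lazy: each <I> is paired with the nearest following </I>.
    public static string RemoveLazy(string text)
    {
        return Regex.Replace(text, @"<I>.*?</I>", string.Empty, RegexOptions.Singleline);
    }

    // Greedy: the first <I> is paired with the LAST </I> in the text.
    public static string RemoveGreedy(string text)
    {
        return Regex.Replace(text, @"<I>.*</I>", string.Empty, RegexOptions.Singleline);
    }

    public static void Main()
    {
        string line = "one <I>a</I> two <I>b</I> three";
        Console.WriteLine(RemoveLazy(line));   // keeps " two " between the spans
        Console.WriteLine(RemoveGreedy(line)); // also swallows " two "
    }
}
```

With the greedy variant, everything from the first opening tag to the last closing tag is removed, including the ordinary text in between, which is rarely what is wanted when cleaning files.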

Cheers

Andi


_q12_ wrote: "Right now I dont care if it can be done with or without regex, i want it simple - not complicated."

Okay, now that you've opened that door :) Try this:
private string testString = @"
    < I>(î áëþäàõ â ðåñòîðàíå)< /I>
    < I>(÷åã,î-ë. — of)< /I>
    < I>n< /I>
    < I>áèáë.< /I>";

// split on both the opening and the closing tag, so neither ends up in the results
private string[] stringSeparators = new string[] { "< I>", "< /I>" };

private List<string> cleanStrings = new List<string>();

// assumes you have a Button on a Form named 'button1'
// with this Click EventHandler "wired-up"
private void button1_Click(object sender, EventArgs e)
{
    string[] splitTestString = testString.Split(stringSeparators, StringSplitOptions.RemoveEmptyEntries);

    foreach (string theStr in splitTestString)
    {
        string clean = theStr.Trim();

        // skip the whitespace-only pieces left between a closing tag and the next opening tag
        if (clean.Length == 0) continue;

        cleanStrings.Add(clean);

        // seeing is believing
        Console.WriteLine(clean);
    }
}

p.s. I have no doubt one of our "virtuosos" here will simplify this even further!

