How can I find repeated words in a string in C#?


Problem description







Hi,

I have a sequence of characters immediately followed by the same sequence, and I want the program to remove the repeated words. For example, for the string "abcdabcd" I need "abcd".

How can I do that?

Thanks in advance.

Solutions

Regex is your friend!

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;
namespace ConsoleApplication16
{
  class Program
  {
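    // Sample inputs; "abcdabc" contains no doubled run.
    static readonly string[] Tests = { "abcdabcd", "xabcdabcd", "abcdabc", "xaaabcdabcd" };
    // (.+) captures one or more characters; \1 requires the same text again right after.
    static readonly Regex FindDup = new Regex(@"(.+)\1", RegexOptions.IgnoreCase);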
    static void Main(string[] args)
    {
      foreach (string t in Tests)
      {
        MatchCollection allMatches = FindDup.Matches(t);
        Trace.WriteLine(string.Format("{0}: {1}", t, allMatches.Count));
      }
    }
  }
}


Results:

abcdabcd: 1
xabcdabcd: 1
abcdabc: 0
xaaabcdabcd: 2



The allMatches collection also tells you which strings matched.

EDIT: Added in response to Collin's comments:
Each Match in allMatches holds information about the doubled text in its Groups property. Groups[0] contains the whole matched string (both copies), and Groups[1] contains the single copy.
If you change the loop above to:

foreach (string t in Tests)
{
  MatchCollection allMatches = FindDup.Matches(t);
  Trace.WriteLine(string.Format("{0}: {1}", t, allMatches.Count));
  foreach (Match item in allMatches)
  {
    Trace.WriteLine(string.Format(@"  ""{0}"" is doubled", item.Groups[1]));
  }
}


You'll see that.
If the objective is to remove the duplications, the Replace() method of the Regex will do the job:

string t2 = FindDup.Replace(t, string.Empty);
Trace.WriteLine(string.Format(@"Final: ""{0}""", t2));


Of course, a different string can be substituted for string.Empty.
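
Replacing with string.Empty drops both copies of each doubled run, so "abcdabcd" becomes an empty string. If the goal is to keep one copy, as the question asks, one option is to substitute the captured group with "$1". This is a minimal sketch that reuses the same pattern; the names DedupSketch and CollapseDoubles are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class DedupSketch
{
    // Same pattern as above: group 1 is one copy of the doubled text.
    static readonly Regex FindDup = new Regex(@"(.+)\1", RegexOptions.IgnoreCase);

    // Hypothetical helper: replaces each doubled run with a single copy of it.
    public static string CollapseDoubles(string input)
    {
        return FindDup.Replace(input, "$1");
    }

    static void Main()
    {
        foreach (string t in new[] { "abcdabcd", "xabcdabcd", "abcdabc", "xaaabcdabcd" })
        {
            Console.WriteLine("{0} -> {1}", t, CollapseDoubles(t));
        }
    }
}
```

Replace() makes a single pass, so "xaaabcdabcd" comes out as "xaabcd", which still contains a doubled "a". If every doubling has to go, the call would need to be repeated until the string stops changing.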


If the string consists of only one word repeated twice, you can try the following:

string str = "abcdabcd";
string temp = str.Substring(0, str.Length / 2);
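
The Substring call assumes the input really is one word written twice. Here is a small sketch that checks this before halving; the names HalfSketch and TryHalve are hypothetical.

```csharp
using System;

static class HalfSketch
{
    // Hypothetical helper: returns true and the single copy when the string
    // is exactly one sequence written twice; otherwise returns false.
    public static bool TryHalve(string str, out string half)
    {
        half = null;
        if (string.IsNullOrEmpty(str) || str.Length % 2 != 0)
            return false;

        int mid = str.Length / 2;
        // Ordinal comparison of the first half against the second half.
        if (string.CompareOrdinal(str, 0, str, mid, mid) != 0)
            return false;

        half = str.Substring(0, mid);
        return true;
    }

    static void Main()
    {
        string half;
        Console.WriteLine(TryHalve("abcdabcd", out half) ? half : "(not doubled)");
        Console.WriteLine(TryHalve("abcdabc", out half) ? half : "(not doubled)");
    }
}
```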


I am making some assumptions about what you are looking for (your question isn't entirely clear), but I think this does the trick.

You can use a System.Text.StringBuilder and then split the original string using the built-up string. Once all of the parsed items are empty strings, the built-up string is your repeated string and you break out of the loop.

// Needs "using System.Linq;" at the top of the file for Any().
string val = "abcabcabc";
System.Text.StringBuilder sb = new System.Text.StringBuilder();
string result = string.Empty;
foreach (var c in val)
{
   sb.Append(c);
   var parsed = val.Split(new string[] {sb.ToString()}, StringSplitOptions.None);
   var stringFound = !parsed.Any(s => s != string.Empty);//All items are empty
   
   if (stringFound)
   {
      result = sb.ToString();
      break;
   }
}



After this runs, the repeated string will be in result. Note that the algorithm breaks out at the first occurrence, because the criteria are also met when sb contains all the characters of the original string. This algorithm finds a sequence repeated any number of times, but it assumes the string contains only the repeated sequence.
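
The same idea can be sketched without Split by trying only the prefixes whose length divides the total length and comparing each chunk against the prefix. This is an alternative take, not the code above; the names RepeatSketch and FindRepeatedUnit are hypothetical. Like the loop above, it returns the whole string when nothing shorter repeats.

```csharp
using System;

static class RepeatSketch
{
    // Hypothetical helper: returns the shortest prefix that, repeated,
    // rebuilds the whole string; falls back to the whole string.
    public static string FindRepeatedUnit(string val)
    {
        if (string.IsNullOrEmpty(val))
            return string.Empty;

        for (int len = 1; len <= val.Length / 2; len++)
        {
            // A repeating unit must divide the string evenly.
            if (val.Length % len != 0)
                continue;

            bool allMatch = true;
            for (int i = len; i < val.Length && allMatch; i += len)
            {
                // Compare the chunk starting at i with the prefix of the same length.
                allMatch = string.CompareOrdinal(val, 0, val, i, len) == 0;
            }

            if (allMatch)
                return val.Substring(0, len);
        }

        return val;
    }

    static void Main()
    {
        Console.WriteLine(FindRepeatedUnit("abcabcabc"));
        Console.WriteLine(FindRepeatedUnit("abcdabcd"));
    }
}
```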

