Find longest substring in an array of strings and remove it from all the elements in the array


Problem description


I have this array, for example (the size is variable):

x = ["10111", "10122", "10250", "10113"]

I need to find the longest string that is a substring of every array element ("10" in this case); it need not be a prefix of the strings. I then have to remove it from all the strings. The output for this example would be:

x = ["111", "122", "250", "113"] // common value = "10"

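Solution

This extension method returns the substrings common to all strings, longest first; substrings of equal length are ordered by how often they occur. Note that "1" is also contained in every string, and even more often than "10", but it is shorter. (C# only):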

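using System;
using System.Collections.Generic;
using System.Linq;

public static class StringExtensions
{
    // Returns the substrings shared by every input string, longest first.
    // Ties in length are ordered by the number of occurrences in the last string.
    public static IEnumerable<string> GetMostCommonSubstrings(this IList<string> strings)
    {
        if (strings == null)
            throw new ArgumentNullException("strings");
        if (!strings.Any() || strings.Any(s => string.IsNullOrEmpty(s)))
            throw new ArgumentException("The list must not be empty or contain null or empty strings", "strings");

        var allSubstrings = new List<List<string>>();
        for (int i = 0; i < strings.Count; i++)
        {
            var substrings = new List<string>();
            string str = strings[i];
            // c runs over every start index, including the last character;
            // the original bound (str.Length - 1) skipped that final position.
            for (int c = 0; c < str.Length; c++)
            {
                for (int cc = 1; c + cc <= str.Length; cc++)
                {
                    string substr = str.Substring(c, cc);
                    // Keep everything from the first string; afterwards keep only
                    // substrings that survived the previous string's filter.
                    if (allSubstrings.Count < 1 || allSubstrings.Last().Contains(substr))
                        substrings.Add(substr);
                }
            }
            allSubstrings.Add(substrings);
        }
        if (allSubstrings.Last().Any())
        {
            var mostCommon = allSubstrings.Last()
                .GroupBy(str => str)
                .OrderByDescending(g => g.Key.Length)
                .ThenByDescending(g => g.Count())
                .Select(g => g.Key);
            return mostCommon;
        }
        return Enumerable.Empty<string>();
    }
}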

Now it's easy:

string[] x = new[] { "10111", "10122", "10250", "10113" };
string mostCommonSubstring = x.GetMostCommonSubstrings().FirstOrDefault();
if (mostCommonSubstring != null)
{
    for (int i = 0; i < x.Length; i++)
        x[i] = x[i].Replace(mostCommonSubstring, "");
}
Console.Write(string.Join(", ", x));

Output:

111, 122, 250, 113
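One caveat about the removal step: `string.Replace` removes every occurrence of the common substring, not just one. If only the first occurrence should go, something like the following sketch could be used instead. `RemoveFirst` and the class `RemoveFirstExample` are hypothetical names introduced here for illustration; they are not part of the original answer.

```csharp
using System;

public static class RemoveFirstExample
{
    // Hypothetical helper: removes only the first occurrence of value
    // from source, using an ordinal (culture-independent) search.
    public static string RemoveFirst(string source, string value)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));
        if (string.IsNullOrEmpty(value))
            return source; // nothing to remove

        int index = source.IndexOf(value, StringComparison.Ordinal);
        // IndexOf returns -1 when value does not occur in source.
        return index < 0 ? source : source.Remove(index, value.Length);
    }
}
```

For example, `"10110".Replace("10", "")` yields `"1"`, while `RemoveFirstExample.RemoveFirst("10110", "10")` yields `"110"`. Which behavior is wanted depends on the data; the question does not say.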



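Edit: If you just want to find the longest common substring without taking the frequency of occurrence into account, you can use this optimized approach using a HashSet<string>. It makes a single pass over the list, so it is O(n) in the number of strings, although each string of length m still contributes m(m+1)/2 substrings: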

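// Both methods are extension methods, so they must live in a static class
// such as StringExtensions above.
public static string GetLongestCommonSubstring(this IList<string> strings)
{
    if (strings == null)
        throw new ArgumentNullException("strings");
    if (!strings.Any() || strings.Any(s => string.IsNullOrEmpty(s)))
        throw new ArgumentException("The list must not be empty or contain null or empty strings", "strings");

    // Start with every substring of the first string, then keep only those
    // that also occur in each of the remaining strings.
    var commonSubstrings = new HashSet<string>(strings[0].GetSubstrings());
    foreach (string str in strings.Skip(1))
    {
        commonSubstrings.IntersectWith(str.GetSubstrings());
        if (commonSubstrings.Count == 0)
            return null;
    }
    // Equally long substrings are not returned in any guaranteed order.
    return commonSubstrings.OrderByDescending(s => s.Length).First();
}

public static IEnumerable<string> GetSubstrings(this string str)
{
    if (string.IsNullOrEmpty(str))
        throw new ArgumentException("str must not be null or empty", "str");

    // c runs over every start index, including the last character;
    // the original bound (str.Length - 1) skipped that final position.
    for (int c = 0; c < str.Length; c++)
    {
        for (int cc = 1; c + cc <= str.Length; cc++)
        {
            yield return str.Substring(c, cc);
        }
    }
}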

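Use it like this:

string[] x = new[] { "101133110", "101233210", "102533010", "101331310" };
// "10" is expected here; note that "33" is equally long and also occurs in
// every string, and HashSet<string> does not guarantee which tie comes first.
string longestCommon = x.GetLongestCommonSubstring();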
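If the tie order matters, or if materializing every substring in a set is too costly, a different technique may help: take candidates only from the shortest string, try them longest first, and return the first one that every string contains. This is a sketch of that alternative, not the answer's method; `FindLongestCommon` and `LongestCommonSubstringSketch` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LongestCommonSubstringSketch
{
    // Hypothetical alternative: candidates come from the shortest string,
    // longest first, so the first hit is a longest common substring.
    public static string FindLongestCommon(IList<string> strings)
    {
        if (strings == null)
            throw new ArgumentNullException(nameof(strings));
        if (strings.Count == 0 || strings.Any(s => string.IsNullOrEmpty(s)))
            throw new ArgumentException(
                "The list must not be empty or contain null or empty strings",
                nameof(strings));

        // OrderBy is a stable sort, so the earliest of the shortest strings is used.
        string shortest = strings.OrderBy(s => s.Length).First();

        for (int length = shortest.Length; length > 0; length--)
        {
            for (int start = 0; start + length <= shortest.Length; start++)
            {
                string candidate = shortest.Substring(start, length);
                // string.Contains(string) performs an ordinal comparison.
                if (strings.All(s => s.Contains(candidate)))
                    return candidate;
            }
        }
        return null; // the strings share no character at all
    }
}
```

With this sketch, ties are resolved deterministically in favor of the leftmost candidate in the shortest string, so both sample arrays in this answer give "10". The worst case is still expensive for long strings, so it is only a reasonable fit for short inputs like the ones shown.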
