Maximum number of occurrences of a character in an array of strings

Problem description

In C#, given the array:

string[] myStrings = new string[] {
  "test#test",
  "##test",
  "######", // Winner (outputs 6)
};

How can I find the maximum number of times the character # appears in a single string?

My current solution is:

int maxOccurrences = 0;
foreach (var myString in myStrings)
{
    var occurrences = myString.Count(x => x == '#');
    if (occurrences > maxOccurrences)
    {
        maxOccurrences = occurrences;
    }
}

return maxOccurrences;

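Is there a simpler way, using LINQ, that can act directly on the myStrings[] array?

And can this be made into an extension method that works on any IEnumerable<string>?

Solution

First of all, let's project your strings into a sequence of match counts:

myStrings.Select(s => s.Count(c => c == '#')) // {1, 2, 6} in your example

Then pick the maximum value: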

int maximum = myStrings
    .Select(s => s.Count(x => x == '#'))
    .Max(); // 6 in your example

Let's make an extension method:

public static int CountMaximumOccurrencesOf(this IEnumerable<string> strings, char ch)
{
    return strings
        .Select(s => s.Count(c => c == ch))
        .Max();
}
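
As a quick check, the extension method can be dropped into a small console program. This is only a sketch: the class names StringSequenceExtensions and Program and the second sample list are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class; extension methods must live in a static class.
public static class StringSequenceExtensions
{
    // Same method as above: the highest per-string count of ch.
    public static int CountMaximumOccurrencesOf(this IEnumerable<string> strings, char ch)
    {
        return strings
            .Select(s => s.Count(c => c == ch))
            .Max();
    }
}

public static class Program
{
    public static void Main()
    {
        string[] myStrings = { "test#test", "##test", "######" };
        Console.WriteLine(myStrings.CountMaximumOccurrencesOf('#')); // prints 6

        // Works on any IEnumerable<string>, not just arrays.
        var list = new List<string> { "a-b", "--" };
        Console.WriteLine(list.CountMaximumOccurrencesOf('-')); // prints 2
    }
}
```

Because the method is declared on IEnumerable<string>, the same call works for arrays, lists, and lazily evaluated queries.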

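However, there is a big caveat: what C# calls a char is not what you would call a character in your language. This has been discussed at length in other posts, for example "Fastest way to split a huge text into smaller chunks" and "How can I perform a Unicode aware character by character comparison?", so I won't repeat everything here. To be "Unicode aware" you need to make your code more complicated (please note that this code was written here and is untested):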

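// Requires: using System.Collections.Generic; using System.Globalization;
// Declared with "this string" (inside a static class) so that s.EnumerateCharacters() compiles.
private static IEnumerable<string> EnumerateCharacters(this string s)
{
    var enumerator = StringInfo.GetTextElementEnumerator(s.Normalize());
    while (enumerator.MoveNext())
        yield return enumerator.GetTextElement(); // current text element as a string
}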
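
To see why a char is not always a user-perceived character, here is a small illustrative sketch. The two sample strings are assumptions chosen for the demonstration: one uses a combining accent, the other a code point outside the Basic Multilingual Plane.

```csharp
using System;
using System.Globalization;

public static class Program
{
    public static void Main()
    {
        // "à" written as 'a' followed by U+0300 COMBINING GRAVE ACCENT.
        string decomposed = "a\u0300";
        // U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores it as a surrogate pair.
        string emoji = "\U0001F600";

        Console.WriteLine(decomposed.Length);                               // 2 (char values)
        Console.WriteLine(new StringInfo(decomposed).LengthInTextElements); // 1 (text element)
        Console.WriteLine(emoji.Length);                                    // 2 (char values)
        Console.WriteLine(new StringInfo(emoji).LengthInTextElements);      // 1 (text element)
    }
}
```

Counting with s.Count(c => c == ch) works on char values, so it cannot match either of these "characters" as a single unit; enumerating text elements can.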

Then change our original code to:

public static int CountMaximumOccurrencesOf(this IEnumerable<string> strings, string character)
{
    return strings
        .Select(s => s.EnumerateCharacters().Count(c => String.Equals(c, character, StringComparison.CurrentCulture)))
        .Max();
}

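Note that Max() on its own requires a non-empty collection; on an empty sequence of int it throws InvalidOperationException (use DefaultIfEmpty() if the collection may be empty and that is not an error). To avoid deciding arbitrarily what to do in that situation (throw an exception or just return 0), you can make this method less specialized and leave that responsibility to the caller: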

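// Requires: using System.Diagnostics; (for Debug.Assert)
public static IEnumerable<int> CountOccurrencesOf(this IEnumerable<string> strings,
    string character,
    StringComparison comparison = StringComparison.CurrentCulture)
{
    // The argument must be exactly one text element.
    Debug.Assert(character.EnumerateCharacters().Count() == 1);

    // One count per input string; the caller aggregates (for example with Max()).
    return strings
        .Select(s => s.EnumerateCharacters().Count(c => String.Equals(c, character, comparison)));
}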

Use it like this:

var maximum = myStrings.CountOccurrencesOf("#").Max();

If you need it to be case-insensitive:

var maximum = myStrings.CountOccurrencesOf("à", StringComparison.CurrentCultureIgnoreCase)
    .Max();
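
Putting the pieces together, a self-contained sketch of the text-element version might look like the following. The class names are assumptions, the helper is made public here so the example compiles on its own, and StringComparison.Ordinal is passed explicitly only to keep the printed results independent of the machine's current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical container class for the two extension methods.
public static class UnicodeCountingExtensions
{
    // Yields each text element (user-perceived character) of s as a string.
    public static IEnumerable<string> EnumerateCharacters(this string s)
    {
        var enumerator = StringInfo.GetTextElementEnumerator(s.Normalize());
        while (enumerator.MoveNext())
            yield return enumerator.GetTextElement();
    }

    // One count per input string; the caller decides how to aggregate.
    public static IEnumerable<int> CountOccurrencesOf(this IEnumerable<string> strings,
        string character,
        StringComparison comparison = StringComparison.CurrentCulture)
    {
        return strings.Select(s => s.EnumerateCharacters()
            .Count(c => string.Equals(c, character, comparison)));
    }
}

public static class Program
{
    public static void Main()
    {
        string[] myStrings = { "test#test", "##test", "######" };
        Console.WriteLine(myStrings.CountOccurrencesOf("#", StringComparison.Ordinal).Max()); // prints 6

        // Caller's choice: treat "no strings" as 0 instead of letting Max() throw.
        var none = Enumerable.Empty<string>();
        Console.WriteLine(none.CountOccurrencesOf("#", StringComparison.Ordinal).DefaultIfEmpty().Max()); // prints 0
    }
}
```

Leaving Max() to the caller means the empty-sequence policy (throw, or fall back to 0 via DefaultIfEmpty()) is decided where the context is known.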

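As you can now imagine, this kind of comparison is not limited to some esoteric languages; it also applies to the invariant culture (which is based on English but not tied to any country or region). For strings that must always be compared using the invariant culture, specify StringComparison.InvariantCulture. Don't forget that you may also need to call String.Normalize() on the input character.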
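
The point about normalizing the input character can be illustrated with a minimal sketch. It assumes the runtime's default globalization settings, and the two spellings of "à" are sample values chosen for the demonstration.

```csharp
using System;

public static class Program
{
    public static void Main()
    {
        string precomposed = "\u00E0"; // "à" as a single code point
        string decomposed = "a\u0300"; // 'a' + COMBINING GRAVE ACCENT

        // Ordinal comparison looks at raw UTF-16 code units, so the two spellings differ.
        Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal)); // False

        // After normalizing both sides to the same form (Normalize() defaults to form C), they match.
        Console.WriteLine(string.Equals(precomposed.Normalize(), decomposed.Normalize(), StringComparison.Ordinal)); // True
    }
}
```

Since EnumerateCharacters normalizes each input string, passing a normalized character argument keeps both sides of the comparison in the same form.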
