C# - Split Fully Uppercase String Into Separate Words (No Spaces)

Problem Description

I'm currently working on a project where I need to separate individual words from a string. The catch is that all the words in the string are capitalized and there are no spaces. The following is an example of the kind of input the program receives:

"COMPUTERFIVECODECOLOR"

This should be split into the following result:

"COMPUTER" "FIVE" "CODE" "COLOR"

So far, I have been using the following method to split my strings (and it has worked for all scenarios except this edge case):

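using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Splits each entry on capitalized-word boundaries, e.g. "ComputerFive" -> "Computer", "Five".
// The pattern needs lowercase letters after each capital, so an all-uppercase
// string such as "COMPUTERFIVECODECOLOR" is returned unsplit.
private static List<string> NormalizeSections(List<string> wordList)
{
    var modifiedList = new List<string>();
    foreach (var word in wordList)
    {
        // The capturing group keeps each matched word in the Regex.Split output;
        // the empty strings left between adjacent matches are removed.
        var split = Regex.Split(word, @"(\p{Lu}\p{Ll}+)").ToList();
        split.RemoveAll(i => i == "");

        modifiedList.AddRange(split);
    }
    return modifiedList;
}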
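To see why the method above misses this case, here is a small sketch that runs the same `Regex.Split` pattern on a mixed-case string and on the all-uppercase input (the `RegexDemo` class name is hypothetical). `\p{Lu}\p{Ll}+` requires at least one lowercase letter after each capital, so the all-uppercase string produces no match and comes back as a single element.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Same pattern as NormalizeSections: one uppercase letter followed by lowercase letters.
    // The capturing group makes Regex.Split include the matched words in its output,
    // and the empty strings between adjacent matches are filtered out.
    public static string[] SplitPascalCase(string word)
    {
        return Regex.Split(word, @"(\p{Lu}\p{Ll}+)")
                    .Where(part => part != "")
                    .ToArray();
    }
}
```

With this, `RegexDemo.SplitPascalCase("ComputerFiveCodeColor")` yields Computer, Five, Code, Color, while `RegexDemo.SplitPascalCase("COMPUTERFIVECODECOLOR")` yields the unchanged input: case alone carries no word boundaries here, so a dictionary of valid words is needed.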

If anyone has any ideas on how to handle this, I would be more than happy to hear them. Also, please let me know if I can provide additional information.

Recommended Answer

I am making some assumptions about how you want to search for matching words. First, at a given character index, preference is given to the longest matching word in the dictionary. Second, if no word is found at a given character index, we move on to the next character and search again.
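The two rules above can be sketched without a trie, which makes them easy to test in isolation. This is a minimal sketch assuming case-insensitive matching against a list of dictionary words; the `GreedySplitter` class name is hypothetical. At each index it tries the longest possible substring first and shortens it until a dictionary word matches, and it skips a single character when nothing matches. The cost is up to `maxLen` substring lookups per position, which a trie reduces by abandoning a position as soon as no dictionary word shares the prefix read so far.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GreedySplitter
{
    // Greedy longest-match segmentation: at each index take the longest dictionary
    // word starting there; if none starts there, skip one character.
    public static List<string> Split(string input, IEnumerable<string> dictionary)
    {
        var words = new HashSet<string>(dictionary, StringComparer.OrdinalIgnoreCase);
        int maxLen = words.Count == 0 ? 0 : words.Max(w => w.Length);
        var result = new List<string>();

        int i = 0;
        while (i < input.Length)
        {
            string match = null;
            // Try the longest candidate first so longer words win over their prefixes.
            for (int len = Math.Min(maxLen, input.Length - i); len > 0; len--)
            {
                var candidate = input.Substring(i, len);
                if (words.Contains(candidate))
                {
                    match = candidate;
                    break;
                }
            }

            if (match == null)
            {
                i++; // no word starts here: move on to the next character
            }
            else
            {
                result.Add(match);
                i += match.Length;
            }
        }
        return result;
    }
}
```

For example, `GreedySplitter.Split("COMPUTERFIVECODECOLOR", new[] { "COMPUTER", "COMPUTE", "FIVE", "CODE", "COLOR" })` should return COMPUTER, FIVE, CODE, COLOR, the result the question asks for.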

The implementation below uses a Trie to index the dictionary of all valid words. Rather than looping through each word in the dictionary, we then progress through each character in the input string, looking for the longest word.

I lifted the implementation of the Trie in C# from this very handy SO answer: https://stackoverflow.com/a/6073004

Fixed a bug in the Trie that occurred when adding a word that is a substring of an existing word, e.g. Emergency then Emerge.
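The Trie fix concerns words that are prefixes of other words, and it is easiest to see in isolation. Inserting EMERGENCY first creates the nodes for E-M-E-R-G-E, so inserting EMERGE afterwards finds every node already present; if the end-of-word marker were only set when a node is created, EMERGE would silently be lost. The hypothetical `MiniTrie` below uses plain `char` keys instead of the `Letter` struct and always marks the final node, which is the same idea as the `if (len == word.Length)` branch in the answer's `Trie` constructor.

```csharp
using System.Collections.Generic;

public class MiniTrie
{
    private class Node
    {
        public readonly Dictionary<char, Node> Edges = new Dictionary<char, Node>();
        public string Word; // non-null if a dictionary word ends at this node
    }

    private readonly Node root = new Node();

    public void Add(string word)
    {
        var node = root;
        foreach (var c in word)
        {
            if (!node.Edges.TryGetValue(c, out var next))
            {
                next = new Node();
                node.Edges.Add(c, next);
            }
            node = next;
        }
        // Mark the end of the word even when every node already existed,
        // e.g. adding EMERGE after EMERGENCY.
        node.Word = word;
    }

    public bool Contains(string word)
    {
        var node = root;
        foreach (var c in word)
        {
            if (!node.Edges.TryGetValue(c, out node)) return false;
        }
        return node.Word != null;
    }
}
```

After `Add("EMERGENCY")` followed by `Add("EMERGE")`, both `Contains` calls return true, while `Contains("EMERG")` stays false because no word ends at that node.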

The code is available on DotNetFiddle.

using System;
using System.Collections.Generic;

public class Program
{
    public static void Main()
    {

        var words = new[] { "COMPUTE", "FIVE", "CODE", "COLOR", "PUT", "EMERGENCY", "MERGE", "EMERGE" };

        var trie = new Trie(words);

        var input = "COMPUTEEMERGEFIVECODECOLOR";

        for (var charIndex = 0; charIndex < input.Length; charIndex++)
        {
            var longestWord = FindLongestWord(trie.Root, input, charIndex);

            if (longestWord == null)
            {
                Console.WriteLine("No word found at char index {0}", charIndex);
            }
            else
            {
                Console.WriteLine("Found {0} at char index {1}", longestWord, charIndex);

                // Skip past the matched word; the loop's charIndex++ supplies the final step.
                charIndex += longestWord.Length - 1;
            }
        }

    }

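    // Returns the longest dictionary word that starts at input[charIndex],
    // or null if no dictionary word starts there.
    static private string FindLongestWord(Trie.Node node, string input, int charIndex)
    {
        // Guard against reading past the end of the input: without this check, a word
        // that is also a prefix of a longer dictionary word (e.g. EMERGE with EMERGENCY)
        // at the very end of the input would throw an IndexOutOfRangeException.
        if (charIndex >= input.Length) return null;

        var character = char.ToUpper(input[charIndex]);

        // Follow the edge for the current character (char converts implicitly to Letter).
        Trie.Node next;
        if (!node.Edges.TryGetValue(character, out next)) return null;

        // A word ending at this node is a candidate; a longer one further down wins.
        var foundWord = next.Word;
        if (!next.IsTerminal)
        {
            var longerWord = FindLongestWord(next, input, charIndex + 1);
            if (longerWord != null) foundWord = longerWord;
        }

        return foundWord;
    }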
}

//Trie taken from: https://stackoverflow.com/a/6073004
public struct Letter
{
    public const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    public static implicit operator Letter(char c)
    {
        return new Letter() { Index = Chars.IndexOf(c) };
    }
    public int Index;
    public char ToChar()
    {
        return Chars[Index];
    }
    public override string ToString()
    {
        return Chars[Index].ToString();
    }
}

public class Trie
{
    public class Node
    {
        public string Word;
        public bool IsTerminal { get { return Edges.Count == 0 && Word != null; } }
        public Dictionary<Letter, Node> Edges = new Dictionary<Letter, Node>();
    }

    public Node Root = new Node();

    public Trie(string[] words)
    {
        for (int w = 0; w < words.Length; w++)
        {
            var word = words[w];
            var node = Root;
            for (int len = 1; len <= word.Length; len++)
            {
                var letter = word[len - 1];
                Node next;
                if (!node.Edges.TryGetValue(letter, out next))
                {
                    next = new Node();

                    node.Edges.Add(letter, next);
                }

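                // Mark the end of the word even when the node already existed, which
                // happens when a longer word sharing this prefix was added first
                // (e.g. EMERGENCY before EMERGE).
                if (len == word.Length)
                {
                    next.Word = word;
                }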

                node = next;
            }
        }
    }

}

The output is:

Found COMPUTE at char index 0
Found EMERGE at char index 7
Found FIVE at char index 13
Found CODE at char index 17    
Found COLOR at char index 21
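Because this approach commits to the longest word at each index and never backtracks, a long word can block an otherwise valid split: with the dictionary CODEC, CODE, COLOR and the input CODECOLOR, it takes CODEC and then finds no word in OLOR. If your input can trip over this, a different technique, dynamic-programming word break, looks for a segmentation that covers the whole string. The sketch below is an alternative under that assumption, not part of the original answer; `WordBreak` is a hypothetical name, and it returns null when no full segmentation exists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordBreak
{
    // Returns one segmentation of input into dictionary words, or null if none exists.
    // canEnd[i] is true when input[0..i) can be fully segmented; back[i] records the
    // length of the last word used to reach i, so the split can be rebuilt.
    public static List<string> Split(string input, IEnumerable<string> dictionary)
    {
        var words = new HashSet<string>(dictionary, StringComparer.OrdinalIgnoreCase);
        int maxLen = words.Count == 0 ? 0 : words.Max(w => w.Length);

        var canEnd = new bool[input.Length + 1];
        var back = new int[input.Length + 1];
        canEnd[0] = true;

        for (int end = 1; end <= input.Length; end++)
        {
            // Try the longest word ending here first; any valid choice keeps the split complete.
            for (int len = Math.Min(maxLen, end); len > 0; len--)
            {
                if (canEnd[end - len] && words.Contains(input.Substring(end - len, len)))
                {
                    canEnd[end] = true;
                    back[end] = len;
                    break;
                }
            }
        }

        if (!canEnd[input.Length]) return null;

        // Walk the back-pointers from the end to recover the words in order.
        var result = new List<string>();
        for (int pos = input.Length; pos > 0; pos -= back[pos])
        {
            result.Add(input.Substring(pos - back[pos], back[pos]));
        }
        result.Reverse();
        return result;
    }
}
```

Unlike the character-skipping loop in the answer, this returns null as soon as any part of the input matches no dictionary word, so a fallback may be needed for inputs that contain unknown words.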
