How to find the 2nd longest word(s) in the text using C#, WinForm
Question
Hi all,
I'm wondering how to find the 2nd longest words in a text.
The following is my code for counting letters and words, but I can't figure out how to put the two together to get the 2nd longest words (red, and)...
I'm new to programming, so simpler code (it must be in C#) would be great :)
Thank you all in advance~
Counting Letters
int count = 0;
string st = "I like apples. I like red apples. I like red apples and green bananas.";
foreach (char c in st)
{
    if (char.IsLetter(c))
    {
        count++;
    }
}
lblNumOfLetters.Text = count.ToString();
Counting Words
string st = "I like apples. I like red apples. I like red apples and green bananas.";
char[] sep = { ' ' };
String[] res = st.Split(sep);
lblNumOfWords.Text = res.Length.ToString();
Recommended answer
One way to approach a problem like this is called "divide and conquer": break the problem into functional chunks, then implement and test one chunk at a time. I think you can deal with this problem by dividing it into three tasks:
1. It's clear from the information we have now that you need to pre-process the string before you analyze which words are unique and "second longest." Runs of white-space have to be reduced to a single white-space (or ignored), and punctuation and other special characters must be removed.
2. Then you want to eliminate duplicate words in the string.
3. Finally, you want to get the lengths of the cleaned-up unique words, and pick the words whose length equals the second-longest length.
Note that you could start with any of these tasks; for example, you could create sample data that is already cleaned up, or already has duplicates removed, to test tasks 2 and 3.
Let's focus on task 1:
While you could do some fancy stuff in Linq to strip the unwanted characters and collapse runs of white-space into one, I think simple may be better here; we'll use a StringBuilder for efficiency in dealing with characters.
You already know you need a loop that goes through the whole string character by character.
// requires: using System.Linq; (for Contains) and using System.Text; (for StringBuilder)
private string StripWords(string theString, bool doRemovePunctuation, params char[] otherCharsToRemove)
{
    StringBuilder sb = new StringBuilder();
    // keep track of whether the last character was white-space
    // note it may be white-space now because we changed
    // it to white-space in the code below
    bool currentCIsWhiteSpace = false, lastCIsWhiteSpace = false;
    char chr;
    foreach (var currentc in theString)
    {
        // can we take a short-cut here ?
        if (lastCIsWhiteSpace && char.IsLetterOrDigit(currentc))
        {
            // what do you need to update here
            // so that you continue the loop iteration ?
            //continue;
        }
        lastCIsWhiteSpace = currentCIsWhiteSpace;
        currentCIsWhiteSpace = Char.IsWhiteSpace(currentc);
        // if we are removing punctuation: replace with space
        // remove other optional chars: replace with space
        chr = currentc;
        if (currentCIsWhiteSpace
            || (doRemovePunctuation && Char.IsPunctuation(currentc))
            || otherCharsToRemove.Contains(currentc))
        {
            chr = ' ';
            currentCIsWhiteSpace = true;
        }
        sb.Append(chr);
    }
    return sb.ToString();
}
If we get this task right, then with a string like:
string testString = @"I like <apples>. I treasure: mango, pineapple, lychee. I like rambutan, and green bananas; durian stinks";
and a call like:
string cleanString = StripWords(testString, true,':','<','>');
We should get output like this:
"I like apples I treasure mango pineapple lychee I like rambutan and green bananas durian stinks"
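One possible way to finish that exercise is sketched below. It is not the answerer's own solution: the class name TextCleaner and the method name StripAndCollapse are made up for illustration, and the sketch assumes that a run of separators should become exactly one space, with leading and trailing separators dropped.

```csharp
using System;
using System.Text;

public static class TextCleaner
{
    // Hypothetical helper (name and signature are assumptions, not from the thread).
    // Treats white-space, optionally punctuation, and any extra characters as
    // separators, and emits at most one space for each run of separators.
    public static string StripAndCollapse(string text, bool removePunctuation, params char[] otherCharsToRemove)
    {
        var sb = new StringBuilder(text.Length);
        // Start as "true" so that separators at the very beginning are dropped.
        bool lastWasSpace = true;

        foreach (char c in text)
        {
            bool isSeparator = char.IsWhiteSpace(c)
                || (removePunctuation && char.IsPunctuation(c))
                || Array.IndexOf(otherCharsToRemove, c) >= 0;

            if (isSeparator)
            {
                // Only the first separator of a run produces a space.
                if (!lastWasSpace)
                {
                    sb.Append(' ');
                    lastWasSpace = true;
                }
            }
            else
            {
                sb.Append(c);
                lastWasSpace = false;
            }
        }

        // A trailing separator leaves one space at the end; remove it.
        return sb.ToString().TrimEnd();
    }
}
```

Called as TextCleaner.StripAndCollapse(testString, true, ':', '<', '>') on the test string above, this should give the single-spaced output shown. The characters '<' and '>' have to be passed explicitly because char.IsPunctuation does not classify them as punctuation.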
The discussion about the list of separators is still active. I agree with BillWoodruff about the duplicates. As you can see, the word "apples" occurs 3 times. If you want the third or fourth longest word, the result will still be "apples". Why? Have a look at the list of returned words:
1 - bananas
2 - apples
3 - apples
4 - apples
5 - green
6 - like
7 - like
8 - like
9 - red
10 - red
11 - and
12 - I
13 - I
14 - I
When we remove the duplicated words, the resulting list should look like this:
1 - bananas
2 - apples
3 - green
4 - like
5 - red
6 - and
7 - I
And now it is time for some sample code. It uses Linq:
string st = "I like apples. I like red apples. I like red apples and green bananas.";
char[] sep = new char[]{'.',',',' '};
string secondLongestWord = (from words in st.Split(sep).Distinct().ToArray()
                            orderby words.Length descending
                            select words).Take(2).Last().ToString();
Console.WriteLine("Second longest word is: {0}" , secondLongestWord);
Result:
Second longest word is: apples
A few words of explanation about the code (see the comments):
//get the array of non-duplicated words, split by the defined characters
(from words in st.Split(sep).Distinct().ToArray()
 //order by the length of the text
 orderby words.Length descending
 //list the words, returns IOrderedEnumerable<String>
 select words)
    //get only the first 2 rows
    .Take(2)
    //get the last row, in this case the second one ;)
    .Last()
    //return it as a string
    .ToString()
For further information, please see the documentation for:
Take()
Last()
Final note: in the meantime I found another solution, using the Regex class.
With a suitable pattern we can split on punctuation characters as well. Note that \W treats digits as word characters, so numbers are kept as words. Just replace the from clause with:
from words in Regex.Split(st, @"\W").Distinct().ToArray()
and remove this line:
char[] sep = new char[]{'.',',',' '};
Thanks to BillWoodruff for the valuable comment.
If a list of all "second-longest words" is required, the solution is:
string st = "I like apples :) I like red apples :P I'd like to eat 1 red apple and 5 (five) yellow bananas.";
// work on the distinct word lengths, so that several words tied for
// longest cannot be mistaken for the second-longest length
int secondvalue = (from words in Regex.Split(st, @"\W")
                   select words.Length).Distinct()
                  .OrderByDescending(len => len)
                  .Take(2).Last();
Console.WriteLine("Second length of word is: {0}", secondvalue);
// every distinct word that has exactly that length
var qry = from words in Regex.Split(st, @"\W").Distinct()
          where words.Length == secondvalue
          select words;
Console.WriteLine();
Console.WriteLine("List of words:");
foreach (string word in qry)
{
    Console.WriteLine("{0}", word);
}
I would suggest writing an extension method ;)
Extension Methods (C# Programming Guide)
How to: Add Custom Methods for LINQ Queries
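As a rough illustration of that suggestion, the query could be wrapped in an extension method on string. This is only a sketch: the class name StringWordExtensions and the method name SecondLongestWords are invented here, words are compared case-sensitively, and an empty list is returned when the text has fewer than two distinct word lengths.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringWordExtensions
{
    // Hypothetical extension method (name is an assumption, not from the thread).
    // Returns every distinct word whose length is the second-longest length.
    public static List<string> SecondLongestWords(this string text)
    {
        // Split on runs of non-word characters and drop the empty entries
        // that Regex.Split produces at the start or end of the text.
        List<string> words = Regex.Split(text, @"\W+")
            .Where(w => w.Length > 0)
            .Distinct()
            .ToList();

        // Distinct lengths, longest first.
        List<int> lengths = words
            .Select(w => w.Length)
            .Distinct()
            .OrderByDescending(len => len)
            .ToList();

        // Fewer than two different lengths: there is no "second longest".
        if (lengths.Count < 2)
        {
            return new List<string>();
        }

        int second = lengths[1];
        return words.Where(w => w.Length == second).ToList();
    }
}
```

With this in place, the call site shrinks to something like st.SecondLongestWords(), and the splitting rules live in one spot if the separator pattern needs to change later.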
You can improve the second snippet (counting words) by adding more separators (e.g. punctuation) and by specifying StringSplitOptions: http://msdn.microsoft.com/en-us/library/system.stringsplitoptions(v=vs.110).aspx
Then iterate over the results and compare the Length of each string.
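A minimal sketch of that loop-based approach, without Linq, might look like the following. The class and method names are hypothetical, the separator list is just an assumed starting point, and the sketch takes "second longest" to mean the second-largest distinct word length.

```csharp
using System;
using System.Collections.Generic;

public static class WordScanner
{
    // Hypothetical helper (name is an assumption, not from the thread).
    public static List<string> FindSecondLongest(string text)
    {
        // Assumed separator list; extend it as needed for your text.
        char[] separators = { ' ', '.', ',', ';', ':', '!', '?' };
        // RemoveEmptyEntries drops the empty strings produced by ". " and similar.
        string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        // First pass: track the longest and second-longest distinct lengths.
        int longest = 0;
        int second = 0;
        foreach (string w in words)
        {
            if (w.Length > longest)
            {
                second = longest;
                longest = w.Length;
            }
            else if (w.Length < longest && w.Length > second)
            {
                second = w.Length;
            }
        }

        // Second pass: collect each word of that length once.
        var result = new List<string>();
        foreach (string w in words)
        {
            if (second > 0 && w.Length == second && !result.Contains(w))
            {
                result.Add(w);
            }
        }
        return result;
    }
}
```

In a WinForm, the returned list could then be shown in a label with string.Join(", ", ...), in the same way the counts are written to lblNumOfLetters and lblNumOfWords in the question.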