C# regex: how to match a user's input against an array of words/phrases
Question
I have an array of different words and phrases. The user enters an email message, and I'm supposed to check whether it contains any of the words and phrases already in the array. Each match adds 1 to the score, and if the score is more than 5, the message is flagged as probable spam.
My score doesn't increase, though, and I'm not sure why.
string[] spam = new string[] {"-different words and phrases provided by programmer"};
Console.Write("Key in an email message: ");
string email = Console.ReadLine();
int score = 0;
string pattern = "^\\[a-zA-Z]";
Regex expression = new Regex(pattern);
var regexp = new System.Text.RegularExpressions.Regex(pattern);
if (!regexp.IsMatch(email))
{
score += 1;
}
Recommended answer
You can solve the problem with LINQ:
// HashSet<String> is for better performance
HashSet<String> spamWords = new HashSet<String>(
    "different words and phrases provided by programmer"
        .Split(new Char[] {' '}, StringSplitOptions.RemoveEmptyEntries)
        .Select(word => word.ToUpper()));
...
String eMail = "phrases, not words and letters zzz";
...
// score == 3: "phrases" + "words" + "and"
int score = Regex
    .Matches(eMail, @"\w+")
    .OfType<Match>()
    .Select(match => match.Value.ToUpper())
    .Sum(word => spamWords.Contains(word) ? 1 : 0);
In this implementation I look for spam words in a case-insensitive manner (so And, and, AND are all counted as spam words). To take plurals and -ing forms (e.g. word, wording) into account, you have to use a stemmer.
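The snippet above scores single words only, while the question also mentions multi-word phrases and a threshold of more than 5. A minimal sketch of one way to cover both, assuming each array entry may be a word or a phrase and that every occurrence counts as one point, is shown below. The names SpamScorer, Score, IsProbablySpam and SpamThreshold are hypothetical and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class SpamScorer
{
    // Assumption taken from the question: more than 5 matches means "probably spam".
    public const int SpamThreshold = 5;

    // Counts every occurrence of every spam word or phrase in the message,
    // ignoring case and refusing matches that sit inside a longer word.
    public static int Score(string message, IEnumerable<string> spamTerms)
    {
        int score = 0;
        foreach (string term in spamTerms)
        {
            if (string.IsNullOrWhiteSpace(term))
                continue;

            // A null separator makes Split break the term on whitespace.
            string[] words = term.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

            // Regex.Escape keeps characters such as '.' or '$' literal;
            // \s+ lets any run of whitespace separate the words of a phrase.
            string body = string.Join(@"\s+", words.Select(w => Regex.Escape(w)));

            // Lookarounds instead of \b so that terms starting or ending
            // with punctuation can still match.
            string pattern = @"(?<!\w)" + body + @"(?!\w)";

            score += Regex.Matches(message, pattern, RegexOptions.IgnoreCase).Count;
        }
        return score;
    }

    public static bool IsProbablySpam(string message, IEnumerable<string> spamTerms)
    {
        return Score(message, spamTerms) > SpamThreshold;
    }
}
```

For example, with the terms "free money", "winner" and "click here", the message "FREE   money! You are a WINNER. Click here, click here." scores 4, so IsProbablySpam returns false. Building one pattern per term keeps the sketch simple; for a large array, combining the terms into a single alternation or caching the Regex objects would likely be faster.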