Coding challenge: bad word filter


Problem description



This is the first of our weekly coding challenges. Once a week we'll post a simple programming problem, and the person who posts the best answer wins a T-shirt. The winner is decided by votes, or by comments, or by the audacity of their answer. Use any language in your answer except profanity.

Imagine that you allow users of your website to post comments, but some users, mainly the Australians, get a little colourful with their language. You decide to implement a Bad Word Filter that will replace certain words in a sentence with safer versions of that word.

The list of bad words and their replacements is

Bad: "poophead", "PHB", "gotten"
Replacement: "p**phead", "boss", "become"

So "My PHB is such a poophead. It's gotten worse since his promotion" should be "My boss is such a p**phead. It's become worse since his promotion".

We also have to allow shouting. So

"My PHB is such a POOPHEAD!" should become "My boss is such a P**PHEAD!"

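The basic rules above (whole-word replacement that survives shouting) could be sketched in Python roughly as follows. This is only an illustration: the names `REPLACEMENTS` and `basic_filter` are invented here, and treating an all-uppercase match that differs from the listed spelling as "shouting" is an assumption about what the challenge intends.

```python
import re

# Word list taken from the challenge statement.
REPLACEMENTS = {"poophead": "p**phead", "PHB": "boss", "gotten": "become"}

# Lowercase key -> (spelling as listed, replacement), so matching can ignore case.
_LOOKUP = {bad.lower(): (bad, good) for bad, good in REPLACEMENTS.items()}

# One alternation of all bad words, anchored on word boundaries.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(bad) for bad in REPLACEMENTS) + r")\b",
    re.IGNORECASE,
)

def _replace(match):
    found = match.group(0)
    listed, replacement = _LOOKUP[found.lower()]
    # Assumption: an all-caps match that differs from the listed spelling is
    # shouting, so the replacement is shouted too ("PHB" itself is not).
    if found != listed and found.isupper():
        return replacement.upper()
    return replacement

def basic_filter(text):
    """Replace each listed bad word, keeping shouted words shouted."""
    return _PATTERN.sub(_replace, text)
```

Calling `basic_filter` on the two sample sentences above should give the expected results from the challenge text.
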
Bonus points:

Let's make it harder. If the "bad" word starts with "*", it matches any word that ends with that word. If it ends with "*", it matches any word that starts with it. If it ends with "!", the match should be case-sensitive.

Bad words: "poop*", "PHB!", "gotten"
Replacement: "p**p", "boss", "become"

"My PHB has started his new blog phblog.com. He's SUCH A MISBEGOTTEN POOPHEAD!"

should become

"My boss has started his new blog phblog.com. He's SUCH A MISBEGOTTEN P**PHEAD!"

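The bonus rules amount to a tiny pattern language, and one way to read them is as a translation into regular expressions. The sketch below is a hypothetical helper (`compile_bad_word` is a made-up name) and assumes that only the bad-word fragment itself is matched, not the whole surrounding word. Python's `re` module does not accept variable-width lookbehind, so instead of looking behind for the start of the word it simply drops the word boundary on the wildcard side.

```python
import re

def compile_bad_word(spec):
    """Turn a spec such as 'poop*', '*poop' or 'PHB!' into a compiled regex."""
    spec = spec.strip()
    # A trailing '!' requests a case-sensitive match.
    case_sensitive = spec.endswith("!")
    if case_sensitive:
        spec = spec[:-1]
    # A trailing '*' means: any word that starts with the bad word.
    starts_with = spec.endswith("*")
    if starts_with:
        spec = spec[:-1]
    # A leading '*' means: any word that ends with the bad word.
    ends_with = spec.startswith("*")
    if ends_with:
        spec = spec[1:]
    if not spec:
        raise ValueError("empty bad word")
    # Keep a word boundary only on the side that has no wildcard.
    prefix = "" if ends_with else r"\b"
    suffix = "" if starts_with else r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(prefix + re.escape(spec) + suffix, flags)
```

For example, `compile_bad_word("PHB!")` leaves `phblog.com` alone because the match is case-sensitive, while `compile_bad_word("gotten")` does not fire inside `MISBEGOTTEN` because both word boundaries are still required.
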
What I have tried:

Remember: any programming language can be used.

Solution

Yay! A chance to use Regular Expressions without summoning the elder gods! :)

Start with a structure to convert a string to a bad word, taking the "bonus points" rules into account:

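using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public struct BadWord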
{
    public BadWord(string word)
    {
        if (string.IsNullOrWhiteSpace(word)) throw new ArgumentNullException(nameof(word));
        
        int startIndex = 0;
        int length = word.Length;
        
        // Skip leading / trailing white-space:
        while (length > 0 && char.IsWhiteSpace(word[startIndex]))
        {
            startIndex++;
            length--;
        }
        while (length > 0 && char.IsWhiteSpace(word[startIndex + length - 1]))
        {
            length--;
        }
        
        // If the word ends with "!", then it's a case-sensitive match:
        if (length > 0 && word[startIndex + length - 1] == '!')
        {
            CaseSensitive = true;
            length--;
        }
        else
        {
            CaseSensitive = false;
        }
        
        // If the word ends with "*", filter anything starting with the word:
        if (length > 0 && word[startIndex + length - 1] == '*')
        {
            Suffix = "(?=\\w*\\b)";
            length--;
        }
        else
        {
            Suffix = "\\b";
        }
        
        // If the word starts with "*", filter anything ending with the word:
        if (length > 0 && word[startIndex] == '*')
        {
            Prefix = "(?<=\\b\\w*)";
            startIndex++;
            length--;
        }
        else
        {
            Prefix = "\\b";
        }
        
        Word = length != 0 ? word.Substring(startIndex, length) : null;
    }
    
    public string Word { get; }
    public string Prefix { get; }
    public string Suffix { get; }
    public bool CaseSensitive { get; }
    
    public Regex ToRegularExpression()
    {
        if (string.IsNullOrWhiteSpace(Word)) return null;
        
        string pattern = Prefix + Regex.Escape(Word) + Suffix;
        var options = CaseSensitive ? RegexOptions.ExplicitCapture : RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase;
        return new Regex(pattern, options);
    }
}


Then a class to represent a single bad word and its replacement:
EDIT: Now with the part of the spec the "customer" forgot to mention! :)

public sealed class WordReplacement
{
    public WordReplacement(BadWord word, string replacement)
    {
        if (string.IsNullOrWhiteSpace(word.Word)) throw new ArgumentNullException(nameof(word));
        
        Pattern = word.ToRegularExpression();
        CaseSensitive = word.CaseSensitive;
        Replacement = replacement;
        
        if (CaseSensitive || replacement == null || replacement.Any(char.IsUpper))
        {
            Replacer = (Match m) => Replacement;
        }
        else
        {
            Replacer = (Match m) => MatchCase(m.Value, Replacement);
        }
    }
    
    public WordReplacement(string word, string replacement) : this(new BadWord(word), replacement)
    {
    }
    
    public Regex Pattern { get; }
    public string Replacement { get; }
    public bool CaseSensitive { get; }
    public MatchEvaluator Replacer { get; }
    
    public static string MatchCase(string wordToReplace, string replacement)
    {
        // Also covers an empty replacement, which would otherwise fail on result[0] below:
        if (string.IsNullOrEmpty(replacement)) return string.Empty;
        if (wordToReplace.All(char.IsLower)) return replacement;
        if (wordToReplace.All(char.IsUpper)) return replacement.ToUpperInvariant();
        
        char[] result = replacement.ToCharArray();
        bool changed = false;
        
        if (wordToReplace.Length == replacement.Length)
        {
            for (int index = 0; index < result.Length; index++)
            {
                if (char.IsUpper(wordToReplace[index]))
                {
                    char c = result[index];
                    result[index] = char.ToUpperInvariant(c);
                    if (result[index] != c) changed = true;
                }
            }
        }
        else
        {
            if (char.IsUpper(wordToReplace[0]))
            {
                char c = result[0];
                result[0] = char.ToUpperInvariant(c);
                if (result[0] != c) changed = true;
            }
            if (char.IsUpper(wordToReplace[wordToReplace.Length - 1]))
            {
                int index = result.Length - 1;
                char c = result[index];
                result[index] = char.ToUpperInvariant(c);
                if (result[index] != c) changed = true;
            }
        }
        
        return changed ? new string(result) : replacement;
    }
    
    public string Replace(string input) => Pattern.Replace(input, Replacer);
}
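
For readers following along in Python, the casing logic of `MatchCase` can be sketched as below. It is a loose port, not a line-for-line equivalent: `match_case` is a name chosen here, and `str.islower()` / `str.isupper()` ignore non-letter characters, whereas the C# version tests every character.

```python
def match_case(word_to_replace, replacement):
    """Copy the casing of the matched word onto the replacement."""
    if not replacement:
        return ""
    if word_to_replace.islower():
        return replacement
    if word_to_replace.isupper():
        return replacement.upper()

    chars = list(replacement)
    if len(word_to_replace) == len(replacement):
        # Same length: copy the casing position by position.
        for i, ch in enumerate(word_to_replace):
            if ch.isupper():
                chars[i] = chars[i].upper()
    else:
        # Different length: only mirror the first and last characters.
        if word_to_replace[0].isupper():
            chars[0] = chars[0].upper()
        if word_to_replace[-1].isupper():
            chars[-1] = chars[-1].upper()
    return "".join(chars)
```

With the sample data, a matched `Poop` and the replacement `p**p` have the same length, so only the first character is upper-cased.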


And finally, a class to represent a list of bad word replacements:

public sealed class Clbuttifier2000
{
    public Clbuttifier2000(IEnumerable<KeyValuePair<string, string>> replacements)
    {
        Replacements = replacements.Select(p => new WordReplacement(p.Key, p.Value)).ToList().AsReadOnly();
    }
    
    public IReadOnlyList<WordReplacement> Replacements { get; }
    
    public string Clbuttify(string message)
    {
        if (!string.IsNullOrWhiteSpace(message))
        {
            foreach (var replacement in Replacements)
            {
                message = replacement.Replace(message);
            }
        }
        
        return message;
    }
}


Sample usage:

var filter = new Clbuttifier2000(new Dictionary<string, string>
{
    ["poop*"] = "p**p",
    ["PHB!"] = "boss",
    ["gotten"] = "become",
});

string input = "My PHB has gotten started on his new blog phblog.com. He's SUCH A MISBEGOTTEN Poophead!";
string expected = "My boss has become started on his new blog phblog.com. He's SUCH A MISBEGOTTEN P**phead!";
string actual = filter.Clbuttify(input);
Debug.Assert(actual == expected);


I was reading up on Python online, then chanced upon this pooping, oops, I mean coding challenge, and thought, why not try this out in Python? Here it is, fresh from the loo, oops again, I mean oven.

"""
poop.py

by Peter Leow the pooper

"""

import re

def functionStartWithPoop(m):
    wordFound = m.group(0)

    if wordFound[:5].lower()=='poop*':
        wordRepl = wordFound[0] + '**' + wordFound[3] + wordFound[5:]
    else: #wordFound[:4].lower()=='poop':
        wordRepl = wordFound[0] + '**' + wordFound[3] + wordFound[4:] 

    return wordRepl

def functionEndWithPoop(m):
    wordFound = m.group(0)

    if wordFound[-5:].lower()=='*poop':
        wordRepl = wordFound[:-5] + wordFound[-4] + '**' + wordFound[-1]
    else: #wordFound[-4:].lower()=='poop':
        wordRepl = wordFound[:-4] + wordFound[-4] + '**' + wordFound[-1]

    return wordRepl

def main():
    originalSentence = '''
    poop*ing is in front of make*poop.
    Whether poop* or *poop, there are just pOoP!
    A POOPHEAD cannot change but an exclaimed POOPHEAD! can.'''

    print('Before:')
    print(originalSentence)
    print()
    print('After:')
    
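    # Without ! ending
    patternStartWithPoop=r'(?<!\S)poop\*?[\S]*'
    patternEndWithPoop=r'[\S]*\*?poop(?=[?!,.;]?$|[?!,.;]?\s+)'

    # With ! ending
    # NOTE: the listing is cut off mid-pattern at this point; the '$' anchors and
    # the closing of the pattern below are assumed, and the rest of main() is missing.
    patternStartWithPoopEndWithExclamation = r'(?<!\S)poop\*?[\S]*!(?=\s|$)'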

