Matching complete words of a string against a list


Problem description

I want to match the words of a complete sentence against a list of words, and proceed to the next step only if none of them appear. However, my check also matches partial words.

For example, if my list contains the word 'me' and the sentence contains the word 'some', the check fails.

What I have tried:

What I've done:

if not any(word in text_sentence.lower().strip() for word in banned_words):
    # Proceed to next level, as a valid sentence

Once I pass "Some one like you" it fails, because my list contains the word "me". I verified this by simply printing the matching words.

for item in banned_words:
    if item in slbot_tweet:
        print(item)

Recommended answer

You need to break your sentence down into individual word tokens, using all forms of punctuation as delimiters: quotes, double quotes, comma, space, dot, colon, semicolon, brackets, exclamation mark, hyphen, and anything else your user is likely to type to mask a word. I'd suggest breaking it on anything that isn't a letter or number!

Once you have the string as an array of word tokens, you can check if any of them are in the "banned" list.
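As a rough illustration of that idea, here is a minimal sketch that splits on anything that isn't a letter or digit and then tests each token for membership in the banned list. The function name `contains_banned_word` and the sample list are hypothetical, and the pattern only treats ASCII letters and digits as word characters, which is an assumption about the input.

```python
import re


def contains_banned_word(sentence, banned_words):
    """Return True if any whole word of `sentence` is in `banned_words`."""
    banned = {word.lower() for word in banned_words}
    # Split on every run of characters that is not an ASCII letter or digit,
    # then drop the empty strings left at the edges of the sentence.
    tokens = [t for t in re.split(r"[^a-z0-9]+", sentence.lower()) if t]
    return any(token in banned for token in tokens)


banned_words = ["me", "spam"]  # hypothetical banned list

print(contains_banned_word("Some one like you", banned_words))  # False
print(contains_banned_word("Call me, maybe!", banned_words))    # True
```

Because each token is compared as a whole, 'some' no longer triggers a match for 'me'.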


Your tweet text is not an array of words, but an array of characters, so the in operator performs a substring search. As OriginalGriff says, you need to break it up into a proper array of tokens (words) first. Use the string.split() method.
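A minimal sketch of the split-based variant follows, assuming whitespace-separated words. The helper name `has_banned_word` is hypothetical; `slbot_tweet` and `banned_words` are taken from the question. Since `str.split()` with no arguments splits on whitespace only, the sketch also strips leading and trailing punctuation from each token.

```python
import string


def has_banned_word(text, banned_words):
    """Return True if any whitespace-separated word of `text` is banned."""
    banned = {w.lower() for w in banned_words}
    # split() with no arguments splits on runs of whitespace;
    # strip() then removes punctuation from both ends of each token.
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    return any(w in banned for w in words)


slbot_tweet = "Some one like you"
banned_words = ["me"]

if not has_banned_word(slbot_tweet, banned_words):
    # Proceed to next level, as a valid sentence
    print("valid sentence")
```

Note that this version leaves punctuation inside a token alone, so something like "me-too" stays a single word. Splitting on every non-alphanumeric character, as suggested in the first answer, is stricter.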

