How to add tags to negated words in strings that follow "not", "no" and "never"


Question

How do I add the tag NEG_ to all words that follow not, no and never, up to the next punctuation mark in a string (used for sentiment analysis)? I assume that regular expressions could be used, but I'm not sure how.

Input:
It was never going to work, he thought. He did not play so well, so he had to practice some more.

Desired output:
It was never NEG_going NEG_to NEG_work, he thought. He did not NEG_play NEG_so NEG_well, so he had to practice some more.

Any idea how to solve this?

Recommended answer

To make up for Python's re regex engine lacking some of Perl's abilities, you can pass a lambda expression to re.sub to build a dynamic replacement:

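import re

string = "It was never going to work, he thought. He did not play so well, so he had to practice some more. Not foobar !"
# The outer re.sub selects each negated span (keyword up to the next punctuation mark);
# the lambda then prefixes every word after the keyword with NEG_.
transformed = re.sub(r'\b(?:not|never|no)\b[\w\s]+[^\w\s]',
                     lambda match: re.sub(r'(\s+)(\w+)', r'\1NEG_\2', match.group(0)),
                     string,
                     flags=re.IGNORECASE)
print(transformed)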

This prints:

It was never NEG_going NEG_to NEG_work, he thought. He did not NEG_play NEG_so NEG_well, so he had to practice some more. Not NEG_foobar !

Explanation

  • The first step is to select the parts of your string you're interested in. This is done with:

\b(?:not|never|no)\b[\w\s]+[^\w\s]

That is: your negation keyword (\b is a word boundary, (?:...) a non-capturing group), followed by alphanumerics and spaces (\w is [0-9a-zA-Z_] in ASCII terms, and in Python 3 it also matches Unicode word characters unless re.ASCII is set; \s is any kind of whitespace), up to something that is neither an alphanumeric nor a space (acting as punctuation).
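
To see what this first step selects on its own, here is a minimal sketch that runs only the outer pattern over the question's input. The names NEGATION_SPAN, text and spans are made up for this illustration and are not part of the original answer.

```python
import re

# Outer pattern from the answer: a negation keyword, then words and whitespace,
# then exactly one character that is neither (treated as punctuation).
# NEGATION_SPAN is a hypothetical name used only in this sketch.
NEGATION_SPAN = re.compile(r'\b(?:not|never|no)\b[\w\s]+[^\w\s]', flags=re.IGNORECASE)

text = "It was never going to work, he thought. He did not play so well, so he had to practice some more."

# With no capturing groups in the pattern, findall returns the full matches.
spans = NEGATION_SPAN.findall(text)
print(spans)  # ['never going to work,', 'not play so well,']
```

Each returned span starts at the keyword and ends at the punctuation mark, which is exactly what the lambda receives as match.group(0).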

Note that the punctuation is mandatory here, but you could safely remove [^\w\s] to match end of string as well.
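
A minimal sketch of that variant, assuming you want a negation that runs to the end of the string to be tagged too. The function name tag_negated is hypothetical; the only change from the answer's code is that the trailing [^\w\s] is dropped.

```python
import re

def tag_negated(text):
    """Prefix NEG_ to words following not/never/no, up to punctuation or end of string.

    Hypothetical helper for illustration: same approach as the answer,
    but without the mandatory trailing punctuation character.
    """
    return re.sub(
        r'\b(?:not|never|no)\b[\w\s]+',  # keyword plus the words/whitespace after it
        lambda match: re.sub(r'(\s+)(\w+)', r'\1NEG_\2', match.group(0)),
        text,
        flags=re.IGNORECASE,
    )

print(tag_negated("He did not play so well"))
# He did not NEG_play NEG_so NEG_well
```

With the original pattern this input would be left untouched, because there is no punctuation character after "well" for [^\w\s] to match.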

Now you're dealing with strings of the form "never going to work,". Just select the words preceded by spaces with:

(\s+)(\w+)

and replace them with what you want:

\1NEG_\2
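
The second step can be tried in isolation on one selected span. This is only a sketch: the variables span and tagged are names chosen for the example.

```python
import re

# One span as selected by the outer pattern (keyword through punctuation).
span = "never going to work,"

# (\s+) captures the whitespace before a word, (\w+) the word itself;
# the replacement puts NEG_ between them. The keyword "never" is not
# preceded by whitespace inside the span, so it is left untagged.
tagged = re.sub(r'(\s+)(\w+)', r'\1NEG_\2', span)
print(tagged)  # never NEG_going NEG_to NEG_work,
```

Because the keyword always sits at the very start of the span, requiring leading whitespace is what keeps it from being tagged along with the words that follow it.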

