Categorise text in a column using keywords


Problem description

I have a table column that contains descriptions of the treatment done to resolve an issue. This text contains keywords.

In another list, I have the categories, each with the different keywords that help to identify it.

For example:

AAAA | keyword1
AAAA | keyword2 and keyword3
AAAA | keyword3 and not keyword4
BBBB | keyword4
BBBB | keyword5 and keyword6
BBBB | keyword7

How can I fill the category column in my previous table (the one that contains the descriptions), using the keywords found in each description?

For example:

Description                     | category
this free text keyword1 is done | AAAA
free sample2 keyword4 keyword3  | BBBB

The language I'm using is Python.

I found a similar case, but using Excel: https://exceljet.net/formula/categorize-text-with-keywords


Recommended answer

I would start by creating a list of tuples where the first element is the category and the second is a dictionary with two lists of keywords: those that must be included in the description ('in') and those that must be excluded from it ('out'). For example:

keyword_tuple = [('AAAA', {'in': ['kwrd1'], 'out': []}),
                 ('AAAA', {'in': ['kwrd2', 'kwrd3'], 'out': []}),
                 ('AAAA', {'in': ['kwrd3'], 'out': ['kwrd4']}),
                 ('BBBB', {'in': ['kwrd4'], 'out': []})]
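If the rules are already stored as text in the form shown in the question (`keyword2 and keyword3`, `keyword3 and not keyword4`), the list could also be built programmatically rather than by hand. The following is a minimal sketch, assuming terms are always joined by ` and ` and exclusions are prefixed with `not `. `parse_rule` and `raw_rules` are hypothetical names that do not appear in the original answer, and the sketch uses the full keyword names from the question rather than the abbreviated `kwrd1` form.

```python
def parse_rule(rule_text):
    """Turn 'a and not b' into {'in': ['a'], 'out': ['b']}."""
    rule = {'in': [], 'out': []}
    for term in rule_text.split(' and '):
        term = term.strip()
        if term.startswith('not '):
            # Excluded keyword: drop the 'not ' prefix.
            rule['out'].append(term[len('not '):].strip())
        elif term:
            rule['in'].append(term)
    return rule


# Assumed input: (category, rule text) pairs as listed in the question.
raw_rules = [
    ('AAAA', 'keyword1'),
    ('AAAA', 'keyword2 and keyword3'),
    ('AAAA', 'keyword3 and not keyword4'),
    ('BBBB', 'keyword4'),
    ('BBBB', 'keyword5 and keyword6'),
    ('BBBB', 'keyword7'),
]

keyword_tuple = [(category, parse_rule(text)) for category, text in raw_rules]
```

The result has the same shape as the hand-written list, so it can be used in the same way.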

Once keyword_tuple is initialized, you can loop through your list of descriptions (description_list below) to determine which category each one belongs to. Let's store the results in a list of tuples called result_tuple, where the first element is the description and the second is the corresponding category.

result_tuple = []

for description in description_list:
    # Keep the categories whose rule is fully satisfied: every 'in' keyword
    # is present and no 'out' keyword is present. Both conditions are checked
    # on the same rule, so one rule cannot borrow a match from another.
    categories = [cat for cat, rule in keyword_tuple
                  if all(kw in description for kw in rule['in'])
                  and not any(kw in description for kw in rule['out'])]

    # If multiple categories satisfy the conditions, you need to come up
    # with a decision rule. Here the first matching rule in keyword_tuple wins.
    if len(categories) > 0:
        category = categories[0]
    else:
        category = 'NO CATEGORY'

    # append() takes a single argument, so pass the pair as one tuple
    result_tuple.append((description, category))
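To try the approach on the two sample rows from the question, a self-contained sketch might look like the following. It assumes whole-word, case-insensitive matching via `re` (so that `keyword1` does not also match `keyword10`), which differs from the plain substring test used above. `RULES`, `matches`, and `categorise` are hypothetical names introduced for illustration.

```python
import re

# Assumed rule list, using the keyword names from the question.
RULES = [
    ('AAAA', {'in': ['keyword1'], 'out': []}),
    ('AAAA', {'in': ['keyword2', 'keyword3'], 'out': []}),
    ('AAAA', {'in': ['keyword3'], 'out': ['keyword4']}),
    ('BBBB', {'in': ['keyword4'], 'out': []}),
    ('BBBB', {'in': ['keyword5', 'keyword6'], 'out': []}),
    ('BBBB', {'in': ['keyword7'], 'out': []}),
]


def matches(keyword, text):
    """True if keyword occurs in text as a whole word (case-insensitive)."""
    pattern = r'\b' + re.escape(keyword) + r'\b'
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def categorise(description, rules=RULES, default='NO CATEGORY'):
    """Return the category of the first rule the description satisfies."""
    for category, rule in rules:
        has_all = all(matches(kw, description) for kw in rule['in'])
        has_none = not any(matches(kw, description) for kw in rule['out'])
        if has_all and has_none:
            return category
    return default


descriptions = [
    'this free text keyword1 is done',
    'free sample2 keyword4 keyword3',
]
results = [(d, categorise(d)) for d in descriptions]
```

Because the rules are tried in order, the order of the list acts as the decision rule when several categories could apply. If the descriptions live in a pandas DataFrame, the same function could be applied to the column with something like `df['category'] = df['Description'].apply(categorise)`; the column names here are assumed from the question's example.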
