Unnest grab keywords/nextwords/beforewords function


Problem description

Background

I have the following code to create a df:

import pandas as pd
word_list = ['crayons', 'cars', 'camels']
l = ['there are many different crayons in the bright blue box and crayons of all different colors',
     'i like a lot of sports cars because they go really fast'
    'the middle east has many camels to ride and have fun',
    'all camels are fun']
df = pd.DataFrame(l, columns=['Text'])

The df looks like this:

    Text
0   there are many different crayons in the bright blue box and crayons of all different colors
1   i like a lot of sports cars because they go really fastthe middle east has many camels to ride and have fun
2   all camels are fun

The following code works. It defines a function that grabs the trigger words, along with the words that come before (beforewords) and after (nextwords) each trigger word:

def find_words(row, word_list):

    sentence = row.iloc[0]  # first column ('Text'); plain row[0] on a labeled Series is deprecated in recent pandas

    #make empty lists
    trigger = []
    next_words = []
    before_words = []

    for keyword in word_list:
        #split words
        words = str(sentence).split()

        for index in range(0, len(words) - 1):

            # get keyword we want
            if words[index] == keyword:

                # get words after keyword and add to empty list
                next_words.append(words[index + 1:index + 3])

                # get words before keyword and add to empty list
                before_words.append(words[max(index - 3, 0):max(index - 1, 0)])

                # append
                trigger.append(keyword)

    return pd.Series([trigger,  before_words, next_words], index = ['Trigger', 'BeforeWords','NextWords'])

# glue together
df= df.join(df.apply(lambda x: find_words(x, word_list), axis=1))

Output

    Text          Trigger             BeforeWords                 NextWords
0   there ...     [crayons, crayons]  [[are, many], [blue, box]]  [[in, the], [of, all]]
1   i like ...    [cars, camels]      [[lot, of], [east, has]]    [[because, they], [to, ride]]
2   all camels... [camels]            [[]]                        [[are, fun]]

Problem

However, I would like to either 1) unstack, 2) unlist, or use another/better way to get the following

Desired output

    Text          Trigger        BeforeWords     NextWords
0   there ...     crayons        are many        in the
1   there ...     crayons        blue box        of all
2   i like ...    cars           lot of          because they
3   i like ...    camels         east has        to ride
4   all camels... camels                         are fun

Question

How do I tweak my find_words function to achieve the desired output?

Recommended answer

This looks like an unnesting problem, so we can use:

s=df.set_index(['Text']).stack()
s=pd.DataFrame(s.tolist(),index=s.index).stack()
s.apply(lambda x : ' '.join(x) if type(x)==list else x).unstack(1).reset_index(level=0)
                                                Text      ...          NextWords
0  there are many different crayons in the bright...      ...             in the
1  there are many different crayons in the bright...      ...             of all
0  i like a lot of sports cars because they go re...      ...       because they
1  i like a lot of sports cars because they go re...      ...            to ride
0                                 all camels are fun      ...            are fun
[5 rows x 4 columns]
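
As an alternative sketch (not part of the answer above), and assuming pandas 1.3 or later, where DataFrame.explode accepts a list of columns, the same unnesting can probably be done without the stack/unstack round trip. The nested frame below is rebuilt by hand from the question's output with shortened texts, and the names cols and flat are only illustrative:

```python
import pandas as pd

# Nested frame in the same shape as the question's output
# (texts shortened by hand; this is an assumption, not the real df).
df = pd.DataFrame({
    'Text': ['there ...', 'i like ...', 'all camels...'],
    'Trigger': [['crayons', 'crayons'], ['cars', 'camels'], ['camels']],
    'BeforeWords': [[['are', 'many'], ['blue', 'box']],
                    [['lot', 'of'], ['east', 'has']],
                    [[]]],
    'NextWords': [[['in', 'the'], ['of', 'all']],
                  [['because', 'they'], ['to', 'ride']],
                  [['are', 'fun']]],
})

# Explode the three list columns together: one output row per trigger match.
# This requires the lists in each row to have matching lengths.
cols = ['Trigger', 'BeforeWords', 'NextWords']
flat = df.explode(cols, ignore_index=True)

# After exploding, each BeforeWords/NextWords cell is an inner list of words;
# join it into a single string (an empty list becomes an empty string).
for col in ['BeforeWords', 'NextWords']:
    flat[col] = flat[col].apply(' '.join)

print(flat)
```

With ignore_index=True the result gets a fresh 0..4 index, as in the desired output, instead of the repeated 0, 1, 0, 1, 0 index shown above.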
