Python: Replace one word in a sentence with a list of words and put the new sentences in another column in pandas


Problem description

I have a dataframe in which some sentences contain the word 'o'clock'. I want to replace the hour mentioned just before it with each hour from a list I have, and put the new sentences in another column, as in the following:

import pandas as pd

data = pd.DataFrame({"sentences": ["I have a class at ten o'clock", "she is my friend", "she goes to school at eight o'clock"]})
my_list = ['two', 'three', 'five', 'ten']

What I would like to see is an extra column containing the new sentences, in which the hour is replaced by every hour in the list:

Output:

     sentences                            new_sentences
0    I have a class at ten o'clock        I have a class at two o'clock, I have a class at three o'clock,...
1    she is my friend                     she is my friend
2    she goes to school at eight o'clock  she goes to school at two o'clock,....

Repetition in the new_sentences column is fine. I have tried to use np.where:

np.where(data["sentences"].str.contains('o\'clock', regex=False, case=False, na=False), data["sentences"].replace()... )

but I do not know how to replace the word before 'o'clock'.

Thank you in advance.

Recommended answer

Use:

# STEP 1
df1 = data['sentences'].str.extract(
    r"(?i)(?P<before>.*)\s(?P<clock>\w+(?=\so'clock))\s(?P<after>.*)")

# STEP 2
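# regex=True is required on pandas >= 2.0, where str.replace defaults to literal matching
df1['clock'] = df1['clock'].str.replace(
    r'\w+', ','.join(my_list), regex=True).str.split(',')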

# STEP 3
data['new_sentences'] = df1.dropna().explode('clock').agg(
    ' '.join, axis=1).groupby(level=0).agg(', '.join)

# STEP 4
data['new_sentences'] = data['new_sentences'].fillna(data['sentences'])

Explanation / steps:

STEP 1: Use Series.str.extract with the given regex pattern to create a three-column dataframe. The before column holds the part of the sentence preceding the hour word (e.g. ten), the clock column holds the hour word itself, and the after column holds the rest of the sentence, starting at o'clock. Sentences that do not match get NaN in all three columns.

# df1
                  before  clock    after
0      I have a class at    ten  o'clock
1                    NaN    NaN      NaN
2  she goes to school at  eight  o'clock
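To see what the three named groups capture on a single string, here is a minimal sketch that runs the same pattern through the standard-library re module instead of pandas. The helper name split_on_clock is hypothetical and used only for illustration.

```python
import re

# Same pattern as STEP 1, compiled with the standard-library re module.
PATTERN = re.compile(
    r"(?i)(?P<before>.*)\s(?P<clock>\w+(?=\so'clock))\s(?P<after>.*)")


def split_on_clock(sentence):
    """Return the three named groups as a dict, or None when nothing matches."""
    match = PATTERN.search(sentence)
    return match.groupdict() if match else None


parts = split_on_clock("I have a class at ten o'clock")
no_match = split_on_clock("she is my friend")

print(parts)     # the before / clock / after pieces of the first sentence
print(no_match)  # None, which corresponds to the NaN row in df1
```

The lookahead (?=\so'clock) checks that the word is followed by o'clock without consuming it, which is why o'clock ends up in the after group.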

STEP 2: Use Series.str.replace to replace the token in the clock column with all the items in my_list joined by commas. Then use Series.str.split to split that string on the delimiter , so that each cell holds a list of hours.

# df1
                  before                    clock    after
0      I have a class at  [two, three, five, ten]  o'clock
1                    NaN                      NaN      NaN
2  she goes to school at  [two, three, five, ten]  o'clock

STEP 3: Use DataFrame.explode to expand df1 on the clock column so that each hour gets its own row, then use .agg to join the columns along axis 1 into full sentences. Finally, use groupby on level 0 to merge the sentences that share an original index into one comma-separated string. Row 1 was dropped by dropna, so it is NaN after the assignment.

# data
                             sentences                                      new_sentences
0        I have a class at ten o'clock  I have a class at two o'clock, I have a class ...
1                     she is my friend                                                NaN
2  she goes to school at eight o'clock  she goes to school at two o'clock, she goes to...
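The explode, join, and groupby chain is easier to follow on a tiny frame. The sketch below is only an illustration: the frame toy and its values are made up to resemble one matching row of df1 after STEP 2.

```python
import pandas as pd

# Hypothetical one-row frame shaped like df1 after STEP 2 (two hours only).
toy = pd.DataFrame({
    "before": ["at"],
    "clock": [["two", "three"]],
    "after": ["o'clock"],
})

# One row per hour; the original index label 0 is repeated on each row.
exploded = toy.explode("clock")

# Join the three columns of every row into a single string.
joined = exploded.agg(" ".join, axis=1)

# Merge the rows that share an index label into one comma-separated string.
combined = joined.groupby(level=0).agg(", ".join)

print(exploded)
print(combined)
```

Because explode keeps the original index label on every new row, groupby(level=0) can bring the variants of one sentence back together.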

STEP 4: Finally, use Series.fillna to fill the missing values in the new_sentences column with the values from the corresponding rows of the sentences column.

# data
                             sentences                                      new_sentences
0        I have a class at ten o'clock  I have a class at two o'clock, I have a class ...
1                     she is my friend                                   she is my friend
2  she goes to school at eight o'clock  she goes to school at two o'clock, she goes to...
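For comparison, the same result can be sketched with a plain Python helper built on re.sub. This is not the method used in the answer above, only an alternative under stated assumptions: the function name expand_hours is hypothetical, every value is assumed to be a string, and the hours are assumed to be plain words with no backslashes.

```python
import re

# Matches the single word that directly precedes "o'clock" (case-insensitive).
CLOCK_WORD = re.compile(r"\w+(?=\so'clock)", flags=re.IGNORECASE)


def expand_hours(sentence, hours):
    """Return one variant of `sentence` per hour, joined with ', '.

    Sentences without "o'clock" are returned unchanged.
    """
    if not CLOCK_WORD.search(sentence):
        return sentence
    # count=1 replaces only the first hour word found in the sentence.
    return ", ".join(CLOCK_WORD.sub(hour, sentence, count=1) for hour in hours)


my_list = ['two', 'three', 'five', 'ten']
result = expand_hours("she goes to school at eight o'clock", my_list)
unchanged = expand_hours("she is my friend", my_list)

print(result)
print(unchanged)
```

On the dataframe this could be applied with data['sentences'].apply(lambda s: expand_hours(s, my_list)). One difference from the answer: if a sentence mentions o'clock more than once, this helper replaces the first occurrence, whereas the greedy pattern in STEP 1 targets the last one.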
