Python: Replace one word in a sentence with a list of words and put the new sentences in another column in pandas
Problem description
I have a DataFrame in which some sentences contain the word "o'clock". I want to replace the time mentioned before it with each of the hours in a list I have, and put the new sentences in another column, as in the following:
import pandas as pd
data = pd.DataFrame({"sentences": ["I have a class at ten o'clock", "she is my friend", "she goes to school at eight o'clock"]})
my_list = ['two', 'three', 'five', 'ten']
What I would like to see is an extra column with the new sentences, in which the time is changed to each of the times in the list:
Output:
   sentences                            new_sentences
0  I have a class at ten o'clock        I have a class at two o'clock, I have a class at three o'clock,...
1  she is my friend                     she is my friend
2  she goes to school at eight o'clock  she goes to school at two o'clock,....
Repetition in the new_sentences column is fine. I have tried to use np.where:
np.where(data.str.contains('o\'clock', regex=False, case=False, na=False), data["sentence"].replace()... )
but I do not know how to replace the word before "o'clock".
Thanks in advance.
Recommended answer
Use:
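# STEP 1: split each sentence into the text before the hour, the hour word, and the rest
df1 = data['sentences'].str.extract(
    r"(?i)(?P<before>.*)\s(?P<clock>\w+(?=\so'clock))\s(?P<after>.*)")
# STEP 2: swap the hour word for the list of hours (regex=True is needed on pandas >= 2.0)
df1['clock'] = df1['clock'].str.replace(
    r'\w+', ','.join(my_list), regex=True).str.split(',')
# STEP 3: one row per hour, rebuild the sentences, then join them per original row
data['new_sentences'] = df1.dropna().explode('clock').agg(
    ' '.join, axis=1).groupby(level=0).agg(', '.join)
# STEP 4: sentences without "o'clock" keep their original text
data['new_sentences'] = data['new_sentences'].fillna(data['sentences'])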
Explanation / steps:
STEP 1: Use Series.str.extract along with the given regex pattern to create a three-column DataFrame. The first column holds the part of the sentence before the hour word (e.g. ten), the middle column holds the hour word itself, and the right column holds the part of the sentence after it. Sentences that do not match the pattern produce a row of NaN.
# df1
   before                 clock  after
0  I have a class at      ten    o'clock
1  NaN                    NaN    NaN
2  she goes to school at  eight  o'clock
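To make STEP 1 concrete, here is a minimal sketch that runs the same pattern on the sample sentences from the question and inspects the result. The variable name pattern is only for illustration; everything else comes from the question and answer.

```python
import pandas as pd

data = pd.DataFrame({"sentences": [
    "I have a class at ten o'clock",
    "she is my friend",
    "she goes to school at eight o'clock",
]})

# Named groups become the column names of the extracted DataFrame.
# The lookahead (?=\so'clock) keeps only the word directly before "o'clock".
pattern = r"(?i)(?P<before>.*)\s(?P<clock>\w+(?=\so'clock))\s(?P<after>.*)"
df1 = data["sentences"].str.extract(pattern)

print(df1.columns.tolist())     # ['before', 'clock', 'after']
print(df1.loc[0].tolist())      # ['I have a class at', 'ten', "o'clock"]
print(df1.loc[1].isna().all())  # True: the sentence has no "o'clock"
```

Because the leading .* is greedy, a sentence that mentions "o'clock" more than once would have only its last hour word captured in clock.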
STEP 2: Use Series.str.replace to replace the token in the clock column with all the items in my_list, joined by commas. Then use Series.str.split to split the replaced string around the delimiter "," so that each cell holds a list of hours.
# df1
   before                 clock                    after
0  I have a class at      [two, three, five, ten]  o'clock
1  NaN                    NaN                      NaN
2  she goes to school at  [two, three, five, ten]  o'clock
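The replace-then-split step can be tried in isolation. The sketch below uses a small stand-in Series named clock (an assumption for illustration, mirroring the clock column above). It passes regex=True explicitly because, from pandas 2.0 onward, Series.str.replace treats the pattern as a literal string by default, so r'\w+' would otherwise match nothing.

```python
import pandas as pd

my_list = ["two", "three", "five", "ten"]

# Stand-in for df1['clock']: hour words where found, missing otherwise.
clock = pd.Series(["ten", None, "eight"], name="clock")

# Replace the single hour word with "two,three,five,ten" ...
replaced = clock.str.replace(r"\w+", ",".join(my_list), regex=True)
# ... then split that string back into a list of hours per row.
as_lists = replaced.str.split(",")

print(as_lists[0])           # ['two', 'three', 'five', 'ten']
print(pd.isna(as_lists[1]))  # True: missing values stay missing
```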
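STEP 3: Use DataFrame.explode to explode df1 on the clock column, so that each hour gets its own row while keeping the original row index. Then use .agg to join the columns along axis 1, which rebuilds one sentence per hour. Finally, use groupby on level 0 to join the sentences that share an index into a single string. Rows dropped by dropna are absent from the result, so they show up as NaN after assignment.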
# data
   sentences                            new_sentences
0  I have a class at ten o'clock        I have a class at two o'clock, I have a class ...
1  she is my friend                     NaN
2  she goes to school at eight o'clock  she goes to school at two o'clock, she goes to...
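The explode, join, and groupby chain is easier to follow on a tiny hand-built frame. This sketch assumes a two-row df1 with a shortened hour list (hours is a hypothetical name) and splits the chain into named intermediate steps.

```python
import pandas as pd

hours = ["two", "three"]  # shortened list, for illustration only
df1 = pd.DataFrame({
    "before": ["I have a class at", None],
    "clock": [hours, None],
    "after": ["o'clock", None],
})

# Drop unmatched rows, then create one row per hour; the index 0 is repeated.
exploded = df1.dropna().explode("clock")
# Join the three columns of every row into one sentence.
rows = exploded.agg(" ".join, axis=1)
# Collapse rows that share an index back into a single comma-separated string.
new_sentences = rows.groupby(level=0).agg(", ".join)

print(new_sentences.loc[0])
# I have a class at two o'clock, I have a class at three o'clock
```

Index 1 is missing from new_sentences, which is why the assignment back to data leaves NaN for sentences without "o'clock".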
STEP 4: Finally, use Series.fillna to fill the missing values in the new_sentences column from the corresponding sentences column.
# data
   sentences                            new_sentences
0  I have a class at ten o'clock        I have a class at two o'clock, I have a class ...
1  she is my friend                     she is my friend
2  she goes to school at eight o'clock  she goes to school at two o'clock, she goes to...
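For comparison, the same result can also be produced row by row with the standard re module. This is a different technique from the answer above, not part of it, and the names CLOCK and expand_hours are hypothetical. It is a sketch under the assumption that each sentence contains at most one "o'clock" and that the hours are plain words.

```python
import re
import pandas as pd

data = pd.DataFrame({"sentences": [
    "I have a class at ten o'clock",
    "she is my friend",
    "she goes to school at eight o'clock",
]})
my_list = ["two", "three", "five", "ten"]

# The word immediately before " o'clock", matched case-insensitively.
CLOCK = re.compile(r"\w+(?=\so'clock)", flags=re.IGNORECASE)

def expand_hours(sentence, hours):
    """Return one copy of the sentence per hour, joined by ', '."""
    if not CLOCK.search(sentence):
        return sentence  # no "o'clock": keep the sentence unchanged
    return ", ".join(CLOCK.sub(hour, sentence, count=1) for hour in hours)

data["new_sentences"] = data["sentences"].apply(lambda s: expand_hours(s, my_list))
print(data.loc[1, "new_sentences"])  # she is my friend
```

This version avoids the intermediate frame and the fillna step, at the cost of a Python-level loop over the rows.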