Split text in cells and create additional rows for the tokens

Question

Suppose I have the following pandas DataFrame:

id  text
1   I am the first document and I am very happy.
2   Here is the second document and it likes playing tennis.
3   This is the third document and it looks very good today.

I want to split the text of each id into tokens of 3 words each, so that I end up with the following:

id  text
1   I am the
1   first document and
1   I am very
1   happy
2   Here is the
2   second document and
2   it likes playing
2   tennis
3   This is the
3   third document and
3   it looks very
3   good today

Keep in mind that my DataFrame may have other columns besides these two; they should simply be copied to the new DataFrame in the same way as id above.

What is the most efficient way to do this?

I reckon that the solution to my question is quite close to the one given here: Tokenise text and create more rows for each row in dataframe.

This may also help: Python: split a string every n words into smaller strings.
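The every-n-words split itself can be sketched in plain Python, independent of pandas. This is only an illustration of the chunking step; the helper name `chunk_words` is hypothetical and does not come from either linked question.

```python
def chunk_words(text, n=3):
    """Split text on whitespace and regroup the words n at a time."""
    words = text.split()
    # Step through the word list in strides of n and re-join each slice.
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


chunks = chunk_words("I am the first document and I am very happy.")
print(chunks)
```

The last chunk is simply shorter when the word count is not a multiple of n, and punctuation stays attached to its word.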

Recommended answer

A self-contained solution, though possibly a little slower:

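# Number of words per chunk
n = 3

# In case id is not the index yet
df = df.set_index('id')

# One row per word: columns are id, level_1 (word position) and 0 (the word)
new_df = df.text.str.split(' ', expand=True).stack().reset_index()

# Group every n consecutive words of each id and join them back into one string
new_df = (new_df.groupby(['id', new_df.level_1 // n])[0]
                .apply(lambda x: ' '.join(x))
                .reset_index(level=1, drop=True)
         )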

new_df is a Series indexed by id (punctuation stays attached to the last word):

id
1               I am the
1     first document and
1              I am very
1                 happy.
2            Here is the
2    second document and
2       it likes playing
2                tennis.
3            This is the
3     third document and
3          it looks very
3            good today.
Name: 0, dtype: object
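The result above is a Series, so any other columns of the original DataFrame are not carried along. One possible way to keep them, sketched here as an alternative and not part of the answer above, is to turn each text into a list of chunks and call `DataFrame.explode`, which needs pandas 0.25 or later. The `lang` column and the `chunk_words` helper are hypothetical, added only to show an extra column being copied.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical extra column "lang".
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": [
        "I am the first document and I am very happy.",
        "Here is the second document and it likes playing tennis.",
        "This is the third document and it looks very good today.",
    ],
    "lang": ["en", "en", "en"],
})

n = 3  # words per chunk


def chunk_words(text, n):
    """Split text on whitespace and regroup the words n at a time."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


# Replace each text with its list of chunks, then emit one row per chunk.
# explode() repeats the values of every other column for each new row.
result = (
    df.assign(text=df["text"].apply(lambda t: chunk_words(t, n)))
      .explode("text")
      .reset_index(drop=True)
)
print(result)
```

Here id stays an ordinary column, so it is repeated like every other column. Note that a row whose text is empty yields an empty list, which `explode` turns into a single row with a missing value.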
