How can I create new rows in a pandas data frame containing the words in a string of an existing row?

Question

I have a DataFrame in pandas with a column called df.Strings that holds strings of text. I would like to put the individual words of those strings on their own rows, with identical values in the other columns. For example, if I have 3 strings (and an unrelated column, Time):

    Strings Time
0   The dog  4Pm
1  lazy dog  2Pm
2   The fox  1Pm

I want new rows, each containing one word from the string, with the other columns otherwise identical:

Strings    Words  Time
The dog    The    4Pm
The dog    dog    4Pm
lazy dog   lazy   2Pm
lazy dog   dog    2Pm
The fox    The    1Pm
The fox    fox    1Pm

I know how to split the words up from the strings:

   import re

   string_list = '\n'.join(df.Strings.map(str))       # all strings as one block of text
   word_list = re.findall('[A-Za-z]+', string_list)   # every word, capitalized ones included
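Joining every row into one string, as above, loses track of which row each word came from. A minimal sketch, assuming the Strings/Time frame from the example, that keeps the words grouped per row with `Series.str.findall` (same idea as the regex above, applied row by row):

```python
import pandas as pd

df = pd.DataFrame({"Strings": ["The dog", "lazy dog", "The fox"],
                   "Time": ["4Pm", "2Pm", "1Pm"]})

# One list of words per row; the result keeps df's index,
# so each list can still be matched back to its original row
word_lists = df.Strings.str.findall("[A-Za-z]+")
print(word_lists)
```

Because the index is preserved, the remaining problem is only how to turn each list into several rows.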

But how can I get these into the DataFrame while preserving the index and the other columns? I'm using Python 2.7 and pandas 0.10.1.

I now understand how to expand rows using groupby, as described in this question:

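import re
from pandas import DataFrame

def f(group):
    row = group.iloc[0]  # first row of the group (group.irow(0) in old pandas)
    return DataFrame({'words': re.findall('[A-Za-z]+', row['Strings'])})

df.groupby(level=0, group_keys=False).apply(f)  # one group per row of df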

I would still like to preserve the other columns. Is this possible?
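Within the groupby approach itself, the other columns can be kept by having `f` copy them onto every word row. A minimal sketch, assuming the Strings/Time columns from the example and grouping on the index (`level=0`) so that each group is exactly one original row:

```python
import re
import pandas as pd

df = pd.DataFrame({"Strings": ["The dog", "lazy dog", "The fox"],
                   "Time": ["4Pm", "2Pm", "1Pm"]})

def f(group):
    # Grouping on the index means each group holds a single original row
    row = group.iloc[0]
    words = re.findall("[A-Za-z]+", row["Strings"])
    # Scalar values are broadcast to the length of the word list;
    # repeating row.name keeps the original index label on every word row
    return pd.DataFrame({"Strings": row["Strings"],
                         "Words": words,
                         "Time": row["Time"]},
                        index=[row.name] * len(words))

result = df.groupby(level=0, group_keys=False).apply(f)
print(result)
```

This calls a Python function once per row, so on large frames it is likely to be slower than a vectorized approach.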

Answer

Here is my code. It doesn't use groupby(), so I think it's faster.

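import pandas as pd
import numpy as np
import itertools

df = pd.DataFrame({
    "strings": ["the dog", "lazy dog", "The fox jump"],
    "value": ["a", "b", "c"]})

w = df.strings.str.split()          # list of words for each row
c = w.map(len)                      # number of words in each row
idx = np.repeat(c.index, c.values)  # each row's index label, repeated once per word
# words = np.concatenate(w.values)  # alternative way to flatten the lists
words = list(itertools.chain.from_iterable(w.values))  # all words, in row order
s = pd.Series(words, index=idx)
s.name = "words"
print(df.join(s))                   # joining on the repeated index duplicates each row per word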

Result:

        strings value words
0       the dog     a   the
0       the dog     a   dog
1      lazy dog     b  lazy
1      lazy dog     b   dog
2  The fox jump     c   The
2  The fox jump     c   fox
2  The fox jump     c  jump
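In newer pandas (0.25 and later, so not the 0.10.1 mentioned in the question), `DataFrame.explode` does the repeat-and-flatten step directly. A short sketch using the same sample frame as the answer:

```python
import pandas as pd

df = pd.DataFrame({"strings": ["the dog", "lazy dog", "The fox jump"],
                   "value": ["a", "b", "c"]})

# Store each row's word list in a new column, then give every list element
# its own row; explode repeats the index label and the other columns
result = df.assign(words=df.strings.str.split()).explode("words")
print(result)
```

The printed frame has the same rows and index as the result shown above.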
