How to explode a list inside a Dataframe cell into separate rows
Problem description
I'm looking to turn a pandas cell containing a list into rows for each of those values.
So, take the DataFrame df built in the code below.
If I'd like to unpack and stack the values in the nearest_neighbors column so that each value becomes its own row within each opponent index, how would I best go about this? Are there pandas methods that are meant for operations like this?
In the code below, I first reset the index to make the row iteration easier.
I create a list of lists where each element of the outer list is a row of the target DataFrame and each element of the inner list is one of the columns. This nested list will ultimately be concatenated to create the desired DataFrame.
I use a lambda function together with list iteration to create a row for each element of nearest_neighbors, paired with the relevant name and opponent.
Finally, I create a new DataFrame from this list (using the original column names and setting the index back to name and opponent).
import pandas as pd

df = (pd.DataFrame({'name': ['A.J. Price'] * 3,
                    'opponent': ['76ers', 'blazers', 'bobcats'],
                    'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3})
      .set_index(['name', 'opponent']))
>>> df
                                                   nearest_neighbors
name       opponent
A.J. Price 76ers    [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
           blazers  [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
           bobcats  [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
df.reset_index(inplace=True)
rows = []
_ = df.apply(lambda row: [rows.append([row['name'], row['opponent'], nn])
                          for nn in row.nearest_neighbors], axis=1)
df_new = pd.DataFrame(rows, columns=df.columns).set_index(['name', 'opponent'])
>>> df_new
                    nearest_neighbors
name       opponent
A.J. Price 76ers          Zach LaVine
           76ers           Jeremy Lin
           76ers        Nate Robinson
           76ers                Isaia
           blazers        Zach LaVine
           blazers         Jeremy Lin
           blazers      Nate Robinson
           blazers              Isaia
           bobcats        Zach LaVine
           bobcats         Jeremy Lin
           bobcats      Nate Robinson
           bobcats              Isaia
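The same row-building idea can be sketched without using apply for its side effects. This is only an illustrative variant, not the original code: the names flat and df_rows are made up for the example, and reset_index() is called without inplace=True so the indexed df stays untouched.

```python
import pandas as pd

df = (pd.DataFrame({'name': ['A.J. Price'] * 3,
                    'opponent': ['76ers', 'blazers', 'bobcats'],
                    'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3})
      .set_index(['name', 'opponent']))

# Work on a flattened copy; df itself keeps its MultiIndex.
flat = df.reset_index()

# One output row per (name, opponent, neighbor) combination.
rows = [(name, opponent, nn)
        for name, opponent, neighbors in zip(flat['name'],
                                             flat['opponent'],
                                             flat['nearest_neighbors'])
        for nn in neighbors]

df_rows = (pd.DataFrame(rows, columns=flat.columns)
           .set_index(['name', 'opponent']))
print(df_rows)
```

Because the comprehension walks the rows in order, the result should keep the same row order as df_new above.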
EDIT JUNE 2017
An alternative method is as follows:
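# Assumes df still has the ('name', 'opponent') MultiIndex,
# i.e. this runs before df.reset_index(inplace=True) above.
>>> (pd.melt(df.nearest_neighbors.apply(pd.Series).reset_index(),
             id_vars=['name', 'opponent'],
             value_name='nearest_neighbors')
     .set_index(['name', 'opponent'])
     .drop('variable', axis=1)
     .dropna()
     .sort_index()
     )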
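As for whether pandas has a method meant for this: newer pandas releases (0.25 and later) ship DataFrame.explode, which appears to cover exactly this case. A minimal sketch, assuming such a version is installed and that df still carries the (name, opponent) MultiIndex; df_exploded is just an illustrative name.

```python
import pandas as pd

df = (pd.DataFrame({'name': ['A.J. Price'] * 3,
                    'opponent': ['76ers', 'blazers', 'bobcats'],
                    'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3})
      .set_index(['name', 'opponent']))

# Each list element becomes its own row; the index entries are repeated.
df_exploded = df.explode('nearest_neighbors')
print(df_exploded)
```

The index is preserved as-is, so no reset_index / set_index round trip is needed.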