Split set values from a Pandas DataFrame cell over multiple rows


Question

I have a pandas DataFrame in the following form:

    col1           col2
1    a       {hu, fdf, ko, dss}
2    b       {sdsjdn, lk}
3    c       {sds, aldj, dhva}

Now I want to split the set values over multiple rows so that it looks like this:

    col1           col2
1    a              hu
2    a              fdf
3    a              ko
4    a              dss
5    b              sdsjdn
6    b              lk
7    c              sds
8    c              aldj
9    c              dhva

Does anyone have any insight into how I can do this?

Answer

Use numpy.repeat to build the new col1 column, repeating each value once per element of the corresponding set, and flatten the set column col2 with itertools.chain.from_iterable:

import numpy as np
import pandas as pd

# col2 holds Python sets, whose iteration order is arbitrary
df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                   'col2': [{'hu', 'fdf', 'ko', 'dss'},
                            {'sdsjdn', 'lk'},
                            {'sds', 'aldj', 'dhva'}]})

print(df)
  col1                col2
0    a  {hu, dss, ko, fdf}
1    b        {lk, sdsjdn}
2    c   {dhva, aldj, sds}

from itertools import chain

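# col1: repeat each value once per element of its set (sizes 4, 2, 3)
# col2: concatenate all set elements in row order (9 values in total)
df1 = pd.DataFrame({
        "col1": np.repeat(df.col1.values, df.col2.str.len()),
        "col2": list(chain.from_iterable(df.col2))})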

print(df1)
  col1    col2
0    a      hu
1    a     dss
2    a      ko
3    a     fdf
4    b      lk
5    b  sdsjdn
6    c    dhva
7    c    aldj
8    c     sds
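To see why the two columns line up, it may help to run the two building blocks on their own. This small sketch uses lists instead of sets so that the output order is predictable; the counts [4, 2, 3] stand in for what df.col2.str.len() returns for the example data.

```python
from itertools import chain

import numpy as np

# np.repeat(values, counts) repeats values[i] exactly counts[i] times, keeping order.
repeated = np.repeat(np.array(['a', 'b', 'c']), [4, 2, 3])
print(repeated.tolist())  # ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c']

# chain.from_iterable concatenates the inner iterables into a single sequence.
# Lists are used here (not sets) so the element order is deterministic.
groups = [['hu', 'fdf', 'ko', 'dss'], ['sdsjdn', 'lk'], ['sds', 'aldj', 'dhva']]
flat = list(chain.from_iterable(groups))
print(flat)  # ['hu', 'fdf', 'ko', 'dss', 'sdsjdn', 'lk', 'sds', 'aldj', 'dhva']

# Both sequences have one entry per output row, so they pair up position by position.
assert len(repeated) == len(flat)
```

Because both sequences walk the rows in the same order and each row contributes len(set) entries to each, row i of the result always pairs a col1 value with an element of its own set.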

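Because Python sets are unordered (and string hashing is randomized per process by default), the order of the values within each group can change between runs, which is why the output above lists hu, dss, ko, fdf instead of the order shown in the question. On newer pandas versions, DataFrame.explode is a shorter alternative to the repeat/chain approach, and sorting each set first makes the row order reproducible. This is a minimal sketch, assuming pandas 1.1 or later (for the ignore_index argument); the sorting step is an addition not present in the original answer.

```python
import pandas as pd

# Same input as in the question: col2 holds Python sets
df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                   'col2': [{'hu', 'fdf', 'ko', 'dss'},
                            {'sdsjdn', 'lk'},
                            {'sds', 'aldj', 'dhva'}]})

# Turn each set into a sorted list so the exploded order is deterministic,
# then emit one row per list element; col1 is repeated for each element.
# ignore_index=True gives a fresh 0..n-1 index instead of repeating 0, 1, 2.
df1 = (df.assign(col2=df['col2'].map(sorted))
         .explode('col2', ignore_index=True))

print(df1)
print(df1['col2'].tolist())
# ['dss', 'fdf', 'hu', 'ko', 'lk', 'sdsjdn', 'aldj', 'dhva', 'sds']
```

If the within-group order does not matter, the sorting step can be dropped: explode also accepts sets directly, in which case each group comes out in the set's own iteration order.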
