How to split lists over a range of columns?


Question

I have a dataframe with several columns that each contain a list. I want to split these lists into separate columns. I found a related question here on Stack Overflow, but it seems to split only the list inside one column, whereas I want to apply this to multiple columns whose lists contain unequal numbers of objects.

My df looks something like this:

     ID |  value_0  |  value_1  |  value_2  | value_3   | value_4
0   1001|[1001,1002]|   None    |   None    |   None    |  None 
1   1010|[1010,2001]|[2526,1000]|   None    |   None    |  None  
2   1100|[1234,5678]|[9101,1121]|[3141,5161]|[1718,1920]|[2122,2324]

I want to convert it to:

     ID | 0  | 1  |  2   |  3   | 4
0   1001|1001|1002| None | None | None 
1   1010|1010|2001| 2526 | 1000 | None  
2   1100|1234|5678| 9101 | 1121 | 3141 ....etc.

Currently this is my code, but it only outputs a dataframe containing "None" values. I'm not sure how to fix it, because it seems to pick up only the last column without really splitting the list.

length = len(list(df.columns.values))-1

for i in range(length):
    temp = "value_" + str(i)
    x = df[temp]
    new_df = pd.DataFrame(df[temp].values.tolist())

The resulting new_df that I got is:

   | 0
  0| None
  1| None
  2| [2122,2324]

However, if I focus on only one column (i.e. value_0), it splits the list just fine.

new_df = pd.DataFrame(df['value_0'].values.tolist())

Any help is much appreciated.

Recommended answer

The idea is to reshape the values with DataFrame.stack, which removes the None values so that the DataFrame constructor can be used on the remaining lists, then reshape back with DataFrame.unstack, sort the columns, and set default column names:

import ast
import numpy as np
import pandas as pd
# if the columns hold strings instead of lists, parse them first:
#df.iloc[:, 1:] = df.iloc[:, 1:].applymap(ast.literal_eval)

s = df.set_index('ID', append=True).stack()

df = pd.DataFrame(s.values.tolist(), index=s.index).unstack().sort_index(axis=1, level=1)
df.columns = np.arange(len(df.columns))

df = df.reset_index(level=1)
print (df)
     ID       0       1       2       3       4       5       6       7  \
0  1001  1001.0  1002.0     NaN     NaN     NaN     NaN     NaN     NaN   
1  1010  1010.0  2001.0  2526.0  1000.0     NaN     NaN     NaN     NaN   
2  1100  1234.0  5678.0  9101.0  1121.0  3141.0  5161.0  1718.0  1920.0   

        8       9  
0     NaN     NaN  
1     NaN     NaN  
2  2122.0  2324.0  
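As a side note on the loop in the question: new_df is reassigned on every pass, so only the frame built from the last column survives, and that column still contains None cells, which is why the lists there are not split. The following is a minimal sketch of the same per-column idea that keeps every piece and joins them at the end. The sample frame is rebuilt from the question, and the names value_cols, parts, wide and result are illustrative only. It assumes missing cells appear only after the filled ones in each row; a gap in the middle of a row would stay in place rather than shift left.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    "ID": [1001, 1010, 1100],
    "value_0": [[1001, 1002], [1010, 2001], [1234, 5678]],
    "value_1": [None, [2526, 1000], [9101, 1121]],
    "value_2": [None, None, [3141, 5161]],
    "value_3": [None, None, [1718, 1920]],
    "value_4": [None, None, [2122, 2324]],
})

value_cols = [c for c in df.columns if c.startswith("value_")]

parts = []
for col in value_cols:
    # Replace None with an empty list so that every cell is a list
    cells = [v if isinstance(v, list) else [] for v in df[col]]
    # Store each split frame instead of overwriting the previous one
    parts.append(pd.DataFrame(cells, index=df.index))

# Join the pieces side by side and renumber the columns 0..n-1
wide = pd.concat(parts, axis=1)
wide.columns = range(wide.shape[1])

result = pd.concat([df[["ID"]], wide], axis=1)
print(result)
```

The values come out as floats wherever a column contains missing entries, just as in the output above.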

Solution for pandas 0.24+ that keeps integers alongside missing values (use this in place of the final reset_index line above):

df = df.astype('Int64').reset_index(level=1)
print (df)
     ID     0     1     2     3     4     5     6     7     8     9
0  1001  1001  1002   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
1  1010  1010  2001  2526  1000   NaN   NaN   NaN   NaN   NaN   NaN
2  1100  1234  5678  9101  1121  3141  5161  1718  1920  2122  2324
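If the reshaping steps feel opaque, or if DataFrame.stack does not drop missing values in the pandas version at hand, a similar table can be built by flattening each row in plain Python first. This is a hedged sketch, not the answer's method: flatten_row is a hypothetical helper, and the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    "ID": [1001, 1010, 1100],
    "value_0": [[1001, 1002], [1010, 2001], [1234, 5678]],
    "value_1": [None, [2526, 1000], [9101, 1121]],
    "value_2": [None, None, [3141, 5161]],
    "value_3": [None, None, [1718, 1920]],
    "value_4": [None, None, [2122, 2324]],
})


def flatten_row(row):
    """Concatenate every list in a row, skipping None cells (hypothetical helper)."""
    out = []
    for cell in row:
        if isinstance(cell, list):
            out.extend(cell)
    return out


value_cols = [c for c in df.columns if c.startswith("value_")]

# One flat list per row; the lists have unequal lengths
flat = [flatten_row(row) for row in df[value_cols].itertuples(index=False, name=None)]

# The constructor pads short rows with missing values;
# the nullable Int64 dtype keeps the rest as integers
wide = pd.DataFrame(flat, index=df.index).astype("Int64")

result = pd.concat([df[["ID"]], wide], axis=1)
print(result)
```

Because each row is flattened before the frame is built, values always pack to the left, even when a None sits between two filled cells.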
