How do I turn a dataframe into a series of lists?


Problem description

I have had to do this several times and I'm always frustrated. I have a dataframe:

import pandas as pd
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], ['a', 'b'], ['A', 'B', 'C', 'D'])

print(df)

   A  B  C  D
a  1  2  3  4
b  5  6  7  8

I want to turn df into:

pd.Series([[1, 2, 3, 4], [5, 6, 7, 8]], ['a', 'b'])

a    [1, 2, 3, 4]
b    [5, 6, 7, 8]
dtype: object

I've tried

df.apply(list, axis=1)

which just gives me back the same df.

What is a convenient/effective way to do this?

Solution

You can first convert the DataFrame to a NumPy array with values, then convert that array to a list of lists with tolist(), and finally create a new Series that reuses the index from df. Use this approach if you need a faster solution:

print(pd.Series(df.values.tolist(), index=df.index))
a    [1, 2, 3, 4]
b    [5, 6, 7, 8]
dtype: object
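
The same idea can be wrapped in a small helper. The sketch below is only an illustration: `rows_to_list_series` is a hypothetical name, and it swaps `df.values` for `df.to_numpy()`, which assumes pandas 0.24 or later. Note that either call converts the whole frame to one common dtype first, so a frame mixing integer and float columns would come back as lists of floats.

```python
import pandas as pd


def rows_to_list_series(frame):
    """Return a Series whose values are the rows of `frame` as Python lists.

    Hypothetical helper, not part of pandas. It relies on
    DataFrame.to_numpy(), which assumes pandas >= 0.24.
    """
    # to_numpy() gives a 2-D array; tolist() turns it into a list of row lists.
    row_lists = frame.to_numpy().tolist()
    # Reuse the original index so the row labels are preserved.
    return pd.Series(row_lists, index=frame.index)


df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    index=['a', 'b'],
    columns=['A', 'B', 'C', 'D'],
)
result = rows_to_list_series(df)
print(result)
```

The returned Series has object dtype because each element is a Python list rather than a scalar.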

Timings with a small DataFrame:

In [76]: %timeit (pd.Series(df.values.tolist(), index=df.index))
1000 loops, best of 3: 295 µs per loop

In [77]: %timeit pd.Series(df.T.to_dict('list'))
1000 loops, best of 3: 685 µs per loop

In [78]: %timeit df.T.apply(tuple).apply(list)
1000 loops, best of 3: 958 µs per loop

and with a large one:

from string import ascii_letters
import numpy as np
letters = list(ascii_letters)
df = pd.DataFrame(np.random.choice(range(10), (52 ** 2, 52)),
                  pd.MultiIndex.from_product([letters, letters]),
                  letters)

In [71]: %timeit (pd.Series(df.values.tolist(), index=df.index))
100 loops, best of 3: 2.06 ms per loop

In [72]: %timeit pd.Series(df.T.to_dict('list'))
1 loop, best of 3: 203 ms per loop

In [73]: %timeit df.T.apply(tuple).apply(list)
1 loop, best of 3: 506 ms per loop
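
To rerun this kind of comparison outside IPython, a rough sketch with the standard `timeit` module could look like the following. The function names `via_values` and `via_to_dict`, the seed, and the repeat counts are assumptions made for illustration, and only the first two approaches are included. Absolute numbers will differ by machine and pandas version, so none are quoted here.

```python
import timeit
from string import ascii_letters

import numpy as np
import pandas as pd

letters = list(ascii_letters)
rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
big = pd.DataFrame(
    rng.integers(0, 10, size=(52 ** 2, 52)),
    index=pd.MultiIndex.from_product([letters, letters]),
    columns=letters,
)


def via_values(frame):
    # NumPy array -> list of row lists -> Series with the original index.
    return pd.Series(frame.values.tolist(), index=frame.index)


def via_to_dict(frame):
    # Transpose so each row becomes a column, then collect columns as lists.
    return pd.Series(frame.T.to_dict('list'))


timings = {}
for func in (via_values, via_to_dict):
    # Best of 3 repeats, 3 calls each, reported as seconds per call.
    best = min(timeit.repeat(lambda: func(big), number=3, repeat=3)) / 3
    timings[func.__name__] = best
    print(f"{func.__name__}: {best * 1000:.2f} ms per call")
```

Both functions are expected to produce the same lists in the same row order, which is worth checking before comparing their speed.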
