Efficiently Creating A Pandas DataFrame From A Numpy 3d array


Question

Suppose we start with

import numpy as np
a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

How can this efficiently be made into a pandas DataFrame equivalent to

>>> import pandas as pd
>>> pd.DataFrame({'a': [0, 0, 1, 1], 'b': [1, 3, 5, 7], 'c': [2, 4, 6, 8]})
   a  b  c
0  0  1  2
1  0  3  4
2  1  5  6
3  1  7  8

The idea is for column a to hold each row's index along the first dimension of the original array, and for the remaining columns to be the vertical concatenation of the 2d arrays that make up the last two dimensions of the original array.

(This is easy to do with loops; the question is how to do it without them.)
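For reference, the loop-based version alluded to above might look like the following. This is only a sketch of a baseline to compare against; the names `rows` and `loop_df` are made up for illustration.

```python
import numpy as np
import pandas as pd

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Loop baseline: one output row per inner row of each 2d block,
# prefixed with the block's index i along the first axis.
rows = []
for i, block in enumerate(a):
    for row in block:
        rows.append([i, *row])

loop_df = pd.DataFrame(rows, columns=['a', 'b', 'c'])
print(loop_df)
```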


Longer Example

Using @Divakar's excellent suggestion:

>>> np.random.randint(0,9,(4,3,2))
array([[[0, 6],
        [6, 4],
        [3, 4]],

       [[5, 1],
        [1, 3],
        [6, 4]],

       [[8, 0],
        [2, 3],
        [3, 1]],

       [[2, 2],
        [0, 0],
        [6, 3]]])

This should be turned into something like:

>>> pd.DataFrame({
    'a': [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], 
    'b': [0, 6, 3, 5, 1, 6, 8, 2, 3, 2, 0, 6], 
    'c': [6, 4, 4, 1, 3, 4, 0, 3, 1, 2, 0, 3]})
    a  b  c
0   0  0  6
1   0  6  4
2   0  3  4
3   1  5  1
4   1  1  3
5   1  6  4
6   2  8  0
7   2  2  3
8   2  3  1
9   3  2  2
10  3  0  0
11  3  6  3

Solution

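Here's one approach that does most of the processing in NumPy and only builds the DataFrame at the end:

m, n, r = a.shape
# First column: the first-axis index, each value repeated n times.
# Remaining columns: the m blocks of shape (n, r) stacked vertically.
out_arr = np.column_stack((np.repeat(np.arange(m), n), a.reshape(m*n, -1)))
out_df = pd.DataFrame(out_arr)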

If you know that the last dimension has length 2, so that b and c are the last two columns and a is the first one, you can add column names like so:

out_df = pd.DataFrame(out_arr,columns=['a', 'b', 'c'])
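To try this against the small array from the question, the steps can be wrapped in a helper. This is a minimal sketch: the function name `array3d_to_frame` and its `columns` parameter are hypothetical, not part of NumPy or pandas.

```python
import numpy as np
import pandas as pd

def array3d_to_frame(a, columns=None):
    """Flatten an array of shape (m, n, r) into a DataFrame with m*n rows and r+1 columns."""
    m, n, r = a.shape
    # Index along the first axis: 0 repeated n times, then 1 repeated n times, ...
    first_axis = np.repeat(np.arange(m), n)
    # Stack the m blocks of shape (n, r) on top of each other.
    stacked = a.reshape(m * n, r)
    out_arr = np.column_stack((first_axis, stacked))
    return pd.DataFrame(out_arr, columns=columns)

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
out_df = array3d_to_frame(a, columns=['a', 'b', 'c'])
print(out_df)
```

Note that np.column_stack builds one array with a single common dtype, so if the input holds floats, the index column a will come out as float too.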

Sample run:

>>> a
array([[[2, 0],
        [1, 7],
        [3, 8]],

       [[5, 0],
        [0, 7],
        [8, 0]],

       [[2, 5],
        [8, 2],
        [1, 2]],

       [[5, 3],
        [1, 6],
        [3, 2]]])
>>> out_df
    a  b  c
0   0  2  0
1   0  1  7
2   0  3  8
3   1  5  0
4   1  0  7
5   1  8  0
6   2  2  5
7   2  8  2
8   2  1  2
9   3  5  3
10  3  1  6
11  3  3  2
