Create Pandas DataFrame from (row, column, value) data
Problem description
I have a Pandas DataFrame with three columns: row, column, value. The row values are all integers below some N, and the column values are all integers below some M. The values are all positive integers.
How do I efficiently create a DataFrame with N rows and M columns, where the value at index i, j is val if (i, j, val) is a row in my original DataFrame, and some default value (0) otherwise? Furthermore, is it possible to create a sparse DataFrame immediately? The data is already quite large, and N*M is still about 10 times the size of my data.
Recommended answer
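For performance, a NumPy solution works well here. Scatter the values into a zero-filled array with integer-array indexing, then wrap the array in a DataFrame: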
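import numpy as np
import pandas as pd

a = df.to_numpy()                        # columns (row, col, val) as a 2-D integer array
m, n = a[:, :2].max(0) + 1               # shape inferred from the largest row/col index
out = np.zeros((m, n), dtype=a.dtype)    # default value 0 everywhere
out[a[:, 0], a[:, 1]] = a[:, 2]          # write each val into its (row, col) cell
df_out = pd.DataFrame(out)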
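The code above infers the shape as the largest row/column index plus one. If the last rows or columns have no entries, the result can therefore be smaller than the N x M the question asks for. The following is a minimal sketch that passes N and M explicitly instead. The helper name `dense_from_triples` is hypothetical, and the column names `row`, `col`, `val` are assumed to match the sample data below.

```python
import numpy as np
import pandas as pd


def dense_from_triples(df, n_rows, n_cols, fill_value=0):
    """Hypothetical helper: build an n_rows x n_cols DataFrame from (row, col, val) triples.

    Cells not named by any triple keep fill_value. Assumes columns 'row', 'col', 'val'.
    """
    # Allocate the full dense array, pre-filled with the default value.
    out = np.full((n_rows, n_cols), fill_value, dtype=df["val"].dtype)
    # Write every val into its (row, col) cell in one vectorized assignment.
    out[df["row"].to_numpy(), df["col"].to_numpy()] = df["val"].to_numpy()
    return pd.DataFrame(out)


triples = pd.DataFrame({"row": [0, 2], "col": [1, 3], "val": [5, 7]})
result = dense_from_triples(triples, n_rows=4, n_cols=5)
print(result.shape)  # (4, 5)
```

Here row 3 and column 4 contain no triples, yet they still appear in the output because the shape is given rather than inferred.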
Sample run:
In [58]: df
Out[58]:
    row  col  val
0     7    1   30
1     3    3    0
2     4    8   30
3     5    8   18
4     1    3    6
5     1    6   48
6     0    2    6
7     4    7    6
8     5    0   48
9     8    1   48
10    3    2   12
11    6    8   18

In [59]: df_out
Out[59]:
    0   1   2   3   4   5   6   7   8
0   0   0   6   0   0   0   0   0   0
1   0   0   0   6   0   0  48   0   0
2   0   0   0   0   0   0   0   0   0
3   0   0  12   0   0   0   0   0   0
4   0   0   0   0   0   0   0   6  30
5  48   0   0   0   0   0   0   0  18
6   0   0   0   0   0   0   0   0  18
7   0  30   0   0   0   0   0   0   0
8   0  48   0   0   0   0   0   0   0
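The NumPy answer still allocates the full dense N x M array, so it does not cover the question's request for a sparse DataFrame. One possible approach, not part of the original answer, is to let SciPy's COO format hold only the triples and then wrap the result with `pd.DataFrame.sparse.from_spmatrix`. This sketch assumes SciPy is installed. As before, the helper name `sparse_from_triples` is hypothetical and the column names `row`, `col`, `val` are assumed.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def sparse_from_triples(df, n_rows, n_cols):
    """Hypothetical helper: sparse n_rows x n_cols DataFrame from (row, col, val) triples."""
    # COO format stores just the triples; no dense N x M array is allocated.
    coo = sparse.coo_matrix(
        (df["val"].to_numpy(), (df["row"].to_numpy(), df["col"].to_numpy())),
        shape=(n_rows, n_cols),
    )
    # Each column becomes a pandas SparseArray whose fill value is 0.
    return pd.DataFrame.sparse.from_spmatrix(coo)


triples = pd.DataFrame({"row": [0, 2, 3], "col": [1, 3, 0], "val": [5, 7, 9]})
sdf = sparse_from_triples(triples, n_rows=4, n_cols=5)
print(sdf.sparse.density)                # fraction of cells actually stored
print(sdf.sparse.to_dense().iloc[2, 3])  # 7
```

The resulting columns have a `Sparse[int64, 0]`-style dtype, so memory use should scale with the number of triples rather than with N*M. Call `sdf.sparse.to_dense()` only when a dense copy is really needed.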