How to optimize time while converting list to dataframe?
Question
This follows on from my previous question, "How to fill values in a list and convert it into a dataframe?"
That approach works, but it takes a lot of time on a large amount of data.
For example, I have a dataframe:
Id  Name   Photo1  Photo2
1   Mark   1.jpg   2.jpg
2   Julia  1.jpg
3   Andy   1.jpg   2.jpg
I tried:
import pandas as pd
df = pd.read_csv('PyCharmProjects/book1.csv')
df1 = df.reindex(['I','Id','Name','P','46','N','Photo1','Photo2','PH'],axis=1)
df1['I'] = df1['I'].fillna('I')
df1['P'] = df1['P'].fillna('P')
df1['46'] = df1['46'].fillna('46')
df1['N'] = df1['N'].fillna('N')
df1['PH'] = df1['PH'].fillna('PH')
df1 = df1.astype(str)
vals = [['I','Id'],['N','Name'],['P','46']]
photo_df = df1.fillna('').filter(like='Photo')
vals = [(i, y) for i, x in enumerate(photo_df.to_numpy())
        for y in vals[:2] + [['PH', z] for z in photo_df.columns[x != '']] + vals[2:]]
L = [df1.loc[df1.index[[i]], x].set_axis(range(len(x)), axis=1) for i, x in vals]
df1 = pd.concat(L)
df1
The result is:
I 1
Name Mark
PH 1.jpg
PH 2.jpg
P 46
I 2
Name Julia
PH 1.jpg
P 46
I 3
Name Andy
PH 1.jpg
PH 2.jpg
P 46
This works fine, but on large datasets it takes an enormous amount of time.
This line takes a lot of time:
L = [df1.loc[df1.index[[i]], x].set_axis(range(len(x)), axis=1) for i, x in vals]
Any ideas for reducing the time, or any alternatives to this approach?
Recommended answer
out = (df.assign(P=46)
         .stack(dropna=False)
         .reset_index(level=-1)
         .set_axis([0, 1], axis=1)
         .replace({0: {"Id": "I", "Name": "N", r"^Photo\d+$": "PH"}}, regex=True))
We first assign a column P with the value 46, then stack the frame while keeping NaNs, which moves the column names into the index next to the original row index. Next, reset_index on the last level turns those column names into a regular column of their own, and set_axis names the two resulting columns 0 and 1. Lastly, we perform the required replacements in column 0, i.e. "Id" to "I", "Name" to "N", and so on.
This gives:
0 1
0 I 1
0 N Mark
0 PH 1.jpg
0 PH 2.jpg
0 P 46
1 I 2
1 N Julia
1 PH 1.jpg
1 PH None
1 P 46
2 I 3
2 N Andy
2 PH 1.jpg
2 PH 2.jpg
2 P 46
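The output above keeps a PH row for Julia's missing Photo2, whereas the result shown in the question omits it. The following is a minimal sketch of one way to get that shape, not the answerer's method: it swaps stack for melt followed by dropna, partly because, as far as I know, the dropna argument of stack is being phased out in recent pandas releases. The sample frame is rebuilt inline from the question, and the intermediate column names "key" and "value" are placeholders chosen for this illustration.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; Julia's missing Photo2 is NaN.
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Name": ["Mark", "Julia", "Andy"],
    "Photo1": ["1.jpg", "1.jpg", "1.jpg"],
    "Photo2": ["2.jpg", np.nan, "2.jpg"],
})

long = (
    df.assign(P=46)
      .reset_index()                        # row label becomes a column named "index"
      .melt(id_vars="index", var_name="key", value_name="value")
      .sort_values("index", kind="stable")  # regroup per record, keeping column order
      .dropna(subset=["value"])             # drop the rows for missing photos
      .set_index("index")
      .rename_axis(None)
)

# Relabel the keys: exact matches first, then every PhotoN column to "PH".
long["key"] = (
    long["key"]
    .replace({"Id": "I", "Name": "N"})
    .str.replace(r"^Photo\d+$", "PH", regex=True)
)

out = long.set_axis([0, 1], axis=1)
print(out)
```

The stable sort matters here: melt emits all Id rows, then all Name rows, and so on, so sorting on the original row label with a stable algorithm restores the record-by-record order without shuffling the columns within each record.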