How to optimize time while converting list to dataframe?


Question

This follows on from my previous question, "How to fill the values in a list and convert it to a dataframe?"

The approach from that question works, but it takes a lot of time on a large amount of data.

For example, I have a dataframe:

Id  Name  Photo1  Photo2
1   Mark  1.jpg   2.jpg
2   Julia 1.jpg
3   Andy  1.jpg   2.jpg

I tried:

import pandas as pd

df = pd.read_csv('PyCharmProjects/book1.csv')

df1 = df.reindex(['I','Id','Name','P','46','N','Photo1','Photo2','PH'],axis=1)
df1['I'] = df1['I'].fillna('I')
df1['P'] = df1['P'].fillna('P')
df1['46'] = df1['46'].fillna('46')
df1['N'] = df1['N'].fillna('N')
df1['PH'] = df1['PH'].fillna('PH')
df1 = df1.astype(str)

vals = [['I','Id'],['N','Name'],['P','46']]

photo_df = df1.fillna('').filter(like='Photo')

vals = [(i, y) for i, x in enumerate(photo_df.to_numpy())
                    for y in vals[:2] +[['PH',z] 
                    for z in photo_df.columns[x!='']] +vals[2:]]

L = [df1.loc[df1.index[[i]], x].set_axis(range(len(x)), axis=1) for i, x in vals]

df1 = pd.concat(L)

df1

The result is:

I     1
Name  Mark
PH    1.jpg
PH    2.jpg
P     46
I     2
Name  Julia
PH    1.jpg
P     46
I     3
Name  Andy
PH    1.jpg
PH    2.jpg
P     46

This works fine, but when I tried it on large datasets it took an enormous amount of time.

This line takes most of the time:

L = [df1.loc[df1.index[[i]], x].set_axis(range(len(x)), axis=1) for i, x in vals]

Are there any ideas for reducing the time, or any alternatives to this approach?

Recommended answer

out = (df.assign(P=46)
         .stack(dropna=False)
         .reset_index(level=-1)
         .set_axis([0, 1], axis=1)
         .replace({0: {"Id": "I", "Name": "N", r"^Photo\d+$": "PH"}}, regex=True))

We first assign a column P with the constant value 46, then stack the frame while keeping NaNs, so that the column labels move into the innermost index level, next to the original row index. Then reset_index on the last level turns those labels into a regular column, and set_axis names the two resulting columns 0 and 1. Lastly, we perform the required replacements in column 0: "Id" becomes "I", "Name" becomes "N", and any "Photo<number>" label becomes "PH".

This gives:

    0      1
0   I      1
0   N   Mark
0  PH  1.jpg
0  PH  2.jpg
0   P     46
1   I      2
1   N  Julia
1  PH  1.jpg
1  PH   None
1   P     46
2   I      3
2   N   Andy
2  PH  1.jpg
2  PH  2.jpg
2   P     46
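
Note that the output above keeps a PH row for Julia's missing Photo2 (shown as None), whereas the result in the question omits it. The following is a minimal sketch of one way to drop such rows. It is not the answer's method: it swaps stack for melt, rebuilds the sample frame by hand instead of reading the CSV, and uses a plain dictionary for the labels instead of the regex replace. The helper names (wide, order, labels, long_df) are made up for illustration.

```python
import pandas as pd

# Sample data from the question, built by hand; Julia has no Photo2.
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Name": ["Mark", "Julia", "Andy"],
    "Photo1": ["1.jpg", "1.jpg", "1.jpg"],
    "Photo2": ["2.jpg", None, "2.jpg"],
})

# Add the constant P column, as in the answer.
wide = df.assign(P=46)

# Remember the original column order so it can be restored per row later.
order = {col: pos for pos, col in enumerate(wide.columns)}

# Short labels: every Photo* column becomes "PH", Id and Name are shortened.
labels = {col: ("PH" if col.startswith("Photo") else col) for col in wide.columns}
labels.update({"Id": "I", "Name": "N"})

# Wide to long: one row per (original row, column) pair.
long_df = wide.reset_index().melt(id_vars=["index"], var_name="key", value_name="value")

# Drop the pairs with no value (Julia's Photo2).
long_df = long_df.dropna(subset=["value"])

# melt groups rows by column, so sort back to row-major order.
long_df["pos"] = long_df["key"].map(order)
long_df = long_df.sort_values(["index", "pos"])

# Apply the short labels and shape the result like the answer's output.
long_df["key"] = long_df["key"].map(labels)
out = long_df.set_index("index")[["key", "value"]].set_axis([0, 1], axis=1)
out.index.name = None

print(out)
```

Like the answer, this works on whole columns and avoids building one small DataFrame per row and field; whether it is fast enough for a given dataset would need to be measured.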
