Replace iterations with elegant Pandas code

Views: 67 | Posted: 2020/5/18 22:21:19 | Tags: python pandas performance numpy dataframe

Problem description

I am trying to replace my old-school code with elegant, fast Pandas code, as in this previous question:

Slow rolling operation when creating a new column

I have 4 different pieces of code whose performance I would like to improve with elegant, fast Pandas code.

The original dataframe (df) is similar to this (although much larger):

import numpy as np
import pandas as pd

idx = [np.array(['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18','Apr-18', 'Apr-18', 'May-18', 'Jun-18', 'Jun-18', 'Jun-18','Jul-18', 'Aug-18', 'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18',]),np.array(['A', 'B', 'B', 'A', 'B', 'C', 'A', 'B', 'B', 'A', 'B', 'C','A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'A', 'B', 'C'])]
data = [{'x': 1}, {'x': 5}, {'x': 3}, {'x': 2}, {'x': 7}, {'x': 3},{'x': 1}, {'x': 6}, {'x': 3}, {'x': 5}, {'x': 2}, {'x': 3},{'x': 1}, {'x': 9}, {'x': 3}, {'x': 2}, {'x': 7}, {'x': 3}, {'x': 6}, {'x': 8}, {'x': 2}, {'x': 7}, {'x': 9}]
df = pd.DataFrame(data, index=idx, columns=['x'])
df.index.names = ['date', 'type']
df = df.reset_index()
# parse 'Jan-18' style labels into datetimes (2018-01-01, ...)
df['date'] = pd.to_datetime(df['date'], format='%b-%y')
df = df.set_index(['date', 'type'])

It looks like this:

                 x
date       type
2018-01-01 A     1
           B     5
2018-02-01 B     3
2018-03-01 A     2
           B     7
           C     3
2018-04-01 A     1
           B     6
2018-05-01 B     3
2018-06-01 A     5
           B     2
           C     3
2018-07-01 A     1
2018-08-01 B     9
           C     3
2018-09-01 A     2
           B     7
2018-10-01 C     3
           A     6
           B     8
2018-11-01 A     2
2018-12-01 B     7
           C     9

1) Mean of x by date across all types (uses the dataframe df):

My goal is to speed up this slow for-loop code. Here is my code:

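df = df.reset_index()
df['y'] = 0.0  # float, because it will hold means
for j in df['date'].unique():
    # row labels belonging to date j
    list_1 = list(df['type'][df['date'] == j].index)
    # .loc instead of chained df['y'][list_1] = ..., which never writes back under Copy-on-Write
    df.loc[list_1, 'y'] = np.mean(df['x'][df['date'] == j])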

The result is as follows:

         date type  x         y
0  2018-01-01    A  1  3.000000
1  2018-01-01    B  5  3.000000
2  2018-02-01    B  3  3.000000
3  2018-03-01    A  2  4.000000
4  2018-03-01    B  7  4.000000
5  2018-03-01    C  3  4.000000
6  2018-04-01    A  1  3.500000
7  2018-04-01    B  6  3.500000
8  2018-05-01    B  3  3.000000
9  2018-06-01    A  5  3.333333
10 2018-06-01    B  2  3.333333
11 2018-06-01    C  3  3.333333
12 2018-07-01    A  1  1.000000
13 2018-08-01    B  9  6.000000
14 2018-08-01    C  3  6.000000
15 2018-09-01    A  2  4.500000
16 2018-09-01    B  7  4.500000
17 2018-10-01    C  3  5.666667
18 2018-10-01    A  6  5.666667
19 2018-10-01    B  8  5.666667
20 2018-11-01    A  2  2.000000
21 2018-12-01    B  7  8.000000
22 2018-12-01    C  9  8.000000

I tried the following Pandas code, but it doesn't work (I still need to see more examples to understand how it works):

df['y'] = df.groupby('date')['x'].mean().reset_index(level=2, drop=True).swaplevel(0,1)
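The attempt above cannot work as written: `df.groupby('date')['x'].mean()` returns one value per date, indexed only by `date`, so there is no level 2 for `reset_index(level=2, ...)` to drop, and the result does not line up row-for-row with `df`. A minimal sketch of one common alternative, `transform('mean')`, which broadcasts each group's mean back onto every row of that group. It assumes the flat layout produced by `df.reset_index()`; the data is rebuilt inline (same values as the question's `df`) so the snippet runs on its own.

```python
import pandas as pd

# Same values as the question's df, already in the flat layout of df.reset_index()
dates = pd.to_datetime(
    ['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18', 'Apr-18', 'Apr-18',
     'May-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jul-18', 'Aug-18', 'Aug-18', 'Sep-18',
     'Sep-18', 'Oct-18', 'Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18'],
    format='%b-%y',
)
df = pd.DataFrame({
    'date': dates,
    'type': list('ABBABCABBABCABCABCABABC'),
    'x': [1, 5, 3, 2, 7, 3, 1, 6, 3, 5, 2, 3, 1, 9, 3, 2, 7, 3, 6, 8, 2, 7, 9],
})

# transform('mean') computes one mean per date and returns it on every row
# of that date, so the result is aligned with df's own index.
df['y'] = df.groupby('date')['x'].transform('mean')
print(df)
```

Unlike `.mean()`, which shrinks the data to one row per group, `transform` keeps the original shape, which is exactly what assigning to a new column needs.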

2) Number of observations by date across all types (uses the same dataframe, df):

My goal is to count the number of types for every date.

My slow code is:

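df = df.reset_index()
df['y'] = 0
for j in df['date'].unique():
    # row labels belonging to date j
    list_1 = list(df['type'][df['date'] == j].index)
    # number of rows for date j, written back with .loc (not chained assignment)
    df.loc[list_1, 'y'] = len(df['type'][df['date'] == j])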

The result is as follows:

         date type  x  y
0  2018-01-01    A  1  2
1  2018-01-01    B  5  2
2  2018-02-01    B  3  1
3  2018-03-01    A  2  3
4  2018-03-01    B  7  3
5  2018-03-01    C  3  3
6  2018-04-01    A  1  2
7  2018-04-01    B  6  2
8  2018-05-01    B  3  1
9  2018-06-01    A  5  3
10 2018-06-01    B  2  3
11 2018-06-01    C  3  3
12 2018-07-01    A  1  1
13 2018-08-01    B  9  2
14 2018-08-01    C  3  2
15 2018-09-01    A  2  2
16 2018-09-01    B  7  2
17 2018-10-01    C  3  3
18 2018-10-01    A  6  3
19 2018-10-01    B  8  3
20 2018-11-01    A  2  1
21 2018-12-01    B  7  2
22 2018-12-01    C  9  2
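The same `groupby(...).transform(...)` pattern covers the per-date count. A minimal sketch, again on the flat layout with the data rebuilt inline: `'count'` counts the non-missing `type` values in each date and repeats that number on every row of the date, which equals the row count here because the sample has no missing values.

```python
import pandas as pd

# Same values as the question's df, in the flat layout of df.reset_index()
dates = pd.to_datetime(
    ['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18', 'Apr-18', 'Apr-18',
     'May-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jul-18', 'Aug-18', 'Aug-18', 'Sep-18',
     'Sep-18', 'Oct-18', 'Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18'],
    format='%b-%y',
)
df = pd.DataFrame({
    'date': dates,
    'type': list('ABBABCABBABCABCABCABABC'),
    'x': [1, 5, 3, 2, 7, 3, 1, 6, 3, 5, 2, 3, 1, 9, 3, 2, 7, 3, 6, 8, 2, 7, 9],
})

# Number of (non-missing) types per date, broadcast back to each row of that date
df['y'] = df.groupby('date')['type'].transform('count')
print(df)
```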

3) Number of type 'A' observations by date (uses the same dataframe, df):

My goal is to count the number of type-A rows for every date.

My slow code is as follows:

df = df.reset_index()
df['z'] = 0
df['y'] = 0

# z = 1 if the row's type is 'A', else 0
for index, row in df.iterrows():
    if row['type'] == 'A':
        df.loc[index, 'z'] = 1
    else:
        df.loc[index, 'z'] = 0

# y = number of type-A rows within each date
for j in df['date'].unique():
    list_1 = list(df['type'][df['date'] == j].index)
    df.loc[list_1, 'y'] = sum(df['z'][df['date'] == j])

del df['z']

It looks like this:

         date type  x  y
0  2018-01-01    A  1  1
1  2018-01-01    B  5  1
2  2018-02-01    B  3  0
3  2018-03-01    A  2  1
4  2018-03-01    B  7  1
5  2018-03-01    C  3  1
6  2018-04-01    A  1  1
7  2018-04-01    B  6  1
8  2018-05-01    B  3  0
9  2018-06-01    A  5  1
10 2018-06-01    B  2  1
11 2018-06-01    C  3  1
12 2018-07-01    A  1  1
13 2018-08-01    B  9  0
14 2018-08-01    C  3  0
15 2018-09-01    A  2  1
16 2018-09-01    B  7  1
17 2018-10-01    C  3  1
18 2018-10-01    A  6  1
19 2018-10-01    B  8  1
20 2018-11-01    A  2  1
21 2018-12-01    B  7  0
22 2018-12-01    C  9  0
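Both loops above collapse into a boolean mask plus one grouped sum. A minimal sketch, assuming the flat layout and rebuilding the data inline: `eq('A')` plays the role of the `z` helper column, and `transform('sum')` replaces the per-date loop.

```python
import pandas as pd

# Same values as the question's df, in the flat layout of df.reset_index()
dates = pd.to_datetime(
    ['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18', 'Apr-18', 'Apr-18',
     'May-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jul-18', 'Aug-18', 'Aug-18', 'Sep-18',
     'Sep-18', 'Oct-18', 'Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18'],
    format='%b-%y',
)
df = pd.DataFrame({
    'date': dates,
    'type': list('ABBABCABBABCABCABCABABC'),
    'x': [1, 5, 3, 2, 7, 3, 1, 6, 3, 5, 2, 3, 1, 9, 3, 2, 7, 3, 6, 8, 2, 7, 9],
})

# 1 where type == 'A', else 0 (the vectorized equivalent of the z column)
is_a = df['type'].eq('A').astype(int)

# Sum the flags within each date and broadcast the total to every row of that date
df['y'] = is_a.groupby(df['date']).transform('sum')
print(df)
```

Grouping `is_a` by `df['date']` works because both Series share the same index, so pandas aligns them row by row.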

4) Observations of type 'A' whose z value is 1, counted by date:

Please use the following dataframe (df1):

import numpy as np
import pandas as pd

idx = [np.array(['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18','Apr-18', 'Apr-18', 'May-18', 'Jun-18', 'Jun-18', 'Jun-18','Jul-18', 'Aug-18', 'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18',]),np.array(['A', 'B', 'B', 'A', 'B', 'C', 'A', 'B', 'B', 'A', 'B', 'C','A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'A', 'B', 'C'])]
data = [{'x': 10, 'z': 1}, {'x': 50, 'z': 0}, {'x': 30, 'z': 0}, {'x': 20, 'z': 0}, {'x': 70, 'z': 1}, {'x': 30, 'z': 1},{'x': 10, 'z': 1}, {'x': 60, 'z': 0}, {'x': 30, 'z': 0}, {'x': 50, 'z': 1}, {'x': 20, 'z': 0}, {'x': 30, 'z': 1},{'x': 10, 'z': 0}, {'x': 90, 'z': 1}, {'x': 30, 'z': 1}, {'x': 20, 'z': 1}, {'x': 70, 'z': 0}, {'x': 30, 'z': 0}, {'x': 60, 'z': 1}, {'x': 80, 'z': 1}, {'x': 20, 'z': 0}, {'x': 70, 'z': 0}, {'x': 90, 'z': 1}]
df1 = pd.DataFrame(data, index=idx, columns=['x', 'z'])
df1.index.names = ['date', 'type']
df1 = df1.reset_index()
# parse 'Jan-18' style labels into datetimes (2018-01-01, ...)
df1['date'] = pd.to_datetime(df1['date'], format='%b-%y')
df1 = df1.set_index(['date', 'type'])

This dataframe (df1) looks like this:

                  x  z
date       type
2018-01-01 A     10  1
           B     50  0
2018-02-01 B     30  0
2018-03-01 A     20  0
           B     70  1
           C     30  1
2018-04-01 A     10  1
           B     60  0
2018-05-01 B     30  0
2018-06-01 A     50  1
           B     20  0
           C     30  1
2018-07-01 A     10  0
2018-08-01 B     90  1
           C     30  1
2018-09-01 A     20  1
           B     70  0
2018-10-01 C     30  0
           A     60  1
           B     80  1
2018-11-01 A     20  0
2018-12-01 B     70  0
           C     90  1

My slow code is:

df1 = df1.reset_index()
df1['h'] = 0
df1['k'] = 0
df1['y'] = 0

# h = 1 if the row's type is 'A', else 0
for index, row in df1.iterrows():
    if row['type'] == 'A':
        df1.loc[index, 'h'] = 1
    else:
        df1.loc[index, 'h'] = 0

# k = 1 if the row is type 'A' and has z == 1, else 0
for index, row in df1.iterrows():
    if row['z'] == 1 and row['h'] == 1:
        df1.loc[index, 'k'] = 1
    else:
        df1.loc[index, 'k'] = 0

# y = number of such rows within each date
for j in df1['date'].unique():
    list_1 = list(df1['type'][df1['date'] == j].index)
    df1.loc[list_1, 'y'] = sum(df1['k'][df1['date'] == j])

del df1['h']
del df1['k']

It looks like this:

         date type   x  z  y
0  2018-01-01    A  10  1  1
1  2018-01-01    B  50  0  1
2  2018-02-01    B  30  0  0
3  2018-03-01    A  20  0  0
4  2018-03-01    B  70  1  0
5  2018-03-01    C  30  1  0
6  2018-04-01    A  10  1  1
7  2018-04-01    B  60  0  1
8  2018-05-01    B  30  0  0
9  2018-06-01    A  50  1  1
10 2018-06-01    B  20  0  1
11 2018-06-01    C  30  1  1
12 2018-07-01    A  10  0  0
13 2018-08-01    B  90  1  0
14 2018-08-01    C  30  1  0
15 2018-09-01    A  20  1  1
16 2018-09-01    B  70  0  1
17 2018-10-01    C  30  0  1
18 2018-10-01    A  60  1  1
19 2018-10-01    B  80  1  1
20 2018-11-01    A  20  0  0
21 2018-12-01    B  70  0  0
22 2018-12-01    C  90  1  0
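The helper columns `h` and `k` are just boolean masks, so they can be combined with `&` and summed per date in one pass. A minimal sketch on the flat layout of `df1.reset_index()`, with the data rebuilt inline from the question's values:

```python
import pandas as pd

# Same values as the question's df1, in the flat layout of df1.reset_index()
dates = pd.to_datetime(
    ['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18', 'Apr-18', 'Apr-18',
     'May-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jul-18', 'Aug-18', 'Aug-18', 'Sep-18',
     'Sep-18', 'Oct-18', 'Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18'],
    format='%b-%y',
)
df1 = pd.DataFrame({
    'date': dates,
    'type': list('ABBABCABBABCABCABCABABC'),
    'x': [10, 50, 30, 20, 70, 30, 10, 60, 30, 50, 20, 30, 10, 90, 30, 20, 70, 30, 60, 80, 20, 70, 90],
    'z': [1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
})

# k in the loop version: 1 where type is 'A' AND z == 1
flag = (df1['type'].eq('A') & df1['z'].eq(1)).astype(int)

# Count those rows per date and repeat the count on every row of the date
df1['y'] = flag.groupby(df1['date']).transform('sum')
print(df1)
```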

Although the question seems long, I know the Pandas answers can be just a few lines of code. It would be very useful if you could also tell me how much faster your code is than mine.
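One way to measure the speed-up is the standard-library `timeit` module. A minimal sketch comparing part 1's loop (with `.loc` in place of chained assignment) against the `transform('mean')` version; the function names `make_df`, `loop_mean` and `transform_mean` are hypothetical helpers for this comparison. No figures are quoted here: the ratio depends on the machine, the pandas version and, above all, on how many distinct dates the real data has, since the loop rescans the whole frame once per date.

```python
import timeit

import numpy as np
import pandas as pd


def make_df():
    """Same values as the question's df, in the flat layout of df.reset_index()."""
    dates = pd.to_datetime(
        ['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18', 'Apr-18', 'Apr-18',
         'May-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jul-18', 'Aug-18', 'Aug-18', 'Sep-18',
         'Sep-18', 'Oct-18', 'Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18'],
        format='%b-%y',
    )
    return pd.DataFrame({
        'date': dates,
        'type': list('ABBABCABBABCABCABCABABC'),
        'x': [1, 5, 3, 2, 7, 3, 1, 6, 3, 5, 2, 3, 1, 9, 3, 2, 7, 3, 6, 8, 2, 7, 9],
    })


def loop_mean(df):
    """Part 1 as a loop: one full-frame scan per distinct date."""
    df = df.copy()
    df['y'] = 0.0
    for j in df['date'].unique():
        list_1 = list(df['type'][df['date'] == j].index)
        df.loc[list_1, 'y'] = np.mean(df['x'][df['date'] == j])
    return df


def transform_mean(df):
    """Part 1 vectorized: one groupby, broadcast back with transform."""
    df = df.copy()
    df['y'] = df.groupby('date')['x'].transform('mean')
    return df


df = make_df()
# Both versions must agree before their timings are worth comparing
assert np.allclose(loop_mean(df)['y'], transform_mean(df)['y'])

n = 20
t_loop = timeit.timeit(lambda: loop_mean(df), number=n)
t_vec = timeit.timeit(lambda: transform_mean(df), number=n)
print(f'loop: {t_loop / n * 1e3:.3f} ms/run, transform: {t_vec / n * 1e3:.3f} ms/run')
```

On a 23-row sample the absolute times are tiny; running the same comparison on the real (much larger) dataframe gives a more meaningful ratio.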

Recommended answer

Output `df`:

                 x  num_A     x_avg  N
date       type                       
2018-01-01 A     1      1  3.000000  2
           B     5      1  3.000000  2
2018-02-01 B     3      0  3.000000  1
2018-03-01 A     2      1  4.000000  3
           B     7      1  4.000000  3
           C     3      1  4.000000  3
2018-04-01 A     1      1  3.500000  2
           B     6      1  3.500000  2
2018-05-01 B     3      0  3.000000  1
2018-06-01 A     5      1  3.333333  3
           B     2      1  3.333333  3
           C     3      1  3.333333  3
2018-07-01 A     1      1  1.000000  1
2018-08-01 B     9      0  6.000000  2
           C     3      0  6.000000  2
2018-09-01 A     2      1  4.500000  2
           B     7      1  4.500000  2
2018-10-01 C     3      1  5.666667  3
           A     6      1  5.666667  3
           B     8      1  5.666667  3
2018-11-01 A     2      1  2.000000  1
2018-12-01 B     7      0  8.000000  2
           C     9      0  8.000000  2
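The answer's code did not survive in this copy of the page, only its output. The following is a hypothetical reconstruction, not the answerer's code: a minimal sketch that produces the same `num_A`, `x_avg` and `N` columns on the question's `(date, type)` MultiIndex by grouping on the `date` index level.

```python
import pandas as pd

# Rebuild the question's df with its (date, type) MultiIndex
dates = pd.to_datetime(
    ['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18', 'Apr-18', 'Apr-18',
     'May-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jul-18', 'Aug-18', 'Aug-18', 'Sep-18',
     'Sep-18', 'Oct-18', 'Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18'],
    format='%b-%y',
)
df = pd.DataFrame({
    'date': dates,
    'type': list('ABBABCABBABCABCABCABABC'),
    'x': [1, 5, 3, 2, 7, 3, 1, 6, 3, 5, 2, 3, 1, 9, 3, 2, 7, 3, 6, 8, 2, 7, 9],
}).set_index(['date', 'type'])

# 1 where the row's type level is 'A', else 0, aligned to df's MultiIndex
is_a = pd.Series(df.index.get_level_values('type') == 'A', index=df.index).astype(int)

df['num_A'] = is_a.groupby(level='date').transform('sum')           # part 3
df['x_avg'] = df.groupby(level='date')['x'].transform('mean')       # part 1
df['N'] = df.groupby(level='date')['x'].transform('count')          # part 2
print(df)
```

Grouping by `level='date'` avoids the `reset_index()` round trip that every loop version needs.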
