How can I normalize my DataFrame so that my line plots start from the same point?


Problem description

I have a DataFrame named net_asset, shown below, covering 2015 to today:

    a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r
Date                                                                        
2015-04-30  162.20100   38.69620    98.88842    11.75094    8.92177 1.07767 112.81237   110.08090   NaN 4.20428 221.5440    NaN 1.63142 155.30297   8.19891 13.94684    7.40493 27.85345
2015-05-29  164.04053   39.19910    101.54701   11.97325    8.94295 1.12211 114.48715   113.24696   NaN 4.30719 215.7512    NaN 1.65257 154.85456   8.33938 14.29280    7.47724 27.32846
2015-06-30  163.17050   39.00262    101.77694   11.93908    8.96241 1.13880 114.23190   112.75483   10.0000 4.22515 207.5485    NaN 1.67049 158.25418   8.57353 14.13962    7.61546 26.99618
2015-07-31  160.73069   38.49814    102.63752   11.95354    8.93894 1.14438 111.00177   110.01403   10.1106 4.19375 205.0794    NaN 1.65833 161.83255   8.67075 14.25327    7.67866 27.31167

To make the data easier to compare after plotting, I want all the columns to start at the same point, here 100 (every column should be 100 at the start in 2015).

I tried the code below, but it did not give what I expected, which was 100 in 2015.

net_asset.apply(lambda x: (x - x.min()) / (x.max() - x.min()))

The code above returns the following (net_asset.head()):

Date                                                                        
2015-04-30  29.481157   20.728226   12.566996   14.006493   24.887183   85.363231   11.168351   20.119944   NaN 26.292755   38.674209   NaN 19.586481   9.290352    5.570366    9.204228    4.566915    100.000000
2015-05-29  31.475018   22.683843   15.138121   16.334712   25.302741   95.113764   12.794772   25.172351   NaN 31.434296   34.177011   NaN 21.440216   9.022051    7.029734    11.419483   5.223939    95.558550
2015-06-30  30.531995   21.919795   15.360487   15.976855   25.684553   98.775698   12.546892   24.387008   26.207877   27.335452   27.808905   NaN 23.010851   11.056174   9.462360    10.438639   6.479836    92.747440
2015-07-31  27.887493   19.958033   16.192755   16.128292   25.224064   100.000000  9.410033    20.013232   27.427053   25.766660   25.892037   NaN 21.945063   13.197250   10.472396   11.166364   7.054085    95.416506

net_asset.tail()

2020-11-30  67.200005   72.608636   76.959357   85.856731   88.155809   57.219650   94.367147   84.263184   84.411962   49.771676   78.669830   91.698367   91.659509   95.793550   97.312319   100.000000  98.638703   12.572080
2020-12-31  79.321960   80.759312   87.806721   94.821595   96.394572   69.535073   99.215011   97.320232   87.610922   62.294533   89.893726   100.000000  100.000000  100.000000  100.000000  99.515149   100.000000  20.818697
2021-01-29  82.292270   80.581521   87.481611   92.795622   97.256100   70.575071   99.335197   93.571979   89.231346   58.588387   91.402937   92.293295   96.259225   96.302455   93.245683   95.127478   94.362002   20.405762
2021-02-26  91.587476   90.773715   91.445362   94.800335   98.102520   81.569651   95.674504   91.847156   97.434880   70.743028   97.713593   85.960528   89.612951   93.915749   88.721404   87.146839   88.763620   21.716141
2021-03-31  100.000000  100.000000  100.000000  100.000000  100.000000  91.807271   100.000000  97.903339   100.000000  81.996363   100.000000  94.200479   87.929251   89.484993   86.827664   86.035818   87.447754   19.689448

How can I do this? Thank you.

  • Some columns start with NaN and only have values later.
  • In Excel I do this by dividing each row by the first row and multiplying by 100: =(A2/$A$2)*100

Solution

To normalize each column separately, compute the statistics along axis=0, which is the default for DataFrame.mean(), std(), min() and max().

Z-Score Normalization

"The formula for calculating a z-score is z = (x-μ)/σ, where x is the raw score, μ is the population mean, and σ is the population standard deviation. As the formula shows, the z-score is simply the raw score minus the population mean, divided by the population standard deviation."

# mean of each column
mean = df.mean(axis=0)
# standard deviation of each column
std = df.std(axis=0)
# z-score normalization
normalization = (df - mean) / std

Or in one line:

normalization = (df - df.mean()) / df.std()
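One detail worth checking against the quoted definition: pandas' std() computes the sample standard deviation (ddof=1) by default, whereas the quote refers to the population standard deviation. The sketch below, using a small made-up DataFrame (the values and the names z_sample and z_population are illustrative assumptions, not from the question), shows both variants.

```python
import pandas as pd

# Toy data for illustration only; not the asker's net_asset frame.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})

# Default pandas behaviour: sample standard deviation (ddof=1).
z_sample = (df - df.mean()) / df.std()

# Population standard deviation, matching the quoted formula (ddof=0).
z_population = (df - df.mean()) / df.std(ddof=0)

print(z_sample)
print(z_population)
```

Either way, every column ends up centered on 0, so z-scores put columns on a comparable scale but do not make them start at a common value.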

Min-max normalization

normalization = (df-df.min()) / (df.max()-df.min())

To scale the values to the range 0 to 100, multiply by 100:

normalization = ( (df-df.min()) / (df.max()-df.min()) * 100 )
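Note that min-max scaling maps each column's minimum to 0 and its maximum to 100, so the lines still will not all start at 100. The Excel formula from the question, =(A2/$A$2)*100, could be sketched in pandas roughly as follows. This is a minimal sketch under assumptions: the helper name rebase_to_100 is hypothetical, the sample frame reuses only columns a and i from the question, and columns that begin with NaN are rebased on their first non-NaN value.

```python
import numpy as np
import pandas as pd


def rebase_to_100(df: pd.DataFrame) -> pd.DataFrame:
    """Scale each column so that its first non-NaN value becomes 100.

    Hypothetical helper for illustration; not part of pandas.
    """
    # bfill() copies each column's first valid value up into row 0,
    # so iloc[0] yields the first non-NaN value per column.
    first_valid = df.bfill().iloc[0]
    # Dividing a DataFrame by a Series aligns the Series index with the columns.
    return df / first_valid * 100


# Two columns taken from the question's sample data.
net_asset = pd.DataFrame(
    {
        "a": [162.20100, 164.04053, 163.17050, 160.73069],
        "i": [np.nan, np.nan, 10.0000, 10.1106],
    },
    index=pd.to_datetime(["2015-04-30", "2015-05-29", "2015-06-30", "2015-07-31"]),
)

rebased = rebase_to_100(net_asset)
print(rebased)
```

With this approach a column that starts with NaN stays NaN until its first observation and shows 100 at that date, rather than in April 2015. Calling rebased.plot() would then draw each line from 100 at its own starting date.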
