How can I normalize my DataFrame so that my line plots start from the same point?
I have a DataFrame like the following (named net_asset), covering 2015 to today:
a b c d e f g h i j k l m n o p q r
Date
2015-04-30 162.20100 38.69620 98.88842 11.75094 8.92177 1.07767 112.81237 110.08090 NaN 4.20428 221.5440 NaN 1.63142 155.30297 8.19891 13.94684 7.40493 27.85345
2015-05-29 164.04053 39.19910 101.54701 11.97325 8.94295 1.12211 114.48715 113.24696 NaN 4.30719 215.7512 NaN 1.65257 154.85456 8.33938 14.29280 7.47724 27.32846
2015-06-30 163.17050 39.00262 101.77694 11.93908 8.96241 1.13880 114.23190 112.75483 10.0000 4.22515 207.5485 NaN 1.67049 158.25418 8.57353 14.13962 7.61546 26.99618
2015-07-31 160.73069 38.49814 102.63752 11.95354 8.93894 1.14438 111.00177 110.01403 10.1106 4.19375 205.0794 NaN 1.65833 161.83255 8.67075 14.25327 7.67866 27.31167
To make the data easier to compare after plotting, I want all the columns to start at the same point, here 100 (every column should be 100 in 2015).
I tried the code below, but did not get what I expected, which was 100 in 2015.
net_asset.apply(lambda x: (x - x.min()) / (x.max() - x.min()))
The code above returns the following. net_asset.head():
Date
2015-04-30 29.481157 20.728226 12.566996 14.006493 24.887183 85.363231 11.168351 20.119944 NaN 26.292755 38.674209 NaN 19.586481 9.290352 5.570366 9.204228 4.566915 100.000000
2015-05-29 31.475018 22.683843 15.138121 16.334712 25.302741 95.113764 12.794772 25.172351 NaN 31.434296 34.177011 NaN 21.440216 9.022051 7.029734 11.419483 5.223939 95.558550
2015-06-30 30.531995 21.919795 15.360487 15.976855 25.684553 98.775698 12.546892 24.387008 26.207877 27.335452 27.808905 NaN 23.010851 11.056174 9.462360 10.438639 6.479836 92.747440
2015-07-31 27.887493 19.958033 16.192755 16.128292 25.224064 100.000000 9.410033 20.013232 27.427053 25.766660 25.892037 NaN 21.945063 13.197250 10.472396 11.166364 7.054085 95.416506
net_asset.tail()
2020-11-30 67.200005 72.608636 76.959357 85.856731 88.155809 57.219650 94.367147 84.263184 84.411962 49.771676 78.669830 91.698367 91.659509 95.793550 97.312319 100.000000 98.638703 12.572080
2020-12-31 79.321960 80.759312 87.806721 94.821595 96.394572 69.535073 99.215011 97.320232 87.610922 62.294533 89.893726 100.000000 100.000000 100.000000 100.000000 99.515149 100.000000 20.818697
2021-01-29 82.292270 80.581521 87.481611 92.795622 97.256100 70.575071 99.335197 93.571979 89.231346 58.588387 91.402937 92.293295 96.259225 96.302455 93.245683 95.127478 94.362002 20.405762
2021-02-26 91.587476 90.773715 91.445362 94.800335 98.102520 81.569651 95.674504 91.847156 97.434880 70.743028 97.713593 85.960528 89.612951 93.915749 88.721404 87.146839 88.763620 21.716141
2021-03-31 100.000000 100.000000 100.000000 100.000000 100.000000 91.807271 100.000000 97.903339 100.000000 81.996363 100.000000 94.200479 87.929251 89.484993 86.827664 86.035818 87.447754 19.689448
How can I do this? Thank you.
- Some columns start with NaN and only have values later.
- In Excel I do it by dividing each row by the first row and multiplying by one hundred: =(A2/$A$2)*100
If you want to apply the normalization to each column, use axis=0 (this is the default for mean(), std(), min() and max() on a DataFrame).
Z-score normalization
"The formula for calculating a z-score is z = (x-μ)/σ, where x is the raw score, μ is the population mean, and σ is the population standard deviation. As the formula shows, the z-score is simply the raw score minus the population mean, divided by the population standard deviation."
# mean of each column
mean = df.mean(axis=0)
# standard deviation of each column
std = df.std(axis=0)
# z-score normalization
normalization = (df - mean) / std
Or in one line:
normalization = (df - df.mean()) / df.std()
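One detail worth checking: the quoted formula uses the population standard deviation, while pandas' std() defaults to the sample standard deviation (ddof=1). The sketch below shows both variants on a small made-up frame (the data is hypothetical, not the asker's net_asset).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate the two variants.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 50.0]})

# pandas' std() defaults to the sample standard deviation (ddof=1).
z_sample = (df - df.mean()) / df.std()

# Pass ddof=0 to match the population formula z = (x - mu) / sigma.
z_population = (df - df.mean()) / df.std(ddof=0)

print(z_population.mean())       # each column is centred near 0
print(z_population.std(ddof=0))  # each column has unit population std
```

For long series the two results are close; the choice mostly matters for short columns.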
Min-max normalization
normalization = (df-df.min()) / (df.max()-df.min())
If you want the values on a 0 to 100 scale, multiply by 100. This maps each column's minimum to 0 and its maximum to 100:
normalization = ( (df-df.min()) / (df.max()-df.min()) * 100 )
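Min-max scaling pins each column's minimum and maximum, not its first value, so on its own it does not make every line start at 100. A minimal sketch of the rebasing described in the question (the Excel formula =(A2/$A$2)*100) is to divide each column by its first non-NaN value. The sample frame below is a hypothetical two-column excerpt shaped like net_asset, and base and rebased are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like net_asset: column "i" starts with NaN.
net_asset = pd.DataFrame(
    {
        "a": [162.20100, 164.04053, 163.17050, 160.73069],
        "i": [np.nan, np.nan, 10.0000, 10.1106],
    },
    index=pd.to_datetime(["2015-04-30", "2015-05-29", "2015-06-30", "2015-07-31"]),
)

# First non-NaN value of each column; bfill() pulls it up to the first row.
base = net_asset.bfill().iloc[0]

# Same idea as the Excel formula =(A2/$A$2)*100, applied column by column.
rebased = net_asset / base * 100

print(rebased)
```

With this approach a column that begins with NaN stays NaN until its first observation and then starts at 100 on that date, so late-starting series begin at 100 on their own first date and not in April 2015. Calling rebased.plot() (which needs matplotlib) should then draw the lines from that common level.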