Python: pandas, parsing math operations


Question

Somebody on Stack Overflow advised me to use pandas to label the values of my CSV files and provided the code below:

# original code
import pandas

cmf = pandas.read_csv('CMF_MA68II.csv', names=['wavelength', 'x', 'y', 'z'])
d65 = pandas.read_csv('D65_MA68II_10nm.csv', names=['wavelength', 'a', 'b'])
data = pandas.read_csv('spectral_data.csv', names=['serialNumber', 'wavelength', 'measurement', 'name'])

lookup = pandas.merge(cmf, d65, on='wavelength')
merged = pandas.merge(data, lookup, on='wavelength')

totals = ((lookup[['x', 'y', 'z']].T*lookup['a']).T).sum()
wps  = 100 * totals/totals['y']

print(totals['y'])
print("D65_CMF_2006_10_deg white point = ")
print(wps)

I added this part at the end:

# here's my crappy part:

i = 0

for i in range(i, i+1), data['serialNumber']:
    x = ((merged.x * merged.a * merged.measurement).sum() / (merged.y * merged.a * 100).sum())    
    y = ((merged.y * merged.a * merged.measurement).sum() / (merged.y * merged.a * 100).sum())    
    z = ((merged.z * merged.a * merged.measurement).sum() / (merged.y * merged.a * 100).sum())         
    print(x, y, z)

But these lines perform the operation on all the rows of my file, regardless of the name associated with them, so the result is an average over all the separate measurements.

As you can see, the structure of the file 'spectral_data.csv' is names=['serialNumber', 'wavelength', 'measurement', 'name'].

What I'd like to do is perform this operation:

merged['X'] = (merged.x * merged.a * merged.measurement).sum()/totals['y']

on each series of data defined by its name. That is, my file 'spectral_data.csv' contains multiple series of values, and I'd like to get a result for each of them and store the results in a new file with the structure ['serial number', 'X', 'Y', 'Z', 'name'].

Does anybody have a solution for this?

Thanks.

File examples:

'CMF_MA68II.csv'

400,1.879338E-02,2.589775E-03,8.508254E-02
410,8.277331E-02,1.041303E-02,3.832822E-01
420,2.077647E-01,2.576133E-02,9.933444E-01
430,3.281798E-01,4.698226E-02,1.624940E+00
440,4.026189E-01,7.468288E-02,2.075946E+00
450,3.932139E-01,1.039030E-01,2.128264E+00
460,3.013112E-01,1.414586E-01,1.768440E+00
470,1.914176E-01,1.999859E-01,1.310576E+00
480,7.593120E-02,2.682271E-01,7.516389E-01
490,1.400745E-02,3.554018E-01,3.978114E-01
500,5.652072E-03,4.780482E-01,2.078158E-01
510,3.778185E-02,6.248296E-01,8.852389E-02
520,1.201511E-01,7.788199E-01,3.784916E-02
530,2.380254E-01,8.829552E-01,1.539505E-02
540,3.841856E-01,9.665325E-01,6.083223E-03
550,5.374170E-01,9.907500E-01,2.323578E-03
560,7.123849E-01,9.944304E-01,8.779264E-04
570,8.933408E-01,9.640545E-01,3.342429E-04
580,1.034327E+00,8.775360E-01,1.298230E-04
590,1.147304E+00,7.869950E-01,5.207245E-05
600,1.148163E+00,6.629035E-01,2.175998E-05
610,1.048485E+00,5.282296E-01,9.530130E-06
620,8.629581E-01,3.950755E-01,0.000000E+00
630,6.413984E-01,2.751807E-01,0.000000E+00
640,4.323126E-01,1.776882E-01,0.000000E+00
650,2.714900E-01,1.083996E-01,0.000000E+00
660,1.538163E-01,6.033976E-02,0.000000E+00
670,8.281010E-02,3.211852E-02,0.000000E+00
680,4.221473E-02,1.628841E-02,0.000000E+00
690,2.025590E-02,7.797457E-03,0.000000E+00
700,9.816228E-03,3.776140E-03,0.000000E+00

'D65_MA68II_10nm.csv'

400,82.7549,14.708
410,91.486,17.6753
420,93.4318,20.995
430,86.6823,24.6709
440,104.865,28.7027
450,117.008,33.0859
460,117.812,37.8121
470,114.861,42.8693
480,115.923,48.2423
490,108.811,53.9132
500,109.354,59.8611
510,107.802,66.0635
520,104.79,72.4959
530,107.689,79.1326
540,104.405,85.947
550,104.046,92.912
560,100,100
570,96.3342,107.184
580,95.788,114.436
590,88.6856,121.731
600,90.0062,129.043
610,89.5991,136.346
620,87.6987,143.618
630,83.2886,150.836
640,83.6992,157.979
650,80.0268,165.028
660,80.2146,171.963
670,82.2778,178.769
680,78.2842,185.429
690,69.7213,191.931
700,71.6091,198.261

'spectral_data.csv'

0,400,12.73,"a"
0,410,12.41,"a"
0,420,12.55,"a"
0,430,13.42,"a"
0,440,15.07,"a"
0,450,17.31,"a"
0,460,19.20,"a"
0,470,20.96,"a"
0,480,22.11,"a"
0,490,23.45,"a"
0,500,24.62,"a"
0,510,25.42,"a"
0,520,24.51,"a"
0,530,22.43,"a"
0,540,20.94,"a"
0,550,21.59,"a"
0,560,22.36,"a"
0,570,21.54,"a"
0,580,22.03,"a"
0,590,28.86,"a"
0,600,37.02,"a"
0,610,42.00,"a"
0,620,44.79,"a"
0,630,46.57,"a"
0,640,47.56,"a"
0,650,48.70,"a"
0,660,49.90,"a"
0,670,50.75,"a"
0,680,51.53,"a"
0,690,52.24,"a"
0,700,53.00,"a"
1,400,2.31,"b"
1,410,2.33,"b"
1,420,2.33,"b"
1,430,2.30,"b"
1,440,2.29,"b"
1,450,2.30,"b"
1,460,2.27,"b"
1,470,2.26,"b"
1,480,2.24,"b"
1,490,2.23,"b"
1,500,2.22,"b"
1,510,2.21,"b"
1,520,2.20,"b"
1,530,2.19,"b"
1,540,2.18,"b"
1,550,2.18,"b"
1,560,2.18,"b"
1,570,2.16,"b"
1,580,2.15,"b"
1,590,2.14,"b"
1,600,2.14,"b"
1,610,2.13,"b"
1,620,2.12,"b"
1,630,2.11,"b"
1,640,2.11,"b"
1,650,2.11,"b"
1,660,2.10,"b"
1,670,2.08,"b"
1,680,2.07,"b"
1,690,2.06,"b"
1,700,2.04,"b"

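Recommended answer

You can group the merged data and apply a user-defined function to each group:

# Sum the weighted values per (serialNumber, name) series; uses the
# `import pandas` and the `merged` / `totals` objects from the question.
res = merged.groupby(['serialNumber', 'name']).apply(
    lambda g: pandas.Series(
        [(g[c] * g.a * g.measurement).sum() / totals['y'] for c in 'xyz'],
        index=['X', 'Y', 'Z']))
print(res)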
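The question also asks for the results to be stored with the column order ['serial number', 'X', 'Y', 'Z', 'name']. The following is a minimal, self-contained sketch of one way that might be done. It is not the answerer's code: it weights each row first and then sums per group instead of using `apply`, and the three inline CSV strings are tiny made-up stand-ins for the real files. The names `total_y`, `weighted`, and `csv_text` are introduced here purely for illustration.

```python
import io
import pandas as pd

# Made-up three-wavelength stand-ins for the real CSV files (assumption).
cmf_csv = "400,0.1,0.2,0.3\n410,0.4,0.5,0.6\n420,0.7,0.8,0.9\n"
d65_csv = "400,80.0,14.0\n410,90.0,17.0\n420,100.0,20.0\n"
data_csv = ('0,400,10.0,"a"\n0,410,20.0,"a"\n0,420,30.0,"a"\n'
            '1,400,2.0,"b"\n1,410,2.0,"b"\n1,420,2.0,"b"\n')

cmf = pd.read_csv(io.StringIO(cmf_csv), names=['wavelength', 'x', 'y', 'z'])
d65 = pd.read_csv(io.StringIO(d65_csv), names=['wavelength', 'a', 'b'])
data = pd.read_csv(io.StringIO(data_csv),
                   names=['serialNumber', 'wavelength', 'measurement', 'name'])

lookup = pd.merge(cmf, d65, on='wavelength')
merged = pd.merge(data, lookup, on='wavelength')

# Normalisation constant, the same quantity as totals['y'] in the question.
total_y = (lookup['y'] * lookup['a']).sum()

# Weight every row by a * measurement / total_y ...
weighted = merged[['x', 'y', 'z']].mul(
    merged['a'] * merged['measurement'] / total_y, axis=0)
weighted.columns = ['X', 'Y', 'Z']

# ... then sum the weighted rows within each (serialNumber, name) series.
res = (weighted.join(merged[['serialNumber', 'name']])
       .groupby(['serialNumber', 'name'], as_index=False)[['X', 'Y', 'Z']]
       .sum())

# Reorder the columns to the layout requested in the question.
res = res[['serialNumber', 'X', 'Y', 'Z', 'name']]

# With no path argument, to_csv returns the CSV text; pass a filename
# instead to write the result to disk.
csv_text = res.to_csv(index=False)
print(csv_text)
```

Series "b" has a constant measurement of 2.0 in this toy data, so its Y value equals 2.0 after normalisation, which gives a quick sanity check of the grouping.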
