How to compute weighted sum of all elements in a row in pandas?

Views: 143 | Published: 2017/3/26 2:37:39 | Tags: python, pandas, dataframe, calculated-columns, weighted-average


Question

I have a pandas DataFrame with multiple columns. I want to create a new column, weighted_sum, from the values in each row and another DataFrame, weight, which holds a column vector of weights.

weighted_sum should have the following value:

row[weighted_sum] = row[col0]*weight[0] + row[col1]*weight[1] + row[col2]*weight[2] + ...

I found the function sum(axis=1), but it doesn't let me multiply with weight.

Edit:
I changed things a bit.

weight looks like this:
     0
col1 0.5
col2 0.3
col3 0.2
df looks like this:
col1 col2 col3
1.0  2.2  3.5
6.1  0.4  1.2
df*weight returns a DataFrame full of NaN values.
Solution
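The problem is that you're multiplying a frame by another frame of a different shape with different labels. For frame-by-frame arithmetic, pandas aligns both the row index and the columns by label, and here none of the labels match, so every cell comes out as NaN. Here's the solution (the session assumes from pandas import DataFrame, Series; the timings further down also use from numpy.random import randn):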
In [121]: df = DataFrame([[1,2.2,3.5],[6.1,0.4,1.2]], columns=list('abc'))

In [122]: weight = DataFrame(Series([0.5, 0.3, 0.2], index=list('abc'), name=0))

In [123]: df
Out[123]:
           a          b          c
0       1.00       2.20       3.50
1       6.10       0.40       1.20

In [124]: weight
Out[124]:
           0
a       0.50
b       0.30
c       0.20

In [125]: df * weight
Out[125]:
           0          a          b          c
0        nan        nan        nan        nan
1        nan        nan        nan        nan
a        nan        nan        nan        nan
b        nan        nan        nan        nan
c        nan        nan        nan        nan
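The all-NaN result is label alignment at work: pandas takes the outer join of both axes, so the result has rows 0, 1, a, b, c and columns 0, a, b, c, and no cell of df has a partner cell in weight. A minimal sketch reproducing this with the same data, assuming a recent pandas and the usual pd alias:

```python
import pandas as pd

df = pd.DataFrame([[1, 2.2, 3.5], [6.1, 0.4, 1.2]], columns=list("abc"))
weight = pd.DataFrame(pd.Series([0.5, 0.3, 0.2], index=list("abc"), name=0))

# Frame * frame aligns BOTH axes by label (outer join):
# rows {0, 1, 'a', 'b', 'c'} x columns {0, 'a', 'b', 'c'}, all unmatched -> NaN.
product = df * weight
print(product.shape)               # (5, 4)
print(product.isna().all().all())  # True

# weight[0] is a Series indexed by 'a', 'b', 'c'; frame * Series matches the
# Series index against the frame's columns, which is the pairing we want.
scaled = df * weight[0]
print(scaled.isna().any().any())   # False
```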
You can either multiply by the column weight[0], which is a Series indexed by a, b, c and therefore lines up with df's columns:
In [126]: df * weight[0]
Out[126]:
           a          b          c
0       0.50       0.66       0.70
1       3.05       0.12       0.24

In [128]: (df * weight[0]).sum(axis=1)
Out[128]:
0         1.86
1         3.41
dtype: float64
Or use dot to get back another DataFrame
In [127]: df.dot(weight)
Out[127]:
           0
0       1.86
1       3.41
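As a cross-check, here is a sketch that evaluates the question's formula with an explicit per-row loop and compares it with both approaches above. The weights are listed in a shuffled order on purpose: both mul/multiply-and-sum and dot pair weights with columns by label, not by position, so the order does not matter (dot does, however, raise ValueError if the two label sets differ). The loop is only for illustration and is far slower than either vectorised form.

```python
import pandas as pd

df = pd.DataFrame([[1, 2.2, 3.5], [6.1, 0.4, 1.2]], columns=list("abc"))
# Same weights as in the session, deliberately listed in a different order.
w = pd.Series([0.2, 0.5, 0.3], index=["c", "a", "b"])

# Literal translation of the question's formula:
# row[weighted_sum] = row[a]*w[a] + row[b]*w[b] + row[c]*w[c]
looped = pd.Series(
    [sum(row[col] * w[col] for col in df.columns) for _, row in df.iterrows()],
    index=df.index,
)

# Multiply-and-sum: axis="columns" matches w's labels against df's columns.
mul_sum = df.mul(w, axis="columns").sum(axis=1)

# dot also aligns df.columns with w.index by label before multiplying,
# and returns a Series when given a Series.
dotted = df.dot(w)

print(dotted.round(2).tolist())  # [1.86, 3.41]
pd.testing.assert_series_equal(looped, mul_sum)
pd.testing.assert_series_equal(looped, dotted)
```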
To bring it all together:
In [130]: df['weighted_sum'] = df.dot(weight)

In [131]: df
Out[131]:
           a          b          c  weighted_sum
0       1.00       2.20       3.50          1.86
1       6.10       0.40       1.20          3.41
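One caveat with this last step: once weighted_sum is a column of df, running df.dot(weight) again raises ValueError, because df now has a column label that weight's index lacks. The same happens whenever df carries extra columns without a weight. A sketch that selects the weighted columns explicitly (add_weighted_sum is a hypothetical helper, not a pandas function):

```python
import pandas as pd


def add_weighted_sum(df, weights, name="weighted_sum"):
    """Hypothetical helper: return a copy of df with a weighted row-sum column.

    Only the columns listed in weights.index are used, so extra columns
    (including an earlier result column) do not break the dot product.
    """
    out = df.copy()
    out[name] = df[weights.index].dot(weights)
    return out


df = pd.DataFrame([[1, 2.2, 3.5], [6.1, 0.4, 1.2]], columns=list("abc"))
weight = pd.DataFrame(pd.Series([0.5, 0.3, 0.2], index=list("abc"), name=0))

# Like In [130], but passing the Series weight[0] so dot returns a Series.
df["weighted_sum"] = df.dot(weight[0])

# A plain dot now fails: 'weighted_sum' has no matching label in weight.
try:
    df.dot(weight[0])
except ValueError as exc:
    print("ValueError:", exc)

# Selecting the weighted columns explicitly keeps working.
df = add_weighted_sum(df, weight[0])
print(df["weighted_sum"].round(2).tolist())  # [1.86, 3.41]
```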
Here are the timeits of each method, using a larger DataFrame.
In [145]: df = DataFrame(randn(10000000, 3), columns=list('abc'))
In [146]: weight = DataFrame(Series([0.5, 0.3, 0.2], index=list('abc'), name=0))

In [147]: timeit df.dot(weight)
10 loops, best of 3: 57.5 ms per loop

In [148]: timeit (df * weight[0]).sum(axis=1)
10 loops, best of 3: 125 ms per loop
For a wide DataFrame:
In [162]: df = DataFrame(randn(10000, 1000))

In [163]: weight = DataFrame(randn(1000, 1))

In [164]: timeit df.dot(weight)
100 loops, best of 3: 5.14 ms per loop

In [165]: timeit (df * weight[0]).sum(axis=1)
10 loops, best of 3: 41.8 ms per loop
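The numbers above come from an IPython session with the pandas and NumPy versions of the time. Below is a sketch for re-running a similar comparison as a plain script with the standard-library timeit module. The sizes are an assumption, deliberately smaller than in the session, and no particular result is claimed, since timings depend on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_cols = 20_000, 100  # assumed sizes; adjust to match your data

df = pd.DataFrame(rng.standard_normal((n_rows, n_cols)))
weight = pd.DataFrame(rng.standard_normal((n_cols, 1)))


def with_dot():
    return df.dot(weight)


def with_mul_sum():
    return (df * weight[0]).sum(axis=1)


# Make sure both approaches agree before timing them.
assert np.allclose(with_dot()[0].to_numpy(), with_mul_sum().to_numpy())

for fn in (with_dot, with_mul_sum):
    # Best of 3 repeats of 5 calls each, reported per call.
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```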
So, dot is faster and more readable.

NOTE: If any of your data contain NaNs, don't use dot; use the multiply-and-sum method instead. dot cannot skip NaNs because it is just a thin wrapper around numpy.dot(), which propagates them: a single NaN makes that whole row's result NaN, whereas sum(axis=1) skips NaNs by default.
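To make the note concrete: with the default skipna=True, the multiply-and-sum result treats a missing value as contributing 0, so filling NaNs with 0 before calling dot gives the same numbers if that is the semantics you want. A sketch with one NaN placed into the example data (its position is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, 3.5], [6.1, 0.4, 1.2]], columns=list("abc"))
w = pd.Series([0.5, 0.3, 0.2], index=list("abc"))

via_dot = df.dot(w)                   # row 0 becomes NaN
via_sum = (df * w).sum(axis=1)        # NaN skipped: 1*0.5 + 3.5*0.2 = 1.2
via_dot_filled = df.fillna(0).dot(w)  # same values as via_sum

print(via_dot.isna().tolist())    # [True, False]
print(via_sum.round(2).tolist())  # [1.2, 3.41]

# skipna=False makes the sum propagate NaN the way dot does.
strict = (df * w).sum(axis=1, skipna=False)
```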
