Python numpy or pandas equivalent of the R function sweep()

Question

What is numpy or pandas equivalent of the R function sweep()?

To elaborate: in R, let's say we have a coefficient vector (say beta, numeric type) and an array (say data, numeric type; 5x4 in the sample below). I want to superimpose the vector on each row of the array and multiply the corresponding elements, then return the resulting array, which has the same shape as data. I can achieve this using sweep(). The sample R code is below.

beta <-  c(10, 20, 30, 40)
data <- array(1:20,c(5,4))
sweep(data,MARGIN=2,beta,`*`)
#---------------
 > data
      [,1] [,2] [,3] [,4]
 [1,]    1    6   11   16
 [2,]    2    7   12   17
 [3,]    3    8   13   18
 [4,]    4    9   14   19
 [5,]    5   10   15   20

 > beta
 [1] 10 20 30 40

 > sweep(data,MARGIN=2,beta,`*`)
      [,1] [,2] [,3] [,4]
 [1,]   10  120  330  640
 [2,]   20  140  360  680
 [3,]   30  160  390  720
 [4,]   40  180  420  760
 [5,]   50  200  450  800

I have heard exciting things about numpy and pandas in Python and it seems to have a lot of R like commands. What would be the fastest way to achieve the same using these libraries? The actual data has millions of rows and around 50 columns. The beta vector is of course conformable with data.

Recommended answer

Pandas has an apply method too, apply being what R's sweep uses under the hood. (Note that the MARGIN argument is "equivalent" to the axis argument in many pandas functions, except that axis takes the values 0 and 1 rather than 1 and 2.)

In [11]: np.random.seed(1)  # call seed(); assigning to it would not seed the generator

In [12]: beta = pd.Series(np.random.randn(5))

In [13]: data = pd.DataFrame(np.random.randn(20, 5))

You can use apply with a function that is called on each row:

In [14]: data.apply(lambda row: row * beta, axis=1)

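Note: axis=0 applies the function to each column instead. This is the default because the data is stored column-wise, so column-wise operations are more efficient.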
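To make the axis argument concrete, here is a minimal sketch that rebuilds the 5x4 matrix from the R sample as a DataFrame (the variable names by_row and by_col are only illustrative) and runs apply in both directions. It assumes default integer row and column labels, so that beta lines up with the columns by label.

```python
import pandas as pd

# Toy data mirroring the R sample: 5 rows, 4 columns, filled column by column.
data = pd.DataFrame({0: [1, 2, 3, 4, 5],
                     1: [6, 7, 8, 9, 10],
                     2: [11, 12, 13, 14, 15],
                     3: [16, 17, 18, 19, 20]})
beta = pd.Series([10, 20, 30, 40])

# axis=1: the lambda receives one row at a time (a Series indexed by column label),
# so row * beta pairs beta[j] with the value in column j.
by_row = data.apply(lambda row: row * beta, axis=1)

# axis=0 (the default): the lambda receives one column at a time;
# col.name is the column label, used here to pick the matching coefficient.
by_col = data.apply(lambda col: col * beta[col.name])

print(by_row)
```

Both calls should produce the same table as the R sweep() output in the question; the column-wise version makes 4 function calls instead of 5, and that gap grows with the number of rows.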

However, in this case it is easy, and significantly faster (and more readable), to vectorize by simply multiplying row-wise:

In [21]: data.apply(lambda row: row * beta, axis=1).head()
Out[21]:
          0         1         2         3         4
0 -0.024827 -1.465294 -0.416155 -0.369182 -0.649587
1  0.026433  0.355915 -0.672302  0.225446 -0.520374
2  0.042254 -1.223200 -0.545957  0.103864 -0.372855
3  0.086367  0.218539 -1.033671  0.218388 -0.598549
4  0.203071 -3.402876  0.192504 -0.147548 -0.726001

In [22]: data.mul(beta, axis=1).head()  # just show first few rows with head
Out[22]:
          0         1         2         3         4
0 -0.024827 -1.465294 -0.416155 -0.369182 -0.649587
1  0.026433  0.355915 -0.672302  0.225446 -0.520374
2  0.042254 -1.223200 -0.545957  0.103864 -0.372855
3  0.086367  0.218539 -1.033671  0.218388 -0.598549
4  0.203071 -3.402876  0.192504 -0.147548 -0.726001

Note: mul is slightly more robust and allows more control than using *.
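One way to see the extra control is the axis argument of mul. The sketch below uses made-up column labels a to d and a hypothetical row_w vector of per-row weights. With axis=1 the Series is matched against the column labels (the same thing * does), while axis=0 matches it against the row index.

```python
import pandas as pd

data = pd.DataFrame([[1, 6, 11, 16],
                     [2, 7, 12, 17],
                     [3, 8, 13, 18]],
                    columns=["a", "b", "c", "d"])
beta = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])  # one value per column
row_w = pd.Series([1, 10, 100])                                   # one value per row

# axis=1 aligns beta with the column labels; identical to data * beta.
per_column = data.mul(beta, axis=1)

# axis=0 aligns row_w with the row index, scaling whole rows.
# Plain data * row_w would instead try to match row_w against the column labels.
per_row = data.mul(row_w, axis=0)

print(per_column)
print(per_row)
```

In R terms, axis=1 here plays the role of MARGIN=2 and axis=0 the role of MARGIN=1.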

You can do the same in numpy (i.e. on data.values here), either by multiplying directly, which will be faster because it doesn't have to worry about data alignment, or by using vectorize rather than apply.
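For the plain numpy route, direct multiplication relies on broadcasting. The sketch below reproduces the R sample from the question. The sweep function is a hypothetical helper written for this illustration, not part of numpy; it assumes stats has exactly one entry per index along the chosen margin.

```python
import numpy as np

def sweep(x, margin, stats, fun=np.multiply):
    """Hypothetical helper mimicking R's sweep() via broadcasting.

    margin follows R's 1-based convention (1 = rows, 2 = columns).
    """
    x = np.asarray(x)
    stats = np.asarray(stats)
    shape = [1] * x.ndim
    shape[margin - 1] = -1  # stats varies along this axis only
    return fun(x, stats.reshape(shape))

# order="F" fills column by column, like R's array(1:20, c(5, 4)).
data = np.arange(1, 21).reshape((5, 4), order="F")
beta = np.array([10, 20, 30, 40])

result = sweep(data, 2, beta)  # for this case, the same as data * beta
print(result)
```

For the column case the helper is unnecessary: data * beta already broadcasts a length-4 vector across the rows of a 5x4 array. Starting from a DataFrame, the same multiplication can be done on data.values and beta.values.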
