How to assign values to a column for a subset of data frame rows

Question

I have a large data frame and I am trying to assign values to a particular data column for specific subsets.

subset(P2Y12R_binding_summary,(SYSTEM=="4NTJ")&(VARIANT=="D294N"))
  SYSTEM VARIANT  MODEL EPSIN INP dE_water_free dE_ERR_water_free dE_water_periodic dE_ERR_water_periodic
1   4NTJ   D294N LVLSET     1   1       -42.155          29.28460           -42.205              29.52604
2   4NTJ   D294N LVLSET     1   2       -34.225          29.75176           -34.235              29.96571
3   4NTJ   D294N LVLSET    20   1       -65.163          40.62241           -65.163              40.52564
4   4NTJ   D294N LVLSET    20   2       -57.454          41.04459           -57.454              41.26962
5   4NTJ   D294N    SES     1   1       -23.406          30.56636           -23.335              30.75794
6   4NTJ   D294N    SES     1   2       -15.434          30.70035           -15.414              30.85944
7   4NTJ   D294N    SES    20   1       -64.351          40.65919           -64.350              40.51345
8   4NTJ   D294N    SES    20   2       -56.342          41.23456           -56.542              41.21865

Now suppose I create a new column using

P2Y12R_binding_summary$Ki_expt <- 0

And I want to update values for this column for only the rows corresponding to the subset above.

The naive approach fails:

>subset(P2Y12R_binding_summary,(SYSTEM=="4NTJ")&(VARIANT=="D294N"))$Ki_expt = 42.2

>subset(P2Y12R_binding_summary,(SYSTEM=="4NTJ")&(VARIANT=="D294N"))$Ki_expt <- 42.2

Both produce the error message:

Error in subset(P2Y12R_binding_summary, (SYSTEM == "4NTJ") & (VARIANT ==  : 
could not find function "subset<-"

Does anyone know of the appropriate way to do this? Obviously, it would be possible with a for loop, but that seems rather klunky and would probably be quite slow (as previous experience seems to show).

Recommended answer

If speed is a concern I would look to data.table (I normally look there anyway).

library(data.table)
setDT(P2Y12R_binding_summary)[SYSTEM=="4NTJ" & VARIANT=="D294N",  Ki_expt := 42.2 ]

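An example using the diamonds dataset (which ships with ggplot2):

    library(ggplot2)     # provides the diamonds dataset
    library(data.table)
    dummydf <- diamonds
    # convert to a data.table, then add `example` only on the matching rows
    setDT(dummydf)[cut == "Premium" & color == "J", example := 42.2]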

dummydf[!is.na(example)]
     carat     cut color clarity depth table price    x    y    z example
  1:  0.30 Premium     J     SI2  59.3    61   405 4.43 4.38 2.61    42.2
  2:  1.00 Premium     J     SI2  62.3    58  2801 6.45 6.34 3.98    42.2
  3:  0.93 Premium     J     SI2  61.9    57  2807 6.21 6.19 3.84    42.2
  4:  1.17 Premium     J      I1  60.2    61  2825 6.90 6.83 4.13    42.2
  5:  0.33 Premium     J     VS1  62.8    58   557 4.41 4.38 2.76    42.2
 ---                                                                     
804:  1.01 Premium     J      I1  60.7    59  2602 6.42 6.39 3.89    42.2
805:  1.01 Premium     J     SI2  58.3    62  2683 6.49 6.43 3.77    42.2
806:  1.01 Premium     J     SI2  59.3    56  2683 6.51 6.45 3.84    42.2
807:  0.90 Premium     J     SI2  62.7    57  2717 6.09 6.06 3.80    42.2
808:  0.90 Premium     J     SI2  63.0    59  2717 6.14 6.11 3.86    42.2

Note that you only need to call setDT() once. After that, just index your data.table directly using dummydf[subset, LHS_name := RHS_value].
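
For comparison, the same update can be sketched in base R with a logical index, which avoids both the for loop and the "subset<-" error from the question. This is not the approach recommended above, only an alternative; the toy data frame `df` and its values are made up for illustration and stand in for P2Y12R_binding_summary.

```r
# Hypothetical stand-in for P2Y12R_binding_summary (values are illustrative)
df <- data.frame(
  SYSTEM  = c("4NTJ", "4NTJ", "OTHER"),
  VARIANT = c("D294N", "WT", "D294N"),
  stringsAsFactors = FALSE
)

# Create the new column with a default value, as in the question
df$Ki_expt <- 0

# Logical vector that is TRUE only for the rows to update
rows <- df$SYSTEM == "4NTJ" & df$VARIANT == "D294N"

# Assign into the column at those rows; all other rows keep their value
df$Ki_expt[rows] <- 42.2
```

The key difference from the failing attempts is that the assignment targets the column indexed by a logical vector, rather than the return value of subset(), which has no replacement form.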
