R Appending Columns to Dataset Misnamed


Problem description

Edit: Clarity

When I append a new column to an existing data.frame, the column titles are incorrect. In summary.myData, the last two columns, both titled "Measure", should say "plus" and "minus" respectively.

This is tied in with another question I had, where I ask about how to correctly reference a column in a Tk/R GUI I am working on.

Parent Question

myData:

   Group Subgroup  Measure
1      A        1 0.234213
2      A        1 0.046248
3      A        1 0.391376
4      A        2 0.911849
5      A        2 0.729955
6      A        2 0.991110
7      A        2 0.378422
8      A        3 0.898037
9      A        3 0.258884
10     A        3       NA
11     A        3 0.057631
12     A        3 0.745202
13     A        3 0.121376
14     B        1 0.385198
15     B        1 0.484399
16     B        1 0.115034
17     B        1 0.073629
18     B        1 0.456150
19     B        2 0.336108
20     B        2 0.845458
21     B        2 0.267494
22     B        3 0.536123
23     B        3 1.331731
24     B        3 0.505114
25     B        3 0.843348
26     B        3 0.827932
27     B        3 0.813351
28     C        1 0.095587
29     C        1 0.158822
30     C        1 0.392376
31     C        1 0.284625
32     C        2 0.898819
33     C        2 0.743428
34     C        2 0.298989
35     C        2 0.423961
36     C        3 0.868351
37     C        3 0.181547
38     C        3 1.146131
39     C        3 0.234941

Append script:

  summary.myData<-summarySE(myData, measurevar=paste(tx.choice1), groupvars=paste(tx.choice2),conf.interval=0.95,na.rm=TRUE,.drop=FALSE)
  summary.myData$plus<-summary.myData[3]-summary.myData[6]
  summary.myData$minus<-summary.myData[3]+summary.myData[6]

Result:

  Group  N   Measure        sd         se        ci   Measure   Measure
1     A 12 0.4803586 0.3539277 0.10217014 0.2248750 0.2554836 0.7052335
2     B 14 0.5586478 0.3412835 0.09121184 0.1970512 0.3615966 0.7556990
3     C 12 0.4772981 0.3465511 0.10004069 0.2201881 0.2571100 0.6974862

Solution

The problem you're running into is that you've assigned $plus and $minus to data.frames, rather than atomic vectors. So when printing, R is showing the column name in the embedded data.frame ('Measure' in both cases), rather than the name of the list component ('plus' and 'minus').

str(summary.myData);
## 'data.frame': 3 obs. of  8 variables:
##  $ Group  : Factor w/ 3 levels "A","B","C": 1 2 3
##  $ N      : num  12 14 12
##  $ Measure: num  0.48 0.559 0.477
##  $ sd     : num  0.354 0.341 0.347
##  $ se     : num  0.1022 0.0912 0.1
##  $ ci     : num  0.225 0.197 0.22
##  $ plus   :'data.frame':  3 obs. of  1 variable:
##   ..$ Measure: num  0.255 0.362 0.257
##  $ minus  :'data.frame':  3 obs. of  1 variable:
##   ..$ Measure: num  0.705 0.756 0.697
summary.myData;
##   Group  N   Measure        sd         se        ci   Measure   Measure
## 1     A 12 0.4803586 0.3539277 0.10217014 0.2248750 0.2554836 0.7052335
## 2     B 14 0.5586478 0.3412835 0.09121184 0.1970512 0.3615966 0.7556990
## 3     C 12 0.4772981 0.3465511 0.10004069 0.2201881 0.2571100 0.6974862

Replace the assignments with

summary.myData$plus <- summary.myData[,3]-summary.myData[,6];
summary.myData$minus <- summary.myData[,3]+summary.myData[,6];

Then you get:

str(summary.myData);
## 'data.frame': 3 obs. of  8 variables:
##  $ Group  : Factor w/ 3 levels "A","B","C": 1 2 3
##  $ N      : num  12 14 12
##  $ Measure: num  0.48 0.559 0.477
##  $ sd     : num  0.354 0.341 0.347
##  $ se     : num  0.1022 0.0912 0.1
##  $ ci     : num  0.225 0.197 0.22
##  $ plus   : num  0.255 0.362 0.257
##  $ minus  : num  0.705 0.756 0.697
summary.myData;
##   Group  N   Measure        sd         se        ci      plus     minus
## 1     A 12 0.4803586 0.3539277 0.10217014 0.2248750 0.2554836 0.7052335
## 2     B 14 0.5586478 0.3412835 0.09121184 0.1970512 0.3615966 0.7556990
## 3     C 12 0.4772981 0.3465511 0.10004069 0.2201881 0.2571100 0.6974862
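If the column is chosen by name at run time, as the paste(tx.choice1) call in the question suggests, double-bracket indexing is another way to get an atomic vector. The following is a minimal sketch on a toy data.frame, not the asker's actual data: toy, measure_col, lower and upper are hypothetical names introduced for illustration, and the values are made up.

```r
# Toy stand-in for the summarySE() output; values are invented for illustration.
toy <- data.frame(Group   = c("A", "B"),
                  Measure = c(0.50, 0.60),
                  ci      = c(0.20, 0.10))

# Hypothetical: the column name arrives as a string (e.g. from a GUI selection).
measure_col <- "Measure"

# [[ ]] extracts a single list component, so the result is a plain numeric vector.
toy$lower <- toy[[measure_col]] - toy[["ci"]]
toy$upper <- toy[[measure_col]] + toy[["ci"]]

# The new columns are numeric and carry the names given on the left-hand side.
str(toy)
names(toy)
```

Indexing by name rather than by position also keeps the code working if the column order of the summary changes.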


The key here is the different indexing style. When you use 1D indexing, as in summary.myData[3], you're actually treating the data.frame as a list (which it is internally), so the index operation returns the specified list components, still classed as a data.frame. When you use 2D indexing, as in summary.myData[,3], you index the rows and columns separately, which allows you to extract a 2D "subtable" of the data.frame. But when you specify only one column, the default behavior (drop=T) is to return the column as an atomic vector rather than as a one-column data.frame. You can change this with drop=F.

summary.myData[3];
##     Measure
## 1 0.4803586
## 2 0.5586478
## 3 0.4772981
summary.myData[,3];
## [1] 0.4803586 0.5586478 0.4772981
summary.myData[,3,drop=F];
##     Measure
## 1 0.4803586
## 2 0.5586478
## 3 0.4772981
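The same distinction can be checked programmatically with class(). This is a small self-contained sketch on a toy data.frame (toy and bad are hypothetical names, and the values are made up), which also reproduces the nested-column problem from the question.

```r
# Toy data.frame; values are invented for illustration.
toy <- data.frame(Group   = c("A", "B", "C"),
                  Measure = c(0.48, 0.56, 0.47))

# 1D (list-style) indexing keeps the data.frame class.
class_1d <- class(toy[2])

# 2D indexing with a single column drops to an atomic vector by default.
class_2d <- class(toy[, 2])

# drop = FALSE keeps the one-column data.frame.
class_2d_keep <- class(toy[, 2, drop = FALSE])

# [[ ]] extracts the list component itself, again an atomic vector.
class_dbl <- class(toy[[2]])

# Arithmetic on one-column data.frames yields a data.frame, so assigning the
# result with $ embeds a data.frame as a column, as in the question.
toy$bad <- toy[2] - toy[2]
nested <- is.data.frame(toy$bad)

print(c(class_1d, class_2d, class_2d_keep, class_dbl))
print(nested)
```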
