How to find the mean for a subset using R?
Question
Using mtcars, the dataset that ships with R, I'm trying to find the mean of the "mpg" variable for Mercedes cars only. I am new to R and learning on my own. I've worked out the average mpg of all cars using the following:
read.csv("mtcars.csv")
mean(mtcars$mpg)
I thought of using something like a GROUP BY to group only the Mercedes cars, but I can't figure it out. I'm sure it's really simple, so I'm a little frustrated that I don't see what to do next.
The file looks like this: https://gist.github.com/seankross/a412dfbd88b3db70b74b
Recommended answer
In base R, mtcars is a built-in data frame, so there is no need to read it from a CSV file. You can type mtcars in the console to view it.
Here are the first 10 rows of the mtcars data frame.
head(mtcars, 10)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
The information you need, the model, is stored in the row names. To access it, we can use the rownames function.
rownames(mtcars)
# [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
# [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
# [7] "Duster 360" "Merc 240D" "Merc 230"
# [10] "Merc 280" "Merc 280C" "Merc 450SE"
# [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"
# [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"
# [19] "Honda Civic" "Toyota Corolla" "Toyota Corona"
# [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"
# [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"
# [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"
# [31] "Maserati Bora" "Volvo 142E"
Next, we need to test which row names match "Merc". We can use grepl for this; it returns a logical vector with TRUE for each element that matches. Here "^Merc" matches strings that begin with "Merc".
grepl("^Merc", rownames(mtcars))
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
# [14] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [27] FALSE FALSE FALSE FALSE FALSE FALSE
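As a quick check on the pattern, here is a small sketch that counts the matches and compares grepl with base R's startsWith, which tests for a literal prefix without using a regular expression. The variable name is_merc is introduced here for illustration only.

```r
# Logical vector: TRUE where the row name begins with "Merc"
is_merc <- grepl("^Merc", rownames(mtcars))

# Number of matching rows (each TRUE counts as 1)
sum(is_merc)
# [1] 7

# The row names that matched
rownames(mtcars)[is_merc]

# startsWith() does a literal prefix test and should agree with grepl here
identical(is_merc, startsWith(rownames(mtcars), "Merc"))
```

For a fixed prefix like this one, either function works; grepl is the more general tool when the pattern is not a plain prefix.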
Finally, we can use the logical vector to subset the mtcars data frame and then calculate the mean of mpg for the subset.
mtcars_merc <- mtcars[grepl("^Merc", rownames(mtcars)), ]
mean(mtcars_merc$mpg)
# [1] 19.01429
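Since the question mentions GROUP BY, here is a hedged sketch of a grouped version in base R. It assumes the make is the first word of each row name, which holds for the Merc rows but is only a rough heuristic for the rest of the data set. The names merc_mean, make and mpg_by_make are introduced here for illustration.

```r
# One-step version of the subset-and-average shown above
merc_mean <- mean(mtcars$mpg[grepl("^Merc", rownames(mtcars))])
merc_mean

# Assumption: the make is everything before the first space in the row name
# (names without a space, such as "Valiant", are left unchanged)
make <- sub(" .*", "", rownames(mtcars))

# Mean mpg per make, similar in spirit to SQL's GROUP BY
mpg_by_make <- tapply(mtcars$mpg, make, mean)

# Pull out the Mercedes group
mpg_by_make[["Merc"]]
```

The tapply approach is convenient when you want the mean for every make at once; for a single group, the logical subset is shorter.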