Error in data.frame, unused argument
Problem Description
I have this data frame:
> head(merged.tables)
Store DayOfWeek Date Sales Customers Open Promo StateHoliday SchoolHoliday StoreType
1 1 5 2015-07-31 5263 555 1 1 0 1 c
2 1 6 2013-01-12 4952 646 1 0 0 0 c
3 1 5 2014-01-03 4190 552 1 0 0 1 c
4 1 3 2014-12-03 6454 695 1 1 0 0 c
5 1 3 2013-11-13 3310 464 1 0 0 0 c
6 1 7 2013-10-27 0 0 0 0 0 0 c
Assortment CompetitionDistance CompetitionOpenSinceMonth CompetitionOpenSinceYear Promo2
1 a 1270 9 2008 0
2 a 1270 9 2008 0
3 a 1270 9 2008 0
4 a 1270 9 2008 0
5 a 1270 9 2008 0
6 a 1270 9 2008 0
Promo2SinceWeek Promo2SinceYear PromoInterval
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
Then I want to extract a data frame showing the average of the Sales vector when Open is equal to 1, by StoreType. I used this command because I think it's the fastest:
merged.tables[StateHoliday==1,mean(na.omit(Sales)),by=StoreType]
But I got this error:
Error in [.data.frame(merged.tables, StateHoliday == 0, mean(na.omit(Sales)), : unused argument (by = StoreType)
I searched but didn't find an answer to this error. Thanks for your help!
Recommended Answer
Overview
There are lots of ways of applying a function to a group of values in your data frame. I present two:
- Using the dplyr package to arrange your data in a way that answers your question.
- Using tapply(), which applies a function over a group of values.
Reproducible Example
For each store type, I want the average sales for those stores whose Open value is equal to 1.
I present the dplyr method first, followed by tapply.
Note: the data frame below is based only on what was posted in the OP.
# install necessary package
install.packages( pkgs = "dplyr" )
# load necessary package
library( dplyr )
# create data frame
merged.tables <-
data.frame(
Store = c( 1, 1, 1, 2, 2, 2 )
, StoreType = rep( x = c( "s", "m", "l" ) , times = 2)
, Sales = round( x = runif( n = 6, min = 3000, max = 6000 ) , digits = 0 )
, Open = c( 1, 1, 0, 0, 1, 1 )
, stringsAsFactors = FALSE
)
# view the data
merged.tables
# Store StoreType Sales Open
# 1 1 s 4608 1
# 2 1 m 4017 1
# 3 1 l 4210 0
# 4 2 s 4833 0
# 5 2 m 3818 1
# 6 2 l 3090 1
# dplyr method
merged.tables %>%
group_by( StoreType ) %>%
filter( Open == 1 ) %>%
summarise( AverageSales = mean( x = Sales , na.rm = TRUE ) )
# A tibble: 3 x 2
# StoreType AverageSales
# <chr> <dbl>
# 1 l 3090
# 2 m 3918
# 3 s 4608
# tapply method
# create the condition
# that 'Open' must be equal to one
Open.equals.one <- which( merged.tables$Open == 1 )
# apply the condition to
# both X and INDEX
tapply( X = merged.tables$Sales[ Open.equals.one ]
, INDEX = merged.tables$StoreType[ Open.equals.one ]
, FUN = mean
, na.rm = TRUE # just in case your data does have NA values in the `Sales` column, this removes them from the calculation
)
# l m s
# 3090.0 3917.5 4608.0
# end of script #
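For completeness: the `[i, j, by = ...]` form used in the question is data.table syntax, whereas merged.tables is a plain data.frame, whose `[` method has no `by` argument. That is what the "unused argument" message is reporting. A minimal sketch of that route follows, assuming the data.table package is installed; the object names merged.dt and avg.sales are made up for illustration, and the Sales values are fixed (copied from the printout above) rather than drawn with runif().

```r
# assumes the data.table package is installed
library(data.table)

# same shape as the example data above, with fixed Sales values
merged.tables <- data.frame(
  StoreType = c("s", "m", "l", "s", "m", "l"),
  Sales     = c(4608, 4017, 4210, 4833, 3818, 3090),
  Open      = c(1, 1, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# convert to a data.table so that `[` understands i, j and by
merged.dt <- as.data.table(merged.tables)

# i: keep open stores; j: average Sales; by: one row per StoreType
avg.sales <- merged.dt[Open == 1,
                       .(AverageSales = mean(Sales, na.rm = TRUE)),
                       by = StoreType]
avg.sales
```

The conversion step is the important part: without as.data.table() (or setDT()), the call is dispatched to the data.frame method and fails as in the question.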
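Resources
In the future, if you need more conditions, I encourage you to check out other relevant SO posts such as "How to combine multiple conditions to subset a data-frame using "OR"?" and "Why is `[` better than `subset`?".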
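As a rough illustration of combining conditions, here is a small base R sketch using `[` with `&` (and) and `|` (or). The data frame stores and the result names are hypothetical, chosen only to mirror the example data in this answer.

```r
# hypothetical data, mirroring the example data frame in this answer
stores <- data.frame(
  StoreType = c("s", "m", "l", "s", "m", "l"),
  Sales     = c(4608, 4017, 4210, 4833, 3818, 3090),
  Open      = c(1, 1, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# AND: open stores with sales above 4000
open.and.large <- stores[stores$Open == 1 & stores$Sales > 4000, ]

# OR: stores that are closed, or of type "s"
closed.or.small <- stores[stores$Open == 0 | stores$StoreType == "s", ]

open.and.large
closed.or.small
```

Note that `&` and `|` are the vectorised operators; `&&` and `||` are meant for single values and are not suitable for row selection.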