R: split a string into numbers and return the mean as a new column in a data frame
Problem description
I have a large data frame with a column whose values are character strings of numbers, such as "1, 2, 3, 4". I want to add a new column that holds the average of these numbers. I have set up the following example:
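set.seed(2015)
library(dplyr)
a <- c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12")
df <- data.frame(a)
# Before R 4.0.0, data.frame() turned strings into factors by default,
# so convert the column back to character
df$a <- as.character(df$a)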
Now I can use strsplit to split the string and return the mean for a given row, where [[1]] selects the first row.
mean(as.numeric(strsplit((df$a), split=", ")[[1]]))
[1] 2.5
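To see why [[1]] corresponds to the first row, it can help to look at what strsplit() returns. A minimal sketch, reusing the strings from the example above (the names `parts` and `first_row` are illustrative only):

```r
# Same input strings as in the example above
a <- c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12")

# strsplit() returns a list with one character vector per input string
parts <- strsplit(a, split = ", ")
str(parts)

# [[1]] extracts the first list element: the pieces of the first string
first_row <- as.numeric(parts[[1]])
print(mean(first_row))  # 2.5
```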
The problem is that when I try to do this inside a data frame and reference the row number, I get an error:
df2 <- df %>%
  mutate(index = row_number(),
         avg = mean(as.numeric(strsplit((df$a), split = ", ")[[index]])))
Error in strsplit((df$a), split = ", ")[[1:3]] :
recursive indexing failed at level 2
Can anyone explain this error, and why I cannot index using a variable? If I replace index with a constant it works; it seems R does not like me using a variable there.
Many thanks!
Recommended answer
You could use sapply to loop through the list returned by strsplit, handling each of the list elements:
sapply(strsplit((df$a), split=", "), function(x) mean(as.numeric(x)))
# [1] 2.5 5.0 7.5
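The answer does not spell out the cause of the error, so here is a likely explanation with a sketch. mutate() evaluates its expressions on whole columns rather than row by row, so row_number() yields the full vector 1:3 and `[[index]]` becomes `[[1:3]]`, as the error message shows. Given a vector subscript, `[[` performs recursive indexing, so `x[[1:3]]` means `x[[1]][[2]][[3]]`; because `x[[1]]` is a character vector rather than a list, the recursion fails at level 2. The sketch below assumes dplyr is installed and rebuilds the question's `df`; it reproduces the failure outside mutate() and then stores the sapply() result as the new column the question asks for. The rowwise() variant is an alternative that makes a per-row `[[1]]` valid. The names `parts`, `failed`, `df2` and `df3` are illustrative.

```r
library(dplyr)

a <- c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12")
df <- data.frame(a, stringsAsFactors = FALSE)

# Inside mutate(), row_number() is the whole vector 1:3, so the original
# code effectively runs strsplit(...)[[1:3]], i.e. recursive indexing
parts <- strsplit(df$a, split = ", ")
failed <- tryCatch({
  parts[[1:3]]
  FALSE
}, error = function(e) {
  message("Error: ", conditionMessage(e))
  TRUE
})

# Fix: compute one mean per list element and store the vector as a column
df2 <- df %>%
  mutate(avg = sapply(strsplit(a, split = ", "),
                      function(x) mean(as.numeric(x))))
print(df2)

# Alternative: rowwise() makes mutate() evaluate one row at a time,
# so the scalar index [[1]] picks that row's split pieces
df3 <- df %>%
  rowwise() %>%
  mutate(avg = mean(as.numeric(strsplit(a, split = ", ")[[1]]))) %>%
  ungroup()
print(df3)
```

If you prefer a type-stable version, vapply(..., FUN.VALUE = numeric(1)) can replace sapply() with the same result here.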