Using the lapply function and a list in R

Problem description

d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

for (i in 1:3) {
  table <- lapply(my.list, function(data, count) {
    sql <-
      #sqldf(
        paste0(
          "select *,count(col_one) from data where col_one = ",
          count," group by col_one"
        )
      #)
    print(sql)
  },
  count = i)
}

Output:

[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"

Expected output:

[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"

How can I fix this? I want to run the SQL to create the new data set I need, but it does not work, and I would like to know which list element each SQL statement refers to. Is there a simpler way?

I tried one approach:

library(sqldf)

d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(3, 2, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 2, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)
#for (i in 1:3) {
  table <- lapply(seq_along(my.list), function(index) {
    sql <-
      sqldf(
        paste0(
          "select *,count(col_one) from my.list where col_one = ",
          index," group by col_one"
        )
      )
    print(sql)
  })
#}

Output:

[1] "select *,count(col_one) from my.list where col_one = 1 group by col_one"
[1] "select *,count(col_one) from my.list where col_one = 2 group by col_one"
[1] "select *,count(col_one) from my.list where col_one = 3 group by col_one"

However, sqldf cannot find the data set to run the SQL against.

library(sqldf)

d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)
table <- mapply(function(data, count) {
  sql <-
    sqldf(
      paste0(
        "select *,count(col_one) from data where col_one = ",
        count, " group by col_one"
      )
    )
  print(sql)
}, my.list, 1)
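The nine lines in the first attempt come from nesting two iterations. The outer for loop runs three times, and on each pass lapply() visits all three data frames with the same count, which gives 3 × 3 print calls. The mapply() attempt has the opposite problem: the scalar 1 is recycled, so every data frame receives count = 1. A minimal base-R sketch of pairing each data frame with its own index is to pass seq_along(my.list) as the second vector to Map(). Like the first attempt, it only builds the query strings and leaves sqldf() out. The name queries is illustrative.

```r
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# Map() walks my.list and seq_along(my.list) in parallel, so the i-th
# data frame is paired with count = i (3 calls in total, not 3 x 3)
queries <- Map(function(data, count) {
  paste0("select *,count(col_one) from data where col_one = ",
         count, " group by col_one")
}, my.list, seq_along(my.list))

# Print each query once, prefixed by the list index it belongs to
for (i in seq_along(queries)) {
  cat(i, ": ", queries[[i]], "\n", sep = "")
}
```

Because the index travels with each element, you can always tell which list element a given query was built for.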

Solution

If I understood correctly, the OP wants to create contingency tables for col_one for each of the data.frames in my.list, i.e., he wants to know how many times each of the values 1, 2, or 3 appear in col_one in each data.frame.
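If per-data-frame contingency tables are all that is needed, base R can produce them without SQL. The sketch below swaps in a different technique from the data.table approach that follows. Converting col_one to a factor with levels 1:3 makes table() report a zero for values that are absent and ignore values outside 1 to 3, such as the 7 in d3. The name count_matrix is illustrative.

```r
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# One contingency table per data frame; fixing the levels to 1:3 keeps
# zero counts and turns values outside that range (the 7) into NA,
# which table() drops by default
count_matrix <- t(sapply(my.list, function(d) {
  table(factor(d$col_one, levels = 1:3))
}))

# One row per data frame, one column per value of col_one
rownames(count_matrix) <- paste0("d", seq_along(my.list))
print(count_matrix)
```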

As explained in my answer to another question by the OP, and as suggested by G. Grothendieck, it is almost always better to combine data.frames with identical structure into one large data.table than to keep them separate in a list. Incidentally, the OP has also asked a third question ("how to loop the dataframe using sqldf?") about a list of data.frames.

The rbindlist() function combines the data.frames into one large data.table. The id column df, added through idcol, identifies the data.frame each row came from.

library(data.table)
rbindlist(my.list, idcol = "df")

   df col_one col_two
1:  1       1       4
2:  1       2       5
3:  1       3       6
4:  2       1       6
5:  2       1       5
6:  2       1       4
7:  3       7       8
8:  3       1       5
9:  3       1       4
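If data.table is not available, you can approximate the stacking step in base R. The sketch below is a swapped-in technique, not rbindlist() itself. It tags each data frame with its position using Map() and cbind(), then stacks the results with do.call(rbind, ...). The column name df mirrors idcol = "df", and the result is a plain data.frame rather than a data.table. The name combined is illustrative.

```r
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# Prepend an id column "df" holding each element's position in my.list,
# then stack the data frames row-wise
combined <- do.call(rbind, Map(function(d, i) cbind(df = i, d),
                               my.list, seq_along(my.list)))
rownames(combined) <- NULL
print(combined)
```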

Now we can easily compute the aggregates:

rbindlist(my.list, idcol = "df")[, count_col_one := .N, by = .(df, col_one)][]

   df col_one col_two count_col_one
1:  1       1       4             1
2:  1       2       5             1
3:  1       3       6             1
4:  2       1       6             3
5:  2       1       5             3
6:  2       1       4             3
7:  3       7       8             1
8:  3       1       5             2
9:  3       1       4             2

This data.table statement counts the occurrences of each individual value in col_one for each df by using the special symbol .N and by grouping by df and col_one.
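To cross-check the .N counts without data.table, here is a base-R sketch. It swaps in a different technique from the answer's method. The data frames are stacked with an id column, and then ave() is called with FUN = length, which writes each row's group size back to that row. The resulting count_col_one should match the column shown above. The name combined is illustrative.

```r
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# Stack the data frames with an id column "df" first
combined <- do.call(rbind, Map(function(d, i) cbind(df = i, d),
                               my.list, seq_along(my.list)))

# ave() applies length() within every (df, col_one) group and returns the
# group size for each row, analogous to .N with by = .(df, col_one)
combined$count_col_one <- ave(combined$col_one, combined$df, combined$col_one,
                              FUN = length)
print(combined)
```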

In the question, the OP only asked to count occurrences of 1, 2, or 3 in col_one. If that is really intended, the rows with the value 7 need to be removed, which can be done by filtering the result:

rbindlist(my.list, idcol = "df")[, count_col_one := .N, by = .(df, col_one)][
  col_one %in% 1:3]
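The same filter works in base R. This sketch rebuilds the counts with ave() rather than data.table and keeps only the rows whose col_one is in 1:3. As an optional extra step not found in the answer, it then collapses repeated rows with unique() so that each (df, col_one) pair appears once. The names combined, filtered and summary_counts are illustrative.

```r
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# Stack with an id column and count rows per (df, col_one) group
combined <- do.call(rbind, Map(function(d, i) cbind(df = i, d),
                               my.list, seq_along(my.list)))
combined$count_col_one <- ave(combined$col_one, combined$df, combined$col_one,
                              FUN = length)

# Keep only rows whose col_one is 1, 2 or 3 (the 7 from d3 is dropped)
filtered <- combined[combined$col_one %in% 1:3, ]

# Optional: one row per (df, col_one) pair with its count
summary_counts <- unique(filtered[, c("df", "col_one", "count_col_one")])
rownames(summary_counts) <- NULL
print(summary_counts)
```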
