Using the lapply function with a list in R


Question

d1 <- data.frame(col_one = c(1,2,3),col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2,d3)

for (i in 1:3) {
  table<- lapply(my.list, function(data, count) {
    sql <-
      #sqldf(
        paste0(
          "select *,count(col_one) from data where col_one = ",
          count," group by col_one"
        )
      #)
    print(sql)
  },
  count = i)
}

Output:

[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"

Expected:

[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"

How can I improve this? I want to run the SQL to create a new dataset, but it does not work. I would also like to know which index of the list each SQL statement relates to. Is there a simpler method?

I have tried one alternative:

d1 <- data.frame(col_one = c(1,2,3),col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(3, 2, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 2, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2,d3)
seq_along(my.list)
#for (i in 1:3) {
  table<- lapply(seq_along(my.list), function(index) {
    sql <-
      sqldf(
        paste0(
          "select *,count(col_one) from my.list where col_one = ",
          index," group by col_one"
        )
      )
    print(sql)
  })
#}

Output:

[1] "select *,count(col_one) from my.list where col_one = 1 group by col_one"
[1] "select *,count(col_one) from my.list where col_one = 2 group by col_one"
[1] "select *,count(col_one) from my.list where col_one = 3 group by col_one"

However, it cannot find the dataset to run the SQL against.

d1 <- data.frame(col_one = c(1,2,3),col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2,d3)
table<- mapply(function(data, count) {
  sql <-
    sqldf(
    paste0(
      "select *,count(col_one) from data where col_one = ",
      count," group by col_one"
    )
  )
  print(sql)
}, my.list, 1
)

Recommended answer

If I understand correctly, the OP wants to create contingency tables of col_one for each data.frame in my.list, i.e., to find out how many times each of the values 1, 2, and 3 appears in col_one of each data.frame.
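For reference, a minimal base R sketch of that reading, without SQL: table() is applied to col_one of each list element. The names df1, df2, df3 and the variable counts are introduced here for illustration only and are not part of the original question.

```r
# Data from the question
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# One contingency table of col_one per data.frame
counts <- lapply(my.list, function(d) table(d$col_one))

# Label each table with its position in the list (hypothetical names)
names(counts) <- paste0("df", seq_along(my.list))

print(counts)
```

Each element of counts is a table whose names are the distinct values of col_one, so counts$df2[["1"]] gives the number of times 1 occurs in the second data.frame.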

As explained in my answer to another question by the OP, and as suggested by G. Grothendieck, it is almost always better to combine data.frames with identical structure into one large data.table than to keep them separate in a list. Incidentally, the OP has also asked a third question ("how to loop the dataframe using sqldf?") that involves a list of data.frames.

To combine the data.frames into one large data.table, use the rbindlist() function. Note that the added id column df identifies the data.frame each row came from.

library(data.table)
rbindlist(my.list, idcol = "df")

   df col_one col_two
1:  1       1       4
2:  1       2       5
3:  1       3       6
4:  2       1       6
5:  2       1       5
6:  2       1       4
7:  3       7       8
8:  3       1       5
9:  3       1       4

Now the aggregates are easy to compute:

rbindlist(my.list, idcol = "df")[, count_col_one := .N, by = .(df, col_one)][]

   df col_one col_two count_col_one
1:  1       1       4             1
2:  1       2       5             1
3:  1       3       6             1
4:  2       1       6             3
5:  2       1       5             3
6:  2       1       4             3
7:  3       7       8             1
8:  3       1       5             2
9:  3       1       4             2

This data.table statement counts the occurrences of each distinct value of col_one within each df, using the special symbol .N and grouping by df and col_one. The := operator adds the count to every row as the new column count_col_one, and the trailing [] prints the result.
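To make the grouping explicit, here is a rough base R equivalent of the same per-group count, assuming data.table is not available. It uses ave() with length as the aggregation function. The variable name combined is hypothetical, and this is a sketch for comparison, not the method recommended above.

```r
# Data from the question
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# Stack the data.frames, tagging each row with its list index (like idcol = "df")
combined <- do.call(
  rbind,
  Map(function(d, i) cbind(df = i, d), my.list, seq_along(my.list))
)

# Count rows per (df, col_one) group and attach the count to every row
combined$count_col_one <- ave(
  seq_along(combined$col_one),
  combined$df, combined$col_one,
  FUN = length
)

print(combined)
```

The count_col_one column should match the data.table output shown above (1, 1, 1, 3, 3, 3, 1, 2, 2).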

The question only asks for counts of the values 1, 2, and 3 in col_one. If that is really the intent, the rows with the value 7 need to be removed, which can be done by filtering the result:

rbindlist(my.list, idcol = "df")[, count_col_one := .N, by = .(df, col_one)][
  col_one %in% 1:3]
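The expected output in the question can also be read differently: one query per list element, with the i-th data.frame matched against the value i. Under that assumption, the data.frame and its index can be paired with Map() instead of nesting lapply() inside a for loop, which is what produced nine statements instead of three. A minimal base R sketch follows; per_element and result are illustrative names.

```r
# Data from the question
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# Pair each data.frame with its own index: element i is checked for col_one == i
per_element <- Map(function(d, i) {
  data.frame(df = i, col_one = i, count_col_one = sum(d$col_one == i))
}, my.list, seq_along(my.list))

# One row per list element, with the list index kept in the df column
result <- do.call(rbind, per_element)
print(result)
```

Because Map() walks my.list and seq_along(my.list) in parallel, each data.frame is visited exactly once and its index is available inside the function.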
