Add missing values based on the value of a column in R


Problem description




This is my sample dataset:

vector1 <-
  data.frame(
    "name" = "a",
    "age" = 10,
    "fruit" = c("orange", "cherry", "apple"),
    "count" = c(1, 1, 1),
    "tag" = c(1, 1, 2)
  )
vector2 <-
  data.frame(
    "name" = "b",
    "age" = 33,
    "fruit" = c("apple", "mango"),
    "count" = c(1, 1),
    "tag" = c(2, 2)
  )
vector3 <-
  data.frame(
    "name" = "c",
    "age" = 58,
    "fruit" = c("cherry", "apple"),
    "count" = c(1, 1),
    "tag" = c(1, 1)
  )

list <- list(vector1, vector2, vector3)
print(list)

This is my test:

default <- c("cherry",
       "orange",
       "apple",
       "mango")

for (num in 1:length(list)) {
  #print(list[[num]])

  list[[num]] <- rbind(
    list[[num]],
    data.frame(
      "name" = list[[num]]$name,
      "age" = list[[num]]$age,
      "fruit" = setdiff(default, list[[num]]$fruit),#add missed value
      "count" = 0,
      "tag" = 1 #not found solutions
    )
  )

  print(paste0("--------------", num, "--------"))
  print(list)
}
#print(list)

I'm trying to find which fruits are missing from each data frame, where "missing" is decided per tag value. For example, the first data frame has tags 1 and 2. If tag 1 lacks some of the default fruits, such as apple and mango, those fruits should be added to the data frame with a count of 0. The expected format is as follows:

[[1]]
  name age  fruit count tag
1    a  10 orange     1   1
2    a  10 cherry     1   1
3    a  10  apple     1   2
4    a  10  mango     0   1
5    a  10  apple     0   1
6    a  10  mango     0   2
7    a  10 orange     0   2
8    a  10 cherry     0   2

When I step through the loop, I also find that the first iteration adds mango three times, and I can't work out why it doesn't add the missing value just once. The overall output looks like this:

[[1]]
  name age  fruit count tag
1    a  10 orange     1   1
2    a  10 cherry     1   1
3    a  10  apple     1   2
4    a  10  mango     0   1
5    a  10  mango     0   1
6    a  10  mango     0   1

[[2]]
  name age  fruit count tag
1    b  33  apple     1   2
2    b  33  mango     1   2
3    b  33 cherry     0   1
4    b  33 orange     0   1

[[3]]
  name age  fruit count tag
1    c  58 cherry     1   1
2    c  58  apple     1   1
3    c  58 orange     0   1
4    c  58  mango     0   1

Can anyone suggest a simple method or another approach? Should I use sqldf to add the 0 values? Would that be a simple way to solve my problem?

Solution

A solution using dplyr and tidyr. We can use complete to expand each data frame to every combination of fruit and tag, and set the fill value for count to 0.

Notice that I changed your list name from list to fruit_list, because naming an object after a base R function such as list is bad practice. Also notice that when I created the example data frames I set stringsAsFactors = FALSE, because I don't want to create factor columns (this is the default since R 4.0.0). Finally, I used lapply instead of a for loop to iterate over the list elements.

library(dplyr)
library(tidyr)

fruit_list2 <- lapply(fruit_list, function(x){
  x2 <- x %>%
    complete(name, age, fruit = default, tag = c(1, 2), fill = list(count = 0)) %>%
    select(name, age, fruit, count, tag) %>%
    arrange(tag, fruit) %>%
    as.data.frame()
  return(x2)
})

fruit_list2
# [[1]]
#   name age  fruit count tag
# 1    a  10  apple     0   1
# 2    a  10 cherry     1   1
# 3    a  10  mango     0   1
# 4    a  10 orange     1   1
# 5    a  10  apple     1   2
# 6    a  10 cherry     0   2
# 7    a  10  mango     0   2
# 8    a  10 orange     0   2
# 
# [[2]]
#   name age  fruit count tag
# 1    b  33  apple     0   1
# 2    b  33 cherry     0   1
# 3    b  33  mango     0   1
# 4    b  33 orange     0   1
# 5    b  33  apple     1   2
# 6    b  33 cherry     0   2
# 7    b  33  mango     1   2
# 8    b  33 orange     0   2
# 
# [[3]]
#   name age  fruit count tag
# 1    c  58  apple     1   1
# 2    c  58 cherry     1   1
# 3    c  58  mango     0   1
# 4    c  58 orange     0   1
# 5    c  58  apple     0   2
# 6    c  58 cherry     0   2
# 7    c  58  mango     0   2
# 8    c  58 orange     0   2
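
On the side question of why the original loop added mango three times: the likely cause is vector recycling in data.frame(). In the loop, list[[num]]$name and list[[num]]$age are whole columns (length 3 for the first data frame), while setdiff() returns a single missing fruit, so that fruit is recycled to three rows. A minimal sketch, using only the question's first data frame (the names missing_fruit, recycled and fixed are made up for illustration):

```r
# First data frame from the question: three rows, so name and age have length 3
vector1 <- data.frame(
  name = "a", age = 10,
  fruit = c("orange", "cherry", "apple"),
  count = c(1, 1, 1), tag = c(1, 1, 2),
  stringsAsFactors = FALSE
)
default <- c("cherry", "orange", "apple", "mango")

# Only "mango" is absent from vector1
missing_fruit <- setdiff(default, vector1$fruit)

# name and age are length-3 columns, so the single fruit is recycled to 3 rows
recycled <- data.frame(
  name = vector1$name, age = vector1$age,
  fruit = missing_fruit, count = 0, tag = 1,
  stringsAsFactors = FALSE
)
nrow(recycled)  # 3

# Passing a single name/age value gives one row per missing fruit
fixed <- data.frame(
  name = vector1$name[1], age = vector1$age[1],
  fruit = missing_fruit, count = 0, tag = 1,
  stringsAsFactors = FALSE
)
nrow(fixed)  # 1
```

The second and third data frames looked correct only because they each have two rows and two missing fruits, so the lengths happened to match. Even with this fix the loop still ignores the tag column, which is the part that complete handles.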

DATA

vector1 <-
  data.frame(
    "name" = "a",
    "age" = 10,
    "fruit" = c("orange", "cherry", "apple"),
    "count" = c(1, 1, 1),
    "tag" = c(1, 1, 2),
    stringsAsFactors = FALSE
  )
vector2 <-
  data.frame(
    "name" = "b",
    "age" = 33,
    "fruit" = c("apple", "mango"),
    "count" = c(1, 1),
    "tag" = c(2, 2),
    stringsAsFactors = FALSE
  )
vector3 <-
  data.frame(
    "name" = "c",
    "age" = 58,
    "fruit" = c("cherry", "apple"),
    "count" = c(1, 1),
    "tag" = c(1, 1),
    stringsAsFactors = FALSE
  )

fruit_list <- list(vector1, vector2, vector3)

default <- c("cherry", "orange", "apple", "mango")
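
If adding tidyr is not an option, the same idea can be sketched in base R with expand.grid() and merge(). This is an alternative, not the method used in the answer above. The helper name fill_missing is hypothetical, and the sketch assumes that name and age are constant within each data frame. Unlike the answer, it takes the tags from the data instead of hard-coding c(1, 2), so a data frame with only tag 1 gets rows for tag 1 only.

```r
vector1 <- data.frame(
  name = "a", age = 10,
  fruit = c("orange", "cherry", "apple"),
  count = c(1, 1, 1), tag = c(1, 1, 2),
  stringsAsFactors = FALSE
)
default <- c("cherry", "orange", "apple", "mango")

# Hypothetical helper: add a count-0 row for every fruit/tag pair missing from x
fill_missing <- function(x, fruits) {
  # Every combination of default fruit and tag that occurs in x
  grid <- expand.grid(
    fruit = fruits, tag = sort(unique(x$tag)),
    stringsAsFactors = FALSE
  )
  # Assumption: name and age are the same for every row of x
  grid$name <- x$name[1]
  grid$age <- x$age[1]

  # Left join keeps all combinations; unmatched ones get NA in count
  out <- merge(grid, x, by = c("name", "age", "fruit", "tag"), all.x = TRUE)
  out$count[is.na(out$count)] <- 0

  # Same column order and sort order as the dplyr/tidyr answer
  out <- out[order(out$tag, out$fruit), c("name", "age", "fruit", "count", "tag")]
  rownames(out) <- NULL
  out
}

result <- fill_missing(vector1, default)
nrow(result)  # 8: four fruits for each of tags 1 and 2
```

To process the whole list, the helper could be applied with lapply(fruit_list, fill_missing, fruits = default).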
