Input 0 where there is no value in R data frame


Problem description

I want a data.frame like the one below, but with every year present for each topic. The one I made counts the number of items per year for each topic, but when a topic has no items in a given year, no row is created for that topic and year, and the final graph has a gap there. Could anyone please tell me how to add the missing years with Count == 0 for the topics that have no value?

dtd2 <- structure(list(Topic = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 
11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L), .Label = c("Topic 1", 
"Topic 10", "Topic 11", "Topic 12", "Topic 2", "Topic 3", "Topic 4", 
"Topic 5", "Topic 6", "Topic 7", "Topic 8", "Topic 9"), class = "factor"), 
    Year = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 2L, 
    3L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
    9L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
    8L, 9L, 1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 6L, 7L, 8L, 
    9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L, 9L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), .Label = c("2011", 
    "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019"
    ), class = "factor"), Count = c(3L, 3L, 3L, 5L, 5L, 11L, 
    17L, 14L, 4L, 1L, 1L, 4L, 2L, 3L, 9L, 4L, 2L, 1L, 3L, 4L, 
    5L, 18L, 23L, 19L, 15L, 1L, 5L, 6L, 8L, 11L, 17L, 7L, 1L, 
    3L, 6L, 4L, 20L, 21L, 18L, 12L, 3L, 1L, 1L, 2L, 5L, 5L, 11L, 
    5L, 2L, 1L, 1L, 2L, 2L, 5L, 7L, 23L, 9L, 1L, 1L, 2L, 3L, 
    6L, 4L, 9L, 8L, 1L, 1L, 6L, 2L, 3L, 3L, 1L, 3L, 2L, 5L, 7L, 
    11L, 11L, 28L, 11L, 2L, 1L, 2L, 2L, 5L, 6L, 5L, 16L, 3L, 
    4L, 2L, 2L, 7L, 6L, 8L, 6L)), row.names = c(NA, -96L), class = "data.frame")

ggplot(dtd2, aes(x = Year, y = Count, colour = Topic, group = Topic)) + geom_point() + geom_line() + labs(x = "Year", y = NULL, title = "Timeline")


Recommended answer

A time series approach could be:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:lubridate':
#> 
#>     interval, new_interval
#> The following object is masked from 'package:dplyr':
#> 
#>     id


dtd2 <- structure(list(Topic = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
  1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
  3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
  5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
  7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 
  10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 
  11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L), .Label = c("Topic 1", 
    "Topic 10", "Topic 11", "Topic 12", "Topic 2", "Topic 3", "Topic 4", 
    "Topic 5", "Topic 6", "Topic 7", "Topic 8", "Topic 9"), class = "factor"), 
  Year = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 2L, 
    3L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
    9L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
    8L, 9L, 1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 6L, 7L, 8L, 
    9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L, 9L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), .Label = c("2011", 
      "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019"
    ), class = "factor"), Count = c(3L, 3L, 3L, 5L, 5L, 11L, 
      17L, 14L, 4L, 1L, 1L, 4L, 2L, 3L, 9L, 4L, 2L, 1L, 3L, 4L, 
      5L, 18L, 23L, 19L, 15L, 1L, 5L, 6L, 8L, 11L, 17L, 7L, 1L, 
      3L, 6L, 4L, 20L, 21L, 18L, 12L, 3L, 1L, 1L, 2L, 5L, 5L, 11L, 
      5L, 2L, 1L, 1L, 2L, 2L, 5L, 7L, 23L, 9L, 1L, 1L, 2L, 3L, 
      6L, 4L, 9L, 8L, 1L, 1L, 6L, 2L, 3L, 3L, 1L, 3L, 2L, 5L, 7L, 
      11L, 11L, 28L, 11L, 2L, 1L, 2L, 2L, 5L, 6L, 5L, 16L, 3L, 
      4L, 2L, 2L, 7L, 6L, 8L, 6L)), row.names = c(NA, -96L), class = "data.frame")
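tsibble2 <- dtd2 %>%
  # Turn the Year factor into a Date (1 January of that year)
  mutate(Year = as_date(str_c(Year, "01", "01"))) %>%
  as_tsibble(index = Year, key = Topic) %>%
  # Insert NA rows for the dates missing from each Topic's series
  tsibble::fill_gaps(.full = TRUE) %>%
  group_by_key() %>%
  # Aggregate back to one row per Topic and calendar year
  index_by(year = Year %>% year) %>%
  # With na.rm = TRUE, a year that holds only NA rows sums to 0
  summarise(Count = Count %>% sum(na.rm = TRUE)) %>%
  as_tibble() %>%
  mutate(year = year %>% as_factor())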

tsibble2 %>% 
  ggplot() +
  aes(x = year,y = Count,color = Topic,group = Topic) +
  geom_line() +
  geom_point()

Created on 2020-01-08 by the reprex package (v0.3.0)
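
If the time series machinery is more than you need, a lighter alternative (not part of the answer above) might be `tidyr::complete()`, which adds a row for every combination of the given columns and fills the new rows with a value you choose. The sketch below uses a small made-up data frame, `counts`, as a stand-in for `dtd2`; it assumes, as in the question, that `Topic` and `Year` are factors whose levels already list every topic and year that should appear.

```r
library(tidyr)

# Hypothetical miniature stand-in for dtd2: "Topic 2" has no row for 2011
counts <- data.frame(
  Topic = factor(c("Topic 1", "Topic 1", "Topic 2")),
  Year  = factor(c("2011", "2012", "2012"), levels = c("2011", "2012")),
  Count = c(3L, 5L, 1L)
)

# complete() expands to all Topic x Year combinations (using the factor
# levels) and sets Count to 0 in the rows it had to add
filled <- complete(counts, Topic, Year, fill = list(Count = 0L))
filled
```

Under the same assumption, `complete(dtd2, Topic, Year, fill = list(Count = 0L))` should give one row per topic and year, which can be passed to the original `ggplot()` call unchanged.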
