R dates as column names containing duplicate values (need to retain original date)

Problem description

I have a dataset that I'm trying to tidy. I read the file in with read.xlsx, and the header row contains date values. I need to keep those values intact, even where a date is duplicated, when I gather/spread the data.

The dataset looks like the one below. The dates from Excel are read in as numbers (which is fine). The issue is that a date can appear more than once (e.g. 43693), and I need to keep the original value in each case.

      Date        43693 43686 43686 43714 43693
1     Contract    111   222   333   444   555
2     Org1        NR    NB    NR    NB     P
3     Org2         P     P     P    NB    NR
4     Org3        NB    NB    NB    NB     P

When I try to transform the data, I get an error about duplicate names.

Ultimately, I'm trying to get my data shaped like this, where the date value is kept for every duplicate (e.g. 43693):

    Date        Contract              ORG     status
 1 43693            111              Org1     NR
 2 43693            555              Org1     P    
 3 43686            111              Org2     P    

Here is an example df to test on:

df <- structure(
  list(
    Date = c("Contract", "Org1", "Org2", "Org3", "Org4"),
    "12/16/18" = c("111", "pending", "complete", "complete", "pending"),
    "12/16/18" = c("222", "pending", "complete", "pending", "complete"),
    "1/18/18" = c("222", "pending", "complete", "pending", "complete")
  ),
  class = "data.frame",
  .Names = c("Date", "12/16/18", "12/16/18", "1/18/18"),
  row.names = c(NA, -5L)
)

Recommended answer

You effectively have two header rows (the dates and the contract numbers), which is pretty messy. I'd recommend re-reading the data, skipping the date line, and then incorporating the date line into the column names.
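The re-reading idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the inline CSV text and `read.csv(header = FALSE)` stand in for the real spreadsheet and `read.xlsx` call, and the names `raw_text`, `raw`, `dates`, `contracts`, and `body` are made up for the example. The point is that when both header rows are read as ordinary data, the duplicated dates are never forced into column names and so are not altered.

```r
# Inline CSV standing in for the spreadsheet (assumption: same layout as in the question)
raw_text <- "Date,43693,43686,43686,43714,43693
Contract,111,222,333,444,555
Org1,NR,NB,NR,NB,P
Org2,P,P,P,NB,NR
Org3,NB,NB,NB,NB,P"

# header = FALSE keeps both header rows as data, so duplicate dates survive unchanged
raw <- read.csv(text = raw_text, header = FALSE, colClasses = "character")

# Pull the two header rows out as plain character vectors (first column holds the labels)
dates     <- unlist(raw[1, -1], use.names = FALSE)
contracts <- unlist(raw[2, -1], use.names = FALSE)

# Keep only the body rows and build unique names of the form date_contract
body <- raw[-(1:2), ]
names(body) <- c("Org", paste(dates, contracts, sep = "_"))
rownames(body) <- NULL
body
```

Because each date is paired with its contract number, the combined names are unique here even though 43693 and 43686 each appear twice, and the original date can be recovered later by splitting on the underscore.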

If you have already read the data in, you can try something like this:

library(data.table)
df2 <- setDT(df[-1, ])
setnames(df2, c("Org", paste(names(df), unlist(df[1, ], use.names = FALSE), sep = "_")[-1]))
# Current data
df2
#     Org 12/16/18_111 12/16/18_222 1/18/18_222
# 1: Org1      pending      pending     pending
# 2: Org2     complete     complete    complete
# 3: Org3     complete      pending     pending
# 4: Org4      pending     complete    complete

# melt and split
melt(df2, id.vars="Org")[, c("Date", "Contract") := tstrsplit(variable, "_")][, variable := NULL][]
#      Org    value     Date Contract
#  1: Org1  pending 12/16/18      111
#  2: Org2 complete 12/16/18      111
#  3: Org3 complete 12/16/18      111
#  4: Org4  pending 12/16/18      111
#  5: Org1  pending 12/16/18      222
#  6: Org2 complete 12/16/18      222
#  7: Org3  pending 12/16/18      222
#  8: Org4 complete 12/16/18      222
#  9: Org1  pending  1/18/18      222
# 10: Org2 complete  1/18/18      222
# 11: Org3  pending  1/18/18      222
# 12: Org4 complete  1/18/18      222

If you do want to stick with dplyr and tidyr, here's a translation of the above:

library(dplyr)
library(tidyr)
setNames(df, c("Org", paste(names(df), unlist(df[1, ], use.names = FALSE), sep = "_")[-1])) %>% 
  slice(-1) %>% 
  pivot_longer(-Org) %>% 
  separate(name, into = c("Date", "Contract"), sep = "_")

Note that you have to rename the dataset before you start chaining the other commands together, because the verbs that follow expect unique column names.
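A quick way to see why the rename has to come first is to check the names for duplicates before and after it. The snippet below is a small base R sketch that rebuilds the example df from the question; `dup_before`, `new_names`, `dup_after`, and `renamed` are illustrative names, not part of the original answer.

```r
# Example data from the question, with the date "12/16/18" used twice as a column name
df <- structure(
  list(
    c("Contract", "Org1", "Org2", "Org3", "Org4"),
    c("111", "pending", "complete", "complete", "pending"),
    c("222", "pending", "complete", "pending", "complete"),
    c("222", "pending", "complete", "pending", "complete")
  ),
  names = c("Date", "12/16/18", "12/16/18", "1/18/18"),
  class = "data.frame",
  row.names = c(NA, -5L)
)

# anyDuplicated() returns the index of the first repeated name, or 0 if there is none
dup_before <- anyDuplicated(names(df))

# Same renaming step as in the answer: combine each date with the contract from row 1
new_names <- c("Org", paste(names(df), unlist(df[1, ], use.names = FALSE), sep = "_")[-1])
dup_after <- anyDuplicated(new_names)

renamed <- setNames(df, new_names)
names(renamed)
```

Here `dup_before` is non-zero because "12/16/18" appears twice, while `dup_after` is 0, so the renamed data frame is safe to pass on to the rest of the pipeline.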
