Split an xts data frame into several groups, collapse to weekly data and keep the time index


Question

I am a total newbie to R, so I apologize if the answer to my question is too obvious. I have a data set of the following form:

Date, V1,V...,VN,Land,Nace
22/03/1995 23:01:12,1,3,2,15,A
21/03/1995 21:01:12,3,3,1,9,C
1/04/1995 17:01:06,3,2,1,3,B   

I would like to analyze the data in the data.frame by Land, NACE (an industry code), Date (I would like to collapse the whole thing to weekly data) and by the three different answer options {1,2,3} in V1...VN. This is a sample of my data:

example <- as.data.frame(structure(c(" 1", " 2", " 1", " 2", " 1", " 1", " 2", " 1", " 2", 
" 1", " 2", " 3", " 1", " 1", " 2", " 2", " 3", " 1", " 2", " 2", 
" 1", " 2", " 1", " 1", " 2", NA, " 2", NA, NA, " 1", " 3", " 1", 
" 3", " 3", " 2", " 3", " 3", " 3", " 2", " 2", " 2", " 3", " 3", 
" 3", " 2", " 2", " 3", " 3", " 3", " 3", " 1", " 2", " 1", " 2", 
" 2", " 1", " 2", " 1", " 2", " 2", " 2", " 3", " 1", " 1", " 2", 
" 2", " 3", " 3", " 2", " 2", " 1", " 2", " 1", " 1", " 2", NA, 
" 2", NA, NA, " 1", " 3", " 2", " 3", " 2", " 0", " 3", " 3", 
" 3", " 2", " 0", " 2", " 3", " 3", " 3", " 0", " 2", " 2", " 3", 
" 3", " 0", "12", " 5", " 9", "14", " 5", "tra", "tra", "man", 
"inf", "agc", "07-2011", "07-2011", "07-2011", "07-2011", "07-2011" 
), .indexCLASS = c("POSIXlt", "POSIXt"), .indexTZ = "", class = c("xts", 
"zoo"), .indexFORMAT = "%U-%Y", index = structure(c(1297642226, 
1297672737, 1297741204, 1297748893, 1297749513), tzone = "", tclass = c("POSIXlt", 
"POSIXt")), .Dim = c(5L, 23L), .Dimnames = list(NULL, c("rev_sit", 
"prof_sit", "emp_nr_sit", "inv_sit", "ord_home_sit", "ord_abr_sit", 
"emp_cost_sit", "usage_cost_sit", "tax_cost_sit", "gov_cost_sit", 
"rev_exp", "prof_exp", "emp_nr_exp", "inv_exp", "ord_home_exp", 
"ord_abr_exp", "emp_cost_exp", "usage_cost_exp", "tax_cost_exp", 
"gov_cost_exp", "land", "nace", "index")))) 

prof_sit etc. are questions, and the values below them are the answers on the scale 1, 2, 3. land, nace and index (the time index) are the variables by which I would like to split up the dataset. The goal is to get an xts data frame that looks like this:

-,nace.land,nace.land,nace.land,...
10-1995,sum of answers coded i.e. as 1 for a certain nace and a certain land,sum,sum,...  
11-1995,sum,sum,...
12-1995,sum,sum,...

where 12-1995 is the 12th calendar week of 1995. The closest I came to this solution was with tapply:

pos <- as.data.frame(tapply((example[,1]==3)*1,
  list(example$index, example$land, example$nace), sum)) 

It does more or less what I want, with the drawback that the xts format is lost and therefore the rows are not in the right order. A second disadvantage is that I will have to run loops to apply the same technique to all twenty questions. Does anybody know a solution to this problem? I appreciate any help or hint, since I have been stuck on this problem for several days now.
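One likely reason for the wrong row order is that tapply sorts its result by group label, and a week-first label such as "%U-%Y" sorts as text rather than chronologically. The sketch below uses hypothetical timestamps (not taken from the sample data) to compare a week-first label with a year-first label:

```r
# Hypothetical timestamps spanning a year boundary (illustration only).
stamps <- as.POSIXct(c("2010-12-20 10:00:00", "2011-02-14 09:00:00",
                       "2011-01-10 08:00:00"), tz = "UTC")
answers <- c(1, 1, 1)

week_first <- format(stamps, "%U-%Y")  # label style used in the question
year_first <- format(stamps, "%Y-%U")  # same information, year first

# tapply orders its result by the sorted group labels.
by_week_first <- tapply(answers, week_first, sum)
by_year_first <- tapply(answers, year_first, sum)

print(names(by_week_first))  # December 2010 ends up last
print(names(by_year_first))  # chronological order
```

With a year-first label the text order and the time order coincide, so the aggregated rows no longer need to be re-sorted afterwards.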

Best regards,

Andreas

Recommended answer

Thank you very much for all your help. I was busy with other things in the meantime, but now that I am working on my problem again I have found a solution with the help of your great comments:

I gave up working directly with time series and postponed that step to the end of my analysis. I therefore take the date vector and transform it into weeks:

library(ISOweek)

d$index <- ISOweek(d$date)
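For readers without the ISOweek package, a similar year-week label can be sketched in base R. This assumes the date strings look like the sample rows in the question (day/month/year) and that format() on your platform supports the ISO 8601 specifiers %G and %V, which is worth checking on your own system:

```r
# Date strings copied from the sample rows in the question.
raw <- c("22/03/1995 23:01:12", "21/03/1995 21:01:12", "1/04/1995 17:01:06")

# Parse as day/month/year; the UTC time zone is an assumption.
stamps <- as.POSIXct(raw, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# ISO 8601 week-based year and week number, e.g. "1995-W12".
iso_week <- format(stamps, "%G-W%V")
print(iso_week)
```

Because the year comes first, these labels also sort chronologically when used as a grouping variable.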

(I do this with ISOweek since I am using Windows.)

Then I use a combination of tapply and lapply. The following function counts the positive answers in the survey (coded as 1) for every calendar week (d$index = t[[23]]) and every combination of the two categorical columns t[[21]] (land) and t[[22]] (nace). In the same step the result is converted into a data frame:

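groupweeksums <- function(x, t) {
  # x: one question column; t: data frame with land (column 21),
  # nace (column 22) and the week index (column 23)
  as.data.frame(tapply((x == 1) * 1,
                       list(t[[23]], t[[21]], t[[22]]),
                       function(d) sum(d, na.rm = TRUE)))
}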

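  • x stands for a specific column,
  • t stands for the data frame (I did not know how else to do it, because at one point I have to address a column of another data frame and I wanted to avoid a lot of typing);
  • if d is the data frame then: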

    df <- groupweeksums(d[[1]], d)  # d[[1]] is the first question column
    

    So that I don't have to repeat this procedure for all 20 of my questions, I use lapply:

    df <- as.data.frame(lapply(d[, 1:20], function(x) groupweeksums(x, d)))
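The tapply/lapply combination can be tried out on a small toy data set. The sketch below is an illustration under assumptions: the data, the column names q1 and q2, and the helper count_ones are hypothetical, and the columns are addressed by name rather than by position:

```r
# Toy survey data (hypothetical): two questions plus land, nace and a week label.
d <- data.frame(
  q1    = c(1, 1, 2, 1, NA, 1),
  q2    = c(3, 1, 1, 2, 1, 1),
  land  = c("5", "5", "9", "5", "9", "9"),
  nace  = c("tra", "tra", "man", "tra", "man", "man"),
  index = c("2011-W07", "2011-W07", "2011-W07",
            "2011-W08", "2011-W08", "2011-W08"),
  stringsAsFactors = FALSE
)

# Count answers equal to 1 per week and per land/nace combination.
# tapply returns a 3-d array; as.data.frame flattens it to one column
# per land/nace pair, with the weeks as rows.
count_ones <- function(x, t) {
  as.data.frame(tapply((x == 1) * 1,
                       list(t$index, t$land, t$nace),
                       function(v) sum(v, na.rm = TRUE)))
}

# Apply the same aggregation to every question column at once.
weekly <- as.data.frame(lapply(d[, c("q1", "q2")],
                               function(x) count_ones(x, d)))
print(weekly)
```

Land/nace combinations that never occur in a week show up as NA rather than 0, which is worth keeping in mind before summing across columns.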
    

    This gives me a beautiful data frame with everything I need for further analysis. Thanks for your help; your comments brought me closer and closer to the solution!
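If a real time index is needed again at the end, one option (not part of the answer above) is to label each week by its start date instead of a week number. The sketch below uses hypothetical dates and base R's cut() on Date values, which starts weeks on Monday by default:

```r
# Hypothetical observation dates and answers (illustration only).
days   <- as.Date(c("2011-02-14", "2011-02-15", "2011-02-21",
                    "2011-02-27", "2010-12-29"))
answer <- c(1, 1, 2, 1, 1)

# Map every date to the Monday that starts its week.
week_start <- as.Date(as.character(cut(days, breaks = "week")))

# Count answers coded as 1 per week; labels are ISO dates, so they sort in time order.
weekly_ones <- tapply(answer == 1, format(week_start), sum)

# Date vector that can serve as the index of a time series object.
weekly_index <- as.Date(names(weekly_ones))
print(weekly_ones)
```

A Date vector like weekly_index could then be passed as the order.by argument of zoo::zoo() or xts::xts(), assuming those packages are installed.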

    P.S. I will also post this answer to the other question I posted on Stack Overflow, which is related to this one.
