Calculate the average based on other columns

Problem description


I want to calculate the following:

"The average of the closing prices for the 5, 10, and 30 consecutive trading days immediately preceding and including the Announcement Day, excluding trading halt days (days on which trading volume is 0 or NA)."

For example, suppose 2014/5/7 is the Announcement Day.

Then the average price over 5 consecutive trading days is:

average of (prices on 2014/5/7, 2014/5/5, 2014/5/2, 2014/4/30, 2014/4/29).

The prices on 2014/5/6 and 2014/5/1 are excluded because the trading volume was 0 or NA on those days.

EDIT on 11/9/2014

One thing to note: the announcement day is different for each stock, and it is not the last valid date in the data, so using tail directly on the full series when calculating the average is not appropriate.

Date        Price   Volume
2014/5/9    1.42    668000
2014/5/8    1.4     2972000
2014/5/7    1.5     1180000
2014/5/6    1.59    0
2014/5/5    1.59    752000
2014/5/2    1.6     138000
2014/5/1    1.6     NA
2014/4/30   1.6     656000
2014/4/29   1.61    364000
2014/4/28   1.61    1786000
2014/4/25   1.64    1734000
2014/4/24   1.68    1130000
2014/4/23   1.68    506000
2014/4/22   1.67    354000
2014/4/21   1.7     0
2014/4/18   1.7     0
2014/4/17   1.7     1954000
2014/4/16   1.65    1788000
2014/4/15   1.71    1294000
2014/4/14   1.68    1462000
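The rule described above can be sketched in base R for a single stock. This is only an illustration under stated assumptions: the data frame `px` re-enters the sample table by hand, and the helper name `avg_before` is hypothetical, not part of quantmod or data.table. It first cuts the series at the announcement day, then drops halt days, and only then takes the last n closes.

```r
# Sample data from the table above, entered oldest first.
px <- data.frame(
  Date = as.Date(c(
    "2014-04-14", "2014-04-15", "2014-04-16", "2014-04-17", "2014-04-18",
    "2014-04-21", "2014-04-22", "2014-04-23", "2014-04-24", "2014-04-25",
    "2014-04-28", "2014-04-29", "2014-04-30", "2014-05-01", "2014-05-02",
    "2014-05-05", "2014-05-06", "2014-05-07", "2014-05-08", "2014-05-09")),
  Price = c(1.68, 1.71, 1.65, 1.70, 1.70, 1.70, 1.67, 1.68, 1.68, 1.64,
            1.61, 1.61, 1.60, 1.60, 1.60, 1.59, 1.59, 1.50, 1.40, 1.42),
  Volume = c(1462000, 1294000, 1788000, 1954000, 0, 0, 354000, 506000,
             1130000, 1734000, 1786000, 364000, 656000, NA, 138000,
             752000, 0, 1180000, 2972000, 668000)
)

# Hypothetical helper: mean of the last n valid closes up to and including `ann`.
avg_before <- function(d, ann, n) {
  ok <- d$Date <= ann & !is.na(d$Volume) & d$Volume > 0  # drop halt days
  valid <- d$Price[ok][order(d$Date[ok])]                # oldest to newest
  if (length(valid) < n) return(NA_real_)                # not enough history
  mean(tail(valid, n))
}

ann <- as.Date("2014-05-07")
avg5  <- avg_before(px, ann, 5)   # 1.58
avg10 <- avg_before(px, ann, 10)  # 1.618
avg30 <- avg_before(px, ann, 30)  # NA: the sample has only 14 valid days
```

Returning NA when fewer than n valid days exist is a design choice made for this sketch; the original question does not say how short histories should be handled.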

Reproducible Code:

require(quantmod)
require(data.table)

tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")
date_end <- as.Date("2014-09-09")


# retrieve data of all stocks
prices <- getSymbols(tickers, from = date_begin, to = date_end, auto.assign = TRUE)

dataset <- merge(Cl(get(prices[1])),Vo(get(prices[1])))


for (i in 2:length(prices)){
  dataset <- merge(dataset, Cl(get(prices[i])),Vo(get(prices[i])))
}

# Write First
write.zoo(dataset, file = "prices.csv", sep = ",", qmethod = "double")

# Read zoo
test <- fread("prices.csv")

setnames(test, "Index", "Date")

Then I get a data.table. The first column is Date, followed by the price and volume columns for each stock.

The original data actually contains information for about 40 stocks. The column names follow the same pattern: "X" + ticker.close, "X" + ticker.volume

The last trading day differs from stock to stock.

The desired output :

days    0007.HK 1036.HK
5       1.1     1.1
10      1.1     1.1
30      1.1     1.1

The major issues:

  1. .SD, lapply and .SDcols can be used to loop over the different stocks. .N can be used when calculating the last N consecutive days.

  2. Due to the different announcement day, it becomes a little complicated.
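One possible way to handle both issues with data.table is to reshape the data to long format (one row per ticker and date) and group by ticker instead of looping over columns with .SD. The sketch below is not taken from the answers; the tickers, dates, prices, and the lookbacks of 2 and 3 days are all made-up values chosen to keep the example small.

```r
library(data.table)

# Hypothetical long-format data: one row per ticker and date.
dt <- data.table(
  Ticker = rep(c("A", "B"), each = 6),
  Date   = rep(as.Date("2014-05-01") + 0:5, times = 2),
  Close  = c(1.0, 1.1, 1.2, 1.3, 1.4, 1.5,
             2.0, 2.1, 2.2, 2.3, 2.4, 2.5),
  Volume = c(100, 0, 100, NA, 100, 100,
             100, 100, 100, 100, 0, 100)
)

# Hypothetical announcement day per ticker.
ann <- data.table(
  Ticker  = c("A", "B"),
  AnnDate = as.Date(c("2014-05-05", "2014-05-06"))
)

lookback <- c(2, 3)

# Attach each ticker's announcement day, then keep only valid trading days
# on or before it.
valid <- merge(dt, ann, by = "Ticker")
valid <- valid[Date <= AnnDate & !is.na(Volume) & Volume > 0]
setorder(valid, Ticker, Date)

# One mean per lookback and ticker, then one column per ticker.
res <- valid[, .(days = lookback,
                 avg  = vapply(lookback, function(n) mean(tail(Close, n)),
                               numeric(1))),
             by = Ticker]
wide <- dcast(res, days ~ Ticker, value.var = "avg")
```

Here `wide` has one row per lookback and one column per ticker, which matches the layout of the desired output.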

Any suggestions for a single stock using quantmod, or for multiple stocks using data.table, are very welcome!

Thanks to GSee and pbible for the nice solutions; they were very useful. I will update my code later to incorporate a different announcement day for each stock, and then ask for your advice again.

Indeed, this is more of an xts question than a data.table one. Anything about data.table would still be very helpful. Thanks a lot!

Because different stocks have different announcement days, I first tried to write a solution following @pbible's logic. Any suggestions are very welcome.

library(quantmod)
tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")

# Instead of one common date_end, a separate date_end is used per stock for convenience in the following work.

date_end <- c(as.Date("2014-07-08"),as.Date("2014-05-15"))

for ( i in 1: length(date_end)) {

  stocks <- getSymbols(tickers[i], from = date_begin, to = date_end[i], auto.assign = TRUE)
  dataset <- cbind(Cl(get(stocks)),Vo(get(stocks)))
  usable <- subset(dataset,dataset[,2] > 0 & !is.na(dataset[,2]))
  sma.5 <- SMA(usable[,1],5)
  sma.10 <- SMA(usable[,1],10)
  sma.30 <- SMA(usable[,1],30)
  col <- as.matrix(rbind(tail(sma.5,1), tail(sma.10,1), tail(sma.30,1)))
  colnames(col) <- colnames(usable[,1])
  rownames(col) <- c("5","10","30")

  if (i == 1) {
    matrix <- as.matrix(col)
  }
  else  {matrix <- cbind(matrix,col)}
}

I got what I want, but the code is ugly. Any suggestions to make it more elegant are very welcome!
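A side note on the loop above: only the last value of each moving average is used, and the last value of an n-period simple moving average is just the mean of the last n observations. The base R sketch below illustrates this with the valid closes from the sample table; `stats::filter` stands in for `SMA` here so that the example needs no extra packages.

```r
# Valid closes up to 2014-05-07 from the sample table (oldest first).
x <- c(1.68, 1.71, 1.65, 1.70, 1.67, 1.68, 1.68,
       1.64, 1.61, 1.61, 1.60, 1.60, 1.59, 1.50)
n <- 5

# Trailing moving average over the whole series (NA for the first n - 1 points).
sma <- stats::filter(x, rep(1 / n, n), sides = 1)

last_sma <- as.numeric(sma)[length(x)]  # last value of the rolling series
direct   <- mean(tail(x, n))            # same quantity without the rolling pass
```

Both `last_sma` and `direct` come out at 1.58, so computing the full rolling series is not required for this task.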

Solution

Well, here's a way to do it. I don't know why you want to get rid of the loop, and this does not get rid of it (in fact, it has one loop nested inside another). One thing you were doing is growing an object in memory on each iteration of your loop (the matrix <- cbind(matrix,col) part), which is inefficient. This answer avoids that.

library(quantmod)
tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")

myEnv <- new.env()
date_end <- c(as.Date("2014-07-08"),as.Date("2014-05-15"))
lookback <- c(5, 10, 30) # different number of days to look back for calculating mean.

symbols <- getSymbols(tickers, from=date_begin, 
                      to=tail(sort(date_end), 1), env=myEnv) # to=last date
end.dates <- setNames(date_end, symbols)

out <- do.call(cbind, lapply(end.dates, function(x) {
  dat <- na.omit(get(names(x), pos=myEnv))[paste0("/", x)]
  prc <- Cl(dat)[Vo(dat) > 0]
  setNames(vapply(lookback, function(n) mean(tail(prc, n)), numeric(1)), 
           lookback)
}))

colnames(out) <- names(end.dates)
out

#   0007.HK 1036.HK
#5    1.080   8.344
#10   1.125   8.459
#30   1.186   8.805

Some commentary...

  • I created a new environment, myEnv, to hold your data so that it does not clutter your workspace.
  • I used the output of getSymbols (as you did in your attempt) because the input tickers are not uppercase.
  • I named the vector of end dates so that we can loop over that vector and know both the end date and the name of the stock.
  • The bulk of the code is an lapply loop (wrapped in do.call(cbind, ...)). I'm looping over the named end.dates vector.

    1. The first line gets the data from myEnv, removes NAs, and subsets it to only include data up to the relevant end date.
    2. The next line extracts the close column and subsets it to only include rows where volume is greater than zero.
    3. The vapply loops over a vector of different lookbacks and calculates the mean. That is wrapped in setNames so that each result is named based on which lookback was used to calculate it.

  • The lapply call returns a list of named vectors. do.call(cbind, LIST) is the same as calling cbind(LIST[[1]], LIST[[2]], LIST[[3]]) except LIST can be a list of any length.
  • At this point we have a matrix with row names, but no column names. So, I named the columns based on which stock they represent.
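The pattern described in these notes (a named vector of end dates, vapply over the lookbacks, setNames, then do.call(cbind, ...)) can be tried without downloading anything. The following base R sketch uses made-up data frames in place of the xts objects returned by getSymbols; the tickers AAA and BBB, the prices, and the lookbacks of 2 and 3 days are all hypothetical. It looks up each end date by ticker name rather than calling names(x) inside lapply, on the assumption that element names may not survive lapply over a Date vector in every R version.

```r
dates <- as.Date("2014-05-01") + 0:5

# Hypothetical per-stock data, standing in for the downloaded series.
data <- list(
  AAA = data.frame(Date = dates, Close = 1:6,
                   Volume = c(10, 0, 10, 10, 10, 10)),
  BBB = data.frame(Date = dates, Close = seq(10, 60, by = 10),
                   Volume = c(10, 10, 10, NA, 10, 10))
)

# Named vector of end dates: the name identifies the stock.
end.dates <- setNames(as.Date(c("2014-05-05", "2014-05-06")), c("AAA", "BBB"))
lookback <- c(2, 3)

# One named vector of means per stock, collected in a list.
cols <- lapply(names(end.dates), function(sym) {
  d <- data[[sym]]
  end <- end.dates[sym]  # single-bracket indexing keeps the Date class
  keep <- d$Date <= end & !is.na(d$Volume) & d$Volume > 0
  prc <- d$Close[keep]
  setNames(vapply(lookback, function(n) mean(tail(prc, n)), numeric(1)),
           lookback)
})

# Bind all columns at once instead of growing a matrix inside a loop.
out <- do.call(cbind, cols)
colnames(out) <- names(end.dates)
```

With this data, `out` is a 2 x 2 matrix whose rows are named "2" and "3" and whose columns are named after the two stocks.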

Hope this helps.
