Calculate the average based on other columns
Problem description
I want to calculate the "average of the closing prices for the 5, 10 and 30 consecutive trading days immediately preceding and including the Announcement Day, but excluding trading halt days (days on which trading volume is 0 or NA)".
For example, suppose 2014/5/7 is the Announcement Day.
Then the 5-consecutive-day average price is:
average of (price on 2014/5/7, 2014/5/5, 2014/5/2, 2014/4/30, 2014/4/29),
because the prices on 2014/5/6 and 2014/5/1 are excluded: trading volume on those days was 0 and NA, respectively.
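As a quick arithmetic check of the worked example, the five valid closing prices can be averaged directly. The prices below are read off the sample data shown further down (2014/5/7, 5/5, 5/2, 4/30, 4/29).

```r
# Closing prices on the five valid trading days up to and including 2014/5/7
# (2014/5/6 has volume 0 and 2014/5/1 has volume NA, so both are skipped)
valid_5 <- c(1.50, 1.59, 1.60, 1.60, 1.61)
mean(valid_5)  # (1.50 + 1.59 + 1.60 + 1.60 + 1.61) / 5 = 7.90 / 5 = 1.58
```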
EDIT on 11/9/2014
One thing to note: the announcement day differs from stock to stock, and it is not the last valid date in the data, so using tail when calculating the average is not appropriate.
Date        Price   Volume
2014/5/9    1.42    668000
2014/5/8    1.4     2972000
2014/5/7    1.5     1180000
2014/5/6    1.59    0
2014/5/5    1.59    752000
2014/5/2    1.6     138000
2014/5/1    1.6     NA
2014/4/30   1.6     656000
2014/4/29   1.61    364000
2014/4/28   1.61    1786000
2014/4/25   1.64    1734000
2014/4/24   1.68    1130000
2014/4/23   1.68    506000
2014/4/22   1.67    354000
2014/4/21   1.7     0
2014/4/18   1.7     0
2014/4/17   1.7     1954000
2014/4/16   1.65    1788000
2014/4/15   1.71    1294000
2014/4/14   1.68    1462000
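For a single stock, the rule can be sketched in base R without any download, assuming the sample table above is the whole series. The helper name `avg_before` is hypothetical, and returning NA when fewer than n valid days exist is an assumed policy (the question does not say what should happen in that case). The key point is that the cutoff is the announcement day, not the last row, so the filtering happens before any "last n" selection.

```r
# Sample data from the question; NA marks a missing volume.
px <- data.frame(
  Date = as.Date(c("2014-05-09", "2014-05-08", "2014-05-07", "2014-05-06",
                   "2014-05-05", "2014-05-02", "2014-05-01", "2014-04-30",
                   "2014-04-29", "2014-04-28", "2014-04-25", "2014-04-24",
                   "2014-04-23", "2014-04-22", "2014-04-21", "2014-04-18",
                   "2014-04-17", "2014-04-16", "2014-04-15", "2014-04-14")),
  Price = c(1.42, 1.4, 1.5, 1.59, 1.59, 1.6, 1.6, 1.6, 1.61, 1.61,
            1.64, 1.68, 1.68, 1.67, 1.7, 1.7, 1.7, 1.65, 1.71, 1.68),
  Volume = c(668000, 2972000, 1180000, 0, 752000, 138000, NA, 656000,
             364000, 1786000, 1734000, 1130000, 506000, 354000, 0, 0,
             1954000, 1788000, 1294000, 1462000)
)

# Hypothetical helper: mean closing price over the last n valid trading days
# up to and including ann_day. A day is valid when its volume is neither 0 nor NA.
# Returns NA when fewer than n valid days exist (an assumed policy).
avg_before <- function(df, ann_day, n) {
  ok <- df$Date <= ann_day & !is.na(df$Volume) & df$Volume > 0
  valid <- df[ok, ]
  valid <- valid[order(valid$Date, decreasing = TRUE), ]  # newest first
  if (nrow(valid) < n) return(NA_real_)
  mean(valid$Price[seq_len(n)])
}

ann_day <- as.Date("2014-05-07")
res <- sapply(c(5, 10, 30), function(n) avg_before(px, ann_day, n))
res  # 5-day: 1.58, 10-day: 1.618, 30-day: NA (only 14 valid days up to 2014/5/7)
```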
Reproducible Code:
require(quantmod)
require(data.table)
tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")
date_end <- as.Date("2014-09-09")
# retrieve data for all stocks
prices <- getSymbols(tickers, from = date_begin, to = date_end, auto.assign = TRUE)
dataset <- merge(Cl(get(prices[1])),Vo(get(prices[1])))
for (i in 2:length(prices)) {
  dataset <- merge(dataset, Cl(get(prices[i])), Vo(get(prices[i])))
}
# Write to CSV first
write.zoo(dataset, file = "prices.csv", sep = ",", qmethod = "double")
# Read it back as a data.table
test <- fread("prices.csv")
setnames(test, "Index", "Date")
This gives a data.table whose first column is Date, followed by the price and volume for each stock.
In fact, the original data contains information for about 40 stocks. The column names all follow the same pattern: "X" + ticker + ".Close" and "X" + ticker + ".Volume".
The last trading day differs from stock to stock.
The desired output:
days  0007.HK  1036.HK
5     1.1      1.1
10    1.1      1.1
30    1.1      1.1
The major issues:
- .SD, lapply and .SDcols can be used to loop over the different stocks, and .N can be used to take the last N consecutive days.
- Because each stock has a different announcement day, it becomes a little complicated.
Any suggestions, whether for a single stock using quantmod or for multiple stocks using data.table, are very welcome!
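One way to handle a different announcement day per stock in data.table is sketched below. It is a different technique from the `.SD`/`.SDcols` column loop mentioned above: the wide table is reshaped to one row per stock and day, the announcement days become a join, and `.N` is used per stock group. Assumptions, all hypothetical: the columns are named "X" + ticker + ".Close" / ".Volume"; the question's sample series is reused for both tickers purely for illustration; and the announcement days (2014-05-07 and 2014-04-25) are made up. Returning NA when fewer than n valid days exist is also an assumed policy.

```r
library(data.table)

# Sample series from the question (newest first).
dates  <- as.Date(c("2014-05-09", "2014-05-08", "2014-05-07", "2014-05-06",
                    "2014-05-05", "2014-05-02", "2014-05-01", "2014-04-30",
                    "2014-04-29", "2014-04-28", "2014-04-25", "2014-04-24",
                    "2014-04-23", "2014-04-22", "2014-04-21", "2014-04-18",
                    "2014-04-17", "2014-04-16", "2014-04-15", "2014-04-14"))
close  <- c(1.42, 1.4, 1.5, 1.59, 1.59, 1.6, 1.6, 1.6, 1.61, 1.61,
            1.64, 1.68, 1.68, 1.67, 1.7, 1.7, 1.7, 1.65, 1.71, 1.68)
volume <- c(668000, 2972000, 1180000, 0, 752000, 138000, NA, 656000,
            364000, 1786000, 1734000, 1130000, 506000, 354000, 0, 0,
            1954000, 1788000, 1294000, 1462000)

# Hypothetical wide table in the described layout (same series for both tickers).
test <- data.table(Date = dates,
                   X0007.HK.Close = close, X0007.HK.Volume = volume,
                   X1036.HK.Close = close, X1036.HK.Volume = volume)

# Hypothetical announcement day per ticker, and the lookback windows.
ann <- data.table(ticker  = c("0007.HK", "1036.HK"),
                  ann_day = as.Date(c("2014-05-07", "2014-04-25")))
lookback <- c(5, 10, 30)

# Recover the tickers from the ".Close" column names, then reshape to long form.
tickers <- sub("^X(.*)\\.Close$", "\\1", grep("\\.Close$", names(test), value = TRUE))
long <- rbindlist(lapply(tickers, function(tk) data.table(
  ticker = tk, Date = test$Date,
  Close  = test[[paste0("X", tk, ".Close")]],
  Volume = test[[paste0("X", tk, ".Volume")]])))

# Attach each stock's announcement day, keep only valid days (volume > 0, not NA)
# on or before it, and sort newest first within each stock.
valid <- long[ann, on = "ticker"][Date <= ann_day & !is.na(Volume) & Volume > 0]
setorder(valid, ticker, -Date)

# One row per (ticker, lookback); .N is the number of valid days for that stock.
res <- valid[, .(days = lookback,
                 avg  = vapply(lookback, function(n)
                   if (.N >= n) mean(Close[seq_len(n)]) else NA_real_, numeric(1))),
             by = ticker]
out <- dcast(res, days ~ ticker, value.var = "avg")
out  # 0007.HK: 1.58, 1.618, NA; 1036.HK: 1.674, NA, NA
```

Keeping the announcement days in their own table means adding a stock only requires adding one row there, rather than touching the averaging code.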
Thanks to GSee and pbible for the nice solutions; they were very useful. I'll update my code later to incorporate a different announcement day for each stock, and will follow up with you then.
Indeed, this is more of an xts question than a data.table one, but anything about data.table would be very helpful. Thanks a lot!
Because different stocks have different announcement days, I first tried a solution following @pbible's logic. Any suggestions are very welcome.
library(quantmod)
tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")
# Use a separate date_end (the announcement day) for each stock instead of one shared end date.
date_end <- c(as.Date("2014-07-08"),as.Date("2014-05-15"))
for (i in 1:length(date_end)) {
  stocks <- getSymbols(tickers[i], from = date_begin, to = date_end[i], auto.assign = TRUE)
  dataset <- cbind(Cl(get(stocks)), Vo(get(stocks)))
  usable <- subset(dataset, dataset[,2] > 0 & !is.na(dataset[,2]))
  sma.5 <- SMA(usable[,1], 5)
  sma.10 <- SMA(usable[,1], 10)
  sma.30 <- SMA(usable[,1], 30)
  col <- as.matrix(rbind(tail(sma.5,1), tail(sma.10,1), tail(sma.30,1)))
  colnames(col) <- colnames(usable[,1])
  rownames(col) <- c("5","10","30")
  if (i == 1) {
    matrix <- as.matrix(col)
  } else {
    matrix <- cbind(matrix, col)
  }
}
I got what I wanted, but the code is ugly. Any suggestions to make it more elegant are very welcome!
Solution
Well, here's one way to do it. I don't know why you want to get rid of the loop, and this does not get rid of it (in fact, it has a loop nested inside another). One thing you were doing is growing an object in memory on each iteration of your loop (i.e. the matrix <- cbind(matrix,col) part is inefficient). This answer avoids that.
library(quantmod)
tickers <- c("0007.hk","1036.hk")
date_begin <- as.Date("2010-01-01")
myEnv <- new.env()
date_end <- c(as.Date("2014-07-08"),as.Date("2014-05-15"))
lookback <- c(5, 10, 30) # different number of days to look back for calculating mean.
symbols <- getSymbols(tickers, from=date_begin,
to=tail(sort(date_end), 1), env=myEnv) # to=last date
end.dates <- setNames(date_end, symbols)
out <- do.call(cbind, lapply(end.dates, function(x) {
  dat <- na.omit(get(names(x), pos=myEnv))[paste0("/", x)]
  prc <- Cl(dat)[Vo(dat) > 0]
  setNames(vapply(lookback, function(n) mean(tail(prc, n)), numeric(1)),
           lookback)
}))
colnames(out) <- names(end.dates)
out
#    0007.HK 1036.HK
# 5    1.080   8.344
# 10   1.125   8.459
# 30   1.186   8.805
Some commentary...
- I created a new environment, myEnv, to hold your data so that it does not clutter your workspace.
- I used the output of getSymbols (as you did in your attempt) because the input tickers are not uppercase.
- I named the vector of end dates so that we can loop over that vector and know both the end date and the name of the stock.
- The bulk of the code is an lapply loop (wrapped in do.call(cbind, ...)). I'm looping over the named end.dates vector.
  - The first line gets the data from myEnv, removes NAs, and subsets it to only include data up to the relevant end date.
  - The next line extracts the close column and subsets it to only include rows where volume is greater than zero.
  - The vapply loops over a vector of different lookbacks and calculates the mean. That is wrapped in setNames so that each result is named based on which lookback was used to calculate it.
- The lapply call returns a list of named vectors. do.call(cbind, LIST) is the same as calling cbind(LIST[[1]], LIST[[2]], LIST[[3]]), except that LIST can be a list of any length.
- At this point we have a matrix with row names but no column names, so I named the columns based on which stock they represent.
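The lapply/vapply/do.call(cbind, ...) pattern can be replayed offline as a minimal sketch. Plain numeric vectors stand in for the xts close series after the volume filter, the list names A and B are hypothetical, and the two series are the valid sample closes (oldest first) cut at 2014/5/7 and 2014/4/25. It also exposes one caveat of mean(tail(prc, n)): when fewer than n valid days exist, tail() returns all of them, so the 30-day figure here silently averages 14 (A) or 8 (B) days rather than 30.

```r
lookback <- c(5, 10, 30)

# Valid closes (volume > 0, not NA) from the sample, oldest first, up to 2014/5/7.
prc_a <- c(1.68, 1.71, 1.65, 1.70, 1.67, 1.68, 1.68, 1.64,
           1.61, 1.61, 1.60, 1.60, 1.59, 1.50)
# Hypothetical second "stock": the same series cut at 2014/4/25 (first 8 values).
prc_list <- list(A = prc_a, B = prc_a[1:8])

# lapply returns a named list of named vectors; do.call(cbind, ...) binds them
# column-wise, taking column names from the list and row names from the vectors.
out <- do.call(cbind, lapply(prc_list, function(prc)
  setNames(vapply(lookback, function(n) mean(tail(prc, n)), numeric(1)), lookback)))
out
```

If fewer than n valid days should count as "not available", a length check such as if (length(prc) < n) NA_real_ inside the vapply is one option.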
Hope this helps.