Combining time-series objects and lists: Package "termstrc"


Question

The R package "termstrc", designed for term-structure estimation, is an incredibly useful tool, but it requires data to be set in a particularly awkward format: lists within lists.

Question: What is the best way to prepare and shape data, either outside R or inside R, in order to create the repeated sublist format required to run the function "dyncouponbonds"?

The "dyncouponbonds" command requires data to be set in a repeated sublist, whereby a list of bonds and time-invariant features of those bonds (let's call this "bondlist"), is appended with some time t features of those bonds (price and accrued interest), and replicated for time t+1 to T.

Below is an example of the list format for one period. The "dyncouponbonds" command requires this format to be replicated, within an umbrella list, for all T periods. ISIN, MATURITYDATE, ISSUEDATE, COUPONRATE will be identical for each period. PRICE, ACCRUED, CASHFLOWS and TODAY will be different for each period.

R> str(govbonds$GERMANY)

List of 8
$ ISIN : chr [1:52] "DE0001141414" "DE0001137131" "DE0001141422" ...
$ MATURITYDATE:Class 'Date' num [1:52] 13924 13952 13980 14043 ...
$ ISSUEDATE :Class 'Date' num [1:52] 11913 13215 12153 13298 ...
$ COUPONRATE : num [1:52] 0.0425 0.03 0.03 0.0325 ...
$ PRICE : num [1:52] 100 99.9 99.8 99.8 ...
$ ACCRUED : num [1:52] 4.09 2.66 2.43 2.07 ...
$ CASHFLOWS :List of 3
..$ ISIN: chr [1:384] "DE0001141414" "DE0001137131" "DE0001141422" ...
..$ CF : num [1:384] 104 103 103 103 ...
..$ DATE:Class 'Date' num [1:384] 13924 13952 13980 14043 ...
$ TODAY :Class 'Date' num 13908

Recommended answer

This is a fairly advanced data manipulation question. R has many powerful data manipulation tools and you're not going to need to move away from R to prepare the (admittedly fairly obtuse) dyncouponbonds object. Indeed you actually shouldn't, because taking a structure from another language and then turning it into dyncouponbonds will simply be more work.

The first thing I would make sure is that you are very familiar with the lapply function. You're going to be making plenty of use of it. You're going to be using it to create a list of couponbonds objects, which is what dyncouponbonds actually is. Creating couponbonds objects however is a little tougher, mainly because of the CASHFLOWS sublist which wants each cashflow associated with the bond's ISIN and with the date of the cashflow. For this you'll use lapply and some fairly advanced subscripting. The subset function will also come in handy.
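As a warm-up for the two tools mentioned above, here is a minimal sketch of subset and lapply on toy data. The data frame and its column names (isin, coupon, maturity) are made up for illustration and are not part of termstrc or Bloomberg.

```r
# Toy static data; the column names are invented for this illustration.
bonds <- data.frame(
  isin     = c("AAA", "BBB", "CCC"),
  coupon   = c(4.25, 0, 3),
  maturity = as.Date(c("2020-01-15", "2019-06-30", "2025-03-01")),
  stringsAsFactors = FALSE
)

# subset() filters rows with an expression evaluated inside the data frame.
couponPaying <- subset(bonds, coupon != 0)

# lapply() returns one list element per input, here one small list per bond.
perBond <- lapply(seq_len(nrow(couponPaying)), function(i) {
  list(ISIN = couponPaying$isin[i],
       COUPONRATE = couponPaying$coupon[i] / 100)
})
names(perBond) <- couponPaying$isin

str(perBond)
```

The same pattern, a filter followed by a function applied per bond or per date, is what the larger functions below are built from.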

This question also very much depends on where you will be getting the data from, and getting it out of Bloomberg is non-trivial, mainly because you will need to go back in history using the BDS function and the "DES_CASH_FLOW" field for each bond to get its cashflows (the code below uses the adjusted variant, "DES_CASH_FLOW_ADJ"). I say history, because if you're using dyncouponbonds I'm assuming you will want to do historic yield curve analysis. You'll need to override the BDS function's "SETTLE_DT" field with the value that you received for the bond using the BDP function and the field "FIRST_SETTLE_DT", so that you get all the cashflows from the beginning of the bond's life (otherwise it'll only return cashflows from today onwards, and that's no good for historic analysis). But I digress. If you're not using Bloomberg I don't know where you'll get this data from.

You'll then need to get the static data for each bond, namely the maturity, the ISIN, the coupon rate and the issue date. And you'll need historic price and accrued interest data. Again, if using Bloomberg, you'll use the BDP function for this, with the fields you'll see in the code below, and the historic data function BDH, which I have wrapped as bbdh. Assuming again that you're a Bloomberg user, here is the code:

bbGetCountry <- function(cCode, up = FALSE) {
# this function is going to get all the data out of bloomberg that we need for a
# country, and update it if necessary
    if (up == TRUE) startDate <- as.Date("2012-01-01") else startDate <- histStartDate 
    # first get all the curve members for history
    wdays <- wdaylist(startDate, Sys.Date()) # create the list of working days from startdate
    actives <- lapply(wdays, function(x) { 
        bds(conn, BBcurveIDs[cCode], "CURVE_MEMBERS", override_fields = "CURVE_DATE",
        override_values = format(x, "%Y%m%d"))
    })
    names(actives) <- wdays
    uniqueActives <- unique(unlist(actives)) # there will be puhlenty duplicates. Get rid of them
    # now get the unchanging bond data
    staticData <- bdp(conn, uniqueActives, bbStaticDataFields)
    # now get the cash flowdata
    cfData <- lapply(uniqueActives, function(x) {
        bds(conn, x, "DES_CASH_FLOW_ADJ", override_fields = "SETTLE_DT", 
            override_values = format(as.Date(staticData[x, "FIRST_SETTLE_DT"]), "%Y%m%d"))
    })
    names(cfData) <- uniqueActives
    # now for historic data
    historicData <- lapply(bbHistoricDataFields, function(x) bbdh(uniqueActives, flds = x, startDate = startDate))
    names(historicData) <- bbHistoricDataFields   # put the names in otherwise we get a numbered list
    allDates <- as.Date(index(historicData$LAST_PRICE)) # all the dates we will find settlement dates for for all bonds. No posix
    save(actives, file = paste("data/", cCode, "actives.dat", sep = ""))      #save all the files now
    save(staticData, file = paste("data/", cCode, "staticData.dat", sep = ""))
    save(cfData, file = paste("data/", cCode, "cfData.dat", sep = ""))
    save(historicData, file = paste("data/", cCode, "historicData.dat", sep = ""))
    #save(settleDates, file = paste("data/", cCode, "settleDates.dat", sep = ""))
    assign(paste(cCode, "data", sep = ""), list(actives = actives, staticData = staticData, cfData = cfData,    #
        historicData = historicData), pos = 1)

}

The bbdh function I use above is a wrapper around the Rbbg library's bdh function and looks like this:

bbdh <- function(secs, years = 1, flds = "last_price", startDate = NULL) {
    # this function gets secs over years from bloomberg daily data
    # needs Rbbg (bdh and an open connection conn), dcast (e.g. from reshape2) and xts
    if (is.null(startDate)) startDate <- Sys.Date() - years * 365.25
    if (inherits(startDate, "Date")) startDate <- format(startDate, "%Y%m%d") # convert date classes to bb string
    if (nchar(startDate) > 8) startDate <- format(as.Date(startDate), "%Y%m%d") # if we've been passed wrong format character string
    rawd <- bdh(conn, secs, flds, startDate, always.display.tickers = TRUE, include.non.trading.days = TRUE,
        option_names = c("nonTradingDayFillOption", "nonTradingDayFillMethod"),
        option_values = c("NON_TRADING_WEEKDAYS", "PREVIOUS_VALUE"))
    rawd <- dcast(rawd, date ~ ticker) # put into columns, one per ticker
    colnames(rawd) <- sub(" .*", "", colnames(rawd)) # remove the govt, currncy bits from bb tickers
    return(xts(rawd[, -1], order.by = as.POSIXct(rawd[, 1])))
}

The country code comes from a structure which associates two-letter names with Bloomberg yield curve descriptions:

BBcurveIDs  <- list(PO = "YCGT0084 Index", #Portugal
                    DE = "YCGT0016 Index", 
                    FR = "YCGT0014 Index", 
                    SP = "YCGT0061 Index",
                    IT = "YCGT0040 Index",
                    AU = "YCGT0001 Index", #Australia
                    AS = "YCGT0063 Index", #Austria
                    JP = "YCGT0018 Index",
                    GB = "YCGT0022 Index",
                    HK = "YCGT0095 Index",
                    CA = "YCGT0007 Index",
                    CH = "YCGT0082 Index",
                    NO = "YCGT0078 Index",
                    SE = "YCGT0021 Index",
                    IR = "YCGT0062 Index",
                    BE = "YCGT0006 Index",
                    NE = "YCGT0020 Index", 
                    ZA = "YCGT0090 Index",
                    PL = "YCGT0177 Index", #Poland
                    MX = "YCGT0251 Index")

So bbGetCountry will create 4 different data structures, called actives, staticData, cfData, and historicData, using the following Bloomberg fields (bbDynamicDataFields is listed for completeness but is not used in the code shown here):

bbStaticDataFields <- c("ID_ISIN",
                      "ISSUER", 
                      "COUPON",
                      "CPN_FREQ",
                      "MATURITY",
                      "CALC_TYP_DES",                    # pricing calculation type 
                      "INFLATION_LINKED_INDICATOR",     # N or Y, in R returned as TRUE or FALSE
                      "ISSUE_DT",
                      "FIRST_SETTLE_DT",
                      "PX_METHOD",                      # PRC or YLD 
                      "PX_DIRTY_CLEAN",                 # market convention dirty or clean
                      "DAYS_TO_SETTLE",
                      "CALLABLE",
                      "MARKET_SECTOR_DES",
                      "INDUSTRY_SECTOR",
                      "INDUSTRY_GROUP",
                      "INDUSTRY_SUBGROUP")

bbDynamicDataFields <- c("IS_STILL_CALLABLE",
                        "RTG_MOODY",
                        "RTG_MOODY_WATCH",
                        "RTG_SP",
                        "RTG_SP_WATCH",
                        "RTG_FITCH",
                        "RTG_FITCH_WATCH")

bbHistoricDataFields <- c("PX_BID",
                          "PX_ASK",
                          #"PX_CLEAN_BID",
                          #"PX_CLEAN_ASK",
                          "PX_DIRTY_BID",
                          "PX_DIRTY_ASK",
                          #"ASSET_SWAP_SPD_BID",
                          #"ASSET_SWAP_SPD_ASK",
                          "LAST_PRICE",
                          #"SETTLE_DT",
                          "YLD_YTM_MID")

Now you're ready to create couponbonds objects, using all these data structures:

createCouponBonds <- function(cCode, dateString) {
    cdata <- get(paste(cCode, "data", sep = "")) # get the data set
    today <- as.Date(dateString)
    settleDate <- today
    daycount <- 0
    while(daycount < 3) {
        settleDate <- settleDate + 1
        if (!(weekdays(settleDate) %in% c("Saturday", "Sunday"))) daycount <- daycount + 1
    }
    goodbonds <- subset(cdata$staticData, COUPON != 0 & INFLATION_LINKED_INDICATOR == FALSE) # clean out zeros and tbills
    goodbonds <- goodbonds[rownames(goodbonds) %in% cdata$actives[[dateString]][, 1], ]
    stripnames <- sapply(strsplit(rownames(goodbonds), " "), function(x) x[1])
    pxbid <- cdata$historicData$PX_BID[today, stripnames]
    pxask <- cdata$historicData$PX_ASK[today, stripnames]
    pxdbid <- cdata$historicData$PX_DIRTY_BID[today, stripnames]
    pxdask <- cdata$historicData$PX_DIRTY_ASK[today, stripnames]
    price <- as.numeric((pxbid + pxask) / 2)
    accrued <- as.numeric(pxdbid - pxbid)
    cashflows <- lapply(rownames(goodbonds), function(x) {
        goodflows <- cdata$cfData[[x]][as.Date(cdata$cfData[[x]][, "Date"]) >= today, ]
        #gfstipnames <- sapply(strsplit(rownames(goodflows), " "), function(x) x[1]) dunno if I need this
        isin <- rep(cdata$staticData[x, "ID_ISIN"], nrow(goodflows))
        cf <- apply(goodflows[, 2:3], 1, sum) / 10000
        dt <- as.Date(goodflows[, 1])
        return(list(isin = isin, cf = cf, dt = dt))
    })
    isinvec <- unlist(lapply(cashflows, function(x) x$isin))
    cfvec <- as.numeric(unlist(lapply(cashflows, function(x) x$cf)))
    datevec <- unlist(lapply(cashflows, function(x) x$dt))
    govbonds <- list(ISIN = goodbonds$ID_ISIN, 
                     MATURITYDATE = as.Date(goodbonds$MATURITY),
                     ISSUEDATE = as.Date(goodbonds$FIRST_SETTLE_DT),
                     COUPONRATE = as.numeric(goodbonds$COUPON) / 100,
                     PRICE = price,
                     ACCRUED = accrued,
                     CASHFLOWS = list(ISIN = isinvec, CF = cfvec, DATE = as.Date(datevec, origin = "1970-01-01")), # unlist() dropped the Date class
                     TODAY = settleDate)
    govbonds <- list(govbonds)
    names(govbonds) <- cCode
    class(govbonds) <- "couponbonds"
    return(govbonds)
}
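
One caveat about the settlement-date loop in createCouponBonds: weekdays() returns names in the session's locale, so the comparison with "Saturday" and "Sunday" only works in an English locale. A possible locale-independent variant is sketched below. addWeekdays is a hypothetical helper name, not part of the code above, and like the original loop it ignores public holidays.

```r
# Hypothetical helper: advance `from` by `n` weekdays, skipping Saturdays and
# Sundays only (public holidays are ignored, as in the loop above).
addWeekdays <- function(from, n = 3) {
  d <- as.Date(from)
  while (n > 0) {
    d <- d + 1
    # "%u" gives the ISO weekday number as a string: "1" = Monday ... "7" = Sunday
    if (as.integer(format(d, "%u")) <= 5) n <- n - 1
  }
  d
}

addWeekdays("2012-01-05")  # start from a Thursday, count three weekdays forward
```

Inside createCouponBonds this would replace the while loop with settleDate <- addWeekdays(today, 3).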

Take a close look at the cashflows <- lapply... function because this is where you'll create the sublist and it is the core of the answer to your question, although of course how this is done depends very much on how you have decided to build the intermediate data structures, and I have given you just one possibility. I realise that my answer is complex, but the problem is very complex. Not all the code you need is in this answer either: a few helper functions are missing, but I am happy to provide them if you contact me. Certainly the skeleton of the core functions is all here, and actually, much of the problem is getting the data in the first place and structuring it appropriately. You correctly surmise that some of the data is static for each bond, some of it is dynamic, and some of it is historical. So the dimensions of the intermediate data structures are different for different pieces of the couponbonds objects. How you represent that is up to you, though I have used separate lists / data frames for each, linked via the bond IDs where necessary.
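
To make the flattening step concrete without Bloomberg, here is a small sketch on made-up cashflows. The names cfByBond and perBond and the column names are assumptions for illustration. It shows the same idea as the cashflows block above: one small list per bond, then three parallel vectors.

```r
# Made-up future cashflows for two bonds, one data frame per bond.
cfByBond <- list(
  AAA = data.frame(date = as.Date(c("2013-01-15", "2014-01-15")),
                   cf   = c(4.25, 104.25)),
  BBB = data.frame(date = as.Date("2013-06-30"),
                   cf   = 103)
)

# One small list per bond, with the ISIN repeated once per cashflow row.
perBond <- lapply(names(cfByBond), function(id) {
  flows <- cfByBond[[id]]
  list(isin = rep(id, nrow(flows)), cf = flows$cf, dt = flows$date)
})

# Flatten into three parallel vectors. do.call(c, ...) keeps the Date class,
# whereas unlist() would strip it to plain numbers.
CASHFLOWS <- list(
  ISIN = unlist(lapply(perBond, function(x) x$isin)),
  CF   = unlist(lapply(perBond, function(x) x$cf)),
  DATE = do.call(c, lapply(perBond, function(x) x$dt))
)

str(CASHFLOWS)
```

The three vectors have one entry per cashflow, not one per bond, which is why CASHFLOWS in the str() output in the question has 384 entries for 52 bonds.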

The function above takes a date string, so you can call it for each of your historic data points using the above-mentioned lapply, and hey presto, dyncouponbonds:

spl <- lapply(dodates, function(x) createCouponBonds("SP", x))
names(spl) <- sapply(spl, function(x) as.character(x$SP$TODAY)) # names must be character, not a list of Dates
class(spl) <- "dyncouponbonds"

There you go. You asked for it...

If you're not using Bloomberg, your input data structures will be very different but, as I said starting out, get super familiar with lapply and sapply. Obviously there are many other ways this problem could be solved, but the above works for Bloomberg. If you understand this code, you'll surely know what you're doing for other data sources.
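
For a non-Bloomberg source, a rough sketch of the same assembly starting from three plain data frames might look like this. Everything here is invented for illustration: the tables static, quotes and flows, their column names, the helper onePeriod and the group name "TOY". It only reproduces the list shape shown in the question and has not been checked against termstrc itself.

```r
# Made-up inputs: static bond data, a long table of daily quotes, and cashflows.
static <- data.frame(
  isin     = c("AAA", "BBB"),
  maturity = as.Date(c("2014-01-15", "2013-06-30")),
  issue    = as.Date(c("2004-01-15", "2008-06-30")),
  coupon   = c(0.0425, 0.03),
  stringsAsFactors = FALSE
)
quotes <- data.frame(
  date    = as.Date(rep(c("2012-03-01", "2012-03-02"), each = 2)),
  isin    = rep(c("AAA", "BBB"), 2),
  price   = c(101.2, 100.4, 101.3, 100.5),
  accrued = c(0.54, 2.01, 0.55, 2.02),
  stringsAsFactors = FALSE
)
flows <- data.frame(
  isin = c("AAA", "AAA", "BBB"),
  date = as.Date(c("2013-01-15", "2014-01-15", "2013-06-30")),
  cf   = c(4.25, 104.25, 103),
  stringsAsFactors = FALSE
)

# Build the one-period list for a single date.
onePeriod <- function(day, group = "TOY") {
  q <- quotes[quotes$date == day, ]
  q <- q[match(static$isin, q$isin), ]  # same bond order as `static`
  f <- flows[flows$date >= day, ]       # only cashflows still to come
  bonds <- list(ISIN = static$isin,
                MATURITYDATE = static$maturity,
                ISSUEDATE = static$issue,
                COUPONRATE = static$coupon,
                PRICE = q$price,
                ACCRUED = q$accrued,
                CASHFLOWS = list(ISIN = f$isin, CF = f$cf, DATE = f$date),
                TODAY = day)
  out <- setNames(list(bonds), group)
  class(out) <- "couponbonds"
  out
}

# One couponbonds object per date, collected in an umbrella list.
days <- sort(unique(quotes$date))
dyn <- lapply(seq_along(days), function(i) onePeriod(days[i]))
names(dyn) <- as.character(days)
class(dyn) <- "dyncouponbonds"

str(dyn[[1]]$TOY, max.level = 1)
```

The static columns are simply reused for every date, while PRICE, ACCRUED, CASHFLOWS and TODAY are recomputed per date, which is the repeated sublist format the question asks about.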

Finally, please note that the Rbbg package from findata.org is used to interface with Bloomberg.
