如何在R中的data.table中使用自定义函数 [英] How to use a custom function in data.table in R

查看:118
本文介绍了如何在R中的data.table中使用自定义函数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是我的交易数据。它显示了从 from 列中的帐户到 to 列中的帐户的交易,并带有日期和金额信息

This is my transaction data. It shows the transactions made from the accounts in from column to the accounts in to column with the date and the amount information

data 

id          from    to          date        amount  
<int>       <chr>   <chr>       <date>      <dbl>
19521       6644    6934        2005-01-01  700.0
19524       6753    8456        2005-01-01  600.0
19523       9242    9333        2005-01-01  1000.0
…           …       …           …           …
1056317     7819    7454        2010-12-31  60.2
1056318     6164    7497        2010-12-31  107.5
1056319     7533    7492        2010-12-31  164.1

我想计算最近6个月之前交易网络上的亲密度中心度度量每个特定交易的日期,并希望将此信息保存为原始数据中的新列。

I want to calculate closeness centrality measure on the networks of transactions made in the last 6 month prior to the date each particular transaction was made and want to save this information as a new column in the original data.

我将在此处使用的示例数据是:

example data I'll use here is:

structure(list(id = c(83324L, 87614L, 88898L, 89874L, 94765L, 
100277L, 101587L), from = c("5370", "7816", "8046", "5492", "8756", 
"5370", "9254"), to = c("9676", "5370", "5370", "5370", "5370", 
"9105", "5370"), date = structure(c(13391, 13400, 13404, 13409, 
13428, 13452, 13452), class = "Date"), amount = c(261.1, 16400, 
3500, 2700, 19882, 182, 14.6)), row.names = c(NA, -7L), class = "data.frame")

现在,以下代码可以很好地在一个小的数据集中完成此操作:

Now, this following code works very well to accomplish this in a small dataset:

library(tnet)
closeness_fnc <- function(data){
  accounts <- data[date == max(date),from]
  id <- data[date == max(date),id]  
  
  # for directed networks
  df <- data %>% group_by(from, to) %>% mutate(weights = sum(amount)) %>% select(from, to, weights) %>% distinct
  cl <- closeness_w(df, directed = T, gconly=FALSE, alpha = 0.5) 
  
  list(
    id = id,
    closeness_directed = cl[,"n.closeness"][accounts]
  )

}

network_data <- data[, closeness_fnc(data[(date >= end_date - 180) & (date <= end_date)]), .(end_date = date)] %>% select(-end_date)
# adding this info into the original data
data <- merge(x = data, y = network_data, by = "id")

因此,输出与我预期的一样:

So, the output is as I expected:

# data
id      from    to      date        amount  closeness_directed 
<int>   <chr>   <chr>   <date>      <dbl>   <dbl> 
83324   5370    9676    2006-08-31  261.1   1.00000000
87614   7816    5370    2006-09-09  16400.0 0.98744695
88898   8046    5370    2006-09-13  3500.0  0.35329017
89874   5492    5370    2006-09-18  2700.0  0.25176754
94765   8756    5370    2006-10-07  19882.0 0.39233504
100277  5370    9105    2006-10-31  182.0   0.07167582
101587  9254    5370    2006-10-31  14.6    0.02390589

但是,由于我的数据有超过100万行,因此该代码需要一天以上的时间才能完成(运行时间超过12小时,尚未完成)。

However, since my data has over 1 million rows, this code will take more than a day to complete(it runs for more than 12 hours and hasn't yet finished).

我遇到了类似的运行时间问题这里,我想对这段代码应用相同的逻辑。因此,我修改了我的代码,如下所示:

I had a similar running time problem here and I want to apply the same logic to this code. So, I modified my code as follows:

library(tnet)
closeness_fnc <- function(data){
  accounts <- data[date == max(date),from]
  id <- data[date == max(date),id]  
  
  # for directed networks
  df <- data %>% group_by(from, to) %>% mutate(weights = sum(amount)) %>% select(from, to, weights) %>% distinct
  cl <- closeness_w(df, directed = T, gconly=FALSE, alpha = 0.5) 
  
  closeness_directed <- cl[,"n.closeness"][accounts]
  closeness_directed <- as.data.frame(closeness_directed)
  closeness_directed$from <- rownames(closeness_directed)
  rownames(closeness_directed) <- NULL

  return(closeness_directed)
}

# this is the approach given in the link I provided:
setDT(data)[, date_minus_180 := date - 180]
data[, ':=' (closeness_directed = data[data, closeness_fnc(data), 
     on = .(from, date <= date, date >= date_minus_180), by = .EACHI]$closeness_directed
     )] %>% select(-date_minus_180)

但是,显然不行,因为

data[data, closeness_fnc(data), 
     on = .(from, date <= date, date >= date_minus_180), by = .EACHI]

提供输出

from   date         date     closeness_directed      from   
<chr>  <date>      <date>       <dbl>                <chr>

5370    2006-08-31  2006-03-04  0.07167582           5370
5370    2006-08-31  2006-03-04  0.02390589           9254
7816    2006-09-09  2006-03-13  0.07167582           5370
7816    2006-09-09  2006-03-13  0.02390589           9254
8046    2006-09-13  2006-03-17  0.07167582           5370
8046    2006-09-13  2006-03-17  0.02390589           9254
5492    2006-09-18  2006-03-22  0.07167582           5370
5492    2006-09-18  2006-03-22  0.02390589           9254
8756    2006-10-07  2006-04-10  0.07167582           5370
8756    2006-10-07  2006-04-10  0.02390589           9254
1-10 of 14 rows

那么,现在如何调整代码以解决问题?

So, now how can I adjust the code here to solve the problem?


更大的数据集

A larger dataset


structure(list(id = c(19521L, 19522L, 19523L, 19524L, 19525L, 
19526L, 19527L, 19528L, 19529L, 19530L, 19531L, 0L, 19532L, 19533L, 
19534L, 21971L, 21972L, 21973L, 21974L, 21975L, 21976L, 21977L, 
21978L, 21979L, 21980L, 21981L, 1L, 21761L, 21762L, 21763L, 21764L, 
21765L, 21766L, 21767L, 21982L, 21983L, 21984L, 21768L, 21769L, 
21770L, 21771L, 21772L, 21773L, 2L, 21774L, 21775L, 21776L, 21777L, 
21778L, 21779L, 21780L, 21781L, 21782L, 3L, 21783L, 21784L, 21785L, 
21786L, 21787L, 21788L, 21789L, 21790L, 21791L, 21792L, 21793L, 
21794L, 21795L, 21796L, 4L, 21797L, 21798L, 21799L, 21800L, 21801L, 
21802L, 21803L, 21804L, 21805L, 21806L, 21807L, 21808L, 21809L, 
21810L, 21811L, 21812L, 21813L, 21814L, 21815L, 5L, 21816L, 21817L, 
21818L, 21819L, 21820L, 21821L, 21822L, 21823L, 21824L, 21825L, 
21826L, 21827L, 21828L, 21829L, 21830L, 6L, 21831L, 21832L, 21833L, 
21834L, 21835L, 21836L, 21837L, 21838L, 7L, 21839L, 21840L, 21841L, 
21842L, 21843L, 21844L, 21845L, 21846L, 21847L, 21848L, 21849L, 
21850L, 21851L, 21852L, 21853L, 21854L, 21855L, 21856L, 21857L, 
8L, 21858L, 21859L, 9L, 10L, 21860L, 21861L, 21862L, 21863L, 
21864L, 21865L, 21866L, 21867L, 21868L, 21869L, 21870L, 21871L, 
21872L, 21873L, 21874L, 21875L, 21876L, 21877L, 21878L, 21879L, 
21880L, 21881L, 21882L, 21883L, 21884L, 21885L, 21886L, 21887L, 
21888L, 21889L, 21890L, 21891L, 21892L, 21893L, 21894L, 21895L, 
21896L, 21897L, 21898L, 21899L, 21900L, 11L, 21901L, 21902L, 
21903L, 21904L, 21905L, 21906L, 21907L, 21908L, 21909L, 12L, 
21910L, 21911L, 21912L, 21913L, 21914L, 21915L, 21916L, 21917L, 
21918L, 21919L, 13L, 21920L, 21921L, 21922L, 21923L, 21924L, 
21925L, 21926L, 21927L, 21928L, 21929L, 21930L, 21931L, 21932L, 
21933L, 21934L, 21935L, 21936L, 14L, 21937L, 21938L, 21939L, 
21940L, 21941L, 21942L, 21957L, 21958L, 21959L, 21960L, 21961L, 
21962L, 21963L, 21964L, 15L, 21965L, 21966L, 21967L, 21968L, 
21969L, 21970L, 21985L, 21986L, 21987L, 21988L, 21989L, 21990L, 
21991L, 21992L, 21993L, 21994L, 21995L, 21996L, 16L, 17L, 21551L, 
21552L, 21553L, 21554L, 21555L, 21556L, 21557L, 21558L, 21559L, 
21560L, 21561L, 21562L, 21563L, 21564L, 21565L, 21566L, 21567L, 
21997L, 21998L, 18L, 21568L, 21569L, 21570L, 21571L, 21572L, 
21573L, 21574L, 21575L, 21576L, 21577L, 21578L, 21579L, 21580L, 
21581L, 19L, 21582L, 21583L, 21584L, 21585L, 21586L, 21587L, 
21588L, 21589L, 21590L, 21591L, 21592L, 20L, 21593L, 21594L, 
21595L, 21596L, 21597L, 21598L, 21599L, 21600L, 21601L, 21602L, 
21603L, 21604L, 21605L, 21606L, 21L, 21607L, 21608L, 21609L, 
21610L, 21611L, 21612L, 21613L, 21614L, 21615L, 21616L, 21617L, 
21618L, 21619L, 21620L, 21621L, 21622L, 21623L, 21624L, 21625L, 
21626L, 22L, 21627L, 21628L, 21629L, 21630L, 21631L, 21632L, 
21633L, 21634L, 21635L, 21636L, 21637L, 21638L, 21639L, 21640L, 
21641L, 21642L, 21643L, 21644L, 21645L, 23L, 21646L, 21647L, 
21648L, 21649L, 21650L, 21651L, 21652L, 21653L, 21654L, 21655L, 
21656L, 21657L, 21658L, 24L, 21659L, 21660L, 21661L, 21662L, 
21663L, 21664L, 21665L, 21666L, 21667L, 21668L, 21669L, 25L, 
21670L, 21671L, 21672L, 21673L, 21674L, 21675L, 21676L, 21677L, 
21678L, 21679L, 21680L, 21681L, 21682L, 21683L, 26L, 21684L, 
21685L, 21686L, 21687L, 21688L, 21689L, 21690L, 21691L, 21692L, 
21693L, 21694L, 21695L, 21696L, 21697L, 21698L, 21699L, 21700L, 
21701L, 21702L, 21703L, 27L, 21704L, 21719L, 21720L, 21721L, 
21722L, 21723L, 21724L, 21725L, 21726L, 21727L, 21728L, 21729L, 
21730L, 21731L, 21732L, 28L, 21733L, 21734L, 21735L, 21736L, 
21737L, 21738L, 21739L, 21740L, 29L, 21741L, 21742L, 21743L, 
21744L, 21745L, 21746L, 21747L, 21748L, 21749L, 21750L, 21751L, 
21752L, 21753L, 21754L, 21755L, 21756L, 21757L, 21758L, 30L, 
31L, 32L, 33L, 34L, 35L, 36L, 37L, 21229L, 21230L, 21231L, 21232L, 
21233L, 21234L, 21235L, 21236L, 21237L, 21238L, 21239L, 21240L, 
21241L, 21242L, 21243L, 21244L, 21245L, 21246L, 21247L, 21248L, 
21249L, 21250L, 21251L, 21252L, 21253L, 21254L, 21255L, 21256L, 
21257L, 21258L), from = c("6644", "9843", "9242", "6753", "7075", 
"8685", "5513", "6340", "6042", "5587", "7237", "5695", "9582", 
"8539", "7939", "9077", "8946", "5591", "8380", "5865", "7867", 
"9457", "6968", "7971", "6150", "9361", "9379", "8409", "9740", 
"7226", "7531", "6752", "7362", "6661", "5730", "5417", "9049", 
"7057", "6252", "9476", "6228", "8896", "7371", "8170", "7122", 
"6694", "5450", "9435", "5619", "8289", "9862", "5504", "6555", 
"9845", "7537", "9482", "6810", "8257", "8490", "6588", "9652", 
"7303", "5852", "5746", "9198", "6917", "8688", "9460", "9640", 
"7054", "8628", "7065", "9006", "6832", "6185", "8422", "6914", 
"7069", "7848", "8436", "5494", "6375", "5653", "8912", "9794", 
"8413", "6527", "9101", "5815", "6923", "8184", "6811", "8130", 
"6539", "8643", "6329", "7744", "8211", "9641", "8003", "5599", 
"8715", "7108", "9573", "8583", "5648", "6444", "5660", "8191", 
"9830", "5931", "7921", "6753", "8314", "7940", "6265", "6604", 
"6509", "5618", "5860", "6469", "9525", "5887", "6626", "7145", 
"6862", "5741", "9144", "9862", "9163", "7297", "7599", "8427", 
"8865", "9418", "8636", "6530", "9155", "6934", "8817", "9028", 
"5521", "5943", "7443", "9557", "8239", "6819", "9761", "5983", 
"6830", "6368", "5381", "8782", "8008", "9160", "9862", "8008", 
"9615", "6920", "6164", "6278", "9729", "8960", "6358", "5939", 
"8902", "9522", "7344", "9070", "6594", "8058", "6639", "7896", 
"6325", "7804", "9554", "9725", "8475", "7746", "7536", "9671", 
"9761", "5415", "6837", "8327", "9061", "8981", "9226", "5862", 
"7085", "8925", "6226", "6849", "8432", "9545", "5837", "5440", 
"9732", "8695", "7690", "5829", "9373", "7977", "6361", "7320", 
"7603", "6303", "7077", "7850", "5792", "9588", "9204", "8648", 
"8950", "7106", "6334", "6843", "7060", "9606", "5520", "9725", 
"9350", "7463", "8130", "7947", "9668", "9490", "6241", "8830", 
"6374", "9528", "7919", "8532", "6795", "6934", "8162", "9275", 
"8106", "8615", "9206", "8283", "6265", "7052", "7737", "8422", 
"7815", "9028", "7932", "6125", "6671", "7800", "9835", "5573", 
"7874", "8931", "6748", "8192", "6822", "6950", "8020", "8555", 
"8986", "7644", "5736", "8421", "6224", "8374", "8304", "9101", 
"8677", "9208", "7008", "6074", "9409", "6269", "9721", "9304", 
"9117", "5420", "9691", "7728", "8422", "8579", "7495", "9838", 
"8139", "9571", "5385", "5454", "9620", "7723", "9249", "7033", 
"7966", "5837", "9844", "5793", "5747", "6362", "6925", "9318", 
"6780", "6934", "7150", "6818", "7246", "5514", "9574", "7838", 
"5540", "6646", "6893", "6417", "8039", "8721", "8763", "6401", 
"6510", "7970", "7117", "6001", "7505", "7646", "5600", "6522", 
"8395", "5601", "5418", "6296", "8790", "7622", "9012", "8165", 
"7624", "5468", "9316", "9030", "7155", "5702", "7492", "8503", 
"9868", "6807", "6404", "9076", "7213", "8735", "7849", "8551", 
"9351", "6693", "6795", "9653", "9504", "6948", "9358", "9280", 
"8168", "5456", "9138", "8420", "9312", "8930", "6375", "8695", 
"7699", "6748", "5506", "9475", "5776", "5517", "5644", "8680", 
"5474", "7534", "9363", "9586", "6508", "6193", "5401", "8032", 
"8461", "9387", "5812", "7564", "5917", "5434", "5794", "7840", 
"9085", "8331", "7060", "7175", "6669", "8896", "6352", "7432", 
"9810", "8776", "6934", "6112", "8869", "8248", "9450", "6974", 
"7264", "7336", "6880", "7866", "7777", "7502", "5615", "9777", 
"7371", "9214", "6374", "6039", "7714", "9056", "8358", "8963", 
"8657", "8846", "9319", "7220", "7764", "8967", "8683", "9137", 
"6971", "9747", "7449", "8259", "5373", "7300", "6273", "8391", 
"7862", "5696", "6622", "5456", "9240", "7021", "7313", "7247", 
"6679", "8102", "6812", "9473", "6345", "7935", "9696", "5541", 
"8939", "5417", "6887", "8998", "7977", "9110", "8666", "6670", 
"8975", "7518", "5601", "7549", "7841", "8888", "5808", "9545", 
"9460", "9361", "9807", "6860", "9811", "5935", "8966", "8684", 
"5915", "8892", "8493", "7894", "6342", "6382", "8461", "7833", 
"7201", "7253", "6720", "6175", "9201", "5682", "5473", "7173", 
"6094", "8810", "5874", "6947", "8462", "6885", "6201"), to = c("6934", 
"9115", "9333", "8456", "6510", "7207", "6046", "7047", "6213", 
"9493", "6248", "7468", "8925", "6727", "6912", "6727", "9811", 
"9493", "9251", "6375", "6460", "6375", "8130", "5773", "6510", 
"6951", "6213", "6671", "6153", "6634", "9440", "8220", "8512", 
"8105", "8786", "5773", "6454", "5997", "8374", "7207", "6253", 
"9251", "8456", "7517", "6935", "6143", "8220", "9628", "5837", 
"9115", "6517", "9628", "8078", "6143", "6912", "7047", "6460", 
"7517", "6442", "9333", "6646", "5997", "8395", "6153", "9012", 
"6248", "7468", "8105", "6254", "9811", "7518", "6217", "6951", 
"8551", "9012", "5605", "6671", "7084", "8925", "5985", "8130", 
"5443", "8665", "8657", "8395", "6883", "6334", "8472", "6669", 
"5715", "5409", "8876", "8869", "9450", "5610", "6934", "6043", 
"7253", "6646", "7564", "6934", "5668", "6986", "7382", "6934", 
"8671", "6646", "8336", "9750", "8967", "9137", "8912", "5373", 
"9240", "6934", "8925", "6273", "6566", "6164", "9240", "6145", 
"7247", "7134", "5606", "9682", "5635", "8820", "8763", "7492", 
"5837", "6634", "8323", "6616", "6374", "8678", "7293", "6143", 
"8105", "7843", "6375", "7207", "5997", "9628", "9240", "9811", 
"5837", "8395", "8456", "9811", "9333", "9251", "6153", "6213", 
"6248", "9115", "8925", "6634", "6671", "8130", "6646", "9333", 
"6727", "6510", "6460", "8220", "9493", "9750", "6934", "6912", 
"6951", "7047", "9012", "9750", "5773", "7517", "7468", "8456", 
"7207", "6192", "9131", "6046", "7143", "7047", "6213", "6333", 
"7603", "6248", "9620", "6995", "9770", "5835", "8925", "5614", 
"8846", "8134", "7468", "8887", "8631", "9744", "9251", "6217", 
"6934", "7247", "8697", "6727", "5606", "9664", "6460", "6442", 
"8374", "6334", "9440", "9493", "9845", "7492", "5605", "8078", 
"9202", "6454", "5635", "8657", "8606", "8395", "9037", "5773", 
"6951", "6807", "9770", "8631", "9845", "8512", "6253", "6989", 
"6375", "7248", "8665", "8786", "8887", "5668", "6374", "6883", 
"9519", "8134", "6510", "5443", "6646", "6634", "5373", "7084", 
"6033", "8967", "8105", "9565", "9723", "8925", "7222", "6361", 
"8739", "8739", "6502", "9085", "5980", "5980", "5385", "5773", 
"7001", "9200", "7603", "7471", "9620", "5610", "6794", "9457", 
"8336", "6935", "5409", "5621", "5614", "9664", "7517", "7518", 
"6669", "6517", "6114", "7207", "9628", "9251", "8456", "8078", 
"6935", "6772", "9535", "8869", "7222", "7034", "6986", "6566", 
"8220", "7155", "7446", "9202", "6934", "9333", "6046", "9535", 
"8678", "6273", "6896", "7345", "9115", "8183", "6634", "6254", 
"7471", "9628", "9333", "9457", "9457", "9137", "6043", "8671", 
"6479", "6503", "5715", "7143", "5592", "6912", "7047", "6460", 
"7517", "6143", "9712", "8472", "7382", "6995", "6192", "7518", 
"6145", "8912", "6844", "7253", "7109", "8763", "5997", "5985", 
"6807", "6153", "6329", "7213", "8551", "7564", "7155", "6248", 
"7468", "8105", "5605", "6503", "8820", "5562", "8697", "7109", 
"9811", "6984", "6951", "8323", "9450", "9012", "6616", "5922", 
"9682", "9839", "8041", "5443", "9039", "8178", "7293", "8665", 
"8657", "8846", "7990", "8168", "7646", "8472", "9803", "8041", 
"8879", "9085", "8178", "7624", "8221", "5776", "8422", "9085", 
"8545", "8321", "5473", "6994", "6673", "6934", "7769", "5409", 
"6104", "8876", "7818", "8941", "5610", "7825", "7770", "6043", 
"7253", "8790", "7564", "8178", "8846", "6954", "7382", "6986", 
"6194", "8671", "9741", "5384", "8846", "8653", "6659", "9750", 
"9744", "9138", "9321", "7124", "8912", "5866", "7718", "5468", 
"7321", "6795", "6042", "6566", "6164", "9084", "6507", "9033", 
"6807", "9240", "6540", "6857", "8945", "7134", "5606", "9390", 
"9682", "6359", "8757", "8763", "8280", "7049", "6205", "7604", 
"9729", "7492", "6085", "8239", "6299", "9845", "9240", "8323", 
"6616", "6671", "6669", "8657", "7471", "9744", "5443", "5837", 
"8395", "8551", "8456", "8472", "8374", "5610", "9811", "9682", 
"9333", "9251", "9202", "7603", "6192", "6143", "6153", "6329", 
"6213", "6273", "6248", "7109", "7143", "8041", "8665", "8925", 
"9115", "6634", "6671"), date = structure(c(12784, 12784, 12784, 
12784, 12785, 12785, 12786, 12786, 12786, 12786, 12786, 12787, 
12787, 12787, 12787, 12788, 12788, 12788, 12788, 12789, 12789, 
12790, 12790, 12790, 12790, 12790, 12791, 12791, 12791, 12791, 
12791, 12791, 12791, 12791, 12791, 12791, 12791, 12792, 12792, 
12792, 12792, 12792, 12792, 12793, 12793, 12793, 12793, 12793, 
12794, 12794, 12794, 12794, 12794, 12795, 12795, 12795, 12795, 
12795, 12795, 12795, 12795, 12796, 12796, 12796, 12796, 12796, 
12796, 12796, 12797, 12797, 12797, 12797, 12797, 12797, 12797, 
12797, 12798, 12798, 12799, 12800, 12800, 12800, 12801, 12801, 
12801, 12802, 12802, 12802, 12803, 12803, 12804, 12804, 12804, 
12804, 12804, 12805, 12805, 12805, 12805, 12805, 12806, 12806, 
12806, 12806, 12807, 12807, 12807, 12807, 12807, 12807, 12808, 
12808, 12808, 12809, 12809, 12809, 12809, 12809, 12809, 12809, 
12810, 12810, 12810, 12810, 12810, 12811, 12811, 12811, 12811, 
12812, 12812, 12812, 12812, 12813, 12813, 12813, 12814, 12814, 
12814, 12814, 12814, 12814, 12814, 12814, 12814, 12814, 12814, 
12814, 12814, 12814, 12814, 12814, 12814, 12814, 12814, 12814, 
12814, 12814, 12814, 12814, 12814, 12814, 12814, 12814, 12814, 
12814, 12814, 12814, 12814, 12814, 12814, 12814, 12814, 12814, 
12814, 12814, 12814, 12815, 12815, 12816, 12816, 12816, 12816, 
12816, 12816, 12816, 12816, 12816, 12816, 12817, 12817, 12817, 
12817, 12817, 12817, 12817, 12818, 12818, 12818, 12818, 12819, 
12819, 12819, 12819, 12819, 12819, 12819, 12819, 12819, 12819, 
12819, 12819, 12819, 12819, 12819, 12819, 12819, 12819, 12820, 
12820, 12820, 12820, 12820, 12820, 12820, 12820, 12820, 12820, 
12820, 12820, 12820, 12820, 12820, 12821, 12821, 12821, 12821, 
12821, 12821, 12821, 12821, 12821, 12821, 12821, 12821, 12821, 
12821, 12821, 12821, 12821, 12821, 12821, 12822, 12822, 12822, 
12822, 12822, 12822, 12822, 12822, 12822, 12822, 12822, 12822, 
12822, 12822, 12822, 12822, 12822, 12822, 12822, 12822, 12822, 
12823, 12823, 12823, 12823, 12823, 12823, 12823, 12823, 12823, 
12823, 12823, 12823, 12823, 12823, 12823, 12824, 12824, 12824, 
12824, 12824, 12824, 12824, 12824, 12824, 12824, 12824, 12824, 
12825, 12825, 12825, 12825, 12825, 12825, 12825, 12825, 12825, 
12825, 12825, 12825, 12825, 12825, 12825, 12826, 12826, 12826, 
12826, 12826, 12826, 12826, 12826, 12826, 12826, 12826, 12826, 
12826, 12826, 12826, 12826, 12826, 12826, 12826, 12826, 12826, 
12827, 12827, 12827, 12827, 12827, 12827, 12827, 12827, 12827, 
12827, 12827, 12827, 12827, 12827, 12827, 12827, 12827, 12827, 
12827, 12827, 12828, 12828, 12828, 12828, 12828, 12828, 12828, 
12828, 12828, 12828, 12828, 12828, 12829, 12829, 12830, 12830, 
12830, 12830, 12830, 12830, 12831, 12831, 12831, 12831, 12831, 
12831, 12832, 12832, 12832, 12832, 12832, 12832, 12832, 12832, 
12833, 12833, 12833, 12833, 12833, 12833, 12833, 12834, 12834, 
12834, 12834, 12834, 12834, 12834, 12834, 12834, 12834, 12834, 
12835, 12835, 12835, 12835, 12835, 12836, 12836, 12836, 12836, 
12836, 12837, 12837, 12837, 12837, 12837, 12837, 12837, 12837, 
12837, 12837, 12838, 12838, 12838, 12838, 12838, 12838, 12839, 
12839, 12839, 12839, 12839, 12839, 12839, 12839, 12839, 12840, 
12840, 12840, 12840, 12840, 12840, 12840, 12841, 12841, 12841, 
12841, 12841, 12841, 12841, 12841, 12841, 12841, 12841, 12841, 
12842, 12842, 12842, 12842, 12842, 12842, 12842, 12842, 12842, 
12842, 12842, 12842, 12842, 12842, 12842, 12842, 12842, 12842, 
12842, 12842, 12842, 12842, 12842, 12842, 12842, 12842, 12842, 
12842, 12842, 12842, 12842, 12842, 12842, 12842, 12842, 12842, 
12842, 12842), class = "Date"), amount = c(700, 900, 1000, 600, 
400, 1100, 600, 1100, 200, 800, 1000, 700, 300, 800, 800, 5123, 
400, 3401, 700, 500, 700, 3242, 500, 400, 5298, 900, 11832, 300, 
500, 600, 1100, 600, 300, 800, 400, 6774, 300, 200, 400, 14264, 
900, 13851, 17366, 1000, 800, 700, 6007, 500, 400, 6207, 900, 
12644, 800, 4276, 6434, 14779, 4507, 6446, 800, 17477, 1100, 
5009, 1000, 5718, 800, 13967, 6959, 15914, 200, 4470, 600, 800, 
10737, 700, 44749, 1000, 46552, 500, 13156, 1000, 23323, 1100, 
200, 300, 10792, 200, 400, 700, 200, 700, 1100, 1000, 700, 500, 
1100, 7268, 300, 200, 16125, 400, 14440, 700, 900, 300, 49752, 
200, 36518, 500, 900, 300, 900, 1000, 200, 19961, 21899, 12336, 
1100, 200, 700, 1100, 900, 1100, 800, 600, 400, 200, 500, 200, 
200, 38000, 16983, 1000, 300, 1000, 300, 800, 13.4, 42.7, 34700, 
12.6, 47.5, 13.3, 37.1, 17, 11.1, 15.5, 22.2, 55.8, 11.8, 50.1, 
45, 15.9, 38.8, 38.2, 20.1, 38.9, 7.1, 107.1, 48, 62.4, 2900, 
21.5, 19.1, 14, 19.5, 15.2, 5282, 94.7, 19.4, 28.2, 42.7, 110.2, 
0.8, 23.1, 20, 19.6, 2000, 5100, 1100, 200, 11900, 1100, 5500, 
7500, 1100, 800, 6000, 200, 600, 800, 25300, 45647, 1000, 700, 
600, 7000, 700, 900, 300, 2900, 5224, 30192, 24381, 400, 5123, 
23330, 700, 8500, 3191, 23041, 5029, 6238, 3401, 900, 20213, 
7618, 19935, 600, 5859, 3375, 12817, 500, 38645, 1600, 10600, 
5500, 700, 3217, 14626, 4550, 4356, 6689, 600, 3242, 9612, 5080, 
5039, 12785, 4212, 17632, 3395, 200, 3399, 5298, 14493, 28157, 
1800, 31348, 5544, 14100, 33045, 1800, 200, 800, 20066, 400, 
1000, 27666, 500, 600, 700, 700, 3151, 1000, 6774, 800, 1500, 
22452, 1100, 44333, 18347, 200, 600, 6242, 13900, 19746, 400, 
48098, 7041, 9100, 10584, 49590, 3021, 500, 14264, 5400, 13851, 
17366, 1200, 5072, 1100, 1100, 47831, 12015, 5200, 8905, 23524, 
6007, 1000, 300, 22349, 31038, 25200, 43737, 12154, 23736, 24863, 
400, 200, 6207, 29700, 14622, 4758, 5810, 12644, 17477, 19588, 
27078, 32594, 25609, 20281, 700, 900, 6310, 14319, 14400, 6434, 
14779, 4507, 6446, 4276, 9600, 13875, 12043, 4391, 4327, 9000, 
6698, 16392, 700, 15263, 1100, 18729, 5009, 3098, 4729, 5718, 
700, 500, 24400, 9658, 14963, 13967, 6959, 15914, 9800, 20567, 
3058, 600, 18497, 6148, 4470, 400, 10737, 15447, 24009, 44749, 
12138, 900, 800, 1000, 900, 4200, 700, 1100, 1300, 8000, 6000, 
29511, 200, 900, 600, 6600, 1100, 44162, 600, 24023, 900, 400, 
200, 300, 800, 33410, 14800, 400, 800, 500, 500, 19136, 33900, 
4100, 10500, 13400, 600, 700, 3700, 1000, 1000, 100, 3300, 800, 
9400, 45925, 41740, 500, 6200, 8000, 200, 3100, 500, 300, 31332, 
62100, 600, 7100, 28361, 4000, 200, 4500, 900, 900, 900, 1000, 
1500, 300, 2500, 2700, 11000, 300, 800, 900, 8900, 23990, 1100, 
1400, 800, 10700, 1800, 1100, 10900, 900, 200, 5200, 800, 200, 
800, 200, 900, 3900, 900, 600, 900, 18.7, 102, 2.9, 3, 39285, 
52.1, 34.1, 18.5, 21.3, 38.3, 160, 110.5, 58.6, 83.4, 34.7, 68.6, 
31, 20.3, 5.4, 89.3, 110.6, 61.5, 72.7, 13.7, 20.7, 25.9, 2.9, 
50.1, 14, 110, 16.2, 39, 73.8, 23.7, 249, 29.6, 117.3, 199)), row.names = c(NA, 
-500L), class = "data.frame")


推荐答案

Consider re-writing your closeness_func as follows:

Consider re-writing your closeness_func as follows:

closeness_info <- function(from, DT) {
  DT <- DT[, .(weights = sum(amount)), by = .(from, to)]
  res <- closeness_w(DT, TRUE, FALSE, alpha = 0.5)
  `names<-`(res[, "n.closeness"], row.names(res))[from]
}

Then you can use the following data.table operation to achieve your goal:

Then you can use the following data.table operation to achieve your goal:

set_closeness <- function(DT) {
  DT[, closeness_directed := closeness_info(.SD$from, DT[between(date, .BY$date - 180, .BY$date), ]), by = date]
}

Now let’s benchmark the performances of this implementation against your original one with the long dataset you provide. Here I call it df. We first create two copies of that dataset. This is necessary as data.table by default uses pass-by-reference. If we do not create two copies of the dataset, then the following tests will always be applied to the same object (i.e. df).

Now let's benchmark the performances of this implementation against your original one with the long dataset you provide. Here I call it df. We first create two copies of that dataset. This is necessary as data.table by default uses pass-by-reference. If we do not create two copies of the dataset, then the following tests will always be applied to the same object (i.e. df).

data1 <- copy(df)
data2 <- copy(df)

We apply your original implementation to data1:

We apply your original implementation to data1:

> system.time({
+   closeness_fnc <- function(data){
+     accounts <- data[date == max(date),from]
+     id <- data[date == max(date),id]  
+     
+     # for directed networks
+     df <- data %>% group_by(from, to) %>% mutate(weights = sum(amount)) %>% select(from, to, weights) %>% distinct
+     cl <- closeness_w(df, directed = T, gconly=FALSE, alpha = 0.5) 
+     
+     list(
+       id = id,
+       closeness_directed = cl[,"n.closeness"][accounts]
+     )
+     
+   }
+   network_data <- data1[, closeness_fnc(data1[(date >= end_date - 180) & (date <= end_date)]), .(end_date = date)] %>% select(-end_date)
+   data1 <- merge(x = data1, y = network_data, by = "id")
+ })
   user  system elapsed 
   1.19    0.07    1.26

Then we apply the new implementation above to data2:

Then we apply the new implementation above to data2:

> system.time({
+   closeness_info <- function(from, DT) {
+     DT <- DT[, .(weights = sum(amount)), by = .(from, to)]
+     res <- closeness_w(DT, TRUE, FALSE, alpha = 0.5)
+     `names<-`(res[, "n.closeness"], row.names(res))[from]
+   }
+   set_closeness <- function(DT) {
+     DT[, closeness_directed := closeness_info(.SD$from, DT[between(date, .BY$date - 180, .BY$date), ]), by = date]
+   }
+   set_closeness(data2)
+ })
   user  system elapsed 
   0.33    0.07    0.40 

Do they produce the same set of results?

Do they produce the same set of results?

> all(data1[order(id), ] == data2[order(id), ])
[1] TRUE

Output:

> data2[order(id), ]
        id from   to       date amount closeness_directed
  1:     0 5695 7468 2005-01-04    700       0.0344016544
  2:     1 9379 6213 2005-01-08  11832       0.0492013976
  3:     2 8170 7517 2005-01-10   1000       0.0097043019
  4:     3 9845 6143 2005-01-12   4276       0.0142486370
  5:     4 9640 6254 2005-01-14    200       0.0022874217
 ---                                                     
496: 21994 6671 6033 2005-02-07  14100       0.0064464840
497: 21995 7800 8967 2005-02-07  33045       0.0098688428
498: 21996 9835 8105 2005-02-07   1800       0.0023032952
499: 21997 7008 6794 2005-02-08    200       0.0006955321
500: 21998 6074 9457 2005-02-08    600       0.0026025058

I think this might be the best performance you can get given that you do not want to rewrite the closeness_w function, which is probably the bottleneck here as per David’s advice.

I think this might be the best performance you can get given that you do not want to rewrite the closeness_w function, which is probably the bottleneck here as per David's advice.

这篇关于如何在R中的data.table中使用自定义函数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆