Read data from memory in Vowpal Wabbit?
Problem description
Is there a way to send data to train a model in Vowpal Wabbit without writing it to disk?
Here's what I'm trying to do. I have a relatively large dataset in CSV (around 2 GB) which fits in memory with no problem. I load it in R into a data frame, and I have a function to convert the data in that data frame into VW format.
Now, in order to train a model, I have to write the converted data to a file first, and then feed that file to VW. The writing-to-disk part takes way too long, especially since I want to try various models with different feature transformations, and thus I have to write the data to disk multiple times.
So, assuming I'm able to create a character vector in R, in which each element is a row of data in VW format, how could I feed that into VW without writing it to disk?
I considered using the daemon mode and writing the character vector to a localhost connection, but I couldn't get VW to train in daemon mode -- I'm not sure this is even possible.
I'm willing to use C++ (through the Rcpp package) if that is necessary to make this work.
Thank you very much in advance.
UPDATE:
Thank you everyone for your help. In case anyone's interested, I just piped the output to VW as suggested in the answer, like so:
# Two sample rows of data
datarows <- c("1 |name 1:1 2:4 4:1", "-1 |name 1:1 4:1")
# Open connection to VW
con <- pipe("vw -f my_model.vw")
# Write to connection and close
writeLines(datarows, con)
close(con)
Solution
Vowpal Wabbit supports reading data from standard input (cat train.dat | vw), so you can open a pipe directly from R.
Daemon mode supports training. If you need incremental/continuous learning, you can use a trick: send a dummy example whose tag starts with the string "save", which makes VW save the model. Optionally, you can specify the model filename in the tag as well:
1 save_filename|
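A rough sketch of how the daemon approach might look from R, in case it helps. The helper names (save_example, train_via_daemon), the chunk size and the port are my own assumptions, not part of the answer. It also assumes the daemon sends back one prediction line per example, and that it was started with something like vw --daemon --port 26542 --num_children 1 -f my_model.vw. I have not verified how the daemon responds to the save example itself, so the sketch sends it last and does not wait for a reply.

```r
# Build the dummy "save" example described above: a label, then a tag of the
# form save_<filename> placed directly before the "|".
save_example <- function(filename) {
  paste0("1 save_", filename, "|")
}

# Hypothetical helper: stream VW-format rows to a running VW daemon.
# Assumption: the daemon answers with one prediction line per example, so
# rows are sent in chunks and the replies are read back after each chunk
# to keep the socket buffers from filling up.
train_via_daemon <- function(rows, host = "localhost", port = 26542,
                             save_as = NULL, chunk_size = 1000) {
  con <- socketConnection(host = host, port = port,
                          blocking = TRUE, open = "r+")
  on.exit(close(con))

  preds <- character(0)
  chunks <- split(rows, ceiling(seq_along(rows) / chunk_size))
  for (chunk in chunks) {
    writeLines(chunk, con)
    preds <- c(preds, readLines(con, n = length(chunk)))
  }

  # Ask VW to write the model out; no reply is read for this line.
  if (!is.null(save_as)) {
    writeLines(save_example(save_as), con)
  }
  preds
}

# Example call (needs a running daemon, so it is left commented out):
# preds <- train_via_daemon(datarows, save_as = "my_model.vw")
```

Compared with pipe(), this keeps one VW process alive across several batches, which is the incremental case the answer mentions.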
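Yet another option is to use VW as a library (see an example).
Note that VW supports various kinds of feature engineering based on feature namespaces, so some feature transformations can be done by VW itself rather than in your conversion step.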
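To illustrate the namespace point, here is a small sketch that formats a row with two namespaces. The helper vw_line and the namespace and feature names (user, item, age, ...) are made up for the example. With rows like this, the same piped data can be reused with different VW options, for instance -q ui to add quadratic interactions between the two namespaces, or --ignore u to drop the first one.

```r
# Hypothetical helper: format one example in VW format.
# `namespaces` is a named list of named numeric vectors, one per namespace.
vw_line <- function(label, namespaces) {
  parts <- vapply(names(namespaces), function(ns) {
    feats <- namespaces[[ns]]
    paste0("|", ns, " ", paste0(names(feats), ":", feats, collapse = " "))
  }, character(1))
  paste(label, paste(parts, collapse = " "))
}

row <- vw_line(1, list(user = c(age = 31, visits = 4),
                       item = c(price = 9.5)))
# row is "1 |user age:31 visits:4 |item price:9.5"

# Trying two model variants on the same rows (needs vw on the PATH, so it is
# left commented out):
# con <- pipe("vw -q ui -f model_quadratic.vw"); writeLines(row, con); close(con)
# con <- pipe("vw --ignore u -f model_item_only.vw"); writeLines(row, con); close(con)
```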