Read data from memory in Vowpal Wabbit?


Question


Is there a way to send data to train a model in Vowpal Wabbit without writing it to disk?

Here's what I'm trying to do. I have a relatively large dataset in CSV (around 2 GB), which fits in memory with no problem. I load it in R into a data frame, and I have a function that converts the data in that data frame into VW format.

Now, in order to train a model, I have to write the converted data to a file first and then feed that file to VW. Writing to disk takes far too long, especially since I want to try various models with different feature transformations, which means writing the data to disk multiple times.

So, assuming I'm able to create a character vector in R, in which each element is a row of data in VW format, how could I feed that into VW without writing it to disk?

I considered using the daemon mode and writing the character vector to a localhost connection, but I couldn't get VW to train in daemon mode -- I'm not sure this is even possible.

I'm willing to use C++ (through the Rcpp package) if necessary to make this work.

Thank you very much in advance.

UPDATE:

Thank you everyone for your help. In case anyone's interested, I just piped the output to VW as suggested in the answer, like so:

# Two sample rows of data
datarows <- c("1 |name 1:1 2:4 4:1", "-1 |name 1:1 4:1")
# Open connection to VW
con <- pipe("vw -f my_model.vw")
# Write to connection and close
writeLines(datarows, con)
close(con)

Solution

Vowpal Wabbit supports reading data from standard input (cat train.dat | vw), so you can open a pipe directly from R.
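As a minimal sketch of that pipe approach, the R code below converts a data frame to VW-format lines in memory and streams them to VW's standard input. The helpers `df_to_vw` and `pipe_lines`, the column names, and the namespace name `f` are hypothetical and only for illustration; the sketch assumes a numeric label column and numeric feature columns, and that `vw` is on the PATH.

```r
# Hypothetical helper: turn a data frame into VW-format lines.
# Assumes a numeric label column and numeric feature columns.
df_to_vw <- function(df, label_col, namespace = "f") {
  feature_cols <- setdiff(names(df), label_col)
  # One "name:value" token per feature, built column by column (vectorised).
  tokens <- lapply(feature_cols, function(col) paste0(col, ":", df[[col]]))
  features <- do.call(paste, tokens)
  paste0(df[[label_col]], " |", namespace, " ", features)
}

# Stream lines to a command's standard input; no temporary file is written.
pipe_lines <- function(lines, cmd) {
  con <- pipe(cmd, open = "w")
  on.exit(close(con))
  writeLines(lines, con)
}

df <- data.frame(y = c(1, -1), a = c(1, 1), b = c(4, 0))
datarows <- df_to_vw(df, "y")
# datarows is now: "1 |f a:1 b:4"  "-1 |f a:1 b:0"

# Only train if the vw executable can actually be found.
if (nzchar(Sys.which("vw"))) pipe_lines(datarows, "vw -f my_model.vw")
```

Because the conversion happens in memory, trying a different feature transformation only means calling the conversion function again and opening a new pipe.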

Daemon mode supports training. If you need incremental/continuous learning, you can save the model on demand by sending a dummy example whose tag starts with the string "save". Optionally, you can specify the model filename as well:

1 save_filename| 
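A rough sketch of how this could look from R is given below. It assumes a VW daemon was started separately on port 26542, and that the daemon sends back one prediction line per training example; both are assumptions of this sketch and should be checked against your VW version. The helpers `vw_daemon_payload` and `send_to_vw_daemon` are hypothetical names.

```r
# Assumes a daemon was started beforehand, for example:
#   vw --daemon --port 26542
# (the port number is an assumption of this sketch)

# Append the dummy "save" example described above to the training rows.
vw_daemon_payload <- function(datarows, model_file = NULL) {
  tag <- if (is.null(model_file)) "save" else paste0("save_", model_file)
  c(datarows, paste0("1 ", tag, "|"))
}

# Send lines to the daemon over TCP and read back the predictions.
# n_examples is the number of real training rows (the save line is excluded).
send_to_vw_daemon <- function(lines, n_examples,
                              host = "localhost", port = 26542) {
  con <- socketConnection(host = host, port = port,
                          blocking = TRUE, open = "r+")
  on.exit(close(con))
  writeLines(lines, con)
  # Assumption: one prediction line comes back per training example.
  readLines(con, n = n_examples)
}

datarows <- c("1 |name 1:1 2:4 4:1", "-1 |name 1:1 4:1")
payload <- vw_daemon_payload(datarows, "my_model.vw")
# payload ends with the dummy example "1 save_my_model.vw|"

# Usage, once a daemon is listening:
# preds <- send_to_vw_daemon(payload, n_examples = length(datarows))
```

For large character vectors it is probably safer to send the rows in chunks and read the predictions after each chunk, so that neither side blocks on a full socket buffer.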

Yet another option is to use VW as a library; see an example.

Note that VW supports various kinds of feature engineering through feature namespaces.
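To illustrate, the sketch below reuses the same in-memory rows for several models and only changes VW's namespace options: `-q ab` asks VW for quadratic interactions between namespaces starting with `a` and `b`, and `--ignore b` drops namespaces starting with `b`. The helper `vw_command`, the namespace names, and the model filenames are made up for this example.

```r
# Hypothetical helper: build a vw command line with namespace options.
vw_command <- function(model_file, quadratic = character(),
                       ignore = character()) {
  args <- c("vw", "-f", shQuote(model_file),
            if (length(quadratic)) paste("-q", quadratic),
            if (length(ignore)) paste("--ignore", ignore))
  paste(args, collapse = " ")
}

# Rows with two namespaces, "a" and "b", kept in memory.
datarows <- c("1 |a x:1 y:2 |b z:1", "-1 |a x:1 |b z:3")

# Three model variants trained from the same rows.
commands <- c(
  vw_command("plain.vw"),
  vw_command("quadratic.vw", quadratic = "ab"),
  vw_command("only_a.vw", ignore = "b")
)

# Only train if the vw executable can actually be found.
if (nzchar(Sys.which("vw"))) {
  for (cmd in commands) {
    con <- pipe(cmd, open = "w")
    writeLines(datarows, con)
    close(con)
  }
}
```

With this layout, some feature transformations can be requested on the command line, so the data frame does not need to be converted again for each variant.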
