Append suffix to column of entries in csv file (or in SQLite database)


Problem description


I have a relatively large CSV file (1.2 GB, which is large relative to the 2 GB of RAM on one of my computers). To every entry in one column I would like to append "1C" so that I can join/merge with another dataframe/db table.

If the file weren't so large, it would be easy to use read.csv to import it into data and then run data$symbol <- paste(data$symbol, "1C", sep=""). But now I get the "cannot allocate vector of size x" error.

Is a manual solution, like scan(), my only option? (I'm a bit afraid of corrupting my data) Thanks!

Solution

Using scan isn't going to help if you can already get your data into R.

Make sure data only has the columns you need to merge, and run gc() before you try your paste command (gc will help if you're near the margin of your memory limit).
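
One way to trim the file down to the merge columns before R ever reads it is to do it from the shell. This is not something the answer itself proposes; it is a minimal sketch that assumes the needed columns are 2 and 3, that the output name foo_small.csv is free to use, and that no quoted field contains a comma (cut splits on every comma).

```shell
#!/bin/sh
# Hypothetical sample input standing in for the large file.
printf '1,one,2,two\n3,three,4,four\n' > foo.csv

# Keep only columns 2 and 3 (assumed to be the ones needed for the merge).
# cut streams the file line by line, so memory use stays small.
cut -d, -f2,3 foo.csv > foo_small.csv

cat foo_small.csv
```

The smaller file can then be passed to read.csv in place of the original.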

If that fails, look at some of the solutions in this thread.


UPDATE:
And if you happen to be using a flavor of *nix, or if you have Rtools installed on Windows, you could do this with gawk. If your data are in foo.csv and you want to append "1C" to the second column, this will create a new file, bar.csv, with "1C" appended to every entry in the second column.

compy: /home/josh
> cat foo.csv 
1,one,2,two
3,three,4,four
5,five,6,six

compy: /home/josh
> gawk -F "," '{OFS=","; $2=($2 "1C"); print}' < foo.csv > bar.csv

compy: /home/josh
> cat bar.csv 
1,one1C,2,two
3,three1C,4,four
5,five1C,6,six

This will likely be faster than R and will consume a negligible amount of memory.
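
If the real file has a header row, the same idea can be adapted so the header is left alone. The following is a sketch under assumptions, not part of the original answer: the header and column names are made up, plain awk is used in place of gawk for portability, and the separators are set once in a BEGIN block. Like the command above, it splits on every comma, so it would mangle quoted fields that contain commas.

```shell
#!/bin/sh
# Hypothetical sample input; the header row and column names are assumptions.
printf 'id,symbol,qty\n1,one,2\n3,three,4\n' > foo.csv

# BEGIN sets the input and output separators once, before any line is read.
# Data rows (NR > 1) get "1C" appended to column 2; the header is printed unchanged.
awk 'BEGIN { FS = OFS = "," } NR > 1 { $2 = $2 "1C" } { print }' foo.csv > bar.csv

cat bar.csv
```

Writing to a new file rather than editing in place keeps the original intact, which addresses the worry about corrupting the data.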
