Append suffix to column of entries in csv file (or in SQLite database)
Problem description
I have a relatively large csv file (1.2 GB, which is large relative to the 2 GB of RAM on one of my computers). To every entry in one column I would like to append "1C" so that I can join/merge with another dataframe/db table.
If the file weren't so large, it would be easy to use read.csv to import it into data and then use data$symbol <- paste(data$symbol, "1C", sep=""). But now I get the "can't allocate vector of size x" warning.
Is a manual solution, like scan(), my only option? (I'm a bit afraid of corrupting my data.) Thanks!
Solution
Using scan isn't going to help if you can already get your data into R.
Make sure data only has the columns you need to merge, and run gc() before you try your paste command (gc will help if you're near the margin of your memory limit).
If that fails, look at some of the solutions in this thread.
UPDATE:
And if you happen to be using a flavor of *nix, or if you have Rtools installed on Windows, you could do this with gawk. If your data are in foo.csv and you want to append "1C" to the second column, this will create a new file, bar.csv, with "1C" appended to the second column.
    compy: /home/josh > cat foo.csv
    1,one,2,two
    3,three,4,four
    5,five,6,six
    compy: /home/josh > gawk -F "," '{OFS=","; $2=($2 "1C"); print}' < foo.csv > bar.csv
    compy: /home/josh > cat bar.csv
    1,one1C,2,two
    3,three1C,4,four
    5,five1C,6,six
This will likely be faster than R and will consume a negligible amount of memory, because gawk processes the file one line at a time instead of loading it all at once.
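The column in the question is called symbol, which suggests the real file may start with a header row. The one-liner above would append the suffix to that header as well. Here is a minimal sketch of a header-aware variant. It assumes that the first line is a header, that the target is column 2, and that no field contains a quoted comma (plain awk field splitting does not understand CSV quoting). The file names foo_header.csv and bar_header.csv and the sample rows are hypothetical, and gawk can be used in place of awk.

```shell
# Create a small sample file with a header row (hypothetical data).
cat > foo_header.csv <<'EOF'
id,symbol,qty
1,one,2
3,three,4
EOF

# Set the output separator once, then append "1C" to column 2 on every
# line except the header (NR == 1). Every line is printed, so the header
# passes through unchanged.
# Assumption: no quoted fields containing commas.
awk -F, 'BEGIN { OFS = "," } NR > 1 { $2 = $2 "1C" } { print }' \
    foo_header.csv > bar_header.csv

cat bar_header.csv
```

Writing to a new file rather than editing in place leaves the original untouched, so the result can be checked (for example by comparing line counts with wc -l) before the original is discarded.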