Extracting columns from text file with different delimiters in Linux
Problem description
I have very large genotype files that are basically impossible to open in R, so I am trying to extract the rows and columns of interest using the Linux command line. Rows are straightforward enough using head/tail, but I'm having difficulty figuring out how to handle the columns.
If I try using
cut -c100-105 myfile >outfile
this obviously won't work if each column contains a string of multiple characters, because -c selects character positions rather than fields. Is there some way to call cut with appropriate arguments so that it extracts the entire string within a column, where columns are delimited by spaces, tabs, or any other character?
Recommended answer
If the command should work with both tabs and spaces as the delimiter, I would use awk:
awk '{print $100,$101,$102,$103,$104,$105}' myfile > outfile
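To see why awk copes with mixed delimiters, here is a minimal sketch on a tiny made-up file (the temporary file and its contents are assumptions for illustration, not the asker's data). By default awk treats any run of spaces and tabs as a single separator:

```shell
# Hypothetical sample: columns separated by a mix of tabs and runs of spaces.
demo=$(mktemp)
printf 's1\tAA  AG\tGG\ns2  AG\tGG   AA\n' > "$demo"

# Default field splitting: any run of blanks (spaces or tabs) is one separator.
awk_out=$(awk '{print $2, $3}' "$demo")
printf '%s\n' "$awk_out"

rm -f "$demo"
```

Note that the comma in print joins the fields with OFS, which is a single space by default; if tab-separated output is needed, something like awk -v OFS='\t' should do it.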
As long as you only need a few fields (six here), it is imo fine to just type them out; for longer ranges you can use a for loop:
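# printf keeps the fields of each row on one output line, separated by OFS
awk '{for(i=100;i<=105;i++) printf "%s%s", $i, (i<105 ? OFS : ORS)}' myfile > outfile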
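A variant of the loop idea, sketched with the range passed in through -v so it does not have to be hard-coded. The variable names first and last and the sample file are assumptions made for this example; each row's selected fields are kept together on one output line:

```shell
# Hypothetical sample file with five space-separated columns.
demo=$(mktemp)
printf 'a b c d e\nf g h i j\n' > "$demo"

# first/last are illustrative variable names set with -v.
range_out=$(awk -v first=2 -v last=4 '{
    for (i = first; i <= last; i++)
        # OFS between fields, ORS (newline) after the last one
        printf "%s%s", $i, (i < last ? OFS : ORS)
}' "$demo")
printf '%s\n' "$range_out"

rm -f "$demo"
```

For the original question the call would use first=100 and last=105 on myfile.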
If you want to use cut, you need to use the -f option:
cut -f100-105 myfile > outfile
If the field delimiter is different from TAB, you need to specify it using -d:
cut -d' ' -f100-105 myfile > outfile
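One caveat worth checking on your own data: cut treats every single delimiter character as a field boundary, so runs of spaces produce empty fields. A small sketch on a made-up line (the sample content is an assumption), with tr -s used to squeeze repeated spaces first:

```shell
# Hypothetical sample: two spaces between "a" and "b".
demo=$(mktemp)
printf 'a  b c\n' > "$demo"

# Each single space is a delimiter, so field 2 is the empty string here.
cut_raw=$(cut -d' ' -f2 "$demo")

# Squeezing runs of spaces into one before cutting gives the expected field.
cut_squeezed=$(tr -s ' ' < "$demo" | cut -d' ' -f2)

printf 'raw: [%s]\nsqueezed: [%s]\n' "$cut_raw" "$cut_squeezed"

rm -f "$demo"
```

If the file mixes tabs and spaces, the awk approach is likely simpler than preprocessing for cut.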
Check the man page for more info on the cut command.