Filter lines that have only letters in the first column
Question
I am working with the Google English 1-gram dataset, which looks like the following:
C'ape 1804 1 1
C'ape 1821 1 1
C'ape 1826 1 1
C'ape 1838 2 2
C'ape 1844 1 1
C'ape 1869 1 1
C'ape 1874 1 1
C'ape 1878 2 2
C'ape 1879 1 1
C'ape 1880 1 1
CABMEL 1873 1 1
CABMEL 1874 1 1
CABMEL 1875 1 1
CABMEL 1879 1 1
CABMEL 1884 1 1
CABMEL 1890 1 1
CABMEL 1899 1 1
CABMEL 1901 1 1
CABMEL 1903 3 2
CABMEL 1910 2 2
CABMEL 1912 1 1
CABMEL 1915 1 1
CABMEL 1926 2 2
CABMEL 1927 3 2
CABMEL 1928 4 2
CABMEL 1930 2 2
There are at least 4 columns, and some rows contain 5. The first column is a 1-gram (a string). I want to extract only those lines whose first column contains nothing but letters (upper or lower case). I think grep should be able to do it, but I cannot find the right regex for the job. Is there any Unix utility that can do this easily? I believe the columns are tab-delimited.
For the sample above, the output should contain only the lines with CABMEL.
Recommended answer
Using Perl:
# Match lines that start with one or more letters (a-z or A-Z) followed by whitespace
perl -ne 'print if m/^[a-z]+\s/i' file
Using awk:
# Match lines whose first field contains only a-z or A-Z
awk '$1 ~ /^[a-zA-Z]+$/' file
Both will output:
CABMEL 1873 1 1
CABMEL 1874 1 1
CABMEL 1875 1 1
CABMEL 1879 1 1
CABMEL 1884 1 1
CABMEL 1890 1 1
CABMEL 1899 1 1
CABMEL 1901 1 1
CABMEL 1903 3 2
CABMEL 1910 2 2
CABMEL 1912 1 1
CABMEL 1915 1 1
CABMEL 1926 2 2
CABMEL 1927 3 2
CABMEL 1928 4 2
CABMEL 1930 2 2
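Since the question asks about grep specifically, the same filter can be sketched with an extended regular expression. This is a minimal example, not part of the original answer: the file name `sample.tsv` and the helper name `filter_alpha` are hypothetical, the columns are assumed to be separated by whitespace (tabs in the real dataset), and `LC_ALL=C` is set on the assumption that only ASCII letters should count.

```shell
# Build a small tab-delimited sample (hypothetical file name).
printf "C'ape\t1804\t1\t1\nCABMEL\t1873\t1\t1\nCABMEL\t1903\t3\t2\n" > sample.tsv

# Keep lines that start with one or more ASCII letters immediately
# followed by whitespace, i.e. whose first column is letters only.
# LC_ALL=C restricts the A-Z and a-z ranges to plain ASCII.
filter_alpha() {
    LC_ALL=C grep -E '^[A-Za-z]+[[:space:]]' "$1"
}

filter_alpha sample.tsv
```

Like the Perl one-liner, this anchors at the start of the line and requires whitespace right after the letters, so `C'ape` is rejected because of the apostrophe.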