Filter lines that have only alphabets in first column

Views: 73 Published: 2020/11/12 22:22:12 Tags: regex unix awk sed grep

Problem description

I am working with the Google English 1-gram dataset, which looks like the following:

C'ape   1804    1       1
C'ape   1821    1       1
C'ape   1826    1       1
C'ape   1838    2       2
C'ape   1844    1       1
C'ape   1869    1       1
C'ape   1874    1       1
C'ape   1878    2       2
C'ape   1879    1       1
C'ape   1880    1       1
CABMEL  1873    1       1
CABMEL  1874    1       1
CABMEL  1875    1       1
CABMEL  1879    1       1
CABMEL  1884    1       1
CABMEL  1890    1       1
CABMEL  1899    1       1
CABMEL  1901    1       1
CABMEL  1903    3       2
CABMEL  1910    2       2
CABMEL  1912    1       1
CABMEL  1915    1       1
CABMEL  1926    2       2
CABMEL  1927    3       2
CABMEL  1928    4       2
CABMEL  1930    2       2

Each line has at least 4 columns, and some rows contain 5. The first column is the 1-gram, a string. I want to extract only those lines whose first column contains only letters (upper or lower case). I think grep should be able to do it, but I cannot find the correct regex for the job. Is there any Unix utility that can easily get this done? I believe the columns are tab-delimited.

For the sample above, the output should contain only the lines with CABMEL.

Recommended answer

Using Perl:

# Print lines that start with one or more letters (a-z or A-Z, case-insensitive) followed by whitespace
perl -ne 'print if m/^[a-z]+\s/i' file

Using awk:

# Print lines whose first field contains only a-z or A-Z
awk '$1 ~ /^[a-zA-Z]+$/' file

Both will output:

CABMEL  1873    1       1
CABMEL  1874    1       1
CABMEL  1875    1       1
CABMEL  1879    1       1
CABMEL  1884    1       1
CABMEL  1890    1       1
CABMEL  1899    1       1
CABMEL  1901    1       1
CABMEL  1903    3       2
CABMEL  1910    2       2
CABMEL  1912    1       1
CABMEL  1915    1       1
CABMEL  1926    2       2
CABMEL  1927    3       2
CABMEL  1928    4       2
CABMEL  1930    2       2
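
Since the question asks about grep specifically, here is a minimal sketch of a grep equivalent. It assumes the first column is terminated by a tab or space, as the question suggests. The function name filter_alpha_first_col is hypothetical and used only for illustration; LC_ALL=C is set so that the range A-Za-z covers only ASCII letters regardless of the current locale.

```shell
# Hypothetical helper (the name is an assumption, not from the original answer).
# Prints lines whose first column consists only of ASCII letters.
filter_alpha_first_col() {
    # -E enables extended regular expressions.
    # ^[A-Za-z]+   : one or more letters starting at the beginning of the line
    # [[:space:]]  : the first column must end at whitespace (tab or space)
    LC_ALL=C grep -E '^[A-Za-z]+[[:space:]]' "$@"
}

# Usage: filter_alpha_first_col file
```

Like the Perl version, this anchors at the start of the line and requires whitespace right after the letters, so entries such as C'ape are rejected because the apostrophe interrupts the run of letters.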
