Extract all lines from a text file based on a given list of IDs

Views: 200 Published: 2020/9/15 7:03:37 Tags: perl unix awk


Problem description

I have two text files. file1 contains a list of IDs:

file2:

10993   item    0
11002   item    6
10995   item    7
79600   item    7
439481  item    5
272557  item    7
224325  item    7
84156   item    6
572546  item    7
693661  item    7
.....

I am trying to select all lines from file2 whose ID (first column) appears in file1. Currently, I loop through the first file to build a regex like:

^\b11002\b\|^\b10995\b\|^\b48981\b\|^\b79600\b

Then I run:

grep '^11002\|^10995\|^48981\|^79600' file2.txt

But when the number of IDs in file1 is large (~2000), the regular expression becomes very long and grep becomes slow. Is there another way? I am using Perl + Awk + Unix.

Recommended answer

Use a hash table. It can be memory-intensive, but lookups take constant time on average. The script below is one efficient and correct way to do this, though not the only one: it builds a hash table whose keys are the IDs in file1, then looks up the first column of each line of file2 in that table. If the key is in the hash table, the line is printed to standard output:

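#!/usr/bin/env perl

use strict;
use warnings;

# Load every ID from file1 into a hash so each membership test is a single lookup.
open my $ids_fh, '<', 'file1' or die "could not open file1: $!\n";
my %keys;
while (my $id = <$ids_fh>) {
    chomp $id;
    $keys{$id} = 1;
}
close $ids_fh;

# Print each line of file2 whose first tab-separated field is a known ID.
open my $data_fh, '<', 'file2' or die "could not open file2: $!\n";
while (my $line = <$data_fh>) {
    chomp $line;
    my ($testKey, $label, $count) = split /\t/, $line;
    # Skip blank lines, where split yields no first field.
    if (defined $testKey && exists $keys{$testKey}) {
        print "$line\n";
    }
}
close $data_fh;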

There are many ways to do the same thing in Perl. That said, I value clarity and explicitness over fancy obscurity, because you never know when you will have to come back to a Perl script and change it, and such scripts are hard enough to maintain as it is. One person's opinion.
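Since the question also mentions Awk, the same hash-lookup idea can be sketched with an awk associative array. This is a minimal sketch and not part of the original answer: the helper name `filter_by_ids` and the sample contents of file1 are assumptions made for illustration, and it relies on awk's default splitting of fields on whitespace.

```shell
#!/bin/sh
# filter_by_ids IDS_FILE DATA_FILE   (hypothetical helper name)
# While reading the first file (NR == FNR), store each ID as an array key.
# While reading the second file, print lines whose first field is a stored key.
filter_by_ids() {
    awk 'NR == FNR { ids[$1]; next } $1 in ids' "$1" "$2"
}

# Hypothetical sample data: the ID list in file1 is an assumption;
# file2 reuses a few rows from the question, tab-separated.
tmpdir=$(mktemp -d)
printf '%s\n' 11002 10995 79600 48981 > "$tmpdir/file1"
printf '%s\t%s\t%s\n' \
    10993 item 0 \
    11002 item 6 \
    10995 item 7 \
    79600 item 7 \
    439481 item 5 > "$tmpdir/file2"

result=$(filter_by_ids "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

With the sample data above, this prints the rows for 11002, 10995 and 79600. Because the comparison is an exact match on the first field, an ID such as 10995 does not match a longer ID that merely starts with the same digits.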
