Searching a file in 3 different ways

Problem description

I have been writing a program that searches a file in 3 different ways. But first, the search to use has to be chosen on the command line.

For example, on the command line I type:

Program 1 search: python file.py 'search_term' 'file-to-be-searched'

Program 2 search: python file.py -z 'number' 'search_term' 'file-to-be-searched'

Program 3 search: python file.py -x 'search_term' 'file-to-be-searched'

All 3 search scripts are in file.py.

The code I have so far is:

import re
import sys

# Program 1: print every line that contains the search term
search_term = sys.argv[1]
f = sys.argv[2]

for line in open(f, 'r'):
    if re.search(search_term, line):
        print(line, end='')

# Program 2: arguments only; the context search itself is not written yet
flag = sys.argv[1]
num = sys.argv[2]
search_term = sys.argv[3]
f = sys.argv[4]

# Program 3
flag = sys.argv[1]
search_term = sys.argv[2]
f = sys.argv[3]

for line in open(f, 'r'):
    if re.match(search_term, line):
        print(line, end='')

Program 1 works fine; that's no problem. Program 2 finds the search term in the file and prints a number of lines before and after it, as set by the 'number' parameter, but I have no idea how to do this. Program 3 finds an exact match for the search term and prints all the lines after it. re.match is inadequate because it only matches at the beginning of a string and does not consider the rest.
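
Program 2 can be sketched in a few lines if it is acceptable to read the whole file into memory: find the index of every matching line, widen each index by num in both directions, and return the union in file order. This is a minimal sketch under that assumption; context_matches is a hypothetical helper name, and overlapping context windows are merged so that no line is printed twice.

```python
import re


def context_matches(lines, search_term, num):
    """Return every line within `num` lines of a line matching `search_term`.

    Hypothetical helper: overlapping context windows are merged, so each
    line appears at most once and the result stays in file order.
    """
    pattern = re.compile(search_term)
    keep = set()
    for i, line in enumerate(lines):
        if pattern.search(line):
            # Keep the match plus `num` lines on either side, clipped to the file.
            start = max(0, i - num)
            end = min(len(lines), i + num + 1)
            keep.update(range(start, end))
    return [lines[i] for i in sorted(keep)]
```

Inside file.py it could be called as context_matches(open(f).readlines(), search_term, int(num)), printing each returned line with print(line, end='').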

My final problem: how do I differentiate between the three programs? Using the flags, or the absence of a flag, on the command line?

Any help would be greatly appreciated.

Thanks

Recommended answer

First of all, you should look at two very useful Python modules:

  • fileinput: Iterate over lines from multiple input streams
  • optparse: A powerful command line option parser

fileinput helps you read lines from several files, and can even modify them in place if you need to. With these tools your program will be much easier to extend and read.
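
As a rough illustration of reading several files through fileinput, the sketch below searches every file in a list and reports where each match came from. search_files is a hypothetical name; lines.filename() and lines.filelineno() are methods of the FileInput object that fileinput.input() returns.

```python
import fileinput
import re


def search_files(search_term, filenames):
    """Return (filename, line number, line) for every line matching search_term."""
    pattern = re.compile(search_term)
    results = []
    # fileinput chains the files into one stream of lines; an empty list of
    # file names would make it read standard input instead.
    with fileinput.input(files=filenames) as lines:
        for line in lines:
            if pattern.search(line):
                # filelineno() restarts at 1 for each new file.
                results.append((lines.filename(), lines.filelineno(), line.rstrip('\n')))
    return results
```

Called as search_files(search_term, args[1:]), this would replace the single-file loop of program 1 and accept any number of files.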

Here is an example:

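import fileinput
import optparse
import re


def process(line, search_term):
    # Placeholder for the real search: print lines containing the term
    if re.search(search_term, line):
        print(line, end='')


if __name__ == '__main__':
    parser = optparse.OptionParser()
    # -z takes a number; -x is a plain on/off flag, so it must not consume the search term
    parser.add_option("-z", dest="z", type="int", help="number of context lines (program 2)")
    parser.add_option("-x", dest="x", action="store_true", help="print lines after an exact match (program 3)")
    options, args = parser.parse_args()
    search_term = args[0]
    for line in fileinput.input(args[1:]):
        process(line, search_term)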
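
To answer the "how do I tell the three programs apart" part directly, one option is to let the flags select a mode and treat "no flag" as program 1. The sketch below swaps in argparse (also in the standard library) instead of the optparse parser above, because it handles positional arguments and mutually exclusive flags directly. parse_command_line and the mode labels 'plain', 'context' and 'exact' are hypothetical names.

```python
import argparse


def parse_command_line(argv):
    """Work out which of the three searches to run from the command-line flags.

    Returns (mode, options), where mode is 'plain', 'context' or 'exact'.
    """
    parser = argparse.ArgumentParser(description="Search files in one of three ways.")
    # -z and -x select different programs, so they may not be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-z", dest="num", type=int, metavar="NUM",
                       help="print NUM lines before and after each match (program 2)")
    group.add_argument("-x", dest="exact", action="store_true",
                       help="print everything after the first exact match (program 3)")
    parser.add_argument("search_term")
    parser.add_argument("files", nargs="+")
    options = parser.parse_args(argv)

    # No flag at all means program 1: a plain search.
    if options.num is not None:
        mode = "context"
    elif options.exact:
        mode = "exact"
    else:
        mode = "plain"
    return mode, options
```

In file.py you would call mode, options = parse_command_line(sys.argv[1:]) and branch on mode. Passing both -z and -x makes argparse print a usage error and exit.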

To match anywhere in a line, use re.search instead of re.match. An example from the docs:

>>> re.match("o", "dog")  # No match as "o" is not the first letter of "dog".
>>> re.search("o", "dog") # Match as search() looks everywhere in the string.
<_sre.SRE_Match object at ...>
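
Applied to program 3, a minimal sketch could use re.search to find the first matching line and then keep everything from there to the end of the file. This assumes "all the lines after the search term" means from the first match onwards, including the matching line itself; lines_from_first_match is a hypothetical name.

```python
import itertools
import re


def lines_from_first_match(lines, search_term):
    """Return the first line matching search_term and every line after it.

    re.search lets the term appear anywhere in the line; with re.match,
    only lines that start with the term would count.
    """
    pattern = re.compile(search_term)
    # dropwhile skips lines until the predicate first fails, then yields the rest.
    return list(itertools.dropwhile(lambda line: not pattern.search(line), lines))
```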


Edit: in reply to Jessica's comment:

Say, for example, that my file contains the words zoo, zoos and zoological. If I type zoo as my search term, all 3 are returned rather than just zoo.

You can wrap the search term in \b so that only the whole word matches, for example:

>>> re.search(r'\bzoo\b', 'test zoo')
<_sre.SRE_Match object at 0xb75706e8>
>>> re.search(r'\bzoo\b', 'test zoos')
>>> re.search(r'\bzoo\b', 'test zoological')

\b matches an empty string, but only at the beginning or end of a word.

So in your script you can do this:

search_term = r'\b%s\b' % search_term

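Note: the r (raw string) prefix is important here. Without it you would have to escape each backslash and write '\\b%s\\b', because in an ordinary string literal '\b' is the backspace character.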
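
If the search term is meant as literal text rather than a regular expression, it can also be passed through re.escape() before wrapping it in \b, so that characters such as '.' match only themselves. The sketch below assumes literal terms; whole_word_pattern is a hypothetical name. Note that \b behaves as expected only when the term starts and ends with word characters (letters, digits or underscore).

```python
import re


def whole_word_pattern(search_term):
    """Compile a regex that matches search_term only as a whole word.

    re.escape() makes characters such as '.' or '+' in the term match
    literally; drop it if the term is meant to be a regular expression.
    """
    return re.compile(r'\b%s\b' % re.escape(search_term))
```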
