How to search specific lines in a file and write them to another file, using a function in Python

Problem description

My aim is to build a log parser that copies the selected lines between the keywords I want and writes them to a file. Since I have to search between multiple pairs of keywords in a single file, I thought of writing a function and using it multiple times in my script.

However, I am unable to achieve this with the following script, which fails with an error:

import re

def myfunc (infile ,outfile, search1 , search2):

    fi =  infile.readlines()
    fo =  open(outfile, 'w')

    write1 = False
    for line in fi:
     if re.findall('search1' , str(line)):
        write1 = True
     elif re.findall('search2', str(line)):
        write1 = False
     elif write1:
        fo.write(line)

    fo.close()
    fi.close()

    return;

text_file = open(input("name of inputfile : "))
resultfile =  input("name of outputfile : ")

search1 = "teen"
search2 = "eight"
myfunc (text_file , resultfile , search1 , search2)

This is the error I get:

Traceback (most recent call last):
  File "C:/Users/zoro/PycharmProjects/text-parsing/write selected test 2 sets.py", line 38, in <module>
    myfunc (text_file , resultfile , search1 , search2)
  File "C:/Users/zoro/PycharmProjects/text-parsing/write selected test 2 sets.py", line 28, in myfunc
    fi.close()
AttributeError: 'list' object has no attribute 'close'

Recommended answer

fi = infile.readlines()

This makes fi a list of lines in the file infile. So when you later call fi.close(), you are trying to close a list, which of course does not work.

Instead, you need to close the file, i.e. infile:

infile.close()
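A minimal sketch of the difference, assuming io.StringIO as a stand-in for the file object that open() returns (both provide readlines() and close()): the list produced by readlines() has no close() method, while the file object does.

```python
import io

# io.StringIO stands in for a file object returned by open()
infile = io.StringIO("first line\nsecond line\n")
fi = infile.readlines()      # fi is a plain list of strings, not a file object

try:
    fi.close()               # the call from the question
except AttributeError as exc:
    print(exc)               # 'list' object has no attribute 'close'

infile.close()               # close the file object instead
print(infile.closed)         # True
```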

In general, it's a good idea to name variables so that it's obvious what they contain. infile is a file object that you read from, so that name is fine. outfile is the name of the file you want to write to, so you should call it outFileName or something similar. fi is the list of lines in infile, so you might call it inFileLines.

You should also avoid having to close file objects manually; instead, use the with statement to make sure that they are closed automatically:

with open(outfile, 'w') as fo:
    fo.write('stuff')
    # no need to manually close it
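As a small check of what with guarantees, the sketch below writes to a throwaway temporary file (the path is made up for the demo) and inspects the file object's closed attribute after the block, both when the block finishes normally and when an exception leaves it early.

```python
import os
import tempfile

# throwaway output path, for the demo only
path = os.path.join(tempfile.mkdtemp(), "out.txt")

with open(path, "w") as fo:
    fo.write("stuff\n")
print(fo.closed)             # True: closed when the block ended

try:
    with open(path) as fi:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass
print(fi.closed)             # True: also closed when an exception leaves the block
```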

Finally, there is another issue with your code: re.findall('search1', str(line)) searches the line for the literal string 'search1'; it ignores the values that are passed to the function and stored in the search1 (and search2) variables. So you need to remove the quotes there: re.findall(search1, line). You also don't need to convert the line to a string, since it already is one.
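The difference between the quoted and unquoted argument can be seen directly; the sample line below is made up for illustration. The last call uses re.escape(), which is only needed if a marker might contain regex special characters such as ".".

```python
import re

search1 = "teen"
line = "fourteen apples\n"

print(re.findall('search1', line))   # [] : looks for the literal text "search1"
print(re.findall(search1, line))     # ['teen'] : uses the value of the variable

# if a marker may contain regex special characters such as "." or "(",
# escape it so that it is matched literally
print(re.findall(re.escape("3.14"), "pi is 3.14"))   # ['3.14']
```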

Also, re.findall() is not the best choice if you only evaluate its truth value. Use re.search() instead: it stops at the first match and returns a match object (or None), so on very long lines it doesn't keep scanning once a match has been found. And if search1 and search2 are plain strings you want to find in the line rather than actual regular expressions, just use the in operator:

if search1 in line:
    write1 = True


One final note: a file handle should always be closed at the same level it was opened. So if you open a file handle inside a function, that function should also close it. If you open a file outside the function, the function should not close it. Closing the file is the opener's responsibility; closing it anywhere else can lead to surprising behavior (for example, the caller later trying to read from a file that is already closed), so you shouldn't do it unless it is explicitly documented (e.g. a function called doSomethingAndClose may close the file).
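The ownership rule can be sketched with a hypothetical helper, count_matching_lines(), that reads from a file object it receives but leaves closing to the caller; io.StringIO again stands in for a real file opened with open().

```python
import io

def count_matching_lines(file_obj, needle):
    """Count the lines of file_obj that contain needle.

    Hypothetical helper: it reads from file_obj but never closes it,
    because the caller opened it and is responsible for closing it.
    """
    return sum(1 for line in file_obj if needle in line)

with io.StringIO("fifteen\nsixteen\ntwenty\n") as f:   # the caller opens ...
    print(count_matching_lines(f, "teen"))  # 2
    print(f.closed)                         # False: the helper left it open
print(f.closed)                             # True: ... and the caller's with block closed it
```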

Using the with statement generally avoids this, as you never call file.close() manually, and the with statement already makes sure that the file is correctly closed.

If you want to consume a file multiple times, you have to seek back to the beginning before you can read from it again. In your case, since you are using infile.readlines() to read the whole file into memory anyway, it's a better idea to read the lines from the file just once and then reuse that list for multiple function calls:

text_file = input("name of inputfile : ")
with open(text_file) as infile:
    fi = infile.readlines() # read the lines *once*

    myfunc(fi, …)
    myfunc(fi, …)
    myfunc(fi, …)
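Putting the pieces of this answer together, one possible version of the script is sketched below. It assumes myfunc is changed to accept the list of lines instead of the file object, uses the in operator for plain-string markers, and opens the output file with with. The marker strings and file names are invented for the demo, and a temporary directory replaces the input() prompts so the example runs unattended. The last part shows the seek(0) alternative for when the file object itself really has to be read twice.

```python
import os
import tempfile

def myfunc(in_file_lines, out_file_name, search1, search2):
    """Write the lines between a line containing search1 and the next line
    containing search2 to out_file_name (the marker lines are not copied)."""
    write1 = False
    with open(out_file_name, "w") as out_file:   # opened here, so closed here
        for line in in_file_lines:
            if search1 in line:
                write1 = True
            elif search2 in line:
                write1 = False
            elif write1:
                out_file.write(line)

# throwaway input file for the demo; file names and markers are made up
work_dir = tempfile.mkdtemp()
in_path = os.path.join(work_dir, "input.log")
with open(in_path, "w") as f:
    f.write("START A\na1\nEND A\nSTART B\nb1\nb2\nEND B\n")

with open(in_path) as infile:
    fi = infile.readlines()              # read the lines *once*

# reuse the same list for several keyword pairs
myfunc(fi, os.path.join(work_dir, "a.txt"), "START A", "END A")
myfunc(fi, os.path.join(work_dir, "b.txt"), "START B", "END B")

# if you must read the file object itself twice, rewind it with seek(0)
with open(in_path) as infile:
    first = infile.readlines()
    infile.seek(0)                       # without this, the next read returns []
    second = infile.readlines()
```

Because fi is an ordinary list, the input file can be closed before the calls; calling myfunc inside the with block, as in the snippet above, works just as well.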
