Splitting large text file by a delimiter in Python


Question

I imagine this is going to be a simple task, but I can't find exactly what I am looking for in previous StackOverflow questions, so here goes...

I have large text files in a proprietary format that look something like this:

:Entry
- Name
John Doe

- Date
20/12/1979
:Entry

-Name
Jane Doe
- Date
21/12/1979

etc.

The text files range in size from 10 KB to 100 MB. I need to split each file by the :Entry delimiter. How could I process each file based on :Entry blocks?

Recommended answer

You could use itertools.groupby to group the lines that occur after :Entry into lists:

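import itertools as it

filename = 'test.dat'

with open(filename, 'r') as f:
    # groupby yields (key, group) pairs: key is True for ':Entry' lines
    # and False for each run of lines between delimiters.
    for key, group in it.groupby(f, lambda line: line.startswith(':Entry')):
        if not key:
            group = list(group)
            print(group)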

which yields

['- Name\n', 'John Doe\n', '\n', '- Date\n', '20/12/1979\n']
['\n', '-Name\n', 'Jane Doe\n', '- Date\n', '21/12/1979\n']
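The lists above still contain blank lines and trailing newlines. As a minimal sketch, the same groupby idea can be wrapped in a generator that strips them. The name iter_entries and the in-memory SAMPLE text are assumptions made for illustration and are not part of the original answer:

```python
import io
import itertools as it

def iter_entries(lines, delimiter=':Entry'):
    """Yield one list of stripped, non-blank lines per delimiter-separated block."""
    for is_delim, group in it.groupby(lines, lambda line: line.startswith(delimiter)):
        if is_delim:
            continue  # skip the delimiter lines themselves
        block = [line.strip() for line in group if line.strip()]
        if block:  # ignore blocks that contained only blank lines
            yield block

# Sample data copied from the question; io.StringIO stands in for an open file.
SAMPLE = (
    ":Entry\n"
    "- Name\n"
    "John Doe\n"
    "\n"
    "- Date\n"
    "20/12/1979\n"
    ":Entry\n"
    "\n"
    "-Name\n"
    "Jane Doe\n"
    "- Date\n"
    "21/12/1979\n"
)

entries = list(iter_entries(io.StringIO(SAMPLE)))
for entry in entries:
    print(entry)
```

Because a file object is itself an iterator over lines, passing an open file instead of the StringIO should read it line by line rather than loading all of it at once.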

Or, to process the groups, you don't really need to convert group to a list:

with open(filename,'r') as f:
    for key,group in it.groupby(f,lambda line: line.startswith(':Entry')):
        if not key:
            for line in group:
                ...
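What goes inside that inner loop depends on the format. As a rough sketch only, assuming a line starting with "-" names a field and the non-blank lines after it hold that field's value (a guess based on the sample, since the format is proprietary), each group could be turned into a dict. The helpers parse_entry and iter_records are hypothetical names:

```python
import io
import itertools as it

def parse_entry(lines):
    """Turn the lines of one block into a {field: value} dict.

    Assumption: a line starting with '-' names a field, and the
    non-blank lines that follow it hold that field's value.
    """
    record = {}
    field = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('-'):
            field = line.lstrip('-').strip()
            record[field] = ''
        elif field is not None:
            # join multi-line values with a single space
            record[field] = (record[field] + ' ' + line).strip()
    return record

def iter_records(lines, delimiter=':Entry'):
    """Yield one dict per block, consuming each group before groupby advances."""
    for is_delim, group in it.groupby(lines, lambda line: line.startswith(delimiter)):
        if not is_delim:
            record = parse_entry(group)
            if record:
                yield record

# Sample data copied from the question.
sample = io.StringIO(
    ":Entry\n- Name\nJohn Doe\n\n- Date\n20/12/1979\n"
    ":Entry\n\n-Name\nJane Doe\n- Date\n21/12/1979\n"
)
records = list(iter_records(sample))
for record in records:
    print(record)
```

Each group is fully consumed inside parse_entry before the outer loop moves on, which matters because groupby invalidates the previous group once it advances.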
