Parsing data from text file

Views: 95 | Published: 2020/5/25 0:31:07 | Tags: python, file, parsing


Question

I have a text file that has content like this:

******** ENTRY 01 ********
ID:                  01
Data1:               0.1834869385E-002
Data2:              10.9598489301
Data3:              -0.1091356549E+001
Data4:                715

Then comes an empty line, followed by more blocks like this one, all with the same data fields.

I am porting C++ code to Python. One part of it reads the file line by line, detects the header text, and then matches each field label to extract the data. That doesn't look like smart code at all, and I think Python must have some library that parses data like this easily. After all, it almost looks like CSV!

Any ideas?

Recommended answer

It is very far from CSV, actually.

You can use the file as an iterator; the following generator function yields one complete section at a time:

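def load_sections(filename):
    with open(filename, 'r') as infile:
        while True:
            # Skip lines until the next '****' header. End of file is detected
            # explicitly: on Python 3.7+ a StopIteration escaping a generator
            # becomes a RuntimeError instead of ending it quietly.
            line = ''
            while not line.startswith('****'):
                line = next(infile, None)
                if line is None:
                    return  # no more sections

            entry = {}
            for line in infile:
                line = line.strip()
                if not line:
                    break  # a blank line ends the section

                # Split on the first colon only; trim padding on both sides.
                key, value = map(str.strip, line.split(':', 1))
                entry[key] = value

            yield entry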

This treats the file as an iterator, meaning that every loop over it advances the file to the next line. The outer loop only serves to move from section to section; the inner while and for loops do all the real work. The while loop skips (and discards) lines until a **** header line is found, then the for loop reads all non-empty lines that follow to build a section.
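To make the shared-position behaviour concrete, here is a minimal sketch. It uses io.StringIO as an in-memory stand-in for an open file, and the sample content is made up for illustration. It shows that a for loop started after a next() call resumes where next() left off, and that anything the loop does not consume is still available afterwards.

```python
import io

# In-memory stand-in for an open text file (hypothetical sample content).
infile = io.StringIO("header\nA: 1\nB: 2\n\ntrailer\n")

first = next(infile)          # consumes the "header" line

consumed = []
for line in infile:           # resumes at "A: 1", not at the top of the file
    line = line.strip()
    if not line:
        break                 # the blank line is consumed, then we stop
    consumed.append(line)

rest = list(infile)           # whatever the loop above did not consume

print(first.strip(), consumed, rest)
```

The generator relies on exactly this: the header search and the field loop pull from the same iterator, so neither needs to track a line index.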

Use the function in a loop:

for section in load_sections(filename):
    print(section)

Repeating your sample data in a text file results in (shown for Python 3.7+, where dicts keep insertion order):

>>> for section in load_sections('/tmp/test.txt'):
...     print(section)
...
{'ID': '01', 'Data1': '0.1834869385E-002', 'Data2': '10.9598489301', 'Data3': '-0.1091356549E+001', 'Data4': '715'}
{'ID': '01', 'Data1': '0.1834869385E-002', 'Data2': '10.9598489301', 'Data3': '-0.1091356549E+001', 'Data4': '715'}
{'ID': '01', 'Data1': '0.1834869385E-002', 'Data2': '10.9598489301', 'Data3': '-0.1091356549E+001', 'Data4': '715'}

You can add some data converters to that if you want to; a mapping from key to callable would do:

converters = {'ID': int, 'Data1': float, 'Data2': float, 'Data3': float, 'Data4': int}

Then, in the generator function, replace entry[key] = value with entry[key] = converters.get(key, lambda v: v)(value). Keys without a converter fall back to the identity function, so their values stay strings.
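Putting the pieces together, a self-contained sketch of the generator with converters applied might look like the following. The name load_typed_sections and the temporary-file setup are assumptions made here for illustration, and end of file is detected with next(infile, None) so that the sketch runs on current Python 3; the converters mapping and the conversion expression are the ones given above.

```python
import os
import tempfile

# Key-to-callable mapping, as given in the answer.
converters = {'ID': int, 'Data1': float, 'Data2': float,
              'Data3': float, 'Data4': int}

def load_typed_sections(filename):
    """Yield one dict per '****' section, converting values by key."""
    with open(filename, 'r') as infile:
        while True:
            # Skip lines until the next '****' header; stop at end of file.
            line = ''
            while not line.startswith('****'):
                line = next(infile, None)
                if line is None:
                    return
            entry = {}
            for line in infile:
                line = line.strip()
                if not line:
                    break  # a blank line ends the section
                key, value = map(str.strip, line.split(':', 1))
                # Keys without a converter keep their string value.
                entry[key] = converters.get(key, lambda v: v)(value)
            yield entry

# One block of the sample data from the question.
sample = """******** ENTRY 01 ********
ID:                  01
Data1:               0.1834869385E-002
Data2:              10.9598489301
Data3:              -0.1091356549E+001
Data4:                715
"""

# Write two blocks separated by a blank line to a temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample + "\n" + sample)
    path = tmp.name

try:
    sections = list(load_typed_sections(path))
finally:
    os.remove(path)

for section in sections:
    print(section)
```

With this in place, ID and Data4 come back as int and Data1 through Data3 as float, so downstream code can do arithmetic on the values without further parsing.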
