Comma separated Matrix from txt files - continued

Views: 97 | Published: 2020/5/7 19:41:11 | Tags: python, matrix

Problem description

I need to form a matrix from a list of text files containing frequency distributions of expressions. To do so, I created a list of all those text files (lof) from a directory and used it to build a matrix (thanks to gboffy). Each filename in that list is structured as CompanyName-SerialNumber_IssueDate_IFRS.txt (example: GoldmanSachs-123456_31.12.2014_IFRS.txt). Each file's content is structured in exactly the same way too:

CompanyABC-123456_31.12.2012_IFRS.txt

Company ABC-123456_31.12.2012
financial statement:4
corporate-taxes:8
assets:2
available-for-sale property:0
auditors:213

Company123-789102_31.12.2012_IFRS.txt

Company123-789102_31.12.2012
financial statement:15
corporate-taxes:3
assets:8
available-for-sale property:2
auditors:23

My desired output is a single matrix file written to txt, with one line per company file consisting of (CompanyName,SerialNumber,IssueDate,Frequency1,Frequency2,...,FrequencyN):

'CompanyABC','123456','31.12.2012','4','8','2','0','213' \n
'Company123','789102','31.12.2012','15','3','8','2','23' \n
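The line format shown above (every field wrapped in single quotes, fields separated by commas) can be sketched with Python's standard `csv` module. This is only an illustration: the helper name `format_rows` is hypothetical, the rows are copied from the example above, and the output has no space before the newline, unlike the lines shown in the question.

```python
import csv
import io

def format_rows(rows):
    """Return the rows as text, one line per row, every field single-quoted."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        quotechar="'",          # wrap fields in single quotes
        quoting=csv.QUOTE_ALL,  # quote every field, not only the ones that need it
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()

# Rows taken from the desired output in the question.
rows = [
    ["CompanyABC", "123456", "31.12.2012", "4", "8", "2", "0", "213"],
    ["Company123", "789102", "31.12.2012", "15", "3", "8", "2", "23"],
]
print(format_rows(rows), end="")
```

To write straight to a file instead of a string, the same `csv.writer` arguments can be used on a file opened with `newline=''`.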

Here is my code so far:

    import os

    def list_textfiles(directory, min_file_size):
        # Create a list of all files stored in DIRECTORY ending in '.txt'
        # that are larger than min_file_size (in bytes)
        textfiles = []
        for root, dirs, files in os.walk(directory):
            for name in files:
                filename = os.path.join(root, name)
                if name.endswith('.txt') and os.stat(filename).st_size > min_file_size:
                    textfiles.append(filename)
        return textfiles

    directory = 'C:/CompanyFiles'
    minimum_size = 30000  # bytes
    lof = list_textfiles(directory, minimum_size)

    res = []

    for f in lof:
        res += [[entry.split(':')[1] for entry in cdata]
                for cdata in [data.splitlines() for data in open(f).read().split('\n\n')]]

    with open('C:/CompanyFiles/Matrix.txt', 'wt') as outfile:
        outfile.write(str(res))

How can I modify my code to achieve the output stated above?
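One possible direction, offered as a rough sketch rather than a confirmed answer: derive the first three fields from the filename and take the frequencies from the lines that contain a colon. The helper names `parse_filename`, `parse_frequencies`, and `build_row` are hypothetical, and the sketch assumes that filenames always follow the CompanyName-SerialNumber_IssueDate_IFRS.txt pattern and that each frequency is the integer after the last colon on its line.

```python
import os

def parse_filename(path):
    """Split '.../CompanyABC-123456_31.12.2012_IFRS.txt' into its three parts."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Split from the right so only the last two underscores are used.
    company_serial, issue_date, _suffix = stem.rsplit('_', 2)
    company, serial = company_serial.rsplit('-', 1)
    return company, serial, issue_date

def parse_frequencies(text):
    """Collect the number after the last ':' on every 'label:number' line."""
    freqs = []
    for line in text.splitlines():
        _label, sep, value = line.rpartition(':')
        value = value.strip()
        # Lines without a colon (such as the header line) are skipped.
        if sep and value.isdigit():
            freqs.append(value)
    return freqs

def build_row(path, text):
    """Combine filename fields and frequencies into one matrix row."""
    return list(parse_filename(path)) + parse_frequencies(text)

# Example content copied from the question.
sample = (
    "Company ABC-123456_31.12.2012\n"
    "financial statement:4\n"
    "corporate-taxes:8\n"
    "assets:2\n"
    "available-for-sale property:0\n"
    "auditors:213\n"
)
row = build_row('C:/CompanyFiles/CompanyABC-123456_31.12.2012_IFRS.txt', sample)
print(row)
```

In the loop over `lof`, each file could be read inside a `with open(f) as fh:` block and `build_row(f, fh.read())` appended to `res`, so that `res` ends up with one row per company file.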
