Need help to sort processed mdb file in Python on Linux machine


Problem description

I am trying to extract a table from an .mdb file, filter that table, and write the result to a short .csv file. So far I have been able to extract the table I need and save its contents to a .csv file, but I don't know how to sort that data and extract only the rows I need. I could save the whole .csv and then reopen it, but that would take a huge amount of space, since I need to process about 2000 mdb files. I just want to extract certain rows.

Cycle Test_Time  Current    Voltage
1     7.80E-002 0.00E+000   1.21E-001
1     3.01E+001 0.00E+000   1.19E-001
1     6.02E+001 0.00E+000   1.17E-001
2     9.02E+001 0.00E+000   1.14E-001
2     1.20E+002 0.00E+000   1.11E-001
2     1.50E+002 0.00E+000   1.08E-001
2     1.80E+002 0.00E+000   1.05E-001
2     2.10E+002 0.00E+000   1.02E-001
3     2.40E+002 0.00E+000   9.93E-002
3     2.70E+002 0.00E+000   9.66E-002
3     3.00E+002 0.00E+000   9.38E-002
3     3.10E+002 4.00E-001   1.26E+000

For example, in the table above I want to do the following things:

  1. Extract the last row of each cycle or, more advanced, sort each cycle by time and extract the row with the latest time. The last row does not always have the latest time because of a glitch in our testing machine, but it usually does. The bigger the number, the later the time.
  2. Extract all the rows for the last five cycles.
  3. Extract all the rows from cycle 4 to cycle 30.

Here is my code:

import sys, subprocess, glob

mdbfiles = glob.glob('*.res')
for DATABASE in mdbfiles: 

    subprocess.call(["mdb-schema", DATABASE, "mysql"])

    table_names = subprocess.Popen(["mdb-tables", "-1", DATABASE],
                                   stdout=subprocess.PIPE).communicate()[0]
    tables = table_names.splitlines()

    sys.stdout.flush()

    a=str('Channel_Normal_Table')

    for table in tables:
        if table != '' and table==a:

            filename = DATABASE.replace(".res","") + ".csv"
            file = open(filename, 'w')
            print("Dumping " + table)
            contents = subprocess.Popen(["mdb-export", DATABASE, table],
                                        stdout=subprocess.PIPE).communicate()[0]

            # I NEED TO PUT SOMETHING HERE TO SORT AND EXTRACT THE DATA I NEED


            file.write(contents)
            file.close()

Solution

It may be easier not to deal with a flat list of rows, but to first convert it to a structure that makes the data easier to "query". Something like a dict of cycles keyed by cycle id, where each entry holds the rows that belong to that cycle:

cycles = {}

rows = contents.splitlines()  # split the `contents` text blob into individual lines

for line in rows[1:]:  # the first line in your question is a header - [1:] skips it
    row = line.split()  # split each line by whitespace
    if not row:  # skip blank lines
        continue
    # create the entry for this cycle on first sight, then reuse it
    cycle = cycles.setdefault(row[0], {'id': row[0], 'rows': []})
    cycle['rows'].append({'cycle': row[0], 'test_time': row[1], 'current': row[2]})  # ... plus the remaining columns
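
As a minimal, self-contained sketch of this grouping step: the function name `group_by_cycle` and the three sample rows are illustrative assumptions, not part of the original answer. It assumes whitespace-separated columns in the order shown in the question's table and converts the values to numbers up front. Real `mdb-export` output is normally comma-separated, in which case the `csv` module would be a better fit than `str.split()`.

```python
def group_by_cycle(text):
    """Group whitespace-separated rows by their first column (the cycle id)."""
    cycles = {}
    lines = text.splitlines()
    for line in lines[1:]:            # skip the header line
        fields = line.split()
        if not fields:                # ignore blank lines
            continue
        cycle_id = int(fields[0])
        cycles.setdefault(cycle_id, []).append({
            'cycle': cycle_id,
            'test_time': float(fields[1]),
            'current': float(fields[2]),
            'voltage': float(fields[3]),
        })
    return cycles


# Illustrative sample in the layout of the question's table.
SAMPLE = """Cycle Test_Time Current Voltage
1 7.80E-002 0.00E+000 1.21E-001
1 3.01E+001 0.00E+000 1.19E-001
2 9.02E+001 0.00E+000 1.14E-001
"""

cycles = group_by_cycle(SAMPLE)
print(sorted(cycles))      # cycle ids found in the sample
print(len(cycles[1]))      # number of rows collected for cycle 1
```

Using integer cycle ids as keys means the cycles later sort in numeric order without any extra conversion.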

Then you can sort them by test_time:

for key, cycle in cycles.items():
    # convert to float so the times compare as numbers rather than as text
    cycle['rows'].sort(key=lambda r: float(r['test_time']))
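
One detail worth illustrating is why the sort key should be numeric. The values come out of the export as text in scientific notation, and text compares character by character. The short sketch below uses three made-up rows (an assumption for illustration only) to show the difference:

```python
# Illustrative rows; test_time is still text, as it would be right after parsing.
rows = [
    {'test_time': '9.02E+001'},
    {'test_time': '1.20E+002'},
    {'test_time': '7.80E-002'},
]

# Text comparison: '9...' sorts after '1...' and '7...', regardless of exponent.
as_text = sorted(rows, key=lambda r: r['test_time'])

# Numeric comparison: 0.078 < 90.2 < 120.0
as_numbers = sorted(rows, key=lambda r: float(r['test_time']))

print([r['test_time'] for r in as_text])
print([r['test_time'] for r in as_numbers])
```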

Then you can process your data. The last row of each cycle:

for key, cycle in cycles.items():
    output_row(cycle['rows'][-1])  # output_row stands for whatever you use to write one row
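
If only the latest row of each cycle is needed, a full sort is not strictly required: `max()` with a key finds it in one pass. The sketch below is an assumption-laden illustration, not the original answer's code. The `cycles` data is made up, with the rows of cycle 1 deliberately out of order to mimic the glitch described in the question.

```python
from operator import itemgetter

# Illustrative data: cycle id -> list of row dicts with numeric test_time.
cycles = {
    1: [{'test_time': 0.078}, {'test_time': 60.2}, {'test_time': 30.1}],
    2: [{'test_time': 90.2}, {'test_time': 210.0}],
}

# For every cycle, pick the row with the largest test_time.
latest = {
    cycle_id: max(rows, key=itemgetter('test_time'))
    for cycle_id, rows in cycles.items()
}

for cycle_id in sorted(latest):
    print(cycle_id, latest[cycle_id]['test_time'])
```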

Rows of the last five cycles:

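# order the cycle ids numerically ('10' would otherwise sort before '2'), then keep the last five
for key, cycle in sorted(cycles.items(), key=lambda kv: int(kv[0]))[-5:]:
    output_rows(cycle['rows'])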

Rows from cycle 4 to cycle 30:

for idx in range(4, 31):
    cycle = cycles[str(idx)]  # raises KeyError if a cycle in this range is missing
    output_rows(cycle['rows'])
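
To tie this back to the goal of writing only the selected rows to a small .csv, here is a hedged sketch that filters the exported text in memory. It assumes the export is comma-separated with a header row (the usual `mdb-export` default) and that the cycle column is named `Cycle` as in the question's table. The function name `filter_export` and the sample text are hypothetical.

```python
import csv
import io


def filter_export(csv_text, first_cycle, last_cycle, cycle_col='Cycle'):
    """Return CSV text with only the rows whose cycle lies in [first_cycle, last_cycle]."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator='\n')
    writer.writeheader()
    for row in reader:
        if first_cycle <= int(row[cycle_col]) <= last_cycle:
            writer.writerow(row)
    return out.getvalue()


# Illustrative comma-separated sample in the shape of the question's table.
SAMPLE = (
    "Cycle,Test_Time,Current,Voltage\n"
    "1,7.80E-002,0.00E+000,1.21E-001\n"
    "2,9.02E+001,0.00E+000,1.14E-001\n"
    "3,2.40E+002,0.00E+000,9.93E-002\n"
)

result = filter_export(SAMPLE, 2, 3)
print(result)
```

In the loop from the question, the returned text could be written to the output file in place of the full `contents`. On Python 3, `communicate()` returns bytes by default, so `contents` would need to be decoded (for example with `contents.decode()`) before being passed in.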
