What is the best way to sort a sequence in Python?

Question

I am trying to sort the table based on certain conditions that need to hold several times in a row. A simplified version of the table:

Number  Time
   1    23
   2    45
   3    67
   4    23
   5    11
   6    45
   7    123
   8    34

...

I need to check whether Time was < 40 five times in a row, i.e. check rows 1-5, then 2-6, and so on, and then print (and save to a file) the first and last Time of that run. For example, if the condition is met for rows 2-6, I need to print Time for Number 2 and Number 6. The checking should stop once the condition has been met; there is no need to check the remaining rows. So far I have implemented a counter with two temp variables to check for 3 items in a row, and it works fine. But if I want to check for a condition that holds 30 times in a row, I cannot just create 30 temp variables manually. What is the best way to achieve that? I guess I just need some kind of loop. Thanks!

Here is part of my code:

reader = csv.reader(open(filename))
counter, temp1, temp2, numrow = 0, 0, 0, 0

for row in reader:
    numrow += 1
    if numrow < 5:
        # the parentheses let the tuple span several lines
        col0, col1, col4, col5, col6, col23, col24, col25 = (
            float(row[0]), float(row[1]), float(row[4]), float(row[5]),
            float(row[6]), float(row[23]), float(row[24]), float(row[25]))
        if col1 <= 40:
            # col3 was never defined; col0 is assumed to be the intended column
            list1 = (col0, col1, col4, col5, col6, col23, col24, col25)
            counter += 1
            if counter == 3:
                print("Cell# %s" % filename[-10:-5])
                print(LAYOUT.format(*headers_short))
                print(LAYOUT.format(*temp1))
                print(LAYOUT.format(*temp2))
                print(LAYOUT.format(*list1))
                print("")

            elif counter == 1:
                temp1 = list1

            elif counter == 2:
                temp2 = list1

        else:
            counter = 0
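The counter in the code above already measures the length of the current run; the temp variables exist only to remember earlier rows. If, as the question says, only the first and last row of the run are needed, it is enough to remember the row number where the current run started, which turns the run length into a parameter instead of a number of variables. A minimal sketch, assuming the Time values have already been read into a list of floats; `first_run` is a hypothetical helper name, not part of either answer:

```python
def first_run(values, n, threshold=40):
    """Return (first, last) 1-based row numbers of the first run of n
    consecutive values below threshold, or None if there is no such run."""
    run_start = None  # row number where the current run began
    for rownum, value in enumerate(values, start=1):
        if value < threshold:
            if run_start is None:
                run_start = rownum
            if rownum - run_start + 1 == n:
                return run_start, rownum  # stop as soon as the run is long enough
        else:
            run_start = None  # run broken, start counting again
    return None


# Time column from the sample table in the question
times = [23, 45, 67, 23, 11, 45, 123, 34]
print(first_run(times, 2))  # (4, 5)
print(first_run(times, 5))  # None
```

Because the function returns as soon as a run reaches length `n`, it also covers the "stop after the first match" requirement.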

I implemented the solution suggested by Bakuriu and it seems to work. But what is the best way to combine numerous tests? I need to check for several conditions, for example:

  • efficiency less than 40 for 10 cycles in a row,
  • capacity less than 40 for 5 cycles in a row,
  • time less than 40 for 25 cycles in a row,
  • and some others...

Right now I open a new csv.reader for every test and run the function. I guess that is not the most efficient way, although it works. Sorry, I am a complete noob.

import csv
import glob

csvfiles = glob.glob('processed_data/*.stat')
for filename in csvfiles:

    flag = []
    flag.append(filename[-12:-5])
    # row_grouper() is the helper from Bakuriu's answer (not shown here)
    with open(filename) as f:
        reader = csv.reader(f)
        for a, row_group in enumerate(row_grouper(reader, 10)):
            if all(float(row[1]) < 40 for row in row_group):
                # a is the 0-based index of the first row in the group
                str1 = "Efficiency is less than 40 in cycles " + str(a+1) + '-' + str(a+10)
                flag.append(str1)
                break  # stop processing other rows

    with open(filename) as f:
        reader = csv.reader(f)
        for b, row_group in enumerate(row_grouper(reader, 5)):
            if all(float(row[3]) < 40 for row in row_group):
                str1 = "Capacity is less than 40 minutes in cycles " + str(b+1) + '-' + str(b+5)
                flag.append(str1)
                break  # stop processing other rows

    with open(filename) as f:
        reader = csv.reader(f)
        for b, row_group in enumerate(row_grouper(reader, 25)):
            if all(float(row[3]) < 40 for row in row_group):
                str1 = "Time is less than 40 in cycles " + str(b+1) + '-' + str(b+25)
                flag.append(str1)
                break  # stop processing other rows

    if len(flag) > 1:
        for i in flag:
            print(i)
        print('\n')
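`row_grouper` comes from Bakuriu's answer, which is not reproduced on this page. The `a+1` to `a+10` indexing implies that it yields overlapping windows advancing one row at a time (rows 1-10, 2-11, ...). A minimal sketch under that assumption; this implementation is hypothetical, not Bakuriu's code:

```python
import itertools
from collections import deque


def row_grouper(rows, n):
    """Hypothetical sliding-window helper: yield every run of n consecutive
    rows as a tuple, advancing one row at a time."""
    it = iter(rows)
    window = deque(itertools.islice(it, n), maxlen=n)
    if len(window) < n:
        return  # fewer than n rows: no complete window
    yield tuple(window)
    for row in it:
        window.append(row)  # the oldest row drops off automatically
        yield tuple(window)


rows = [['1', '23'], ['2', '45'], ['3', '67'], ['4', '23'], ['5', '11']]
for a, group in enumerate(row_grouper(rows, 2)):
    if all(float(row[1]) < 40 for row in group):
        print('rows', a + 1, '-', a + 2)  # rows 4 - 5
        break
```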

Solution

You don't really need to sort your data; just keep track of whether the condition you're looking for has held in each of the last N rows. A fixed-size collections.deque (one created with maxlen) is good for this sort of thing.
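With `maxlen` set, appending to a full deque silently discards the item at the opposite end, so the deque always holds the results for the most recent rows only. A small illustration on the first few Time values from the question, using a window of 3:

```python
from collections import deque

window = deque(maxlen=3)  # holds at most the 3 most recent results
for value in [23, 45, 67, 23, 11]:
    window.append(value < 40)  # True if this row meets the condition
    print(value, list(window))
```

Once the window is full and every entry is True, the last `maxlen` rows all satisfied the condition.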

import csv
from collections import deque
filename = 'table.csv'
GROUP_SIZE = 5
THRESHOLD = 40
cond_deque = deque(maxlen=GROUP_SIZE)  # results for the last GROUP_SIZE rows only

with open(filename) as datafile:
    reader = csv.reader(datafile)  # assume delimiter=','
    next(reader)  # skip header row (works in Python 2 and 3)
    for linenum, row in enumerate(reader, start=1):  # process rows of file
        col0, col1, col4, col5, col6, col23, col24, col25 = (
            float(row[i]) for i in (0, 1, 4, 5, 6, 23, 24, 25))
        cond_deque.append(col1 < THRESHOLD)
        if cond_deque.count(True) == GROUP_SIZE:
            print('lines {}-{} had {} consecutive rows with col1 < {}'.format(
                linenum-GROUP_SIZE+1, linenum, GROUP_SIZE, THRESHOLD))
            break  # found, so stop looking
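The follow-up question (several conditions on the same file) can be handled in a single pass with the same idea: keep one deque per condition and feed every row to all of them. A minimal sketch, assuming each check can be described as a (label, column index, threshold, run length) tuple; `check_conditions` and that tuple layout are hypothetical. For the question's data the checks might be `("Efficiency", 1, 40, 10)`, `("Capacity", 3, 40, 5)` and `("Time", 3, 40, 25)`, with the column indexes copied from the follow-up code; the demo runs on the sample table instead:

```python
import csv
from collections import deque


def check_conditions(rows, checks):
    """Scan the rows once and return {label: (first_line, last_line)} for the
    first run of each check. checks holds (label, column, threshold, n) tuples."""
    windows = {label: deque(maxlen=n) for label, _, _, n in checks}
    found = {}
    for linenum, row in enumerate(rows, start=1):
        for label, col, threshold, n in checks:
            if label in found:
                continue  # already satisfied, stop testing this condition
            window = windows[label]
            window.append(float(row[col]) < threshold)
            if len(window) == n and all(window):
                found[label] = (linenum - n + 1, linenum)
        if len(found) == len(checks):
            break  # every condition met, no need to read the rest
    return found


# The sample table from the question, as CSV text
lines = "Number,Time\n1,23\n2,45\n3,67\n4,23\n5,11\n6,45\n7,123\n8,34".splitlines()
reader = csv.reader(lines)
next(reader)  # skip header row
checks = [("Time < 40, 2 rows", 1, 40, 2),
          ("Time < 50, 3 rows", 1, 50, 3)]
result = check_conditions(reader, checks)
print(result["Time < 40, 2 rows"])  # (4, 5)
print(result["Time < 50, 3 rows"])  # (4, 6)
```

Each condition stops being tested once it is met, and the loop ends early when all of them have been found, so each file is opened and read only once.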
