Getting the max value from each column of a CSV file


Problem description


Could anybody help me solve the following problem? I have tried it on my own and have attached my solution. I used a 2-D list, but I would like a different, more Pythonic solution that does not use one.

Please suggest any other way of doing this.

Q) Consider share prices for N companies, given for each month since 1990 in a CSV file. The file format is as below, with the first line as the header.

Year,Month,Company A, Company B,Company C, .............Company N

1990, Jan, 10, 15, 20, , ..........,50

1990, Feb, 10, 15, 20, , ..........,50

.

.

.

.

2013, Sep, 50, 10, 15............500

The solution should be in this format: a) List, for each company, the year and month in which the share price was highest.

Here is my answer using a 2-D list.

def generate_list(file_path):
    '''Return a list of lists (one per column) containing the file data.'''

    data_list = None   # local variable
    try:
        file_obj = open(file_path, 'r')
        try:
            gen = (line.split(',') for line in file_obj)  # generator yielding one split line at a time until EOF
            for j, line in enumerate(gen):
                if not data_list:
                    # if data_list is None, create a list of n empty lists, where n is the number of columns
                    data_list = [[] for i in range(len(line))]
                    if line[-1].endswith('\n'):
                        line[-1] = line[-1][:-1]  # remove the trailing '\n' from the last header field

                # convert the price columns to float and leave everything else as strings
                for i, l in enumerate(line):
                    if i >= 2 and j >= 1:
                        data_list[i].append(float(l))
                    else:
                        data_list[i].append(l)
        except IOError as io_except:
            print(io_except)
        finally:
            file_obj.close()
    except IOError as io_exception:
        print(io_exception)

    return data_list

def generate_result(file_path):
    '''
        Return a list of tuples containing (max price, year, month,
        company name).
    '''
    data_list = generate_list(file_path)
    re = []   # results stored as tuples: [(max_price, year, month, company_name), ...]
    if data_list:
        for i, d in enumerate(data_list):
            if i >= 2:
                m = max(data_list[i][1:])      # max_price for the company (index 0 is the name)
                idx = data_list[i].index(m)    # index of max_price in the column
                yr = data_list[0][idx]         # year at the index of max_price
                mon = data_list[1][idx]        # month at the index of max_price
                com = data_list[i][0]          # company_name
                re.append((m, yr, mon, com))
    return re   # empty list if the file could not be read


if __name__ == '__main__':
    file_path = 'C:/Document and Settings/RajeshT/Desktop/nothing/imp/New Folder/tst.csv'
    re = generate_result(file_path)
    print('result', re)

I have also tried to solve it with a generator, but in that case it gave a result for only one company, i.e. only one column.

p = 'filepath.csv'

f = open(p,'r')
head = f.readline()
gen = ((float(line.split(',')[n]), line.split(',',2)[0:2], head.split(',')[n]) for n in range(2,len(head.split(','))) for i,line in enumerate(f))
x = max((i for i in gen),key=lambda x:x[0])
print(x)

You can use the input data below, which is in CSV format.

year,month,company 1,company 2,company 3,company 4,company 5
1990,jan,201,245,243,179,133
1990,feb,228,123,124,121,180
1990,march,63,13,158,88,79
1990,april,234,68,187,67,135
1990,may,109,128,46,185,236
1990,june,53,36,202,73,210
1990,july,194,38,48,207,72
1990,august,147,116,149,93,114
1990,september,51,215,15,38,46
1990,october,16,200,115,205,118
1990,november,241,86,58,183,100
1990,december,175,97,143,77,84
1991,jan,190,68,236,202,19
1991,feb,39,209,133,221,161
1991,march,246,81,38,100,122
1991,april,37,137,106,138,26
1991,may,147,48,182,235,47
1991,june,57,20,156,38,245
1991,july,165,153,145,70,157
1991,august,154,16,162,32,21
1991,september,64,160,55,220,138
1991,october,162,72,162,222,179
1991,november,215,207,37,176,30
1991,december,106,153,31,247,69

The expected output is as follows.

[(246.0, '1991', 'march', 'company 1'),
 (245.0, '1990', 'jan', 'company 2'),
 (243.0,   '1990', 'jan', 'company 3'),
 (247.0, '1991', 'december', 'company 4'),
 (245.0, '1991', 'june', 'company 5')]

Thanks in advance...

Solution

Using collections.OrderedDict and collections.namedtuple:

import csv
from collections import OrderedDict, namedtuple

with open('abc1') as f:
    reader = csv.reader(f)
    tup = namedtuple('tup', ['price', 'year', 'month'])
    d = OrderedDict()
    names = next(reader)[2:]
    for name in names:
        #initialize the dict
        d[name] = tup(0, 'year', 'month')
    for row in reader:
        year, month = row[:2]         # Use year, month, *prices = row in py3.x
        for name, price in zip(names, map(int, row[2:])): # map(int, prices) py3.x
            if d[name].price < price:
                d[name] = tup(price, year, month)
print(d)

Output:

OrderedDict([
('company 1', tup(price=246, year='1991', month='march')),
('company 2', tup(price=245, year='1990', month='jan')),
('company 3', tup(price=243, year='1990', month='jan')),
('company 4', tup(price=247, year='1991', month='december')),
('company 5', tup(price=245, year='1991', month='june'))])
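
For readers on Python 3, the same single-pass idea can be sketched with plain tuples, so that the result matches the list-of-tuples format requested in the question. This is only an illustrative variant, not part of the original answer: the function name `max_per_company` and the inline `SAMPLE` data (a small subset of the question's rows) are assumptions made for the example, and ties are resolved in favour of the earliest row.

```python
import csv
import io

# Hypothetical inline sample (a subset of the question's data),
# used so the sketch runs without a file on disk.
SAMPLE = """year,month,company 1,company 2,company 3
1990,jan,201,245,243
1990,feb,228,123,124
1991,march,246,81,38
"""


def max_per_company(lines):
    """Return [(max_price, year, month, company), ...], one tuple per company column."""
    reader = csv.reader(lines)
    names = next(reader)[2:]      # header row minus the year and month columns
    best = [None] * len(names)    # best[i] holds (price, year, month) for column i
    for row in reader:
        if not row:               # csv.reader yields [] for blank lines
            continue
        year, month, *prices = row
        for i, price in enumerate(map(float, prices)):
            # Strict '>' keeps the earliest row when prices tie.
            if best[i] is None or price > best[i][0]:
                best[i] = (price, year, month)
    # Append the company name to each (price, year, month) tuple.
    return [b + (name,) for b, name in zip(best, names) if b is not None]


result = max_per_company(io.StringIO(SAMPLE))
for item in result:
    print(item)
```

Starting from `None` rather than a price of 0 avoids assuming that every price is positive. With a real file, something like `open(path, newline='')` can be passed in place of the `io.StringIO` wrapper.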
