Find the median from a CSV File using Python

Question

I have a CSV file named 'salaries.csv'. The content of the file is as follows:

City,Job,Salary
Delhi,Doctors,500
Delhi,Lawyers,400
Delhi,Plumbers,100
London,Doctors,800
London,Lawyers,700
London,Plumbers,300
Tokyo,Doctors,900
Tokyo,Lawyers,800
Tokyo,Plumbers,400
Lawyers,Doctors,300
Lawyers,Lawyers,400
Lawyers,Plumbers,500
Hong Kong,Doctors,1800
Hong Kong,Lawyers,1100
Hong Kong,Plumbers,1000
Moscow,Doctors,300
Moscow,Lawyers,200
Moscow,Plumbers,100
Berlin,Doctors,800
Berlin,Plumbers,900
Paris,Doctors,900
Paris,Lawyers,800
Paris,Plumbers,500
Paris,Dog catchers,400

I need to print the median salary for each profession. I tried the following code, but it raises an error.
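For reference, the median of a list is its middle value after sorting when the count is odd, and the mean of the two middle values when the count is even. A minimal Python 3 sketch of that definition follows; `median_of` is a hypothetical helper name used only for illustration (the standard library's `statistics.median` does the same job).

```python
def median_of(values):
    """Hypothetical helper: return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)          # work on a sorted copy
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values


print(median_of([3, 1, 2]))      # 2
print(median_of([4, 1, 3, 2]))   # 2.5
```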

My code is:

from StringIO import StringIO
import sqlite3
import csv
import operator #from operator import itemgetter, attrgetter

data = open('sal.csv', 'r').read()
string = ''.join(data)
f = StringIO(string)
reader = csv.reader(f)
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table data (City text, Job text, Salary real)''')
conn.commit()
count = 0

for e in reader:
    if count==0:
        print ""
    else:
        e[0]=str(e[0])
        e[1]=str(e[1])
        e[2] = float(e[2])
        c.execute("""insert into data values (?,?,?)""", e)
        count=count+1
        conn.commit()

labels = []
counts = []
count = 0
c.execute('''select count(Salary),Job from data group by Job''')

for row in c:
      for i in row:
            if count==0:
               counts.append(i)
               count=count+1
           else:
                count=0
      labels.append(i)

c.execute('''select Salary,Job from data order by Job''')

count = 1
count1 = 1
temp = 0
pri = 0
lis = []

for row in c:
      lis.append(row)
for cons in counts:
      if cons%2 == 0:
         pri = cons/2
     else:
         pri = (cons+1)/2
     if count1 == 1:
        for li in lis:
              if count == pri:
                  print "Median is ",li
        count = count + 1
        count = 0
        temp = pri+cons
     else:
        for li in lis:
              if count == temp:
                  print "Median is",li
              count = count+1
              count = 0
              temp = temp + pri
       count1 = count1 + 1

However, it raises the following error:

IndentationError('expected an indented block', ('', 28, 2, 'if count==0:\n'))
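The message means Python reached a line that should have started an indented block (the body of a `for`, `if` or `else`) but found it at the same or a lower indentation level. In the code as posted, the `else:` paired with `if count==0:` inside `for i in row:` is also not aligned with its `if`, which Python reports with a related message, "unindent does not match any outer indentation level". The Python 3 sketch below compiles two made-up snippets that mimic those two mistakes and reports the error; `indentation_error` is a hypothetical helper, and the exact wording varies slightly by version (3.10 and later append text such as "after 'for' statement on line 1").

```python
# Snippet 1: the body of the for loop is not indented at all.
missing_indent = (
    "for e in [1, 2]:\n"
    "if e == 1:\n"
    "    print(e)\n"
)

# Snippet 2: 'else:' is indented 3 spaces, matching neither level 0 nor level 4.
bad_dedent = (
    "for row in [1]:\n"
    "    if row:\n"
    "        x = 1\n"
    "   else:\n"
    "        x = 0\n"
)


def indentation_error(source):
    """Compile (without running) source; return (message, line) of any IndentationError, else None."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:
        return exc.msg, exc.lineno
    return None


print(indentation_error(missing_indent))
print(indentation_error(bad_dedent))
```

Fixing the real file means re-indenting every block with one consistent unit (four spaces is the usual convention) and not mixing tabs with spaces.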

How can I fix this error?
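Once every block is indented consistently, the question's SQLite idea can work. The following is a Python 3 sketch under a few assumptions: the CSV text comes from an `io.StringIO` holding a few rows copied from the question (use `open('salaries.csv', newline='')` for the real file), `median_salaries` is a hypothetical function name, and because SQLite as usually bundled with Python has no MEDIAN aggregate, each job's salaries are fetched in sorted order and the median is computed in Python with `statistics.median`.

```python
import csv
import io
import sqlite3
import statistics
from itertools import groupby

# A few rows from the question, in the same City,Job,Salary format.
SAMPLE = """City,Job,Salary
London,Doctors,800
London,Lawyers,700
London,Plumbers,300
Paris,Doctors,900
Paris,Lawyers,800
Paris,Plumbers,500
Paris,Dog catchers,400
"""


def median_salaries(csv_file):
    """Load rows into an in-memory SQLite table and return {job: median salary}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (City TEXT, Job TEXT, Salary REAL)")

    reader = csv.reader(csv_file)
    next(reader)  # skip the header row instead of tracking it with a counter
    conn.executemany(
        "INSERT INTO data VALUES (?, ?, ?)",
        ((city, job, float(salary)) for city, job, salary in reader),
    )

    # Rows arrive grouped by Job, so groupby yields one run of salaries per job.
    rows = conn.execute("SELECT Job, Salary FROM data ORDER BY Job, Salary")
    result = {
        job: statistics.median(salary for _, salary in group)
        for job, group in groupby(rows, key=lambda row: row[0])
    }
    conn.close()
    return result


for job, median in median_salaries(io.StringIO(SAMPLE)).items():
    print(job, "median is", median)
```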

Recommended answer

You can use a defaultdict to collect all the salaries for each profession and then take the median of each list (the code below also prints the average).

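import csv
from collections import defaultdict

# Python 2 code: reader.next(), iteritems() and print statements
with open("C:/Users/jimenez/Desktop/a.csv","r") as f:
    d = defaultdict(list)               # profession -> list of salaries
    reader = csv.reader(f)
    reader.next()                       # skip the header row
    for row in reader:
        d[row[1]].append(float(row[2]))

for k,v in d.iteritems():
    # sorted(v)[len(v) // 2] is the middle value for an odd count;
    # for an even count it is the upper of the two middle values
    print "{} median is {}".format(k,sorted(v)[len(v) // 2])
    print "{} average is {}".format(k,sum(v)/len(v))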

Output

Plumbers median is 500.0
Plumbers average is 475.0
Lawyers median is 700.0
Lawyers average is 628.571428571
Dog catchers median is 400.0
Dog catchers average is 400.0
Doctors median is 800.0
Doctors average is 787.5
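A Python 3 sketch of the same defaultdict approach, under two stated assumptions: the question's data is embedded through `io.StringIO` so the example is self-contained (use `open("salaries.csv", newline="")` for a real file), and `statistics.median` is swapped in for `sorted(v)[len(v) // 2]`. The swap matters when a profession has an even number of salaries: the answer's expression returns the upper of the two middle values, while the usual definition averages them. `salaries_by_job` is a hypothetical function name.

```python
import csv
import io
import statistics
from collections import defaultdict

# The sample data from the question.
DATA = """City,Job,Salary
Delhi,Doctors,500
Delhi,Lawyers,400
Delhi,Plumbers,100
London,Doctors,800
London,Lawyers,700
London,Plumbers,300
Tokyo,Doctors,900
Tokyo,Lawyers,800
Tokyo,Plumbers,400
Lawyers,Doctors,300
Lawyers,Lawyers,400
Lawyers,Plumbers,500
Hong Kong,Doctors,1800
Hong Kong,Lawyers,1100
Hong Kong,Plumbers,1000
Moscow,Doctors,300
Moscow,Lawyers,200
Moscow,Plumbers,100
Berlin,Doctors,800
Berlin,Plumbers,900
Paris,Doctors,900
Paris,Lawyers,800
Paris,Plumbers,500
Paris,Dog catchers,400
"""


def salaries_by_job(csv_file):
    """Group the Salary column by Job: {job: [salary, ...]}."""
    by_job = defaultdict(list)
    reader = csv.reader(csv_file)
    next(reader)  # Python 3 spelling of reader.next(): skip the header row
    for row in reader:
        by_job[row[1]].append(float(row[2]))
    return by_job


d = salaries_by_job(io.StringIO(DATA))
for job, salaries in d.items():  # items() replaces Python 2's iteritems()
    print("{} median is {}".format(job, statistics.median(salaries)))
    print("{} average is {}".format(job, statistics.mean(salaries)))
```

With this data the two median calculations differ only for Plumbers: its eight salaries sort to 100, 100, 300, 400, 500, 500, 900, 1000, so `statistics.median` returns 450.0 while `sorted(v)[len(v) // 2]` returns 500.0. For Doctors both give 800.0, because the two middle values are both 800.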
