How to calculate the mode for a field in a CSV file?

Problem description

I have this text file:

Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes

I want to calculate the median for each category. For example, I want to calculate the mode of sellerRating. This is what I have so far (I also needed the averages, which I managed to compute):

import csv
import locale
import statistics
from pprint import pprint, pformat

locale.setlocale(locale.LC_ALL, 'Dutch_Netherlands.1252')

avg_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
averages = {avg_name: 0 for avg_name in avg_names}


num_values = 0
with open('bijlage.txt', newline='') as bestand:
    csvreader = csv.DictReader(bestand, delimiter=';')
    for row in csvreader:
        num_values += 1
        for avg_name in avg_names:
            averages[avg_name] += locale.atof(row[avg_name])


for avg_name, total in averages.items():
    averages[avg_name] = total / num_values

print('raw results:')
pprint(averages)

print()
print('Averages:')
for avg_name in avg_names:
    rounded = locale.format_string('%.2f', round(averages[avg_name], 2),
                           grouping=True)
    print('  {:<13} {:>10}'.format(avg_name, rounded))

I tried this:

from statistics import mode
mode(averages)

But that does not work, and I am stuck now. I am a Python beginner, so if you answer my question, could you explain why that is the answer so I can learn?

Recommended answer

Pandas is quite a nice library for this. Install it with:

pip install pandas

import pandas as pd
df = pd.read_csv('bijlage.csv', delimiter=';', decimal=',')  # 'bijlage.txt' in your case
sellerRating_median = df['sellerRating'].median()
print('Seller rating median: {}'.format(sellerRating_median))

Besides median(), there is also mean() to calculate the average.
You can also use mode() to calculate the mode of the column. Because a column can have several equally frequent values, mode() returns a pandas Series rather than a single number, so use mode()[0] to get the first one.
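If you would rather stay with the standard library, here is a minimal sketch of the same idea. The likely reason mode(averages) did not help is that averages is a dict holding one number per column, so mode() only sees the column names, not the raw values. The mode has to be computed over the list of all values in a column. The helper names parse_number and column_values are made up for this example, the sample rows are a subset of the file in the question, and the decimal comma is handled with a plain string replace instead of locale.atof, which assumes the file has no thousands separators. multimode needs Python 3.8 or newer.

```python
import csv
import io
from collections import defaultdict
from statistics import mode, multimode

# A few rows copied from the question; in practice, open 'bijlage.txt' instead.
SAMPLE = """\
Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes
"""


def parse_number(text):
    # Assumption: decimal comma and no thousands separator, as in the sample.
    return float(text.replace(',', '.'))


def column_values(lines, names):
    # Collect every raw value per column; the mode needs the full list,
    # not a running total like the averages do.
    reader = csv.DictReader(lines, delimiter=';')
    values = defaultdict(list)
    for row in reader:
        for name in names:
            values[name].append(parse_number(row[name]))
    return values


values = column_values(io.StringIO(SAMPLE), ['sellerRating', 'Duration'])

# mode() gives a single most common value per column.
modes = {name: mode(column) for name, column in values.items()}

# multimode() gives every value tied for most common, as a list.
all_modes = {name: multimode(column) for name, column in values.items()}

for name in modes:
    print('{:<13} mode={} all modes={}'.format(name, modes[name], all_modes[name]))
```

To run it on the real file, pass the open file object to column_values in place of io.StringIO(SAMPLE), for example inside with open('bijlage.txt', newline='') as bestand:.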
