How to calculate the mode for a field in a CSV file?
Question
I have this text file:
Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes
I want to calculate the median for each category. For example, I want to calculate the mode of sellerRating. I have this so far (I also needed to calculate the averages, which I managed to do):
import csv
import locale
import statistics
from pprint import pprint, pformat

locale.setlocale(locale.LC_ALL, 'Dutch_Netherlands.1252')

avg_names = 'sellerRating', 'Duration', 'ClosePrice', 'OpenPrice'
averages = {avg_name: 0 for avg_name in avg_names}

num_values = 0
with open('bijlage.txt', newline='') as bestand:
    csvreader = csv.DictReader(bestand, delimiter=';')
    for row in csvreader:
        num_values += 1
        for avg_name in avg_names:
            averages[avg_name] += locale.atof(row[avg_name])

for avg_name, total in averages.items():
    averages[avg_name] = total / num_values

print('raw results:')
pprint(averages)
print()
print('Averages:')
for avg_name in avg_names:
    rounded = locale.format_string('%.2f', round(averages[avg_name], 2),
                                   grouping=True)
    print(' {:<13} {:>10}'.format(avg_name, rounded))
I tried to do this:
from statistics import mode
mode(averages)
But that does not work and I am stuck now. I am a Python beginner, so if you answer my question, could you explain why that is the answer so I can learn?
Recommended answer
Pandas is quite a nice library for this.
pip install pandas
import pandas as pd
df = pd.read_csv('bijlage.csv', delimiter=';', decimal=',') # 'bijlage.txt' in your case
sellerRating_median = df['sellerRating'].median()
print('Seller rating median: {}'.format(sellerRating_median))
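Since the question asks for a value per category, one possible extension is to group by Category before aggregating. This is a minimal sketch, assuming pandas is installed; the inline `sample` string is a hypothetical stand-in for bijlage.txt, shortened from the data in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'bijlage.txt' (a few rows from the question).
sample = """Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes
"""

df = pd.read_csv(io.StringIO(sample), delimiter=';', decimal=',')

# One median per category instead of a single median for the whole column.
median_per_category = df.groupby('Category')['sellerRating'].median()
print(median_per_category)
```

With the real file, replace `io.StringIO(sample)` with the file name.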
Besides median(), there is also mean() to calculate the average.
You can also use mode() to calculate the mode of the column. Because several values can tie for most frequent, mode() returns a pandas Series rather than a single number, so use mode()[0] to get the first one.
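As for why mode(averages) cannot give the desired result: averages is a dict holding one mean per column, so the individual sellerRating values are no longer there to be counted. If you would rather stay with the csv module from the question, one option is to collect the raw values per category and pass each list to statistics.multimode (available from Python 3.8). This is only a sketch; the inline `sample` string and the names `ratings_by_category` and `modes` are illustrative assumptions, not part of the original code.

```python
import csv
import io
from collections import defaultdict
from statistics import multimode

# Hypothetical stand-in for 'bijlage.txt' (a few rows from the question).
sample = """Category;currency;sellerRating;Duration;endDay;ClosePrice;OpenPrice;Competitive?
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Music/Movie/Game;US;3249;5;Mon;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;No
Automotive;US;3115;7;Tue;0,01;0,01;Yes
"""

# Keep every raw rating, grouped by category, instead of a running total.
ratings_by_category = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample), delimiter=';'):
    ratings_by_category[row['Category']].append(int(row['sellerRating']))

# multimode returns a list, because several values can tie for most frequent.
modes = {category: multimode(values)
         for category, values in ratings_by_category.items()}

for category, values in modes.items():
    print(category, values)
```

The same pattern works for the other columns; for ClosePrice and OpenPrice the decimal comma has to be converted first, for example with locale.atof as in the question.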