Writing a function that returns and prints the maximum value out of all the values in a column


Problem description

I have this table:

(Image: a DataFrame table created in Jupyter Notebook.)

This is actually only part of the table.

The complete table is a .csv file; calling .head() displays only the first five rows.

I need to write a function that returns and prints the maximum of all the values in the second column, which is labeled 'Gold'.
The function should return a single string value.

Before writing this question I looked through several sources and tried many ways to solve my problem.

The solution seems like it should be very easy, but unfortunately I have not managed to find it.
(Are there perhaps several alternative solutions to this problem?)

Please help me, I'm totally confused.
Thanks!

Here are all the sources:

http://www.datasciencemadesimple.com/get-maximum-value-column-python-pandas/

And here are all the ways I've tried to solve the problem; some had syntax errors:

1.a: The traditional algorithm for finding the maximum value, as in C: a 'for' loop.

def answer_one():
    row=1
    max_gold = df['Gold'].row  # Setting the initial maximum.
    for col in df.columns:
        if col[:2]=='Gold': # finding the column.
            # now iterating through all the rows, finding finally the absolute maximum:
            for row in df.itertuples():  # I also tried: for row=2 in df.rows:
                if(df['Gold'].row > max_gold)  # I also tried: if(row.Gold > max_gold)
                    max_gold = df['Gold'].row  #  I also tried: max_gold = row.Gold
    return df.max_gold

I had trouble merging the printing into the code above, so I added it separately:

1.b:

for row in df.itertuples():
    print(row.Gold)         # or: print(max_gold)

1.c:

for col in df.columns:
    if col[:2]=='Gold':
        df[df['Gold'].max()]

2.

def answer_one():
    df = pd.DataFrame(columns=['Gold']) # syntax error.
    for row in df.itertuples():    # The same as the separate code section above.
        print(row.Gold)

3.

def answer_one():
    print(df[['Gold']][df.Value == df.Value.max()]) # I don't know if "Value" is a key word or not.

4.

def answer_one():
    return df['Gold'].max() # right syntax, wrong result (not the max value).

5.

def answer_one():
    s=data.max()
    print '%s' % (s['Gold']) # syntax error.

6.a:

def answer_one():
    df.loc[df['Gold'].idxmax()] # right syntax, wrong output (all the column indexes of the table are shown in a column)

6.b:

def answer_one():
    df.loc[:,['Gold']]  # or: df.loc['Gold']
    df['Gold'].max()

Recommended answer

Great first question. I assume you're doing the Python for data science course on Coursera?

As already pointed out, df['Gold'].max() is correct. However, if the column has the wrong data type, it will not return the expected result, so the first thing to do is make sure the column is numeric. You can check this by running df['Gold'].dtype. If the output isn't int64 for this dataset, you can likely correct it by running df.loc[:,'Gold'] = df.loc[:,'Gold'].str.replace(',','').astype(int). After that, df['Gold'].max() will return 1022.
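The effect of the data type can be seen in a minimal sketch. The three-row table below is hypothetical sample data, not the asker's file, and the conversion uses plain column assignment (df['Gold'] = ...) rather than the .loc form quoted above; both of those choices are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: 'Gold' was read as text, with a thousands separator.
df = pd.DataFrame({
    "Country": ["A", "B", "C"],      # placeholder names
    "Gold": ["1,022", "473", "97"],
})

print(df["Gold"].dtype)              # a text dtype, not int64

# On text, max() compares character by character, so "97" beats "1,022".
string_max = df["Gold"].max()

# Strip the separators, convert to integers, then take the numeric maximum.
df["Gold"] = df["Gold"].str.replace(",", "", regex=False).astype(int)
numeric_max = df["Gold"].max()

print(string_max, numeric_max)
```

This is one way the "right syntax, wrong result" symptom from attempt 4 can arise: the call is fine, but the comparison is lexicographic until the column is converted.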

When it comes to the for loop, in this case you can iterate over the values of the Gold series directly, instead of iterating over all the columns and then all the rows. Note that Python uses 0-based indexing: if you use row 1 as the starting point, you get the wrong result whenever the largest value is in the first row (row 0). Also, you index with [index], not .index. So the for loop could look like this:

# Start from the first row; .iloc[0] selects by position, whatever the index labels are.
CurrentMax = df['Gold'].iloc[0]
for value in df['Gold']:
    if value>CurrentMax:
        CurrentMax = value
print(CurrentMax)

Wrapped as a function:

def rowbyrow(df=df):
    # Start from the first row; .iloc[0] selects by position, whatever the index labels are.
    CurrentMax = df['Gold'].iloc[0]
    for value in df['Gold']:
        if value>CurrentMax:
            CurrentMax = value
    #print(CurrentMax) if you want to print the result when running
    return CurrentMax
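Since the question asks for a function that both prints the maximum and returns it as a single string, the idea could be sketched as follows. The sample table and the parameter names frame and column are assumptions for illustration, and the column is assumed to be numeric already.

```python
import pandas as pd

def answer_one(frame, column="Gold"):
    """Print the maximum of a numeric column and return it as a string."""
    result = str(frame[column].max())
    print(result)
    return result

# Hypothetical sample data standing in for the real table.
sample = pd.DataFrame({"Gold": [976, 1022, 473]})
returned = answer_one(sample)
```

Passing the DataFrame in as an argument, rather than relying on a global df, keeps the function usable on any table.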

Regarding point 3, I believe what you're after is the expression below. It filters Gold down to the rows where Gold equals the maximum value. Because you used two brackets around Gold, it returns a DataFrame rather than just the value: df[['Gold']][df.Gold == df.Gold.max()]. With one bracket it returns a Series: df['Gold'][df.Gold == df.Gold.max()]
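The difference between the two bracket forms is easy to check on a small table. The data here is hypothetical, and the final .iloc[0] step for pulling out the bare value is an addition for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data with placeholder country names.
df = pd.DataFrame({"Country": ["A", "B", "C"], "Gold": [976, 1022, 473]})

mask = df.Gold == df.Gold.max()      # True only on the row(s) holding the maximum

as_frame = df[["Gold"]][mask]        # two brackets: a one-column DataFrame
as_series = df["Gold"][mask]         # one bracket: a Series
as_scalar = as_series.iloc[0]        # the bare value from the first matching row

print(type(as_frame).__name__, type(as_series).__name__, as_scalar)
```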

Regarding point 5, the syntax error may be caused by running the code under Python 3. In Python 3, print is a function, so its arguments must be wrapped in parentheses; the code below should work:

s=df.max()
print('%s' % (s['Gold']))

Regarding point 6.a, if you want to output only a specific column, pass that column after the filtering condition (separated by a comma), like below:

df.loc[df['Gold'].idxmax(),'Gold']

If you want to return several columns, you can pass a list, e.g.

df.loc[df['Gold'].idxmax(),['Country','Gold']]
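Both .loc forms can be tried on a small table. The data below is hypothetical, and the Country values are placeholders.

```python
import pandas as pd

# Hypothetical sample data with placeholder country names.
df = pd.DataFrame({"Country": ["A", "B", "C"], "Gold": [976, 1022, 473]})

top_label = df["Gold"].idxmax()                   # index label of the row with the largest Gold
top_gold = df.loc[top_label, "Gold"]              # a single value
top_row = df.loc[top_label, ["Country", "Gold"]]  # a Series holding the two selected columns

print(top_gold)
print(top_row)
```

Without the column argument, df.loc[top_label] returns the whole row as a Series, which matches the output described in attempt 6.a, where every column of the table is listed.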

For point 1.c, [:2] returns only the first two characters, so the comparison with the four-letter word Gold is always false.
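A short sketch shows why the slice comparison never matches. The list of column names is an assumption for illustration.

```python
# Hypothetical column names.
columns = ["Country", "Gold", "Silver"]

# A two-character slice such as "Go" can never equal the four-character "Gold".
slice_matches = [c for c in columns if c[:2] == "Gold"]

# Comparing the whole name, or testing a prefix, finds the column.
exact_matches = [c for c in columns if c == "Gold"]
prefix_matches = [c for c in columns if c.startswith("Gold")]

print(slice_matches, exact_matches, prefix_matches)
```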

Some performance comparisons:

1.

%%timeit
df.loc[df['Gold'].idxmax(),'Gold']
10000 loops, best of 3: 76.6 µs per loop

2.

%%timeit
s=df.max()
'%s' % (s['Gold'])
1000 loops, best of 3: 733 µs per loop

3.

%%timeit
rowbyrow()
10000 loops, best of 3: 71 µs per loop

4.

%%timeit
df['Gold'].max()
10000 loops, best of 3: 106 µs per loop

I was surprised to see that the function rowbyrow() was the fastest.

After creating a series with 10k random values, rowbyrow() was still the fastest.

See here:

df = pd.DataFrame((np.random.rand(10000, 1)), columns=['Gold']) 

%%timeit  # no. 1
df['Gold'].max()

The slowest run took 10.30 times longer than the fastest.   
10000 loops, best of 3: 127 µs per loop


%%timeit  # no. 2
rowbyrow()

The slowest run took 8.12 times longer than the fastest.   
10000 loops, best of 3: 72.7 µs per loop
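One thing to keep in mind when repeating this comparison: Python evaluates default arguments once, when the function is defined, so a function declared as rowbyrow(df=df) keeps iterating over the DataFrame that existed at definition time unless it is redefined after df is reassigned. The sketch below sidesteps that by passing the frame explicitly and timing with the standard timeit module outside a notebook. The parameter name frame, the seed and the repeat count are assumptions, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

def rowbyrow(frame):
    """Find the maximum of the 'Gold' column with an explicit Python loop."""
    current_max = frame["Gold"].iloc[0]
    for value in frame["Gold"]:
        if value > current_max:
            current_max = value
    return current_max

# 10k random values, as in the comparison above (seeded for repeatability).
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.random((10_000, 1)), columns=["Gold"])

# Time 20 calls of each approach on the same 10k-row frame.
loop_seconds = timeit.timeit(lambda: rowbyrow(big), number=20)
builtin_seconds = timeit.timeit(lambda: big["Gold"].max(), number=20)

print(f"loop:    {loop_seconds:.4f} s for 20 calls")
print(f"builtin: {builtin_seconds:.4f} s for 20 calls")
```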
