Finding specific strings within a column and finding the max corresponding to that string


Question

I'm wondering:

1.) How do I find a specific string in a column?
2.) Given that string, how would I find its corresponding max?
3.) How do I count the number of strings for each row in that column?

I have a CSV file called sports.csv:

import pandas as pd
import numpy as np

# Load the data into a data frame
X = pd.read_csv('sports.csv')

The two columns of interest are the Total and Gym columns:

 Total  Gym
40  Football|Baseball|Hockey|Running|Basketball|Swimming|Cycling|Volleyball|Tennis|Ballet
37  Baseball|Tennis
61  Basketball|Baseball|Ballet
12  Swimming|Ballet|Cycling|Basketball|Volleyball|Hockey|Running|Tennis|Baseball|Football
78  Swimming|Basketball
29  Baseball|Tennis|Ballet|Cycling|Basketball|Football|Volleyball|Swimming
31  Tennis
54  Tennis|Football|Ballet|Cycling|Running|Swimming|Baseball|Basketball|Volleyball
33  Baseball|Hockey|Swimming|Cycling
17  Football|Hockey|Volleyball

Notice that the Gym column holds multiple strings, one for each sport the gym offers. I'm trying to find all of the gyms that have Baseball and, among those, the one with the max Total. However, I'm only interested in gyms that have at least two other sports, i.e. I wouldn't want to consider:

  Total   Gym
  37    Baseball|Tennis


Recommended answer

You could do this easily using pandas.

First, split each string into a list on the pipe (|) delimiter, then keep only the rows whose list has more than two entries, since the criterion is Baseball along with at least two other sports. The surviving lists are re-joined with spaces, and rows that fail the test become empty strings.

In [4]: df['Gym'] = df['Gym'].str.split('|').apply(lambda x: ' '.join([i for i in x if len(x)>2]))

In [5]: df
Out[5]: 
   Total                                                Gym
0     40  Football Baseball Hockey Running Basketball Sw...
1     37                                                   
2     61                         Basketball Baseball Ballet
3     12  Swimming Ballet Cycling Basketball Volleyball ...
4     78                                                   
5     29  Baseball Tennis Ballet Cycling Basketball Foot...
6     31                                                   
7     54  Tennis Football Ballet Cycling Running Swimmin...
8     33                   Baseball Hockey Swimming Cycling
9     17                         Football Hockey Volleyball
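
The same length filter can also be written without overwriting the Gym column, by counting the sports per row and using a boolean mask. This is a minimal sketch, assuming a hand-built frame with six of the question's rows in place of sports.csv; the names `sports_df`, `n_sports` and `multi_sport` are illustrative and not from the original answer.

```python
import pandas as pd

# Stand-in for sports.csv: six of the rows shown in the question.
sports_df = pd.DataFrame({
    "Total": [37, 61, 78, 31, 33, 17],
    "Gym": [
        "Baseball|Tennis",
        "Basketball|Baseball|Ballet",
        "Swimming|Basketball",
        "Tennis",
        "Baseball|Hockey|Swimming|Cycling",
        "Football|Hockey|Volleyball",
    ],
})

# Number of sports per gym: split on the pipe, then take each list's length.
n_sports = sports_df["Gym"].str.split("|").str.len()

# Keep only gyms listing more than two sports; the Gym strings stay untouched.
multi_sport = sports_df[n_sports > 2]
print(multi_sport)
```

Keeping the original pipe-separated strings intact makes it possible to split them again later, for example to test for an exact sport name.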

Use str.contains to search for the string Baseball in the column Gym.

In [6]: df = df.loc[df['Gym'].str.contains('Baseball')].copy()

In [7]: df
Out[7]: 
   Total                                                Gym
0     40  Football Baseball Hockey Running Basketball Sw...
2     61                         Basketball Baseball Ballet
3     12  Swimming Ballet Cycling Basketball Volleyball ...
5     29  Baseball Tennis Ballet Cycling Basketball Foot...
7     54  Tennis Football Ballet Cycling Running Swimmin...
8     33                   Baseball Hockey Swimming Cycling
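
One caveat worth keeping in mind: str.contains performs a substring search, so a short pattern can match inside a longer sport name. The sketch below contrasts that with an exact membership test on the split lists. It assumes the pipe-separated format from the question; the helper `has_sport` and the sample Series `gyms` are hypothetical names introduced here for illustration.

```python
import pandas as pd

gyms = pd.Series([
    "Baseball|Tennis",
    "Basketball|Baseball|Ballet",
    "Swimming|Basketball",
    "Tennis",
])

def has_sport(gyms, sport):
    """Return a boolean Series: True where `sport` is exactly one of the
    pipe-separated entries of the row."""
    return gyms.str.split("|").apply(lambda sports: sport in sports)

# Substring search: "Ball" matches the row containing "Ballet".
substring_hits = gyms.str.contains("Ball", regex=False)

# Exact search: no gym offers a sport literally named "Ball".
exact_ball = has_sport(gyms, "Ball")

# Exact search for the sport actually asked about.
exact_baseball = has_sport(gyms, "Baseball")

print(substring_hits.tolist())
print(exact_ball.tolist())
print(exact_baseball.tolist())
```

For the data in the question the two approaches agree on Baseball, so the substring search in the answer is sufficient there; the exact test only matters when one sport name is contained in another.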

Compute the respective string counts.

In [8]: df['Count'] = df['Gym'].str.split().str.len()

Finally, select the row of the dataframe corresponding to the maximum value in the Total column.

In [9]: df.loc[df['Total'].idxmax()]
Out[9]: 
Total                            61
Gym      Basketball Baseball Ballet
Count                             3
Name: 2, dtype: object
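
The steps above can also be folded into a single function. This is a sketch under stated assumptions rather than the answerer's own code: it rebuilds the question's table by hand instead of reading sports.csv, and the function name `best_gym_with` and its `min_sports` parameter are hypothetical.

```python
import pandas as pd

def best_gym_with(df, sport, min_sports=3):
    """Return the row with the highest Total among gyms that offer `sport`
    and list at least `min_sports` sports, or None if no gym qualifies."""
    sports = df["Gym"].str.split("|")
    offers_sport = sports.apply(lambda names: sport in names)
    enough_sports = sports.str.len() >= min_sports
    candidates = df[offers_sport & enough_sports]
    if candidates.empty:
        return None
    return candidates.loc[candidates["Total"].idxmax()]

# The ten rows from the question, rebuilt by hand.
df = pd.DataFrame({
    "Total": [40, 37, 61, 12, 78, 29, 31, 54, 33, 17],
    "Gym": [
        "Football|Baseball|Hockey|Running|Basketball|Swimming|Cycling|Volleyball|Tennis|Ballet",
        "Baseball|Tennis",
        "Basketball|Baseball|Ballet",
        "Swimming|Ballet|Cycling|Basketball|Volleyball|Hockey|Running|Tennis|Baseball|Football",
        "Swimming|Basketball",
        "Baseball|Tennis|Ballet|Cycling|Basketball|Football|Volleyball|Swimming",
        "Tennis",
        "Tennis|Football|Ballet|Cycling|Running|Swimming|Baseball|Basketball|Volleyball",
        "Baseball|Hockey|Swimming|Cycling",
        "Football|Hockey|Volleyball",
    ],
})

best = best_gym_with(df, "Baseball")
print(best["Total"], best["Gym"])
```

With the question's data this selects the same gym as Out[9], the one with Total 61. The default of `min_sports=3` encodes "Baseball plus at least two other sports".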
