Applying regex to a pandas dataframe

Views: 70 | Published: 2020/5/23 21:59:33 | Tags: python regex pandas


Question

I'm having trouble applying a regex function to a column in a pandas dataframe. Here is the head of my dataframe:

               Name   Season          School   G    MP  FGA  3P  3PA    3P%
 74       Joe Dumars  1982-83   McNeese State  29   NaN  487   5    8  0.625   
 84      Sam Vincent  1982-83  Michigan State  30  1066  401   5   11  0.455   
 176  Gerald Wilkins  1982-83     Chattanooga  30   820  350   0    2  0.000   
 177  Gerald Wilkins  1983-84     Chattanooga  23   737  297   3   10  0.300   
 243    Delaney Rudd  1982-83     Wake Forest  32  1004  324  13   29  0.448

I thought I had a pretty good grasp of applying functions to dataframes, so maybe my regex skills are lacking.

Here is what I've put together:

import re

def split_it(year):
    return re.findall('(\d\d\d\d)', year)

df['Season2'] = df['Season'].apply(split_it(x))

TypeError: expected string or buffer

The output would be a column called Season2 that contains the year before the hyphen. I'm sure there's an easier way to do it without regex, but more importantly, I'm trying to figure out what I did wrong.

Thanks in advance for any help.

Recommended answer

When I try (a variant of) your code I get NameError: name 'x' is not defined, and indeed it isn't. Writing split_it(x) calls the function immediately with a name that does not exist, whereas apply expects the function itself.
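To see why the name lookup fails, here is a minimal sketch on toy data. The Series `season` and the variable `error_name` are illustrative assumptions, not part of the original post, and the sketch assumes no variable called `x` exists in the session.

```python
import re
import pandas as pd

def split_it(year):
    # Same logic as in the question: every four-digit run in the string.
    return re.findall(r"(\d\d\d\d)", year)

# Toy stand-in for df['Season'] (assumed data, not the asker's dataframe).
season = pd.Series(["1982-83", "1983-84"], name="Season")

# split_it(x) is evaluated first, before apply() is ever called.
# Since no variable named x exists, Python raises NameError at that point.
try:
    season.apply(split_it(x))  # noqa: F821 (x is intentionally undefined)
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

print(error_name)
```

If a non-string `x` happened to be defined earlier in the session, the same line would fail inside `re.findall` instead, which may be how a TypeError could arise; that is a guess, not something the post confirms.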

You could use either

df['Season2'] = df['Season'].apply(split_it)

or

df['Season2'] = df['Season'].apply(lambda x: split_it(x))

but the second one is just a longer and slower way to write the first one, so there's not much point (unless you have other arguments to handle, which we don't here). Your function will return a list, though:

>>> df["Season"].apply(split_it)
74     [1982]
84     [1982]
176    [1982]
177    [1983]
243    [1982]
Name: Season, dtype: object

although you could easily change that. FWIW, I'd use vectorized string operations and do something like

>>> df["Season"].str[:4].astype(int)
74     1982
84     1982
176    1982
177    1983
243    1982
Name: Season, dtype: int64

or

>>> df["Season"].str.split("-").str[0].astype(int)
74     1982
84     1982
176    1982
177    1983
243    1982
Name: Season, dtype: int64
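
If you would rather keep a regex, the list result can be avoided in a couple of ways. The sketch below is one possible approach, not the only one: the helper `first_year` and the toy Series `season` are hypothetical names introduced here for illustration. It shows a function that returns a single integer via `re.search`, and the vectorized `Series.str.extract`, which returns the first capture group.

```python
import re
import pandas as pd

# Toy stand-in for df['Season'] (assumed data for illustration).
season = pd.Series(["1982-83", "1982-83", "1983-84"], name="Season")

def first_year(text):
    """Return the first four-digit run as an int, or None if there is none."""
    match = re.search(r"\d{4}", text)
    return int(match.group()) if match else None

# Pass the function itself to apply; each element yields a scalar, not a list.
via_apply = season.apply(first_year)

# Vectorized regex: extract the first capture group, then convert to int.
# expand=False makes a single capture group come back as a Series.
via_extract = season.str.extract(r"(\d{4})", expand=False).astype(int)

print(via_apply.tolist())
print(via_extract.tolist())
```

Both variants give the same years on this toy data. The `str.extract` form keeps the work inside pandas, in the same spirit as the slicing and splitting examples above.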
