How to update a DataFrame column based on many string values

Question

I am creating a small financial management program that imports my transactions from a CSV file into Python. I want to assign values to a new column 'category' based on strings found in the 'details' column. I can do it for one string, but how do I do it for a large list of possible strings? For example, rows where str.contains('RALPHS') matches should get the category 'groceries', and so on.

For example, below I have a list of strings:

dining = ['CARLS', 'SUBWAY', 'DOMINOS']

and if any of those strings is found in the 'details' series, the corresponding 'category' value should become 'dining'.

Here is a small runnable example:

import pandas as pd
import numpy as np

data = [
    [-68.23 , 'PAYPAL TRANSFER'],
    [-12.46, 'RALPHS #0079'],
    [-8.51, 'SAVE AS YOU GO'],
    [25.34, 'VENMO CASHOUT'],
    [-2.23 , 'PAYPAL TRANSFER'],
    [-64.29 , 'PAYPAL TRANSFER'],
    [-7.06, 'SUBWAY'],
    [-7.03, 'CARLS JR'],
    [-2.35, 'SHELL OIL'],
    [-35.23, 'CHEVRON GAS']
]

df = pd.DataFrame(data, columns=['amount', 'details'])
df['category'] = np.nan
str_xfer = 'TRANSFER'
df['category'] = df['details'].str.contains(str_xfer).astype(int)
df['category'] = df['category'].replace(to_replace=1, value='transfer')

df

    amount  details             category
0   -68.23  PAYPAL TRANSFER     transfer
1   -12.46  RALPHS #0079        0
2   -8.51   SAVE AS YOU GO      0
3   25.34   VENMO CASHOUT       0
4   -2.23   PAYPAL TRANSFER     transfer
5   -64.29  PAYPAL TRANSFER     transfer
6   -7.06   SUBWAY              0
7   -7.03   CARLS JR            0
8   -2.35   SHELL OIL           0
9   -35.23  CHEVRON GAS         0

Thanks a lot.

Recommended answer

If you have a single value to match, you can use str.extract, which returns the matched substring, or NaN where nothing matches:

df['category'] = df['details'].str.extract(f'({str_xfer})')

   amount          details  category
0  -68.23  PAYPAL TRANSFER  TRANSFER
1  -12.46     RALPHS #0079       NaN
2   -8.51   SAVE AS YOU GO       NaN
3   25.34    VENMO CASHOUT       NaN
4   -2.23  PAYPAL TRANSFER  TRANSFER
5  -64.29  PAYPAL TRANSFER  TRANSFER
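
Note that str.extract stores the matched text itself ('TRANSFER'), not a label of your choosing. If you would rather write a custom label such as 'transfer', one possible sketch is to build a boolean mask with str.contains and assign through .loc. The three-row frame and the name is_transfer below are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Small illustrative frame (a subset of the question's data).
df = pd.DataFrame({"details": ["PAYPAL TRANSFER", "RALPHS #0079", "VENMO CASHOUT"]})

# Boolean mask: True where the details text contains the keyword.
# regex=False treats the keyword as a literal substring.
is_transfer = df["details"].str.contains("TRANSFER", regex=False)

# Start with missing values, then label only the matching rows.
df["category"] = pd.NA
df.loc[is_transfer, "category"] = "transfer"

print(df)
```

Rows that do not match keep a missing value, so they can be filled in by later rules.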


If you have multiple strings to match, first join them with |, which is the "or" (alternation) operator in regular expressions.

str_xfer = ['TRANSFER', 'RALPHS', 'CASHOUT']
str_xfer = '|'.join(str_xfer)

df['category'] = df['details'].str.extract(f'({str_xfer})')

   amount          details  category
0  -68.23  PAYPAL TRANSFER  TRANSFER
1  -12.46     RALPHS #0079    RALPHS
2   -8.51   SAVE AS YOU GO       NaN
3   25.34    VENMO CASHOUT   CASHOUT
4   -2.23  PAYPAL TRANSFER  TRANSFER
5  -64.29  PAYPAL TRANSFER  TRANSFER
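
The question asks for category names such as 'groceries' or 'dining' rather than the matched keyword itself. One way to extend the approach above, offered as a sketch and not as part of the original answer, is to keep a dictionary from category to keywords, extract the matched keyword, and then map it back to its category. The categories dictionary and its groupings are assumptions made for illustration; re.escape is added in case a keyword contains regex metacharacters.

```python
import re
import pandas as pd

df = pd.DataFrame(
    {
        "amount": [-68.23, -12.46, -8.51, 25.34, -7.06, -7.03, -2.35],
        "details": [
            "PAYPAL TRANSFER",
            "RALPHS #0079",
            "SAVE AS YOU GO",
            "VENMO CASHOUT",
            "SUBWAY",
            "CARLS JR",
            "SHELL OIL",
        ],
    }
)

# Hypothetical category -> keywords mapping; adjust to your own data.
categories = {
    "transfer": ["TRANSFER", "CASHOUT"],
    "groceries": ["RALPHS"],
    "dining": ["CARLS", "SUBWAY", "DOMINOS"],
}

# Reverse lookup: keyword -> category.
keyword_to_category = {
    keyword: category
    for category, keywords in categories.items()
    for keyword in keywords
}

# One alternation pattern wrapped in a single capture group.
pattern = "(" + "|".join(re.escape(k) for k in keyword_to_category) + ")"

# expand=False returns a Series with the matched keyword (NaN if none).
matched = df["details"].str.extract(pattern, expand=False)

# Translate each matched keyword into its category; NaN stays NaN.
df["category"] = matched.map(keyword_to_category)

print(df)
```

Rows with no matching keyword (here 'SAVE AS YOU GO' and 'SHELL OIL') are left as NaN, which makes uncategorized transactions easy to find with df['category'].isna().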
