Pandas and apply function to match a string


Problem description

I have a df column containing various links, some of which contain the string "search".

I want to create a function that, when applied to the column, returns a column containing "Search" or "Other".

I wrote a function like this:

search = 'search'

def page_type(x):
    if x.str.contains(search):
        return 'Search'
    else:
        return 'Other'

df['link'].apply(page_type)

but it gives me an error like:

AttributeError: 'unicode' object has no attribute 'str'

I guess I'm missing something when calling str.contains().

Recommended answer

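The error occurs because Series.apply passes each element to the function as a plain string, and a plain string has no .str accessor. Inside the function, test with the in operator instead. Alternatively, I think you need numpy.where, which works on the whole column at once: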

import numpy as np
import pandas as pd

df = pd.DataFrame({'link':['search','homepage d','login dd', 'profile t', 'ff']})

print (df)
         link
0      search
1  homepage d
2    login dd
3   profile t
4          ff

search = 'search'
profile = 'profile'
homepage = 'homepage'
login = "login"

def page_type(x):
    if search in x:
        return 'Search'
    elif profile in x:
        return 'Profile'
    elif homepage in x:
        return 'Homepage'
    elif login in x:
        return 'Login'
    else:
        return 'Other'  

df['link_new'] = df['link'].apply(page_type)

df['link_type'] = np.where(df.link.str.contains(search),'Search', 
                  np.where(df.link.str.contains(profile),'Profile', 
                  np.where(df.link.str.contains(homepage), 'Homepage', 
                  np.where(df.link.str.contains(login),'Login','Other')))) 


print (df)
         link  link_new link_type
0      search    Search    Search
1  homepage d  Homepage  Homepage
2    login dd     Login     Login
3   profile t   Profile   Profile
4          ff     Other     Other
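
The nested numpy.where calls above get harder to read as categories are added. As a hedged alternative that is not part of the original answer, the same labelling can be sketched with numpy.select, which takes a list of conditions and a matching list of labels. The rules list and its variable names are illustrative assumptions; regex=False is used so that each pattern is treated as a literal substring.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'link': ['search', 'homepage d', 'login dd', 'profile t', 'ff']})

# Hypothetical (substring, label) pairs; they are checked in order and the first match wins.
rules = [
    ('search', 'Search'),
    ('profile', 'Profile'),
    ('homepage', 'Homepage'),
    ('login', 'Login'),
]

# One boolean Series per rule; regex=False matches the pattern as a literal substring.
conditions = [df['link'].str.contains(pattern, regex=False) for pattern, _ in rules]
labels = [label for _, label in rules]

# Rows that match no rule fall back to the default label.
df['link_type'] = np.select(conditions, labels, default='Other')
print(df)
```

Adding a category then only means adding one pair to rules, rather than another level of nesting.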

Timings:

#[5000 rows x 1 columns]
df = pd.DataFrame({'link':['search','homepage d','login dd', 'profile t', 'ff']})
df = pd.concat([df]*1000).reset_index(drop=True)

In [346]: %timeit df['link'].apply(page_type)
1000 loops, best of 3: 1.72 ms per loop

In [347]: %timeit np.where(df.link.str.contains(search),'Search', np.where(df.link.str.contains(profile),'Profile', np.where(df.link.str.contains(homepage), 'Homepage', np.where(df.link.str.contains(login),'Login','Other'))))
100 loops, best of 3: 11.7 ms per loop
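
The %timeit lines above only run inside IPython. The following is a minimal sketch for repeating the comparison in a plain Python script with the standard timeit module. The helper names with_apply and with_where are assumptions made for illustration, and the numbers it prints will depend on the machine and library versions, so they are not expected to match the figures quoted above.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'link': ['search', 'homepage d', 'login dd', 'profile t', 'ff']})
df = pd.concat([df] * 1000).reset_index(drop=True)  # 5000 rows


def page_type(x):
    # x is a plain Python string here, so `in` performs the substring test.
    if 'search' in x:
        return 'Search'
    elif 'profile' in x:
        return 'Profile'
    elif 'homepage' in x:
        return 'Homepage'
    elif 'login' in x:
        return 'Login'
    return 'Other'


def with_apply():
    # Element-wise approach: one Python function call per row.
    return df['link'].apply(page_type)


def with_where():
    # Column-wise approach: four str.contains passes over the whole column.
    link = df['link']
    return np.where(link.str.contains('search'), 'Search',
           np.where(link.str.contains('profile'), 'Profile',
           np.where(link.str.contains('homepage'), 'Homepage',
           np.where(link.str.contains('login'), 'Login', 'Other'))))


# Both approaches should label every row identically.
assert list(with_apply()) == list(with_where())

# Best of 3 repeats of 20 calls each, reported per call in milliseconds.
for func in (with_apply, with_where):
    best = min(timeit.repeat(func, number=20, repeat=3)) / 20
    print(f'{func.__name__}: {best * 1e3:.2f} ms per call')
```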
