Splitting a string in a Python DataFrame

Question

I have a DataFrame in Python with a column with names (such as Joseph Haydn, Wolfgang Amadeus Mozart, Antonio Salieri and so forth).

I want to get a new column with the last names: Haydn, Mozart, Salieri and so forth.

I know how to split a string, but I could not find a way to apply it to a Series or to a DataFrame column.

Answer

If you have:

import pandas
data = pandas.DataFrame({"composers": [ 
    "Joseph Haydn", 
    "Wolfgang Amadeus Mozart", 
    "Antonio Salieri",
    "Eumir Deodato"]})

Assuming you want only the first name (and not a middle name like Amadeus):

data.composers.str.split(r'\s+').str[0]

gives:

0      Joseph
1    Wolfgang
2     Antonio
3       Eumir
dtype: object
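
It can help to look at the intermediate result: `str.split` turns each string into a list of words, and `.str[0]` then indexes into every list element-wise. A minimal sketch reusing the same sample data (the names `parts` and `first` are only illustrative):

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Each element of `parts` is a list of the words in one name
parts = data.composers.str.split(r"\s+")
print(parts.iloc[1])

# .str[0] picks the first item of every list, row by row
first = parts.str[0]
print(first.tolist())
```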

You can assign this to a new column in the same DataFrame:

data['firstnames'] = data.composers.str.split(r'\s+').str[0]
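
Putting the pieces together, here is a self-contained sketch that splits once and stores both the first and the last name as new columns. The column name `lastnames` is an assumption, chosen only to mirror `firstnames`:

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Split once and reuse the resulting Series of lists for both columns
parts = data.composers.str.split(r"\s+")
data["firstnames"] = parts.str[0]
data["lastnames"] = parts.str[-1]  # hypothetical column name

print(data)
```

Reusing `parts` avoids running the split twice on large frames.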

The last names would be:

data.composers.str.split(r'\s+').str[-1]

gives:

0      Haydn
1     Mozart
2    Salieri
3    Deodato
dtype: object
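
Since the question mentions already knowing how to split a single string, note that the built-in `str.split` can also be applied row by row with `Series.apply`. This is an alternative to the vectorized `.str` accessor, not what the answer itself uses; the variable names are illustrative:

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Row by row: call the built-in str.split() on each name, keep the last word
last_via_apply = data.composers.apply(lambda name: name.split()[-1])

# Vectorized: with no pattern, .str.split() also splits on runs of whitespace;
# .str.get(-1) is the method form of .str[-1]
last_via_str = data.composers.str.split().str.get(-1)

print(last_via_apply.tolist())
print(last_via_str.tolist())
```

One practical difference: the `.str` version propagates missing values (NaN stays NaN), whereas the lambda raises an error on a non-string entry.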

(See also the question "Python Pandas: selecting element in array column" for accessing elements in an 'array' column.)

For everything except the last name, you can apply " ".join(..) to all but the last element ([:-1]) of each row:

data.composers.str.split(r'\s+').str[:-1].apply(lambda parts: " ".join(parts))

gives:

0              Joseph
1    Wolfgang Amadeus
2             Antonio
3               Eumir
dtype: object
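
An alternative that separates the last name from everything before it in a single call is `str.rsplit` with `n=1` and `expand=True`; `Series.str.join` can also replace the `apply(lambda ...)` above. A sketch under these assumptions, where the column names `given` and `last` are hypothetical:

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Split from the right at most once, so middle names stay with the first name;
# expand=True returns a DataFrame with integer columns 0 and 1
split = data.composers.str.rsplit(n=1, expand=True)
data["given"] = split[0]  # hypothetical column name
data["last"] = split[1]   # hypothetical column name

# Vectorized join of each list of words, instead of apply(lambda parts: ...)
given_via_join = data.composers.str.split(r"\s+").str[:-1].str.join(" ")

print(data)
```

With this approach a one-word name would leave a missing value in the `last` column, so check the data if that case can occur.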