Splitting a string in a Python DataFrame
Question
I have a DataFrame in Python with a column of names (such as Joseph Haydn, Wolfgang Amadeus Mozart, Antonio Salieri, and so forth).
I want to get a new column with the last names: Haydn, Mozart, Salieri, and so forth.
I know how to split a string, but I could not find a way to apply it to a Series or a DataFrame column.
Recommended answer
Given:
import pandas
data = pandas.DataFrame({"composers": [
"Joseph Haydn",
"Wolfgang Amadeus Mozart",
"Antonio Salieri",
"Eumir Deodato"]})
Assuming you want only the first name (and not a middle name like Amadeus):
data.composers.str.split(r'\s+').str[0]
gives:
0 Joseph
1 Wolfgang
2 Antonio
3 Eumir
dtype: object
You can assign this to a new column in the same DataFrame:
data['firstnames'] = data.composers.str.split(r'\s+').str[0]
The last names would be:
data.composers.str.split(r'\s+').str[-1]
gives:
0 Haydn
1 Mozart
2 Salieri
3 Deodato
dtype: object
(See also "Python Pandas: selecting element in array column" for accessing elements in an 'array' column.)
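The steps above can be put together in one runnable sketch. It splits each name once and reuses the result for both columns. The column name `lastnames` and the intermediate variable `parts` are not from the original answer and are used here only for illustration.

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Split each name on runs of whitespace; every row becomes a list of tokens.
# The raw string (r'...') avoids Python's invalid-escape warning for '\s'.
parts = data["composers"].str.split(r"\s+")

# .str[i] indexes into each row's list, just like list indexing.
data["firstnames"] = parts.str[0]   # first token of each name
data["lastnames"] = parts.str[-1]   # last token (hypothetical column name)

print(data)
```

Splitting once and indexing twice avoids repeating the regular-expression split for each new column.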
For everything but the last name, you can apply " ".join(..) to all but the last element ([:-1]) of each row:
data.composers.str.split(r'\s+').str[:-1].apply(lambda parts: " ".join(parts))
gives:
0 Joseph
1 Wolfgang Amadeus
2 Antonio
3 Eumir
dtype: object
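As an alternative to the split-then-join approach, the sketch below uses `Series.str.rsplit` with `n=1`, which splits at most once starting from the right, so the last name and everything before it come out in a single pass. This is not part of the original answer; the names `pieces` and `joined` are illustrative, and the approach assumes every name contains at least two whitespace-separated words.

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Split on whitespace at most once, from the right.
# expand=True returns a DataFrame with one column per piece.
pieces = data["composers"].str.rsplit(n=1, expand=True)

given = pieces[0]   # everything except the last name
last = pieces[1]    # the last name

# Equivalent to the apply/lambda version, using the vectorized .str.join:
joined = data["composers"].str.split().str[:-1].str.join(" ")

print(given)
print(last)
```

With single-word names the right-hand piece would be missing, so that case would need separate handling.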