Get last "column" after .str.split() operation on column in pandas DataFrame


Question

I have a column in a pandas DataFrame that I would like to split on a single space. The splitting is simple enough with .str.split(' ') on the column, but I can't make a new column from the last entry. When I .str.split() the column I get a Series of lists, and I don't know how to manipulate this to get a new column for my DataFrame.

Here is an example. Each entry in the column contains 'symbol date price', and I would like to split off the price (and eventually remove the leading "p" or "c", each of which appears in about half the cases).

import pandas as pd
temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
temp2 = temp.ticker.str.split(' ')

This produces:

0    ['spx', '5/25/2001', 'p500']
1    ['spx', '5/25/2001', 'p600']
2    ['spx', '5/25/2001', 'p700']

But temp2[0] just gives the list for the first row, and temp2[:][-1] fails. How can I convert the last entry of each list into a new column? Thanks!

Recommended answer

You could use the tolist method as an intermediary:

In [99]: import pandas as pd

In [100]: d1 = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})

In [101]: d1.ticker.str.split().tolist()
Out[101]: 
[['spx', '5/25/2001', 'p500'],
 ['spx', '5/25/2001', 'p600'],
 ['spx', '5/25/2001', 'p700']]

From this you could make a new DataFrame:

In [102]: d2 = pd.DataFrame(d1.ticker.str.split().tolist(), 
   .....:                   columns="symbol date price".split())

In [103]: d2
Out[103]: 
  symbol       date price
0    spx  5/25/2001  p500
1    spx  5/25/2001  p600
2    spx  5/25/2001  p700
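
As a possible shortcut, .str.split() also accepts expand=True, which returns a DataFrame directly and skips the tolist step. The following is a minimal sketch, assuming a reasonably recent pandas version; the column names are the same ones used above.

```python
import pandas as pd

d1 = pd.DataFrame({'ticker': ['spx 5/25/2001 p500',
                              'spx 5/25/2001 p600',
                              'spx 5/25/2001 p700']})

# expand=True yields one column per split piece instead of a Series of lists
d2 = d1['ticker'].str.split(' ', expand=True)

# The expanded columns are numbered 0, 1, 2; give them readable names
d2.columns = ['symbol', 'date', 'price']
print(d2)
```

Note that this assumes every row splits into exactly three pieces; rows with fewer pieces would be padded with missing values.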

For good measure, you could clean up the price:

In [104]: d2["price"] = d2["price"].str.replace("p","").astype(float)

In [105]: d2
Out[105]: 
  symbol       date  price
0    spx  5/25/2001  500.0
1    spx  5/25/2001  600.0
2    spx  5/25/2001  700.0
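
The question mentions that the prefix is "p" in some rows and "c" in others, which a plain replace of "p" would not cover. One way to handle both is .str.extract with a regular expression. This is only a sketch: the sample values (including 'c600') and the column names kind and strike are made up for illustration.

```python
import pandas as pd

# Hypothetical price strings with a mix of "p" and "c" prefixes
prices = pd.Series(['p500', 'c600', 'p700'])

# Named groups become column names: one for the letter, one for the digits
parts = prices.str.extract(r'^(?P<kind>[pc])(?P<strike>\d+)$')

# The extracted digits are still strings, so convert them to numbers
parts['strike'] = parts['strike'].astype(float)
print(parts)
```

Keeping the letter in its own column, rather than discarding it, preserves the information it carried.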

PS: but if you really just want the last column, apply would suffice:

In [113]: temp2.apply(lambda x: x[-1])
Out[113]: 
0    p500
1    p600
2    p700
Name: ticker, dtype: object
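
The same result can be had without apply, since the .str accessor also supports indexing into each list. Here is a minimal sketch using the DataFrame from the question; the new column name price is an assumption for illustration.

```python
import pandas as pd

temp = pd.DataFrame({'ticker': ['spx 5/25/2001 p500',
                                'spx 5/25/2001 p600',
                                'spx 5/25/2001 p700']})

# .str[-1] picks the last element of each list produced by the split
temp['price'] = temp['ticker'].str.split(' ').str[-1]

# Alternative: split once from the right, so only the last piece is separated
last = temp['ticker'].str.rsplit(' ', n=1).str[-1]
print(temp)
```

Using -1 rather than a fixed position such as 2 keeps this working when rows contain a different number of pieces.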
