Extract values from a column of lists


Question

I have the following data frame, where each value in the sequence column is a list:

id      sequence
001    [A, B, C, E, F]
002    [A, C]
003    []
004    [D]

I want to create two new columns called first and second_to_last: first holds the first element of the list in the sequence column, and second_to_last holds the second-to-last element of that list. I expect the new df to look like this:

id      sequence             first    second_to_last
001    [A, B, C, E, F]        A        E
002    [A, C]                 A        A
003    []                     None     None
004    [D]                    D        None

I tried the following code:

df['first'] = df['sequence'][0]
df['second_to_last'] = df['sequence'][-2]

But I got the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-9-f08abfd1f93c> in <module>()

----> 2 df['first'] = df['sequence'][0]
      3 df['second_to_last'] = df['sequence'][-2]
      4 df

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   2427         else:
   2428             # set column
-> 2429             self._set_item(key, value)
   2430 
   2431     def _setitem_slice(self, key, value):

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
   2493 
   2494         self._ensure_valid_index(value)
-> 2495         value = self._sanitize_column(key, value)
   2496         NDFrame._set_item(self, key, value)
   2497 

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/frame.pyc in _sanitize_column(self, key, value, broadcast)
   2664 
   2665             # turn me into an ndarray
-> 2666             value = _sanitize_index(value, self.index, copy=False)
   2667             if not isinstance(value, (np.ndarray, Index)):
   2668                 if isinstance(value, list) and len(value) > 0:

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/series.pyc in _sanitize_index(data, index, copy)
   2877 
   2878     if len(data) != len(index):
-> 2879         raise ValueError('Length of values does not match length of ' 'index')
   2880 
   2881     if isinstance(data, PeriodIndex):

ValueError: Length of values does not match length of index

What is the correct way to extract the values for the first and second_to_last columns? Thanks!

Recommended answer

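Option 1
In pandas, use the str accessor to index into each element of a column of strings or other sequence objects such as lists. Positions that do not exist in a given row come back as NaN.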

df['first'] = df['sequence'].str[0]
df['second_to_last'] = df['sequence'].str[-2]

df
   id         sequence first second_to_last
0   1  [A, B, C, E, F]     A              E
1   2           [A, C]     A              A
2   3               []   NaN            NaN
3   4              [D]     D            NaN
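
To see why the original attempt fails while the accessor version works, here is a minimal self-contained sketch. The data frame is rebuilt by hand from the question's sample, with id stored as strings (an assumption), so dtypes may differ from the asker's data. The key point is that df['sequence'][0] looks up the row labelled 0 and returns that row's whole list, not the first element of every row.

```python
import pandas as pd

# Sample data rebuilt from the question; id is stored as strings here (an assumption).
df = pd.DataFrame({
    "id": ["001", "002", "003", "004"],
    "sequence": [["A", "B", "C", "E", "F"], ["A", "C"], [], ["D"]],
})

# Plain indexing on the Series selects the row labelled 0,
# so this is the entire first list (5 items), not one item per row.
first_row = df["sequence"][0]

# Assigning that 5-item list to a 4-row frame is what raises the ValueError.
try:
    df["first"] = first_row
    assignment_failed = False
except ValueError:
    assignment_failed = True

# The .str accessor indexes into each row's list instead;
# positions that are out of range for a row become NaN.
df["first"] = df["sequence"].str[0]
df["second_to_last"] = df["sequence"].str[-2]
print(df)
```

The try/except is only there to demonstrate the failure; in real code you would use the accessor lines directly.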


Option 2
Another option is to define your own function that retrieves the item at a given index:

import numpy as np

def get_value(d, i):
    # Return the item at position i, or NaN if the list is too short.
    try:
        return d[i]
    except IndexError:
        return np.nan

Loop over df.sequence:

df['first'] = [get_value(d, 0) for d in df.sequence]
df['second_to_last'] = [get_value(d, -2) for d in df.sequence]

df

   id         sequence first second_to_last
0   1  [A, B, C, E, F]     A              E
1   2           [A, C]     A              A
2   3               []   NaN            NaN
3   4              [D]     D            NaN
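
The expected output in the question shows None rather than NaN for missing positions. One way to get closer to that is to let the helper take a fallback value. The sketch below is an illustration only: the name get_value_or and its default parameter are hypothetical and not part of the original answer. How pandas stores or displays None once these lists are assigned to columns can depend on the pandas version and dtype inference, so the example stops at plain Python lists.

```python
import pandas as pd

def get_value_or(seq, i, default=None):
    """Return seq[i], or default when that position does not exist (hypothetical helper)."""
    try:
        return seq[i]
    except IndexError:
        return default

# Same sample lists as in the question.
sequences = pd.Series([["A", "B", "C", "E", "F"], ["A", "C"], [], ["D"]])

# Build plain Python lists first, so the fallback is exactly what the helper returned.
first = [get_value_or(s, 0) for s in sequences]
second_to_last = [get_value_or(s, -2) for s in sequences]
print(first)
print(second_to_last)
```

Passing a different default, for example get_value_or(s, 0, default="missing"), swaps in another placeholder without changing the helper.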
