Extract values from a column of lists
Problem description
I have the following data frame, where each value in the sequence column is a list:
id sequence
001 [A, B, C, E, F]
002 [A, C]
003 []
004 [D]
I want to create two new columns called first and second_to_last: first holds the first element of the list in the sequence column, and second_to_last holds the second-to-last element of that list. I expect the new df to look like this:
id sequence first second_to_last
001 [A, B, C, E, F] A E
002 [A, C] A A
003 [] None None
004 [D] D None
I tried the following code:
df['first'] = df['sequence'][0]
df['second_to_last'] = df['sequence'][-2]
But I got the following error:
There was a problem running this cell
ValueError Length of values does not match length of index
ValueErrorTraceback (most recent call last)
<ipython-input-9-f08abfd1f93c> in <module>()
----> 2 df['first'] = df['sequence'][0]
3 df['second_to_last'] = df['sequence'][-2]
4 df
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
2427 else:
2428 # set column
-> 2429 self._set_item(key, value)
2430
2431 def _setitem_slice(self, key, value):
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/frame.pyc in _set_item(self, key, value)
2493
2494 self._ensure_valid_index(value)
-> 2495 value = self._sanitize_column(key, value)
2496 NDFrame._set_item(self, key, value)
2497
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/frame.pyc in _sanitize_column(self, key, value, broadcast)
2664
2665 # turn me into an ndarray
-> 2666 value = _sanitize_index(value, self.index, copy=False)
2667 if not isinstance(value, (np.ndarray, Index)):
2668 if isinstance(value, list) and len(value) > 0:
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/series.pyc in _sanitize_index(data, index, copy)
2877
2878 if len(data) != len(index):
-> 2879 raise ValueError('Length of values does not match length of ' 'index')
2880
2881 if isinstance(data, PeriodIndex):
ValueError: Length of values does not match length of index
What is the correct way to extract the values for the first and second_to_last columns? Thanks!
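Recommended answer
Option 1
When a pandas column holds strings or other sequence-like objects such as lists, use the str accessor to index into each element: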
df['first'] = df['sequence'].str[0]
df['second_to_last'] = df['sequence'].str[-2]
df
id sequence first second_to_last
0 1 [A, B, C, E, F] A E
1 2 [A, C] A A
2 3 [] NaN NaN
3 4 [D] D NaN
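To see why the original attempt fails while the str accessor works, here is a minimal, self-contained sketch. It assumes the default integer index and rebuilds the sample data by hand; the name first_row is only for illustration. df['sequence'][0] looks up the row labelled 0, which is a single list of five items, so assigning it to a four-row frame triggers the length mismatch, whereas .str[0] indexes into the list in every row.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "id": ["001", "002", "003", "004"],
    "sequence": [["A", "B", "C", "E", "F"], ["A", "C"], [], ["D"]],
})

# Label-based lookup on the Series: this is the list stored in row 0,
# not the first element of every list
first_row = df["sequence"][0]

# Element-wise indexing: applied to the list in each row,
# with a missing value where the position does not exist
df["first"] = df["sequence"].str[0]
df["second_to_last"] = df["sequence"].str[-2]
print(df)
```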
Option 2
Another option is to define your own function that retrieves the item at a given index:
import numpy as np

def get_value(d, i):
    # Return d[i], or NaN when the list is too short for index i
    try:
        return d[i]
    except IndexError:
        return np.nan
Call it in a list comprehension over df.sequence:
df['first'] = [get_value(d, 0) for d in df.sequence]
df['second_to_last'] = [get_value(d, -2) for d in df.sequence]
df
id sequence first second_to_last
0 1 [A, B, C, E, F] A E
1 2 [A, C] A A
2 3 [] NaN NaN
3 4 [D] D NaN
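The expected output in the question shows None rather than NaN for missing positions. As a hedged variant of Option 2 (not part of the original answer), the sketch below gives the helper a hypothetical default parameter so the caller chooses the placeholder. Passing dtype=object is intended to keep the None values in the column as they are.

```python
import pandas as pd

def get_value(seq, i, default=None):
    """Return seq[i], or default when the index is out of range."""
    try:
        return seq[i]
    except IndexError:
        return default

# Sample data rebuilt from the question
df = pd.DataFrame({
    "id": ["001", "002", "003", "004"],
    "sequence": [["A", "B", "C", "E", "F"], ["A", "C"], [], ["D"]],
})

# Plain Python lists with None where the position does not exist
first = [get_value(seq, 0) for seq in df["sequence"]]
second_to_last = [get_value(seq, -2) for seq in df["sequence"]]

# Assign as object-dtype Series aligned to the frame's index
df["first"] = pd.Series(first, index=df.index, dtype=object)
df["second_to_last"] = pd.Series(second_to_last, index=df.index, dtype=object)
print(df)
```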