Pandas substring using another column as the index

Views: 143 Published: 2020/5/24 2:56:00 Tags: python string pandas substring


Question

I'm trying to take a substring of each value in a string column, using a second column that holds the start index for each row.

import pandas as pd
df = pd.DataFrame({'string': ['abcdef', 'bcdefg'], 'start_index': [3, 5]})
expected = pd.Series(['def', 'g'])

I know that you can take a substring with a fixed start position like this:

df['string'].str[3:]

However, in my case the start index varies from row to row, so I tried:

df['string'].str[df['start_index']:]

But it returns NaNs.

What if I don't want to use a loop or list comprehension? A vectorized method is preferred.

In this small test case, the list comprehension turns out to be faster than apply.

from itertools import islice
%timeit df.apply(lambda x: ''.join(islice(x.string, x.start_index, None)), 1)
%timeit pd.Series([x[y:] for x, y in zip(df.string, df.start_index)])

631 µs ± 1.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
101 µs ± 233 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
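The `%timeit` lines above only run inside IPython or Jupyter. As a rough sketch, the same comparison could be reproduced in a plain script with the standard `timeit` module. The helper names `with_apply` and `with_zip` and the repetition count `n` are illustrative choices, not part of the original post, and the absolute numbers will differ by machine and pandas version.

```python
import timeit
from itertools import islice

import pandas as pd

df = pd.DataFrame({'string': ['abcdef', 'bcdefg'], 'start_index': [3, 5]})


def with_apply():
    # Row-wise apply: pandas builds a Series object for every row.
    return df.apply(lambda x: ''.join(islice(x.string, x.start_index, None)), axis=1)


def with_zip():
    # Plain Python iteration over the two columns in parallel.
    return pd.Series([x[y:] for x, y in zip(df.string, df.start_index)])


# Both approaches should produce the same substrings.
apply_result = with_apply()
zip_result = with_zip()

n = 200  # hypothetical repetition count, kept small so the script runs quickly
apply_time = timeit.timeit(with_apply, number=n) / n
zip_time = timeit.timeit(with_zip, number=n) / n

print(f"apply: {apply_time * 1e6:.1f} µs per call")
print(f"zip:   {zip_time * 1e6:.1f} µs per call")
```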

Recommended answer

Use a for loop (list comprehension) over a zip of the two columns. As for why a for loop is used here, you can check the link.

[x[y:] for x, y in zip(df.string, df.start_index)]
Out[328]: ['def', 'g']
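To get the result back as a column rather than a bare list, the comprehension can be wrapped in a Series that reuses the DataFrame's index. This is a minimal sketch built on the answer's approach. The column name `substring` is a hypothetical choice for illustration, and the code assumes every row holds a string and an integer start position (no missing values).

```python
import pandas as pd

df = pd.DataFrame({'string': ['abcdef', 'bcdefg'], 'start_index': [3, 5]})

# Pair each string with its own start position and slice row by row.
# Passing index=df.index keeps the result aligned with the original rows,
# which matters if the DataFrame does not have a default 0..n-1 index.
df['substring'] = pd.Series(
    [s[i:] for s, i in zip(df['string'], df['start_index'])],
    index=df.index,
)

print(df['substring'].tolist())  # ['def', 'g']
```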
