Stripping a string and getting start index and end index
Problem description
Is there any straightforward way in Python to strip a string and get the start index and the end index?
Example: given the string '  hello world!  ', I want the stripped string 'hello world!' as well as the start index 2 and the end index 14.
'  hello world!  '.strip() only returns the stripped string.
I could write a function:
def strip(text):
    '''
    Take a string as input.
    Return the stripped string as well as the start index and end index.
    Example: '  hello world!  ' --> ('hello world!', 2, 14)
    The function isn't computationally efficient as it does more than one pass on the string.
    '''
    text_stripped = text.strip()
    # Locate the stripped text inside the original (a second pass over the string).
    index_start = text.find(text_stripped)
    index_end = index_start + len(text_stripped)
    return text_stripped, index_start, index_end


def main():
    text = '  hello world!  '
    text_stripped, index_start, index_end = strip(text)
    print('index_start: {0}\tindex_end: {1}'.format(index_start, index_end))


if __name__ == "__main__":
    main()
but I wonder whether Python or a popular library provides a built-in way to do this.
One option (probably not the most straightforward) would be to do it with regular expressions:
>>> import re
>>> s = '  hello world!  '
>>> match = re.search(r"^\s*(\S.*?)\s*$", s)
>>> match.group(1), match.start(1), match.end(1)
('hello world!', 2, 14)
where, in the pattern ^\s*(\S.*?)\s*$:
^ matches the beginning of the string
\s* matches zero or more whitespace characters
(\S.*?) is a capturing group that captures one non-whitespace character followed by any characters, any number of times, in a non-greedy fashion
$ matches the end of the string
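As far as I know there is no single built-in that returns both the stripped string and its offsets, so here is a minimal sketch of an alternative that avoids both the regular expression and the str.find call by comparing lengths before and after lstrip/rstrip. The function name strip_with_span and its chars parameter are my own, for illustration only. Unlike the regex above, which needs at least one non-whitespace character and so yields no match (re.search returns None) for an empty or whitespace-only string, this version returns an empty string with equal start and end indices in that case.

```python
def strip_with_span(text, chars=None):
    """Return (stripped, start, end) such that text[start:end] == stripped.

    `chars` is forwarded to str.lstrip/str.rstrip; None means whitespace.
    (Hypothetical helper, not part of the standard library.)
    """
    # Everything removed from the left shifts the start index.
    left_stripped = text.lstrip(chars)
    start = len(text) - len(left_stripped)

    # Stripping the right side does not move the start, only the end.
    stripped = left_stripped.rstrip(chars)
    end = start + len(stripped)
    return stripped, start, end


if __name__ == "__main__":
    print(strip_with_span('  hello world!  '))  # → ('hello world!', 2, 14)
    print(strip_with_span('   '))               # → ('', 3, 3)
```

The end index is exclusive, matching match.end(1) in the regex version, so text[start:end] gives back the stripped string.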