Grab a line's whitespace/indention with Python
Question
Basically, if I have a line of text that starts with indention, what's the best way to grab that indention and put it into a variable in Python? For example, if the line is:
\t\tthis line has two tabs of indention
Then it would return '\t\t'. Or, if the line was:
    this line has four spaces of indention
Then it would return four spaces.
So I guess you could say that I just need to strip everything from the string from the first non-whitespace character to the end. Thoughts?
Recommended answer
import re
s = "\t\tthis line has two tabs of indention"
re.match(r"\s*", s).group()
# '\t\t'
s = "    this line has four spaces of indention"
re.match(r"\s*", s).group()
# '    '
And to strip leading spaces, use lstrip.
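The regex approach above can be wrapped in a small helper. This is only a sketch: the names leading_whitespace and split_indent are made up for illustration, and it assumes that any whitespace character counts as indentation. Note that \s also matches newlines, so a pattern such as [ \t]* may be a better fit if only spaces and tabs should count.

```python
import re

# Hypothetical helper names; the pattern is the one used in the answer above.
_LEADING_WS = re.compile(r"\s*")


def leading_whitespace(line):
    """Return the run of whitespace at the start of `line` (possibly empty)."""
    # r"\s*" can match the empty string, so match() never returns None here.
    return _LEADING_WS.match(line).group()


def split_indent(line):
    """Return (indent, rest) such that indent + rest == line."""
    indent = leading_whitespace(line)
    return indent, line[len(indent):]


print(repr(leading_whitespace("\t\tthis line has two tabs of indention")))
print(repr(leading_whitespace("    this line has four spaces of indention")))
print(split_indent("  \tmixed indent"))
```

split_indent keeps both halves, which is handy when the indentation has to be re-applied to a rewritten line; "rest" is the same text that line.lstrip() would return.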
As there are downvotes, probably questioning the efficiency of regex, I've done some profiling to check the efficiency of each case.
Very long string, very short leading space
RegEx > Itertools >> lstrip
>>> import timeit
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s=" hello world!"*10000', number=100000)
0.10037684440612793
>>> timeit.timeit('"".join(itertools.takewhile(lambda x:x.isspace(),s))', 'import itertools;s=" hello world!"*10000', number=100000)
0.7092740535736084
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s=" hello world!"*10000', number=100000)
0.51730513572692871
>>> timeit.timeit('s[:-len(s.lstrip())]', 's=" hello world!"*10000', number=100000)
2.6478431224822998
Very short string, very short leading space
lstrip > RegEx > Itertools
If you can limit the string's length to a few thousand characters or less, the lstrip trick may be better.
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s=" hello world!"*100', number=100000)
0.099548101425170898
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s=" hello world!"*100', number=100000)
0.53602385520935059
>>> timeit.timeit('s[:-len(s.lstrip())]', 's=" hello world!"*100', number=100000)
0.064291000366210938
This shows that the lstrip trick slows down as the total string length n grows (about 40x slower here for a 100x longer string; lstrip has to copy everything after the indent, so its cost is bounded by O(n)), while the RegEx and itertools methods are O(1) in n as long as the leading whitespace is short.
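For reference, the three approaches being timed can be written as functions, as a rough sketch. The function names are hypothetical. One caveat about the lstrip expression as timed: s[:-len(s.lstrip())] returns an empty string when s is entirely whitespace (as in benchmark strings like " "*2000), because s[:-0] is the same as s[:0]. The variant below slices with an explicit length instead, which is an adjustment and not what was timed.

```python
import itertools
import re

_WS = re.compile(r"\s*")


def indent_regex(s):
    """Leading whitespace via a precompiled regex."""
    return _WS.match(s).group()


def indent_takewhile(s):
    """Leading whitespace via itertools.takewhile, one character at a time."""
    return "".join(itertools.takewhile(str.isspace, s))


def indent_lstrip(s):
    """Leading whitespace via lstrip, sliced by an explicit length.

    Unlike s[:-len(s.lstrip())], this also works when s is all whitespace.
    """
    return s[:len(s) - len(s.lstrip())]


for sample in ["\t\tabc", "    abc", "abc", "   ", ""]:
    results = {f.__name__: f(sample) for f in (indent_regex, indent_takewhile, indent_lstrip)}
    print(repr(sample), results)

# The pitfall of the negative-index form on an all-whitespace string:
print(repr("   "[:-len("   ".lstrip())]))
```

The explicit-length slice costs one extra len() call, which should be negligible next to the copy that lstrip makes.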
Very short string, very long leading space
lstrip >> RegEx >>> Itertools
If there are a lot of leading spaces, don't use RegEx.
>>> timeit.timeit('s[:-len(s.lstrip())]', 's=" "*2000', number=10000)
0.047424077987670898
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s=" "*2000', number=10000)
0.2433168888092041
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s=" "*2000', number=10000)
3.9949162006378174
Very long string, very long leading space
lstrip >>> RegEx >>>>>>>> Itertools
>>> timeit.timeit('s[:-len(s.lstrip())]', 's=" "*200000', number=10000)
4.2374031543731689
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s=" "*200000', number=10000)
23.877214908599854
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s=" "*200000', number=100)*100
415.72158336639404
This shows that all methods scale roughly as O(m), where m is the length of the leading whitespace, when the non-space part is short.
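To re-run this kind of comparison on your own interpreter, a script along these lines could be used. It is a sketch, not the setup behind the numbers above: the CANDIDATES table and the benchmark function are hypothetical names, the lstrip variant slices by explicit length so that it stays correct on all-whitespace input, a trailing "x" is appended to the long-indent cases, and the repeat count is kept small so the script finishes quickly. Absolute timings will differ by machine and Python version.

```python
import itertools
import re
import timeit

_WS = re.compile(r"\s*")

# Hypothetical table of the three approaches discussed above.
CANDIDATES = {
    "regex": lambda s: _WS.match(s).group(),
    "takewhile": lambda s: "".join(itertools.takewhile(str.isspace, s)),
    "lstrip": lambda s: s[:len(s) - len(s.lstrip())],
}


def benchmark(s, number=1000):
    """Return {name: seconds} for `number` calls of each candidate on `s`."""
    return {
        name: timeit.timeit(lambda: func(s), number=number)
        for name, func in CANDIDATES.items()
    }


cases = {
    "long string, short indent": " hello world!" * 10000,
    "short string, short indent": " hello world!" * 100,
    "short string, long indent": " " * 2000 + "x",
    "long string, long indent": " " * 200000 + "x",
}

for label, s in cases.items():
    results = benchmark(s, number=20)
    # Fastest first.
    ranking = sorted(results, key=results.get)
    print(f"{label}: {' < '.join(ranking)}")
```

Passing a callable to timeit.timeit adds a small constant call overhead to every candidate, which matters most for the short-string cases.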