When splitting an empty string in Python, why does split() return an empty list while split('\n') returns ['']?


Problem description

I am using split('\n') to get lines in one string, and found that ''.split() returns an empty list, [], while ''.split('\n') returns ['']. Is there any specific reason for such a difference?

And is there any more convenient way to count lines in a string?

Solution

Question: I am using split('\n') to get lines in one string, and found that ''.split() returns an empty list, [], while ''.split('\n') returns [''].

The str.split() method has two algorithms. If no arguments are given, it splits on repeated runs of whitespace. However, if an argument is given, it is treated as a single delimiter with no repeated runs.

In the case of splitting an empty string, the first mode (no argument) will return an empty list because the whitespace is eaten and there are no values to put in the result list.

In contrast, the second mode (with an argument such as '\n') will produce the first empty field. Consider that if you had written '\n'.split('\n'), you would get two fields (one split gives you two halves).
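The two modes can be compared side by side. The following is a minimal sketch; the variable names are only illustrative and the sample strings are made up for this comparison.

```python
# Mode 1: no argument. A run of whitespace acts as one separator,
# and leading/trailing whitespace produces no empty fields.
no_arg_empty = ''.split()               # []
no_arg_blank = ' \n\t '.split()         # []
no_arg_words = '  a   b  '.split()      # ['a', 'b']

# Mode 2: explicit separator. Every occurrence is a cut,
# so empty fields are preserved.
sep_empty = ''.split('\n')              # ['']
sep_single = '\n'.split('\n')           # ['', '']
sep_words = '  a   b  '.split(' ')      # ['', '', 'a', '', '', 'b', '', '']

print(no_arg_empty, sep_empty, sep_single)
```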

Question: Is there any specific reason for such a difference?

This first mode is useful when data is aligned in columns with variable amounts of whitespace. For example:

>>> data = '''\
Shasta      California     14,200
McKinley    Alaska         20,300
Fuji        Japan          12,400
'''
>>> for line in data.splitlines():
...     print(line.split())
...
['Shasta', 'California', '14,200']
['McKinley', 'Alaska', '20,300']
['Fuji', 'Japan', '12,400']
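To see why the no-argument mode suits column-aligned data, it may help to split the same row with an explicit single space instead. This is only an illustrative sketch; the variable names are not from the original answer.

```python
line = 'Shasta      California     14,200'

# Whitespace-run mode: padding between columns collapses away.
by_whitespace = line.split()

# Single-delimiter mode: every space is a cut, so the padding
# shows up as empty strings between the real fields.
by_single_space = line.split(' ')

# Dropping the empty strings recovers the whitespace-run result.
non_empty = [field for field in by_single_space if field]

print(by_whitespace)
print(len(by_single_space), 'fields when splitting on a single space')
```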

The second mode is useful for delimited data such as CSV where repeated commas denote empty fields. For example:

>>> data = '''\
Guido,BDFL,,Amsterdam
Barry,FLUFL,,USA
Tim,,,USA
'''
>>> for line in data.splitlines():
...     print(line.split(','))
...
['Guido', 'BDFL', '', 'Amsterdam']
['Barry', 'FLUFL', '', 'USA']
['Tim', '', '', 'USA']

Note that the number of result fields is one greater than the number of delimiters. Think of cutting a rope. If you make no cuts, you have one piece. Making one cut gives two pieces. Making two cuts gives three pieces. And so it is with Python's str.split(delimiter) method:

>>> ''.split(',')       # No cuts
['']
>>> ','.split(',')      # One cut
['', '']
>>> ',,'.split(',')     # Two cuts
['', '', '']
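The rope rule can be checked mechanically: with an explicit delimiter, the number of fields should equal the number of delimiter occurrences plus one. A small sketch follows; field_count_matches_cuts is a hypothetical helper written for this illustration, not part of the standard library.

```python
def field_count_matches_cuts(text, delimiter):
    """Return True if splitting yields exactly one more field than cuts."""
    return len(text.split(delimiter)) == text.count(delimiter) + 1


samples = ['', ',', ',,', 'Guido,BDFL,,Amsterdam', 'Tim,,,USA']
results = [field_count_matches_cuts(sample, ',') for sample in samples]
print(results)
```

The same check does not hold for the no-argument mode, where ''.split() yields zero fields from zero cuts.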

Question: And is there any more convenient way to count lines in a string?

Yes, there are a couple of easy ways. One uses str.count() and the other uses str.splitlines(). Both give the same answer unless the final line is missing its trailing '\n'. In that case, only the str.splitlines approach gives the accurate answer. A faster technique that is also accurate uses the count method but then corrects for the missing final newline:

>>> data = '''\
Line 1
Line 2
Line 3
Line 4'''

>>> data.count('\n')                               # Inaccurate
3
>>> len(data.splitlines())                         # Accurate, but slow
4
>>> data.count('\n') + (not data.endswith('\n'))   # Accurate and fast
4
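The two accurate techniques could be wrapped as small functions, sketched below. The function names are hypothetical. One assumption is added here that is not in the original answer: the empty string is treated as zero lines, because the bare count expression would report 1 for '' while len(''.splitlines()) reports 0.

```python
def count_lines_splitlines(text):
    """Count lines by materializing them with str.splitlines()."""
    return len(text.splitlines())


def count_lines_fast(text):
    """Count '\\n' characters, adding one if the last line is unterminated."""
    if not text:
        return 0  # assumption: an empty string has no lines
    return text.count('\n') + (not text.endswith('\n'))


data = 'Line 1\nLine 2\nLine 3\nLine 4'
print(count_lines_splitlines(data), count_lines_fast(data))
```

Note that str.splitlines() also recognizes other line boundaries such as '\r', so the two functions can disagree on text that does not use '\n' exclusively.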

Question from @Kaz: Why the heck are two very different algorithms shoe-horned into a single function?

The signature for str.split is about 20 years old, and a number of the APIs from that era are strictly pragmatic. While not perfect, the method signature isn't "terrible" either. For the most part, Guido's API design choices have stood the test of time.

The current API is not without advantages. Consider strings such as:

ps_aux_header  = 'USER               PID  %CPU %MEM      VSZ'
patient_header = 'name,age,height,weight'

When asked to break these strings into fields, people tend to describe both using the same English word, "split". When asked to read code such as fields = line.split() or fields = line.split(','), people tend to correctly interpret the statements as "splits a line into fields".
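As a quick illustration of that shared mental model, both headers can be split with the same method name, differing only in the argument. This is a minimal sketch; the result variable names are illustrative.

```python
ps_aux_header = 'USER               PID  %CPU %MEM      VSZ'
patient_header = 'name,age,height,weight'

# Whitespace-run mode for the column-aligned header.
ps_fields = ps_aux_header.split()

# Single-delimiter mode for the comma-separated header.
patient_fields = patient_header.split(',')

print(ps_fields)
print(patient_fields)
```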

Microsoft Excel's text-to-columns tool made a similar API choice and incorporates both splitting algorithms in the same tool. People seem to mentally model field-splitting as a single concept even though more than one algorithm is involved.
