How to remove dates from a list in Python
Question
I have a list of tokenized text (list_of_words) that looks something like this:
```python
list_of_words = [
    '08/20/2014',
    '10:04:27',
    'pm',
    'complet',
    'vendor',
    'per',
    'mfg/recommend',
    '08/20/2014',
    '10:04:27',
    'pm',
    'complet',
    ...]
```
I'm trying to strip out all the instances of dates and times from this list. I've tried using the .remove() function, to no avail. I've tried passing wildcard characters, such as '../../....', to a list of stopwords I was sorting with, but that didn't work. I finally tried writing the following code:
```python
import re

for line in list_of_words:
    if re.search('[0-9]{2}/[09]{2}/[0-9]{4}', line):
        list_of_words.remove(line)
```
But that doesn't work either. How can I strip out everything formatted like a date or time from my list?
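The attempts in the question fail for separate reasons. `list.remove()` deletes only the first item that is exactly equal to its argument, so it never treats a string like '../../....' as a wildcard. In the loop, the character class `[09]` matches only the characters `0` and `9` rather than any digit, so `08/20/2014` is never found. Even with that fixed, calling `.remove()` on the list being iterated shifts the remaining items, which makes the loop skip the element after each removal. Here is a minimal sketch of the same date-only idea with both loop issues fixed. It uses the question's visible items with the trailing `...` dropped, and `date_pattern` / `without_dates` are illustrative names:

```python
import re

# The question's sample tokens (the trailing "..." omitted)
list_of_words = ['08/20/2014', '10:04:27', 'pm', 'complet', 'vendor',
                 'per', 'mfg/recommend', '08/20/2014', '10:04:27', 'pm', 'complet']

# [0-9] (not [09]) matches any digit; this pattern still covers only dates like 08/20/2014
date_pattern = re.compile(r'[0-9]{2}/[0-9]{2}/[0-9]{4}')

# Build a new list instead of calling .remove() on the list being iterated over
without_dates = [word for word in list_of_words if not date_pattern.search(word)]
print(without_dates)
# ['10:04:27', 'pm', 'complet', 'vendor', 'per', 'mfg/recommend', '10:04:27', 'pm', 'complet']
```

This still leaves the times and the `am`/`pm` tokens in place. The answer's pattern extends the match to cover those as well.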
Answer
Description
```
^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$
```
This regular expression will do the following:
- find strings which look like dates (`12/23/2016`) and times (`12:34:56`)
- find strings which are `am` or `pm`, which are probably part of the preceding time in the source list
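One way to see exactly what the anchored pattern accepts is to run it against a handful of tokens. This sketch uses `re.match`, as the answer's script does, and the token list is made up for illustration. The pattern is somewhat looser than the description above suggests. It accepts any three digit groups separated by `:`, `/` or `,` (including mixed separators and two-digit years), and it matches only lowercase `am`/`pm`:

```python
import re

PATTERN = r'^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$'

# Illustrative tokens, not taken from the question's data
tokens = [
    '12/23/2016',     # date: matches
    '12:34:56',       # time: matches
    'am', 'pm',       # match
    '08/20/14',       # two-digit year also matches, because of {2,4}
    '12:34/56',       # mixed separators also match
    'PM',             # no match: the pattern is case-sensitive
    '8/20/2014',      # no match: the first two groups need exactly two digits
    'mfg/recommend',  # no match
]

results = {token: re.match(PATTERN, token) is not None for token in tokens}
for token, matched in results.items():
    print(token, matched)
```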
Live Demo
- Regex: https://regex101.com/r/yE8oB9/2
- Python: http://codepad.org/X9D3pd7s
Sample list
```
08/20/2014
10:04:27
pm
complete
vendor
per
mfg/recommend
08/20/2014
10:04:27
pm
complete
```
List after processing
```
complete
vendor
per
mfg/recommend
complete
```
Sample Python script
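```python
import re

SourceList = ['08/20/2014',
              '10:04:27',
              'pm',
              'complete',
              'vendor',
              'per',
              'mfg/recommend',
              '08/20/2014',
              '10:04:27',
              'pm',
              'complete']

# Keep only the words that do NOT match the date / time / am / pm pattern.
# The r'' raw string stops Python from treating the backslash as a string escape.
OutputList = filter(
    lambda ThisWord: not re.match(r'^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$', ThisWord),
    SourceList)

# filter() returns a lazy iterator; looping over it yields the words that were kept
for ThisValue in OutputList:
    print(ThisValue)
```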
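In Python 3 the same filtering is often written as a list comprehension with a precompiled pattern. The question asks to strip items from `list_of_words` itself, so this sketch assigns to the slice `list_of_words[:]`. That updates the existing list object rather than rebinding the name. Adding `re.IGNORECASE` so that uppercase `AM`/`PM` are also removed is an assumption about the data, not part of the original answer. The second `PM` below is uppercased only to illustrate it, and `DATE_TIME` is an illustrative name:

```python
import re

# Same pattern as the answer; re.IGNORECASE is an added assumption so "AM"/"PM" match too
DATE_TIME = re.compile(r'^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$', re.IGNORECASE)

list_of_words = ['08/20/2014', '10:04:27', 'pm', 'complet', 'vendor',
                 'per', 'mfg/recommend', '08/20/2014', '10:04:27', 'PM', 'complet']

# Slice assignment replaces the contents of the existing list in place
list_of_words[:] = [word for word in list_of_words if not DATE_TIME.match(word)]
print(list_of_words)
# ['complet', 'vendor', 'per', 'mfg/recommend', 'complet']
```

A comprehension avoids the mutate-while-iterating problem from the question, because the new list is fully built before it replaces the old contents.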
Explanation
```
NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    (?:                      group, but do not capture (2 times):
----------------------------------------------------------------------
      [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
      [:\/,]                   any character of: ':', '\/', ','
----------------------------------------------------------------------
    ){2}                     end of grouping
----------------------------------------------------------------------
    [0-9]{2,4}               any character of: '0' to '9' (between 2
                             and 4 times (matching the most amount
                             possible))
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    am                       'am'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    pm                       'pm'
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
```
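The node-by-node breakdown maps directly onto Python's verbose regex mode (`re.VERBOSE`), where whitespace in the pattern is ignored and `#` starts a comment. Here is a sketch of the same pattern written that way. The names `DATE_TIME_VERBOSE` and `ONE_LINE` are illustrative, and `\/` is written as plain `/` because the two are equivalent inside a Python character class. The loop checks that both forms agree on a few sample strings:

```python
import re

DATE_TIME_VERBOSE = re.compile(r'''
    ^                       # the beginning of the string
    (?:                     # group, but do not capture:
        (?:                 #   group, but do not capture (2 times):
            [0-9]{2}        #     exactly two digits
            [:/,]           #     one separator: colon, slash or comma
        ){2}                #   end of grouping
        [0-9]{2,4}          #   between 2 and 4 digits (greedy)
      | am                  # OR the literal am
      | pm                  # OR the literal pm
    )                       # end of grouping
    $                       # end of the string (or just before a final newline)
''', re.VERBOSE)

ONE_LINE = re.compile(r'^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$')

samples = ['08/20/2014', '10:04:27', 'pm', 'complete', 'mfg/recommend', '12,34,5678', '1/2/3']
for sample in samples:
    verbose_hit = DATE_TIME_VERBOSE.match(sample) is not None
    one_line_hit = ONE_LINE.match(sample) is not None
    print(sample, verbose_hit, verbose_hit == one_line_hit)
```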