How to remove dates from a list in Python


Question


I have a list of tokenized text (list_of_words) that looks something like this:

list_of_words = ['08/20/2014',
                 '10:04:27',
                 'pm',
                 'complet',
                 'vendor',
                 'per',
                 'mfg/recommend',
                 '08/20/2014',
                 '10:04:27',
                 'pm',
                 'complet',
                 ...]


and I'm trying to strip out all instances of dates and times from this list. I've tried using the .remove() method, to no avail. I've tried adding wildcard patterns such as '../../....' to the list of stopwords I was filtering with, but that didn't work either. Finally, I tried writing the following code:

for line in list_of_words:
    if re.search('[0-9]{2}/[09]{2}/[0-9]{4}',line):
        list_of_words.remove(line)


but that doesn't work either. How can I strip out everything formatted like a date or time from my list?
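The loop in the question fails for two independent reasons, which this sketch (not part of the original answer) reproduces. First, the character class [09] contains only the characters '0' and '9', not the range 0 to 9, so the day field '20' in '08/20/2014' never matches. Second, calling list.remove() while iterating over the same list shifts the remaining items one position to the left, so the loop skips the item right after each removal. The variable names and the shortened word list are illustrative assumptions.

```python
import re

words = ['08/20/2014', '10:04:27', 'pm', 'complet', 'vendor',
         '08/20/2014', '08/20/2014', 'per']

# Bug 1: [09] is a class of just '0' and '9', so '20' cannot match it.
# [0-9] is the intended digit range.
buggy_pattern = r'[0-9]{2}/[09]{2}/[0-9]{4}'
fixed_pattern = r'[0-9]{2}/[0-9]{2}/[0-9]{4}'
print(re.search(buggy_pattern, '08/20/2014'))  # None

# Bug 2: removing items from the list being iterated over skips elements,
# because each removal shifts the following items one position to the left.
mutated = list(words)
for line in mutated:
    if re.search(fixed_pattern, line):
        mutated.remove(line)
print(mutated)  # one of the two adjacent dates survives

# Fix: build a new list instead of mutating the one being iterated over.
cleaned = [w for w in words if not re.search(fixed_pattern, w)]
print(cleaned)
```

This only repairs the date pattern from the question; times and am/pm tokens are still left in cleaned, which is what the answer's broader regular expression addresses.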

Recommended answer

Description

^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$


This regular expression will do the following:

  • find strings that look like dates (e.g. 12/23/2016) or times (e.g. 12:34:56)
  • find strings that are exactly am or pm, which are probably part of the preceding time in the source list
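A quick way to see which tokens the pattern treats as dates or times is to run it over the sample tokens plus the two examples from the bullets. This is a small sketch; the names PATTERN, tokens and matches are illustrative, and re.match is used as in the answer's script.

```python
import re

# The answer's pattern: two-digit fields separated by ':', '/' or ',',
# ending in a field of 2 to 4 digits, or exactly 'am' / 'pm'.
PATTERN = re.compile(r'^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$')

tokens = ['08/20/2014', '10:04:27', 'pm', 'am', '12/23/2016', '12:34:56',
          'complete', 'vendor', 'per', 'mfg/recommend']

# Map each token to whether the pattern would remove it
matches = {t: bool(PATTERN.match(t)) for t in tokens}
for token, is_date_or_time in matches.items():
    print(repr(token), '->', is_date_or_time)
```

Note that 'mfg/recommend' contains a slash but is kept, because the pattern requires digits around each separator.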

Live demo

  • Regex: https://regex101.com/r/yE8oB9/2
  • Python: http://codepad.org/X9D3pd7s

Sample list

08/20/2014
10:04:27
pm
complete
vendor
per
mfg/recommend
08/20/2014
10:04:27
pm
complete

List after processing

complete
vendor
per
mfg/recommend
complete

Example Python script

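import re

SourceList = ['08/20/2014',
              '10:04:27',
              'pm',
              'complete',
              'vendor',
              'per',
              'mfg/recommend',
              '08/20/2014',
              '10:04:27',
              'pm',
              'complete']

# Keep only the words that do NOT look like a date, a time, or am/pm.
# A raw string avoids the invalid-escape warning for '\/' in Python 3.
# In Python 3, filter() returns a lazy iterator rather than a list.
OutputList = filter(
    lambda ThisWord: not re.match(r'^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$', ThisWord),
    SourceList)

for ThisValue in OutputList:
    print(ThisValue)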
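The answer's pattern removes every standalone am or pm, including one that is not next to a time (for example the English word "am" in ordinary text). If that matters, one possible variation, sketched here under the assumption that a meaningful am/pm always directly follows a date or time token, is to drop am/pm only when the previous token was a date/time match. The helper strip_dates_and_times and the constants DATE_OR_TIME and MERIDIEM are hypothetical names, not from the original answer; the regex is the answer's pattern without the am|pm alternatives, with '/' written unescaped, which means the same thing inside a character class.

```python
import re

# The answer's date/time alternative, without the am/pm branches
DATE_OR_TIME = re.compile(r'^(?:[0-9]{2}[:/,]){2}[0-9]{2,4}$')
MERIDIEM = {'am', 'pm'}


def strip_dates_and_times(words):
    """Return a new list without date/time tokens (hypothetical helper).

    'am'/'pm' is dropped only when the previous token was a date/time
    match, so a standalone 'am' in ordinary text is kept.
    """
    result = []
    previous_was_time = False
    for word in words:
        if DATE_OR_TIME.match(word):
            previous_was_time = True
            continue
        if word in MERIDIEM and previous_was_time:
            previous_was_time = False
            continue
        previous_was_time = False
        result.append(word)
    return result


sample = ['08/20/2014', '10:04:27', 'pm', 'complete', 'vendor', 'per',
          'mfg/recommend', '08/20/2014', '10:04:27', 'pm', 'complete',
          'i', 'am', 'here']
print(strip_dates_and_times(sample))
```

Because the function builds and returns a new list, it also avoids the problem of removing items from a list while iterating over it.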

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    (?:                      group, but do not capture (2 times):
----------------------------------------------------------------------
      [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
      [:\/,]                   any character of: ':', '\/', ','
----------------------------------------------------------------------
    ){2}                     end of grouping
----------------------------------------------------------------------
    [0-9]{2,4}               any character of: '0' to '9' (between 2
                             and 4 times (matching the most amount
                             possible))
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    am                       'am'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    pm                       'pm'
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
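The table implies a few edge cases worth checking before relying on the pattern: every field except the last needs exactly two digits, the match is case-sensitive, only ':', '/' and ',' count as separators (and they may be mixed), digit values are not range-checked, and, per the last row, $ also matches just before a trailing newline. This sketch checks some made-up tokens (not from the question) against the answer's regex.

```python
import re

PATTERN = re.compile(r'^(?:(?:[0-9]{2}[:\/,]){2}[0-9]{2,4}|am|pm)$')

# Look like dates/times, but the pattern does NOT match them
missed = ['8/20/2014', '10:04', 'PM', '2014-08-20']
# Unusual as dates/times, but the pattern DOES match them
caught = ['12:34/56', '12,34,56', '99/99/9999', 'pm\n']

results = {token: bool(PATTERN.match(token)) for token in missed + caught}
for token, matched in results.items():
    print(repr(token), matched)
```

If any of these matter, possible adjustments include [0-9]{1,2} for one- or two-digit fields, the re.IGNORECASE flag so that AM/PM match, or re.fullmatch() instead of re.match() so that a trailing newline is rejected. These are suggestions, not part of the original answer.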
