Get date from string by splitting
Question
I have a batch of raw text files. Each file begins with Date>>month day. year News garbage.
garbage is a whole lot of text I don't need, and it varies in length. The words Date>> and News always appear in the same place and do not change.
I want to copy month day year and insert this data into a CSV file, with a new line for every file in the format day month year.
How do I copy month day year into separate variables?
I tried to split a string after a known word and before a known word. I'm familiar with string[x:y], but I basically want to change x and y from numbers into actual words (i.e. string[Date>>:News]).
import os

folder = input('Drag and drop the folder > ').strip()
for filename in os.listdir(folder):
    # First, avoid system (hidden) files such as .DS_Store
    if filename.startswith("."):
        continue
    # Build the full path to the file inside the folder and read it
    with open(os.path.join(folder, filename), "r") as f:
        thestring = f.read()
    # Fixed indices only work if every date has exactly the same length
    print(thestring[9:20])
Sample text file:
Date>>January 2. 2012 News 122
5 different news agencies have reported the story of a man washing his dog.
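Python has no string[word:word] syntax, but the same effect can be sketched with str.index: find where the known words start and slice between those positions. This is a minimal sketch assuming, as the question states, that Date>> and News each appear at the start of the file; the helper name between() is hypothetical.

```python
def between(text, start_word, end_word):
    """Return the text between start_word and the next end_word (hypothetical helper)."""
    # str.index raises ValueError if a marker is missing, which flags malformed files
    start = text.index(start_word) + len(start_word)
    end = text.index(end_word, start)
    return text[start:end].strip()


sample = "Date>>January 2. 2012 News 122"
date_part = between(sample, "Date>>", "News")  # "January 2. 2012"
# Split on whitespace, then drop the period that follows the day
month, day, year = date_part.split()
day = day.rstrip(".")
print(day, month, year)  # 2 January 2012
```

Unlike fixed indices such as [9:20], this keeps working when the month name is shorter or longer than "January".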
Recommended answer
Here's a solution using the re module:
import re

s = "Date>>January 2. 2012 News 122"
# Capture the month name, the day (followed by a period) and the year
m = re.match(r"^Date>>(\S+)\s+(\d+)\.\s+(\d+)", s)
if m:
    month, day, year = m.groups()
    print("{} {} {}".format(month, day, year))
Output:
January 2 2012
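If the captured pieces should also be checked as a real calendar date, one option (not part of the answer) is to parse them with datetime.strptime. This is a sketch assuming English month names and the default C/English locale, since %B is locale-dependent; an impossible date such as "February 30 2012" raises ValueError.

```python
from datetime import datetime

month, day, year = "January", "2", "2012"  # the values m.groups() returns for the sample
# %B = full month name, %d = day of month, %Y = four-digit year
d = datetime.strptime("{} {} {}".format(month, day, year), "%B %d %Y")
print(d.strftime("%d %B %Y"))  # 02 January 2012
```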
Actually, there's another nicer (imo) solution using re.split described in the link Robin posted. Using that approach you can just do:
# Split on ">>", a single space, or ". "; items 1 to 3 are month, day, year
month, day, year = re.split(r">>| |\. ", s)[1:4]
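The answer stops before the CSV step the question asks for. Below is a minimal end-to-end sketch combining the re.match pattern from the answer with the standard csv module: it reads the first line of every non-hidden file in a folder and writes one day,month,year row per file. The function name dates_to_csv is hypothetical, the files are assumed to be UTF-8, and files whose first line does not match the pattern are silently skipped.

```python
import csv
import os
import re

# Same pattern as the answer: month name, day followed by a period, year
DATE_RE = re.compile(r"^Date>>(\S+)\s+(\d+)\.\s+(\d+)")


def dates_to_csv(folder, out_path):
    """Write one 'day,month,year' row per matching file in folder (hypothetical helper)."""
    rows = []
    for filename in sorted(os.listdir(folder)):
        path = os.path.join(folder, filename)
        # Skip hidden/system files and subdirectories
        if filename.startswith(".") or not os.path.isfile(path):
            continue
        # Assumes UTF-8 text; the date is on the first line, so read only that
        with open(path, encoding="utf-8") as f:
            first_line = f.readline()
        m = DATE_RE.match(first_line)
        if m:
            month, day, year = m.groups()
            rows.append([day, month, year])  # reorder to day month year
    # newline="" is what the csv module documentation recommends for output files
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        csv.writer(out).writerows(rows)
    return rows
```

For example, dates_to_csv(folder, "dates.csv") with the folder from the question's input() prompt. Keeping the output file outside the scanned folder avoids reading it back in on a rerun.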