Multiline regex help
Problem description
Hey Folks,
I've got some info in a bunch of files that kind of looks like so:
Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
and so on...
Anyhow, these "fields" repeat several times in a given file (number of
repetitions varies from file to file). The number on the line following the
"RelevantInfo" lines is really what I'm after. Ideally, I would like to have
something like so:
RelevantInfo1 = 10/10/04 # The variable name isn't actually important
RelevantInfo3 = 23       # it's just there to illustrate what info I'm
                         # trying to snag.
Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2
Collected from all of the files.
So, there would be several of these "scores" per file and there are a bunch
of files. Ultimately, I am interested in printing them out as a csv file but
that should be relatively easy once they are trapped in my array of doom
<cue evil laughter>.
I've got a fairly ugly "solution" (I am using this term *very* loosely)
using awk and his faithful companion sed, but I would prefer something in
python.
Thanks for your time.
--
McGowan's Madison Avenue Axiom:
If an item is advertised as "under $50", you can bet it's not $19.95.
Recommended answer
Yatima wrote:
> Hey Folks,
> I've got some info in a bunch of files that kind of looks like so:
> Gibberish
> 53
> MoreGarbage
> 12
> RelevantInfo1
> 10/10/04
> NothingImportant
> ThisDoesNotMatter
> 44
> RelevantInfo2
> 22
> BlahBlah
> 343
> RelevantInfo3
> 23
> Hubris
> Crap
> 34
> and so on...
> Anyhow, these "fields" repeat several times in a given file (number of
> repetitions varies from file to file). The number on the line following the
> "RelevantInfo" lines is really what I'm after. Ideally, I would like to have
> something like so:
> RelevantInfo1 = 10/10/04 # The variable name isn't actually important
> RelevantInfo3 = 23       # it's just there to illustrate what info I'm
>                          # trying to snag.
Here is a way to create a list of [RelevantInfo, value] pairs:
import io

raw_data = '''Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34'''
# Wrap the string so it can be read line by line, like a file
raw_data = io.StringIO(raw_data)

data = []
for line in raw_data:
    if line.startswith('RelevantInfo'):
        key = line.strip()
        # The value is on the line right after the key; '' if the input ends early
        value = next(raw_data, '').strip()
        data.append([key, value])

print(data)
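Since the thread title asks about a multiline regex, the same pairs can probably also be pulled out with a single pattern. This is only a sketch: `PAIR_RE` and `extract_pairs` are names made up for illustration, and it assumes Unix line endings and that every RelevantInfo line is immediately followed by its value line.

```python
import re

# With re.MULTILINE, ^ and $ match at the start and end of each line.
# Group 1 is the RelevantInfo line, group 2 is the line right after it.
PAIR_RE = re.compile(
    r'^(RelevantInfo\d+)[ \t]*\n[ \t]*(\S.*?)[ \t]*$',
    re.MULTILINE,
)

def extract_pairs(text):
    """Return a list of [key, value] lists found in text (hypothetical helper)."""
    return [[key, value] for key, value in PAIR_RE.findall(text)]

sample = """Gibberish
53
RelevantInfo1
10/10/04
NothingImportant
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
"""

print(extract_pairs(sample))
```

The line-by-line loop is arguably easier to read and handles huge files without loading them whole; the regex version is mainly useful when the file contents are already in memory as one string.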
> Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2
I'm not sure what you mean by this. Do you want to build a Score dictionary as well?
Kent
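If the answer to that question is yes, one possible reading of `Score[RelevantInfo1][RelevantInfo3] = 22` is a nested dictionary keyed first by the RelevantInfo1 value, then by the RelevantInfo3 value, holding the RelevantInfo2 value. The sketch below assumes the three fields always arrive in that order within each record; `build_scores` and `scores_to_csv` are hypothetical helper names, not anything from the thread.

```python
import csv
import io
from collections import defaultdict

def build_scores(pairs, scores=None):
    """Fold [key, value] pairs into scores[info1][info3] = info2.

    Assumes each record arrives as RelevantInfo1, RelevantInfo2, RelevantInfo3.
    """
    if scores is None:
        scores = defaultdict(dict)
    record = {}
    for key, value in pairs:
        record[key] = value
        if key == 'RelevantInfo3':  # the last field closes the record
            scores[record['RelevantInfo1']][value] = record['RelevantInfo2']
            record = {}
    return scores

def scores_to_csv(scores):
    """Render the nested dictionary as CSV text, one row per score."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['RelevantInfo1', 'RelevantInfo3', 'Score'])
    for info1, inner in scores.items():
        for info3, score in inner.items():
            writer.writerow([info1, info3, score])
    return out.getvalue()

pairs = [
    ['RelevantInfo1', '10/10/04'],
    ['RelevantInfo2', '22'],
    ['RelevantInfo3', '23'],
]
scores = build_scores(pairs)
print(scores_to_csv(scores))
```

Passing the same `scores` dictionary back into `build_scores` for each file would accumulate the results across all of the files before the CSV is written.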