Multiline regex help


Problem description

Hey Folks,

I've got some info in a bunch of files that kind of looks like so:

Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34

and so on...

Anyhow, these "fields" repeat several times in a given file (number of
repetitions varies from file to file). The number on the line following the
"RelevantInfo" lines is really what I'm after. Ideally, I would like to have
something like so:

RelevantInfo1 = 10/10/04 # The variable name isn't actually important
RelevantInfo3 = 23 # it's just there to illustrate what info I'm
# trying to snag.
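The thread title asks about a multiline regex, although the reply further down uses a line-by-line loop instead. For comparison, here is a minimal Python 3 sketch of the regex route. It assumes the labels really are "RelevantInfo" followed by digits and that the text uses "\n" line endings (Python 3 opens files in text mode with newline translation by default). PAIR_RE, SAMPLE and relevant_pairs are illustrative names, not anything from the thread.

```python
import re

# Sample text copied from the post; "RelevantInfo" labels are placeholders.
SAMPLE = """Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34"""

# re.MULTILINE makes ^ and $ match at every line boundary, so one pattern can
# cover two lines: a label line, the newline, then the whole following line.
PAIR_RE = re.compile(r"^(RelevantInfo\d+)\n(.*)$", re.MULTILINE)


def relevant_pairs(text):
    """Return (label, value) tuples: each label line plus the line after it."""
    return PAIR_RE.findall(text)


print(relevant_pairs(SAMPLE))
# → [('RelevantInfo1', '10/10/04'), ('RelevantInfo2', '22'), ('RelevantInfo3', '23')]
```

Because "." does not match a newline, (.*) stops at the end of the value line, so each match spans exactly two lines.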

Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2

Collected from all of the files.
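One reading of the Score request: each file holds records in which RelevantInfo1, RelevantInfo2 and RelevantInfo3 appear in that order, and Score[info1][info3] should hold the info2 value, gathered across all files. The sketch below is Python 3 and rests on exactly those assumptions; read_pairs, add_scores and build_scores are hypothetical names, and the literal labels stand in for whatever the real files contain.

```python
def read_pairs(lines):
    """Yield (label, value) for each 'RelevantInfo...' line and the line after it."""
    it = iter(lines)
    for line in it:
        label = line.strip()
        if label.startswith("RelevantInfo"):
            # The value is on the following line; use "" if the input ends early.
            yield label, next(it, "").strip()


def add_scores(lines, score):
    """Fold one file's records into score[info1][info3] = info2.

    Assumes each record lists RelevantInfo1, RelevantInfo2 and RelevantInfo3
    in that order, as in the sample; the label names are placeholders.
    """
    record = {}
    for label, value in read_pairs(lines):
        record[label] = value
        if label == "RelevantInfo3":
            # RelevantInfo3 closes a record: store it only if it is complete.
            if "RelevantInfo1" in record and "RelevantInfo2" in record:
                score.setdefault(record["RelevantInfo1"], {})[value] = record["RelevantInfo2"]
            record = {}
    return score


def build_scores(paths):
    """Collect scores from every file in `paths` into one nested dict."""
    score = {}
    for path in paths:
        with open(path) as f:
            add_scores(f, score)
    return score
```

With the sample above, add_scores would produce {'10/10/04': {'23': '22'}}. Something like build_scores(glob.glob("*.txt")) would gather every matching file (the pattern is only an illustration); a later record with the same info1/info3 pair overwrites an earlier one.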

So, there would be several of these "scores" per file and there are a bunch
of files. Ultimately, I am interested in printing them out as a csv file but
that should be relatively easy once they are trapped in my array of doom
<cue evil laughter>.
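Once the scores sit in a nested dictionary (assumed here to be Score[info1][info3] = info2 with string keys and values), the standard csv module takes care of separators and quoting. A Python 3 sketch with a hypothetical write_scores_csv function and an assumed header row, since the thread does not name any columns:

```python
import csv
import io


def write_scores_csv(score, out):
    """Write score[info1][info3] = value as rows of (info1, info3, value)."""
    writer = csv.writer(out)
    # Hypothetical header; the thread does not specify column names.
    writer.writerow(["RelevantInfo1", "RelevantInfo3", "Score"])
    for info1 in sorted(score):
        for info3, value in sorted(score[info1].items()):
            writer.writerow([info1, info3, value])


# In-memory demo using the sample values from the post.
buf = io.StringIO()
write_scores_csv({"10/10/04": {"23": "22"}}, buf)
print(buf.getvalue())
```

To write a real file, open it with newline="" as the csv documentation recommends, e.g. with open("scores.csv", "w", newline="") as f: write_scores_csv(scores, f), where the file name is only an example.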

I've got a fairly ugly "solution" (I am using this term *very* loosely)
using awk and his faithful companion sed, but I would prefer something in
python.

Thanks for your time.

--
McGowan's Madison Avenue Axiom:
If an item is advertised as "under $50", you can bet it's not $19.95.

Recommended answer

Yatima wrote:
> [...]
> Anyhow, these "fields" repeat several times in a given file (number of
> repetitions varies from file to file). The number on the line following the
> "RelevantInfo" lines is really what I'm after. Ideally, I would like to have
> something like so:
>
> RelevantInfo1 = 10/10/04 # The variable name isn't actually important
> RelevantInfo3 = 23 # it's just there to illustrate what info I'm
> # trying to snag.

Here is a way to create a list of [RelevantInfo, value] pairs:
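
# Python 2 code: cStringIO and the print statement predate Python 3.
import cStringIO

raw_data = '''Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34'''
# Wrap the sample text in a file-like object so it can be read line by line.
raw_data = cStringIO.StringIO(raw_data)

data = []
for line in raw_data:
    if line.startswith('RelevantInfo'):
        key = line.strip()
        # The value is on the next line; next() pulls it from the same
        # iterator, so the loop resumes after the value line.
        value = raw_data.next().strip()
        data.append([key, value])

print data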
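Kent's code above is Python 2 (cStringIO, raw_data.next() and the print statement). A minimal Python 3 sketch of the same technique swaps in io.StringIO and the built-in next(); passing a default to next() is a small addition of this sketch so that a label on the very last line yields an empty value instead of raising StopIteration.

```python
import io

# Same sample text as in the post, wrapped in an in-memory text stream.
raw_data = io.StringIO("""Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34""")

data = []
for line in raw_data:
    if line.startswith("RelevantInfo"):
        key = line.strip()
        # The for loop and next() share one iterator, so this consumes the
        # value line and the loop continues after it.
        value = next(raw_data, "").strip()
        data.append([key, value])

print(data)
# → [['RelevantInfo1', '10/10/04'], ['RelevantInfo2', '22'], ['RelevantInfo3', '23']]
```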


> Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2

I'm not sure what you mean by this. Do you want to build a Score dictionary as well?

Kent



