Python re.split lookahead pattern

Question

I'm trying to use re.split to get the BCF#, BTS#, LAC and CI values from a log file that has a header followed by a regular structure:

==================================================================================
RADIO NETWORK CONFIGURATION IN BSC:
                                                         E P  B
                                      F                  T R  C D-CHANNEL  BUSY
                      AD OP           R  ET- BCCH/CBCH/  R E  S O&M LINK  HR  FR
 LAC   CI         HOP ST STATE  FREQ  T  PCM ERACH       X F  U NAME  ST
                                                                         /GP
===================== == ====== ==== == ==== =========== = = == ===== == === ===

BCF-0010  FLEXI MULTI  U WO                                   2 LM10  WO
10090 31335 BTS-0010  U WO                                                0   0
 KHAKHAATT070D    BB/- 
                                                                               7
              TRX-001  U WO      779  0 1348 MBCCH+CBCH    P  0
              TRX-002  U WO      659  0 1348                  1
              TRX-003  U WO      661  0 1348                  2
              TRX-004  U WO      670  0 1348                  0
              TRX-005  U WO      674  0 1348                  1
 10090 31336 BTS-0011  U WO                                                0   0
 KHAKHAATT200D    BB/- 
                                                                               7
              TRX-006  U WO      811  0 1348 MBCCH+CBCH    P  2
              TRX-009  U WO      845  0 1349                  2
              TRX-010  U WO      819  0 1349                  0
              TRX-011  U WO      823  0 1349                  1
              TRX-012  U WO      836  0 1349                  2
 10090 31337 BTS-0012  U WO                                                0   0
 KHAKHAATT340D    BB/- 
                                                                               5
              TRX-013  U WO      799  0 1349 MBCCH+CBCH    P  0
              TRX-014  U WO      829  0 1349                  1
              TRX-017  U WO      831  0 1302                  2
              TRX-018  U WO      834  0 1302                  1
              TRX-019  U WO      853  0 1302                  0
              TRX-020  U WO      858  0 1302                  2
              TRX-021  U WO      861  0 1302                  1

BCF-0020  FLEXI MULTI  U WO                                   0 LM20  WO
 10090 30341 BTS-0020  U WO                                                0   0
 KHAKHABYT100G    BB/- 
                                                                               1
              TRX-001  U WO       14  0 1856 MBCCH+CBCH    P  0
              TRX-002  U WO       85  0 1856                  1
 10090 30342 BTS-0021  U WO                                                0   0
 KHAKHABYT230G    BB/- 
                                                                               1
              TRX-003  U WO        4  0 1856 MBCCH+CBCH    P  2
              TRX-004  U WO       12  0 1856                  0
 10090 30343 BTS-0022  U WO                                                0   0
 KHAKHABYT340G    BB/- 
                                                                               1
              TRX-005  U WO       20  0 1856 MBCCH+CBCH    P  1
              TRX-006  U WO       22  0 1856                  2
 10090 30345 BTS-0025  U WO                                                0   0
 KHAKHABYT100D    BB/- 
                                                                               5
              TRX-007  U WO      793  0 1856 MBCCH+CBCH    P  0
              TRX-008  U WO      851  0 1856                  1
              TRX-009  U WO      834  0 1857                  2
              TRX-010  U WO      825  0 1857                  1
 10090 30346 BTS-0026  U WO                                                0   0
 KHAKHABYT230D    BB/- 
                                                                               4
              TRX-011  U WO      803  0 1857 MBCCH+CBCH    P  2
              TRX-012  U WO      860  0 1857                  0
              TRX-013  U WO      846  0 1857                  1
              TRX-014  U WO      844  0 1857                  2
              TRX-015  U WO      828  0 1857                  0
              TRX-016  U WO      813  0 1857                  1
 10090 30347 BTS-0027  U WO                                                0   2
 KHAKHABYT340D    BB/- 
                                                                               5
              TRX-017  U WO      801  0 1352 MBCCH+CBCH    P  2
              TRX-018  U WO      857  0 1352                  0
              TRX-019  U WO      840  0 1352                  1
              TRX-020  U WO      838  0 1352                  0
              TRX-021  U WO      836  0 1352                  1
              TRX-022  U WO      823  0 1352                  2
              TRX-023  U WO      821  0 1352                  0
              TRX-024  U WO      817  0 1352                  1

=======================================================================================

With this code:

def GetTheSentences(infile):
    with con:
       cur = con.cursor()
       cur.execute("DROP TABLE IF EXISTS eei")
       cur.execute("CREATE TABLE eei(BCF INT, BTS INT PRIMARY KEY) ")
    with open(infile) as fp:
        for result_1 in re.split('BCF-', fp.read(), flags=re.UNICODE):
            BCF = result_1[:4]
            for result_2 in re.compile("(?=BTS-)").split(result_1):    
                rec = re.search('TRX-',result_2)
                if rec is not None:
                    BTS = result_2[4:8]
                    print BCF + "," + BTS

I need to split result_1 into BTS-related parts using a regex lookahead, with each part including the 13 characters before "BTS-" (e.g. "10090 31335 BTS-0010"), and then split each of those parts into result_3 pieces, one per TRX, but I have had no success.

Please help!

Recommended answer

Before Python 3.7, re.split() doesn't split on zero-length matches; from 3.7 on, it does.
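
One way to get the effect of a lookahead split without depending on how re.split() treats empty matches is to collect the match offsets with re.finditer() and slice the string at those offsets. The following is only a minimal sketch of that idea; split_before and the sample string are hypothetical names made up for illustration, not part of the re module.

```python
import re


def split_before(pattern, text):
    """Cut text in front of every match of pattern.

    Hypothetical helper: each piece after the first starts with the
    matched text, which is what a (?=...) lookahead split is meant to do.
    """
    starts = [m.start() for m in re.finditer(pattern, text)]
    # Add 0 and len(text) so the leading and trailing pieces are kept;
    # set() removes the duplicate when a match sits at offset 0.
    bounds = sorted(set([0] + starts + [len(text)]))
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


sample = "head BTS-0010 aaa BTS-0011 bbb"
parts = split_before(r"BTS-", sample)
print(parts)  # ['head ', 'BTS-0010 aaa ', 'BTS-0011 bbb']
```

Because the pieces are plain slices, joining them gives back the original string, so nothing from the log is lost.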

Therefore, on the Python 2 interpreter that the question's code targets, re.compile("(?=BTS-)").split(result_1) will never split your string. There you need to find a solution without re.split(), or use the newer third-party regex module.
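
A solution without re.split() could look roughly like the sketch below, which anchors on whole lines with re.finditer() instead of splitting. It assumes that every BCF line starts with "BCF-", that every BTS line has the form "LAC CI BTS-nnnn", and that TRX lines are indented, as in the log above. LOG is an abridged excerpt of that log, and the names BCF_RE, BTS_RE, TRX_RE, blocks and parse are hypothetical; the database part of the question is left out.

```python
import re

# Abridged excerpt of the log from the question.
LOG = """\
BCF-0010  FLEXI MULTI  U WO                                   2 LM10  WO
10090 31335 BTS-0010  U WO                                                0   0
 KHAKHAATT070D    BB/-
              TRX-001  U WO      779  0 1348 MBCCH+CBCH    P  0
              TRX-002  U WO      659  0 1348                  1
 10090 31336 BTS-0011  U WO                                                0   0
 KHAKHAATT200D    BB/-
              TRX-006  U WO      811  0 1348 MBCCH+CBCH    P  2

BCF-0020  FLEXI MULTI  U WO                                   0 LM20  WO
 10090 30341 BTS-0020  U WO                                                0   0
 KHAKHABYT100G    BB/-
              TRX-001  U WO       14  0 1856 MBCCH+CBCH    P  0
"""

# [ \t] rather than \s keeps each pattern on a single line.
BCF_RE = re.compile(r"^BCF-(\d+)", re.MULTILINE)
BTS_RE = re.compile(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+BTS-(\d+)", re.MULTILINE)
TRX_RE = re.compile(r"^[ \t]*TRX-(\d+)", re.MULTILINE)


def blocks(pattern, text):
    """Yield (match, body); body runs from this match up to the next one."""
    matches = list(pattern.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        yield m, text[m.start():end]


def parse(text):
    """Return one (BCF, LAC, CI, BTS, [TRX, ...]) tuple per BTS block."""
    rows = []
    for bcf_m, bcf_body in blocks(BCF_RE, text):
        for bts_m, bts_body in blocks(BTS_RE, bcf_body):
            lac, ci, bts = bts_m.groups()
            rows.append((bcf_m.group(1), lac, ci, bts, TRX_RE.findall(bts_body)))
    return rows


rows = parse(LOG)
for bcf, lac, ci, bts, trx in rows:
    print("%s,%s,%s,%s %s" % (bcf, lac, ci, bts, " ".join(trx)))
```

Each BTS block begins at the start of its line, so the LAC and CI columns in front of "BTS-" stay inside the block, and the TRX entries are read from the same block rather than from a further split.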
