Python regex to match only the first instance

Problem description

I have Python code that reads a certificate file, and I want to match only the root certificate. For example, my certificate file looks like this:

--------begin certificate--------
CZImiZPyLGQBGRYFbG9jYWwxGjAYBgoJkiaJk/IasdasdassZAEZFgp2aXJ0dWFsdnB4MSEw
HwYDVQQDExh2aXJ0dWFsdnB4LVZJUlRVQUxEQzEtQ0EwHhfdgdgdgfcNMTUwOTE2MTg1MTMx
WhcNMTcwOTE2MTkwMTMxWjBaMQswCQYDVQQGEwJVUzEXMBUGCgmSJoaeqasadsmT8ixkARkW
B3ZzcGhlcmUxFTATBgoJkiaJk/IsZAEZFgVsb2NhbDEOMAwGA1UEChMFdmNlcnfrrfgfdvQx
CzAJBgNVBAMTAkNBMIIBIjANBgkqhkiG9w
--------end certificate----------
--------begin certificate--------
ZGFwOi8vL0NOPXZpcnR1YWx2cHgtcvxcvxvVklSVFVBTERDMS1DQSxDTj1BSUEsQ049UHVi
bGljJTIwS2V5JTIwU2VydmldfsfhjZXMsQ049U2VydmfffljZXMsQ049Q29uZmlndXJhdGlv
bixEQz12aXJ0dWFsdnB4LERDPWxvY2FsP2NxvxcvxcvBQ2VydGlmaWNhdGU/YmFzZT9vYmpl
Y3RDbGFzcz1jZXJ0aWZpY2F0aW9uQXV0dsfsdffraG9yaXR5MD0GCSsGAQQBgjcVBwQwMC4G
--------end certificate----------

I want to fetch only the root certificate, which starts with CZImiZPy. I read the certificate into the variable data and applied the regex below:

re.sub('-----.*?-----', '', data)

But it returned both encrypted certificates, not just the first one. Is there a better way to tweak the regular expression?

Solution

You want to search for text, not substitute it with something else.

>>> import re
>>> s = """--------begin certificate--------
<certificate encrypted>
--------end certificate----------
--------begin certificate--------
<certificate encrypted>
--------end certificate----------"""
>>> re.search(r"-+begin certificate-+\s+(.*?)\s+-+end certificate-+", s, flags=re.DOTALL).group(1)
'<certificate encrypted>'

Explanation:

-+begin certificate-+ # Match the starting label
\s+                   # Match whitespace (including linebreaks)
(.*?)                 # Lazily match any characters (newlines too, because of re.DOTALL). Capture the result in group 1
\s+                   # Match whitespace (including linebreaks)
-+end certificate-+   # Match the ending label
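
The commented breakdown above maps closely onto a pattern compiled with re.VERBOSE, which allows the comments to live inside the regex itself. The following is a minimal sketch, not part of the original answer: the names CERT_RE and sample, and the placeholder bodies AAAA and BBBB, are illustrative assumptions. In verbose mode unescaped whitespace in the pattern is ignored, so the literal space in each label is written as [ ].

```python
import re

# Same pattern as in the answer, written in verbose mode so each part
# can carry its own comment. CERT_RE is a hypothetical name.
CERT_RE = re.compile(
    r"""
    -+begin[ ]certificate-+   # Match the starting label
    \s+                       # Match whitespace (including linebreaks)
    (.*?)                     # Lazily match any characters; capture in group 1
    \s+                       # Match whitespace (including linebreaks)
    -+end[ ]certificate-+     # Match the ending label
    """,
    flags=re.DOTALL | re.VERBOSE,
)

# Illustrative input with two certificate blocks (placeholder bodies).
sample = (
    "--------begin certificate--------\n"
    "AAAA\n"
    "--------end certificate----------\n"
    "--------begin certificate--------\n"
    "BBBB\n"
    "--------end certificate----------"
)

# search() stops at the first block, so only the first body is captured.
first_body = CERT_RE.search(sample).group(1)
print(first_body)
```

Because the quantifier in group 1 is lazy, the match ends at the first end label rather than running on to the last one in the input.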

re.search() scans the string from the start and returns only the first match, or None if nothing matches.
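
Since re.search() returns None when there is no match, calling .group(1) directly on its result raises AttributeError for input that contains no certificate block. A small wrapper can guard against that. This is a sketch under assumptions: the function name first_certificate and the sample data are hypothetical and not from the original answer.

```python
import re
from typing import Optional


def first_certificate(data: str) -> Optional[str]:
    """Return the body of the first certificate block in data, or None."""
    match = re.search(
        r"-+begin certificate-+\s+(.*?)\s+-+end certificate-+",
        data,
        flags=re.DOTALL,
    )
    # re.search() stops at the first match and returns None if there is none.
    return match.group(1) if match else None


# Illustrative input: two blocks with placeholder bodies.
data = (
    "--------begin certificate--------\n"
    "ROOTCERT\n"
    "--------end certificate----------\n"
    "--------begin certificate--------\n"
    "OTHERCERT\n"
    "--------end certificate----------"
)

print(first_certificate(data))
print(first_certificate("no certificate here"))
```

With this input the first call yields the body of the first block only, and the second call yields None instead of raising.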
