Using regex to replace file data

Problem description

With some help from here, I have this working almost exactly the way I want. Now I need to add the ability to remove data from a file before the files are compared.

The reason is that the strings ("data") I'm removing are known to differ each time the file is saved.

I have written a regex to select the exact text that I want to remove, but I am having trouble implementing it with my current code.

Here are the three main functions:

HOSTNAME_RE = re.compile(r'hostname +(\S+)')
def get_file_info_from_lines(filename, file_lines):
    hostname = None
    a_hash = hashlib.sha1()
    for line in file_lines:
        a_hash.update(line.encode('utf-8'))
        match = HOSTNAME_RE.match(line)
        if match:
            hostname = match.group(1)
    return hostname, filename, a_hash.hexdigest()

def get_file_info(filename):
    if filename.endswith(('.cfg', '.startup', '.confg')):
        with open(filename, "r+") as in_file:
            #filename = re.sub(REMOVE_RE, subst, filename, 0, re.MULTILINE)
            return get_file_info_from_lines(filename, in_file.readlines())

def hostname_parse(directory):
    results = {}
    i = 0
    l = len(os.listdir(directory))
    for filename in os.listdir(directory):
        filename = os.path.join(directory, filename)
        sleep(0.001)
        i += 1
        progress_bar(i, l, prefix = 'Progress:', suffix = 'Complete', barLength = 50)
        info = get_file_info(filename)
        if info is not None:
            results[info[0]] = info
    return results

This is the regex for finding the strings to be removed.

REMOVE_RE = r"((?:\bCurrent configuration)(?:.*\n?){6})"
subst = ""

EXAMPLE_FILE_BEFORE_DATA_REMOVED:

Building configuration...

Current configuration : 45617 bytes
!
! Last configuration change at 00:22:36 UTC Sun Jan 22 2017 by user
! NVRAM config last updated at 00:22:43 UTC Sun Jan 22 2017 by user
!
version 15.0
no service pad
!
no logging console
enable secret 5 ***encrypted password***
!
username admin privilege 15 password 7 ***encrypted password***
username sadmin privilege 15 secret 5 ***encrypted password***
aaa new-model
!
ip ftp username ***encrypted password***
ip ftp password 7 ***encrypted password***
ip ssh version 2
!
line con 0
 password 7 ***encrypted password***
 login authentication maint
line vty 0 4
 password 7 ***encrypted password***
 length 0
 transport input ssh
line vty 5 15
 password 7 ***encrypted password***
 transport input ssh
!

EXAMPLE_FILE_AFTER_DATA_REMOVED:

Building configuration...

!
no service pad
!
no logging console
enable 
!
username admin privilege 15 
username gisadmin privilege 15 
aaa new-model
!
ip ftp username cfgftp
ip ftp 
ip ssh version 2
!
line con 0

 login authentication maint
line vty 0 4

 length 0
 transport input ssh
line vty 5 15

 transport input ssh
!

I've tried something like #filename = re.sub(REMOVE_RE, subst, filename, 0, re.MULTILINE) within get_file_info and get_file_info_from_lines, but I'm obviously not implementing it correctly.

Any help would be appreciated as I am just learning.

Running the Compare:

results1 = hostname_parse('test1.txt')
results2 = hostname_parse('test2.txt')



for hostname, filename, filehash in results1.values():
    if hostname in results2:
        _, filename2, filehash2 = results2[hostname]
        if filehash != filehash2:
            print("%s has a change (%s, %s)" % (
                hostname, filehash, filehash2))
            print(filename)
            print(filename2)
            print()

I do not want to modify the current file. It would be great if all of this could be done in memory or with a temporary file.
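One possible way to keep everything in memory, shown only as a sketch: read the file contents into a single string, apply the substitution to that string (rather than to `filename`, which is just the path), and hash the result. The helper names `hash_without_header` and `hash_file` are hypothetical and not part of the original code; the pattern is the one from the question.

```python
import hashlib
import re

# Same pattern as in the question: the "Current configuration" line plus
# the five lines that follow it.
REMOVE_RE = re.compile(r"((?:\bCurrent configuration)(?:.*\n?){6})")


def hash_without_header(text):
    """Return the SHA-1 hex digest of text with the matched header removed."""
    # sub() returns a new string; the file on disk is never touched.
    cleaned = REMOVE_RE.sub("", text)
    return hashlib.sha1(cleaned.encode("utf-8")).hexdigest()


def hash_file(path):
    """Hypothetical helper: open the file read-only and hash the cleaned text."""
    with open(path, "r") as in_file:
        return hash_without_header(in_file.read())
```

The pattern has to see the whole text at once because it spans several lines; matching it against one line from `readlines()` can only ever cover that single line. The `re.MULTILINE` flag is not needed here, since it only changes how `^` and `$` behave. Note that in the sample file the six-line span runs through the `version 15.0` line, so a change to that line would go unnoticed with this pattern.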

FULL CODE:

import hashlib
import os
import re


HOSTNAME_RE = re.compile(r'hostname +(\S+)')
REMOVE_RE = re.compile(r"((?:\bCurrent configuration)(?:.*\n?){6})")


def get_file_info_from_lines(filename, file_lines):
    hostname = None
    a_hash = hashlib.sha1()
    for line in file_lines:
        #match = HOSTNAME_RE.match(line)
        if not re.match(REMOVE_RE, line):
            a_hash.update(line.encode('utf-8'))
        #=======================================================================
        # if match:
        #     hostname = match.group(1)
        #=======================================================================
    return hostname, filename, a_hash.hexdigest()

def get_file_info(filename):
    if filename.endswith(('.cfg', '.startup', '.confg')):
        with open(filename, "r+") as in_file:
            return get_file_info_from_lines(filename, in_file.readlines())

def hostname_parse(directory):
    results = {}
    for filename in os.listdir(directory):
        filename = os.path.join(directory, filename)
        info = get_file_info(filename)
        if info is not None:
            results[info[0]] = info
    return results


results1 = hostname_parse('test1') #Directory of test files
results2 = hostname_parse('test2') #Directory of test files 2



for hostname, filename, filehash in results1.values():
    if hostname in results2:
        _, filename2, filehash2 = results2[hostname]
        if filehash != filehash2:
            print("%s has a change (%s, %s)" % (
                hostname, filehash, filehash2))
            print(filename)
            print(filename2)
            print()

Solution

I was able to find a way around the regex. Instead of substituting text, I check each line for a known marker and blank out the lines that match before hashing.

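def get_file_info_from_lines(filename, file_lines):
    hostname = None
    a_hash = hashlib.sha1()
    for line in file_lines:
        # Blank out the lines whose timestamps change on every save.
        if "! Last " in line:
            line = ''
        if "! NVRAM " in line:
            line = ''
        a_hash.update(line.encode('utf-8'))
        match = HOSTNAME_RE.match(line)
        if match:
            hostname = match.group(1)
    # Return the same tuple as the original function so callers keep working.
    return hostname, filename, a_hash.hexdigest()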
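For reference, here is a self-contained variant of the same line-filtering idea, offered as a sketch rather than the author's exact code. The tuple name `VOLATILE_MARKERS` is an assumption introduced for illustration; the two marker strings come from the answer above.

```python
import hashlib
import re

HOSTNAME_RE = re.compile(r'hostname +(\S+)')
# Substrings that identify lines which change on every save
# (hypothetical name; the marker values are taken from the answer).
VOLATILE_MARKERS = ("! Last ", "! NVRAM ")


def get_file_info_from_lines(filename, file_lines):
    hostname = None
    a_hash = hashlib.sha1()
    for line in file_lines:
        if any(marker in line for marker in VOLATILE_MARKERS):
            continue  # leave this line out of the hash; nothing is written to disk
        a_hash.update(line.encode('utf-8'))
        match = HOSTNAME_RE.match(line)
        if match:
            hostname = match.group(1)
    return hostname, filename, a_hash.hexdigest()
```

Skipping a line with `continue` produces the same digest as hashing an empty string for it, so the result matches the answer's approach. If other lines also vary between saves (for example the byte count on the `Current configuration` line), their markers could be added to the tuple.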
