Find, decode and replace all base64 values in text file

Views: 119 Published: 2020/9/18 19:53:28 Tags: python sql regex sed base64


Question

I have a SQL dump file that contains text with HTML links like:

&lt;a href=&quot;http://blahblah.org/kb/getattachment.php?data=NHxUb3Bjb25fZGF0YS1kb3dubG9hZF9ob3d0by5wZGY=&quot;&gt;attached file&lt;/a&gt;

I'd like to find, decode and replace the base64 part of the text in each of these links.

I've been trying to use Python with regular expressions and base64 to do the job. However, my regex skills are not up to the task.

I need to select any string that begins with

'getattachment.php?data='

and ends with

'&quot;'

I then need to decode the part between 'data=' and '&quot' using base64.b64decode()

The result should look something like:

&lt;a href=&quot;http://blahblah.org/kb/4/Topcon_data-download_howto.pdf&quot;&gt;attached file&lt;/a&gt;

I think the solution will look something like:

import re
import base64
with open('phpkb_articles.sql') as f:
    for line in f:
        re.sub(some_regex_expression_here, some_function_here_to_decode_base64, line)

Any ideas?

Answer for anyone who is interested:

import base64
import re
import sys


def decode_base64(match):
    """
    Build the replacement text for one matched getattachment link
    """
    # fix escaped equal signs in some base64 strings
    base64_string = match.group(1).replace('%3D', '=')
    # b64decode returns bytes on Python 3, so decode to text before editing
    decoded_string = base64.b64decode(base64_string).decode('ascii')

    # replace '|' with '/'
    decoded_string = decoded_string.replace('|', '/')

    # escape the spaces in file names
    decoded_string = decoded_string.replace(' ', '%20')

    return 'assets/' + decoded_string + '&quot'


# number of lines that could not be decoded
count = 0

pattern = r'getattachment\.php\?data=([^&]+?)&quot'

# Open the file and read line by line
with open('phpkb_articles.sql') as f:
    for line in f:
        try:
            # globally substitute in new file path
            edited_line = re.sub(pattern, decode_base64, line)
            # output the edited line to standard out
            sys.stdout.write(edited_line)
        except ValueError:
            # binascii.Error and UnicodeDecodeError are both ValueError subclasses;
            # output unedited line if decoding fails to prevent corruption
            sys.stdout.write(line)
            count += 1

Recommended answer

You already have it, you just need the small pieces:

The pattern r'data=([^&]+?)&quot' will match anything after data= and before &quot

>>> import re
>>> pat = r'data=([^&]+?)&quot'
>>> line = '&lt;a href=&quot;http://blahblah.org/kb/getattachment.php?data=NHxUb3Bjb25fZGF0YS1kb3dubG9hZF9ob3d0by5wZGY=&quot;&gt;attached file&lt;/a&gt;'
>>> decodeString = re.search(pat, line).group(1)  # the b64 string is captured by the group, so we only want group(1)
>>> decodeString
'NHxUb3Bjb25fZGF0YS1kb3dubG9hZF9ob3d0by5wZGY='
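
To show what the captured group turns into, here is a minimal sketch that decodes that one string. It assumes Python 3, where base64.b64decode() returns bytes, and that the decoded value is plain ASCII; the names captured and decoded are only for illustration.

```python
import base64

# The group(1) value captured from the sample link above
captured = 'NHxUb3Bjb25fZGF0YS1kb3dubG9hZF9ob3d0by5wZGY='

# b64decode returns bytes on Python 3; decode to get a text string
decoded = base64.b64decode(captured).decode('ascii')

print(decoded)  # 4|Topcon_data-download_howto.pdf
```

The '|' in the decoded value is what becomes the '/' in the path the question asks for.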

You can then use the str.replace() method as well as the base64.b64decode() method to finish the rest. I don't want to just write your code for you, but this should give you a good idea of where to go.
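
For readers who want to see those pieces joined up, one possible sketch follows. It is not the answerer's code: rewrite_line is a hypothetical helper, Python 3 is assumed, and it assumes every link uses the literal prefix 'getattachment.php?data=' and decodes to ASCII text.

```python
import base64
import re

pat = r'data=([^&]+?)&quot'


def rewrite_line(line):
    """Replace each base64 link target in line with its decoded path (hypothetical helper)."""
    # With one capture group, findall returns the captured base64 strings
    for encoded in re.findall(pat, line):
        decoded = base64.b64decode(encoded).decode('ascii')
        # The decoded value uses '|' where the final path needs '/'
        path = decoded.replace('|', '/')
        # Swap the script name plus the encoded value for the decoded path
        line = line.replace('getattachment.php?data=' + encoded, path)
    return line


sample = ('&lt;a href=&quot;http://blahblah.org/kb/getattachment.php'
          '?data=NHxUb3Bjb25fZGF0YS1kb3dubG9hZF9ob3d0by5wZGY=&quot;&gt;'
          'attached file&lt;/a&gt;')

print(rewrite_line(sample))
```

On the sample line this prints the link with the target rewritten to http://blahblah.org/kb/4/Topcon_data-download_howto.pdf. Invalid base64 would raise an exception here, so a real run over a dump file would likely want a try/except around the decode step.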
