Python equivalent of Unix "strings" utility

Question

I'm trying to write a script which will extract strings from an executable binary and save them in a file. Having this file be newline-separated isn't an option, since the strings could contain newlines themselves. This also means, however, that using the Unix "strings" utility isn't an option, since it just prints out all the strings newline-separated, so there's no way to tell which strings include newlines just by looking at the output of "strings". Thus, I was hoping to find a Python function or library which implements the same functionality as "strings", but which will give me those strings as variables so that I can avoid the newline issue.

Thanks!

Answer

Here's a generator that yields every string of printable characters at least min characters long (4 by default) that it finds in filename:

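import string

def strings(filename, min=4):
    # Python 3.x: latin-1 maps every byte to exactly one character, so no byte is
    # silently dropped (the default encoding with errors="ignore" can discard
    # bytes and glue two separate strings together); newline="" keeps \r\n as-is.
    with open(filename, encoding="latin-1", newline="") as f:
    # with open(filename, "rb") as f:  # Python 2.x
        result = ""  # printable characters collected since the last non-printable one
        for c in f.read():
            if c in string.printable:
                result += c
                continue
            # a non-printable character ends the current run
            if len(result) >= min:
                yield result
            result = ""
        if len(result) >= min:  # catch result at EOF
            yield result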
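A more compact way to express the same scan (a sketch, not part of the original answer) is a bytes regular expression whose character class matches exactly the characters in string.printable: bytes 0x09-0x0D (tab, newline, vertical tab, form feed, carriage return) and 0x20-0x7E (space through ~). The names printable_runs and strings_re are hypothetical. Because the matching loop runs inside the re module's C code, this is typically faster than a per-character Python loop.

```python
import re

# Same character set as string.printable: 0x09-0x0D plus 0x20-0x7E.
PRINTABLE_CLASS = rb"[\x09-\x0d\x20-\x7e]"

def printable_runs(data: bytes, min_len: int = 4):
    """Yield every run of >= min_len printable ASCII bytes in data, as str."""
    pattern = re.compile(PRINTABLE_CLASS + b"{%d,}" % min_len)
    for match in pattern.finditer(data):
        # Every matched byte is ASCII, so decoding cannot fail.
        yield match.group().decode("ascii")

def strings_re(filename, min_len=4):
    """Regex-based variant of strings(): reads the file as raw bytes."""
    with open(filename, "rb") as f:
        yield from printable_runs(f.read(), min_len)
```

Opening the file in "rb" mode skips text decoding entirely, so no byte can be dropped or translated before the scan.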

You can iterate over the results:

for s in strings("something.bin"):
    print(repr(s))  # do something with s; repr() makes embedded newlines visible

... or store in a list:

sl = list(strings("something.bin"))
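For the original goal of saving the strings to a file without newline ambiguity, one option (an assumption, not part of the answer above) is to write the list as a JSON array: json.dump escapes an embedded newline as the two characters \n, so each string stays a single element and json.load returns it unchanged. save_strings and load_strings are hypothetical helper names.

```python
import json

def save_strings(strs, out_path):
    """Write a list of strings to out_path as one JSON array."""
    with open(out_path, "w", encoding="utf-8") as f:
        # Embedded newlines are escaped, so the file never contains a raw
        # newline inside any string.
        json.dump(strs, f)

def load_strings(in_path):
    """Read back a list written by save_strings()."""
    with open(in_path, encoding="utf-8") as f:
        return json.load(f)
```

Usage would be save_strings(list(strings("something.bin")), "strings.json"). Another option is NUL-separated output, since no extracted string can contain a NUL byte (it is not in string.printable).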

I've tested this very briefly, and it seems to give the same output as the Unix strings command for the arbitrary binary file I chose. However, it's pretty naïve (for a start, it reads the whole file into memory at once, which might be expensive for large files), and is very unlikely to approach the performance of the Unix strings command.
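To address the memory concern, here is a sketch (assumed, not the answerer's code) that reads the file in fixed-size chunks and carries a partial run across chunk boundaries, so memory use is bounded by chunk_size plus the longest printable run rather than by the file size. runs_from_chunks, strings_streaming and the 64 KiB default chunk size are illustrative choices.

```python
import string
from functools import partial

# Byte values of string.printable, the set the strings() generator tests against.
PRINTABLE_BYTES = frozenset(string.printable.encode("ascii"))

def runs_from_chunks(chunks, min_len=4):
    """Yield printable runs of >= min_len bytes from an iterable of byte chunks.

    A run may span chunk boundaries; it is carried over in `current`.
    """
    current = bytearray()
    for chunk in chunks:
        for b in chunk:  # iterating a bytes object yields ints
            if b in PRINTABLE_BYTES:
                current.append(b)
                continue
            if len(current) >= min_len:
                yield current.decode("ascii")
            current.clear()
    if len(current) >= min_len:  # run that reaches end of input
        yield current.decode("ascii")

def strings_streaming(filename, min_len=4, chunk_size=64 * 1024):
    """Like strings(), but reads chunk_size bytes at a time instead of the whole file."""
    with open(filename, "rb") as f:
        yield from runs_from_chunks(iter(partial(f.read, chunk_size), b""), min_len)
```

This is still a byte-by-byte Python loop, so it saves memory but will not close the speed gap with the Unix strings command.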
