Return a list of imported Python modules used in a script?
Question
I am writing a program that categorizes a list of Python files by which modules they import. As such, I need to scan the collection of .py files and return a list of the modules they import. For example, if one of the files I import has the following lines:
import os
import sys, gtk
I would like it to return:
["os", "sys", "gtk"]
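A minimal sketch of the scanning step, assuming Python 3 and using only the standard library's ast module (the function name direct_imports is hypothetical, not from the question). It parses the source text into a syntax tree and collects the names from import statements, so nothing in the scanned code is run and gtk does not even need to be installed. A `from x import y` statement is recorded as `x`; a relative `from . import y` has no module name and is skipped.

```python
import ast


def direct_imports(source):
    """Return module names from import statements, without duplicates,
    in the order ast.walk visits them (source order for top-level imports)."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import sys, gtk" carries one alias per module
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from os import path" -> "os"; "from . import x" has module=None
            names.append(node.module)
    # dict.fromkeys drops repeats while keeping first-seen order
    return list(dict.fromkeys(names))


print(direct_imports("import os\nimport sys, gtk\n"))  # ['os', 'sys', 'gtk']
```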
I played with modulefinder and wrote:
from modulefinder import ModuleFinder
finder = ModuleFinder()
finder.run_script('testscript.py')
print 'Loaded modules:'
for name, mod in finder.modules.iteritems():
    print '%s ' % name,
But this returns more than just the modules used in the script. For example, for a script that contains only:
import os
print os.getenv('USERNAME')
the ModuleFinder script returns these modules:
tokenize heapq __future__ copy_reg sre_compile _collections cStringIO _sre functools random cPickle __builtin__ subprocess cmd gc __main__ operator array select _heapq _threading_local abc _bisect posixpath _random os2emxpath tempfile errno pprint binascii token sre_constants re _abcoll collections ntpath threading opcode _struct _warnings math shlex fcntl genericpath stat string warnings UserDict inspect repr struct sys pwd imp getopt readline copy bdb types strop _functools keyword thread StringIO bisect pickle signal traceback difflib marshal linecache itertools dummy_thread posix doctest unittest time sre_parse os pdb dis
...whereas I just want it to return 'os', as that was the module used in the script.
Can anyone help me achieve this?
UPDATE: I want to clarify that I would like to do this without running the Python file being analyzed, by scanning the code alone.
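Since the end goal is to categorize files by the modules they import, here is a rough sketch of that step under the constraint in the update. It assumes Python 3 and the standard ast module; ast.parse only builds a syntax tree and never executes the file, so even a file that would raise at import time is safe to scan. The helper names top_level_imports and categorize are hypothetical, and reducing names such as os.path to their top-level package (os) and skipping relative imports are design choices of this sketch, not requirements from the question.

```python
import ast
from collections import defaultdict


def top_level_imports(path):
    """Return the set of top-level module names a file imports (parsed, never run)."""
    with open(path, "rb") as f:
        # ast.parse accepts bytes and honours any coding declaration;
        # it only builds a tree, no code in the file is executed
        tree = ast.parse(f.read(), filename=path)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level > 0 means a relative import such as "from .utils import x"
            found.add(node.module.split(".")[0])
    return found


def categorize(paths):
    """Map each imported top-level module to the sorted list of files importing it."""
    by_module = defaultdict(list)
    for path in paths:
        for name in top_level_imports(path):
            by_module[name].append(path)
    return {name: sorted(files) for name, files in sorted(by_module.items())}
```

Calling something like categorize(glob.glob("*.py")) would then give a dict keyed by module name, with the files that import each module as values.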
Recommended answer
IMO the best way to do this is to use the snakefood package (http://furius.ca/snakefood/). The author has already done the required work: it finds not only the directly imported modules, but also uses the AST to parse the code for runtime dependencies that a more static analysis would miss.
Here is an example command to demonstrate:
sfood ./example.py | sfood-cluster > example.deps
That will generate a basic dependency file of each unique module. For even more detail use:
sfood -r -i ./example.py | sfood-cluster > example.deps
You can also walk a directory tree and find all imports in code. Please note: the AST portions of this routine were lifted from the snakefood source, which carries this copyright: Copyright (C) 2001-2007 Martin Blais. All Rights Reserved.
# Python 2 only: the compiler package was removed in Python 3.
import os
import compiler
from compiler.ast import Discard, Const
from compiler.visitor import ASTVisitor

def pyfiles(startPath):
    """Return [directory, filename] pairs for every .py file under startPath."""
    r = []
    d = os.path.abspath(startPath)
    if os.path.exists(d) and os.path.isdir(d):
        for root, dirs, files in os.walk(d):
            for f in files:
                n, ext = os.path.splitext(f)
                if ext == '.py':
                    # Use root, not d, so files in subdirectories get the right path.
                    r.append([root, f])
    return r

class ImportVisitor(object):
    def __init__(self):
        self.modules = []  # accepted: (module, name, asname, lineno, level, pragma)
        self.recent = []   # seen but not yet accepted

    def visitImport(self, node):
        # "import a, b as c": node.names is a list of (name, asname) pairs
        self.accept_imports()
        self.recent.extend((x[0], None, x[1] or x[0], node.lineno, 0)
                           for x in node.names)

    def visitFrom(self, node):
        self.accept_imports()
        modname = node.modname
        if modname == '__future__':
            return # Ignore these.
        for name, as_ in node.names:
            if name == '*':
                # We really don't know...
                mod = (modname, None, None, node.lineno, node.level)
            else:
                mod = (modname, name, as_ or name, node.lineno, node.level)
            self.recent.append(mod)

    def default(self, node):
        # A bare constant expression (e.g. a string) right after an import
        # is recorded as that import's "pragma".
        pragma = None
        if self.recent:
            if isinstance(node, Discard):
                children = node.getChildren()
                if len(children) == 1 and isinstance(children[0], Const):
                    const_node = children[0]
                    pragma = const_node.value
        self.accept_imports(pragma)

    def accept_imports(self, pragma=None):
        self.modules.extend((m, r, l, n, lvl, pragma)
                            for (m, r, l, n, lvl) in self.recent)
        self.recent = []

    def finalize(self):
        self.accept_imports()
        return self.modules

class ImportWalker(ASTVisitor):
    def __init__(self, visitor):
        ASTVisitor.__init__(self)
        self._visitor = visitor

    def default(self, node, *args):
        # Let the visitor see every node, then keep walking its children.
        self._visitor.default(node)
        ASTVisitor.default(self, node, *args)

def parse_python_source(fn):
    contents = open(fn, 'rU').read()
    ast = compiler.parse(contents)
    vis = ImportVisitor()
    compiler.walk(ast, vis, ImportWalker(vis))
    return vis.finalize()

for d, f in pyfiles('/Users/bear/temp/foobar'):
    print d, f
    print parse_python_source(os.path.join(d, f))
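The compiler package used above exists only in Python 2. A rough Python 3 counterpart built on the standard ast module might look like the following sketch. It is not snakefood code: ImportCollector is a hypothetical name, it produces the same (module, name, asname, lineno, level) tuples as ImportVisitor but drops the pragma field, and this pyfiles yields full paths instead of [directory, filename] pairs.

```python
import ast
import os


class ImportCollector(ast.NodeVisitor):
    """Collect (module, name, asname, lineno, level) tuples, mirroring
    ImportVisitor above without the pragma field."""

    def __init__(self):
        self.modules = []

    def visit_Import(self, node):
        for alias in node.names:
            self.modules.append(
                (alias.name, None, alias.asname or alias.name, node.lineno, 0))

    def visit_ImportFrom(self, node):
        if node.module == "__future__":
            return  # ignored, as in the original
        for alias in node.names:
            if alias.name == "*":
                # a star import does not say which names it brings in
                self.modules.append((node.module, None, None, node.lineno, node.level))
            else:
                self.modules.append((node.module, alias.name,
                                     alias.asname or alias.name,
                                     node.lineno, node.level))


def pyfiles(start_path):
    """Yield the full path of every .py file under start_path."""
    for root, _dirs, files in os.walk(os.path.abspath(start_path)):
        for name in files:
            if name.endswith(".py"):
                yield os.path.join(root, name)


def parse_python_source(path):
    with open(path, "rb") as f:
        tree = ast.parse(f.read(), filename=path)
    collector = ImportCollector()
    # generic_visit recurses into functions and classes, so nested imports are found
    collector.visit(tree)
    return collector.modules
```

Because ImportCollector only overrides the two import handlers, every other node falls through to NodeVisitor.generic_visit, which keeps descending into the tree.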