Python3.0 - tokenize and untokenize


Problem description

I am using something similar to the following simplified script to parse snippets of python from a larger file:

import io
import tokenize

src = 'foo="bar"'
src = bytes(src.encode())
src = io.BytesIO(src)

src = list(tokenize.tokenize(src.readline))

for tok in src:
  print(tok)

src = tokenize.untokenize(src)

Although the code is not the same in python2.x, it uses the same idiom and works just fine. However, running the above snippet using python3.0, I get this output:

(57, 'utf-8', (0, 0), (0, 0), '')
(1, 'foo', (1, 0), (1, 3), 'foo="bar"')
(53, '=', (1, 3), (1, 4), 'foo="bar"')
(3, '"bar"', (1, 4), (1, 9), 'foo="bar"')
(0, '', (2, 0), (2, 0), '')

Traceback (most recent call last):
  File "q.py", line 13, in <module>
    src = tokenize.untokenize(src)
  File "/usr/local/lib/python3.0/tokenize.py", line 236, in untokenize
    out = ut.untokenize(iterable)
  File "/usr/local/lib/python3.0/tokenize.py", line 165, in untokenize
    self.add_whitespace(start)
  File "/usr/local/lib/python3.0/tokenize.py", line 151, in add_whitespace
    assert row <= self.prev_row
AssertionError

I have searched for references to this error and its causes, but have been unable to find any. What am I doing wrong and how can I correct it?

After partisann's observation that appending a newline to the source causes the error to go away, I started messing with the list I was untokenizing. It seems that the EOF token causes an error if not immediately preceded by a newline, so removing it gets rid of the error. The following script runs without error:

import io
import tokenize

src = 'foo="bar"'
src = bytes(src.encode())
src = io.BytesIO(src)

src = list(tokenize.tokenize(src.readline))

for tok in src:
  print(tok)

src = tokenize.untokenize(src[:-1])
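A slightly more defensive version of the same workaround, sketched for a current Python 3 (where `tokenize` returns `TokenInfo` named tuples and `untokenize` yields `bytes` because the stream starts with an ENCODING token):

```python
import io
import tokenize

src = 'foo="bar"'
toks = list(tokenize.tokenize(io.BytesIO(src.encode()).readline))

# Drop the trailing ENDMARKER only if it is actually there,
# rather than blindly slicing off the last element.
if toks and toks[-1].type == tokenize.ENDMARKER:
    toks = toks[:-1]

result = tokenize.untokenize(toks)
# result is bytes; decode with the encoding reported by the ENCODING token
print(result.decode("utf-8"))
```

Checking the token type makes the intent explicit and keeps the code safe if the tokenizer's output ever gains or loses a trailing token.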

Recommended answer

src = 'foo="bar"\n'

You forgot the newline.
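With the trailing newline in place, the full round trip works; a minimal sketch (verified against a current Python 3, where `untokenize` returns `bytes` because the first token is ENCODING):

```python
import io
import tokenize

src = 'foo="bar"\n'  # trailing newline keeps untokenize's row/column bookkeeping consistent

toks = list(tokenize.tokenize(io.BytesIO(src.encode("utf-8")).readline))
out = tokenize.untokenize(toks)

print(out)  # b'foo="bar"\n' -- the original source, reproduced exactly
```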
