Markdown Text Highlighting Performance Issues - Tkinter

Posted 2021/9/8 19:16:17 · Tags: python-3.x, regex, tkinter, markdown, tkinter-text

Question

I'm trying to add markdown syntax highlighting to a text editor for my project, but I'm having trouble making it user-proof, so to speak, while keeping it performance friendly.

Basically, I'm after the kind of markdown highlighting that Visual Studio Code provides.

I’m talking about simple highlighting of bold, italic, lists, etc. to indicate the style that will be applied when the user previews their markdown file.

I originally set up this method for my project (simplified for the question, and using colours to make the styles clearer for debugging):

import re
import tkinter

root = tkinter.Tk()
root.title("Markdown Text Editor")
editor = tkinter.Text(root)
editor.pack()

# bind each key Release to the markdown checker function
editor.bind("<KeyRelease>", lambda event : check_markdown(editor.index('insert').split(".")[0]))


# configure markdown styles
editor.tag_config("bold",           foreground = "#FF0000") # red for debugging clarity
editor.tag_config("italic",         foreground = "#00FF00") # green for debugging clarity
editor.tag_config("bold-italic",    foreground = "#0000FF") # blue for debugging clarity


# regex expressions and the length of an empty match for each tag
search_expressions = {
#   <tag name>     <regex expression>      <empty tag size>
    "italic":      [r"\*(.*?)\*",           2],
    "bold":        [r"\*\*(.*?)\*\*",       4],
    "bold-italic": [r"\*\*\*(.*?)\*\*\*",   6],
}


def check_markdown(current_line):
    # loop through each tag with the matching regex expression
    for tag, (expression, empty_size) in search_expressions.items():
        # start and end indices for the search area
        start_index, end_index = f"{current_line}.0", f"{current_line}.end"

        # remove all tag instances
        editor.tag_remove(tag, start_index, end_index)

        # receives the number of characters matched by each search
        length = tkinter.IntVar()

        # while there is still text to search
        while True:
            # get the index of the text matching 'expression' on 'current_line'
            index = editor.search(expression, start_index, count=length, stopindex=end_index, regexp=True)

            # break if the expression was not matched on the current line
            if not index:
                break

            # skip empty tags (e.g. '**' is an empty italic)
            if length.get() != empty_size:
                # apply the tag to the markdown syntax
                editor.tag_add(tag, index, f"{index}+{length.get()}c")

            # continue searching after the markdown
            start_index = f"{index}+{length.get()}c"

            # update the display - stops program freezing
            root.update_idletasks()

root.mainloop()

I reasoned that removing all formatting on each KeyRelease and then rescanning the current line reduces how often syntax is misinterpreted (for example bold-italic as bold or italic) and stops tags stacking on top of each other. This works well for a few sentences on a single line, but if the user types lots of text on one line, performance drops quickly and there are long waits for the styles to be applied, especially when lots of different markdown syntax is involved.
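The overlap between the three patterns is easy to reproduce outside Tkinter. The sketch below runs the question's patterns with Python's `re.finditer` as a stand-in for the `Text.search()` loop. That substitution is an assumption (`Text.search` uses Tcl's regex engine), though for these simple non-greedy patterns the matches should line up. `tags_for` is a hypothetical helper name introduced only for this illustration.

```python
import re

# The question's patterns and empty-tag sizes, run with Python's re
# instead of Text.search() (assumed to behave the same for these patterns).
PATTERNS = {
    "italic": (r"\*(.*?)\*", 2),
    "bold": (r"\*\*(.*?)\*\*", 4),
    "bold-italic": (r"\*\*\*(.*?)\*\*\*", 6),
}


def tags_for(line):
    """Return the (tag, start, end) spans each pattern would tag on one line."""
    found = []
    for tag, (pattern, empty_size) in PATTERNS.items():
        for match in re.finditer(pattern, line):
            # skip empty pairs, e.g. '**' seen by the italic pattern
            if match.end() - match.start() != empty_size:
                found.append((tag, match.start(), match.end()))
    return found


print(tags_for("***x***"))
# → [('italic', 2, 5), ('bold', 0, 6), ('bold-italic', 0, 7)]
```

Even on a freshly cleared line, a single `***x***` ends up carrying all three tags, because the italic and bold patterns also match pieces of the bold-italic run.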

I used Visual Studio Code's markdown language highlighting as a comparison, and it could handle far more syntax on a single line before it removed the highlighting for "performance reasons".

I understand this is an extremely large amount of looping to do on every KeyRelease, but I found the alternatives to be vastly more complicated without really improving performance.

I thought I'd decrease the load. I've tested running the check only when the user types markdown syntax such as asterisks and em dashes, and validating any tag that has been edited (a key release within a tag's range). But there are so many variables to consider with user input, such as text being pasted into the editor. Because it is difficult to determine what effect certain syntax combinations could have on the markdown of the surrounding document, all of these cases would need to be checked and validated.

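Is there a better, more intuitive way to highlight markdown that I haven't thought of? Is there a way to significantly speed up my original idea? Or are Python and Tkinter simply unable to do what I'm trying to do fast enough?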

Thanks in advance.

Answer

If you don't want to use an external library and want to keep the code simple, using re.finditer() seems to be faster than Text.search().

You can use a single regular expression to match all cases:

regexp = re.compile(r"((?P<delimiter>\*{1,3})[^*]+?(?P=delimiter)|(?P<delimiter2>\_{1,3})[^_]+?(?P=delimiter2))")

The length of the "delimiter" group gives you the tag and the span of the match gives you where to apply the tag.
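To check the pattern without opening a window, the matching logic from the answer's check_markdown() can be pulled out into a plain function. This is a sketch: `find_markdown_spans` and `MARKDOWN_RE` are names introduced here for illustration, and the function returns character offsets within one line rather than Tk indices.

```python
import re

# The answer's single pattern: "delimiter" captures 1-3 '*', "delimiter2"
# captures 1-3 '_', and the closing run must repeat the same delimiter.
MARKDOWN_RE = re.compile(
    r"((?P<delimiter>\*{1,3})[^*]+?(?P=delimiter)"
    r"|(?P<delimiter2>\_{1,3})[^_]+?(?P=delimiter2))"
)
TAGS = {1: "italic", 2: "bold", 3: "bold-italic"}


def find_markdown_spans(line):
    """Return (tag, start, end) character offsets for one line of text."""
    spans = []
    for match in MARKDOWN_RE.finditer(line):
        # exactly one of the two delimiter groups takes part in a match
        delim = match.group("delimiter") or match.group("delimiter2")
        start, end = match.span()
        spans.append((TAGS[len(delim)], start, end))
    return spans


print(find_markdown_spans("plain *it* **bd** ***bi*** _u_"))
# → [('italic', 6, 10), ('bold', 11, 17), ('bold-italic', 18, 26), ('italic', 27, 30)]
```

Because `[^*]+?` needs at least one character, an empty pair such as `**` never matches, which covers what the question's "empty tag size" check was doing by hand. Each run of text also receives exactly one tag, so the overlapping italic/bold/bold-italic tags from separate patterns do not occur.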

Here is the code:

import re
import tkinter

root = tkinter.Tk()
root.title("Markdown Text Editor")
editor = tkinter.Text(root)
editor.pack()

# bind each key Release to the markdown checker function
editor.bind("<KeyRelease>", lambda event: check_markdown())

# configure markdown styles
editor.tag_config("bold", foreground="#FF0000") # red for debugging clarity
editor.tag_config("italic", foreground="#00FF00") # green for debugging clarity
editor.tag_config("bold-italic", foreground="#0000FF") # blue for debugging clarity

regexp = re.compile(r"((?P<delimiter>\*{1,3})[^*]+?(?P=delimiter)|(?P<delimiter2>\_{1,3})[^_]+?(?P=delimiter2))")
tags = {1: "italic", 2: "bold", 3: "bold-italic"}  # the length of the delimiter gives the tag


def check_markdown(start_index="insert linestart", end_index="insert lineend"):
    text = editor.get(start_index, end_index)
    # remove all tag instances
    for tag in tags.values():
        editor.tag_remove(tag, start_index, end_index)
    # loop through each match and add the corresponding tag
    for match in regexp.finditer(text):
        groupdict = match.groupdict()
        delim = groupdict["delimiter"] # * delimiter
        if delim is None:
            delim = groupdict["delimiter2"]  # _ delimiter
        start, end = match.span()
        editor.tag_add(tags[len(delim)], f"{start_index}+{start}c", f"{start_index}+{end}c")
    return

root.mainloop()

Note that check_markdown() only works if start_index and end_index are on the same line; otherwise, you need to split the text and search it line by line.
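A minimal sketch of the line-by-line approach, assuming the text was fetched with editor.get() starting at column 0 of `first_line`. `line_by_line_spans` is a hypothetical helper, not part of Tkinter; it only builds Tk "line.column" index strings (lines 1-based, columns 0-based) so the results can be passed straight to tag_add().

```python
import re

# Same pattern and tag mapping as the answer (repeated so this runs on its own)
MARKDOWN_RE = re.compile(
    r"((?P<delimiter>\*{1,3})[^*]+?(?P=delimiter)"
    r"|(?P<delimiter2>\_{1,3})[^_]+?(?P=delimiter2))"
)
TAGS = {1: "italic", 2: "bold", 3: "bold-italic"}


def line_by_line_spans(text, first_line=1):
    """Yield (tag, start_index, end_index) as Tk 'line.column' strings."""
    for offset, line in enumerate(text.split("\n")):
        line_no = first_line + offset
        # search each line separately so no match can span a newline
        for match in MARKDOWN_RE.finditer(line):
            delim = match.group("delimiter") or match.group("delimiter2")
            start, end = match.span()
            yield TAGS[len(delim)], f"{line_no}.{start}", f"{line_no}.{end}"


print(list(line_by_line_spans("a *b*\n**c** d", first_line=3)))
# → [('italic', '3.2', '3.5'), ('bold', '4.0', '4.5')]
```

Inside check_markdown(), this could replace the finditer loop: take `first_line = int(editor.index(start_index).split(".")[0])`, then call `editor.tag_add(tag, start, end)` for each yielded tuple. This assumes start_index sits at the start of a line, as "insert linestart" does.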
