How to handle multi-line comments in a live syntax highlighter?

Question

I'm writing my own text editor with syntax highlighting in Java, and at the moment it simply parses and highlights the current line every time the user enters a single character. While presumably not the most efficient way, it's good enough and doesn't cause any noticeable performance issues. In pseudo-Java, this would be the core concept of my code:

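```java
// Called after every edit; the helper methods are the editor's own (not shown).
public void textUpdated(String wholeText, int updateOffset, int updateLength) {
    // Expand the edited range outwards to whole lines.
    int lineStart = getFirstLineStart(wholeText, updateOffset);
    int lineEnd = getLastLineEnd(wholeText, updateOffset + updateLength);

    // Re-tokenize only those lines...
    List<Token> foundTokens = tokenizeText(wholeText, lineStart, lineEnd);

    // ...and re-apply the highlighting for every token found.
    for (Token token : foundTokens) {
        highlightText(token.offset, token.length, token.tokenType);
    }
}
```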

The real problem lies with multi-line comments. To check if an entered character is inside a multi-line comment, the program would need to parse back to the most recent occurrence of a "/*", while also being aware of whether this occurrence is inside a literal or another comment. This would not be an issue if the amount of text is small, but if the text consists of 20,000 lines of code, it would possibly have to scan and (re)highlight 20,000 lines of code on each key press, which would be very inefficient.

So my ultimate question is: how do I handle multi-line tokens/comments in a syntax highlighter while keeping it efficient?

Answer

One common approach is to save the lexer state at the start of each line. (Typically, the lexer state will be a small integer or enum; for Java-like languages, it would probably be limited to three values: normal, inside a multiline comment, and inside a multiline string constant.)
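
A minimal sketch of the per-line state idea, in Java to match the question. The enum `LexState`, the class `LineLexer` and its methods are hypothetical names; the scanner only tracks state transitions (it emits no tokens), and it assumes the "multiline string constant" state corresponds to Java text blocks delimited by `"""`. Ordinary string and char literals are skipped so that a `/*` inside `"..."` does not open a comment, which is the concern raised in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lexer states for a Java-like language.
enum LexState { NORMAL, IN_BLOCK_COMMENT, IN_TEXT_BLOCK }

final class LineLexer {
    /** Scans one line starting in {@code start}; returns the state at the start of the next line. */
    static LexState endState(String line, LexState start) {
        LexState state = start;
        int i = 0;
        while (i < line.length()) {
            switch (state) {
                case IN_BLOCK_COMMENT:
                    if (line.startsWith("*/", i)) { state = LexState.NORMAL; i += 2; }
                    else i++;
                    break;
                case IN_TEXT_BLOCK:
                    if (line.charAt(i) == '\\') i += 2; // skip an escaped character
                    else if (line.startsWith("\"\"\"", i)) { state = LexState.NORMAL; i += 3; }
                    else i++;
                    break;
                default: // NORMAL
                    if (line.startsWith("//", i)) return LexState.NORMAL; // rest of line is a comment
                    if (line.startsWith("/*", i)) { state = LexState.IN_BLOCK_COMMENT; i += 2; }
                    else if (line.startsWith("\"\"\"", i)) { state = LexState.IN_TEXT_BLOCK; i += 3; }
                    else if (line.charAt(i) == '"' || line.charAt(i) == '\'') i = skipLiteral(line, i);
                    else i++;
            }
        }
        return state;
    }

    // Skips a single-line string/char literal that starts at index i; returns the index after it.
    private static int skipLiteral(String line, int i) {
        char quote = line.charAt(i++);
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '\\') i += 2;
            else if (c == quote) return i + 1;
            else i++;
        }
        return i; // unterminated literal: stop at the end of the line
    }

    /** result.get(i) is the lexer state in effect at the start of line i. */
    static List<LexState> startStates(List<String> lines) {
        List<LexState> states = new ArrayList<>(lines.size());
        LexState s = LexState.NORMAL;
        for (String line : lines) {
            states.add(s);
            s = endState(line, s);
        }
        return states;
    }
}
```

Storing the list returned by `startStates` alongside the text is what lets the editor retokenize any single line without looking at the lines above it.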

A change to a line could change the lexer state at the start of the next line, but it can't change the state at the beginning of the current line, so the retokenisation of the line can be done from the start of the line, using the current line's lexer state as a starting condition. Keeping per-line lexer states makes it easy to handle the case where the cursor is moved to another line, possibly quite some distance away.
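
The consequence for a single edit can be sketched like this (hypothetical names; to keep it short, only `/* ... */` comments are tracked, not strings). The edited line is always rescanned from its stored start state, which the edit cannot have changed; the following lines are rescanned only while their start state keeps changing, so an edit that neither opens nor closes a comment costs exactly one line. This version propagates immediately; whether to do that is a separate question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-state lexer: only block comments are tracked.
enum State { NORMAL, IN_COMMENT }

final class IncrementalStates {
    final List<String> lines;
    // starts.get(i) is the state at the start of line i; the extra last entry is the end-of-file state.
    final List<State> starts = new ArrayList<>();

    IncrementalStates(List<String> initialLines) {
        lines = new ArrayList<>(initialLines);
        starts.add(State.NORMAL);
        for (String line : lines) {
            starts.add(scan(line, starts.get(starts.size() - 1)));
        }
    }

    // Returns the state at the end of `line` when scanning begins in state `s`.
    static State scan(String line, State s) {
        int i = 0;
        while (i < line.length()) {
            if (s == State.NORMAL && line.startsWith("/*", i)) { s = State.IN_COMMENT; i += 2; }
            else if (s == State.IN_COMMENT && line.startsWith("*/", i)) { s = State.NORMAL; i += 2; }
            else i++;
        }
        return s;
    }

    /** Replaces line {@code index}, retokenizes as needed, and returns how many lines were rescanned. */
    int replaceLine(int index, String newText) {
        lines.set(index, newText);
        int rescanned = 0;
        for (int i = index; i < lines.size(); i++) {
            State end = scan(lines.get(i), starts.get(i)); // line i's start state is still valid
            rescanned++;
            if (end == starts.get(i + 1)) break;            // next line starts as before: stop
            starts.set(i + 1, end);                         // changed: the next line needs a rescan too
        }
        return rescanned;
    }
}
```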

If the edit changes the lexer state at the end of the line (which is to say the start of the next line), you could rescan the rest of the file. However, doing so immediately is really annoying for the user, because it means that every time they type a quote, the entire screen gets repainted because it has become part of a multiline string (for example). Since most of the time the user will close the string (or comment), it is usually better to delay the rescan. For example, you might wait until the user moves the cursor, completes the lexical element, or gives some other such signal. Another common approach is to insert a "ghost" closing symbol after the cursor, which keeps the lexer in sync. The ghost is removed if the user explicitly types or deletes it.
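
One way to implement the delayed rescan, as a sketch that assumes "the cursor moves to another line" is the settle signal (the class and method names are hypothetical, and only block comments are tracked). While the user stays on the edited line, its new end state is kept out of the stored states, so typing `/*` and then `*/` on the same line never repaints the rest of the file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-state lexer: only block comments are tracked.
enum Mode { NORMAL, IN_COMMENT }

final class DeferredRescan {
    final List<String> lines;
    final List<Mode> starts = new ArrayList<>(); // starts.get(i): state at the start of line i
    private int pendingLine = -1;                // edited line whose end state is not yet propagated

    DeferredRescan(List<String> initialLines) {
        lines = new ArrayList<>(initialLines);
        starts.add(Mode.NORMAL);
        for (String line : lines) starts.add(scan(line, starts.get(starts.size() - 1)));
    }

    static Mode scan(String line, Mode s) {
        int i = 0;
        while (i < line.length()) {
            if (s == Mode.NORMAL && line.startsWith("/*", i)) { s = Mode.IN_COMMENT; i += 2; }
            else if (s == Mode.IN_COMMENT && line.startsWith("*/", i)) { s = Mode.NORMAL; i += 2; }
            else i++;
        }
        return s;
    }

    /** Called on every keystroke: only the edited line is retokenized. */
    void onLineEdited(int index, String newText) {
        if (pendingLine >= 0 && pendingLine != index) flush(); // settle another pending line first
        lines.set(index, newText);
        Mode end = scan(newText, starts.get(index));
        // A mismatch with the next line's stored start state is remembered, not propagated;
        // if the states match again (e.g. the comment was closed), nothing downstream changed.
        pendingLine = (end != starts.get(index + 1)) ? index : -1;
    }

    /** Moving the cursor off the pending line triggers the deferred rescan; returns lines rescanned. */
    int onCursorMoved(int newLine) {
        return newLine == pendingLine ? 0 : flush();
    }

    /** Propagates the pending state change downwards; returns the number of lines rescanned. */
    int flush() {
        if (pendingLine < 0) return 0;
        int rescanned = 0;
        for (int i = pendingLine; i < lines.size(); i++) {
            Mode end = scan(lines.get(i), starts.get(i));
            rescanned++;
            if (end == starts.get(i + 1)) break;
            starts.set(i + 1, end);
        }
        pendingLine = -1;
        return rescanned;
    }
}
```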

You seem to be keeping the entire program as a single string. IMHO, it's better to keep it as a list of lines, to avoid having to copy the entire string when a character is inserted or deleted. Otherwise, editing very long files becomes really annoying.
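
A minimal line-based buffer, assuming one `StringBuilder` per line (the class `LineBuffer` and its methods are hypothetical). Typing a character touches only one line's builder; splitting or joining lines shifts references inside the `ArrayList`, but no other line's characters are copied.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical line-based text buffer: an edit touches only the affected line(s). */
final class LineBuffer {
    private final List<StringBuilder> lines = new ArrayList<>();

    LineBuffer(String text) {
        // The -1 limit keeps trailing empty lines.
        for (String l : text.split("\n", -1)) lines.add(new StringBuilder(l));
    }

    /** Inserts c at (line, column); a newline splits the line in two. */
    void insert(int line, int column, char c) {
        StringBuilder current = lines.get(line);
        if (c == '\n') {
            String tail = current.substring(column);
            current.setLength(column);
            lines.add(line + 1, new StringBuilder(tail));
        } else {
            current.insert(column, c);
        }
    }

    /** Deletes the character before (line, column); at column 0 the line is joined to the previous one. */
    void backspace(int line, int column) {
        if (column > 0) {
            lines.get(line).deleteCharAt(column - 1);
        } else if (line > 0) {
            lines.get(line - 1).append(lines.remove(line));
        }
    }

    int lineCount() { return lines.size(); }

    String line(int i) { return lines.get(i).toString(); }
}
```

This layout also fits the per-line lexer states naturally: the state list can be kept index-aligned with the line list.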

Finally, you should never tokenise text which is not visible. Avoiding that will limit the damage of large retokenisations.
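
Combined with per-line states, this can be sketched as lazy state computation (hypothetical names; block comments only). Start states are cached as a prefix; an edit truncates the cache instead of rescanning, and the cache is only extended as far as the last visible line. Computing a correct start state for a visible line still requires scanning the lines above it once, but in this sketch nothing below the viewport is touched until the user scrolls there. It assumes `lastVisible` is a valid line index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-state lexer: only block comments are tracked.
enum CommentState { NORMAL, IN_COMMENT }

final class LazyHighlighter {
    final List<String> lines;
    // Known start states form a prefix: starts.get(i) is valid for every i < starts.size().
    private final List<CommentState> starts = new ArrayList<>();

    LazyHighlighter(List<String> initialLines) {
        lines = new ArrayList<>(initialLines);
        starts.add(CommentState.NORMAL); // line 0 always starts NORMAL; nothing is scanned yet
    }

    static CommentState scan(String line, CommentState s) {
        int i = 0;
        while (i < line.length()) {
            if (s == CommentState.NORMAL && line.startsWith("/*", i)) { s = CommentState.IN_COMMENT; i += 2; }
            else if (s == CommentState.IN_COMMENT && line.startsWith("*/", i)) { s = CommentState.NORMAL; i += 2; }
            else i++;
        }
        return s;
    }

    /** An edit keeps line i's start state but invalidates all later ones; nothing is rescanned here. */
    void onLineEdited(int i, String newText) {
        lines.set(i, newText);
        if (starts.size() > i + 1) starts.subList(i + 1, starts.size()).clear();
    }

    /** Extends the known prefix up to line `last` (inclusive); returns how many lines were scanned. */
    int ensureStates(int last) {
        int scanned = 0;
        while (starts.size() <= last) {
            int prev = starts.size() - 1;
            starts.add(scan(lines.get(prev), starts.get(prev)));
            scanned++;
        }
        return scanned;
    }

    /** Start states for the visible lines; the renderer tokenizes each visible line from its state. */
    List<CommentState> visibleStates(int firstVisible, int lastVisible) {
        ensureStates(lastVisible);
        return new ArrayList<>(starts.subList(firstVisible, lastVisible + 1));
    }
}
```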
