Yielding spellchecker

Problem description

The title is a bit misleading maybe; my spellchecker focuses more on format than spelling (caps, punctuation and spaces, apostrophes, converting internet slang to full words, oft-scrambled words etc.). However the basic principles apply.

Basically, the JS/jQuery checker I'm building would correct words as they are typed (after a space or punctuation has been typed after the word).

However, much like any autocorrecting, it's bound to run into mistakes. I'm not even considering creating functionality that would determine whether "its" or "it's" is more appropriate in a given case (though if such a plugin or code snippet exists, do point me to one).

So I want to make it a "yielding" autocorrect (for the lack of the knowledge of a better name). Basically;

  1. User types in a word that would set off the checker, and types a space.
  2. The checker corrects the word.
  3. The user deems this a mistake and corrects it back (by Backspacing the whole word, or parts of it, or highlighting it or however they feel comfortable editing it).
  4. The user continues typing, and the checker doesn't touch that instance of that word again.

Now easiest of course would be to disable the check for that word entirely, but I want the checker to correct future instances of it. What I'm looking for would detect a user editing an autocorrected word (regardless whether right after typing or later) back to what it was before being autocorrected, and then learning to leave that specific instance of that word alone.

I don't even know where to begin with this. I'm thinking a contenteditable with each word wrapped in a span, autocorrected ones having a special class and a data-* attribute containing the original one, listen for edits on the autocorrected words, and if it's edited back to equaling the data-* value, add a class that leaves it out of future autocorrect rounds.

I'm thinking though that this might be unnecessarily complicated, or at least not the path of least resistance. What would be the smartest way of doing this?

Solution

Your suggested approach (separating each word in a span and storing additional data in it) at first glance seems to be the most sensible approach. On the editor level, you just need to ensure all text is inside some span, and that each of them contains only a single word (splitting it if necessary). On the word level, just listen for changes in the spans (binding input and propertyChange) and act according to its class/data.

However, the real pain is keeping the caret position consistent. When you change the contents of either a textarea or an element with contentEditable, the caret moves rather unpredictably, and there's no easy (cross-browser) way of keeping track of it. I searched for solutions both here at SO and elsewhere, and the simplest working solution I found was this blog post. Unfortunately it only applies to textarea, so the "each word in a span" solution couldn't be used.

So, I suggest the following approach:

  • Keep a list of words in an Array, where each word stores both the current value and the original;
  • When the contents of the textarea changes, keep the set of unchanged words and redo the rest;
  • Only apply the spell check if the caret is just after a non-word character (room for improvement) and you're not hitting backspace;
  • If the user was unsatisfied with the correction, hitting backspace once will undo it, and it won't be checked again unless modified.
    • If many corrections were done at once (for instance, if a lot of text was pasted in), each backspace will undo one correction until none are left.
    • Hitting any other key will commit the correction, so if the user is still unsatisfied he'll have to go back and change it again.
    • Note: unlike what the OP asked for, the changed version will be autocorrected again if the user inputs a non-word character; he'll need to hit backspace once to "protect" it.

I created a simple proof-of-concept at jsFiddle. Details below. Note that you can combine it with other approaches (for instance, detecting a "down arrow" key and displaying a menu with some auto-correcting options) etc.
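The behaviour the question asks for (leave one instance alone once the user has edited it back) could probably be layered on top of the word list described above. The following is only a minimal sketch under my own assumptions: the `yielded` flag and the helpers `makeWord`, `applyUserEdit` and `recheck` are hypothetical and are not part of the proof-of-concept.

```javascript
// Hypothetical word record: `word` is the current text, `original` is what
// the user typed before autocorrection, `yielded` means "leave this alone".
function makeWord(typed, corrected) {
    return { word: corrected, original: typed, yielded: false };
}

// Called after the user edits a word. If the edit restores the text that was
// typed before autocorrection, this instance is marked as yielded.
function applyUserEdit(record, newText) {
    const wasCorrected = record.word !== record.original;
    record.word = newText;
    if (wasCorrected && newText === record.original) {
        record.yielded = true;
    }
    return record;
}

// Runs the checker only on instances that have not been yielded.
function recheck(record, checker) {
    if (!record.yielded) {
        record.word = checker(record.word);
    }
    return record;
}

// Toy checker, an assumption for illustration only.
const toyChecker = function (w) { return w === "its" ? "it's" : w; };

const first = makeWord("its", toyChecker("its")); // autocorrected to "it's"
applyUserEdit(first, "its");                       // the user reverts it
recheck(first, toyChecker);                        // this instance is left alone

// A later instance of the same word is still corrected.
const second = recheck(makeWord("its", "its"), toyChecker);
```

Because the flag lives on the record rather than in a global ignore list, only the reverted instance is skipped, which matches the requirement that future instances keep being corrected.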


Steps of the proof-of-concept explained in detail:

  • Keep a list of words in an Array, where each word stores both the current value and the original;

    var words = [];
    

    This regex splits the text into words (each word has a word property and an sp property; the latter stores the non-word characters immediately following it)

    delimiter:/^(\w+)(\W+)(.*)$/,
    ...
    regexSplit:function(regex,text) {
        var ret = [];
        for ( var match = regex.exec(text) ; match ; match = regex.exec(text) ) {
            ret.push({
                word:match[1],
                sp:match[2],
                length:match[1].length + match[2].length
            });
            text = match[3];
        }
        if ( text )
            ret.push({word:text, sp:'', length:text.length});
        return ret;
    }
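To see what the splitter produces, here is a standalone copy that runs without jQuery. It also illustrates a limitation that seems worth knowing: since `.` does not match line breaks and the pattern is anchored with `$`, text containing a newline after the first separator is not split at all. The `multilineDelimiter` variant is my own assumption, not something the proof-of-concept uses.

```javascript
// Standalone copy of the splitter shown above.
function regexSplit(regex, text) {
    const ret = [];
    for (let match = regex.exec(text); match; match = regex.exec(text)) {
        ret.push({
            word: match[1],
            sp: match[2],
            length: match[1].length + match[2].length
        });
        text = match[3]; // keep splitting the remainder
    }
    if (text) {
        ret.push({ word: text, sp: "", length: text.length });
    }
    return ret;
}

const delimiter = /^(\w+)(\W+)(.*)$/;
// Assumed variant: [\s\S] also matches line breaks.
const multilineDelimiter = /^(\w+)(\W+)([\s\S]*)$/;

// Three entries: "teh" + " ", "cat" + ", ", "sat" + "".
const parts = regexSplit(delimiter, "teh cat, sat");

// With the original pattern the whole string comes back as one "word".
const unsplit = regexSplit(delimiter, "one two\nthree four");

// With the assumed variant it is split into four words.
const split = regexSplit(multilineDelimiter, "one two\nthree four");
```

Text that starts with a non-word character has a similar problem, because the pattern requires `\w+` at the very beginning.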
    

  • When the contents of the textarea changes, keep the set of unchanged words and redo the rest;

        // Split all the text
        var split = $.autocorrect.regexSplit(options.delimiter, $this.val());
        // Find unchanged words in the beginning of the field
        var start = 0;
        while ( start < words.length && start < split.length ) {
            if ( !words[start].equals(split[start]) )
                break;
            start++;
        }
        // Find unchanged words in the end of the field
        var end = 0;
        while ( 0 < words.length - end && 0 < split.length - end ) {
            if ( !words[words.length-end-1].equals(split[split.length-end-1]) ||
                 words.length-end-1 < start )
                break;
            end++;
        }
        // Autocorrects words in-between
        var toSplice = [start, words.length-end - start];
        for ( var i = start ; i < split.length-end ; i++ )
            toSplice.push({
                word:check(split[i], i),
                sp:split[i].sp,
                original:split[i].word,
                equals:function(w) {
                    return this.word == w.word && this.sp == w.sp;
                }
            });
        words.splice.apply(words, toSplice);
        // Updates the text, preserving the caret position
        updateText();
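The prefix/suffix comparison can be pulled out into a pure function, which makes it easier to reason about. This is a sketch rather than the code from the fiddle: `sameWord` and `unchangedEdges` are hypothetical names, and unlike the loop above it bounds the suffix scan on both arrays so the two ends can never overlap.

```javascript
// Two words are the same if both the word and its trailing separator match.
function sameWord(a, b) {
    return a.word === b.word && a.sp === b.sp;
}

// Counts the unchanged words at the start and at the end of the field.
function unchangedEdges(oldWords, newWords) {
    let start = 0;
    while (start < oldWords.length && start < newWords.length &&
           sameWord(oldWords[start], newWords[start])) {
        start++;
    }
    let end = 0;
    while (end < oldWords.length - start && end < newWords.length - start &&
           sameWord(oldWords[oldWords.length - end - 1],
                    newWords[newWords.length - end - 1])) {
        end++;
    }
    return { start: start, end: end };
}

const w = function (word, sp) { return { word: word, sp: sp }; };
const before = [w("the", " "), w("cat", " "), w("sat", "")];
const after = [w("the", " "), w("dog", " "), w("ran", " "), w("sat", "")];

// Only the words between the edges need to be re-checked and spliced in.
const edges = unchangedEdges(before, after);
const changed = after.slice(edges.start, after.length - edges.end);

// Deleting one of two identical words must not count the survivor twice.
const dup = unchangedEdges([w("a", " "), w("a", " ")], [w("a", " ")]);
```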
    

  • Only apply the spell check if the caret is just after a non-word character (room for improvement) and you're not hitting backspace;

    var caret = doGetCaretPosition(this);
    var atFirstSpace = caret >= 2 &&
                       /\w\W/.test($this.val().substring(caret-2,caret));
    function check(word, index) {
        var w = (atFirstSpace && !backtracking ) ?
                options.checker(word.word) :
                word.word;
        if ( w != word.word )
            stack.push(index); // stack stores a list of auto-corrections
        return w;
    }
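The caret test is easier to follow in isolation. A small sketch, assuming the text and caret offset are passed in explicitly instead of being read from the textarea (the function name `atFirstSpace` is borrowed from the variable above):

```javascript
// True only when the two characters before the caret are a word character
// followed by a non-word character, i.e. a word has just been finished.
function atFirstSpace(text, caret) {
    return caret >= 2 && /\w\W/.test(text.substring(caret - 2, caret));
}

const justFinishedWord = atFirstSpace("teh ", 4);  // "h " before the caret
const secondSpace = atFirstSpace("teh  ", 5);      // "  " before the caret
const stillTyping = atFirstSpace("teh", 3);        // "eh" before the caret
```

This is why a second space, or an edit in the middle of a word, does not trigger another round of corrections.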
    

  • If the user was unsatisfied with the correction, hitting backspace once will undo it, and it won't be checked again unless modified.

    $(this).keydown(function(e) {
        if ( e.which == 8 ) {
            if ( stack.length > 0 ) {
                var last = stack.pop();
                words[last].word = words[last].original;
                updateText(last);
                return false;
            }
            else {
                // Nothing left to undo: let the backspace through
                backtracking = true;
                stack = [];
            }
        }
    });
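Stripped of the DOM event, the undo logic amounts to popping indices off a stack. The sketch below assumes a plain `state` object holding `words`, `stack` and `backtracking`; that object and the `onBackspace` helper are hypothetical and only mirror the closure variables used in the handler above.

```javascript
// Undoes the most recent autocorrection, if any. Returns whether the
// backspace was consumed (the handler above returns false in that case).
function onBackspace(state) {
    if (state.stack.length > 0) {
        const last = state.stack.pop();
        state.words[last].word = state.words[last].original;
        return { handled: true, undone: last };
    }
    // Nothing to undo: a normal backspace, so skip checking on the next input.
    state.backtracking = true;
    return { handled: false, undone: undefined };
}

// Two corrections made in one go (for instance after a paste).
const state = {
    words: [
        { word: "the", original: "teh", sp: " " },
        { word: "you", original: "u", sp: " " }
    ],
    stack: [0, 1],
    backtracking: false
};

const firstUndo = onBackspace(state);  // restores "u"
const secondUndo = onBackspace(state); // restores "teh"
const thirdPress = onBackspace(state); // nothing left, behaves as a normal backspace
```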
    

  • The code for updateText simply joins all words into a string again and sets the value back to the textarea. The caret is preserved if nothing was changed; otherwise it is placed just after the last autocorrection done or undone, to account for the change in text length:

    function updateText(undone) {
        var caret = doGetCaretPosition(element);
        var text = "";
        for ( var i = 0 ; i < words.length ; i++ )
            text += words[i].word + words[i].sp;
        $this.val(text);
        // If a word was autocorrected, put the caret right after it
        if ( stack.length > 0 || undone !== undefined ) {
            var last = undone !== undefined ? undone : stack[stack.length-1];
            caret = 0;
            for ( var i = 0 ; i < last ; i++ )
                caret += words[i].word.length + words[i].sp.length;
            caret += words[last].word.length + 1;
        }
        setCaretPosition(element,caret);
    }
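The caret arithmetic can be checked by hand with a pure helper. This is a sketch with a hypothetical name, `caretAfter`; it reproduces the `+ 1` from the code above, which puts the caret after the first separator character that follows the word.

```javascript
// Offset just past words[last] and the first character of its separator.
function caretAfter(words, last) {
    let caret = 0;
    for (let i = 0; i < last; i++) {
        caret += words[i].word.length + words[i].sp.length;
    }
    return caret + words[last].word.length + 1;
}

// Text is "the you know"; "you" (index 1) was just autocorrected from "u".
const sample = [
    { word: "the", sp: " " },
    { word: "you", sp: " " },
    { word: "know", sp: "" }
];
const caret = caretAfter(sample, 1); // 4 ("the ") + 3 ("you") + 1 (" ") = 8
```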
    

  • The final plugin structure:

    $.fn.autocorrect = function(options) {
        options = $.extend({
            delimiter:/^(\w+)(\W+)(.*)$/,
            checker:function(x) { return x; }
        }, options);
        return this.each(function() {
            var element = this, $this = $(this);
            var words = [];
            var stack = [];
            var backtracking = false;
            function updateText(undone) { ... }
            $this.bind("input propertyChange", function() {
                stack = [];
                // * Only apply the spell check if the caret...
                // * When the contents of the `textarea` changes...
                backtracking = false;
            });
            // * If the user was unsatisfied with the correction...
        });
    };
    $.autocorrect = {
        regexSplit:function(regex,text) { ... }
    };
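Since the default `checker` just returns its input, the plugin does nothing until one is supplied. A minimal example of a checker follows; the replacement table is made up for illustration, and the last lines assume jQuery and the plugin above are loaded on the page.

```javascript
// Made-up replacement table; a real one would be much larger.
const replacements = { u: "you", teh: "the", i: "I", dont: "don't" };

// Returns the replacement for a known word, or the word unchanged.
// hasOwnProperty avoids matching inherited keys such as "constructor".
function checker(word) {
    const key = word.toLowerCase();
    return Object.prototype.hasOwnProperty.call(replacements, key)
        ? replacements[key]
        : word;
}

// Only runs in a page where jQuery and the plugin are available.
if (typeof jQuery !== "undefined" && jQuery.fn && jQuery.fn.autocorrect) {
    jQuery("textarea").autocorrect({ checker: checker });
}
```

Note that this simple lookup lowercases the key, so "Teh" also becomes "the"; preserving the original capitalisation would need extra handling.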
    
