Extracting text from a contentEditable div

Question

I have a div set to contentEditable and styled with "white-space: pre" so that it keeps things like line breaks. In Safari, Firefox and IE, the div looks and works pretty much the same. All is well. What I want to do is extract the text from this div without losing the formatting, specifically the line breaks.

We are using jQuery, whose text() function basically does a pre-order depth-first traversal and glues all the text content in that branch of the DOM together into a single lump. This loses the formatting.
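To see why the line breaks disappear, here is a minimal sketch of the kind of pre-order walk jQuery 1.4.2's `getText` performs (a simplified illustration, not the library's exact source). It runs on any DOM-like objects exposing `nodeType`, `nodeValue` and `childNodes`; the `txt` and `el` helpers are hypothetical stand-ins for real DOM nodes so the example runs outside a browser.

```javascript
// Simplified pre-order DFS in the spirit of jQuery 1.4.2's getText:
// concatenate text/CDATA node values, skip comments, recurse into everything else.
function getTextPreOrder(nodes) {
    var ret = "";
    for (var i = 0; i < nodes.length; i++) {
        var node = nodes[i];
        if (node.nodeType === 3 || node.nodeType === 4) {
            ret += node.nodeValue;                  // text or CDATA node
        } else if (node.nodeType !== 8) {           // anything except a comment
            ret += getTextPreOrder(node.childNodes);
        }
    }
    return ret;
}

// Hypothetical helpers that mimic just enough of the DOM for this example.
function txt(value) { return { nodeType: 3, nodeName: "#text", nodeValue: value, childNodes: [] }; }
function el(name, children) { return { nodeType: 1, nodeName: name, nodeValue: null, childNodes: children || [] }; }

// Safari 4's markup for the three typed lines: 1<div>2</div><div>3</div>
var safariDiv = el("DIV", [txt("1"), el("DIV", [txt("2")]), el("DIV", [txt("3")])]);
console.log(getTextPreOrder([safariDiv])); // prints 123: the line structure is gone
```

Nothing in the walk looks at which element a text node sits in, so the `<div>` boundaries that carried the line breaks simply vanish.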

I had a look at the html() function, but all three browsers generate different HTML behind the scenes in my contentEditable div. Assuming I type this into my div:

1
2
3

These are the results:

Safari 4:

1
<div>2</div>
<div>3</div>

Firefox 3.6:

1
<br _moz_dirty="">
2
<br _moz_dirty="">
3
<br _moz_dirty="">
<br _moz_dirty="" type="_moz">

IE 8:

<P>1</P><P>2</P><P>3</P>

Ugh. Nothing very consistent here. The surprising thing is that MSIE looks the most sane! (Capitalized P tag and all)

The div will have dynamically set styling (font face, colour, size and alignment) which is done using CSS, so I'm not sure if I can use a pre tag (which was alluded to on some pages I found using Google).

Does anyone know of any JavaScript code and/or jQuery plugin or something that will extract text from a contentEditable div in such a way as to preserve linebreaks? I'd prefer not to reinvent a parsing wheel if I don't have to.

Update: I cribbed the getText function from jQuery 1.4.2 and modified it to extract the text with whitespace mostly intact (I only changed one line, where I add a newline):

function extractTextWithWhitespace( elems ) {
    var ret = "", elem;

    for ( var i = 0; elems[i]; i++ ) {
        elem = elems[i];

        // Get the text from text nodes and CDATA nodes
        if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
            ret += elem.nodeValue + "\n";

        // Traverse everything else, except comment nodes
        } else if ( elem.nodeType !== 8 ) {
            ret += extractTextWithWhitespace( elem.childNodes );
        }
    }

    return ret;
}

I call this function and use its output to assign it to an XML node with jQuery, something like:

var extractedText = extractTextWithWhitespace($(this));
var $someXmlNode = $('<someXmlNode/>');
$someXmlNode.text(extractedText);

The resulting XML is eventually sent to a server via an AJAX call.

This works well in Safari and Firefox.

On IE, only the first '\n' seems to get retained somehow. Looking into it more, it looks like jQuery is setting the text like so (line 4004 of jQuery-1.4.2.js):

return this.empty().append( (this[0] && this[0].ownerDocument || document).createTextNode( text ) );

Reading up on createTextNode, it appears that IE's implementation may collapse or otherwise mangle the whitespace. Is this true, or am I doing something wrong?
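One possible workaround, not verified on IE 8 and not part of the original thread, is to keep the text out of DOM text nodes altogether and build the XML payload as a string. `escapeXmlText` and `buildXmlPayload` are hypothetical helpers: they normalize line endings to LF, escape `&`, `<` and `>`, and write each newline as the `&#10;` character reference, which an XML parser on the server decodes back to a line feed.

```javascript
// Hypothetical helper: escape text for use as XML element content,
// writing newlines as explicit character references.
function escapeXmlText(text) {
    return text
        .replace(/\r\n?/g, "\n")   // normalize CRLF / lone CR to LF
        .replace(/&/g, "&amp;")    // must run before the other entity escapes
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/\n/g, "&#10;");  // explicit reference for each line feed
}

// Hypothetical helper: wrap the escaped text in a single element, built as a
// string so it never passes through document.createTextNode.
function buildXmlPayload(elementName, text) {
    return "<" + elementName + ">" + escapeXmlText(text) + "</" + elementName + ">";
}

console.log(buildXmlPayload("someXmlNode", "1\n2 < 3\n3"));
// prints <someXmlNode>1&#10;2 &lt; 3&#10;3</someXmlNode>
```

The resulting string could then be posted with jQuery's `$.ajax`, passing it as `data` with `processData: false` and an XML `contentType` so jQuery does not try to URL-encode it.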

Solution

I forgot about this question until now, when Nico slapped a bounty on it.

I solved the problem by writing the function I needed myself, cribbing a function from the existing jQuery codebase and modifying it to work as I needed.

I've tested this function with Safari (WebKit), IE, Firefox and Opera. I didn't bother checking any other browsers since the whole contentEditable thing is non-standard. It is also possible that an update to any browser could break this function if they change how they implement contentEditable. So programmer beware.

function extractTextWithWhitespace(elems)
{
    var lineBreakNodeName = "BR"; // Use <br> as a default
    if ($.browser.webkit)
    {
        lineBreakNodeName = "DIV";
    }
    else if ($.browser.msie)
    {
        lineBreakNodeName = "P";
    }
    else if ($.browser.mozilla)
    {
        lineBreakNodeName = "BR";
    }
    else if ($.browser.opera)
    {
        lineBreakNodeName = "P";
    }
    var extractedText = extractTextWithWhitespaceWorker(elems, lineBreakNodeName);

    return extractedText;
}

// Cribbed from jQuery 1.4.2 (getText) and modified to retain whitespace
function extractTextWithWhitespaceWorker(elems, lineBreakNodeName)
{
    var ret = "";
    var elem;

    for (var i = 0; elems[i]; i++)
    {
        elem = elems[i];

        if (elem.nodeType === 3     // text node
            || elem.nodeType === 4) // CDATA node
        {
            ret += elem.nodeValue;
        }

        if (elem.nodeName === lineBreakNodeName)
        {
            ret += "\n";
        }

        if (elem.nodeType !== 8) // comment node
        {
            ret += extractTextWithWhitespaceWorker(elem.childNodes, lineBreakNodeName);
        }
    }

    return ret;
}
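For comparison, here is a hedged sketch (not the answerer's code, and not tested in the browsers above) of a single walk that handles all three markup shapes from the question without browser sniffing, which also avoids `$.browser`, removed in jQuery 1.9. `extractLines` and `extractLinesWorker` are hypothetical names. The sketch assumes a `<br>` means a newline and that a `<div>` or `<p>` starts a new line unless the text already ends with one. It walks the editable div's `childNodes` so the container itself is not treated as a line break, and it trims trailing newlines to drop the extra `<br>` elements Firefox appends (which also drops any intentional trailing blank lines). The `txt` and `el` helpers are stand-ins for real DOM nodes so the example runs outside a browser.

```javascript
// Hypothetical browser-agnostic extractor (a sketch, not the answer's code).
// Assumes: <br> = newline; <div>/<p> start a new line unless one was just started.
var BLOCK_NAMES = { DIV: true, P: true };

function extractLinesWorker(nodes, out) {
    for (var i = 0; i < nodes.length; i++) {
        var node = nodes[i];
        if (node.nodeType === 3 || node.nodeType === 4) {
            out.text += node.nodeValue;                  // text or CDATA node
        } else if (node.nodeType === 1) {
            var name = node.nodeName.toUpperCase();      // nodeName may be lowercase in XHTML
            var atLineStart = out.text === "" || out.text.charAt(out.text.length - 1) === "\n";
            if (name === "BR") {
                out.text += "\n";
            } else if (BLOCK_NAMES[name] && !atLineStart) {
                out.text += "\n";                        // block element begins a new line
            }
            extractLinesWorker(node.childNodes, out);
        }
        // comments (nodeType 8) and other node types are skipped
    }
}

function extractLines(root) {
    var out = { text: "" };
    extractLinesWorker(root.childNodes, out);
    return out.text.replace(/\n+$/, "");                 // trim trailing newlines
}

// Stand-ins for DOM nodes so this runs outside a browser.
function txt(value) { return { nodeType: 3, nodeName: "#text", nodeValue: value, childNodes: [] }; }
function el(name, children) { return { nodeType: 1, nodeName: name, nodeValue: null, childNodes: children || [] }; }

// The three shapes reported in the question for the typed lines 1, 2, 3.
var safariDiv = el("DIV", [txt("1"), el("DIV", [txt("2")]), el("DIV", [txt("3")])]);
var firefoxDiv = el("DIV", [txt("1"), el("BR"), txt("2"), el("BR"), txt("3"), el("BR"), el("BR")]);
var ieDiv = el("DIV", [el("P", [txt("1")]), el("P", [txt("2")]), el("P", [txt("3")])]);

console.log(JSON.stringify(extractLines(safariDiv)));  // prints "1\n2\n3"
console.log(JSON.stringify(extractLines(firefoxDiv))); // prints "1\n2\n3"
console.log(JSON.stringify(extractLines(ieDiv)));      // prints "1\n2\n3"
```

In a browser, the equivalent call would be `extractLines(this)` inside the question's jQuery handler, where `this` is the editable div; a DOM `NodeList` supports the `length` and index access the walk relies on. Like the answer's function, this only covers the markup shapes shown in the question.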
