Replace ↵ (\n) in the text of a non-rendered (display: none) element


Question

I'm writing a parser that gets data from hidden iframes.

In the text, I need to replace \n (↵) characters with a space. I use text.replace(/\n/gi, " ") for this. However, it only works for visible elements (i.e. elements without display: none). If the element is not visible (display: none), the newlines just disappear and nothing is put in their place.

HTML example:

<div data-custom="languages">
    <div>
        <div>
            <h2>
                <span>Just a text that will be removed</span>
            </h2>
            <p>A - b</p>
            <p>c - d</p>
        </div>
    </div>
</div>

JS example:

visibleIframe.style.display = "block";
invisibleIframe.style.display = "none";

const visibleDivWithNestedDivs = visibleIframe.querySelector(`[data-custom="languages"]`);
const invisibleDivWithNestedDivs = invisibleIframe.querySelector(`[data-custom="languages"]`);

const visibleText = visibleDivWithNestedDivs.innerText; // "A - b↵c - d"
const invisibleText = invisibleDivWithNestedDivs.innerText; // "A - b↵c - d"

console.log(visibleText.replace(/\n/gi, " ")); // "A - b c - d" (expected result)
console.log(invisibleText.replace(/\n/gi, " ")); // "A - bc - d" (unexpected result, no space between "b" and "c")

What I've tried:

.replace(/\n/gi, " ")
.replace(/\r\n/gi, " ")
.replace(/↵/gi, " ")
.replace(/↵↵/gi, " ") // in some cases there was two of this.
.split("↵").join(" ") 
.split("\n").join(" ")
white-space: pre
white-space: pre-wrap

Have you tested it?

I'm 99% sure it's because of display: none. I tested it, and iframes with different display values give me different results.

TextContent

I don't want textContent because it returns the text without \n characters. I use innerText.

Questions:

  1. Could the unexpected result be caused by something other than display: none?
  2. What should I do to get the expected result?

Recommended answer

First, let's clear up a few misunderstandings you seem to have based on the examples you've provided.

↵ is a Unicode character named DOWNWARDS ARROW WITH CORNER LEFTWARDS (U+21B5). Sure, it makes a nice visual representation of a line break or the Return/Enter key, but it has no special meaning in code. If you use this symbol in a regular expression, the regular expression will only match text that contains the arrow symbol itself.
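A minimal sketch of the difference, using a made-up sample string: the arrow and the line feed are two unrelated code points, so a pattern written with ↵ leaves a string containing \n untouched.

```javascript
// Sample string with a real line feed between the two parts (illustrative data).
const text = "A - b\nc - d";

// The arrow is only how dev tools draw a newline; it is not in the string.
const withArrow = text.replace(/↵/g, " ");
const withNewline = text.replace(/\n/g, " ");

console.log(withArrow === text); // true: nothing matched, string unchanged
console.log(withNewline);        // "A - b c - d"

// The two characters have different code points.
console.log("↵".codePointAt(0).toString(16));  // "21b5"
console.log("\n".codePointAt(0).toString(16)); // "a"
```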

In JavaScript, \n in a string or regular expression stands for a line feed (LF), and innerText normally reports line breaks as \n rather than \r\n, so you don't have to worry about carriage returns here. That's why I wouldn't bother with \r in this case.

.replace(/\n/gi, " ") is a perfectly valid option, depending on what you want to do (the i flag is unnecessary, since \n has no case). You might want to replace any sequence of whitespace that includes newlines, however. In that case, I would use this instead: .replace(/\s+/g, " "). The \s special code in RegExp matches any kind of whitespace, including line breaks. Adding a + makes it match a whole run of whitespace, and the g flag makes it replace every run rather than only the first. Using this will ensure that a string like "a \n \n b" gets turned into "a b".
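To make the difference between these patterns concrete, here is a small sketch on illustrative strings. It compares replacing each newline individually with collapsing whole whitespace runs, and shows why the g flag matters once a string contains more than one run.

```javascript
const input = "a \n \n b";

// Each \n becomes one space; the surrounding spaces are kept.
const perNewline = input.replace(/\n/g, " ");
// The whole run " \n \n " collapses into a single space.
const collapsed = input.replace(/\s+/g, " ");

console.log(JSON.stringify(perNewline)); // "a     b" (five spaces)
console.log(JSON.stringify(collapsed));  // "a b"

// Without the g flag only the first whitespace run is replaced.
const twoRuns = "a\nb\nc";
console.log(JSON.stringify(twoRuns.replace(/\s+/, " ")));  // "a b\nc"
console.log(JSON.stringify(twoRuns.replace(/\s+/g, " "))); // "a b c"
```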

Now that the regular expression issues have been dealt with, let's look at innerText. According to the HTML Living Standard, which I found via the MDN article on innerText, the innerText property is an approximation of what the user would get when copying and pasting the text from that element. It is defined like this:

If this element is not being rendered, or if the user agent is a non-CSS user agent, then return the same value as the textContent IDL attribute on this element. Note: This step can produce surprising results, as when the innerText attribute is accessed on an element not being rendered, its text contents are returned, but when accessed on an element that is being rendered, all of its children that are not being rendered have their text contents ignored.

This answers why there might be a difference between visible and hidden elements. As for the number of line breaks, the algorithm that determines how many line breaks end up in the string is defined recursively in the standard, and it is quite confusing, which is why I would advise against basing your logic on the behavior of this property. innerText is meant to be an approximation.
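If the markup has no whitespace between sibling elements, the textContent fallback glues their text together ("A - bc - d"), and no regular expression can restore the missing separator. One possible workaround, sketched here under the assumption that you are free to walk the DOM yourself, is to collect the text nodes and join them with a space. The name collectText is hypothetical and not part of any API; the function only relies on nodeType, nodeValue, and childNodes, so the demo below feeds it plain objects shaped like DOM nodes.

```javascript
// Hypothetical helper: gathers the text of every text node under `node`
// and joins the non-empty pieces with a single space, regardless of CSS.
function collectText(node) {
  const TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers
  const parts = [];
  (function walk(current) {
    if (current.nodeType === TEXT_NODE) {
      const trimmed = current.nodeValue.trim();
      if (trimmed) parts.push(trimmed);
      return;
    }
    for (const child of current.childNodes || []) walk(child);
  })(node);
  return parts.join(" ");
}

// Stand-ins for DOM nodes, mimicking <div><p>A - b</p><p>c - d</p></div>.
const textNode = (value) => ({ nodeType: 3, nodeValue: value });
const element = (...children) => ({ nodeType: 1, childNodes: children });
const tree = element(element(textNode("A - b")), element(textNode("c - d")));

console.log(collectText(tree)); // "A - b c - d"
```

Note the trade-off: this also puts a space between adjacent inline pieces (for example the text inside and after a span), so it suits data extraction better than faithful text reproduction.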

I suggest taking a look at textContent, which isn't affected by CSS.

So to wrap up this long explanation:

  1. Yes, display: none does influence innerText.
  2. I might use foo.textContent.replace(/\s+/g, " ") depending on what your goal is.
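As a rough illustration of the second point, the sketch below applies that replacement to a string resembling what textContent would return for the question's HTML when it is indented as shown. The string literal and the name normalizeWhitespace are assumptions made for the demo; the added trim() only removes the leading and trailing space left by the outer indentation.

```javascript
// Hypothetical helper: collapse every whitespace run, then drop the edges.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Approximation of the textContent of the indented sample markup.
const sampleTextContent =
  "\n    \n        \n            \n" +
  "                Just a text that will be removed\n" +
  "            \n            A - b\n            c - d\n        \n    \n";

console.log(normalizeWhitespace(sampleTextContent));
// "Just a text that will be removed A - b c - d"
```

Unlike innerText, this gives the same result whether or not the iframe is displayed, because textContent ignores CSS. The heading text is still included and would have to be removed separately.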
