Extract the text out of HTML string using JavaScript
Question
I am trying to get the inner text of an HTML string using a JS function (the string is passed as an argument). Here is the code:
function extractContent(value) {
  var content_holder = "";
  for (var i = 0; i < value.length; i++) {
    if (value.charAt(i) === '>') {
      continue;
      while (value.charAt(i) != '<') {
        content_holder += value.charAt(i);
      }
    }
  }
  console.log(content_holder);
}

extractContent("<p>Hello</p><a href='http://w3c.org'>W3C</a>");
The problem is that nothing gets printed on the console (content_holder stays empty). I think the problem is caused by the "===" operator.
Solution
Create an element, store the HTML in it, and get its textContent:
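function extractContent(s) {
  // Let the browser parse the HTML inside a detached element.
  var span = document.createElement('span');
  span.innerHTML = s;
  // textContent is the standard property; innerText is a fallback for old IE.
  return span.textContent || span.innerText;
}

alert(extractContent("<p>Hello</p><a href='http://w3c.org'>W3C</a>")); // alerts "HelloW3C"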
Here's a version that allows you to have spaces between nodes, although you'd probably want that for block-level elements only:
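function extractContent(s, space) {
  var span = document.createElement('span');
  span.innerHTML = s;
  if (space) {
    // Append a space to the text of every descendant element.
    // Note: assigning textContent flattens that element's children into plain text.
    var children = span.querySelectorAll('*');
    for (var i = 0; i < children.length; i++) {
      if (children[i].textContent)
        children[i].textContent += ' ';
      else
        children[i].innerText += ' ';
    }
  }
  // Wrapping in an array turns undefined into "", then runs of spaces are collapsed.
  return [span.textContent || span.innerText].toString().replace(/ +/g, ' ');
}

// Without the flag, adjacent nodes run together; with it, they are separated by spaces.
console.log(extractContent("<p>Hello</p><a href='http://w3c.org'>W3C</a>. Nice to <em>see</em><strong><em>you!</em></strong>"));
console.log(extractContent("<p>Hello</p><a href='http://w3c.org'>W3C</a>. Nice to <em>see</em><strong><em>you!</em></strong>", true));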
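As for why the loop in the question prints an empty string: the "===" operator is not the cause. The `continue` statement jumps straight to the next iteration, so the `while` after it is never reached, and that `while` sits inside the branch that only runs for '>' anyway. If it were reached, it would never terminate, because `i` is not advanced inside it. For cases where no DOM is available, a character scan along the lines the question attempted could look like the sketch below. The name `stripTags` is a hypothetical helper, not part of the answer above, and the sketch assumes that '<' and '>' appear only as tag delimiters (no comments, scripts, or angle brackets inside attribute values). It does not decode entities such as `&amp;`.

```javascript
// Hypothetical helper (not from the original answer): a DOM-free tag stripper.
// Assumption: '<' and '>' occur only as tag delimiters in the input.
function stripTags(html) {
  var text = "";
  var insideTag = false; // true while scanning between '<' and '>'
  for (var i = 0; i < html.length; i++) {
    var ch = html.charAt(i);
    if (ch === "<") {
      insideTag = true;   // a tag starts: stop collecting characters
    } else if (ch === ">") {
      insideTag = false;  // the tag ends: resume collecting after it
    } else if (!insideTag) {
      text += ch;         // ordinary character outside any tag
    }
  }
  return text;
}

console.log(stripTags("<p>Hello</p><a href='http://w3c.org'>W3C</a>")); // prints "HelloW3C"
```

For real-world HTML the element-based approach above remains the safer choice, since the browser's parser handles comments, entities, and malformed markup that a character scan like this does not.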