JavaScript: How to retrieve text from a webpage


Question

I want to retrieve the text within a webpage as a string. Is this possible? I am new to Javascript.

For example:

var url = "http://en.wikipedia.org/wiki/Programming";
var result = url.getText();  // hypothetical method: stores the page text as a string
document.write(result);

How do I write the getText method? I would like either the entire HTML source code (which I can use to get the text) or just the text. I would like to do this from within a web browser.

I tried this and I am able to get an index number:

var url = "http://www.youtube.com/results?search_query=cat&page=2";
var result;
function go(){
    result = url.search(/cat/i);
    document.write(result);
}

This gives me an index of 44. That means that reading a page is possible. Can I do the opposite and enter the index to retrieve the text?

Recommended answer

If the Ajax/Cross-Domain situation is not an issue for you, you can extract the text of a web page with

var el = document.body; // or some other element reference
var text = el.innerText || el.textContent;

If you need to read text from pages in the same domain as your application, you can use Ajax directly.
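As a rough sketch of the same-domain case, something like the following might work in a modern browser. It uses the standard `fetch` and `DOMParser` APIs; the names `getPageHtml` and `htmlToText`, the `fetchImpl` parameter, and the example path are assumptions made up for illustration, not part of the original answer.

```javascript
// Fetch a page (same origin as the calling script) and return its raw HTML.
// fetchImpl is a hypothetical injection point; it defaults to the browser's fetch.
async function getPageHtml(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Request failed: HTTP " + response.status);
  }
  return response.text();
}

// Browser-only: parse an HTML string and return the text content of its body.
// Note that textContent also includes the text of inline <script> and <style> elements.
function htmlToText(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  return doc.body.textContent;
}
```

Usage could look like `getPageHtml("/some/page.html").then(htmlToText).then(console.log);`, where `/some/page.html` is a placeholder for a page on your own domain.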

If you need to read text from pages outside of your domain, you'll have to jump through a few extra hoops like setting up a proxy server or dealing with CORS - http://en.wikipedia.org/wiki/Cross-origin_resource_sharing
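One way to picture those extra hoops: try a direct cross-origin request first, which only succeeds if the remote server sends CORS headers that allow your origin, and otherwise fall back to a proxy on your own domain. This is only a sketch. The `/proxy?url=` endpoint is hypothetical and you would have to implement it on your server; `getRemoteHtml` and `fetchImpl` are likewise illustrative names.

```javascript
// Hypothetical same-origin endpoint that fetches the remote page server-side.
const PROXY_PATH = "/proxy?url=";

async function getRemoteHtml(url, fetchImpl = fetch) {
  let response;
  try {
    // Direct request: the browser rejects this if CORS does not permit it.
    response = await fetchImpl(url, { mode: "cors" });
  } catch (err) {
    // Blocked by CORS (or a network failure): retry through the proxy.
    response = await fetchImpl(PROXY_PATH + encodeURIComponent(url));
  }
  if (!response.ok) {
    throw new Error("Request failed: HTTP " + response.status);
  }
  return response.text();
}
```

The returned string is the raw HTML, so you would still need to parse it if you only want the text.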
