JavaScript: How to retrieve text from a webpage
Question
I want to retrieve the text within a webpage as a string. Is this possible? I am new to JavaScript.
For example:
var url = "http://en.wikipedia.org/wiki/Programming";
var result = url.getText(); // hypothetical method: would store the page text as a string
document.write(result);
How do I write the getText method? It could return either the entire HTML source code (which I can use to get the text) or just the text. I would like to do this from within a web browser.
I tried this and I am able to get an index number:
var url = "http://www.youtube.com/results?search_query=cat&page=2";
var result;
function go(){
result = url.search(/cat/i);
document.write(result);
}
This gives me an index of 44. That means that reading a page is possible. Can I do the opposite and enter the index to retrieve the text?
Recommended answer
If the Ajax/cross-domain situation is not an issue for you, you can extract the text of a web page with:
var el = document.body; // or some other element reference
var text = el.innerText || el.textContent;
If you need to read text from pages in the same domain as your application, you can use Ajax directly.
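A minimal sketch of the same-domain case, assuming a browser that supports the fetch API and DOMParser. The names fetchPageSource and htmlToText, and the fetchImpl parameter (there only so the request function can be swapped out), are illustrative and not part of the original answer.

```javascript
// Sketch: fetch the HTML source of a same-origin page, then reduce it to text.
// fetchPageSource, htmlToText and fetchImpl are illustrative names.

function fetchPageSource(url, fetchImpl = fetch) {
  return fetchImpl(url).then(function (response) {
    if (!response.ok) {
      throw new Error("Request failed with status " + response.status);
    }
    return response.text(); // resolves to the raw HTML as a string
  });
}

// Browser only: parse the HTML without inserting it into the live page,
// then read the text content of the parsed <body>.
function htmlToText(html) {
  var doc = new DOMParser().parseFromString(html, "text/html");
  return doc.body.textContent;
}
```

In a browser this could be used as `fetchPageSource("/some/page.html").then(htmlToText)`, where the path is a placeholder for a page on your own domain. Note that textContent also returns the contents of any script or style elements inside the body, so the result may need further cleanup.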
If you need to read text from pages outside of your domain, you'll have to jump through a few extra hoops like setting up a proxy server or dealing with CORS - http://en.wikipedia.org/wiki/Cross-origin_resource_sharing
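The two options above might be combined roughly as follows. This is only a sketch: buildProxyUrl, fetchCrossOrigin and the "/proxy" endpoint are hypothetical, and it assumes you run a server-side proxy on your own domain that fetches the page named in its url query parameter and returns the body.

```javascript
// Sketch: read a page on another origin, directly if CORS allows it,
// otherwise through a same-origin proxy.
// buildProxyUrl, fetchCrossOrigin and the proxy endpoint are hypothetical.

function buildProxyUrl(targetUrl, proxyBase) {
  // Encode the target so it survives as a single query parameter.
  return proxyBase + "?url=" + encodeURIComponent(targetUrl);
}

function fetchCrossOrigin(targetUrl, proxyBase, fetchImpl = fetch) {
  // The direct request only succeeds if the remote server sends
  // CORS headers that allow this origin.
  return fetchImpl(targetUrl, { mode: "cors" })
    .catch(function () {
      // Blocked by CORS or a network failure: retry via the proxy.
      return fetchImpl(buildProxyUrl(targetUrl, proxyBase));
    })
    .then(function (response) {
      if (!response.ok) {
        throw new Error("Request failed with status " + response.status);
      }
      return response.text();
    });
}
```

For example, `fetchCrossOrigin("http://en.wikipedia.org/wiki/Programming", "/proxy")` would try the page directly and fall back to the proxy if the browser blocks the response.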