Fetch random excerpt from Wikipedia (Javascript, client-only)


Question

I have a web page that asks the user for a paragraph of text, then performs some operation on it. To demo it to lazy users, I'd like to add an "I feel lucky" button that will grab some random text from Wikipedia and populate the inputs.

How can I use Javascript to fetch a sequence of text from a random Wikipedia article?

I found some examples of fetching and parsing articles using the Wikipedia API, but they tend to be server side. I'm looking for a solution that runs entirely from the client and doesn't get scuppered by same origin policy.

Note random gibberish is not sufficient; I need human-readable sentences that make sense.

Answer

My answer builds on the technique suggested in this answer: http://stackoverflow.com/questions/2374377/query-wikipedias-api-using-ajax-xmlhttprequest/2375948#2375948

The tricky part is formulating the correct query string:

http://en.wikipedia.org/w/api.php?action=query&generator=random&prop=extracts&exchars=500&format=json&callback=onWikipedia

  • generator=random selects a random page
  • prop=extracts and exchars=500 retrieve a 500-character extract
  • format=json returns JSON-formatted data
  • callback= causes that data to be wrapped in a function call so it can be treated like any other <script> and injected into your page (see JSONP), thus bypassing cross-domain barriers.
  • requestid can optionally be added, with a new value each time, to avoid stale results from the browser cache (required in IE9)
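
The same query string can also be built programmatically, which avoids escaping mistakes. As a minimal sketch (the helper name buildRandomExtractUrl is mine, not part of the original answer): the API also accepts origin=*, MediaWiki's anonymous-CORS flag, so on modern browsers a plain fetch() works without the JSONP callback trick at all.

```javascript
// Sketch: build the random-extract query with URLSearchParams.
// origin=* is MediaWiki's anonymous-CORS parameter; with it a plain
// cross-origin fetch() succeeds, so the callback= JSONP step is optional.
function buildRandomExtractUrl(chars) {
  const params = new URLSearchParams({
    action: "query",
    generator: "random",
    prop: "extracts",
    exchars: String(chars),
    format: "json",
    origin: "*"
  });
  return "https://en.wikipedia.org/w/api.php?" + params.toString();
}

// Usage (browser, or Node 18+ with built-in fetch):
// fetch(buildRandomExtractUrl(500))
//   .then(r => r.json())
//   .then(data => console.log(Object.values(data.query.pages)[0].extract));
```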
The page served by the query looks something like this (I've added whitespace for readability):

    onWikipedia(
      {"query":
        {"pages":
          {"12362520":
            {"pageid":12362520,
             "ns":0,
             "title":"Power Building",
             "extract":"<p>The <b>Power Building<\/b> is a historic commercial building in
                        the downtown of Cincinnati, Ohio, United States. Built in 1903, it
                        was designed by Harry Hake. It was listed on the National Register
                        of Historic Places on March 5, 1999. One week later, a group of
                        buildings in the northeastern section of downtown was named a
                        historic district, the Cincinnati East Manufacturing and Warehouse
                        District; the Power Building is one of the district's contributing
                        properties.<\/p>\n<h2> Notes<\/h2>"
      } } } }
    )
    

Of course you'll get a different article each time.
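
Note that the key inside "pages" (12362520 above) is the page id, which changes with every response, so the extract has to be pulled out without hard-coding the key. A small sketch of that unwrapping step (firstExtract is my name for it):

```javascript
// The "pages" object is keyed by page id, which differs on every request,
// so take the first (and only) entry rather than hard-coding the key.
function firstExtract(response) {
  const page = Object.values(response.query.pages)[0];
  return page.extract;
}
```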

Here's a full, working example which you can try out on JSBin.

    <HTML><BODY>
    
      <p><textarea id="textbox" style="width:350px; height:150px"></textarea></p>
      <p><button type="button" id="button" onclick="startFetch(100, 500)">
        Fetch random Wikipedia extract</button></p>
    
      <script type="text/javascript">
    
        var textbox = document.getElementById("textbox");
        var button = document.getElementById("button");
        var tempscript = null, minchars, maxchars, attempts;
    
        function startFetch(minimumCharacters, maximumCharacters, isRetry) {
          if (tempscript) return; // a fetch is already in progress
          if (!isRetry) {
            attempts = 0;
            minchars = minimumCharacters; // save params in case retry needed
            maxchars = maximumCharacters;
            button.disabled = true;
            button.style.cursor = "wait";
          }
          tempscript = document.createElement("script");
          tempscript.type = "text/javascript";
          tempscript.id = "tempscript";
          tempscript.src = "http://en.wikipedia.org/w/api.php"
            + "?action=query&generator=random&prop=extracts"
            + "&exchars="+maxchars+"&format=json&callback=onFetchComplete&requestid="
            + Math.floor(Math.random()*999999).toString();
          document.body.appendChild(tempscript);
          // onFetchComplete invoked when finished
        }
    
        function onFetchComplete(data) {
          document.body.removeChild(tempscript);
          tempscript = null;
          var s = getFirstProp(data.query.pages).extract;
          s = htmlDecode(stripTags(s));
          if (s.length > minchars || attempts++ > 5) {
            textbox.value = s;
            button.disabled = false;
            button.style.cursor = "auto";
          } else {
            startFetch(0, 0, true); // retry
          }
        }
    
        function getFirstProp(obj) {
          for (var i in obj) return obj[i];
        }
    
        // This next bit borrowed from Prototype / hacked together
        // You may want to replace with something more robust
        function stripTags(s) {
          return s.replace(/<\w+(\s+("[^"]*"|'[^']*'|[^>])+)?>|<\/\w+>/gi, "");
        }
        function htmlDecode(input){
          var e = document.createElement("div");
          e.innerHTML = input;
          return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
        }
    
      </script>
    
    </BODY></HTML>
    

One downside of generator=random is that you often get talk pages or generated content rather than actual articles. If anyone can improve the query string to limit it to quality articles, that would be great!
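
One partial improvement, assuming the standard generator-parameter prefixing applies here: list=random takes rnnamespace, so as a generator it should accept grnnamespace, and grnnamespace=0 restricts results to the main article namespace. That filters out talk and project pages, though it says nothing about article quality:

```javascript
// Sketch: restrict the random generator to namespace 0 (real articles).
// grnnamespace is the generator-prefixed form of list=random's rnnamespace.
function buildArticleOnlyUrl(chars) {
  return "https://en.wikipedia.org/w/api.php"
    + "?action=query&generator=random&grnnamespace=0"
    + "&prop=extracts&exchars=" + chars
    + "&format=json&origin=*";
}
```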
