How to get clean JSON from the Wikipedia API
Question
I want to get the content of the Wikipedia page https://en.wikipedia.org/wiki/February_2 as JSON.
I tried using their API: https://en.wikipedia.org/w/api.php?action=parse&page=February_19&prop=text&formatversion=2&format=json
The response is in JSON format, but the content inside it is HTML. I only want the content.
I need a way to get a clean result.
Recommended answer
If you want plain text without markup, you first have to parse the JSON object and then extract the text from the HTML code:
// Strip the markup by letting the browser parse the HTML into a detached element.
function htmlToText(html) {
  const tempDiv = document.createElement("div");
  tempDiv.innerHTML = html;
  return tempDiv.textContent || tempDiv.innerText || "";
}

// origin=* allows anonymous cross-origin requests; with formatversion=2, parse.text is a plain string.
const url = 'https://en.wikipedia.org/w/api.php?action=parse&page=February_19&prop=text&format=json&formatversion=2&origin=*';

$.getJSON(url, function(data) {
  const html = data['parse']['text'];
  const plainText = htmlToText(html);
  // Keep only the lines that look like "YYYY – description" (four-digit year, en dash).
  const array = [...plainText.matchAll(/^\d{4} *–.*/gm)].map(x => x[0]);
  console.log(array);
});

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
Update: I edited the code above according to the comment below. The code now extracts all the list items and puts them into an array.
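Since the question asks for clean JSON rather than an array of strings, the extracted lines could be split one step further into objects. The following is only a minimal sketch, assuming every entry has the form "year – description" with an en dash, as the regular expression in the answer does. `parseEvents` is a hypothetical helper name, not part of the original answer or of the Wikipedia API, and unlike the answer's pattern it also accepts years with fewer than four digits.

```javascript
// Hypothetical helper: turns plain text such as "1600 – Some event"
// into [{ year: 1600, text: "Some event" }, ...].
// Assumes one entry per line, a leading year of 1 to 4 digits,
// and an en dash as the separator.
function parseEvents(plainText) {
  const pattern = /^(\d{1,4}) *– *(.*)$/gm;
  return [...plainText.matchAll(pattern)].map(function (match) {
    return { year: Number(match[1]), text: match[2].trim() };
  });
}

// Made-up sample input, for illustration only (not real page content).
const sample = "Events\n1600 – First sample entry\n197 – Second sample entry\n";
console.log(JSON.stringify(parseEvents(sample), null, 2));
```

Inside the `$.getJSON` callback, this could be called as `parseEvents(plainText)` in place of the `matchAll` line. Lines that do not start with a year, such as section headings, are skipped.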