Site summary feed of Wikipedia excluding a single user


Question

There is a "Recent changes" feed available on the Wikipedia homepage.

The same is also available as an Atom feed. It is also possible to watch a single user by going to their user account and selecting the feed. But is there any way to get the feed excluding one (or two) users?

Update: Using xmllint, I can extract the author names.

wget https://hunspell.s3.amazonaws.com/temp/out.txt

xmllint --xpath "//*[name() = 'feed']/*[name() = 'entry']/*[name() = 'author']/*[name() = 'name']" out.txt

But I want to exclude one or two authors from this feed, for example Clarityfiend and Shortride.

Update:

When I tried the xpath command, it worked very well with an English (ASCII) author name, but it failed with a Unicode author name:

wget https://hunspell.s3.amazonaws.com/todel/out.txt

This works:

xpath -e "/feed/entry[author/name!='Aditya tamhankar' and author/name!='Sushant Madhale']" out.txt > a.txt

This does not work:

xpath -e "/feed/entry[author/name!='Aditya tamhankar' and author/name!='संतोष गोरे']"  out.txt > filtered.txt

The entry by the second author is still present in the filtered output:

grep 'संतोष गोरे' filtered.txt


A second command, using xmllint --shell, is OK with Unicode, but it does not display one record correctly:

# (t1='Aditya tamhankar' ; t2='संतोष गोरे'; echo 'setns x=http://www.w3.org/2005/Atom'; echo "cat /x:feed/x:entry[not(x:author/x:name[.='$t1'] | x:author/x:name[.='$t2'])]/descendant::*[self::x:updated or self::x:title or descendant-or-self::x:name]/text()") | xmllint --shell out.txt  | tail -n +4 | gawk '{ if(NR % 6 == 0){ print $0 "¬"} else { print $0 }}' |gawk 'BEGIN{FS="\n -------\n" ; RS="\n -------¬\n"; OFS="||"} { print $2,$1,$3 }END{ print FNR}'

All records except this one are correct:

152.238.27.63
/ >
||2021-07-15T20:14:03Z||
19

Recommended answer

I suggest that you use the xpath tool from your terminal (Ubuntu package libxml-xpath-perl). It is built on the Perl XML::XPath module, which evaluates XPath 1.0 expressions, and that is enough for this filter:

wget -O - https://hunspell.s3.amazonaws.com/temp/out.txt | xpath -e "/feed/entry[author/name!='Clarityfiend' and author/name!='Shortride']" > filtered.txt

UPD: If there is an out-of-memory error for the input buffer, download the feed into a file rather than piping it through standard output:

wget https://hunspell.s3.amazonaws.com/temp/out.txt
xpath -e "/feed/entry[author/name!='Clarityfiend' and author/name!='Shortride']" out.txt > filtered.txt

The XPath query selects all entries whose author name is neither Clarityfiend nor Shortride. The matching entries are saved in filtered.txt.
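
If the command-line tools misbehave with non-ASCII author names, the same filter can be sketched in Python with the standard library. This is only an illustration, not part of the original answer: the function name filter_feed and the prefix "a" are made up here, and it assumes the feed uses the standard Atom namespace with author/name directly under each entry.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace; the prefix "a" is an arbitrary local choice.
NS = {"a": "http://www.w3.org/2005/Atom"}


def filter_feed(xml_data, excluded):
    """Parse an Atom feed and drop every entry written by an excluded author.

    xml_data may be str or bytes; excluded is a set of author names.
    Returns the feed's root element with the unwanted entries removed.
    """
    root = ET.fromstring(xml_data)
    # findall returns a list, so removing entries while looping is safe.
    for entry in root.findall("a:entry", NS):
        names = {n.text for n in entry.findall("a:author/a:name", NS)}
        if names & excluded:
            root.remove(entry)
    return root


if __name__ == "__main__":
    # Tiny hand-written sample, not real feed data.
    sample = """<feed xmlns="http://www.w3.org/2005/Atom">
      <entry><title>A</title><author><name>Clarityfiend</name></author></entry>
      <entry><title>B</title><author><name>संतोष गोरे</name></author></entry>
      <entry><title>C</title><author><name>Someone else</name></author></entry>
    </feed>"""
    kept = filter_feed(sample, {"Clarityfiend", "संतोष गोरे"})
    for entry in kept.findall("a:entry", NS):
        print(entry.findtext("a:title", namespaces=NS))
```

For the downloaded file, something like filter_feed(open("out.txt", "rb").read(), {...}) should work the same way; Python compares the names as Unicode strings, so no shell quoting or locale settings are involved.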
