Can anybody share a simple example of using Mathematica and Google Scholar to extract academic research information?

Question

How can I use Mathematica and Google Scholar to find the number of papers a person published in 2011?

Recommended answer

Google Scholar is not well suited to this goal: as far as I know it has no formal API, and it doesn't provide results in a structured (e.g. XML) format. So we have to resort to a quick (and very, very fragile!) text pattern matching hack like:

(* Returns, as a string, the hit count Scholar reports for an author search. *)
(* Fragile: depends on the "Results ... of about N." text on the results page. *)
searchGoogleScholarAuthor[author_String] := 
 First[StringCases[
   Import["http://scholar.google.com/scholar?start=0&num=1&q=" <> 
     StringDrop[
      StringJoin @@ ("author:" <> # <> "+" & /@ 
         StringSplit[author]), -1] <> "&hl=en&as_sdt=1,5"], ___ ~~ 
     "Results" ~~ ___ ~~ "of about" ~~ Shortest[___] ~~ 
     p : Longest[(DigitCharacter | ",") ..] ~~ ___ ~~ "." ~~ ___ ~~ 
     "(" ~~ ___ :> p]]

In[191]:= searchGoogleScholarAuthor["A Einstein"]

Out[191]= "6,400"

In[190]:= searchGoogleScholarAuthor["Einstein"]

Out[190]= "9,400"

In[192]:= searchGoogleScholarAuthor["Wizard"]

Out[192]= "197"

In[193]:= searchGoogleScholarAuthor["Vries"]

Out[193]= "70,700"

Add ToExpression if you'd rather have a number than a string; because the result can contain thousands separators (e.g. "6,400"), remove the commas first. To restrict the publication years, add &as_ylo=2011&as_yhi=2011 to the search string, changing the start and end years as appropriate.
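A rough sketch of both suggestions follows. The helper names parseHitCount and scholarAuthorURL are made up for illustration and are not part of the original answer. The first strips the commas before converting, and the second only builds the search URL with the as_ylo/as_yhi parameters; it does not contact Scholar.

```mathematica
(* Hypothetical helper: turn a hit-count string such as "6,400" into an integer. *)
(* The commas are removed first so that only digits reach FromDigits. *)
parseHitCount[s_String] := FromDigits[StringReplace[s, "," -> ""]]

(* Hypothetical helper: build the query URL for one author, *)
(* restricted to publication years ylo through yhi. *)
scholarAuthorURL[name_String, ylo_Integer, yhi_Integer] :=
 "http://scholar.google.com/scholar?start=0&num=1&q=" <>
  StringDrop[
   StringJoin @@ (("author:" <> # <> "+" &) /@ StringSplit[name]), -1] <>
  "&hl=en&as_sdt=1,5" <>
  "&as_ylo=" <> ToString[ylo] <> "&as_yhi=" <> ToString[yhi]

parseHitCount["6,400"]
(* 6400 *)

scholarAuthorURL["A Einstein", 2011, 2011]
(* "http://scholar.google.com/scholar?start=0&num=1&q=author:A+author:Einstein&hl=en&as_sdt=1,5&as_ylo=2011&as_yhi=2011" *)
```

The resulting URL could then be passed to Import in place of the one hard-coded in searchGoogleScholarAuthor.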

Please note that authors with common names will generate lots of spurious hits, as there is no way to uniquely identify a single author. Additionally, Scholar returns a variety of hits, including citations, books, reprints and more. So, really, this ain't very useful for counting.

Some explanation:

Scholar splits the initials and names of authors and co-authors over several author: fields, joined with a +. The StringDrop[StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1] part of the code takes care of that: it prefixes each word of the name with author:, appends a + to each, and joins the pieces. The StringDrop then removes the trailing +.
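To see what that fragment produces, it can be evaluated on its own, without contacting Scholar. The second form, using StringRiffle (only available in more recent Mathematica versions), is an alternative suggested here rather than part of the original answer; it avoids appending a trailing + and then dropping it. The variable names are illustrative.

```mathematica
sampleAuthor = "A Einstein";

(* Original approach: append "+" to every field, join, then drop the trailing "+". *)
viaDrop = StringDrop[
   StringJoin @@ (("author:" <> # <> "+" &) /@ StringSplit[sampleAuthor]), -1];

(* Alternative: prefix each word, then let StringRiffle put "+" between the fields. *)
viaRiffle = StringRiffle[("author:" <> # &) /@ StringSplit[sampleAuthor], "+"];

{viaDrop, viaRiffle}
(* {"author:A+author:Einstein", "author:A+author:Einstein"} *)
```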

The StringCases part contains a large text pattern that basically searches for the text Scholar places at the top of each results page, which contains the number of hits. This number is then isolated and returned.
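The idea can be tried offline on a sample header line. The text below is a made-up stand-in for what Scholar printed at the top of a results page (inferred from the pattern in the answer), and the pattern is a deliberately simplified version of the original, so treat this as a sketch rather than a drop-in replacement. The helper name extractHitCount is hypothetical; unlike a bare First, it returns Missing["NotFound"] when the text does not match, which is the likely outcome once the page layout changes.

```mathematica
(* Hypothetical sample of the header text on a results page. *)
sampleHeader = "Results 1 - 1 of about 6,400. (0.05 sec)";

(* Simplified pattern: capture the run of digits and commas after "of about ". *)
extractHitCount[text_String] :=
 Replace[
  StringCases[text, "of about " ~~ p : ((DigitCharacter | ",") ..) :> p],
  {{first_, ___} :> first, {} -> Missing["NotFound"]}]

extractHitCount[sampleHeader]
(* "6,400" *)

extractHitCount["no counts here"]
(* Missing["NotFound"] *)
```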
