Force XPath to return a string in lxml


Problem description

I am using lxml and I have a page scraped from Google Scholar. Below is a minimal working example, along with the things I have tried.

# Assumes earlier in the session: import urllib2 (Python 2) and from lxml import html
In [56]: seed = "https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:machine_learning"

In [60]: page = urllib2.urlopen(seed).read()

In [63]: tree = html.fromstring(page)

In [64]: xpath = '(/html/body/div[1]/div[4]/div[2]/div/span/button[2]/@onclick)[1]'

In [65]: tree.xpath(xpath)
# the first match comes back wrapped in a list
Out[65]: ["window.location='/citations?view_op\\x3dsearch_authors\\x26hl\\x3den\\x26oe\\x3dASCII\\x26mauthors\\x3dlabel:machine_learning\\x26after_author\\x3dVCoCALPY_v8J\\x26astart\\x3d10'"]         

In [66]: xpath = '(/html/body/div[1]/div[4]/div[2]/div/span/button[2]/@onclick)[2]'

#there is no second element
In [67]: tree.xpath(xpath)
Out[67]: []     

In [70]: xpath = '(/html/body/div[1]/div[4]/div[2]/div/span/button[2]/@onclick)'

#The list contains only one element
In [71]: tree.xpath(xpath)
Out[71]: ["window.location='/citations?view_op\\x3dsearch_authors\\x26hl\\x3den\\x26oe\\x3dASCII\\x26mauthors\\x3dlabel:machine_learning\\x26after_author\\x3dVCoCALPY_v8J\\x26astart\\x3d10'"]         

According to the documentation, return values can be smart strings, but I cannot get a string back from the xpath function; I always get a list. How can I write the XPath expression so that xpath returns a string?

Recommended answer

You can wrap the path in the XPath string() function: string(/html/body/div[1]/div[4]/div[2]/div/span/button[2]/@onclick). In that case the xpath call returns a single string value instead of a list.
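
The difference can be seen in a minimal sketch that avoids fetching the live page. The HTML snippet, the shortened path, and the variable names here are hypothetical stand-ins for illustration; only html.fromstring and xpath are the lxml calls used in the question.

```python
from lxml import html

# Hypothetical stand-in for the scraped page; the real Google Scholar
# markup is not shown in the question.
SAMPLE = """
<html><body>
  <div><span>
    <button onclick="first()">Prev</button>
    <button onclick="window.location='/citations?astart=10'">Next</button>
  </span></div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Shortened path that matches the sample markup above (an assumption).
path = "/html/body/div/span/button[2]/@onclick"

# A node-set expression always comes back as a list, even for one match.
as_list = tree.xpath(path)

# Wrapping the same path in string() makes XPath return a string value.
as_string = tree.xpath("string(%s)" % path)

# string() over an empty node-set gives an empty string, not an error.
missing = tree.xpath("string(/html/body/div/span/button[3]/@onclick)")

print(as_list)
print(as_string)
print(repr(missing))
```

Note that string() only uses the first node of a node-set, so a path that matches nothing yields an empty string. If a missing attribute needs to be told apart from an empty one, checking the list result may be the safer choice.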
