How to use XPath to select non-empty paragraphs?

Published: 2021/7/16 22:08:29 · Tags: html xml xpath scrapy

Problem description

The webpages I want to scrape have similar structures. Each has a paragraph that is a question and a paragraph that is an answer. I want to scrape each question and answer and store them in two items.

The problem is that on some pages the question and the answer are //xxx/p[1] and //xxx/p[2] respectively, but on other pages //xxx/p[1] is an empty paragraph with no text that only serves as extra spacing. On those pages, //xxx/p[1] doesn't give me what I want.

So is there an XPath expression that can select non-empty paragraphs under one node?

Recommended answer

If there's no text at all, you can use

//p[.//text()]
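A minimal sketch of what the predicate [.//text()] tests, using only Python's standard library. ElementTree's limited XPath subset has no text() node test, so the predicate is emulated by hand; the PAGE markup and the has_text_node helper are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: an empty spacer <p/>, then the question and answer paragraphs.
PAGE = "<div><p/><p>What is XPath?</p><p>A query language for XML.</p></div>"


def has_text_node(p: ET.Element) -> bool:
    """Emulates the predicate [.//text()]: true if p has any descendant text node.

    itertext() skips None/empty strings, mirroring the fact that XPath
    never produces empty text nodes.
    """
    return next(p.itertext(), None) is not None


paragraphs = ET.fromstring(PAGE).findall("p")  # like //div/p
selected = [p for p in paragraphs if has_text_node(p)]  # the <p/> spacer is dropped
question, answer = ("".join(p.itertext()) for p in selected[:2])

# A whitespace-only paragraph still has a text node, so it would also match.
whitespace_only_matches = has_text_node(ET.fromstring("<p>\n</p>"))

print(question)  # What is XPath?
print(answer)    # A query language for XML.
print(whitespace_only_matches)  # True
```

The last check is the weakness of this predicate: a paragraph holding only a newline or spaces is not skipped, which is the case the answer handles next with normalize-space.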

to select paragraphs that contain text. If the "empty" paragraphs contain whitespace (e.g. newlines), that whitespace is itself a text node, so they will still match; in that case you have to normalize the whitespace first:

//p[normalize-space(.//text())]

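which can be shortened to

//p[normalize-space()]

The short form is also more robust. With no argument, normalize-space() normalizes the string-value of the whole paragraph, i.e. all of its descendant text. In XPath 1.0 (the version Scrapy's selectors use through lxml), normalize-space(.//text()) converts the node-set to a string using only its first text node, so a paragraph such as <p><b> </b>Answer</p> would be skipped; in XPath 2.0 and later, passing several text nodes is a type error.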
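The sketch below, again stdlib-only Python with hypothetical markup and helper names, emulates the two normalize-space predicates to show where they differ. It mimics XPath 1.0 semantics, where converting .//text() to a string keeps only the first text node, and then takes the question and answer as the first two non-blank paragraphs.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: an empty spacer <p/>, a whitespace-only <p>,
# the question, and an answer whose first text node is a lone space inside <b>.
PAGE = """<div>
  <p/>
  <p>
  </p>
  <p>What is XPath?</p>
  <p><b> </b>A query language for XML.</p>
</div>"""


def normalize_space(s: str) -> str:
    """Approximates XPath normalize-space(): trim ends, collapse whitespace runs.

    str.split() treats more characters as whitespace than XPath does
    (XPath uses only space, tab, CR and LF), e.g. a non-breaking space.
    """
    return " ".join(s.split())


def first_text_node(p: ET.Element) -> str:
    """XPath 1.0 string(.//text()): only the first descendant text node counts."""
    return next(p.itertext(), "")


def string_value(p: ET.Element) -> str:
    """String-value of p: all descendant text concatenated in document order."""
    return "".join(p.itertext())


paragraphs = ET.fromstring(PAGE).findall("p")  # like //div/p

# //p[normalize-space(.//text())] under XPath 1.0 rules
first_node_matches = [p for p in paragraphs if normalize_space(first_text_node(p))]
# //p[normalize-space()]
nonblank = [p for p in paragraphs if normalize_space(string_value(p))]

question, answer = (normalize_space(string_value(p)) for p in nonblank[:2])

print(len(first_node_matches))  # 1 (the answer paragraph is wrongly skipped)
print(question)                 # What is XPath?
print(answer)                   # A query language for XML.
```

In Scrapy itself no emulation is needed: an expression such as //xxx/p[normalize-space()][1] for the question and //xxx/p[normalize-space()][2] for the answer (xxx being the placeholder from the question) can be passed to response.xpath(...).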
