<div> tags inside <div> using importXML XPath query, in Google Spreadsheet


Question


I'm using XPath in Google Docs to get the text inside a <div>. I want to save the text inside <div id="job_description"> in one cell of a Google Docs spreadsheet, but it shows each <div> in a separate cell.

<div id="job_description">
    <div>
        <strong>
            Basic Purpose:
        </strong>
        <br></br>
    </div>
    <div>
        Work closely with developers, product owners and Q…
        <br></br>
    </div>
    <div>
        The Test Analyst is accountable for the developmen…
        <br></br>
    </div>
    <div>
        <strong>
            Duties and Responsibilities:
        </strong>
    </div>
    <ul>
        <li></li>
        <li></li>
    </ul>
    <div>
        <strong>
            Requirements:
        </strong>
        <br></br>
    </div>
    <ul>
        <li></li>
        <li></li>
    </ul>
</div>

Image: http://i.stack.imgur.com/K0mAY.png

This is the formula I wrote:

=IMPORTXML(E4,"//div[@id='job_description']")

Could you help me put all of the text inside <div id="job_description"> (including the text in the nested <div>, <ul>, ...) into a single cell?

Solution

Using JOIN is a good start, but you can make it a single operation.

You did not show the URL to the page you're importing, so I can only give you an example with another page. For instance, if you are importing www.w3.org and looking for a div where @class='event closed expand_block', use

=JOIN(CHAR(10),IMPORTXML("http://www.w3.org/","//div[@class='event closed expand_block']//text()"))

Notice that I also modified the XPath expression: //text() makes sure only descendant text nodes are retrieved, that is, all the text.
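To see what "collect the descendant text nodes, then join them" does outside a spreadsheet, here is a rough Python sketch using only the standard library. It is not what Google Sheets runs internally. The markup is a trimmed copy of the snippet from the question, the list item texts are placeholders invented for the illustration, and joined_text is a hypothetical helper name. ElementTree's XPath subset has no text() step, so itertext() stands in for //text(); dropping whitespace-only pieces is a choice made here, not something the formula is known to do.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the markup from the question. The <li> texts are
# placeholders invented for this illustration.
SAMPLE = """
<html><body>
<div id="job_description">
    <div><strong>Basic Purpose:</strong><br></br></div>
    <div>Work closely with developers<br></br></div>
    <div><strong>Requirements:</strong><br></br></div>
    <ul><li>placeholder item 1</li><li>placeholder item 2</li></ul>
</div>
</body></html>
"""


def joined_text(markup: str, div_id: str) -> str:
    """Join all text found under the <div> with the given id, one piece per line."""
    root = ET.fromstring(markup.strip())
    # Attribute predicates are supported by ElementTree; text() is not,
    # so itertext() is used below to walk the descendant text.
    div = root.find(f".//div[@id='{div_id}']")
    if div is None:
        return ""
    pieces = [text.strip() for text in div.itertext()]
    # Skip pieces that were only indentation or line breaks.
    return "\n".join(piece for piece in pieces if piece)


result = joined_text(SAMPLE, "job_description")
print(result)
```

The single string in result plays the role of the single cell: every text fragment under the div ends up in it, separated by newlines.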


EDIT: Responding to your comment:

May I know what is CHAR(10) referring to?

Yes, of course. CHAR takes a number as input and returns the corresponding character. CHAR(10) returns a newline (line feed) character, whose character code is 10; the same number appears in the HTML entity &#10;.

In the formula, CHAR(10) is used as the first argument of JOIN, which is the delimiter of the objects that are to be joined.
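The role of the delimiter can be illustrated with a small Python analogy (an analogy only; the spreadsheet functions are CHAR and JOIN, and the sample strings are taken from the question's markup). Python's chr(10) yields the same line feed character, and str.join puts the delimiter between the items just as JOIN does.

```python
# chr(10) is the character with code 10, the line feed, like CHAR(10) in Sheets.
newline = chr(10)

parts = ["Basic Purpose:", "Requirements:"]

# The delimiter goes between the items, never before the first or after the last.
joined_with_newline = newline.join(parts)
joined_with_comma = ", ".join(parts)

print(repr(joined_with_newline))
print(repr(joined_with_comma))
```

Swapping the delimiter changes only how the pieces are separated inside the one resulting string, which is why CHAR(10) can be replaced by any other separator in the formula.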
