How to pull headings from a Google Document using the API

Posted: 2021/5/12 20:08:10 · Tags: python, google-docs, google-docs-api


Question


I'm currently trying to create a Python script that checks a Google document for various on-page SEO metrics.


The Google Docs API has a good sample showing how to extract ALL the text from a Google document. However, it simply returns plain text with no formatting.


To perform my checks, I need to be able to split out the H1, H2-H4, bold text, etc., but after two hours of experimenting and searching the API docs and the web, I can't figure out how to edit the following loop to get (for example) all the HEADING_2 elements.

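    def read_paragraph_element(element):
        """Return the text in a ParagraphElement, or '' if it has no textRun."""
        text_run = element.get('textRun')
        if not text_run:
            return ''
        return text_run.get('content')


    def read_structural_elements(elements):
        """Recurse through a list of StructuralElements and concatenate their text."""
        text = ''
        for value in elements:
            if 'paragraph' in value:
                elements = value.get('paragraph').get('elements')
                for elem in elements:
                    text += read_paragraph_element(elem)
            elif 'table' in value:
                # The text in table cells is in nested StructuralElements, and tables
                # may be nested.
                table = value.get('table')
                for row in table.get('tableRows'):
                    cells = row.get('tableCells')
                    for cell in cells:
                        text += read_structural_elements(cell.get('content'))
            elif 'tableOfContents' in value:
                # The text in the TOC is also in a StructuralElement.
                toc = value.get('tableOfContents')
                text += read_structural_elements(toc.get('content'))
        return text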
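The loop above expects `elements` to be the list of structural elements in the document body. A minimal sketch of obtaining that list, assuming `service` is an already-authorized Docs API v1 client (for example one created with `googleapiclient.discovery.build('docs', 'v1', credentials=creds)`); `get_body_content` and `DOCUMENT_ID` are hypothetical names used only for illustration:

```python
def get_body_content(service, document_id):
    """Fetch a document with documents.get and return its body's StructuralElements.

    `service` is assumed to be a Docs API v1 client, e.g. created with
    googleapiclient.discovery.build('docs', 'v1', credentials=creds).
    """
    document = service.documents().get(documentId=document_id).execute()
    # The main text lives in document['body']['content'], a list of StructuralElements.
    return document.get('body', {}).get('content', [])
```

Usage would look like `elements = get_body_content(service, DOCUMENT_ID)`, after which `elements` can be passed to the loop.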


Any help appreciated. Thanks.

Recommended answer


I believe your goal and current situation are as follows.

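- You want to retrieve the text of paragraphs whose paragraph style is HEADING_2.
- You want to achieve this using googleapis for Python.
- You want to achieve your goal with the script in your question.
- You have already retrieved the values from the Google Document using the Docs API.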


In this case, the text should be retrieved only from paragraphs whose paragraphStyle.namedStyleType is HEADING_2.
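Generalizing that check, here is a minimal sketch that groups paragraph text by heading level in one pass, covering the H1 and H2-H4 split from the question. It assumes `content` is the `body.content` list from a `documents.get` response; `collect_headings`, `paragraph_text` and `HEADING_STYLES` are hypothetical names, and table-of-contents elements are simply not visited here (recurse into them too if you need them):

```python
from collections import defaultdict

# Named paragraph styles treated as headings (H1 to H4); adjust as needed.
HEADING_STYLES = ('HEADING_1', 'HEADING_2', 'HEADING_3', 'HEADING_4')


def paragraph_text(paragraph):
    """Concatenate the textRun content of every element in a Paragraph."""
    return ''.join(
        elem.get('textRun', {}).get('content', '')
        for elem in paragraph.get('elements', [])
    )


def collect_headings(content, styles=HEADING_STYLES, found=None):
    """Map each namedStyleType in `styles` to the list of its paragraph texts.

    Recurses into table cells; tableOfContents elements are ignored.
    """
    if found is None:
        found = defaultdict(list)
    for value in content:
        if 'paragraph' in value:
            paragraph = value['paragraph']
            style = paragraph.get('paragraphStyle', {}).get('namedStyleType')
            if style in styles:
                # strip() drops the trailing newline each paragraph ends with.
                found[style].append(paragraph_text(paragraph).strip())
        elif 'table' in value:
            for row in value['table'].get('tableRows', []):
                for cell in row.get('tableCells', []):
                    collect_headings(cell.get('content', []), styles, found)
    return found
```

With this, `collect_headings(elements)['HEADING_2']` gives the H2 texts, and the other levels come from the same call instead of separate loops.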


In your script, that means changing the start of the loop from:

    for value in elements:
        if 'paragraph' in value:
            elements = value.get('paragraph').get('elements')

to:

    for value in elements:
        if 'paragraph' in value and value['paragraph']['paragraphStyle']['namedStyleType'] == 'HEADING_2':  # Modified
            elements = value.get('paragraph').get('elements')
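The question also asks about bold text. Bold is not a paragraph style: in the Docs API it is a character-level property on each text run (`textRun.textStyle.bold`), so it has to be checked per paragraph element. A minimal sketch, assuming the same `body.content` list; `collect_bold_runs` is a hypothetical name, adjacent bold runs are returned separately rather than merged, and bold that is only inherited from a named style (for example a heading style) may not appear on the run itself:

```python
def collect_bold_runs(content):
    """Return the text of every textRun whose textStyle has bold set to True.

    Walks paragraphs and, recursively, table cells. Whitespace-only runs are skipped.
    """
    bold = []
    for value in content:
        if 'paragraph' in value:
            for elem in value['paragraph'].get('elements', []):
                run = elem.get('textRun')
                if run and run.get('textStyle', {}).get('bold'):
                    text = run.get('content', '').strip()
                    if text:
                        bold.append(text)
        elif 'table' in value:
            for row in value['table'].get('tableRows', []):
                for cell in row.get('tableCells', []):
                    bold.extend(collect_bold_runs(cell.get('content', [])))
    return bold
```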

Reference:

NamedStyleType
