How to import a table with a mix of text and image metadata with IMPORTHTML and/or IMPORTXML?

Views: 99 | Published: 2020/6/17 18:52:05 | Tags: xpath web-scraping google-sheets array-formulas google-sheets-importxml


Question


I'm trying to import tables with a mixture of text and images into Google Sheets using the IMPORTHTML and/or IMPORTXML functions.

The tables I'm trying to import are the 'Equipment' tables under the 'Advancement' section from multiple sites like this: https://stt.wiki/wiki/Xindi_%27Prisoner%27_Archer.

The number of stars at each item in the table represents a "level" from 1 ("Common") to 5 ("Legendary"), with no stars representing level 0 ("Basic"). The image metadata contains the level description. Example for "Legendary" level:

<img alt="Legendary" src="/w/images/thumb/b/b5/StarItem.png/15px-StarItem.png" title="Legendary" width="15" height="15" style="vertical-align: sub" srcset="/w/images/thumb/b/b5/StarItem.png/23px-StarItem.png 1.5x, /w/images/thumb/b/b5/StarItem.png/30px-StarItem.png 2x">

My problem is how to include the level information in the import, either as images or as image metadata.

My ultimate goal is a table like this (created manually):

(columns E and I with URLs are optional).

IMPORTHTML:

First I tried to import with IMPORTHTML, where cell A1 contains the URL (see above). Please note that I have to use semicolons in formulas due to my locale settings:

=IMPORTHTML(A1; "table"; 4)

This gives me this table:

Unfortunately, the "stars" from the original table are not imported.

1) So the first question is: Is there a way to include the images from a table with IMPORTHTML method? Or alternatively metadata from the images?

IMPORTXML:

I then tried to use IMPORTXML to get just the missing level data:

=IMPORTXML(A1; "//*[@id='mw-content-text']/div/table[3]/tbody/tr/td/span/img[1]/@alt")

The IMPORTHTML gave me 40 items in total, but with this IMPORTXML I only get 37 values for item levels. This is because with my IMPORTXML method I don't get information on the "Basic" items, that is the items without stars.

So now I have a list of 37 levels and a table with 40 items, but no logical connection between them. The list of levels would need entries (could be blank cells) for the basic items at the correct positions in the list to make the assignment between items and levels possible.

2) So my second question is: For the IMPORTXML method, is there any way to get a result with the same number of cells in Google Sheets as in the original table, even when the XPath doesn't match for some cells of the original table? In this case the import could give an empty cell instead. In the example this would give me a list of 40 cells, 3 of which would be empty.
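To illustrate the alignment problem outside of Sheets, here is a minimal Python sketch. The `SAMPLE` markup, the item names, and the helper `level_of` are hypothetical stand-ins, not the wiki's real HTML. It contrasts one flat XPath query, which silently drops cells without a star, with a cell-by-cell query that returns an empty string where nothing matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the equipment table (not the real page).
SAMPLE = """
<table>
  <tr>
    <td>Item A <span><img alt="Legendary"/><img alt="Legendary"/></span></td>
    <td>Item B</td>
  </tr>
  <tr>
    <td>Item C <span><img alt="Common"/></span></td>
  </tr>
</table>
""".strip()

table = ET.fromstring(SAMPLE)

# Flat query, comparable to the single IMPORTXML call:
# cells without a star produce no result at all.
flat_levels = [img.get("alt") for img in table.findall("./tr/td/span/img[1]")]


def level_of(cell):
    """Return the alt text of the first star image in a cell, or '' if none."""
    img = cell.find("./span/img")
    return img.get("alt") if img is not None else ""


# Cell-by-cell query: exactly one result per cell, so positions stay aligned.
aligned_levels = [level_of(cell) for cell in table.findall("./tr/td")]

print(flat_levels)
print(aligned_levels)
```

The second list has one entry per cell, which is the shape asked for in question 2; whether IMPORTXML alone can produce it is a separate matter.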

Other solutions with Google Sheets are welcome, too.

Solution

XPath solution (six XPath expressions are used; check the yellow cells):

Star.Treck.Sheet

First, we get the structure of the table with IMPORTHTML. Then, with XPath, we get the IDs, member names and levels of everyone with a star (i.e. a rank). Next, we get the IDs and member names of everyone (with and without a star). We use VLOOKUP to build the levels table (see join.levels); no star means "Basic". We fetch the URLs. Finally, we build the final table with CONCAT (IDs + names, to secure the join).
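The join described above can be sketched in Python as follows. This is only an illustration of the lookup logic, not the formulas in the linked sheet: the sample IDs, names, and the helper `make_key` are assumptions made for the example. A dictionary lookup with a default plays the role of VLOOKUP with "Basic" as the fallback.

```python
# Hypothetical sample data standing in for the two XPath result sets.
all_items = [("1", "Item A"), ("2", "Item B"), ("3", "Item C")]  # everyone
starred = [("1", "Item A", "Legendary"), ("3", "Item C", "Common")]  # with a star


def make_key(item_id, name):
    """Combine ID and name into one join key, as the CONCAT step does."""
    return f"{item_id}|{name}"


# Lookup table from join key to level, built from the starred rows only.
level_by_key = {make_key(i, n): level for i, n, level in starred}

# Left join: every item keeps its row; a missing key falls back to "Basic".
final_table = [
    (i, n, level_by_key.get(make_key(i, n), "Basic")) for i, n in all_items
]

for row in final_table:
    print(row)
```

Joining on ID plus name rather than on name alone guards against two items sharing the same name.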
