How to import a table with a mix of text and image metadata with IMPORTHTML and/or IMPORTXML?


Question

I'm trying to import tables containing a mixture of text and images into Google Sheets using the IMPORTHTML and/or IMPORTXML functions.

The tables I'm trying to import are the 'Equipment' tables under the 'Advancement' section from multiple sites like this: https://stt.wiki/wiki/Xindi_%27Prisoner%27_Archer.

The number of stars next to each item in the table represents a "level" from 1 ("Common") to 5 ("Legendary"), with no stars representing level 0 ("Basic"). The image metadata contains the level description. Example for the "Legendary" level:

<img alt="Legendary" src="/w/images/thumb/b/b5/StarItem.png/15px-StarItem.png" title="Legendary" width="15" height="15" style="vertical-align: sub" srcset="/w/images/thumb/b/b5/StarItem.png/23px-StarItem.png 1.5x, /w/images/thumb/b/b5/StarItem.png/30px-StarItem.png 2x">

My problem is how to include the level information in the import, either as images or as image metadata.

My ultimate goal is a table like this (created manually):

(columns E and I with URLs are optional).


IMPORTHTML:

First I tried IMPORTHTML, with cell A1 containing the URL above (note that I have to use semicolons as argument separators in formulas because of my locale settings):

=IMPORTHTML(A1; "table"; 4)

This gives me this table:

Unfortunately, the "stars" from the original table are not imported.

1) So the first question is: Is there a way to include the images from a table with the IMPORTHTML method, or alternatively the metadata from those images?


IMPORTXML:

I then tried to use IMPORTXML to get just the missing level data:

=IMPORTXML(A1; "//*[@id='mw-content-text']/div/table[3]/tbody/tr/td/span/img[1]/@alt")

IMPORTHTML gave me 40 items in total, but with this IMPORTXML formula I only get 37 values for item levels. This is because the XPath only matches cells that contain a star image, so I get no information on the "Basic" items, that is, the items without stars.

So now I have a list of 37 levels and a table with 40 items, but no logical connection between them. The list of levels would need entries (they could be blank cells) for the basic items at the correct positions in the list to make the assignment between items and levels possible.
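
To make the mismatch concrete, here is a minimal Python sketch (outside Google Sheets) that reproduces it on a made-up table. The `SAMPLE` markup is a hypothetical, simplified stand-in and not the real wiki HTML; only the idea carries over: a flat XPath query returns matching nodes only, while a per-cell query can emit a blank for each cell without a star.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the wiki table (not the real markup).
SAMPLE = """
<table>
  <tr>
    <td><span><img alt="Legendary"/></span><a>Item A</a></td>
    <td><a>Item B</a></td>
    <td><span><img alt="Common"/></span><a>Item C</a></td>
  </tr>
</table>
"""

root = ET.fromstring(SAMPLE)

# Flat query, analogous to the IMPORTXML call: only matching nodes come back,
# so the cell without a star simply disappears from the result.
flat = [img.get("alt") for img in root.findall(".//td/span/img[1]")]

# Per-cell query: one result per <td>, with an empty string where nothing matches.
aligned = []
for td in root.findall(".//td"):
    img = td.find("span/img")
    aligned.append(img.get("alt") if img is not None else "")

print(flat)     # two values for three cells
print(aligned)  # three values, one per cell
```

The second list has the shape I am after: the same number of entries as cells, with a blank where the item is "Basic".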

2) So my second question is: For the IMPORTXML method, is there any way to get a result with the same number of cells in Google Sheets as in the original table, even when the XPath doesn't match for some cells of the original table? In that case the import could give an empty cell instead. In the example this would give me a list of 40 cells, 3 of which would be empty.


Other solutions with Google Sheets are welcome, too.

Solution

XPath solution (six XPath expressions are used; see the yellow cells):

Star.Treck.Sheet

First, we get the structure of the table with IMPORTHTML. Then, with XPath, we get the ids, member names and levels of every entry with a star (i.e. a rank). Next, we get the ids and member names of all entries (with and without a star). We use VLOOKUP to build the levels table (see join.levels); entries with no star become "Basic". We then fetch the URLs. Finally, we build the final table, using CONCAT on ids and names as the key to make the join reliable.
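
The join logic described above can be sketched in Python to show what the VLOOKUP step does. This is an illustration under assumptions, not the contents of the linked sheet: the sample rows, the `join_levels` helper and the `|` separator in the key are all hypothetical.

```python
def join_levels(all_items, starred_items, default="Basic"):
    """Return (id, name, level) for every item, in the order of all_items.

    all_items:     list of (id, name) for every entry, starred or not.
    starred_items: list of (id, name, level) for entries that have a star.
    """
    # Key on id + name, mirroring the concatenated key used for the lookup.
    levels = {
        f"{item_id}|{name}": level
        for item_id, name, level in starred_items
    }
    # Entries missing from the starred list fall back to the default level.
    return [
        (item_id, name, levels.get(f"{item_id}|{name}", default))
        for item_id, name in all_items
    ]


# Hypothetical sample data standing in for the two XPath results.
all_items = [("1", "Item A"), ("2", "Item B"), ("3", "Item C")]
starred_items = [("1", "Item A", "Legendary"), ("3", "Item C", "Common")]

final_table = join_levels(all_items, starred_items)
for row in final_table:
    print(row)
```

Keying on the id and the name together, rather than on the name alone, keeps the lookup unambiguous if the same name appears more than once in the table.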
