Scraping Instagram data using Google spreadsheet?

Views: 214  Published: 2018/5/14 21:23:10  Tags: google-spreadsheet instagram


Question

I need data such as the bio and the number of posts from a public Instagram account, using a Google spreadsheet. I'm already able to extract the number of followers and following. Can you help?

Solution

This formula is going to look really complicated, but all it really does is use an IMPORTXML formula to pull in the data from the "script" section of the page, which has the pieces you want. A bunch of REGEXREPLACE/REGEXEXTRACT functions then clean up the data into a readable format.

Take this public page for example:

http://www.instagram.com/salesforce/

Then in B1 or C1 enter this (the formula reads the page URL from Sheet2!A1):

=iferror(arrayformula(regexreplace({arrayformula(regexextract(transpose(split(regexreplace(regexreplace(concatenate(IMPORTXML(Sheet2!A1,"//script")),"\n",""),"(^.*""ProfilePage"": \[{""user"": {""username"": "")(.*)(nodes.*)","$2"),", """,false)),"(^.*)"": .*")),arrayformula(regexextract(transpose(split(regexreplace(regexreplace(concatenate(IMPORTXML(Sheet2!A1,"//script")),"\n",""),"(^.*""ProfilePage"": \[{""user"": {""username"": "")(.*)(nodes.*)","$2"),", """,false)),"^.*"": (.*)"))},"[""}{]","")))

I ended up using a literal array so that I could effectively split the field names from the values. You can format the output however you want.

[Image: the fields the formula pulls]

Also note that followers, followed_by, and media: count are the fields you mentioned (e.g. the number of posts is called media count), and the biography is of course self-explanatory.

Update: In answer to your comment, if you want to get the other 2 values out, you can do it either in a single REGEXEXTRACT function or with one function per value.

If you are using the raw import data, these regexes work:

Media count:
=REGEXEXTRACT(concatenate(IMPORTDATA(E1)),"""media: {""count"": (\d+)page_info: {")

Biography:
=REGEXEXTRACT(concatenate(IMPORTDATA(E1)),"biography: ""(.*)""full_name")

If you are using the importxml method, this works:
=REGEXEXTRACT(A1,"biography"": ""(.*)"", "".*""media"": {""count"": (\d+), ""page_info""")

That creates 2 capture groups, which are automatically placed into their own adjacent cells. Or you can do them individually.

Biography:
=REGEXEXTRACT(A1,"biography"": ""(.*)"", "".*""media")

Media count:
=REGEXEXTRACT(A1,"media"": {""count"": (\d+), ""page_info""")
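To see what the two-capture-group pattern does outside the spreadsheet, here is a minimal Python sketch. The sample string is a hypothetical excerpt, not real Instagram output: its field order, spacing, and values are assumptions made for illustration, as is the helper name `extract_bio_and_posts`. The sketch also deviates from the formula in one place: it uses a lazy `(.*?)` for the biography group, because a greedy `(.*)` can run past the end of the bio when later fields also end in `", "`.

```python
import re

# Hypothetical excerpt of the profile data; the layout is an assumption.
SAMPLE = (
    '{"user": {"username": "salesforce", '
    '"biography": "Example bio text", "full_name": "Salesforce", '
    '"followed_by": {"count": 1000}, "follows": {"count": 10}, '
    '"media": {"count": 1234, "page_info": {"has_next_page": true}}}}'
)

# Group 1 is the biography, group 2 is the media (post) count.
# (.*?) is lazy, unlike the (.*) in the spreadsheet formula.
COMBINED = re.compile(
    r'biography": "(.*?)", ".*"media": \{"count": (\d+), "page_info"'
)


def extract_bio_and_posts(text):
    """Return (biography, post_count), or None if the pattern is not found."""
    match = COMBINED.search(text)
    if match is None:
        return None
    bio, count = match.groups()
    return bio, int(count)


if __name__ == "__main__":
    print(extract_bio_and_posts(SAMPLE))
```

In Sheets, the two groups spill into adjacent cells; here they come back as a tuple instead. If the page layout differs from the sample, the pattern simply returns None rather than partial data.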
