使用XMLHTTP方法时,请等待页面加载 [英] Wait until page is loaded when using XMLHTTP approach
问题描述
在以下工作代码中,我尝试导航到特定的YouTube频道要将视频名称导入excel ..它正在工作,但部分是因为代码仅列出了大约30个视频
昏暗x,html作为对象,ele作为对象,sKeyWords作为字符串,i长使用CreateObject("MSXML2.ServerXMLHTTP").打开"GET","YouTube频道网址视频","False".setRequestHeader"Content-Type","application/x-www-form-urlencoded".发送如果.Status<>200然后MsgBox问题"和vbNewLine&状态和-"&.statusText:退出子设置html = CreateObject("htmlfile")html.body.innerHTML = .responseText
我如何才能使代码加载页面..的所有内容?以便获取此处列出的所有视频.
我找到了一个网站,该网站列出了一个表中的所有视频,但是对于抓取该表的部分,我无法提取视频名称甚至无法处理该表这是我的尝试
Sub Post_Method()昏暗的HTTP作为新的XMLHTTP60变暗html作为新的HTMLDocument昏暗的htmla作为对象暗淡作为对象作为对象的暗淡Dim strArg作为字符串昏暗的c只要昏暗x长strArg ="inputType = 1& stringInput = https%3A%2F%2Fwww.youtube.com%2Fchannel%2FUC43lrLHl4EhxaKQn2HrcJPQ& limit = 100& keyType = default& customKey ="用http.打开"POST","https://youtube-playlist-analyzer.appspot.com/submit",False.setRequestHeader内容类型",应用程序/x-www-form-urlencoded".send strArghtml.body.innerHTML = .responseText'WriteTxtFile html.body.innerHTML结束于昏暗的帖子作为对象,元素作为对象,r尽可能长``这部分我无法调整'------------------------设置帖子= html.getElementById("container").getElementById("tableContainer").getElementById("tableData")对于帖子中的每个元素对于elem.Cells中的每个拖曳c = c + 1:单元格(r + 1,c)= trow.innerText下一拖c = 0:r = r + 1下一个元素'----------------------------------停止结束子
您可以使用该端点,然后从包含感兴趣数据的响应中提取javascript对象,并使用jsonconverter.bas进行解析.
Json库:
我使用jsonconverter.bas.从
Py:
python更加简洁
导入请求,re,json,csv数据= {'inputType':'1','stringInput':'https://www.youtube.com/channel/UC43lrLHl4EhxaKQn2HrcJPQ','限制':'100','keyType':'默认'}r = request.post('https://youtube-playlist-analyzer.appspot.com/submit',data=data)p = re.compile(r'json_items =(.*?);',re.DOTALL)结果= json.loads(p.findall(r.text)[0])使用open("data.csv","w",encoding ="utf-8-sig",newline ='')作为csv_file:w = csv.writer(csv_file,delimiter =,",quoting = csv.QUOTE_MINIMAL)#针对语言环境更改此设置w.writerow(['Title','ViewCount'])对于结果中的项目:w.writerow([[item ['title'],item ['viewCount']])
In the following working code, I am trying to navigate to specific youtube channel To get the videos names into excel .. It is working but partially as the code just lists about 30 videos only
Dim x, html As Object, ele As Object, sKeyWords As String, i As Long
With CreateObject("MSXML2.ServerXMLHTTP")
.Open "GET", "youtube channel url videos", False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send
If .Status <> 200 Then MsgBox "Problem" & vbNewLine & .Status & " - " & .statusText: Exit Sub
Set html = CreateObject("htmlfile")
html.body.innerHTML = .responseText
How can I maje the code load all the content of the page ..? so as to get all the videos that are listed there.
I have found a site that lists all the videos in one table but as for the part of scraping the table, I failed to extract the video name or even dealing with the table Here's my try
Sub Post_Method()
Dim http As New XMLHTTP60
Dim html As New HTMLDocument
Dim htmla As Object
Dim trow As Object
Dim tcel As Object
Dim strArg As String
Dim c As Long
Dim x As Long
strArg = "inputType=1&stringInput=https%3A%2F%2Fwww.youtube.com%2Fchannel%2FUC43lrLHl4EhxaKQn2HrcJPQ&limit=100&keyType=default&customKey="
With http
.Open "POST", "https://youtube-playlist-analyzer.appspot.com/submit", False
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
.send strArg
html.body.innerHTML = .responseText
' WriteTxtFile html.body.innerHTML
End With
Dim posts As Object, elem As Object, r As Long
'This part I can't adjust
'------------------------
Set posts = html.getElementById("container").getElementById("tableContainer").getElementById("tableData")
For Each elem In posts.Children
For Each trow In elem.Cells
c = c + 1: Cells(r + 1, c) = trow.innerText
Next trow
c = 0: r = r + 1
Next elem
'----------------------------------
Stop
End Sub
You can use that endpoint then extract the javascript object from the response which contains data of interest and parse with jsonconverter.bas.
Json library:
I use jsonconverter.bas. Download raw code from here and add to standard module called JsonConverter . You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.
VBA:
Option Explicit
Public Sub GetYouTubeViews()
Dim s As String, ws As Worksheet, body As String
body = "inputType=1&stringInput=https://www.youtube.com/channel/UC43lrLHl4EhxaKQn2HrcJPQ&limit=100&keyType=default"
Set ws = ThisWorkbook.Worksheets("Sheet1")
With CreateObject("MSXML2.XMLHTTP")
.Open "POST", "https://youtube-playlist-analyzer.appspot.com/submit", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send body
s = .responseText
End With
Dim results(), r As Long, jsonSource As String
Dim json As Object, item As Object, headers()
jsonSource = GetString(s, "json_items = ", ";")
If jsonSource = "No match" Then Exit Sub
Set json = JsonConverter.ParseJson(jsonSource)
headers = Array("Title", "ViewCount")
ReDim results(1 To json.Count, 1 To UBound(headers) + 1)
For Each item In json
r = r + 1
results(r, 1) = item("title")
results(r, 2) = item("viewCount")
Next
With ws
.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Public Function GetString(ByVal inputString As String, ByVal startPhrase As String, ByVal endPhrase As String) As String
Dim s As Long, e As Long
s = InStr(inputString, startPhrase)
If Not s > 0 Then
GetString = "No match"
Exit Function
End If
e = InStr(s + Len(startPhrase) - 1, inputString, endPhrase)
If Not e > 0 Then
GetString = "No match"
Exit Function
End If
GetString = Mid$(inputString, s + Len(startPhrase), e - (s + Len(startPhrase)))
End Function
Sample results:
Py:
A lot more concise with python
import requests, re, json ,csv
data = {
'inputType': '1',
'stringInput': 'https://www.youtube.com/channel/UC43lrLHl4EhxaKQn2HrcJPQ',
'limit': '100',
'keyType': 'default'
}
r = requests.post('https://youtube-playlist-analyzer.appspot.com/submit', data=data)
p = re.compile(r'json_items = (.*?);', re.DOTALL)
results = json.loads(p.findall(r.text)[0])
with open("data.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
w = csv.writer(csv_file, delimiter = ",", quoting=csv.QUOTE_MINIMAL) #change this for locale
w.writerow(['Title','ViewCount'])
for item in results:
w.writerow([item['title'], item['viewCount']])
这篇关于使用XMLHTTP方法时,请等待页面加载的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!