Excel VBA script to find 404 errors in a list of URLs?


Problem Description

So, I have this spreadsheet with a list of about 5000 URLs. (All pages on our corporate intranet.)

We know some of the links are broken, but don't know of a good way to determine which without clicking all 5000 links.

Normally this would be a simple matter: Create a web page with links to the 5000 pages, and then check the links with a tool like Xenu Link Sleuth.

But that won't work in this case because many of the links are being redirected, and the redirect code spoofs an HTTP 200 response, which tricks Xenu into treating the URL as valid.

However, there is some good news: The redirect script does not run from within Excel. If you click a bad link inside Excel, the redirect script does not execute and the HTTP response is reported back to Excel. I believe Excel should be able to identify the correct HTTP response code (404) - or at least whether the link was valid or not.

Which brings me to my question:

Is there a way using VBA to write a script that would click through every link and capture the result? The result captured could be in the form of the HTTP response code or anything else you think would be useful in finding the bad links in this list of 5000 pages. Ideally the result would be written to a cell in the spreadsheet adjacent to the link.

If anyone is familiar enough with VBA to suggest a solution to this problem, I would be eternally grateful!

Solution

Here is an example that checks the status line for a list of URLs from Excel:

Sub TestLinks()
  Dim source As Range, req As Object, url$, i As Long
  Set req = CreateObject("Msxml2.ServerXMLHTTP.6.0")

  ' define where the links and results are (adjust to cover all URL rows)
  Set source = Range("A1:B2")

  ' clear the results
  source.Columns(2).Clear

  ' iterate each row
  For i = 1 To source.Rows.count
    ' get the link from the first column
    url = source.Cells(i, 1)

    ' send the request using a HEAD to check the status line
    req.Open "HEAD", url, False
    req.setRequestHeader "Accept", "image/webp,image/*,*/*;q=0.8"
    req.setRequestHeader "Accept-Language", "en-GB,en-US;q=0.8,en;q=0.6"
    req.setRequestHeader "Accept-Encoding", "gzip, deflate"
    req.setRequestHeader "Cache-Control", "no-cache"
    req.setRequestHeader "Content-Type", "text/xml; charset=utf-8"
    req.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
    req.Send

    ' write the result in the second column
    source.Cells(i, 2) = req.Status
  Next

  MsgBox "Finished!"
End Sub
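
For a real run over roughly 5000 intranet URLs, one practical refinement (not part of the original answer, so treat it as a sketch) is to trap network errors so that a single unreachable host or timeout does not abort the whole loop, and to size the range automatically from the used rows in column A. The procedure name TestLinksRobust and the timeout values are illustrative assumptions:

Sub TestLinksRobust()
  Dim source As Range, req As Object, url$, i As Long
  Set req = CreateObject("Msxml2.ServerXMLHTTP.6.0")

  ' resolve / connect / send / receive timeouts in milliseconds (assumed values)
  req.setTimeouts 5000, 5000, 10000, 10000

  ' size the range to every used row in column A, plus the result column
  Set source = Range("A1", Range("A" & Rows.count).End(xlUp)).Resize(, 2)
  source.Columns(2).Clear

  For i = 1 To source.Rows.count
    url = source.Cells(i, 1)

    On Error Resume Next
    req.Open "HEAD", url, False
    req.Send
    If Err.Number = 0 Then
      ' the request completed, record the status code (200, 404, ...)
      source.Cells(i, 2) = req.Status
    Else
      ' DNS failure, timeout, etc. - record the error text instead of stopping
      source.Cells(i, 2) = "Error: " & Err.Description
    End If
    On Error GoTo 0
  Next

  MsgBox "Finished!"
End Sub

Using HEAD keeps each transfer small, and setTimeouts (a standard method on Msxml2.ServerXMLHTTP) caps how long a dead server can stall the scan.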
