Can't fetch a number populated dynamically from a webpage after following some steps


Problem description

I've written a script using the requests module and the BeautifulSoup library to fetch some tabular content from a webpage. To generate the table, you have to manually follow the steps shown in the attached image. The code pasted below works, but the main problem I'm trying to solve is fetching the title number programmatically. In this case it is 628086906, which is appended to the table_link that I've hardcoded here.

After clicking the tool button in step 6, when you hover your cursor over the map you can see the option Multiple; clicking it leads you to the URL containing the title number.

start page

These are exactly the steps the script follows.

This is the LINC number, 0030278592, which has to be entered in the input box in step 6.

I've tried the following (it works only because I've hardcoded the title number within table_link):

import requests
from bs4 import BeautifulSoup

link = 'https://alta.registries.gov.ab.ca/spinii/logon.aspx'
lnotice = 'https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx'
search_page = 'https://alta.registries.gov.ab.ca/SpinII/SearchSelectType.aspx'
map_page = 'http://alta.registries.gov.ab.ca/SpinII/mapindex.aspx'
map_find = 'http://alta.registries.gov.ab.ca/SpinII/mapfinds.aspx'
table_link = 'https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title=628086906'

def get_content(s,link):   
    r = s.get(link)
    soup = BeautifulSoup(r.text,"lxml")
    payload = {i['name']:i.get('value','') for i in soup.select('input[name]')}
    payload['uctrlLogon:cmdLogonGuest.x'] = '80'
    payload['uctrlLogon:cmdLogonGuest.y'] = '20'

    r = s.post(link,data=payload)
    soup = BeautifulSoup(r.text,"lxml")
    payload = {i['name']:i.get('value','') for i in soup.select('input[name]')}
    payload['cmdYES.x'] = '52'
    payload['cmdYES.y'] = '8'

    s.post(lnotice,data=payload)
    s.headers['Referer'] = 'https://alta.registries.gov.ab.ca/spinii/welcomeguest.aspx'
    
    s.get(search_page)
    s.headers['Referer'] = 'https://alta.registries.gov.ab.ca/SpinII/SearchSelectType.aspx'
    
    s.get(map_page)
    
    r = s.get(map_find)
    s.headers['Referer'] = 'http://alta.registries.gov.ab.ca/SpinII/mapfinds.aspx'
    soup = BeautifulSoup(r.text,"lxml")
    payload = {i['name']:i.get('value','') for i in soup.select('input[name]')}
    payload['__EVENTTARGET'] = 'Finds$lstFindTypes'
    payload['Finds:lstFindTypes'] = 'Linc'
    payload['Finds:ctlLincNumber:txtLincNumber'] = '0030278592'
    
    r = s.post(map_find,data=payload)
    
    r = s.get(table_link)
    print(r.text)


if __name__ == "__main__":
    with requests.Session() as s:
        s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'
        get_content(s,link)

How can I grab the title number from the url?

or

How can I fetch all the linc numbers from that site so that I don't need to use map at all?

The only problem with this site is that it is unavailable during the daytime for maintenance.

Solution

The data is fetched from:

POST http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx

The content is encoded in a custom format before being consumed by the OpenLayers library. All the decoding is located in this JS file. If you beautify it, you can find the decoder in the OpenLayers.Class named WayTo.Wtb.Format.WTB. The binary is decoded byte by byte, like the following in JS:

switch(elementType){
    case 1:
        var lineColor = new WayTo.Wtb.Element.LineColor();
        byteOffset = lineColor.parse(dataReader, byteOffset);
        outputElement = lineColor;
        break;
    case 2:
        var lineStyle = new WayTo.Wtb.Element.LineStyle();
        byteOffset = lineStyle.parse(dataReader, byteOffset);
        outputElement = lineStyle;
        break;
    case 3:
        var ellipse = new WayTo.Wtb.Element.Ellipse();
        byteOffset = ellipse.parse(dataReader, byteOffset);
        outputElement = ellipse;
        break;
    ........
}

We have to reproduce this decoding algorithm in order to get the raw data. We don't need to decode all the objects; we only need to get the offsets right and extract the strings correctly. Here is a script for the decoding part, which decodes the data from a file (wtb.bin, holding the raw response body):

with open("wtb.bin", mode='rb') as file:
    encodedData = file.read()
    offset = 0
    objects = []

    while offset < len(encodedData):

        elementSize = encodedData[offset]
        offset+=1
        elementType = encodedData[offset]
        offset+=1

        if elementType == 0:
            break

        curElemSize = elementSize
        curElemType = elementType

        if elementType== 114:
            largeElementSize = int.from_bytes(encodedData[offset:offset + 4], "big")
            offset+=4
            largeElementType = int.from_bytes(encodedData[offset:offset+2], "little")
            offset+=2
            curElemSize = largeElementSize
            curElemType = largeElementType

        print(f"type {curElemType} | size {curElemSize}")
        offsetInit = offset

        if curElemType == 1:
            offset+=4
        elif curElemType == 2:
            offset+=2
        elif curElemType == 3:
            offset+=20
        elif curElemType == 4:
            offset+=28
        elif curElemType == 5:
            offset+=12
        elif curElemType == 6:
            textLength = curElemSize - 3
            objects.append({
                "type": "Text",
                "x_position": int.from_bytes(encodedData[offset:offset+2], "little"),
                "y_position": int.from_bytes(encodedData[offset+2:offset+4], "little"),
                "rotation": int.from_bytes(encodedData[offset+4:offset+6], "little"),
                "text": encodedData[offset+6:offset+6+(textLength*2)].decode("utf-8").replace('\x00','')
            })
            offset+=6+(textLength*2)
        elif curElemType == 7:
            numPoint = int(curElemSize / 2)
            offset+=4*numPoint
        elif curElemType == 27:
            numPoint = int(curElemSize / 4)
            offset+=8*numPoint
        elif curElemType == 8:
            numPoint = int(curElemSize / 2)
            offset+=4*numPoint
        elif curElemType == 28:
            numPoint = int(curElemSize / 4)
            offset+=8*numPoint
        elif curElemType == 13:
            offset+=4
        elif curElemType == 14:
            offset+=2
        elif curElemType == 15:
            offset+=2
        elif curElemType == 100:
            pass
        elif curElemType == 101:
            offset+=20
        elif curElemType == 102:
            offset+=2
        elif curElemType == 103:
            pass
        elif curElemType == 104:
            highShort = int.from_bytes(encodedData[offset+2:offset+4], "little")
            lowShort = int.from_bytes(encodedData[offset+4:offset+6], "little")
            objects.append({
                "type": "StartNumericCell",
                "entity": int.from_bytes(encodedData[offset:offset+2], "little"),
                "occurrence": (highShort << 16) + lowShort
            })
            offset+=6
        elif curElemType == 105:
            #end cell
            pass
        elif curElemType == 109:
            textLength = curElemSize - 1
            objects.append({
                "type": "StartAlphanumericCell",
                "entity": int.from_bytes(encodedData[offset:offset+2], "little"),
                "occurrence":encodedData[offset+2:offset+2+(textLength*2)].decode("utf-8").replace('\x00','')
            })
            offset+=2+(textLength*2)
        elif curElemType == 111:
            offset+=40
        elif curElemType == 112:
            objects.append({
                "type": "CoordinatePlane",
                "projection_code": encodedData[offset+48:offset+52].decode("utf-8").replace('\x00','')
            })
            offset+=52
        elif curElemType == 113:
            offset+=24
        elif curElemType == 256:
            nameLength = int.from_bytes(encodedData[offset+14:offset+16], "little")
            objects.append({
                "type": "LargePolygon",
                "name": encodedData[offset+16:offset+16+nameLength].decode("utf-8").replace('\x00',''),
                "occurence": int.from_bytes(encodedData[offset+2:offset+6], "little")
            })
            if nameLength > 0:
                offset+= 16 + nameLength
                if encodedData[offset] == 0:
                    offset+=1
            else:
                offset+= 16
            numberOfPoints = int.from_bytes(encodedData[offset:offset+2], "little")
            offset+=2
            offset+=numberOfPoints*8
        elif curElemType == 257:
            pass
        else:
            offset+= curElemSize*2
        print(f"offset diff {offset-offsetInit}")
        print("--------------------------------")

    print(objects)
    print(len(encodedData))
    print(offset)

(Side note: the element size is big-endian; all other values are little-endian.)
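To make the header handling and the endianness note concrete, here is a minimal sketch of just the header-reading step, lifted from the loop above. The function name read_element_header and the sample bytes are illustrative assumptions; they are not taken from the site's JS or from a real response.

```python
def read_element_header(data: bytes, offset: int):
    """Return (element_type, element_size, next_offset) for the element at offset."""
    size = data[offset]            # 1-byte size
    elem_type = data[offset + 1]   # 1-byte type
    offset += 2
    if elem_type == 114:
        # Large element: 4-byte big-endian size, then 2-byte little-endian type.
        size = int.from_bytes(data[offset:offset + 4], "big")
        elem_type = int.from_bytes(data[offset + 4:offset + 6], "little")
        offset += 6
    return elem_type, size, offset


# Made-up sample bytes (not taken from a real response).
small = bytes([2, 1])  # size 2, type 1
large = bytes([0, 114]) + (300).to_bytes(4, "big") + (256).to_bytes(2, "little")

print(read_element_header(small, 0))  # (1, 2, 2)
print(read_element_header(large, 0))  # (256, 300, 8)
```

Keeping the header logic in one small function makes it easier to check offsets against the JS decoder before worrying about the per-type payloads.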

Run this repl.it to see how it decodes the file

From there we can build the steps to scrape the data. For the sake of clarity, I will describe all the steps (even those you've already figured out):

Login

Log in to the website using:

GET https://alta.registries.gov.ab.ca/spinii/logon.aspx

Scrape the input names/values, add uctrlLogon:cmdLogonGuest.x and uctrlLogon:cmdLogonGuest.y, then call

POST https://alta.registries.gov.ab.ca/spinii/logon.aspx
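The "scrape the inputs, then add the click coordinates" step can be sketched on its own. The version below uses only the standard library's html.parser instead of BeautifulSoup, so it runs without extra dependencies. The names InputCollector and build_payload and the sample HTML (including the placeholder __VIEWSTATE value) are assumptions for illustration; the real page's fields will differ.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> tag that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are sent as empty strings.
            self.fields[name] = attributes.get("value") or ""


def build_payload(html: str, extra: dict) -> dict:
    """Form fields found in html, overlaid with the extra click fields."""
    collector = InputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload.update(extra)
    return payload


# Made-up form; the real logon page has different hidden values.
sample_html = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="hidden" name="__EVENTTARGET">
</form>
"""
payload = build_payload(sample_html, {
    "uctrlLogon:cmdLogonGuest.x": "76",
    "uctrlLogon:cmdLogonGuest.y": "25",
})
print(payload)
```

The same helper would apply to the legal notice form, with cmdYES.x and cmdYES.y passed as the extra fields.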

Legal Notice

The legal notice call is not necessary to get the map values, but it is necessary to get the item info (the last step in your post).

GET https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx

Scrape the input tag names/values, set cmdYES.x and cmdYES.y, and then call

POST https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx

Map data

Call the server map API:

POST http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx

with the following data:

{
    "mt":"titleresults",
    "qt":"lincNo",
    "LINCNumber": lincNumber,
    "rights": "B", #not required
    "cx": 1920, #screen definition
    "cy": 1080,
}

cx/cy are the canvas size.

Decode the encoded data using the method above. You will get:

[{'type': 'LargePolygon', 'name': '0010495134 8722524;1;162', 'entity': 23, 'occurence': 628079167, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0012170859 8022146;8;99', 'entity': 23, 'occurence': 628048595, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010691822 8722524;1;163', 'entity': 23, 'occurence': 628222354, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0012169736 8022146;8;89', 'entity': 23, 'occurence': 628021327, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010694454 8722524;1;179', 'entity': 23, 'occurence': 628191678, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010694362 8722524;1;178', 'entity': 23, 'occurence': 628307403, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010433381 8722524;1;177', 'entity': 23, 'occurence': 628209696, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0012169710 8022146;8;88A', 'entity': 23, 'occurence': 628021328, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010694355 8722524;1;176', 
'entity': 23, 'occurence': 628315826, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0012170866 8022146;8;100', 'entity': 23, 'occurence': 628163431, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180}, {'type': 'LargePolygon', 'name': '0010694347 8722524;1;175', 'entity': 23, 'occurence': 628132810, 'line_color_green': 0, 'line_color_red': 129, 

Extract the info

If you want to target a specific lincNumber, you will need to look at the style of the polygon, since for "multiple" values (i.e. values with multiple items) there is no mention of the lincNumber in the response, just a link reference. The following will get the selected item:

selectedZone = [
    t 
    for t in objects 
    if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
][0]
print(selectedZone)
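Indexing the list with [0] raises an IndexError when no polygon has the "selected" style (for example if the LINC number returns nothing). A slightly more defensive variant, sketched here with a hypothetical helper name find_selected_zone and made-up sample polygons, returns None instead:

```python
def find_selected_zone(objects):
    """Return the first polygon styled as selected, or None if there is none."""
    return next(
        (
            t for t in objects
            if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
        ),
        None,
    )


# Made-up polygons: only the second one matches the "selected" style.
sample_objects = [
    {"type": "LargePolygon", "occurence": 1, "fill_color_green": 255, "line_color_red": 129},
    {"type": "LargePolygon", "occurence": 2, "fill_color_green": 128, "line_color_red": 255},
]
print(find_selected_zone(sample_objects))
print(find_selected_zone([]))  # None
```

The filter condition is the same as in the list comprehension above; only the handling of the empty case changes.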

Call the URL you mention in your post to get the data and extract the table:

GET https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title={selectedZone["occurence"]}
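As a small illustration, the URL can be assembled from the selected polygon with urllib.parse.urlencode rather than string formatting. The helper name title_search_url is an assumption; the "occurence" key (spelled as in the decoder) and the title number 628086906 come from the post.

```python
from urllib.parse import urlencode

TITLE_SEARCH_URL = "https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx"


def title_search_url(zone: dict) -> str:
    """Build the title search URL for a decoded polygon dict."""
    return f"{TITLE_SEARCH_URL}?{urlencode({'title': zone['occurence']})}"


print(title_search_url({"occurence": 628086906}))
# https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title=628086906
```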

The full code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

lincNumber = "0030278592"
#lincNumber = "0010661156"

s = requests.Session()

# 1) login
r = s.get("https://alta.registries.gov.ab.ca/spinii/logon.aspx")
soup = BeautifulSoup(r.text, "html.parser")

payload = dict([
    (t["name"], t.get("value", ""))
    for t in soup.findAll("input")
])
payload["uctrlLogon:cmdLogonGuest.x"] = 76
payload["uctrlLogon:cmdLogonGuest.y"] = 25
s.post("https://alta.registries.gov.ab.ca/spinii/logon.aspx",data=payload)

# 2) legal notice
r = s.get("https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx")
soup = BeautifulSoup(r.text, "html.parser")
payload = dict([
    (t["name"], t.get("value", ""))
    for t in soup.findAll("input")
])
payload["cmdYES.x"] = 82
payload["cmdYES.y"] = 3
s.post("https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx", data = payload)

# 3) map data
r = s.post("http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx",
    data= {
        "mt":"titleresults",
        "qt":"lincNo",
        "LINCNumber": lincNumber,
        "rights": "B", #not required
        "cx": 1920, #screen definition
        "cy": 1080,
    })

def decodeWtb(encodedData):
    offset = 0

    objects = []
    iteration = 0

    while offset < len(encodedData):

        elementSize = encodedData[offset]
        offset+=1
        elementType = encodedData[offset]
        offset+=1

        if elementType == 0:
            break

        curElemSize = elementSize
        curElemType = elementType

        if elementType== 114:
            largeElementSize = int.from_bytes(encodedData[offset:offset + 4], "big")
            offset+=4
            largeElementType = int.from_bytes(encodedData[offset:offset+2], "little")
            offset+=2
            curElemSize = largeElementSize
            curElemType = largeElementType

        offsetInit = offset

        if curElemType == 1:
            offset+=4
        elif curElemType == 2:
            offset+=2
        elif curElemType == 3:
            offset+=20
        elif curElemType == 4:
            offset+=28
        elif curElemType == 5:
            offset+=12
        elif curElemType == 6:
            textLength = curElemSize - 3
            offset+=6+(textLength*2)
        elif curElemType == 7:
            numPoint = int(curElemSize / 2)
            offset+=4*numPoint
        elif curElemType == 27:
            numPoint = int(curElemSize / 4)
            offset+=8*numPoint
        elif curElemType == 8:
            numPoint = int(curElemSize / 2)
            offset+=4*numPoint
        elif curElemType == 28:
            numPoint = int(curElemSize / 4)
            offset+=8*numPoint
        elif curElemType == 13:
            offset+=4
        elif curElemType == 14:
            offset+=2
        elif curElemType == 15:
            offset+=2
        elif curElemType == 100:
            pass
        elif curElemType == 101:
            offset+=20
        elif curElemType == 102:
            offset+=2
        elif curElemType == 103:
            pass
        elif curElemType == 104:
            offset+=6
        elif curElemType == 105:
            pass
        elif curElemType == 109:
            textLength = curElemSize - 1
            offset+=2+(textLength*2)
        elif curElemType == 111:
            offset+=40
        elif curElemType == 112:
            offset+=52
        elif curElemType == 113:
            offset+=24
        elif curElemType == 256:
            nameLength = int.from_bytes(encodedData[offset+14:offset+16], "little")
            objects.append({
                "type": "LargePolygon",
                "name": encodedData[offset+16:offset+16+nameLength].decode("utf-8").replace('\x00',''),
                "entity": int.from_bytes(encodedData[offset:offset+2], "little"),
                "occurence": int.from_bytes(encodedData[offset+2:offset+6], "little"),
                "line_color_green": encodedData[offset + 8],
                "line_color_red": encodedData[offset + 7],
                "line_color_blue": encodedData[offset + 9],
                "fill_color_green": encodedData[offset + 10],
                "fill_color_red": encodedData[offset + 11],
                "fill_color_blue": encodedData[offset + 13]
            })
            if nameLength > 0:
                offset+= 16 + nameLength
                if encodedData[offset] == 0:
                    offset+=1
            else:
                offset+= 16
            numberOfPoints = int.from_bytes(encodedData[offset:offset+2], "little")
            offset+=2
            offset+=numberOfPoints*8
        elif curElemType == 257:
            pass
        else:
            offset+= curElemSize*2

    return objects

# 4) decode custom format
objects = decodeWtb(r.content)

# 5) get the selected area
selectedZone = [
    t 
    for t in objects 
    if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
][0]
print(selectedZone)

# 6) get the info about item
r = s.get(f'https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title={selectedZone["occurence"]}')
df = pd.read_html(r.content, attrs = {'class': 'bodyText'}, header =0)[0]
del df['Add to Cart']
del df['View']
print(df[:-1])

Run this on repl.it

Output

  Title Number           Type LINC Number Short Legal   Rights Registration Date Change/Cancel Date
0    052400228  Current Title  0030278592  0420091;16  Surface        19/09/2005         13/11/2019
1    072294084  Current Title  0030278551  0420091;12  Surface        22/05/2007         21/08/2007
2    072400529  Current Title  0030278469   0420091;3  Surface        05/07/2007         28/08/2007
3    072498228  Current Title  0030278501   0420091;7  Surface        18/08/2007         08/02/2008
4    072508699  Current Title  0030278535  0420091;10  Surface        23/08/2007         13/12/2007
5    072559500  Current Title  0030278477   0420091;4  Surface        17/09/2007         19/11/2007
6    072559508  Current Title  0030278576  0420091;14  Surface        17/09/2007         09/01/2009
7    072559521  Current Title  0030278519   0420091;8  Surface        17/09/2007         07/11/2007
8    072559530  Current Title  0030278493   0420091;6  Surface        17/09/2007         25/08/2008
9    072559605  Current Title  0030278485   0420091;5  Surface        17/09/2007         23/12/2008

You can look at the objects list if you want to get more entries, and you can improve the decoder if you want more info about an item, such as its coordinates.

It's also possible to match the other lincNumbers located around your target by looking at the name field, which contains the lincNumber unless the name is a "multiple" entry.

Fun fact: no HTTP headers need to be set in this flow.
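A rough sketch of that matching, assuming the name field has the form "<lincNumber> <short legal>" as in the sample output above, and assuming "multiple" entries can be recognised by the word appearing in the name (neither is confirmed beyond the samples shown). The helper name split_polygon_name is hypothetical.

```python
def split_polygon_name(name: str):
    """Split a polygon name into (linc_number, rest); linc_number is None for "multiple" entries."""
    if "multiple" in name.lower():
        return None, name
    linc, _, rest = name.partition(" ")
    return linc, rest


print(split_polygon_name("0010495134 8722524;1;162"))  # ('0010495134', '8722524;1;162')
print(split_polygon_name("Multiple"))                  # (None, 'Multiple')
```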
