Can't fetch a number populated dynamically from a webpage after following some steps
I've created a script using the requests module and the BeautifulSoup library to fetch some tabular content from a webpage. To generate the table, it is necessary to manually follow the steps shown in the attached image. The code pasted below works, but the main problem I'm trying to solve is fetching the title number programmatically. In this case it is 628086906, which is appended to the table_link that I've hardcoded here.
After clicking on the tool button (step 6), when you hover your cursor over the map you can see the option Multiple; clicking it leads to the URL containing the title number.
These are exactly the steps the script follows.
This is the LINC number, 0030278592, which must be entered in the input box in step 6.
I've tried the following (it works only because I've hardcoded the title number within table_link):
import requests
from bs4 import BeautifulSoup

link = 'https://alta.registries.gov.ab.ca/spinii/logon.aspx'
lnotice = 'https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx'
search_page = 'https://alta.registries.gov.ab.ca/SpinII/SearchSelectType.aspx'
map_page = 'http://alta.registries.gov.ab.ca/SpinII/mapindex.aspx'
map_find = 'http://alta.registries.gov.ab.ca/SpinII/mapfinds.aspx'
table_link = 'https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title=628086906'

def get_content(s,link):
    r = s.get(link)
    soup = BeautifulSoup(r.text,"lxml")
    payload = {i['name']:i.get('value','') for i in soup.select('input[name]')}
    payload['uctrlLogon:cmdLogonGuest.x'] = '80'
    payload['uctrlLogon:cmdLogonGuest.y'] = '20'

    r = s.post(link,data=payload)
    soup = BeautifulSoup(r.text,"lxml")
    payload = {i['name']:i.get('value','') for i in soup.select('input[name]')}
    payload['cmdYES.x'] = '52'
    payload['cmdYES.y'] = '8'
    s.post(lnotice,data=payload)

    s.headers['Referer'] = 'https://alta.registries.gov.ab.ca/spinii/welcomeguest.aspx'
    s.get(search_page)
    s.headers['Referer'] = 'https://alta.registries.gov.ab.ca/SpinII/SearchSelectType.aspx'
    s.get(map_page)
    r = s.get(map_find)
    s.headers['Referer'] = 'http://alta.registries.gov.ab.ca/SpinII/mapfinds.aspx'
    soup = BeautifulSoup(r.text,"lxml")
    payload = {i['name']:i.get('value','') for i in soup.select('input[name]')}
    payload['__EVENTTARGET'] = 'Finds$lstFindTypes'
    payload['Finds:lstFindTypes'] = 'Linc'
    payload['Finds:ctlLincNumber:txtLincNumber'] = '0030278592'
    r = s.post(map_find,data=payload)
    r = s.get(table_link)
    print(r.text)

if __name__ == "__main__":
    with requests.Session() as s:
        s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'
        get_content(s,link)
How can I grab the title number from the URL?
or
How can I fetch all the LINC numbers from that site so that I don't need to use the map at all?
The only problem with this site is that it is unavailable during the daytime for maintenance.
The data is requested from:
POST http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx
The content is encoded in a custom format before being consumed by the OpenLayers library. All the decoding is located in this JS file. If you beautify it, you can find the decoding logic in the OpenLayers.Class named WayTo.Wtb.Format.WTB. The binary is decoded byte by byte, as in the following JS:
switch(elementType){
    case 1:
        var lineColor = new WayTo.Wtb.Element.LineColor();
        byteOffset = lineColor.parse(dataReader, byteOffset);
        outputElement = lineColor;
        break;
    case 2:
        var lineStyle = new WayTo.Wtb.Element.LineStyle();
        byteOffset = lineStyle.parse(dataReader, byteOffset);
        outputElement = lineStyle;
        break;
    case 3:
        var ellipse = new WayTo.Wtb.Element.Ellipse();
        byteOffset = ellipse.parse(dataReader, byteOffset);
        outputElement = ellipse;
        break;
    ........
}
We have to reproduce this decoding algorithm in order to get the raw data. We don't need to decode every object; we only need to get the offsets right and extract the strings correctly. Here is a Python script for the decoding part, which decodes the data from a file (the output of curl):
with open("wtb.bin", mode='rb') as file:
    encodedData = file.read()

offset = 0
objects = []
while offset < len(encodedData):
    elementSize = encodedData[offset]
    offset+=1
    elementType = encodedData[offset]
    offset+=1
    if elementType == 0:
        break
    curElemSize = elementSize
    curElemType = elementType
    if elementType== 114:
        largeElementSize = int.from_bytes(encodedData[offset:offset + 4], "big")
        offset+=4
        largeElementType = int.from_bytes(encodedData[offset:offset+2], "little")
        offset+=2
        curElemSize = largeElementSize
        curElemType = largeElementType
    print(f"type {curElemType} | size {curElemSize}")
    offsetInit = offset
    if curElemType == 1:
        offset+=4
    elif curElemType == 2:
        offset+=2
    elif curElemType == 3:
        offset+=20
    elif curElemType == 4:
        offset+=28
    elif curElemType == 5:
        offset+=12
    elif curElemType == 6:
        textLength = curElemSize - 3
        objects.append({
            "type": "Text",
            "x_position": int.from_bytes(encodedData[offset:offset+2], "little"),
            "y_position": int.from_bytes(encodedData[offset+2:offset+4], "little"),
            "rotation": int.from_bytes(encodedData[offset+4:offset+6], "little"),
            "text": encodedData[offset+6:offset+6+(textLength*2)].decode("utf-8").replace('\x00','')
        })
        offset+=6+(textLength*2)
    elif curElemType == 7:
        numPoint = int(curElemSize / 2)
        offset+=4*numPoint
    elif curElemType == 27:
        numPoint = int(curElemSize / 4)
        offset+=8*numPoint
    elif curElemType == 8:
        numPoint = int(curElemSize / 2)
        offset+=4*numPoint
    elif curElemType == 28:
        numPoint = int(curElemSize / 4)
        offset+=8*numPoint
    elif curElemType == 13:
        offset+=4
    elif curElemType == 14:
        offset+=2
    elif curElemType == 15:
        offset+=2
    elif curElemType == 100:
        pass
    elif curElemType == 101:
        offset+=20
    elif curElemType == 102:
        offset+=2
    elif curElemType == 103:
        pass
    elif curElemType == 104:
        highShort = int.from_bytes(encodedData[offset+2:offset+4], "little")
        lowShort = int.from_bytes(encodedData[offset+4:offset+6], "little")
        objects.append({
            "type": "StartNumericCell",
            "entity": int.from_bytes(encodedData[offset:offset+2], "little"),
            "occurrence": (highShort << 16) + lowShort
        })
        offset+=6
    elif curElemType == 105:
        #end cell
        pass
    elif curElemType == 109:
        textLength = curElemSize - 1
        objects.append({
            "type": "StartAlphanumericCell",
            "entity": int.from_bytes(encodedData[offset:offset+2], "little"),
            "occurrence":encodedData[offset+2:offset+2+(textLength*2)].decode("utf-8").replace('\x00','')
        })
        offset+=2+(textLength*2)
    elif curElemType == 111:
        offset+=40
    elif curElemType == 112:
        objects.append({
            "type": "CoordinatePlane",
            "projection_code": encodedData[offset+48:offset+52].decode("utf-8").replace('\x00','')
        })
        offset+=52
    elif curElemType == 113:
        offset+=24
    elif curElemType == 256:
        nameLength = int.from_bytes(encodedData[offset+14:offset+16], "little")
        objects.append({
            "type": "LargePolygon",
            "name": encodedData[offset+16:offset+16+nameLength].decode("utf-8").replace('\x00',''),
            "occurence": int.from_bytes(encodedData[offset+2:offset+6], "little")
        })
        if nameLength > 0:
            offset+= 16 + nameLength
            if encodedData[offset] == 0:
                offset+=1
        else:
            offset+= 16
        numberOfPoints = int.from_bytes(encodedData[offset:offset+2], "little")
        offset+=2
        offset+=numberOfPoints*8
    elif curElemType == 257:
        pass
    else:
        offset+= curElemSize*2
    print(f"offset diff {offset-offsetInit}")
    print("--------------------------------")

print(objects)
print(len(encodedData))
print(offset)
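Most branches of the decoder above only advance the offset by a fixed number of bytes, so they could be collapsed into a lookup table. The following is a minimal sketch of that idea, assuming the byte counts are exactly those used in the script above; FIXED_PAYLOAD_SIZES and skip_fixed_payload are names made up for this illustration and do not exist in the site's JS.

```python
# Payload sizes in bytes for element types whose length never varies,
# copied from the if/elif chain of the decoder script (an assumption:
# the table is only as correct as that script).
FIXED_PAYLOAD_SIZES = {
    1: 4, 2: 2, 3: 20, 4: 28, 5: 12,
    13: 4, 14: 2, 15: 2,
    100: 0, 101: 20, 102: 2, 103: 0, 104: 6, 105: 0,
    111: 40, 112: 52, 113: 24,
    257: 0,
}


def skip_fixed_payload(elem_type, offset):
    """Return the offset just past a fixed-size element, or None when the
    element type needs dedicated parsing (text, point lists, polygons)."""
    size = FIXED_PAYLOAD_SIZES.get(elem_type)
    if size is None:
        return None
    return offset + size


print(skip_fixed_payload(3, 10))    # type 3 carries a 20-byte payload
print(skip_fixed_payload(256, 10))  # LargePolygon is variable-length
```

Variable-length types (6, 7, 8, 27, 28, 109, 256) would still need their own branches, since their size depends on the element header or on a length field inside the payload.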
(Side note: the element size is read as big-endian, while all other values are little-endian.)
Run this repl.it to see how it decodes the file.
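The header layout and the mixed byte order noted above can be illustrated in isolation. This is a small sketch, assuming the layout is exactly what the decoder script reads: one size byte and one type byte, with type 114 announcing an extended header. read_element_header is a hypothetical helper name and the byte strings are hand-made, not real server output.

```python
def read_element_header(data, offset):
    """Return (element_type, element_size, next_offset) for the element
    starting at `offset`, following the layout used by the decoder script."""
    elem_size = data[offset]
    elem_type = data[offset + 1]
    offset += 2
    if elem_type == 114:
        # Extended header: 4-byte big-endian size, then 2-byte little-endian type.
        elem_size = int.from_bytes(data[offset:offset + 4], "big")
        elem_type = int.from_bytes(data[offset + 4:offset + 6], "little")
        offset += 6
    return elem_type, elem_size, offset


# Hand-made sample bytes (assumption: they only mimic the format).
short_header = bytes([2, 1])  # size 2, type 1
large_header = bytes([0, 114]) + (300).to_bytes(4, "big") + (256).to_bytes(2, "little")

print(read_element_header(short_header, 0))
print(read_element_header(large_header, 0))
```

Reading the size with the wrong byte order would send the offset far past the end of the buffer, which is why the two int.from_bytes calls use different arguments.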
From there we can build the steps to scrape the data. I will describe all the steps (even those you've already figured out) for the sake of clarity:
Login
Log in to the website using:
GET https://alta.registries.gov.ab.ca/spinii/logon.aspx
Scrape the input name/value pairs, add uctrlLogon:cmdLogonGuest.x and uctrlLogon:cmdLogonGuest.y, then call:
POST https://alta.registries.gov.ab.ca/spinii/logon.aspx
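The scraping part of this step can be sketched with the standard library alone. The example below assumes the page behaves like an ordinary HTML form, where an image button submits the click position as name.x and name.y. InputCollector, build_guest_login_payload, the sample HTML and the field name uctrlLogon:txtUser are all made up for the illustration; only the two cmdLogonGuest keys come from the steps above.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect the name/value pair of every <input> tag that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are sent as an empty string.
            self.fields[name] = attributes.get("value") or ""


def build_guest_login_payload(html, click_x=76, click_y=25):
    """Return the form data for the guest login POST."""
    collector = InputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    # An image button submits the click position as <name>.x and <name>.y.
    payload["uctrlLogon:cmdLogonGuest.x"] = click_x
    payload["uctrlLogon:cmdLogonGuest.y"] = click_y
    return payload


# Made-up HTML that only mimics the shape of an ASP.NET form.
sample_html = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="text" name="uctrlLogon:txtUser">
  <input type="image" name="uctrlLogon:cmdLogonGuest" src="guest.gif">
</form>
"""
print(build_guest_login_payload(sample_html))
```

The hidden fields have to be echoed back unchanged, which is why the payload is built from the page that was just fetched rather than hardcoded.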
Legal Notice
The legal notice call is not necessary to get the map values, but it is necessary to get the item info (the last step in your post).
GET https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx
Scrape the input tag name/value pairs, set cmdYES.x and cmdYES.y, and then call:
POST https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx
Map data
Call the server map API:
POST http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx
with the following data:
{
    "mt":"titleresults",
    "qt":"lincNo",
    "LINCNumber": lincNumber,
    "rights": "B", #not required
    "cx": 1920, #screen definition
    "cy": 1080,
}
cx/cy are the canvas size.
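As a small illustration of the request body described above, the form fields can be wrapped in a helper and inspected before anything is sent. This is only a sketch: build_map_query is a hypothetical name, and treating cx as the width and cy as the height is an assumption based on the 1920 and 1080 values.

```python
from urllib.parse import urlencode


def build_map_query(linc_number, width=1920, height=1080):
    """Return the form fields for the mapserver.aspx POST."""
    return {
        "mt": "titleresults",
        "qt": "lincNo",
        # Pass the LINC number as a string: an int would drop the leading zeros.
        "LINCNumber": linc_number,
        "rights": "B",   # not required, per the comment in the original data
        "cx": width,     # canvas width (assumed)
        "cy": height,    # canvas height (assumed)
    }


query = build_map_query("0030278592")
# Shows the body that a form-encoded POST would carry.
print(urlencode(query))
```

The same dictionary can be handed to the data argument of a session POST, as the full code further down does.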
Decode the encoded data using the method above. You will get (output truncated):
[{'type': 'LargePolygon', 'name': '0010495134 8722524;1;162', 'entity': 23, 'occurence': 628079167, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
 {'type': 'LargePolygon', 'name': '0012170859 8022146;8;99', 'entity': 23, 'occurence': 628048595, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
 {'type': 'LargePolygon', 'name': '0010691822 8722524;1;163', 'entity': 23, 'occurence': 628222354, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
 {'type': 'LargePolygon', 'name': '0012169736 8022146;8;89', 'entity': 23, 'occurence': 628021327, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
 {'type': 'LargePolygon', 'name': '0010694454 8722524;1;179', 'entity': 23, 'occurence': 628191678, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
 {'type': 'LargePolygon', 'name': '0010694362 8722524;1;178', 'entity': 23, 'occurence': 628307403, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
 {'type': 'LargePolygon', 'name': '0010433381 8722524;1;177', 'entity': 23, 'occurence': 628209696, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
 {'type': 'LargePolygon', 'name': '0012169710 8022146;8;88A', 'entity': 23, 'occurence': 628021328, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
 {'type': 'LargePolygon', 'name': '0010694355 8722524;1;176', 'entity': 23, 'occurence': 628315826, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
 {'type': 'LargePolygon', 'name': '0012170866 8022146;8;100', 'entity': 23, 'occurence': 628163431, 'line_color_green': 0, 'line_color_red': 129, 'line_color_blue': 129, 'fill_color_green': 255, 'fill_color_red': 255, 'fill_color_blue': 180},
 {'type': 'LargePolygon', 'name': '0010694347 8722524;1;175', 'entity': 23, 'occurence': 628132810, 'line_color_green': 0, 'line_color_red': 129,
Extract the info
If you want to target a specific lincNumber, you will need to look at the style of the polygon, since for "multiple" values (i.e. values with multiple items) there is no mention of the lincNumber ID in the response, just a link reference. The following will get the selected item:
selectedZone = [
    t
    for t in objects
    if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
][0]
print(selectedZone)
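The list comprehension above raises an IndexError when no polygon carries the highlighted style. A slightly more defensive variant could look like the sketch below; find_selected_zone is a hypothetical helper, the colour test is copied from the snippet above, and the sample objects are made up.

```python
def find_selected_zone(objects):
    """Return the first polygon drawn with the 'selected' style, or None."""
    for item in objects:
        if item.get("fill_color_green", 255) < 255 and item.get("line_color_red") == 255:
            return item
    return None


# Made-up sample: the second entry imitates a highlighted polygon.
sample_objects = [
    {"type": "LargePolygon", "occurence": 1, "line_color_red": 129, "fill_color_green": 255},
    {"type": "LargePolygon", "occurence": 2, "line_color_red": 255, "fill_color_green": 0},
    {"type": "Text", "text": "label"},
]

zone = find_selected_zone(sample_objects)
print(zone["occurence"] if zone else "no highlighted polygon found")
print(find_selected_zone([]))
```

Returning None lets the caller decide what to do when the LINC number is unknown or the server returns an empty map, instead of failing on the index.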
Call the URL you mention in your post to get the data and extract the table:
GET https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title={selectedZone["occurence"]}
The full code:
import requests
from bs4 import BeautifulSoup
import pandas as pd

lincNumber = "0030278592"
#lincNumber = "0010661156"

s = requests.Session()

# 1) login
r = s.get("https://alta.registries.gov.ab.ca/spinii/logon.aspx")
soup = BeautifulSoup(r.text, "html.parser")
payload = dict([
    (t["name"], t.get("value", ""))
    for t in soup.findAll("input")
])
payload["uctrlLogon:cmdLogonGuest.x"] = 76
payload["uctrlLogon:cmdLogonGuest.y"] = 25
s.post("https://alta.registries.gov.ab.ca/spinii/logon.aspx",data=payload)

# 2) legal notice
r = s.get("https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx")
soup = BeautifulSoup(r.text, "html.parser")
payload = dict([
    (t["name"], t.get("value", ""))
    for t in soup.findAll("input")
])
payload["cmdYES.x"] = 82
payload["cmdYES.y"] = 3
s.post("https://alta.registries.gov.ab.ca/spinii/legalnotice.aspx", data = payload)

# 3) map data
r = s.post("http://alta.registries.gov.ab.ca/SpinII/mapserver.aspx",
    data= {
        "mt":"titleresults",
        "qt":"lincNo",
        "LINCNumber": lincNumber,
        "rights": "B", #not required
        "cx": 1920, #screen definition
        "cy": 1080,
    })

def decodeWtb(encodedData):
    offset = 0
    objects = []
    iteration = 0
    while offset < len(encodedData):
        elementSize = encodedData[offset]
        offset+=1
        elementType = encodedData[offset]
        offset+=1
        if elementType == 0:
            break
        curElemSize = elementSize
        curElemType = elementType
        if elementType== 114:
            largeElementSize = int.from_bytes(encodedData[offset:offset + 4], "big")
            offset+=4
            largeElementType = int.from_bytes(encodedData[offset:offset+2], "little")
            offset+=2
            curElemSize = largeElementSize
            curElemType = largeElementType
        offsetInit = offset
        if curElemType == 1:
            offset+=4
        elif curElemType == 2:
            offset+=2
        elif curElemType == 3:
            offset+=20
        elif curElemType == 4:
            offset+=28
        elif curElemType == 5:
            offset+=12
        elif curElemType == 6:
            textLength = curElemSize - 3
            offset+=6+(textLength*2)
        elif curElemType == 7:
            numPoint = int(curElemSize / 2)
            offset+=4*numPoint
        elif curElemType == 27:
            numPoint = int(curElemSize / 4)
            offset+=8*numPoint
        elif curElemType == 8:
            numPoint = int(curElemSize / 2)
            offset+=4*numPoint
        elif curElemType == 28:
            numPoint = int(curElemSize / 4)
            offset+=8*numPoint
        elif curElemType == 13:
            offset+=4
        elif curElemType == 14:
            offset+=2
        elif curElemType == 15:
            offset+=2
        elif curElemType == 100:
            pass
        elif curElemType == 101:
            offset+=20
        elif curElemType == 102:
            offset+=2
        elif curElemType == 103:
            pass
        elif curElemType == 104:
            offset+=6
        elif curElemType == 105:
            pass
        elif curElemType == 109:
            textLength = curElemSize - 1
            offset+=2+(textLength*2)
        elif curElemType == 111:
            offset+=40
        elif curElemType == 112:
            offset+=52
        elif curElemType == 113:
            offset+=24
        elif curElemType == 256:
            nameLength = int.from_bytes(encodedData[offset+14:offset+16], "little")
            objects.append({
                "type": "LargePolygon",
                "name": encodedData[offset+16:offset+16+nameLength].decode("utf-8").replace('\x00',''),
                "entity": int.from_bytes(encodedData[offset:offset+2], "little"),
                "occurence": int.from_bytes(encodedData[offset+2:offset+6], "little"),
                "line_color_green": encodedData[offset + 8],
                "line_color_red": encodedData[offset + 7],
                "line_color_blue": encodedData[offset + 9],
                "fill_color_green": encodedData[offset + 10],
                "fill_color_red": encodedData[offset + 11],
                "fill_color_blue": encodedData[offset + 13]
            })
            if nameLength > 0:
                offset+= 16 + nameLength
                if encodedData[offset] == 0:
                    offset+=1
            else:
                offset+= 16
            numberOfPoints = int.from_bytes(encodedData[offset:offset+2], "little")
            offset+=2
            offset+=numberOfPoints*8
        elif curElemType == 257:
            pass
        else:
            offset+= curElemSize*2
    return objects

# 4) decode custom format
objects = decodeWtb(r.content)

# 5) get the selected area
selectedZone = [
    t
    for t in objects
    if t.get("fill_color_green", 255) < 255 and t.get("line_color_red") == 255
][0]
print(selectedZone)

# 6) get the info about item
r = s.get(f'https://alta.registries.gov.ab.ca/SpinII/popupTitleSearch.aspx?title={selectedZone["occurence"]}')
df = pd.read_html(r.content, attrs = {'class': 'bodyText'}, header =0)[0]
del df['Add to Cart']
del df['View']
print(df[:-1])
Output
   Title Number  Type           LINC Number  Short Legal  Rights   Registration Date  Change/Cancel Date
0  052400228     Current Title  0030278592   0420091;16   Surface  19/09/2005         13/11/2019
1  072294084     Current Title  0030278551   0420091;12   Surface  22/05/2007         21/08/2007
2  072400529     Current Title  0030278469   0420091;3    Surface  05/07/2007         28/08/2007
3  072498228     Current Title  0030278501   0420091;7    Surface  18/08/2007         08/02/2008
4  072508699     Current Title  0030278535   0420091;10   Surface  23/08/2007         13/12/2007
5  072559500     Current Title  0030278477   0420091;4    Surface  17/09/2007         19/11/2007
6  072559508     Current Title  0030278576   0420091;14   Surface  17/09/2007         09/01/2009
7  072559521     Current Title  0030278519   0420091;8    Surface  17/09/2007         07/11/2007
8  072559530     Current Title  0030278493   0420091;6    Surface  17/09/2007         25/08/2008
9  072559605     Current Title  0030278485   0420091;5    Surface  17/09/2007         23/12/2008
You can look at the objects list if you want to get more entries, and you can extend the decoder if you want more information about an item, such as its coordinates.
It's also possible to match the other lincNumbers located around your target by looking at the name field, which contains the lincNumber unless the name is a "multiple" one.
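Matching neighbouring parcels by name could be sketched as follows. This assumes the name field always has the shape seen in the decoded sample, a 10-digit LINC number followed by a space and the rest of the label; split_polygon_name and LINC_PATTERN are hypothetical names, and the literal "Multiple" merely stands in for whatever a multi-item name really contains.

```python
import re

# Assumption: a LINC number is exactly 10 digits, as in the sample output.
LINC_PATTERN = re.compile(r"^(\d{10})\s+(.+)$")


def split_polygon_name(name):
    """Return (linc_number, rest) or None when the name has no LINC prefix."""
    match = LINC_PATTERN.match(name.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_polygon_name("0010495134 8722524;1;162"))
print(split_polygon_name("Multiple"))
```

Entries for which the helper returns None are the ones that still need the style-based lookup described earlier.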
Fun fact: no HTTP headers need to be set in this flow.