How do I find html_node on a search form?


Question

I have a list of people (first name, last name, and date of birth) that I need to look up on the Fulton County, Georgia (USA) jail website to determine whether each person is in jail or has been released.

The website is http://justice.fultoncountyga.gov/PAJailManager/JailingSearch.aspx?ID=400

The site requires you to enter a last name and a first name, then it gives you a list of results.

I have found some Stack Overflow posts that have given me some direction, but I'm still struggling to figure this out. I'm using this post as an example to follow. I am using SelectorGadget to help figure out the CSS selectors.

Here is the code I have so far. Right now I can't figure out which html_node to use.

library(rvest)

# Specify URL
fc.url <- "http://justice.fultoncountyga.gov/PAJailManager/JailingSearch.aspx?ID=400"

# start session
jail <- html_session(fc.url)

# Grab initial form
form.unfilled <- jail %>% html_node("form")

form.unfilled

The result I get from form.unfilled is {xml_missing} <NA>, which I know isn't right.

I think that if I can figure out the html_node value, I can proceed to using set_values and submit_form.

Thanks.

Recommended answer

It appears that on the initial call the webpage opens onto "http://justice.fultoncountyga.gov/PAJailManager/default.aspx" rather than the search page, which would explain why no form is found. Once the session is started, you should be able to jump to the search page:

library(rvest)

# Specify URL
fc.url <- "http://justice.fultoncountyga.gov/PAJailManager/JailingSearch.aspx?ID=400"

# Start the session on the landing page
jail <- html_session("http://justice.fultoncountyga.gov/PAJailManager/default.aspx")
# Jump to the search page, reusing the URL defined above
jail2 <- jail %>% jump_to(fc.url)

#list the form's fields
html_form(jail2)[[1]]

# Grab initial form
form.unfilled <- jail2 %>% html_node("form")
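
As a rough offline illustration of why {xml_missing} shows up and what the next step might look like, the sketch below uses two made-up stand-in pages instead of the real site. The markup and the field names LastName and FirstName are assumptions for illustration only; the actual field names should be read from the output of html_form(jail2)[[1]].

```r
library(rvest)

# Stand-in pages with made-up markup; the real site's HTML and field names will differ.
landing <- read_html("<html><body><p>Welcome</p></body></html>")
search <- read_html(
  "<html><body>
     <form action='JailingSearch.aspx' method='post'>
       <input type='text' name='LastName' value=''/>
       <input type='text' name='FirstName' value=''/>
     </form>
   </body></html>"
)

# A selector that matches nothing yields an xml_missing object,
# which is what the question's form.unfilled contained.
form_on_landing <- html_node(landing, "form")
form_on_search <- html_node(search, "form")
landing_has_form <- !inherits(form_on_landing, "xml_missing")
search_has_form <- !inherits(form_on_search, "xml_missing")

# Once a form is found, html_form() lists its fields and set_values() fills them.
# LastName and FirstName are hypothetical field names, as are the values.
form_unfilled <- html_form(search)[[1]]
form_filled <- set_values(form_unfilled, LastName = "Smith", FirstName = "John")
form_filled$fields$LastName$value
```

With a live session, the filled form would then be passed to submit_form together with the session object (for example submit_form(jail2, form_filled)). Newer rvest releases may emit deprecation warnings for set_values and related functions.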

Note: Verify that your actions are within the terms of service for the website. Many sites have policies against scraping.
