How can I list all the stops associated with a route using GTFS?

Question

I'm working with some GTFS data and would like to create a list of all the stops served by a route. I don't really understand how to do this with GTFS data.

Trips.txt comes in a format like this:

route_id,service_id,trip_id,trip_headsign,direction_id,block_id,shape_id
1,A20120610WKD,A20120610WKD_000800_1..S03R,SOUTH FERRY,1,,1..S03R
1,A20120610WKD,A20120610WKD_002700_1..S03R,SOUTH FERRY,1,,1..S03R
1,A20120610WKD,A20120610WKD_004700_1..S03R,SOUTH FERRY,1,,1..S03R
1,A20120610WKD,A20120610WKD_006700_1..S03R,SOUTH FERRY,1,,1..S03R
1,A20120610WKD,A20120610WKD_008700_1..S03R,SOUTH FERRY,1,,1..S03R

I tried reading in the matching shape using the shape_id and then looking for stops with matching latitudes and longitudes but that doesn't seem to work reliably. Does anybody know how to do this?

Recommended answer

As you've noticed, there isn't a direct relationship between routes and stops in GTFS. Instead, stops are associated with trips, where each trip represents a single "run" of a vehicle along a particular route. This reflects the fact that a route does not necessarily serve every one of its stops at all times; on weekends it might skip stops outside a high school, for instance.

So getting a list of every stop served by a route involves combining several models:

  • routes.txt gives you the route ID for the route you're interested in.
  • trips.txt gives you a set of trip IDs for that route.
  • stop_times.txt gives you a set of stop IDs for the stops served on each of these trips.
  • stops.txt gives you information about each of these stops.
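
The chain of lookups above can be sketched without a database at all. The following is a minimal illustration in Python, assuming a tiny made-up feed held in strings (the stop names, the trip IDs `t1` to `t3`, and the helper names `read_rows` and `stops_for_route` are all hypothetical); a real feed would be read from the actual .txt files.

```python
import csv
import io

# Made-up feed excerpts for illustration only; a real feed lives in .txt files.
ROUTES = "route_id,route_short_name\n1,1\n2,2\n"
TRIPS = (
    "route_id,service_id,trip_id\n"
    "1,WKD,t1\n"
    "1,WKD,t2\n"
    "2,WKD,t3\n"
)
STOP_TIMES = (
    "trip_id,stop_id,stop_sequence\n"
    "t1,A,1\n"
    "t1,B,2\n"
    "t2,B,1\n"
    "t2,C,2\n"
    "t3,D,1\n"
)
STOPS = "stop_id,stop_name\nA,Alpha St\nB,Bravo St\nC,Charlie St\nD,Delta St\n"


def read_rows(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def stops_for_route(route_id, trips, stop_times, stops):
    """Return {stop_id: stop_name} for every stop served by any trip on the route."""
    # trips.txt: collect the trip IDs that belong to the route.
    trip_ids = {t["trip_id"] for t in trips if t["route_id"] == route_id}
    # stop_times.txt: collect the stop IDs visited by those trips.
    stop_ids = {st["stop_id"] for st in stop_times if st["trip_id"] in trip_ids}
    # stops.txt: look up the details of each of those stops.
    return {s["stop_id"]: s["stop_name"] for s in stops if s["stop_id"] in stop_ids}


# routes.txt: find the route ID of the route we care about (short name "1" here).
route_id = next(r["route_id"] for r in read_rows(ROUTES) if r["route_short_name"] == "1")
route_stops = stops_for_route(
    route_id, read_rows(TRIPS), read_rows(STOP_TIMES), read_rows(STOPS)
)
print(sorted(route_stops.items()))
```

Sets are used for the intermediate IDs so that a stop visited by many trips appears only once in the result.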

Assuming you're using an SQL database to store your GTFS data, you might use a query like this (once you've obtained the route ID):

SELECT stop_id, stop_name FROM stops WHERE stop_id IN (
  SELECT DISTINCT stop_id FROM stop_times WHERE trip_id IN (
    SELECT trip_id FROM trips WHERE route_id = <route_id>));

Remember, though, this will output a record for every stop that is ever served by the route. If you're generating schedule information for a rider you'll probably want to limit the query to only trips running today and only stop times with departures in, say, the next thirty minutes.
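
One way such a restriction might look is sketched below, using Python's sqlite3 module and the weekday columns of calendar.txt. This is only an illustration under several assumptions: the sample rows and the function name `upcoming_stops` are made up, departure times are zero-padded HH:MM:SS strings (so string comparison orders them correctly), exceptions in calendar_dates.txt are ignored, and the time window does not cross midnight.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (service_id TEXT PRIMARY KEY,
                       monday INT, tuesday INT, wednesday INT, thursday INT,
                       friday INT, saturday INT, sunday INT,
                       start_date TEXT, end_date TEXT);
CREATE TABLE trips (route_id TEXT, service_id TEXT, trip_id TEXT PRIMARY KEY);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, departure_time TEXT);
CREATE TABLE stops (stop_id TEXT PRIMARY KEY, stop_name TEXT);
-- Made-up sample data: one weekday service and one Saturday service.
INSERT INTO calendar VALUES ('WKD',1,1,1,1,1,0,0,'20120610','20121231'),
                            ('SAT',0,0,0,0,0,1,0,'20120610','20121231');
INSERT INTO trips VALUES ('1','WKD','t1'), ('1','SAT','t2');
INSERT INTO stop_times VALUES ('t1','A','08:05:00'), ('t1','B','08:40:00'),
                              ('t2','C','08:10:00');
INSERT INTO stops VALUES ('A','Alpha St'), ('B','Bravo St'), ('C','Charlie St');
""")

WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def upcoming_stops(conn, route_id, now, window_minutes=30):
    """Stops on the route with a departure in the next window_minutes, today only."""
    # The column name comes from a fixed tuple, so interpolating it is safe.
    day_column = WEEKDAYS[now.weekday()]
    later = now + timedelta(minutes=window_minutes)
    query = f"""
        SELECT DISTINCT stops.stop_id, stops.stop_name
          FROM trips
          INNER JOIN calendar ON calendar.service_id = trips.service_id
          INNER JOIN stop_times ON stop_times.trip_id = trips.trip_id
          INNER JOIN stops ON stops.stop_id = stop_times.stop_id
         WHERE trips.route_id = ?
           AND calendar.{day_column} = 1
           AND ? BETWEEN calendar.start_date AND calendar.end_date
           AND stop_times.departure_time BETWEEN ? AND ?
         ORDER BY stops.stop_id
    """
    params = (route_id, now.strftime("%Y%m%d"),
              now.strftime("%H:%M:%S"), later.strftime("%H:%M:%S"))
    return conn.execute(query, params).fetchall()


# 11 June 2012 was a Monday, so only the weekday service applies.
print(upcoming_stops(conn, "1", datetime(2012, 6, 11, 8, 0, 0)))
```

GTFS allows departure times beyond 24:00:00 for trips that run past midnight, so a production query would need extra handling for late-night windows.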

Update: I wrote the above SQL query the way I did as I felt it most simply illustrated the relationship between the GTFS models, but btse is correct (in his answer below) that a query like this would never actually be used in production. It's too slow. You would instead use table joins and indices to keep query times reasonable.

Here is an equivalent query, written in a way more suited to being copied and pasted into a real application:

SELECT DISTINCT stops.stop_id, stops.stop_name
  FROM trips
  INNER JOIN stop_times ON stop_times.trip_id = trips.trip_id
  INNER JOIN stops ON stops.stop_id = stop_times.stop_id
  WHERE trips.route_id = <route_id>;
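
To check that the join form returns the same stops as the nested-subquery form, both can be run against a small in-memory database. The sketch below uses Python's sqlite3 module with made-up sample rows; the table layouts are pared down to the columns the queries touch, and the `<route_id>` placeholder is replaced by a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (route_id TEXT, trip_id TEXT PRIMARY KEY);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
CREATE TABLE stops (stop_id TEXT PRIMARY KEY, stop_name TEXT);
-- Made-up sample data: route 1 has two trips, route 2 has one.
INSERT INTO trips VALUES ('1','t1'), ('1','t2'), ('2','t3');
INSERT INTO stop_times VALUES ('t1','A',1), ('t1','B',2),
                              ('t2','B',1), ('t2','C',2),
                              ('t3','D',1);
INSERT INTO stops VALUES ('A','Alpha St'), ('B','Bravo St'),
                         ('C','Charlie St'), ('D','Delta St');
""")

SUBQUERY_SQL = """
SELECT stop_id, stop_name FROM stops WHERE stop_id IN (
  SELECT DISTINCT stop_id FROM stop_times WHERE trip_id IN (
    SELECT trip_id FROM trips WHERE route_id = ?))
"""

JOIN_SQL = """
SELECT DISTINCT stops.stop_id, stops.stop_name
  FROM trips
  INNER JOIN stop_times ON stop_times.trip_id = trips.trip_id
  INNER JOIN stops ON stops.stop_id = stop_times.stop_id
  WHERE trips.route_id = ?
"""


def run(sql, route_id):
    """Execute a query with the route ID bound as a parameter; sort for comparison."""
    return sorted(conn.execute(sql, (route_id,)).fetchall())


subquery_rows = run(SUBQUERY_SQL, "1")
join_rows = run(JOIN_SQL, "1")
print(join_rows)
print(join_rows == subquery_rows)
```

Stop B is served by both trips on route 1, which is why the join form needs DISTINCT: without it, B would be returned once per stop time.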

Typically you would also create an index for each column used in a JOIN or WHERE clause, which in this case would mean:

CREATE INDEX stop_times_trip_id_index ON stop_times(trip_id);

CREATE INDEX trips_route_id_index ON trips(route_id);

(Note that RDBMSes normally index each table by its primary key automatically, so there is no need to explicitly create an index on stops.stop_id.)
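
Whether the indexes are actually used can be checked by asking the database for its query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; this statement is SQLite-specific (other systems have their own EXPLAIN variants), the exact wording of the plan differs between SQLite versions, and the pared-down table layouts are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (route_id TEXT, trip_id TEXT PRIMARY KEY);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT);
CREATE TABLE stops (stop_id TEXT PRIMARY KEY, stop_name TEXT);
CREATE INDEX stop_times_trip_id_index ON stop_times(trip_id);
CREATE INDEX trips_route_id_index ON trips(route_id);
""")

JOIN_SQL = """
SELECT DISTINCT stops.stop_id, stops.stop_name
  FROM trips
  INNER JOIN stop_times ON stop_times.trip_id = trips.trip_id
  INNER JOIN stops ON stops.stop_id = stop_times.stop_id
  WHERE trips.route_id = ?
"""

# Each plan row ends with a human-readable description of one step.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + JOIN_SQL, ("1",)).fetchall()
plan_details = [row[-1] for row in plan_rows]
for detail in plan_details:
    print(detail)
```

Steps that mention an index name indicate an indexed lookup, while a step that scans a whole table suggests a missing index on the column being matched.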

Many further optimizations are possible, depending on the specific DBMS in use and your willingness to sacrifice disk space for performance. But these commands will yield good performance on virtually any RDBMS without needlessly sacrificing clarity.
