How can I list all the stops associated with a route using GTFS?


Problem description

I'm working with some GTFS data and would like to be able to create a list of all stops served by a route. I don't really understand how to do this with GTFS data.

Trips.txt comes in a format like this:

route_id,service_id,trip_id,trip_headsign,direction_id,block_id,shape_id
1,A20120610WKD,A20120610WKD_000800_1..S03R,SOUTH FERRY,1,,1..S03R
1,A20120610WKD,A20120610WKD_002700_1..S03R,SOUTH FERRY,1,,1..S03R
1,A20120610WKD,A20120610WKD_004700_1..S03R,SOUTH FERRY,1,,1..S03R
1,A20120610WKD,A20120610WKD_006700_1..S03R,SOUTH FERRY,1,,1..S03R
1,A20120610WKD,A20120610WKD_008700_1..S03R,SOUTH FERRY,1,,1..S03R

I tried reading in the matching shape using the shape_id and then looking for stops with matching latitudes and longitudes but that doesn't seem to work reliably. Does anybody know how to do this?

Solution

As you've noticed, there isn't a direct relationship between routes and stops in GTFS. Instead, stops are associated with trips, where each trip represents a single "run" of a vehicle along a particular route. This reflects the fact that a route does not necessarily serve every one of its stops at all times; on weekends it might skip stops outside a high school, for instance.

So getting a list of every stop served by a route involves combining several models:

  • routes.txt gives you the route ID for the route you're interested in.
  • trips.txt gives you a set of trip IDs for that route.
  • stop_times.txt gives you a set of stop IDs for the stops served on each of these trips.
  • stops.txt gives you information about each of these stops.
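
To make that chain concrete, here is a minimal sketch in plain Python that walks the four files in order. The feed excerpts, the route and stop IDs, and the helper names (read_rows, stops_for_route) are all made up for illustration; a real feed would be read from the .txt files on disk and has many more columns.

```python
import csv
import io

# Made-up excerpts standing in for the four GTFS files.
ROUTES = """route_id,route_short_name
R1,1
R2,2
"""
TRIPS = """route_id,service_id,trip_id
R1,WKD,T1
R1,WKD,T2
R2,WKD,T3
"""
STOP_TIMES = """trip_id,stop_id,stop_sequence
T1,S1,1
T1,S2,2
T2,S2,1
T2,S3,2
T3,S4,1
"""
STOPS = """stop_id,stop_name
S1,First St
S2,Second St
S3,Third St
S4,Fourth St
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def stops_for_route(route_short_name):
    """Return sorted (stop_id, stop_name) pairs for every stop the route ever serves."""
    # routes.txt: find the route ID for the route we care about.
    route_ids = {r["route_id"] for r in read_rows(ROUTES)
                 if r["route_short_name"] == route_short_name}
    # trips.txt: collect the trip IDs belonging to that route.
    trip_ids = {t["trip_id"] for t in read_rows(TRIPS)
                if t["route_id"] in route_ids}
    # stop_times.txt: collect the stop IDs served on any of those trips.
    stop_ids = {st["stop_id"] for st in read_rows(STOP_TIMES)
                if st["trip_id"] in trip_ids}
    # stops.txt: look up the details of each stop.
    return sorted((s["stop_id"], s["stop_name"]) for s in read_rows(STOPS)
                  if s["stop_id"] in stop_ids)


print(stops_for_route("1"))
```

Using sets at each step removes duplicates, so stop S2, which is served by both trips of the route, appears only once in the result.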

Assuming you're using an SQL database to store your GTFS data, you might use a query like this (once you've obtained the route ID):

SELECT stop_id, stop_name FROM stops WHERE stop_id IN (
  SELECT DISTINCT stop_id FROM stop_times WHERE trip_id IN (
    SELECT trip_id FROM trips WHERE route_id = <route_id>));

Remember, though, this will output a record for every stop that is ever served by the route. If you're generating schedule information for a rider you'll probably want to limit the query to only trips running today and only stop times with departures in, say, the next thirty minutes.
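
One possible way to apply that restriction is sketched below with Python's sqlite3 module. It assumes the feed also provides calendar.txt (with its weekday columns, start_date and end_date) and that every column was loaded as text. The sample rows and the helper names (to_seconds, upcoming_stops) are hypothetical. The sketch deliberately ignores calendar_dates.txt exceptions and trips that began before midnight on the previous service day, both of which a real application would need to handle.

```python
import sqlite3
from datetime import datetime

DAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday",
               "friday", "saturday", "sunday"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (service_id TEXT, monday TEXT, tuesday TEXT,
        wednesday TEXT, thursday TEXT, friday TEXT, saturday TEXT,
        sunday TEXT, start_date TEXT, end_date TEXT);
    CREATE TABLE trips (route_id TEXT, service_id TEXT, trip_id TEXT);
    CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, departure_time TEXT);
    CREATE TABLE stops (stop_id TEXT, stop_name TEXT);
    -- Made-up sample rows: one weekday service and one Saturday service.
    INSERT INTO calendar VALUES
        ('WKD','1','1','1','1','1','0','0','20120610','20121231'),
        ('SAT','0','0','0','0','0','1','0','20120610','20121231');
    INSERT INTO trips VALUES ('1','WKD','T1'), ('1','SAT','T2');
    INSERT INTO stop_times VALUES
        ('T1','S1','08:05:00'), ('T1','S2','08:45:00'), ('T2','S3','08:10:00');
    INSERT INTO stops VALUES
        ('S1','First St'), ('S2','Second St'), ('S3','Third St');
""")


def to_seconds(hhmmss):
    """Convert a GTFS HH:MM:SS time (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def upcoming_stops(conn, route_id, now, window_minutes=30):
    """Stops on route_id with a departure within window_minutes of now."""
    day_column = DAY_COLUMNS[now.weekday()]  # datetime: Monday == 0
    query = f"""
        SELECT stops.stop_id, stops.stop_name, stop_times.departure_time
          FROM trips
          INNER JOIN calendar ON calendar.service_id = trips.service_id
          INNER JOIN stop_times ON stop_times.trip_id = trips.trip_id
          INNER JOIN stops ON stops.stop_id = stop_times.stop_id
          WHERE trips.route_id = ?
            AND calendar.{day_column} = '1'
            AND ? BETWEEN calendar.start_date AND calendar.end_date
    """
    rows = conn.execute(query, (route_id, now.strftime("%Y%m%d"))).fetchall()
    start = now.hour * 3600 + now.minute * 60 + now.second
    end = start + window_minutes * 60
    # The time window is applied in Python because GTFS times are strings.
    upcoming = [row for row in rows if start <= to_seconds(row[2]) <= end]
    return sorted(upcoming, key=lambda row: (to_seconds(row[2]), row[0]))


# 11 June 2012 was a Monday, so only the weekday service is running.
print(upcoming_stops(conn, "1", datetime(2012, 6, 11, 8, 0)))
```

The weekday column name is interpolated into the SQL only because it comes from a fixed list; the route ID and date are passed as bound parameters.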


Update: I wrote the above SQL query the way I did as I felt it most simply illustrated the relationship between the GTFS models, but btse is correct (in his answer below) that a query like this would never actually be used in production. It's too slow. You would instead use table joins and indices to keep query times reasonable.

Here is an equivalent query, written in a way more suited to being copied and pasted into a real application:

SELECT DISTINCT stops.stop_id, stops.stop_name
  FROM trips
  INNER JOIN stop_times ON stop_times.trip_id = trips.trip_id
  INNER JOIN stops ON stops.stop_id = stop_times.stop_id
  WHERE route_id = <route_id>;

Typically you would also create an index for each column used in a JOIN or WHERE clause, which in this case would mean:

CREATE INDEX stop_times_trip_id_index ON stop_times(trip_id);

CREATE INDEX trips_route_id_index ON trips(route_id);

(Note that RDBMSes normally index each table by its primary key automatically, so there is no need to explicitly create an index on stops.stop_id.)
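
For anyone who wants to try this locally, the join query and the two indexes can be exercised against an in-memory SQLite database, as in the rough sketch below. The table definitions and sample rows are assumptions made for illustration, and SQLite is just one convenient choice of RDBMS. The query plan is printed rather than interpreted, since its wording and the planner's choices vary between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema; real GTFS tables have more columns.
    CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT);
    CREATE TABLE stops (stop_id TEXT PRIMARY KEY, stop_name TEXT);
    CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);

    -- The two indexes suggested in the text.
    CREATE INDEX stop_times_trip_id_index ON stop_times(trip_id);
    CREATE INDEX trips_route_id_index ON trips(route_id);

    -- Made-up sample rows.
    INSERT INTO trips VALUES ('T1','1'), ('T2','1'), ('T3','2');
    INSERT INTO stops VALUES ('S1','First St'), ('S2','Second St'),
                             ('S3','Third St'), ('S4','Fourth St');
    INSERT INTO stop_times VALUES ('T1','S1',1), ('T1','S2',2),
                                  ('T2','S2',1), ('T2','S3',2), ('T3','S4',1);
""")

JOIN_QUERY = """
    SELECT DISTINCT stops.stop_id, stops.stop_name
      FROM trips
      INNER JOIN stop_times ON stop_times.trip_id = trips.trip_id
      INNER JOIN stops ON stops.stop_id = stop_times.stop_id
      WHERE trips.route_id = ?
      ORDER BY stops.stop_id
"""

# route_id was stored as text, so the parameter is a string as well.
joined = conn.execute(JOIN_QUERY, ("1",)).fetchall()
print(joined)

# Show how SQLite intends to execute the query; the last column of each
# row holds the human-readable description of that step.
for row in conn.execute("EXPLAIN QUERY PLAN " + JOIN_QUERY, ("1",)):
    print(row[-1])

# List the indexes that exist, including the automatic primary-key ones.
index_names = {name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")}
print(sorted(index_names))
```

The ORDER BY clause is an addition that makes the output order predictable; it is not needed for correctness.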

Many further optimizations are possible, depending on the specific DBMS in use and your willingness to sacrifice disk space for performance. But these commands will yield good performance on virtually any RDBMS without needlessly sacrificing clarity.
