When did the bus arrive at that stop yesterday? This isn’t a question that transit riders often ask, but it is a question that analysts and planners often do ask.
Interline and the Metropolitan Transportation Commission have been archiving and processing the Regional GTFS Realtime feed in order to answer these questions.
What are stop observations?
“Stop observations” describes transit service as it was delivered to riders: the arrival time of a vehicle at a designated stop, along with any additional contextual information available from GTFS and GTFS Realtime feeds.
Stop observations contrast with the stop_times.txt file in the daily Regional Feed, which uses static GTFS data to capture transit service as it is planned and scheduled. They also contrast with the monthly Historic Regional Feed, which uses static GTFS data to capture transit service schedules retrospectively, as they changed over an entire month (with a different schedule for each agency for each day).
Stop observations can be used to better understand the realities of transit service as it fluctuates from day to day and location to location. Stop observations can be used to estimate on-time performance (also known as schedule adherence or punctuality) and to inform potential schedule adjustments by service planners. Stop observations can also be used by navigation app developers to provide their users with contextual information, such as an estimated range of times when a vehicle on a given route is likely to depart from a given stop.
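To illustrate the on-time performance use case, here is a minimal Python sketch that classifies each arrival by its delay (observed minus scheduled, in seconds) and computes the share of on-time arrivals. The on-time window (at most 1 minute early and at most 5 minutes late) is an assumption chosen for illustration, because agencies set their own thresholds. The function and constant names are hypothetical.

```python
# Assumed on-time window (illustrative only; agencies define their own thresholds).
EARLY_LIMIT_SECS = 60   # up to 1 minute early still counts as on time
LATE_LIMIT_SECS = 300   # up to 5 minutes late still counts as on time


def classify_arrival(delay_secs: int) -> str:
    """Classify an arrival by its delay (observed minus scheduled, in seconds)."""
    if delay_secs < -EARLY_LIMIT_SECS:
        return "early"
    if delay_secs > LATE_LIMIT_SECS:
        return "late"
    return "on_time"


def on_time_share(delays_secs: list[int]) -> float:
    """Return the fraction of arrivals that fall inside the assumed on-time window."""
    if not delays_secs:
        return 0.0
    on_time = sum(1 for d in delays_secs if classify_arrival(d) == "on_time")
    return on_time / len(delays_secs)


# Example: delays (in seconds) for five arrivals at one stop.
delays = [-90, -30, 0, 240, 600]
print([classify_arrival(d) for d in delays])  # ['early', 'on_time', 'on_time', 'on_time', 'late']
print(on_time_share(delays))  # 0.6
```

Because stop observations carry measurement error (see below), a figure like this is best treated as an estimate, not an exact punctuality score.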
Measurement error and (im)precision
Stop observations are inferred from public GTFS Realtime data feeds (trip updates and vehicle positions), so there are many potential sources of measurement error. These include GPS error on transit vehicles, misassignment of vehicles to “runs” of a given trip and route, imprecision or data errors in the transit agency’s computer-aided dispatch/automatic vehicle location (CAD/AVL) system, issues in the transit agency’s GTFS Realtime emitting system, issues in 511.org’s GTFS Realtime aggregation system, or issues in the Regional GTFS Realtime Feed.
We’ll discuss how stop observations are calculated later in this blog post, to help users better understand stop observations as measurements. When considering potential applications for stop observations, we ask users to keep the limitations of this data source in mind.
What information is contained in a stop observation?
We are distributing stop observations in the Historic Regional Feed. When users request a Historic Regional Feed with stop observations for a given year and month, a stop_observations.txt file with all stop observations for that month will be included.
The stop_observations.txt file is a CSV file based on the GTFS-Performance draft proposal. GTFS-Performance is an extension to the GTFS specification prepared by Remix, Swiftly, and their partners.
The following table is based on the GTFS-Performance draft proposal and covers the columns included in our implementation of stop_observations.txt, as well as columns that are in the proposal but not in our implementation:
Field Name | Details | Regional GTFS stop observations support |
---|---|---|
trip_id | Identifies a trip. When | ✅ |
trip_start_time | See GTFS Realtime. Required to specify the start time of a trip that’s not defined in Static GTFS, which occurs when using frequency-based service defined by | ✅ (when provided in agency source RT feed) |
stop_id | Identifies a stop. | ✅ (added by Interline for developer convenience) |
stop_sequence | Identifies the order of stops for a particular trip. | ✅ |
schedule_relationship | Valid options are: | ✅ |
schedule_adherence_stop | Indicates if this observation is used for reporting purposes when a subset of stops is utilized, instead of all stops. | No |
trip_start_date | The start date of this trip instance in | ✅ |
vehicle_id | Unique identifier for the vehicle that served this stop. This should match the value provided in real-time via | ✅ (when provided in agency source RT feed) |
scheduled_arrival_time | GTFS static scheduled arrival time | ✅ (added by Interline for developer convenience) |
observed_arrival_time | Observed arrival time at a specific stop for a specific trip on a route. For times occurring after midnight on the service day, enter the time as a value greater than | ✅ |
observed_departure_time | Observed departure time at a specific stop for a specific trip on a route. For times occurring after midnight on the service day, enter the time as a value greater than | ✅ (when available) |
new_departure_time | Required when | No |
dwell_time_secs | Seconds spent boarding/alighting riders at the stop. | No |
scheduled_dwell_time_secs | Seconds scheduled for boarding/alighting riders at the stop (when provided in agency source RT feed). | ✅ |
boardings | See GTFS-ride's board_alight.boardings | No |
alightings | See GTFS-ride's board_alight.alightings | No |
source | Valid options are: | No |
uncertainty | See | ✅ |
occupancy_status | See | ✅ (when provided in agency source RT feed) |
occupancy_percentage | See | ✅ (when provided in agency source RT feed) |
occupancy_type | See GTFS-ride's | No |
occupancy_source | See GTFS-ride's | No |
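As an example of working with these columns, here is a minimal Python sketch that reads stop_observations.txt rows and computes each arrival's delay. It assumes the scheduled and observed times are GTFS-style HH:MM:SS strings, with hour values of 24 or more for times after midnight on the service day, as the table's descriptions suggest. The sample rows and function names are hypothetical.

```python
import csv
import io


def gtfs_time_to_secs(value: str) -> int:
    """Convert a GTFS HH:MM:SS time to seconds after the start of the service day.

    Hours may be 24 or more for times after midnight, so "25:10:00" is
    1:10 a.m. on the next calendar day, still within the same service day.
    """
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def arrival_delays(csv_text: str) -> list[tuple[str, str, int]]:
    """Return (trip_id, stop_id, delay_secs) for rows that have both arrival times."""
    delays = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        scheduled = row.get("scheduled_arrival_time")
        observed = row.get("observed_arrival_time")
        if not scheduled or not observed:
            continue  # skip rows missing either time
        delay = gtfs_time_to_secs(observed) - gtfs_time_to_secs(scheduled)
        delays.append((row["trip_id"], row["stop_id"], delay))
    return delays


# Hypothetical sample rows; the real file contains more columns.
sample = """trip_id,stop_id,stop_sequence,scheduled_arrival_time,observed_arrival_time
t1,s1,1,08:00:00,08:01:30
t1,s2,2,23:58:00,24:02:15
t2,s1,1,09:00:00,
"""
print(arrival_delays(sample))  # [('t1', 's1', 90), ('t1', 's2', 255)]
```

Converting both times to seconds since the start of the service day avoids the wrap-around problem that a plain clock-time parser would hit for trips running past midnight.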
How are stop observations calculated?
Stop observations are derived retrospectively from an archive of the Regional GTFS Realtime Feed (updated every 15 seconds). Once a day, our workflow processes the archived GTFS Realtime trip updates and vehicle positions, combined with that day’s static GTFS daily regional feed.
Deriving stop observations from trip updates: When an agency supplies GTFS Realtime trip updates, our workflow identifies the final TripUpdate sent before a vehicle reaches a given stop. From that record, our workflow extracts the most recent StopTimeUpdate. Here is an example showing three trip updates fetched from BART. The first two trip updates were sent before the train reached Montgomery Station; the third was sent after the train departed Montgomery Station. The portion of the records used to create a stop observation is marked with the orange box.
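The selection step can be sketched in Python as follows. This is a simplified illustration, not the workflow's actual code. It uses plain dataclasses in place of the GTFS Realtime protobuf messages. It also assumes that an update was sent "before the vehicle reached the stop" if it still lists a StopTimeUpdate for that stop, because feeds commonly drop stops that a vehicle has already passed. The class names, function name, and stop IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopTimeUpdate:
    stop_id: str
    arrival_time: Optional[int]  # predicted arrival, POSIX seconds


@dataclass
class TripUpdate:
    trip_id: str
    timestamp: int  # when this update was sent, POSIX seconds
    stop_time_updates: list[StopTimeUpdate]


def observe_arrival(updates: list[TripUpdate], trip_id: str, stop_id: str) -> Optional[int]:
    """Estimate when trip_id arrived at stop_id from archived trip updates.

    Assumption: an update still listing the stop was sent before the vehicle
    reached it. The last such update's prediction becomes the observation.
    """
    candidates = [
        u for u in updates
        if u.trip_id == trip_id and any(s.stop_id == stop_id for s in u.stop_time_updates)
    ]
    if not candidates:
        return None
    final_update = max(candidates, key=lambda u: u.timestamp)
    for stu in final_update.stop_time_updates:
        if stu.stop_id == stop_id:
            return stu.arrival_time
    return None


# Hypothetical archive of three updates for one train; "MONT" drops out of the third.
updates = [
    TripUpdate("T1", 1000, [StopTimeUpdate("MONT", 1120), StopTimeUpdate("POWL", 1240)]),
    TripUpdate("T1", 1060, [StopTimeUpdate("MONT", 1135), StopTimeUpdate("POWL", 1255)]),
    TripUpdate("T1", 1150, [StopTimeUpdate("POWL", 1260)]),
]
print(observe_arrival(updates, "T1", "MONT"))  # 1135
```

Because the final pre-arrival prediction is usually made moments before the vehicle arrives, it tends to be the closest estimate to the true arrival time that the public feed provides.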
Deriving stop observations from vehicle positions: When an agency supplies vehicle positions for a given route, our workflow also identifies the last update sent before the vehicle reaches the stop location. To do this, we need to consider the geometry of the route alignment from shapes.txt in the static Regional GTFS (RG) feed. The workflow snaps the latitude/longitude point from each vehicle position to the route alignment, which enables it to determine which update to take the StopTimeUpdate portion from. Like many geometric algorithms, this approach can be stymied by complicated paths, such as looping routes that visit the same station multiple times. Our workflow can take advantage of stop sequence, when specified. Still, when the route alignment is complicated and/or updates are infrequent, the algorithm can only make a best-effort estimate.
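The snapping step can be sketched in Python as shown below. This sketch is built on assumptions that the workflow may not share. It projects latitude/longitude to local planar meters with an equirectangular approximation, which is reasonable at city scale. It snaps each point to the nearest segment of the shapes.txt polyline and measures distance along the shape. It then picks the latest vehicle position whose snapped distance is still short of the stop's. All names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def to_xy(lat: float, lon: float, ref_lat: float) -> tuple[float, float]:
    """Project lat/lon to local planar meters (equirectangular approximation)."""
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y


def distance_along_shape(shape: list[tuple[float, float]], lat: float, lon: float) -> float:
    """Snap a point to the nearest shape segment; return meters from the shape start.

    Choosing the globally nearest segment is exactly what fails on loops.
    """
    ref_lat = shape[0][0]
    px, py = to_xy(lat, lon, ref_lat)
    pts = [to_xy(s_lat, s_lon, ref_lat) for s_lat, s_lon in shape]
    best_d2, best_along, travelled = math.inf, 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len2 = dx * dx + dy * dy
        # Position of the perpendicular foot along the segment, clamped to [0, 1].
        t = 0.0 if seg_len2 == 0 else ((px - x1) * dx + (py - y1) * dy) / seg_len2
        t = max(0.0, min(1.0, t))
        d2 = (px - (x1 + t * dx)) ** 2 + (py - (y1 + t * dy)) ** 2
        seg_len = math.sqrt(seg_len2)
        if d2 < best_d2:
            best_d2, best_along = d2, travelled + t * seg_len
        travelled += seg_len
    return best_along


def last_position_before_stop(
    positions: list[tuple[int, float, float]],  # (timestamp, lat, lon)
    shape: list[tuple[float, float]],
    stop_lat: float,
    stop_lon: float,
):
    """Return the latest position snapped to a point short of the stop, or None."""
    stop_along = distance_along_shape(shape, stop_lat, stop_lon)
    before = [p for p in positions if distance_along_shape(shape, p[1], p[2]) < stop_along]
    return max(before, key=lambda p: p[0], default=None)


# Hypothetical northbound shape, stop at its midpoint, and three vehicle positions.
shape = [(37.78, -122.40), (37.79, -122.40), (37.80, -122.40)]
positions = [(100, 37.785, -122.4001), (115, 37.7895, -122.40), (130, 37.7905, -122.40)]
print(last_position_before_stop(positions, shape, 37.79, -122.40))  # (115, 37.7895, -122.4)
```

A more robust version could restrict snapping to the part of the shape between the previous and next stops in stop_sequence order. That is one way stop sequence information can help on looping routes.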
By being able to derive stop observations from two different parts of GTFS Realtime feeds, we have also been able to test and tune our workflow, to ensure that the two approaches agree when source feeds are of high quality.
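One way to check that the two approaches agree is to match their outputs on a shared key and summarize the differences. The Python sketch below does this. It keys observations by (trip_id, stop_sequence) and uses a 60-second agreement tolerance. Both choices are assumptions for illustration and are not necessarily what the workflow uses, and the function name is hypothetical.

```python
Key = tuple[str, int]  # (trip_id, stop_sequence), an assumed matching key


def compare_methods(
    from_trip_updates: dict[Key, int],
    from_positions: dict[Key, int],
    tolerance_secs: int = 60,  # assumed agreement threshold
) -> dict:
    """Summarize agreement between two sets of observed arrival times (POSIX seconds)."""
    shared = from_trip_updates.keys() & from_positions.keys()
    diffs = [abs(from_trip_updates[k] - from_positions[k]) for k in shared]
    within = sum(1 for d in diffs if d <= tolerance_secs)
    return {
        "matched": len(diffs),
        "within_tolerance": within,
        "share_within": within / len(diffs) if diffs else 0.0,
        "max_abs_diff_secs": max(diffs, default=0),
    }


tu = {("T1", 1): 1000, ("T1", 2): 1120, ("T1", 3): 1300}
vp = {("T1", 1): 1010, ("T1", 2): 1250, ("T2", 1): 900}
print(compare_methods(tu, vp))
# {'matched': 2, 'within_tolerance': 1, 'share_within': 0.5, 'max_abs_diff_secs': 130}
```

Large disagreements on a particular route or agency can point to problems in a source feed, or to cases such as loops where the geometric approach struggles.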
Related data specifications
Our stop observations data model is based on the GTFS-Performance proposal, which is in turn inspired by the larger TCRP G-18 project. The GTFS-Performance stop_visits.txt file is intended to represent a subset of the TCRP G-18 stop_visits.txt file. Thank you to the creators of GTFS-Performance and to the G-18 research team.
GTFS-Performance and TCRP G-18 go well beyond the functionality that we have implemented in our system for producing stop observations, to also address more internal operating concerns of transit agencies. We have intentionally implemented our system to create stop observations using only publicly available GTFS and GTFS Realtime inputs.
Accessing stop observations
Stop observations are now included in historic monthly Regional GTFS feeds (RGs). To download a historic monthly RG that includes stop observations, make an HTTP request like this:
wget -O RG_2022-03-so.zip "https://api.511.org/transit/datafeeds?operator_id=RG&historic=2022-03-so&api_key=xxx"
Replace 2022-03 with a given year and month. Note the addition of -so to request the inclusion of stop observations. Note that wget is a Unix command that can be replaced by any other utility to download the URL; also note that xxx should be replaced by your own 511.org Open Data API key.
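For scripted downloads, here is a minimal Python sketch using only the standard library. It builds the same URL as the wget command, downloads the archive, and streams stop_observations.txt rows without extracting the file to disk. It assumes stop_observations.txt sits at the top level of the ZIP archive. It also decodes the file as UTF-8 and tolerates a byte-order mark. The function names and the output filename are hypothetical.

```python
import csv
import io
import urllib.parse
import urllib.request
import zipfile

BASE_URL = "https://api.511.org/transit/datafeeds"


def historic_rg_url(year: int, month: int, api_key: str) -> str:
    """Build the request URL for a monthly RG that includes stop observations."""
    params = {
        "operator_id": "RG",
        "historic": f"{year:04d}-{month:02d}-so",  # the "-so" suffix requests stop observations
        "api_key": api_key,
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def download(url: str, path: str) -> None:
    """Save the ZIP archive to disk."""
    urllib.request.urlretrieve(url, path)


def iter_stop_observations(zip_source):
    """Yield stop_observations.txt rows as dicts, reading directly from the archive.

    zip_source may be a file path or a binary file-like object. Assumes the file
    is at the top level of the archive; "utf-8-sig" also handles a leading BOM.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open("stop_observations.txt") as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            yield from csv.DictReader(text)


# Usage (requires your own 511.org Open Data API key):
# url = historic_rg_url(2022, 3, "YOUR_API_KEY")
# download(url, "RG_2022-03-so.zip")
# for row in iter_stop_observations("RG_2022-03-so.zip"):
#     print(row["trip_id"], row["observed_arrival_time"])
```

Streaming rows from inside the archive avoids writing the uncompressed CSV files to disk, which matters given the file sizes described below.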
Monthly RGs without stop observations are roughly 50 MB in size. With stop observations, a monthly RG is roughly 500 MB to download as a compressed ZIP archive and roughly 2 GB as uncompressed CSV files.
Stop observations are available on a more frequent basis to staff at MTC and Bay Area transit agencies. Transit agencies may contact the 511 SF Bay Data Portal at transitdata@511.org for more information.
Using stop observations to understand transit
Public-transit agencies can be hesitant to release this type of data. In part, it’s a challenge to assemble operations data in an ongoing and consistent manner. In part, it’s a risk that the general public will take operations data out of context.
While we began this blog post by asking “When did the bus arrive at that stop yesterday?”, we hope that users of stop observations will ask an even wider range of questions of this dataset. These include questions that explore the complexity of daily operations for over 20 agencies operating buses, trains, subways, and ferries across the Bay Area, and questions that can help improve the experience of riders using navigation apps and similar software.