
Observed stop arrivals and departures through the SF Bay Area

Interline and the Metropolitan Transportation Commission are now releasing monthly "stop observation" datasets derived from the Regional GTFS Realtime Feed.

When did the bus arrive at that stop yesterday? This isn’t a question that transit riders ask often, but it is one that analysts and planners do ask.

Interline and the Metropolitan Transportation Commission have been archiving and processing the Regional GTFS Realtime feed in order to answer these questions.

What are stop observations?

“Stop observations” describe transit service as it was delivered to riders: the arrival time of a vehicle at a designated stop, along with any additional contextual information available from GTFS and GTFS Realtime feeds.

Stop observations are in contrast with the stop_times.txt file in the daily Regional Feed, which uses static GTFS data to capture transit service as it is planned and scheduled. This is also in contrast with the monthly Historic Regional Feed, which uses static GTFS data to capture transit service schedules, retrospectively, as they changed over an entire month (with a different schedule for each agency for each day).

Stop observations can be used to better understand how transit service actually performs as it fluctuates from day to day and location to location. They can be used to estimate on-time performance (also known as schedule adherence or punctuality) and to inform potential schedule adjustments by service planners. Navigation app developers can also use them to provide their users with contextual information, such as an estimated range of times when a route is likely to depart from a given stop.
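As a rough illustration of the on-time performance use case, here is a minimal Python sketch. It assumes each observation carries the scheduled_arrival_time and observed_arrival_time columns of stop_observations.txt as HH:MM:SS strings. The on-time window (up to 60 seconds early, up to 300 seconds late) and the sample rows are illustrative assumptions, not values defined by the dataset or by any agency.

```python
def gtfs_time_to_seconds(value):
    """Convert a GTFS HH:MM:SS time (hours may exceed 24) to seconds into the service day."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def on_time_share(observations, early_secs=60, late_secs=300):
    """Return the share of usable observations inside the on-time window.

    The window thresholds are assumptions chosen for illustration only.
    Rows missing either time are skipped rather than counted as late.
    """
    usable = [
        row for row in observations
        if row.get("scheduled_arrival_time") and row.get("observed_arrival_time")
    ]
    if not usable:
        return None
    on_time = 0
    for row in usable:
        delay = (gtfs_time_to_seconds(row["observed_arrival_time"])
                 - gtfs_time_to_seconds(row["scheduled_arrival_time"]))
        if -early_secs <= delay <= late_secs:
            on_time += 1
    return on_time / len(usable)


# Made-up sample rows, not real observations.
sample = [
    {"scheduled_arrival_time": "08:00:00", "observed_arrival_time": "08:02:10"},  # 130 s late
    {"scheduled_arrival_time": "08:10:00", "observed_arrival_time": "08:17:30"},  # 450 s late
    {"scheduled_arrival_time": "24:05:00", "observed_arrival_time": "24:04:30"},  # 30 s early
    {"scheduled_arrival_time": "09:00:00", "observed_arrival_time": ""},          # skipped
]
share = on_time_share(sample)
print(f"{share:.0%} of usable observations were on time")
```

Two of the three usable sample rows fall inside the window. A real analysis would also need to decide how to treat skipped stops and the measurement error inherent in inferred arrival times.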

Measurement error and (im)precision

Stop observations are inferred from public GTFS Realtime data feeds (trip updates and vehicle positions), so there are many potential sources of measurement error. These include GPS error on transit vehicles, misassignment of vehicles to “runs” of a given trip and route, imprecision or data errors in the transit agency’s computer-aided dispatch/automatic vehicle location (CAD/AVL) system, issues in the transit agency’s GTFS Realtime emitting system, issues in 511.org’s GTFS Realtime aggregation system, or issues in the Regional GTFS Realtime Feed.

We’ll discuss how stop observations are calculated later in this blog post, to help users better understand stop observations as measurements. When considering potential applications for stop observations, we ask users to keep the limitations of this data source in mind.

What information is contained in a stop observation?

We are distributing stop observations in the Historic Regional Feed. When users request a Historic Regional Feed with stop observations for a given year and month, a stop_observations.txt file will be included with all stop observations for that month.

The stop_observations.txt file is a CSV file based on the GTFS-Performance draft proposal. GTFS-Performance is an extension to the GTFS specification prepared by Remix, Swiftly, and their partners.

The following table is based on the GTFS-Performance draft proposal and covers columns included in our implementation of stop_observations.txt, as well as columns that are in the proposal but not in our implementation:

Field Name | Details | Regional GTFS stop observations support
trip_id

Identifies a trip.

When schedule_relationship=DUPLICATED, then this references the trip that is copied, like GTFS Realtime TripUpdate.trip.trip_id.

trip_start_time

See GTFS Realtime TripDescriptor.start_time.

Required to specify the start time of a trip that’s not defined in static GTFS, which occurs when using frequency-based service defined by frequencies.txt or when an ad-hoc service is added based on an existing trip via TripUpdate.trip.schedule_relationship=DUPLICATED.

✅ (when provided in agency source RT feed)

stop_id

Identifies a stop

✅ (added by Interline for developer convenience)
stop_sequence

Identifies the order of stops for a particular trip.

schedule_relationship

Valid options are:

  • 0 - SCHEDULED
  • 1 - SKIPPED
  • 2 - NO_DATA
  • 3 - UNSCHEDULED
  • 4 - CANCELED
  • 5 - DUPLICATED
  • 6 - MODIFIED

schedule_adherence_stop

Indicates if this observation is used for reporting purposes when a subset of stops are utilized, instead of all stops.

No

trip_start_date

The start date of this trip instance in YYYYMMDD format to specify the service date of the corresponding trip. This should match the value in TripDescriptor.start_date.

vehicle_id

Unique identifier for the vehicle that served this stop. This should match the value provided in real-time via VehicleDescriptor.id.

✅ (when provided in agency source RT feed)

scheduled_arrival_time

GTFS static scheduled arrival time

✅ (added by Interline for developer convenience)

observed_arrival_time

Observed arrival time at a specific stop for a specific trip on a route. For times occurring after midnight on the service day, enter the time as a value greater than 24:00:00 in HH:MM:SS local time for the day on which the trip schedule begins.

observed_departure_time

Observed departure time at a specific stop for a specific trip on a route. For times occurring after midnight on the service day, enter the time as a value greater than 24:00:00 in HH:MM:SS local time for the day on which the trip schedule begins.

✅ (when available)

new_departure_time

Required when schedule_relationship=DEPARTURE_TIME_MODIFIED.

No

dwell_time_secs

Seconds spent boarding/alighting riders at the stop.

No

scheduled_dwell_time_secs

Seconds scheduled for boarding/alighting riders at the stop (when provided in agency source RT feed).

boardings

See GTFS-ride's board_alight.boardings

No

alightings

See GTFS-ride's board_alight.alightings

No

source

Valid options are:

  • 0 - Manual
  • 1 - Door Sensor
  • 2 - Estimated using GPS/AVL data

No

uncertainty

See StopTimeEvent.uncertainty.

occupancy_status

See VehiclePosition.occupancy_status.

✅ (when provided in agency source RT feed)

occupancy_percentage

See VehiclePosition.occupancy_percentage.

✅ (when provided in agency source RT feed)

occupancy_type

See GTFS-ride's board_alight.load_type.

No

occupancy_source

See GTFS-ride's board_alight.source.

No
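To make the time columns in the table above concrete, here is a small Python sketch that reads one row and combines trip_start_date with an observed_arrival_time greater than 24:00:00 into a calendar timestamp. The sample row is made up, and the sketch assumes that schedule_relationship holds the numeric codes listed in the table. It also treats the service day as starting at local midnight, which is a simplification on days with daylight saving time changes.

```python
import csv
import io
from datetime import datetime, timedelta

# Codes as listed in the table above.
SCHEDULE_RELATIONSHIP = {
    0: "SCHEDULED", 1: "SKIPPED", 2: "NO_DATA", 3: "UNSCHEDULED",
    4: "CANCELED", 5: "DUPLICATED", 6: "MODIFIED",
}


def observed_datetime(trip_start_date, gtfs_time):
    """Combine a YYYYMMDD service date and an HH:MM:SS time (hours may exceed 24).

    Returns a naive local datetime. Assumes the service day starts at local
    midnight, which is not exact on daylight saving transition days.
    """
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    service_day = datetime.strptime(trip_start_date, "%Y%m%d")
    return service_day + timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Hypothetical row for illustration; real files contain more columns.
SAMPLE = (
    "trip_id,trip_start_date,stop_id,stop_sequence,"
    "schedule_relationship,observed_arrival_time\n"
    "T1,20220331,S9,12,0,25:10:05\n"
)

row = next(csv.DictReader(io.StringIO(SAMPLE)))
arrival = observed_datetime(row["trip_start_date"], row["observed_arrival_time"])
status = SCHEDULE_RELATIONSHIP[int(row["schedule_relationship"])]
print(arrival.isoformat(), status)
```

Here a trip that began on the March 31 service day is observed at 25:10:05, which lands at 01:10:05 on April 1 in calendar terms.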

How are stop observations calculated?

Stop observations are derived retrospectively using an archive of the Regional GTFS Realtime Feed (updated every 15 seconds). Once a day, our workflow processes GTFS Realtime trip updates and vehicle positions from the archive, combined together with the day’s static GTFS daily regional feed.

Deriving stop observations from trip updates: When an agency supplies GTFS Realtime trip updates, our workflow identifies the final TripUpdate sent before a vehicle reaches a given stop. From that record, our workflow extracts the most recent StopTimeUpdate. Here is an example showing three trip updates fetched from BART. The first two trip updates are before the train reaches Montgomery Station; the third trip update is after the train departs Montgomery Station. The portion of the records that is used to create a stop observation is marked with the orange box:

illustration of three GTFS Realtime feed trip updates, showing the portion of stop time updates retained to create a stop observation record
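The selection step described above can be sketched in Python as follows. This is a simplified illustration, not the production workflow: trip updates are represented as plain dictionaries rather than GTFS Realtime protobuf messages, the field names (fetched_at, stop_time_updates, arrival) and stop IDs are hypothetical, and "before the vehicle reaches the stop" is approximated as "the snapshot was fetched no later than the arrival time it predicts".

```python
def last_prediction_before_arrival(snapshots, stop_id):
    """Return the StopTimeUpdate-like dict for stop_id from the final snapshot
    fetched before the predicted arrival at that stop, or None.

    Assumption: a snapshot counts as "before arrival" when its fetch time is
    not later than the arrival time it predicts for the stop.
    """
    best = None
    for snapshot in sorted(snapshots, key=lambda s: s["fetched_at"]):
        for update in snapshot["stop_time_updates"]:
            if update["stop_id"] == stop_id and snapshot["fetched_at"] <= update["arrival"]:
                best = update  # later snapshots overwrite earlier ones
    return best


# Three made-up snapshots of one trip, with times as plain epoch-like seconds.
# The third no longer lists the first stop because the train has departed it.
snapshots = [
    {"fetched_at": 1000, "stop_time_updates": [
        {"stop_id": "STOP_A", "arrival": 1040}, {"stop_id": "STOP_B", "arrival": 1160}]},
    {"fetched_at": 1015, "stop_time_updates": [
        {"stop_id": "STOP_A", "arrival": 1028}, {"stop_id": "STOP_B", "arrival": 1150}]},
    {"fetched_at": 1060, "stop_time_updates": [
        {"stop_id": "STOP_B", "arrival": 1148}]},
]

chosen = last_prediction_before_arrival(snapshots, "STOP_A")
print(chosen)
```

For STOP_A, the second snapshot is the last one that still carries a prediction for the stop, so its arrival value of 1028 is the one retained.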

Deriving stop observations from vehicle positions: When an agency supplies vehicle positions for a given route, our workflow also identifies the last update sent before the stop location. To do this, we need to consider the geometry of the route alignment from shapes.txt in the static RG feed. The workflow snaps the latitude/longitude point from each vehicle position to the route alignment, enabling it to determine from which update to take the StopTimeUpdate portion. Like many geometric algorithms, this approach can be stymied by complicated paths, such as looping routes that visit the same station multiple times. Our workflow can take advantage of stop sequence, when specified. Still, the algorithm can only produce a best estimate when the route alignment is complicated and/or updates are infrequent.
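The snapping idea can be illustrated with a short Python sketch. It is only one possible approach under stated assumptions: coordinates are projected to a flat plane with an equirectangular approximation, the shape is a simple non-looping polyline, and the function names, field names, and coordinates are hypothetical. It does not handle the looping routes mentioned above.

```python
import math

EARTH_RADIUS_M = 6371000.0


def to_xy(lat, lon, ref_lat):
    """Equirectangular projection to meters; adequate over short distances."""
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y


def distance_along(shape, point):
    """Snap point (lat, lon) to the nearest segment of shape and return the
    distance in meters from the start of the shape to the snapped location."""
    ref_lat = shape[0][0]
    pts = [to_xy(lat, lon, ref_lat) for lat, lon in shape]
    px, py = to_xy(point[0], point[1], ref_lat)
    best_offset, best_dist, travelled = 0.0, float("inf"), 0.0
    for (ax, ay), (bx, by) in zip(pts, pts[1:]):
        dx, dy = bx - ax, by - ay
        seg_len_sq = dx * dx + dy * dy
        seg_len = math.sqrt(seg_len_sq)
        # Fraction along the segment of the closest point, clamped to [0, 1].
        t = 0.0 if seg_len_sq == 0 else max(
            0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
        dist = math.hypot(px - (ax + t * dx), py - (ay + t * dy))
        if dist < best_dist:
            best_dist, best_offset = dist, travelled + t * seg_len
        travelled += seg_len
    return best_offset


def last_position_before_stop(shape, stop, positions):
    """Return the latest vehicle position that snaps at or before the stop."""
    stop_offset = distance_along(shape, stop)
    before = [p for p in positions
              if distance_along(shape, (p["lat"], p["lon"])) <= stop_offset]
    return max(before, key=lambda p: p["timestamp"]) if before else None


# Made-up straight alignment, stop, and three position reports 15 seconds apart.
shape = [(37.0, -122.00), (37.0, -121.99), (37.0, -121.98)]
stop = (37.0001, -121.985)
positions = [
    {"timestamp": 0, "lat": 37.0, "lon": -121.995},
    {"timestamp": 15, "lat": 37.0, "lon": -121.988},
    {"timestamp": 30, "lat": 37.0, "lon": -121.982},
]
picked = last_position_before_stop(shape, stop, positions)
print(picked)
```

In this sample the report at timestamp 15 is the last one upstream of the stop. With a 15-second update interval, the vehicle may still be some distance from the stop at that moment, which is one reason the resulting times are estimates.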

By being able to derive stop observations from two different parts of GTFS Realtime feeds, we have also been able to test and tune our workflow, to ensure that the two approaches agree when source feeds are of high quality.

Our stop observations data model is based on the GTFS-Performance proposal. The GTFS-Performance proposal is, in turn, inspired by the larger TCRP G-18 project. The GTFS-Performance stop_visits.txt is supposed to represent a subset of the TCRP G-18 stop_visits.txt file. Thank you to the creators of GTFS-Performance and to the G-18 research team.

GTFS-Performance and TCRP G-18 go well beyond the functionality that we have implemented in our system for producing stop observations, to also address more internal operating concerns of transit agencies. We have intentionally implemented our system to create stop observations using only publicly available GTFS and GTFS Realtime inputs.

Accessing stop observations

Stop observations are now included in historical monthly RGs. To download a historical monthly RG including stop observations, make an HTTP request like so:

wget "https://api.511.org/transit/datafeeds?operator_id=RG&historic=2022-03-so&api_key=xxx"

Replace 2022-03 with a given year and month. Note the addition of -so to request the inclusion of stop observations. Note that wget is a Unix command that can be replaced by any other utility to download the URL; also note that xxx should be replaced by your own 511.org Open Data API key.
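For those scripting downloads across several months, the request URL can also be assembled in Python. This sketch only builds the URL from the parameters shown in the example request; the helper name historic_feed_url is hypothetical, and the form without the -so suffix is inferred from the description above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.511.org/transit/datafeeds"


def historic_feed_url(year, month, api_key, stop_observations=True):
    """Build the request URL for a historic monthly Regional Feed.

    The "-so" suffix requests the inclusion of stop observations.
    """
    historic = f"{year:04d}-{month:02d}" + ("-so" if stop_observations else "")
    query = urlencode({"operator_id": "RG", "historic": historic, "api_key": api_key})
    return f"{BASE_URL}?{query}"


url = historic_feed_url(2022, 3, "xxx")  # replace "xxx" with your own API key
print(url)
```

The resulting URL can then be passed to any HTTP client, for example urllib.request.urlretrieve from the Python standard library.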

Monthly RGs without stop observations are roughly 50 MB in size; with stop observations, the monthly RG is roughly 500 MB to download as a compressed ZIP archive and roughly 2 GB once uncompressed into CSV files.
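Given those sizes, it can be convenient to stream stop_observations.txt directly out of the ZIP archive rather than unpacking everything first. The following Python sketch counts observations per stop this way. It assumes the file sits at the top level of the archive and is UTF-8 encoded; the function name is hypothetical and the demo archive is built in memory with made-up rows.

```python
import csv
import io
import zipfile
from collections import Counter


def count_observations_by_stop(feed_zip, member="stop_observations.txt"):
    """Stream one CSV member of a feed archive and count rows per stop_id.

    feed_zip may be a file path or a binary file-like object. Rows are read
    one at a time, so the uncompressed file is never held in memory at once.
    """
    counts = Counter()
    with zipfile.ZipFile(feed_zip) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            for row in csv.DictReader(text):
                counts[row["stop_id"]] += 1
    return counts


# Tiny in-memory archive standing in for a downloaded monthly feed.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("stop_observations.txt", "trip_id,stop_id\nT1,S1\nT1,S2\nT2,S1\n")
buffer.seek(0)

demo_counts = count_observations_by_stop(buffer)
print(demo_counts.most_common(1))
```

With a real download, pass the path of the saved ZIP file instead of the in-memory buffer.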

Stop observations are available on a more frequent basis to staff at MTC and Bay Area transit agencies. Transit agencies may contact the 511 SF Bay Data Portal at transitdata@511.org for more information.

Using stop observations to understand transit

Public-transit agencies can be hesitant to release this type of data. In part, it’s a challenge to assemble operations data in an ongoing and consistent manner. In part, it’s a risk that the general public will take operations data out of context.

While we began this blog post by asking “When did the bus arrive at that stop yesterday?”, we hope that users of stop observations will ask an even wider range of questions of this dataset. We especially welcome questions that explore the complexity of daily operations for over 20 agencies operating buses, trains, subways, and ferries across the Bay Area, and questions that can help to improve the experience for riders using navigation apps and similar software.
