Sydney new public transport timetable
Transport Data Exchange (TDX) program
You can download current timetable (and traffic) from TDX. Registration is required.
General Transit Feed Specification (GTFS)
GTFS contains several files saved as comma-delimited text and properly escaped i.e. CSV. File encoding should be UTF-8 and BOM is acceptable. The minimal feed consist of 6 files:
File | Defines |
---|---|
agency.txt | One or more transit agencies that provide the data in this feed |
stops.txt | Individual locations where vehicles pick up or drop off passengers |
routes.txt | Transit routes. A route is a group of trips that are displayed to riders as a single service |
trips.txt | Trips for each route. A trip is a sequence of two or more stops that occurs at specific time |
stop_times.txt | Times that a vehicle arrives at and departs from individual stops for each trip |
calendar.txt | Dates for service IDs using a weekly schedule. Specify when service starts and ends, as well as days of the week where service is available |
GTFS dataset structure
// you can use different engines to render chunks
// see http://yihui.name/knitr/demo/engines/
graph gtfs {
node [shape = box]
subgraph cluster_1 {
label = "Transport Data Exchange (TDX) program"
agency -- routes [label = "agency_id", constraint=false]
routes -- trips [label = "route_id"]
trips -- stop_times [label = "trip_id"]
stop_times -- stops [label = "stop_id"]
shapes -- trips [label = "shape_id"]
calendar -- trips [label = "service_id"]
calendar_dates -- trips [label = "service_id"]
}
stops -- transfers [label = "stop_id"]
fare_attributes -- fare_rules [label = "fare_id"]
fare_rules -- routes [label = "route_id"]
}
General Transit Feed Specification (GTFS) is documented at Google Developers web site.
BOM (byte-order-mark)
If your imported data set has the first column beginning with 'X.' then the input file contains BOM. Fortunately, this was fixed in R 3.0:
readLines() and scan() (and hence read.table()) in a UTF-8 locale now discard a UTF-8 byte-order-mark (BOM). Such BOMs are allowed but not recommended by the Unicode Standard: however Microsoft applications can produce them and so they are sometimes found on websites.The encoding name “UTF-8-BOM” for a connection will ensure that a UTF-8 BOM is discarded.
make.names("\357\273\277stop_id")
## [1] "X.stop_id"
In older versions of R you need to rename the first column or remove BOM manually
Extent of Greater Sydney public transport area
GTFS dataset must have stops.txt file which contains stop location (WGS84), name and several other attributes. Let's read the previous timetable:
old <- gtfs.load("data/Oct2012")
stop_id | stop_code | stop_name | stop_lat | stop_lon | location_type | parent_station | wheelchair_boarding |
---|---|---|---|---|---|---|---|
254122 | 254122 | Worrigee House Reception Centre Worrigee Rd | -34.91 | 150.6 | NA | NA | 0 |
254112 | 254112 | Nowra High School Moss St | -34.87 | 150.6 | NA | NA | 0 |
254172 | 254172 | Princes Hwy Before Central Av | -34.91 | 150.6 | NA | NA | 0 |
254162 | 254162 | Princes Hwy Near Hillcrest Av | -34.90 | 150.6 | NA | NA | 0 |
254142 | 254142 | Illaroo Road Ps | -34.86 | 150.6 | NA | NA | 0 |
2540317 | 2540317 | Park Row Near Flora St | -34.91 | 150.8 | NA | NA | 0 |
and the new timetable:
new <- gtfs.load("data/Oct2013")
stop_id | stop_code | stop_name | stop_lat | stop_lon | location_type | parent_station | wheelchair_boarding |
---|---|---|---|---|---|---|---|
254122 | 254122 | Worrigee House Reception Centre Worrigee Rd | -34.91 | 150.6 | NA | NA | 0 |
254112 | 254112 | Nowra High School Moss St | -34.87 | 150.6 | NA | NA | 0 |
254172 | 254172 | Princes Hwy Before Central Av | -34.91 | 150.6 | NA | NA | 0 |
254162 | 254162 | Princes Hwy Near Hillcrest Av | -34.90 | 150.6 | NA | NA | 0 |
254142 | 254142 | Illaroo Road Ps | -34.86 | 150.6 | NA | NA | 0 |
2540317 | 2540317 | Park Row Near Flora St | -34.91 | 150.8 | NA | NA | 0 |
The sp package has gConvexHull function which returns the smallest convex polygon that contains all subgeometries, in our case all transit stops.
polygons <- rbind(
data.frame(gtfs.bbox(new, "convex"), type = "new"),
data.frame(gtfs.bbox(old, "convex"), type = "old")
)
Apparently, public transport coverage remains unchanged under the new time table.
# get base map
# base map is 640x640 (Google API limits) and is centered at [mean(x), mean(y)] and zoom level is 7
base <- with(polygons, get_map(paste(mean(y), mean(x), sep = " "), zoom = 7, color = "bw"))
# zoom in and keep margins around data points &uot; polygons
bbox <- coord_map(
xlim = extendrange(polygons$x),
ylim = extendrange(polygons$y)
)
# final map: bounding rectangles and transit stops
ggmap(base) +
geom_path(data = polygons, aes(x, y, colour = type), size = 1) +
scale_color_manual(values = c("blue", "red")) +
geom_point(data = new[["stops"]], aes(x = stop_lon, y = stop_lat), size = 0.5, colour = "black") + # transit stops in black
bbox +
ggtitle("Greater Sydney public transport area")
Network overview
Shape data can be used to display densidity of public transport across all routes. Shape file mirros roads and tracks.new.segments <- compute.segments(new[["shapes"]])
g <- ggplot(data = new.segments) +
geom_segment(aes(x = x1, xend = x2, y = y1, yend = y2, colour = cut(count,
breaks = c(0, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000),
labels = c("1", "10", "20", "50", "100", "200", "500", "1000", "2000")
))) +
scale_colour_brewer(name = "Services", palette = "PuBuGn", guide = "legend")
g + bbox + ggtitle("Network overview\n(new timetable)")
and Inner Sydney area:
g + coord_equal(xlim = c(151, 151.35), ylim = c(-33.95, -33.75)) +
ggtitle("Network overview\n(new timetable)")
Basic network statistics
Transit Capacity and Quality of Service Manual TCQSM, 3rd edition provides instructions on how to quantitatively review everything from the person-capacity of a rail transit line to the quality of a bus stop. Most of the calculations are complex and in many cases external datasets (e.g. census data) are required to perform those calculations.
GTFS dataset provided by TDX programme is rather a shapshot rather than a complete timetable. Hence, only single workday from each timetable (old and new) is analysed. The following days are used:
- old timetable: 2013-10-09
- new timetable: 2013-11-06
old.singleday <- gtfs.trips(old, as.POSIXct("2013-10-09"))
new.singleday <- gtfs.trips(new, as.POSIXct("2013-11-06"))
latlon <- old[["stops"]][, c("stop_id", "stop_lat", "stop_lon")]
# add location info to transit stops
old.singleday <- merge(old.singleday, latlon)
old.singleday$time <- with(old.singleday,
truncDiffTime(min(arrival_time, na.rm = T), arrival_time) %% 86400
)
old.singleday <- old.singleday[!is.na(old.singleday$arrival_time), ]
new.singleday <- merge(new.singleday, latlon)
new.singleday$time <- with(new.singleday,
truncDiffTime(min(arrival_time, na.rm = T), arrival_time) %% 86400
)
new.singleday <- new.singleday[!is.na(new.singleday$arrival_time), ]
# http://www.csgnetwork.com/degreelenllavcalc.html
# Selected Specific Latitude = -33.8230854940154
# 1deg lat -> 110 919.17m
# 1deg lon -> 92 575.84m
# this give us 4km hex bins
lat.bin.width <- 4000 / 110919.17
lon.bin.width <- 4000 / 92575.84
xbnds <- c(
floor(min(polygons$x) / lon.bin.width) * lon.bin.width,
ceiling(max(polygons$x) / lon.bin.width) * lon.bin.width
)
ybnds <- c(
floor(min(polygons$y) / lat.bin.width) * lat.bin.width,
ceiling(max(polygons$y) / lat.bin.width) * lat.bin.width
)
registerDoMC() # 4 cores, takes ~ 2 x 20 minutes
old.bins <- compute.hex.bins(old.singleday, xbnds, ybnds, lat.bin.width, lon.bin.width)
new.bins <- compute.hex.bins(new.singleday, xbnds, ybnds, lat.bin.width, lon.bin.width)
new.old.bins <- with(
merge(old.bins, new.bins, by = c("ID", "time"), all = T, suffixes = c(".old", ".new")),
data.frame(
ID = ID,
time = time,
count.old = ifelse(is.na(count.old), 0, count.old),
count.new = ifelse(is.na(count.new), 0, count.new),
count.diff = ifelse(is.na(count.new), 0, count.new) - ifelse(is.na(count.old), 0, count.old),
lat = ifelse(is.na(y.old), y.new, y.old),
lon = ifelse(is.na(x.old), x.new, x.old),
change.old.new = interaction(is.na(y.old), is.na(y.new))
)
)
Sydney transport network has three modes: bus, train, and ferry.
Old timetable | New timetable | |
---|---|---|
Number of trips | 42076 | 49123 |
- Bus | 38548 | 45300 |
- Train | 2776 | 2901 |
- Ferry | 752 | 922 |
Longest trip | ||
- Bus | Sydney to Port Stephens | Sydney to Port Stephens |
- Train | Blue Mountains Line | Blue Mountains Line |
- Ferry | Eco Hopper | Eco Hopper |
ggplot(data = old.summary, aes(y = travel_time, x = distance)) +
geom_point(aes(colour = route_type), alpha = 0.7) +
geom_abline(intercept = 0, slope = 3600/31, linetype = "dotted") + # AM peak
geom_abline(intercept = 0, slope = 3600/42, linetype = "dotted") + # PM peak
scale_y_continuous(breaks = seq(0, 5, 1) * 3600, labels = seq(0, 5)) +
xlim(0, 250) +
facet_grid(route_type ~ .) +
ylab("Travel time [hours]") +
xlab("Distance [km]") +
ggtitle("Old timetable\n(cars: AM peak: 31km/h, PM peak: 42km/h)")
ggplot(data = new.summary, aes(y = travel_time, x = distance)) +
geom_point(aes(colour = route_type), alpha = 0.7) +
geom_abline(intercept = 0, slope = 3600/31, linetype = "dotted") +
geom_abline(intercept = 0, slope = 3600/42, linetype = "dotted") +
scale_y_continuous(breaks = seq(0, 5, 1) * 3600, labels = seq(0, 5)) +
xlim(0, 250) +
facet_grid(route_type ~ .) +
ylab("Travel time [hours]") +
xlab("Distance [km]") +
ggtitle("New timetable\n(cars: AM peak: 31km/h, PM peak: 42km/h)")
Public transport availability during a workday
# collect all images and create animation
opts_chunk$set(animation.fun = hook_scianimator)
values <- unique(c(seq(min(new.old.bins$count.diff), 0, length.out = 6), seq(0, max(new.old.bins$count.diff), length.out = 6)))
mid = abs(min(new.old.bins$count.diff) / max(new.old.bins$count.diff))
rescaled <- unique(c(
rescale(seq(min(new.old.bins$count.diff), 0, length.out = 6), c(0, mid)),
rescale(seq(0, max(new.old.bins$count.diff), length.out = 6), c(mid, 1))
))
pal <- brewer.pal(11, "RdYlBu")
pal[5] <- brewer.pal(9, "Greens")[4]
# display.brewer.all()
g <- scale_fill_gradientn(
limits = range(values),
colours = pal,
values = rescaled,
breaks = round(values, -2),
labels = round(values, -2),
name="Difference in available services\n(reduced .. same .. increased)"
)
# l_ply can be used here as well
anim <- llply(sort(unique(new.old.bins$time)), function(t, d, grad) {
ggmap(base) +
geom_hex(data = d[d$time == t, ], stat="identity", aes(x = lon, y = lat, fill = count.diff), alpha = 0.8) +
grad + bbox +
ggtitle(format(trunc(Sys.time(), "day") + t, "Public transport availability at %H:%M")) +
theme(legend.position="bottom")
}, d = new.old.bins, grad = g)
print(anim)