Data Streams

Posted in Technology

Our effort to encourage public transport use in Kuwait involves a data collection and management challenge. Additionally, after collecting data about bus routes, stops, depots and schedules we aim to produce some information products. Examples of these are:

  1. Maps
  2. A routing system for the bus network linking you and your destination
  3. Timetable

The aim of all this? So you can use these information products, derived from these data, to potentially reduce unnecessary car trips. For example, to determine if you can catch the bus to work or not.

So why does this post have ‘Stream’ in the title? Stream implies flow, but you might not think data usually flows. Well, it’s an analogy for the fact that we are gradually building a complete set of the data. For example, we have a stream of stop position and name data coming in. We also have a stream of scheduling data coming in. Had we not been able to simply download all the bus routes from KPTC, we would have a stream of route data as well, as people gradually traversed each route with a GPS recording a track. So, what are the data streams on this project?

  1. Stop positions
  2. Route schedules
  3. 360° imagery of all routes

Let’s describe each one, and the quality assurance and quality control (QAQC) processes used to make sure we get good data so we can derive great information products from them.

Stop Position Data Stream

The Stop Position Data Stream is the most labour intensive to collect. We have to traverse each route looking for stops, then use a GPS to record, for each one:

  1. The position of the stop
  2. The name and/or ID of the stop
  3. The status of each stop (broken, shelter or unknown denoted with codes of b, s and u respectively)
  4. Optionally, the route(s) servicing the stop, based on what the sign states
  5. Optionally, a photo of the stop although this is also captured in our 360° imaging data stream.

It can take some time to record this information at each stop, so we prefer to collect it with a driver and co-driver. Sometimes it is also unsafe to stop on the road and get all the information, so we leave the road at the next opportunity, then walk or drive back to the stop on a side road.

What tool do we use to collect data from this stream? Simply a smartphone running the My Maps app. We open our Kuwait Bus Routes map in this app and add new points in the stops layer. This is a great way to do it because the map allows everyone to see our progress in real time. We can also download the stops for further analysis, as shown in the last post. When you click a stop on the map you will see the stop ID, followed by any other data we collected, separated by commas. For example, 341,s means the stop ID is 341 and it has a shelter. Sometimes the sign or the shelter is broken, in which case we would put 341,s,b. It’s a simple system optimised for efficient data collection, as we don’t want to spend more than a few seconds at each stop.
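As a rough illustration of how the downloaded stop labels could be unpacked for analysis, here is a minimal Python sketch. It assumes the first comma-separated field is always the stop ID and that any remaining fields are the b, s and u status codes; the function name `parse_stop_label` is made up for this example and is not part of My Maps.

```python
# Status codes as described in the post: b = broken, s = shelter, u = unknown.
STATUS_CODES = {"b": "broken", "s": "shelter", "u": "unknown"}


def parse_stop_label(label):
    """Split a label such as '341,s,b' into (stop_id, [status names]).

    Assumption: the first field is the stop ID, the rest are status codes.
    """
    parts = [part.strip() for part in label.split(",") if part.strip()]
    if not parts:
        raise ValueError("empty stop label")
    stop_id = parts[0]
    codes = [code.lower() for code in parts[1:]]
    unknown = [code for code in codes if code not in STATUS_CODES]
    if unknown:
        raise ValueError(f"unrecognised status code(s): {unknown}")
    return stop_id, [STATUS_CODES[code] for code in codes]


print(parse_stop_label("341,s,b"))  # ('341', ['shelter', 'broken'])
```

Rejecting unrecognised codes, rather than silently dropping them, is one way to catch typing mistakes made in a hurry at the roadside.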

So far we have collected 355 stops along only ~5.3 routes. Given that the 999, 105, 106, 66, 602 and 24 (partial) routes have a combined length of (55+63.1+54.6+34.4+30.8+(35*0.3))=248.4km, this suggests a stop roughly every 700m. Note, however, that some of these routes overlap, so this length is overstated. Another blog post will cover what the true length is, but by eye there appears to be about 20% overlap, so the distinct length is only about 198.7km, for an average distance between stops of 560m. Given the total network length is 1433km, we can anticipate the length without overlaps is 1146.4km. Dividing that by 560m per stop means we will have to collect about 2,050 stops, or roughly 67 per route at the rate seen so far (355 stops over ~5.3 routes). With 355 stops collected, we are about 17% done with this data stream. We started collecting stops in January and it is now April. Completing 17% of the task in 4 months implies roughly 23 months in total at the current pace.
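The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the 20% overlap is the by-eye guess from the text, the function name `estimate_stops` is invented for illustration, and the expected stop count is obtained by dividing the non-overlapping network length by the average stop spacing.

```python
def estimate_stops(route_lengths_km, stops_collected, network_length_km,
                   overlap_fraction, months_elapsed):
    """Extrapolate the total stop count and time from the routes surveyed so far."""
    surveyed_km = sum(route_lengths_km)
    # Remove the share of length that is counted twice where routes overlap.
    distinct_surveyed_km = surveyed_km * (1 - overlap_fraction)
    distinct_network_km = network_length_km * (1 - overlap_fraction)
    spacing_km = distinct_surveyed_km / stops_collected
    # Number of stops = length to cover divided by distance between stops.
    expected_stops = distinct_network_km / spacing_km
    fraction_done = stops_collected / expected_stops
    return {
        "surveyed_km": surveyed_km,
        "spacing_m": spacing_km * 1000,
        "expected_stops": expected_stops,
        "fraction_done": fraction_done,
        "total_months": months_elapsed / fraction_done,
    }


# Lengths of routes 999, 105, 106, 66, 602 and 30% of route 24, as in the text.
result = estimate_stops(
    route_lengths_km=[55, 63.1, 54.6, 34.4, 30.8, 35 * 0.3],
    stops_collected=355,
    network_length_km=1433,
    overlap_fraction=0.20,  # by-eye guess, to be refined in a later post
    months_elapsed=4,
)
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

Because the same overlap fraction is applied to both the surveyed length and the network length, it cancels out of the progress figure; it only affects the estimated spacing between stops.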

Route Schedules Data Stream

Whilst stops are the most labour intensive, schedules are the most frustrating. There is no publicly available bus timetable. Bus drivers and depot staff either do not know, do not divulge, or give differing information about bus frequency. Drivers also do not keep to a schedule, because they stop anywhere along the route, which changes the time needed to complete a route compared to stopping only at stops.

Because of this, we simply ask as many people as we can about bus schedules and report what we learn to users with a disclaimer. We also stand at bus stops or along routes with a timer to measure the frequency. For example, yesterday, Friday 14th April, the 40 bus took 1 hour and 15 minutes between services past Gate Mall in Egaila. There was no other way to get this information at the time than simply standing there and waiting for two buses to go past.

Lack of comprehensive, trustworthy schedule information was the top priority issue in a survey we took of public opinion about buses in Kuwait, and our frustrations with this data stream match what the survey said. We would greatly appreciate bus companies sharing at least their planned schedules with us. Then again, given the lack of driver discipline, a timetable built from planned schedules would at present be misleading. A rough frequency per hour would still be useful, and that is what we feel is adequate for this data stream given present circumstances.
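To show how roadside timing could be turned into a rough frequency per hour, here is a small Python sketch. The clock times in the example are placeholders chosen to reproduce the 1 hour 15 minute gap observed for the 40 bus; they are not the actual times recorded, and the function names are invented for illustration.

```python
from datetime import datetime


def headways_minutes(pass_times):
    """Gaps in minutes between consecutive bus passes given as 'HH:MM' strings."""
    times = sorted(datetime.strptime(t, "%H:%M") for t in pass_times)
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(times, times[1:])]


def buses_per_hour(headways):
    """Average frequency implied by a list of headways in minutes."""
    if not headways:
        raise ValueError("need at least two observed passes")
    mean_headway = sum(headways) / len(headways)
    return 60 / mean_headway


# Placeholder times: two passes of the 40 bus, 75 minutes apart.
gaps = headways_minutes(["10:00", "11:15"])
print(gaps)                  # [75.0]
print(buses_per_hour(gaps))  # 0.8
```

A single gap is a very small sample, so a figure like this is best reported with the same disclaimer as the schedules gathered by asking people.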

360° Imagery Data Stream

You have probably heard of Google Streetview. There are at least two crowdsourced versions of this: Mapillary and OpenStreetCam. We are driving all routes with an LG 360 Cam (R 105) set to take a geotagged shot every 2 seconds. This means that at 45km/h we take a 360° image every 25m. As you can see on Mapillary when you hit the play button at the top after clicking on a route, this leads to a compelling visualisation of the streetscape. Enjoy panning all around the car!

So far we have uploaded ~13,700 spherical photos to Mapillary over a distance of 476km. Going by the rough estimate above of the network length without overlaps (1146.4km), we are 42% done with this data stream and therefore look to be finishing by about August. Fun fact: each image is about 4MB, so the total uploaded so far is about 55GB 🙂

We would prefer to upload to OpenStreetCam as it is truly open, in that one can download again all the images one has uploaded, with the coordinates attached to each image. The only thing is they’ve only recently hired someone to improve the service so it can handle spherical imagery. Once they’re done, we’ll start uploading there.
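The imagery figures follow from simple arithmetic, sketched below in Python. The inputs are the numbers quoted in the text; the function names are made up for this example, and the storage figure uses decimal gigabytes (1GB = 1000MB).

```python
def shot_spacing_m(speed_kmh, interval_s):
    """Distance travelled between shots at a constant speed."""
    return speed_kmh * 1000 / 3600 * interval_s


def upload_size_gb(image_count, mb_per_image):
    """Approximate total upload size in decimal gigabytes."""
    return image_count * mb_per_image / 1000


def fraction_covered(driven_km, network_km):
    """Share of the non-overlapping network already photographed."""
    return driven_km / network_km


print(shot_spacing_m(45, 2))                      # metres between images
print(upload_size_gb(13_700, 4))                  # GB uploaded so far
print(round(fraction_covered(476, 1146.4), 2))    # share of network covered
```

Note that 476km over ~13,700 photos works out to roughly 35m per image, which suggests the average driving speed has been somewhat above 45km/h.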

Quality Assurance and Quality Control

An example of how the 360° imagery data stream can help us is in QAQC of our stop position data. For example, this image shows a stop with a shelter. One can see that we do not currently have this stop in our map, as there is no dot on the map just after Street 19 on Cairo Street in Qadsiya. This brings up the topic of project management.
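A check like this could be partly automated. The Python sketch below tests whether any recorded stop lies within a given radius of a location where a stop was spotted in the imagery, using the haversine formula for great-circle distance. The coordinates, the 50m radius and the function names are all placeholders for illustration, not values from our dataset.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def has_stop_nearby(sighting, recorded_stops, radius_m=50):
    """True if any recorded stop is within radius_m of the sighting."""
    lat, lon = sighting
    return any(haversine_m(lat, lon, stop_lat, stop_lon) <= radius_m
               for stop_lat, stop_lon in recorded_stops)


# Placeholder coordinates, not real stop positions.
recorded_stops = [(29.3500, 47.9900)]
print(has_stop_nearby((29.3501, 47.9900), recorded_stops))  # about 11m away
print(has_stop_nearby((29.3600, 47.9900), recorded_stops))  # about 1.1km away
```

Sightings that return False would be candidates for a missing stop, to be confirmed by looking at the image and the map.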

Project Management

Having an image of a stop that is missing from our stops dataset points to a coordination problem. We will do a lot of duplicate work traversing all routes for stops and imagery if we do not collect both in one pass. Essentially, it could be done in about 1000km of driving or double that. It could be done in 4 months or double that. This raises the issues of project management, a programme of works and a critical path. These topics will be covered in future blog posts about managing this project effectively. So where do we stand after 4 months of the current project management approach? What do our information products look like?

Information Products

So far we have produced two information products: a map with the routes and stops on it, and 360° imagery for 400+km of Kuwait’s roads. The map is at 11,000+ views and is 2nd in Google search results for ‘Kuwait bus routes’. This is not bad: after 4 months of effort we are second only to a very simple blog post about bus routes which has been around for years. The imagery is also a compelling product in terms of helping people to see where routes go. For example, they can use Mapillary to check whether a shop they want to visit by bus is actually visible from the street, and what it would look like from the road. There is still a big problem, however, which is that we need to make a mashup of these two information products so that the imagery can function from within our route map. This will be discussed in future posts about our ambitions for building an application from these data streams so that we have a truly useful, integrated bus route information product for Kuwait.
