Some thoughts on the Myki data leak

This, the other week, was interesting:

In a concerning revelation, researchers have found that myki, in conjunction with social media, can be used to uncover a wealth of information about card users.

ABC: ‘Shocking’ myki privacy breach for millions of users in data release

Here’s the report and media release from the Office of the Victorian Information Commissioner:

Information Commissioner investigates breach of myki users’ privacy

Here’s the original study:

Two data points enough to spot you in open transport records

What happened was that PTV released a whole bunch of Myki touch on/off data for a “datathon” event, where people see what handy things they can do with the data.

It was “de-identified” – that is, Myki card numbers were removed and replaced with another identifier, which could link trips from a single card together, but not back to a card holder.
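To make that concrete, here is a minimal sketch of this kind of de-identification. It is not PTV's actual process, which I haven't seen: the function name, the card numbers, the stations and the timestamps are all invented for illustration.

```python
import secrets

def pseudonymise(touches):
    """Replace each card number with a random ID that is stable per card.

    Hypothetical sketch: the mapping is kept only while processing, so the
    released rows cannot be turned back into card numbers by lookup.
    """
    mapping = {}   # card number -> random pseudonym, discarded afterwards
    released = []
    for card_number, station, timestamp in touches:
        if card_number not in mapping:
            mapping[card_number] = secrets.token_hex(8)
        released.append((mapping[card_number], station, timestamp))
    return released

# Invented sample touches: (card number, station, time)
touches = [
    ("CARD-A", "Rosanna", "08:01"),
    ("CARD-B", "Ivanhoe", "08:03"),
    ("CARD-A", "Parliament", "08:34"),
]
released = pseudonymise(touches)
```

The card numbers are gone, but both CARD-A touches still share one identifier. That linkage is what makes the data useful for analysis, and it is also what lets someone follow a single card through the whole data set once they've worked out whose it is.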

Or so they thought.

Part of the problem was they left in a flag indicating the card type. This is not just Full Fare (Adult) or Concession – it goes down to the precise type of Concession or free pass. For instance type 39 is a War Veterans Travel Pass; type 46 is a Federal Police Travel Pass.

With more than 70 types of card, some of the more obscure types are pretty rare, so if the person you’re trying to track down is using one of them, they’re probably not that hard to find, particularly if you know which stations they regularly use.
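Here's a rough sketch of the kind of check that would flag this before release: count how many distinct cards there are of each type, and list the types with very few. The record layout, the sample rows and the threshold are my assumptions, not the real data format. Types 39 and 46 are the ones mentioned above; type 1 is just a stand-in for a common card type.

```python
from collections import defaultdict

def cards_per_type(records):
    """Count distinct pseudonymous IDs for each card type."""
    cards = defaultdict(set)
    for pseudo_id, card_type, _station in records:
        cards[card_type].add(pseudo_id)
    return {card_type: len(ids) for card_type, ids in cards.items()}

def rare_types(records, threshold):
    """Card types held by fewer than `threshold` distinct cards."""
    counts = cards_per_type(records)
    return sorted(t for t, n in counts.items() if n < threshold)

# Invented sample rows: (pseudonymous ID, card type, station)
records = [
    ("id-a", 1, "Flinders Street"),
    ("id-b", 1, "Rosanna"),
    ("id-c", 1, "Rosanna"),
    ("id-a", 1, "Parliament"),
    ("id-d", 39, "Rosanna"),
    ("id-e", 46, "Southern Cross"),
]

risky = rare_types(records, threshold=2)
```

In this toy data, types 39 and 46 each belong to a single card, so anyone who knows a person holds that pass has effectively found them. Dropping the card type, or collapsing it to something coarse like Full Fare, Concession or Free, would shrink that risk.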

That’s presumably how the researchers found Anthony Carbines, State MP for Ivanhoe, who I’m guessing was travelling on a State Parliamentarian Travel Pass: by looking at the data and matching it up with his social media posts, which included at least one from Rosanna Station.
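The matching itself doesn't need anything clever. Below is a sketch of the general idea, not the researchers' actual method: take a few known sightings of a person (a station and a rough time, say from a social media post) and keep only the pseudonymous IDs that touched at that station around that time, every time. All of the names, dates and the ten-minute window are made up for illustration.

```python
from datetime import datetime, timedelta

def candidates(records, sightings, window=timedelta(minutes=10)):
    """Return the pseudonymous IDs with a touch near every known sighting."""
    remaining = None
    for station, seen_at in sightings:
        matches = {
            pid
            for pid, touched_station, touched_at in records
            if touched_station == station and abs(touched_at - seen_at) <= window
        }
        # Intersect: an ID must be consistent with all sightings so far.
        remaining = matches if remaining is None else remaining & matches
    return remaining or set()

# Invented sample rows: (pseudonymous ID, station, touch time)
records = [
    ("id-1", "Rosanna", datetime(2018, 7, 2, 8, 1)),
    ("id-2", "Rosanna", datetime(2018, 7, 2, 8, 4)),
    ("id-1", "Parliament", datetime(2018, 7, 4, 17, 40)),
    ("id-3", "Parliament", datetime(2018, 7, 4, 17, 42)),
]

# Two things we "know" about the person from public posts (also invented).
sightings = [
    ("Rosanna", datetime(2018, 7, 2, 8, 0)),
    ("Parliament", datetime(2018, 7, 4, 17, 45)),
]

found = candidates(records, sightings)
```

Each sighting on its own matches two cards here, but only one card matches both. Filter by a rare card type first and the pool of candidates is tiny before you even start.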

I’m probably in there too. And so are you. (I’ve only seen a sample of the data; a mere 30 million card touch records out of the total 1.8 billion originally released.)

Myki machines at Southern Cross

Ultimately, it’s good that data sets like this are released. There should actually be a lot more of them – at present, the data released by PTV is very limited. Anything related to patronage or bus service performance is really difficult to find.

Perhaps the reason the data wasn’t adequately cleaned is that they’re out of practice. Almost everything currently available either has nothing to do with passengers directly, or is at such a high level that it could never be used to find individuals.

More data should be out there. Ultimately, the public transport network is funded by taxpayers, and it should be a lot more accountable and transparent than it is.

One thing’s for sure: if they have a go at releasing this level of detailed data again – and I hope they do – they’ll need to be more careful to remove information that could be used to re-identify individuals.


“data from all Vic govt agencies will now be supplied in a machine-readable format” – PT timetables expected mid 2013

Back in 2010, Victorian government timetable data was released to the public, as part of the App My State competition.

The PTUA submitted an app as part of a study that showed how bad train/bus connections were, which got some media attention — and also managed to progress the debate around connections: the government went from denial to excuses.

Predictably it didn’t win a prize in the competition, and the timetable data was subsequently pulled, and never updated or put back. Could it be they weren’t very impressed at the data being used to embarrass the government?

Timetable data

Data releases then seemed to go pretty quiet until after the Coalition came to power. Then:

“As a default position, data from all Victorian government agencies will now be supplied in a machine-readable format”

Govt press release: Coalition to unleash the economic power of Vic data

Good news… But where’s the public transport stop and timetable data then?

Still not available — in fact the blurb provided hasn’t been updated since Metlink was subsumed into PTV:

As the Victorian Government is currently evaluating the arrangements for release of public sector information under the Creative Commons licence, any requests for train, tram and bus route, stop and timetable data must still be made directly to Metlink Victoria Pty Ltd, the custodians of public transport data on behalf of the Director of Public Transport. Each request will be assessed on its merits.

Hey gunzels, New #PTV logos on tram 2049.

As it happens, information flying around from multiple sources says PTV are now aiming to have timetable data released in mid-2013, in a format that allows not only Google Transit, but other developers (including small independent ones) to use it too.

This will be a good thing.

As Gordon Rich-Phillips said in the press release:

“In driving the release of useable, high-quality data, these new policies will stimulate significant innovation and economic activity, creating a platform on which to develop new technologies, new services and ultimately, new jobs.”