This news from the other week was interesting:
“In a concerning revelation, researchers have found that myki, in conjunction with social media, can be used to uncover a wealth of information about card users.”
ABC: ‘Shocking’ myki privacy breach for millions of users in data release
Here’s the report and media release from the Office of the Victorian Information Commissioner:
Here’s the original study:
What happened was that PTV released a whole bunch of Myki touch on/off data for a “datathon” event, where people see what handy things they can do with the data.
It was “de-identified” – that is, Myki card numbers were removed and replaced with another identifier, which could link trips from a single card together, but not back to a card holder.
Or so they thought.
Part of the problem was that they left in a flag indicating the card type. This is not just Full Fare (Adult) or Concession – it goes down to the precise type of Concession or free pass. For instance, type 39 is a War Veterans Travel Pass; type 46 is a Federal Police Travel Pass.
With more than 70 types of card, some of the more obscure types are pretty rare, so if the person you’re trying to track down is using one of them, they’re probably not that hard to find, particularly if you know which stations they regularly use.
That’s presumably how the researchers found Anthony Carbines, State MP for Ivanhoe, who I’m guessing travels on a State Parliamentarian Travel Pass: they looked at the data and matched it up with his social media posts, which included at least one from Rosanna Station.
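To make the idea concrete, here is a rough Python sketch of that kind of matching. It is not the researchers' actual method, and the record layout, identifiers, stations and dates are all made up for illustration (only the type 46 code comes from the list above; treating type 1 as a common full fare card is an assumption). It keeps the cards of a given type that were touched at every place and date where the person is known to have been.

```python
from collections import defaultdict

# Hypothetical touch records: (pseudonymous id, card type, station, date).
# Type 1 is assumed here to be a common full fare card; 46 is a rare pass.
touches = [
    ("p1", 1, "Station A", "2018-07-02"),
    ("p2", 1, "Station A", "2018-07-02"),
    ("p3", 46, "Station A", "2018-07-02"),
    ("p3", 46, "Station B", "2018-07-03"),
    ("p4", 46, "Station C", "2018-07-02"),
    ("p1", 1, "Station B", "2018-07-03"),
]


def candidates(touches, card_type, sightings):
    """Return ids of cards of this type seen at every (station, date) sighting."""
    seen = defaultdict(set)
    for pid, ctype, station, date in touches:
        if ctype == card_type:
            seen[pid].add((station, date))
    # A card is a candidate only if its touches cover all known sightings.
    return sorted(pid for pid, places in seen.items() if sightings <= places)


# One known sighting, e.g. from a public social media post (made up here).
one_sighting = {("Station A", "2018-07-02")}
print(candidates(touches, 1, one_sighting))   # two common cards still match
print(candidates(touches, 46, one_sighting))  # only one rare card matches
```

In this toy data, a single sighting leaves two candidates among the common cards but only one among the rare type, which is the whole problem in miniature.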
I’m probably in there too. And so are you. (I’ve only seen a sample of the data; a mere 30 million card touch records out of the total 1.8 billion originally released.)
Ultimately, it’s good that data sets like this are released. There should actually be a lot more of them – at present, the data released by PTV is very limited. Anything related to patronage or bus service performance is really difficult to find.
Perhaps the data wasn’t adequately cleaned because they’re out of practice. Almost everything currently available either has nothing to do with passengers directly, or is at such a high level that it could never be used to find individuals.
More data should be out there. Ultimately, the public transport network is funded by taxpayers, and it should be a lot more accountable and transparent than it is.
One thing’s for sure: if they have a go at releasing this level of detailed data again – and I hope they do – they’ll need to be more careful to remove information that could be used to re-identify individuals.
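As one possible illustration of what "more careful" might look like, here is a small Python sketch that collapses the detailed card types into broad categories and checks how small the smallest group is before and after. The card list, the category names and the assumption that type 1 means full fare are all hypothetical, and this is not a description of anything PTV or the Commissioner has proposed.

```python
from collections import Counter

# Hypothetical mapping of pseudonymous id -> detailed card type.
cards = {"p1": 1, "p2": 1, "p3": 46, "p4": 39, "p5": 2, "p6": 1}

# Assumption for illustration: type 1 is the only full fare type.
FULL_FARE_TYPES = {1}


def coarsen(card_type):
    """Collapse a detailed card type into one of two broad categories."""
    if card_type in FULL_FARE_TYPES:
        return "full_fare"
    return "concession_or_pass"


def smallest_group(labels):
    """Return the size of the rarest label, i.e. the worst case for privacy."""
    return min(Counter(labels).values())


print(smallest_group(cards.values()))                      # detailed types
print(smallest_group(coarsen(t) for t in cards.values()))  # broad categories
```

With the detailed types, some cards are the only one of their kind; after coarsening, every card shares its label with at least two others. Coarsening the card type alone probably would not be enough, though, since a person's regular stations can also be distinctive.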