"Hacker Games Pro: Open Data" Hackathon Impressions

Last weekend I had great fun participating in an Open Data hackathon organized by Startup Lithuania and Vilnius Tech Park.

The basic premise of the event is to match hackers with organizations that have huge datasets but do not necessarily know what to do with them. Participating organizations included Vilnius Municipality, State Social Insurance Board, Centre of Registry, TRAFI, and others.

A summary of the event was published by one of the news publishers (in Lithuanian).

Today's post is not really about what it was like to participate in a hackathon (the usual lack of sleep and so on) but about the general observations on open data that I came away with.

Challenges

Whilst the vision of the hackathon is great and many people attended the initial pitches, in the end only 4 teams remained. In contrast, previous games-themed hackathons held in the same place were overflowing with people. I see several reasons for this difference.

First, whilst the enthusiasm for use cases of open data is high, in reality it's hard to figure out how to extract real value out of it.

Second, even if you think you have a good idea, after talking to the representatives of the organization that provided the datasets you may find out that they already do what you had in mind. This happened to our team.

Third, even if you have a decent idea, the implementation will most likely be really tricky and require a level of competency that relatively few people have yet.

The final showstopper is that the data provided by the participating organizations, at least the ones whose datasets we were interested in, was simply too messy and too hard to find.

As an example, my team was thinking of building a location-based analytics application that would help evaluate the suitability of a given location for your future business. We identified the required datasets with a Centre of Registry representative who, by the way, turned out to be really helpful and insightful. However, when we actually got to work that night, we could not find any of the datasets that we had talked about. There were too many folders, Excel sheets, and tabs within them, most with obscure titles that took too long to decipher. There were some descriptions, but not enough, and in the end we just gave up on the idea, at least for now.

As a result, our team went separate ways and I found myself with another team, creating a statistics visualization tool for which I helped implement the interactive features. The tool is non-commercial and is intended to be taken over by Vilnius Lyceum students so they can gain experience developing products with real usage. The final product that we made is called Lithuanian Numbers, and for now it summarizes various demographic statistics. With more data it could outgrow being a toy and become useful to journalists, researchers, and generally curious lurkers.

Future vision

Since open data hackathons are still new, both hackers and organizations are still struggling to find a good fit and good use cases. I envision that in the future more hackers will become skilled at data analytics, data science, or whatever you want to call it, and thus will become capable of extracting more value out of open data.

At the same time, organizations will get better over time at opening their datasets to the public. The data will become much easier to access and its attributes easier to understand. In cases where the data is too big to simply open to everyone (terabytes and much more), an organization could provide samples of the datasets that are enough for prototyping and exploring ideas, and then provide the entire dataset on demand if needed.
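To make the sampling idea a bit more concrete, here is a minimal sketch of how a small, uniformly random sample could be cut from a CSV export that is too large to load in one go. It uses reservoir sampling from Python's standard library only. The function name sample_rows, the column names, and the generated demo data are all made up for illustration; none of this comes from any of the participating organizations.

```python
import csv
import io
import random


def sample_rows(lines, k, seed=0):
    """Return (header, rows): the CSV header and up to k uniformly sampled rows.

    Reads the input one row at a time (reservoir sampling), so the full
    dataset never has to fit in memory.
    """
    reader = csv.reader(lines)
    header = next(reader)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing row with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = row
    return header, reservoir


# Hypothetical stand-in for a large export: 1000 generated rows.
full = io.StringIO("id,value\n" + "".join(f"{n},{n * n}\n" for n in range(1000)))
header, sample = sample_rows(full, k=5)
print(header)
for row in sample:
    print(row)
```

In practice you would pass an open file object instead of the in-memory demo data. If the file has fewer than k rows, the sketch simply returns all of them.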

Currently, we are experiencing a sort of chicken-and-egg scenario. Hackers rarely know right away what to build, and they are unable to inspect the data to find out. And the data is not available because organizations do not know what will be useful and, at least for now, are incapable of quickly opening everything up in a convenient way.

Hackers and open data

Once the chicken-and-egg problem gets solved, we can expect many useful applications of open data to spring up that would, for example, help cities make data-driven decisions or enable you to solve your own problems and, by doing so, solve the same problem for others. However, as with the chicken and the egg, we might never know which came first: hackers that can deal with super messy datasets or organizations that made the data more convenient to work with.

Having said that, there are lots of cool things you can already do now if you have the patience. As a concrete example, one of the teams worked on an app for finding electric car charging stations. The team lead owns an electric car himself and thus is painfully aware of the logistical challenges that electric car owners face in Lithuania. By using open data he can solve his own problem of finding charging stations and instantly make other drivers' lives easier too. If that's not cool, I don't know what is.

For now, though, I think we are still at a very early stage of what is possible, but I see a bright future which we now all have a chance to shape together!