Human behaviour, observed one trillion times

It’s a mistake to think the internet is simply data; it is a living, breathing manifestation of our most human actions and motivations – if we know how to look. With Associate Professor Paul Raschky.

Election-rigging. Corruption. Sleep deprivation. It’s all revealed, every time we log on and off.

“We can infer human behaviour from internet behaviour,” says Associate Professor Paul Raschky.

He is part of a unique data analysis project that starts with one trillion anonymous observations of internet use and reveals the all-too-human machinations that lie beneath it.

With his colleagues Associate Professor Simon Angus and Dr Klaus Ackermann, Associate Professor Raschky has analysed events such as the 2018 Russian Presidential election. What they saw was a disquieting picture of how a democratic process can be slyly manipulated.

But the data shows other things: how the corruption of local officials in Africa literally results in a luminous glow; who gets the best sleep in the world; and where to send emergency relief following a hurricane.

Read the transcript

Michael Pascoe:  Hello, I’m Michael Pascoe. Welcome to Thought Capital, the podcast that delves into the wealth of ideas created by the experts at Monash Business School in Melbourne, Australia.

Paul Raschky:   We have over a trillion observations, which sets us apart from a lot of data sets that are used in particular in social sciences and economics.

Michael Pascoe:   This podcast looks at the very big picture of big data.

Paul Raschky:      You cannot handle this amount of data on a normal desktop computer.

Michael Pascoe:   No, not the usual cliches of what Facebook is doing with your personal information, or the discount store knowing a woman is pregnant before she tells her family.

Michael Pascoe:   We’re going bigger than that, with Professor Paul Raschky, from the Department of Economics.

Michael Pascoe:   Paul, you use large amounts of data for conducting your research, I’m going to let you define big data first.

Paul Raschky:  Thank you Michael for having me. One definition of big data is just a large number of observations that you store in very big computers. You don’t use just standard statistical methods, but more modern statistical methods.

Paul Raschky:  We have over a trillion observations, which sets us apart from a lot of data sets that are used in particular in social sciences and economics.

Paul Raschky:    You cannot handle this amount of data on a normal desktop computer. You just need bigger computational capacity, so we use the Australian Synchrotron’s computing power for that.

Michael Pascoe:  To get my simple one bit at a time mind around this, when you’re dealing with the trillions. You’ve been looking at Internet flows in Russia for example, leading up to the election.

Vladimir Putin:  Russia! Russia!

Michael Pascoe:  What exactly are you looking at? Are you out-Russianing the Russians?

Paul Raschky:  We’re looking at two things. The first thing is we look at online-offline activity. So in each of the Russian districts, or oblasts, we selected a large number of representative IP addresses and looked just at whether these are online or offline over a certain period of time within a day.

Paul Raschky:  So we started observing the Russian Internet space in early December, and observed the space on a daily basis at hourly intervals up to after the elections.

Paul Raschky:  So we have an ongoing project which is called the IP Observatory, and the goal there is to observe internet availability and internet quality during broadly speaking, critical events.

Paul Raschky:   So critical events could be elections, they are important for democracy, where a functioning Internet is very important because voters have to make informed decisions. That’s one critical event.

Paul Raschky:   Another critical event are natural disasters. But coming back to the elections we initially started off by monitoring the Internet during the Turkish constitutional referendum in 2017. Prior to that referendum, there was already anecdotal evidence that there could be disruptions to the Internet availability. So we decided to start with that.

Paul Raschky:  And then we went on and looked at the elections following that, which were the presidential elections in Iran, and after that came the Russian elections.

Paul Raschky:  We have already … Research done, my colleague Klaus Ackermann, who is a former PhD student of ours, has already analysed internet behaviour and internet activity during the previous Russian elections, and he found systematic evidence that there was tampering with the Internet, especially in districts which are more supportive of opposition.

Michael Pascoe:  Tampering with the internet means you collect data from all districts in Russia and observe if the speed or availability changes over time within and between districts. There could be a lot of natural causes why the Internet speed might go down, especially in a country like Russia where the infrastructure of the internet is not as solid as in some other countries.

Michael Pascoe:  But looking at systemic changes over time allows you to get a bigger picture of deviations. So far, you’ve looked at Internet speeds during three elections, Turkey, Iran and Russia. What did you find?

Paul Raschky:   So far we did not find any evidence of systematic tampering with the Internet infrastructure, that’s what we analysed. We analysed the latency, so that’s the … You can see the speed, and the general availability of the Internet.

Paul Raschky:  Now in the case if people said look, there’s a lot of evidence that the incumbent regimes tamper, or try to distort the internet, or at least the information flow on the Internet. What we can observe is that in Turkey, in Iran and in Russia, people block specific websites.

Paul Raschky:  Given that we only have a study from Russia that we conducted, we think that the regime just became more sophisticated in the way they tampered with the Internet, or they tried to influence the Internet and distort the information flow.

Paul Raschky:  Instead of just shutting down the internet in an entire region which is very costly. Costly in the means that if you shut down the infrastructure, you’re not just disrupting the information flow from the opposition politicians for example, but you disrupt the infrastructure for local businesses. So, it’s a costly thing.

Paul Raschky:   Now, you can do that more effectively and less costly by just shutting down particular websites, or you just shut down the access of all Russians to, let’s say Facebook, or WhatsApp, or things like that, and these things happen.

Michael Pascoe:  When there’s a protest in China you can suddenly be without your social media that sort of thing?

Paul Raschky:   Yes.

Michael Pascoe:  I’m still trying to get my head around how you actually do it. You’re sitting here in Melbourne with your finger on Russia’s Internet, Turkey’s Internet, Iran’s Internet, through a technology which is really available, you are making available, or what?

Paul Raschky:   So we have developed a technology that allows us to monitor the Internet basically in real time. We scan millions of items on the Internet, we have no information about the content. We only know when a router or a computer’s online or offline. That’s what we know. And we know how fast the connection is of this computer.

Paul Raschky:   So in a sense, what we measure is basically the quality of the Internet infrastructure, as opposed to the content. What we make available is the result of that, or the results of that scan.

Paul Raschky:  So in an aggregated form, the aggregated … Let’s say District area, it’s not possible that you identify individual computers.
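Editor's illustration: the aggregation described here can be sketched in a few lines of Python. This is a minimal sketch and not the IP Observatory's actual pipeline. The record layout (district, hour, online flag, latency in milliseconds) and the function name `aggregate_by_district` are assumptions made for the example. It shows how per-device probe results can be reduced to district-level summaries so that no individual computer appears in the published output.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_district(observations):
    """Collapse per-device probes into per-(district, hour) summaries.

    `observations` is an iterable of tuples
    (district, hour, is_online, latency_ms), where latency_ms may be None
    when the device did not answer. This layout is hypothetical.
    """
    buckets = defaultdict(list)
    for district, hour, is_online, latency_ms in observations:
        buckets[(district, hour)].append((is_online, latency_ms))

    summary = {}
    for key, rows in buckets.items():
        # Only devices that answered contribute a latency value.
        latencies = [lat for online, lat in rows if online and lat is not None]
        summary[key] = {
            "share_online": sum(1 for online, _ in rows if online) / len(rows),
            "mean_latency_ms": mean(latencies) if latencies else None,
            "n_probes": len(rows),
        }
    return summary


if __name__ == "__main__":
    # Made-up probe results, for illustration only.
    probes = [
        ("district_a", 0, True, 100.0),
        ("district_a", 0, False, None),
        ("district_a", 0, True, 200.0),
        ("district_b", 0, False, None),
    ]
    for key, stats in aggregate_by_district(probes).items():
        print(key, stats)
```

Only the district-level share of devices online and the average latency leave the function, which mirrors the point that the content of the traffic is never observed.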

Michael Pascoe:  And what is the end point of that?

Paul Raschky:   Free, undistorted Internet access is a human right, that’s one thing. It’s basically this provision of information to a general public that our project aims at.

Paul Raschky:   The other thing is we can infer human behaviour from internet behaviour. So we have used this one trillion observations, and then looked at how Internet behaviour online-offline activity correlates with other behaviour such as sleep patterns or commuting patterns.

Paul Raschky:   So one study we did was we used this data to link it to sleep patterns around the world with close to 700 cities, and that allowed us to make the first comparable study of sleep patterns around the globe.

Paul Raschky:   Which is not necessarily in the area of economics, but what we wanted to show is that when you think of sleep, the distinction between when people in Argentina wake up and go to bed, so it’s more bedtime and out-of-bed time. We don’t obviously measure when people fall asleep. There might be differences of course across cultures, but that’s one of the most fundamental things that humans do on a micro scale, on a daily basis. And we use that as a showcase to say, hey look, we were able to show that just based on this large amount of Internet data.

Michael Pascoe:  Who are the world’s best sleepers?

Paul Raschky:  I think Argentinians are number one. Australians do pretty well.

Michael Pascoe:  And who are the world’s worst sleepers?

Paul Raschky:  Japanese have the shortest amount of sleep according to our study.

Michael Pascoe:  Close your eyes for a second, after all we’ve just been talking about sleeping. You are somewhere in southern Africa, it’s nighttime. NASA satellites are constantly circling the Earth taking enormous numbers of pictures.

Michael Pascoe:  At night these images show us where the lights are on. A single fact that can tell us a great deal about economic growth and fundamental human behaviour. Paul, you’re measuring nighttime luminosity, what are you looking for?

Paul Raschky:  The project started with the idea that we have been doing a lot of research on things that influence economic development in poor countries. Now to do that as an empirical Economist, you need data. And data for development especially in Africa, on a let’s say, sub national scale, is relatively poor.

Paul Raschky:  We have GDP data, or data about the Gross Domestic Product, which is let’s say collected by the World Bank or the IMF at a country level.

Paul Raschky:  But first of all, this data is often not necessarily very reliable, and when you want to zoom into the country, very often we have hardly any information. There’s some survey data from a few spots where people invest a lot of time and effort to collect household surveys, but to make like a big comparison across the entire continent, that’s hard.

Paul Raschky:  There is other data available and we can use other data as proxies, and one of that is to use satellite data on nighttime luminosity.

Paul Raschky:   So there are a large number of other satellites that make recordings, they collect a lot of information. One of those side products, if you want to call it that, is pictures of the earth by night.

Paul Raschky:   So it’s like a very big photograph with dark and bright pixels, and those pixels have basically a value and the initial idea was to say maybe that reflects human economic behaviour. So we said okay let’s start with that because human economic behaviour emits light at night.

Paul Raschky:  Government expenditure for example when they build infrastructure, a lot of that is lit at night. Investment, when you build a new plant that emits light and a lot of consumption, driving a car, houses, all of those things emit light.

Paul Raschky:   Now in the first step we looked at whether there’s a systematic correlation between this proxy measure for economic activity and actual GDP measures, and we find that there is actually a pretty strong correlation.

Paul Raschky:  Then we said okay we can use that now to go in depth within countries and compare differences in development. Again, within a region over time and between regions.
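Editor's illustration: the first step described above, checking whether night-time light tracks official GDP, amounts to computing a correlation between two series. The sketch below is a minimal, assumed version of that check and not the researchers' actual method. Taking logarithms of both series is an assumption made here because light and GDP values span several orders of magnitude, and the names `pearson` and `log_correlation` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def log_correlation(light, gdp):
    """Correlate log night-time light with log GDP (both must be > 0)."""
    return pearson([math.log(v) for v in light], [math.log(v) for v in gdp])


if __name__ == "__main__":
    # Made-up values for four regions, for illustration only.
    light = [1.5, 12.0, 40.0, 300.0]
    gdp = [0.8, 5.0, 30.0, 110.0]
    print(round(log_correlation(light, gdp), 3))
```

A value close to 1 would support using light as a proxy. As the interview notes later, it complements official figures rather than replacing them.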

Michael Pascoe:  It would seem to be a fairly obvious correlation between availability of electricity mainly, and luminosity at night. Is it harder to measure electricity usage? I suppose this way you don’t have to be on the ground to do it.

Paul Raschky:   That’s true. Especially in Africa, a lot of electricity is produced off the grid, you have diesel generators and things like that. Then in addition there’s again simply no data. I mean it’s even hard to get reliable data on where the grid is currently at.

Paul Raschky:   There’s actually a nice little study done on the effect of piracy in Somalia, on the economic development. So when you think of piracy in Somalia, the people who become pirates actually, most of them are from poor fishing villages on the coast, where there is no access to electricity and things like that.

Paul Raschky:   Now, when they capture a boat and get ransom payments, you suddenly have an inflow of millions of dollars into that region, which is a huge positive financial shock for those little villages. And this study showed that you can see the effect from outer space, of those cities brightening up.

Michael Pascoe:  Piracy, lighting up the coast of Somalia, one ship at a time. Obviously in economics, it’s often about the measurement of change, that’s one example. What about the politics of change, does that show up?

Paul Raschky:  Yes. There’s anecdotal evidence that political leaders favour different regions within the country. That could be purely from a re-election perspective that say I would like to support my constituents, or basically pay them back for their support.

Paul Raschky:  It could be simply as the case in Bolivia with Evo Morales, showed he was the first indigenous president elected, and people from regions with a higher percentage of indigenous population simply said, “We are relatively poor, we have been disadvantaged over the years. Now it’s our time, and we would like to have some redistribution of government funding.”

Paul Raschky:   Or it could be cases of basically outright corruption as the case of Mobutu Sese Seko in at that time, Zaire and now the Democratic Republic of Congo. He was known to be a kleptocrat.

Paul Raschky:  He basically embezzled aid money, funds from or rents from natural resources, and used that money to build palaces and a very nice airport in his hometown of Gbadolite, which was previously just a small jungle town, and it came to have the country’s best electricity. He even hired Swiss farmers to look after his cattle.

Michael Pascoe:  So once you can see that change happening, how should that influence the allocation of aid money, the allocation of World Bank activities?

Paul Raschky:  What we showed basically is that, as we said, it’s often very hard to make a normative statement of whether regional favouritism is good or bad. So in the case of Evo Morales where you have redistribution going on and people with a preference for a more equal society would say well that’s good. We see the redistribution and regional favouritism is going on and you help the poor people.

Michael Pascoe:   And you can also see where the money isn’t going.

Paul Raschky:  Exactly. But the problem is what we found in general is that, this regional favouritism is not long lasting. It takes a few years for the politician to channel money into that region, and then the light picks up. But once the leader is out of power, the region reverts back to its initial state.

Paul Raschky:  This investment that we see that leads to a higher luminosity during the reign of the political leader, does not have a sustainable effect.

Michael Pascoe:  Well how much further can that research go?

Paul Raschky:  Where we said okay, first we looked at the birthplace of this leader, now we looked whether ethnicity matters. So you favour in particular your ethnicity which could be around your birthplace, but could be somewhere else in the country.

Paul Raschky:  And then we found something interesting in the way that people would normally assume that this type of ethnic favouritism is just an African phenomenon, but our study actually showed that it’s a more global phenomenon.

Paul Raschky:  So there are a lot more ethnically diverse countries around there where, different ethnicities put their politician into power, and we see these patterns of ethnic favouritism.

Paul Raschky:  And what we find there interesting is the following. So regional favouritism, this birth place favouritism can actually be mitigated with good political institutions. So if you have checks and balances in place, we see hardly any of the regional favouritism. Or at least favouritism that we can observe from outer space.

Paul Raschky:  Whereas with ethnic favouritism, what we find is actually that ethnic favouritism is stronger in more democratic countries. So there, if you think that democracy is a tool for very good political institutions, it seems that the re-election concerns that come with a democratic regime just encourage ethnic favouritism.

Paul Raschky:  It’s in a sense … At the end, you want to have to pay back the support.

Michael Pascoe:  And what sort of response do you get from traditional economists who just look at their old GDP numbers, are they excited by it or just threatened?

Paul Raschky:  I think initially there was … Some people of course were suspicious. I mean that’s good, as a scientist, you have to be suspicious. You should not treat it as a substitute for GDP data but just as a complement. At the end of the day GDP data is also just a proxy for welfare, human economic activity, and we add another imperfect proxy to that. To get a more detailed picture of what’s going on in society.

Michael Pascoe:  Was it fun to find these new areas of [crosstalk 00:17:52] discovery?

Paul Raschky:  Of course. If you don’t get utility out of the process, that’s what it should be about in terms of research.

Michael Pascoe:  This big picture data of yours both, luminosity and Internet speeds, obviously you must see immediate reaction when there’s a disaster. When there’s a cyclone, when there’s a bush fire. What does that tell you, what’s the point of that?

Paul Raschky:  When you have a disaster, the problem, especially for first responders, is that you do not necessarily know how big the extent of the disaster is. So with a hurricane we know where the hurricane goes, we know the exact path and we know it has hit this area, but we don’t know how big the destruction was.

Paul Raschky:  We can monitor the internet before that hurricane arrives, and then once the hurricane has hit the area, we see that the Internet slowed down, or whether there’s no more Internet available. And that allows us then to map basically the destructive extent of the hurricane.

Paul Raschky:  So it could be that an area is hit but maybe the hurricane, the power of the hurricane has already died down, and has not sufficient destructive power to at least destroy assets to a certain extent.

Paul Raschky:  And in our case we can basically provide real time visualisation of the destructive path of the hurricane.

Michael Pascoe:  Where the greatest need is to send relief?

Paul Raschky:  Exactly, yes. We monitored the Internet during Hurricane Irma. We’re doing pretty good in measuring the physical strength of the disaster, and the direction and … We can do that with satellite data or with weather stations on the ground.

Paul Raschky:  But to measure the human extent of the disaster, that’s pretty hard. You have to send people in, and so I think we can just provide more timely information. Of course it’s not very precise, but we can give people at least immediately, a first glimpse of the idea how big the extent is and which areas were affected.
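Editor's illustration: the before-and-after comparison described for hurricanes can be sketched as a simple relative drop in the share of devices online per area. This is a minimal sketch under stated assumptions and not the team's actual tool. The input format (a mapping from area name to share online), the function name `flag_affected_areas` and the 30 per cent threshold are all hypothetical.

```python
def flag_affected_areas(before, after, min_drop=0.3):
    """Rank areas by relative loss of connectivity after an event.

    `before` and `after` map an area name to the share of monitored
    devices that were online (0.0 to 1.0). An area is flagged when its
    share fell by at least `min_drop` relative to its own baseline.
    The threshold is an arbitrary illustrative choice.
    """
    flagged = []
    for area, baseline in before.items():
        if area not in after or baseline <= 0:
            continue  # no comparison possible for this area
        drop = (baseline - after[area]) / baseline
        if drop >= min_drop:
            flagged.append((area, drop))
    # Largest relative drop first: a rough first guide to where damage is worst.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up shares for three areas, for illustration only.
    before = {"north": 0.8, "south": 0.9, "east": 0.5}
    after = {"north": 0.2, "south": 0.85, "east": 0.25}
    for area, drop in flag_affected_areas(before, after):
        print(f"{area}: connectivity down {drop:.0%}")
```

Comparing each area with its own baseline, rather than with other areas, keeps places with ordinarily patchy internet from being flagged as damaged. As noted above, such a ranking is a timely first glimpse, not a precise damage assessment.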

Michael Pascoe: What’s the next area of big picture, big data that’s interesting you?

Paul Raschky:  So one thing is in terms of the satellite data. So far, we have relied on night-time light data, and that requires a certain level of development to be detected. And especially in Africa, that area with the biggest lack of data, there are still a lot of areas which are populated and there’s economic activity, but the nighttime emissions are not high enough to be picked up by satellites.

Paul Raschky:  So what we are currently working on is using the daytime images, similar to what you see on Google Maps. It’s just an image. If you look at that, you can identify, yeah that’s a road, that’s a house, and that’s a footy ground.

Paul Raschky:  But, you need to do this detection, in the case of Africa, for over 30 million square kilometres. So you need to automate this detection, and there again, big data methods come in.

Paul Raschky:  So we have a very big data set, it’s multiple terabytes of picture information, and then we overlap it with machine learning tools to detect items. And the idea then is to say, maybe we can come up with an index that again correlates very well with official figures of wealth, consumption or economic development.

Michael Pascoe:  So if there’s change in the image, if something gets built-

Paul Raschky:  Exactly [crosstalk 00:21:16]. That’s the idea.

Michael Pascoe:  Professor Paul Raschky, thanks for talking to us on Thought Capital.

Paul Raschky:   Thank you.

Michael Pascoe:  You’ve been listening to Thought Capital, from Monash Business School. You can find out more at monash.edu/impact.

Michael Pascoe:   If you enjoyed Thought Capital, also listen to Just Cases. Just Cases is the show about the biggest legal cases you’ve never heard of. Every day law courts make decisions that change the lives of those in the room. Some decisions change society itself. You can find Just Cases on iTunes, Stitcher and SoundCloud.

Michael Pascoe: Thought Capital was produced by Tina Zenou, editing and post production by Nadia Hume, technical support by Cameron Nichol, executive producer is Helen Westerman.

Published on 21 Nov 2018
