Michael Pascoe: Welcome to Thought Capital, the podcast that delves into the wealth of ideas created by the experts at Monash Business School in Melbourne, Australia.
Bureau of Meteorology: Hello from the Bureau of Meteorology for Sunday the 19th of May.
Michael Pascoe: What’s the weather going to be tomorrow? That’s probably the most obvious forecast we routinely think of.
Bureau of Meteorology: The cold front is forecast to weaken as it approaches the southeast corner today.
Michael Pascoe: We’re flat out trying to understand the present and the recent past, but it seems we’re hard-wired to want to know the future, despite forecasting being very hard. Everyone from the government to Google wants forecasts.
Rob Hyndman: An uncertain future can be scary, but if you can put some parameters around what that uncertainty is, if you have some idea about where those future values might lie, then I think it helps you feel more confident and more in control of what’s going on.
Michael Pascoe: Rob Hyndman is a professor of statistics and the head of the Department of Econometrics and Business Statistics at Monash Business School. Welcome, Rob. Why are your skills in such high demand?
Rob Hyndman: Everyone needs forecasts to help them plan for an uncertain future. And no matter what business you’re in or what industry you’re working in, having some idea about what possible things could happen in the future is going to be really helpful for planning for that.
Michael Pascoe: Even if they’re wrong?
Rob Hyndman: Well, forecasts are always wrong. The trick is to do forecasting in a way that you understand how wrong they can be and what the probabilities are about them coming out in different ways.
Michael Pascoe: Let’s start at the very beginning. What is the definition of a forecast as opposed to a guess?
Rob Hyndman: The way we think about forecasting in a modern data science sense is that it’s an estimate of the value of something or other at a future time. And not just an estimate of that particular value, but understanding it probabilistically. So understanding what the probabilities might be around what values some future thing could take. You mentioned the weather before, so rather than saying, “We think tomorrow’s going to be 26 degrees,” a forecaster would say, “We think the probability of the temperature being below, say, 24 degrees is this; between 24 and 26, it’s something else; above 26, something else again.” So you come up with probabilities for the different possible values that the future could take.
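The probabilistic forecast Hyndman describes can be sketched in a few lines. The following is an illustrative example only, not his method: it assumes the forecast temperature follows a normal distribution with a hypothetical mean of 25 degrees and standard deviation of 1.5 degrees, and computes probabilities for the three ranges he mentions.

```python
from statistics import NormalDist

# Hypothetical forecast distribution: mean 25 degrees, std dev 1.5 degrees.
forecast = NormalDist(mu=25.0, sigma=1.5)

# Probability of each range of tomorrow's temperature T.
p_below_24 = forecast.cdf(24.0)                       # P(T < 24)
p_24_to_26 = forecast.cdf(26.0) - forecast.cdf(24.0)  # P(24 <= T < 26)
p_above_26 = 1.0 - forecast.cdf(26.0)                 # P(T >= 26)

print(f"P(T < 24) = {p_below_24:.2f}")
print(f"P(24 <= T < 26) = {p_24_to_26:.2f}")
print(f"P(T >= 26) = {p_above_26:.2f}")
```

The three probabilities necessarily sum to one, which is what makes this a full probabilistic forecast rather than a single point estimate.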
Michael Pascoe: One of the things I like about the Reserve Bank’s quarterly forecast graphs is that they come with an historical accuracy margin: this is the margin we’re 70% sure of, this is 90% … Shouldn’t all forecasts come with that?
Rob Hyndman: Absolutely. Yeah. And one of the things that I try and convince my clients to do is to review how forecasts have gone in the past and to learn from them. To know how accurate they have been in the past and to use that information when planning for the future, because the forecast accuracy is unlikely to be very much better in the future than it has been in the past.
Michael Pascoe: In the past, people have been obsessed with trying to predict the future: reading tea leaves, the entrails of chooks, gazing at the stars. You can even argue perhaps that the desire to know the future is a major driver behind the desire for religion. Is there any historical basis for when we started forecasting?
Rob Hyndman: The earliest ones that I know about were in the Babylonian empire days where they had diviners who would carry around a sheep’s liver with the King of Babylon. And when he wanted to know should he invade some city or another city, they would look at their sheep’s liver and the distribution of maggots in the liver was how they determined which city to attack. So you could say that the very earliest forecasting software had bugs in it.
Michael Pascoe: Well, what do you use? Modern forecasting software, what is it?
Rob Hyndman: I use R for my research and my teaching, and actually I’m the author of some of the major packages that are used in R for forecasting. We take historical data, we build statistical models, and we output the forecasts. So from a user point of view, they will load their data into the package, they will run it through the functions in the packages that I’ve written and that will generate some forecasts that they can then use in decision making.
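Hyndman’s own packages are written for R, but the workflow he describes (load historical data, fit a model, generate forecasts) can be illustrated in any language. Below is a minimal, hypothetical sketch of simple exponential smoothing, one of the classic time series methods, in Python with made-up sales figures; it is not the code from his packages.

```python
def ses_forecast(history, alpha=0.3, horizon=3):
    """Simple exponential smoothing: the level is an exponentially
    weighted average of past observations, and the point forecast for
    every future period is the final level (a flat forecast)."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

# Made-up weekly sales history.
sales = [102.0, 98.0, 110.0, 105.0, 99.0, 107.0]
print(ses_forecast(sales, alpha=0.3, horizon=3))
```

The smoothing parameter alpha controls how quickly the model forgets old data; in practice it would be estimated from the data rather than fixed by hand.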
Michael Pascoe: Is it dangerous for people to trust forecasts too much?
Rob Hyndman: It’s dangerous to over-trust them. You have to realise that forecasts are always wrong, and one of the things I always do when I’m working with my clients is insist that they have a measure of uncertainty associated with the forecasts, even if they don’t want one. So often I’ll be working with a company and they want weekly sales forecasts for the next few months, but rather than just give them my forecasts I will always give them a range of uncertainty. It might be an 80% or 95% range, and I’ll say, “Well, here’s my best estimate of the sales for this week, and I expect it, with 95% probability, to be between this number and this number.” Then at least they know how wrong the forecasts could be, and they won’t put too much confidence in them.
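The 80% or 95% range Hyndman mentions can be estimated from how wrong past forecasts have been. A hedged sketch, assuming past forecast errors are roughly normal with mean zero (the function name and the numbers are invented for illustration):

```python
from statistics import stdev

def interval_95(point_forecast, past_errors):
    """Approximate 95% prediction interval: widen the point forecast by
    1.96 standard deviations of the historical forecast errors."""
    s = stdev(past_errors)
    return (point_forecast - 1.96 * s, point_forecast + 1.96 * s)

# Hypothetical sales forecast and past (actual minus forecast) errors.
lo, hi = interval_95(100.0, [-4.0, 3.0, -1.0, 5.0, -2.0, 2.0])
print(f"95% interval: {lo:.1f} to {hi:.1f}")
```

This is the same idea as reviewing past accuracy to set expectations for future accuracy: the wider the historical errors, the wider the interval.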
Michael Pascoe: You say forecasts are always wrong. You’ve got to get lucky sometimes don’t you?
Rob Hyndman: If you’re forecasting something numerical, you’re always going to be wrong. If you’re forecasting an event like it will rain tomorrow or not rain tomorrow, then sure, you’re going to get that sometimes right. But if you’re forecasting a particular number, the GDP of Australia’s going to be this specific number, then you’re always going to be wrong because you’ll never estimate it sufficiently accurately.
Michael Pascoe: Forecasting can be used in many areas, some more surprising than others. For 30 years Don Weatherburn was the Director of the Bureau of Crime Statistics and Research at the New South Wales Department of Justice.
Don Weatherburn: We didn’t really get into forecasting until it became clear that New South Wales had a serious prison overcrowding problem and was using inadequate tools to forecast the future trend in demand for prison beds. The authorities need to be able to predict at least a year out what their prison population is going to be. With better forecasts, prison authorities would get much earlier warning of an increase in the prison population beyond prison capacity, and could make appropriate adjustments to deal with that problem. The more overcrowded a prison cell gets, the bigger the risk that there’s going to be tension in the jail, violence in the cells, and in the extreme you could end up with a jail that’s so overcrowded, you get a riot. Which can cost millions of dollars if it results in part of the jail being burnt down, but it can also result in death and serious injury.
Don Weatherburn: The truth of the matter is that a prison system actually contains several different prisons. Maximum security prisoners have to be held separately from minimum security prisoners. Offenders convicted of child sexual assault need to be kept away from other prisoners because they will be attacked if you put them together. And remand prisoners, people who haven’t yet been convicted of any crime, are meant to be kept separate from sentenced prisoners, people who’ve been convicted and given a prison sentence. So you’re really managing several prison systems in the one go. And to do that, you need more than just the forecast of the total number. You need to be able to forecast the male and female prisoners separately, remand and sentenced prisoners separately, and so on.
Michael Pascoe: We’ve heard how forecasting is an essential part of running the prison system in New South Wales. Rob Hyndman, what do you need to make a good forecast?
Rob Hyndman: So I’ve identified five things that are important for something to be easy to forecast, or for forecasts to be good. The first is you need to have a very good understanding of the factors that contribute to the variable that you’re trying to forecast. Second, there should be lots of data available. Third, the forecast shouldn’t affect the thing you’re trying to forecast. Fourth, there should be relatively low natural, unexplainable random variation. And fifth, the future should be somehow similar to the past.
Rob Hyndman: I’ll give you two examples of extremes, so take forecasting the sunrise tomorrow. You probably don’t even think of it as a forecast because it’s so accurate and so predictable that we just take it for granted, but that’s actually something that gets forecast. And it’s forecast very, very well because we have a very good understanding of the factors that contribute to it. There’s lots of data available going back millennia, the forecast can’t affect the thing you’re trying to forecast. My forecast of the sunrise is not going to change what time the sun comes up. There’s very low natural, unexplained, random variation and the future is very similar to the past.
Rob Hyndman: Something that’s quite difficult to forecast: a stock price. We don’t have a good understanding of the factors that contribute to it. There is lots of data available, but sometimes the data is not so relevant to the problem that we’ve got at hand. The forecast can affect the thing you’re trying to forecast: if I forecast that the price of Google will rise tomorrow and I’m a well-known forecaster, then that can actually affect the price. There’s quite a lot of unexplainable random variation, and the future could be very different from the past. If something happens in the company whose stock I’m trying to forecast, the CEO dies, or there’s some unexplained increase in their quarterly earnings, then the future is very different from the past. So if you get those five things, or at least most of those five things, then the thing that you’re trying to forecast is not too difficult.
Michael Pascoe: So the stock market is notoriously difficult to forecast. Government policy isn’t very predictable either. Something Don Weatherburn at the New South Wales Department of Justice has experience with.
Don Weatherburn: Government activity is probably the least predictable part of the process that influences the prison population. It’s true that crime can be unpredictable; for example, the methamphetamine epidemic that’s currently sweeping Australia has caused a big increase in the number of people in prison. But without question, the most unpredictable thing about predicting the prison population is what government’s going to do. Governments have a habit of changing the law, and tiny changes in the law can make a huge difference to prisoner numbers. For example, if the government decided to toughen the law on bail: the prison system’s extremely sensitive to the number of people who are refused bail, and that could have quite a big effect on the size of the prison population in very quick order.
Don Weatherburn: What’s happening now is the forecast becomes the baseline scenario for changes to the justice system. So if they say, for example, “We’re thinking of toughening the bail laws, we want to know what effect it’s going to have,” they start off with the forecast that we provide and that’s, if you like, “What will happen if nothing else changes?” And then they put over the top of that the effect of toughening the bail laws. So they’re able to compare how things will play out if they don’t change anything, to how things will play out if they do change a particular thing. So it’s made the whole process of planning much more effective.
Michael Pascoe: So Rob Hyndman, stock markets and politicians make for difficult forecasts. What kind of problems do your forecasts solve?
Rob Hyndman: So I can tell you the sort of companies I’ve worked with and the sort of problems that they’ve had. I’ve worked with airlines where they’re trying to forecast passenger traffic on city routes, say the Melbourne to Sydney route, for example. I’ve worked with many retail companies where they’re trying to forecast sales and demand for their products. I’ve worked with federal governments: the Pharmaceutical Benefits Scheme is something that needs to be forecast every year because the government subsidises pharmaceutical products. They don’t know in advance how many people are going to turn up in the chemist asking for different drug types, so they need to forecast that. Electricity demand, so forecasting how much power’s going to be needed tomorrow, next week, or in 10 years’ time, is all important. Forecasting for tomorrow helps to plan generation and make sure that we’re not going to have a blackout. Forecasting 10 years out is all around, well, what generation capacity do we need to develop to make sure that we can meet the demand that’s going to be there? Forecasting Australian tourist demand, forecasting call centre volumes, mortality rates, population, lots of things.
Michael Pascoe: What’s the hardest of those? That’s as broad a range as I can think of. You didn’t mention the Melbourne Cup, aside from that, it’s pretty much everything.
Rob Hyndman: The hardest problems are where there’s not very good data or there’s no data at all. For example, forecasting the Pharmaceutical Benefits Scheme. Some of the products that are on the scheme, there’s no data for because they’re new products. I can’t build a statistical model because I haven’t got data to put into my model. And so then you rely on judgemental forecasting, which is much more difficult to do, and much less scientific, in the sense that you’re not building a mathematical model that describes things.
Michael Pascoe: Have you ever made a totally disastrous forecast? Something embarrassing?
Rob Hyndman: Yes, of course. Every forecast-
Michael Pascoe: So tell us about it.
Rob Hyndman: Every forecaster makes mistakes. Probably the one that I’m most embarrassed about was very, very early in my career. I was asked by one of Australia’s largest car manufacturers to forecast their sales. They only gave me 15 years of data and they asked me for a 15-year forecast. And I was young and naive enough, and wanting to get involved in the industry, that I said yes. I should never have agreed to do that, because with 15 years of data you really shouldn’t be forecasting more than a few years ahead. You don’t have enough history to know what’s going to happen longer term.
Michael Pascoe: And there’s so many variables obviously.
Rob Hyndman: And there’s lots and lots of variables, yeah. So the most embarrassing stuff is usually when someone wants you to forecast more than the data will bear, or where there’s sufficient change in the environment that your model is not likely to hold down the track anyway. That happens in electricity demand all the time. They might give you 15 years of data and want a 20-year forecast. Even if you had more data, forecasting electricity demand 20 years out is crazy, because even in five years we might have batteries in every house and people might be driving electric cars. The profile of usage is going to change a lot, we know it’s going to change a lot. So forecasting very far ahead in the electricity demand area is a little foolish.
Michael Pascoe: What’s the forecast that you’re most proud of? Something that had an outcome that you want to go home and tell everyone about?
Rob Hyndman: I’d say the work I did on the Pharmaceutical Benefits Scheme, largely because of how bad it was before I got involved. Back in the early 2000s the Australian government had underestimated the expenditure on the PBS by nearly a billion dollars in two consecutive years. Now, a billion dollars is a lot of money for a government to find. They called me up and said, “Do you think you can help?” And so I developed a new forecasting tool for them, which reduced the margin of error from about a billion dollars down to about plus or minus 50 million. Which is a huge advantage.
Michael Pascoe: That is an incredible difference.
Rob Hyndman: And so not only did that help solve a major national problem, but the models that I developed for that project, we then put out in open-source software so that everyone else could use them. And they’ve become one of the most widely used forecasting models in the world, that’s now used by maybe a million organisations around the world.
Michael Pascoe: Did you forecast that it would be so successful?
Rob Hyndman: No, I didn’t.
Michael Pascoe: Otherwise you would have copyrighted, or patented, or …
Rob Hyndman: No, actually my policy is not to patent or copyright anything that I do. All of my algorithms get put out as open-source code with a free license for anyone to use, and I think that’s a much better way to work. That means that my work has far more impact.
Michael Pascoe: What’s the forecast that you haven’t tried yet that you want to?
Rob Hyndman: So the one that I really want to do is to forecast individual household energy usage in Victoria. And the reason for that is Victoria is the only place in the world where there’s almost a hundred percent rollout of smart meters. So we have really good energy data, down to household level, for the entire state; there’s about 8 million meters. And it’s the only place in the world that has both a 100% rollout and where one organisation controls all those meters, which in this case is the state government. Everywhere else, including elsewhere in Australia, there’s either not 100% rollout or the meters are owned by lots of different organisations, and so you can’t actually build a coherent model across all of that data. So I’d love to get my hands on that data and take account of solar generation, as well as the temperature and humidity effects on usage, cloud cover, use of air conditioning. I think it’s possible, it’s an enormous dataset, but I think I know how to build a model like that. The problem is I don’t have the data.
Michael Pascoe: That would seem to be the most obvious thing that the government would want to give you on an offer like that.
Rob Hyndman: I’m working on it.
Michael Pascoe: We’ve spoken about the hardest and the easiest forecasts. What are the most surprising forecasts?
Rob Hyndman: So one area of forecasting which may be a little surprising is when you’re looking for surprises. I’ve done some work recently on Queensland rivers, where we’re looking to see where the health of the river is different from what we forecast it to be. We have a lot of measurements on things like the turbidity, the conductivity, and the level of the river. And we built a model that forecasts what that’s going to look like in the next hour, and the next day or two. If what we observe is very different from what’s forecast, that suggests there’s a problem: there’s either a pollution event, or something has happened to the sensor that needs checking. So we actually don’t care very much about the forecasts themselves; we care about when what we see is very different from what we forecast.
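The river monitoring Hyndman describes comes down to flagging observations that depart too far from the forecast. A toy sketch with invented turbidity readings (a fixed tolerance stands in for whatever statistical threshold the real system uses):

```python
def flag_surprises(actuals, forecasts, tolerance):
    """Return the indices where an observation departs from its forecast
    by more than the tolerance: candidate pollution events or sensor faults."""
    return [i for i, (a, f) in enumerate(zip(actuals, forecasts))
            if abs(a - f) > tolerance]

# Invented hourly turbidity readings versus model forecasts.
actual = [5.1, 5.0, 5.3, 9.8, 5.2]
predicted = [5.0, 5.1, 5.2, 5.2, 5.1]
print(flag_surprises(actual, predicted, tolerance=1.0))  # flags hour 3
```

The forecasts themselves are throwaway here; only the departures matter, which is exactly the inversion Hyndman describes.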
Michael Pascoe: And how much success are you having in finding those surprises?
Rob Hyndman: That work’s going very well. The Queensland Department of Environment and Science is using our methods to monitor its rivers; we’re doing a pilot on a few rivers, and then we’re going to roll it out across many more sensors across the state.
Michael Pascoe: Where’s data science and forecasting heading? I’m asking you to forecast forecasting here.
Rob Hyndman: The models are getting more and more complicated and the data are getting more and more rich, with lots of different variables and a lot more data being collected. But eventually I imagine artificial intelligence will be able to build the models for us. I don’t know when that’s going to happen, but at some point you would expect that the sort of thing I do as a researcher in trying to come up with new ways of modelling data and using my models for forecasting, a computer’s going to do that better than me. And it’s going to be able to design better models and then implement them and then I can retire.
Michael Pascoe: Rob Hyndman, it’s been very interesting, thank you.
Michael Pascoe: Thank you also to Don Weatherburn, for talking to us about the prison system.
Michael Pascoe: You’ve been listening to Thought Capital from Monash Business School. You can find more episodes on iTunes, Spotify, and Stitcher, or wherever you listen to podcasts. This episode was produced by Tina Zenou, editor is Nadia Hume. Sound production by Gareth Popplestone. Executive producer is Helen Westerman. Thought Capital is recorded at Monash School of Media, Film and Journalism.