A fairly recent and continuing rage among data wonks is the creation of ranking indices (or indexes). Of course, we have had indexes for many, many years; an old historical example is the Dow Jones Industrial Average. And of course, similar techniques are used to compute FIFA World Rankings, university rankings, and more. What is more recent is the creation of myriad indices for countries around the world (as well as US states), which are used to rank them on every conceivable topic. Here is a random selection:
The Economic Freedom Index
The Global Peace Index
The Good Country Index
The Global Competitiveness Index
Rule of Law Index
World Press Freedom Index
Happiness Index
Global Innovation Index
Global Entrepreneurship Index
Social Progress Index
Corruption Perceptions Index
Animal Protection Index
The Big Mac Index
Global Hunger Index
Quality of Life Index
Quality of Death Index
I could go on listing these for days! Basically, if you want to create an index, sit down and decide on some factors that you think would go into explaining an idea such as "Happiness". Collect some data on these factors, adjust them to a similar scale, and then figure out a way to combine them into a sort of weighted average. A BurkeyAcademy subscriber wrote in to ask me about the United Nations Development Programme's "Human Development Index". He wanted to know how it was calculated. Let's have a look.
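As a brief aside before we dig into the HDI, here is a minimal Python sketch of the recipe just described: pick some factors, put them on a similar scale, and take a weighted average. Everything in it is an assumption made for illustration: the three countries, the two factors, the data values, the equal weights, and the choice of min-max scaling are all made up and are not taken from any real index.

```python
def min_max_scale(values):
    """Rescale a list of numbers so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def build_index(factors, weights):
    """Return one score per country: a weighted average of the scaled factors."""
    scaled = {name: min_max_scale(vals) for name, vals in factors.items()}
    total_weight = sum(weights.values())
    n_countries = len(next(iter(factors.values())))
    return [
        sum(weights[name] * scaled[name][i] for name in factors) / total_weight
        for i in range(n_countries)
    ]


# Made-up data for three hypothetical countries (not real statistics).
countries = ["A", "B", "C"]
factors = {
    "life_expectancy": [60.0, 70.0, 80.0],
    "years_of_schooling": [5.0, 12.0, 10.0],
}
# Subjective weights: here both factors are assumed to matter equally.
weights = {"life_expectancy": 0.5, "years_of_schooling": 0.5}

scores = build_index(factors, weights)

# Rank countries from highest score to lowest.
ranking = sorted(zip(countries, scores), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Changing the weights changes the ranking, which is exactly why the choice of weights matters so much in any index of this kind.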
Looking at their 2015 report, they list the variables used to construct their index: life expectancy, expected years of schooling (for young children now), mean years of schooling for all in the country, and GNI per capita. The table also reports the difference between each country's GNI ranking and its Human Development ranking, though it is unclear if (or how) they use this in their ranking (they could include this difference in a recursive fashion using several steps).
Excerpt from Table 1, page 208, Human Development Report 2015, UN Development Programme
In order to figure out how this was calculated, my go-to tool is a regression. Regression attempts to discover the formula for a relationship, if you assume the function's basic form. Using a subset of 34 countries gives the following results:
For the uninitiated, the important things are these: the "Multiple R-squared" of 0.9966 tells us that our equation is an almost perfect fit, and the "Estimates" give us the following formula:
HUMDEV = -0.154 + 0.00767*LE + 0.0114*EYSB + 0.0187*MYS + 0.0000005936*GNI - 0.001154*GNIHDI
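To show the idea behind this kind of reverse engineering, here is a rough Python sketch of ordinary least squares on synthetic data. It is not the regression I ran on the UN figures. The "true" weights, the two predictors, the noise level, and the helper names `solve` and `ols` are all assumptions invented for the example; only the sample size of 34 echoes the text. The point is simply that if an index really is close to a weighted sum of its factors, regression hands the weights back to you.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


rng = random.Random(0)

# Hypothetical "secret" formula: intercept, weight on life expectancy,
# weight on years of schooling. These numbers are made up.
TRUE_WEIGHTS = [-0.2, 0.008, 0.03]

# Synthetic data for 34 imaginary countries (not real statistics).
rows = [(rng.uniform(50, 85), rng.uniform(2, 14)) for _ in range(34)]
y = [
    TRUE_WEIGHTS[0] + TRUE_WEIGHTS[1] * le + TRUE_WEIGHTS[2] * sch + rng.gauss(0, 0.005)
    for le, sch in rows
]

beta = ols(rows, y)

# R-squared: the share of the variation in y explained by the fitted formula.
fitted = [beta[0] + beta[1] * le + beta[2] * sch for le, sch in rows]
mean_y = sum(y) / len(y)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print("estimated weights:", [round(b, 4) for b in beta])
print("R-squared:", round(r_squared, 4))
```

An R-squared very close to one is the sign that the assumed functional form matches how the index was actually built. A noticeably lower value would suggest the creators combined the factors in some other way.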
It appears that the "GNI rank - HDI rank" does help predict their ranking index. Exactly how they did it, I don't know; they might explain it in the document, but I did not read it fully. So now we have a formula that reproduces their index almost exactly; what we do not know is exactly why they chose these particular numbers as weights. Again, perhaps they mention it in the 250-page report, but generally these weights are chosen subjectively based on how important the creator believes each factor to be, and the result is then normalized to be an index between zero and one or zero and 100.
Go and read about some other indices, and have a go at creating your own! Here is a link to an excellent article on indexes from The Economist magazine.