## Body mass index: not (only) a matter of income

This entry collects some findings from the exercises for the MOOC
Data visualization for storytelling and discovery.

Excess body weight is an important risk factor for mortality and morbidity from cardiovascular diseases, diabetes, cancers, and musculoskeletal disorders. It causes nearly 3 million deaths worldwide every year. Studies at different levels show that adiposity, as measured by body mass index (BMI, calculated as weight in kg divided by the square of height in metres, kg/m²), has increased in recent decades in many populations, although BMI seems to have been stable or even to have decreased in some groups.

Body mass index is a value derived from the mass (weight) and height of an individual. The BMI is defined as the body mass divided by the square of the body height.

Commonly accepted BMI ranges are: underweight, under 18.5 kg/m²; normal weight, 18.5 to 25; overweight, 25 to 30; obese, over 30. The World Health Organization also uses this classification. These thresholds are highlighted as lines on the Y axis of the graphs, to show which and how many countries fall into each range.
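As a quick illustration of the definition and the ranges above, here is a minimal Python sketch. The function names are mine, and the handling of the exact boundary values (25 counted as overweight, 30 as obese) is an assumption, since the ranges as listed overlap at their edges.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value (kg/m²) to the commonly accepted ranges.

    Assumption: a value exactly on a boundary goes to the higher category.
    """
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"
```

For example, `bmi_category(bmi(70, 1.75))` classifies a person of 70 kg and 1.75 m. The same function can be applied to a country's mean BMI to see which band it falls into on the graphs.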

## Correlation with income

I used the Gapminder 2012 dataset to explore a bit.

The mean BMI provides a simplified measure of the comparative weight of populations on a country-by-country basis, and my first hunch was to compare the mean BMI of each country with its income per person to see how they correlate. Maps didn't show the gradients well, as the countries with the highest BMI values are few and very small on the map. So I used a scatterplot, with continents encoded by colour, to see the individual countries and the trends.

Graph 1: BMI vs. income (men)

All the countries whose mean BMI falls in the obese range (Nauru, Tonga, Samoa, Palau, French Polynesia) are Pacific islands in Polynesia and Micronesia, which raises the question of whether ethnicity plays a role, or whether different parameters should be used when studying this area.

Graph 2: BMI vs. income (women)

In the data for women, more countries have a higher mean BMI, and more of them fall into the obesity category. Besides those already mentioned, there are Kiribati and the Marshall Islands in the Pacific, Egypt and Kuwait in the Middle East, and Puerto Rico, Saint Kitts and Nevis, and Bermuda in the Americas. This may have some kind of relationship with climate and hot temperatures (?), as all of them are located in tropical or subtropical latitudes. A possible clue to keep investigating.

We can see that BMI and income don't show a clear correlation in general, so I thought it would be better to filter the data and analyse it by continent and country in more detail.

There are several studies stating that wealth doesn’t have a direct correlation with BMI as there are more factors involved. “The persistence and emergence of income gradients suggests that disparities in weight status are only partially attributable to poverty and that efforts aimed at reducing disparities need to consider a much broader array of contributing factors”, as per Wang and Lauderdale.

A study from the University of North Carolina used microdata from China to examine theoretically and test empirically the predictions linking household income to adult BMI, using both cross-sectional and panel data analysis. The results show an inverted-U shaped relationship between BMI and family income. Additional income brings about a higher BMI and a higher probability of being overweight or obese for the poor than for the rich.

The median income per person in the Gapminder data for 2012 is only 14,460, and most of the African countries are under that median. But the rest of the countries are quite dispersed, especially in the case of East Asia and Pacific and South Asia.

The dispersion in Asia may have a specific explanation. The WHO has determined that at any given BMI, Asians, including Singaporeans, generally have a higher percentage of body fat than Caucasians do. The BMI cut-off levels for Singaporeans have been revised so that a BMI of 23 kg/m² or higher marks a moderate increase in risk, while a BMI of 27.5 kg/m² or more represents high risk for diabetes and cardiovascular diseases.

Besides that, and coming back to the data for all continents, a histogram showed that the median BMI is 25.56, similar to the mean, 25.14.

So in our analysis, most of the countries fall into the overweight or obese classification, and according to several experts that's the biggest nutrition problem we have. Rather than being underweight, we are eating bad food and not keeping a good metabolic balance. Also, if you are poor and lack education, you cannot resolve these situations and get the best nutrients and sustainable food. Education is one of many other variables that can influence a higher BMI, along with ethnicity, and we cannot establish a serious correlation without digging deeper into those other variables.

## Correlation with urban population

So I wanted to look at how urban population correlates with BMI. Some studies at the national level find the lifestyle of urban people to be one of the main causes of higher levels of obesity in cities, independently of income. That's the case of a study in Brazil, which found that urbanization and the more developed geographic regions were positively associated with the prevalence of overweight/obesity and negatively associated with the prevalence of underweight.

Graph: Body mass index vs. urban population

In the grid of scatterplots by continent, we can indeed see a positive correlation for every group. The Asian countries still look very spread out, though. I'd study them separately, after reviewing more papers on their specificities, and wouldn't include them in a general analysis like this. But for the rest, the correlation is positive.
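To put a number on what the scatterplot grid shows, one option is to compute a correlation coefficient per continent. The sketch below is only an illustration: it uses Pearson's coefficient (my choice, not something taken from the course), the function names are mine, and the rows in the usage example are invented placeholders, not Gapminder values.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def correlation_by_group(rows):
    """rows: iterable of (group, x, y). Returns {group: Pearson r}."""
    groups = defaultdict(list)
    for group, x, y in rows:
        groups[group].append((x, y))
    return {
        group: pearson([x for x, _ in points], [y for _, y in points])
        for group, points in groups.items()
    }


# Invented placeholder rows: (continent, urban population %, mean BMI)
toy_rows = [
    ("Continent A", 20, 22.0), ("Continent A", 40, 24.0), ("Continent A", 60, 26.0),
    ("Continent B", 30, 21.0), ("Continent B", 50, 25.0), ("Continent B", 70, 23.5),
]
per_continent = correlation_by_group(toy_rows)
```

A coefficient close to 1 for a group matches a tight upward cloud in its panel, while a value near 0 matches the kind of spread seen in the Asian panel.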

There are a number of reasons for the association between obesity and economic growth in many economies. Technological changes that lead to lower food prices and increased food consumption are some of the factors linking economic growth and obesity, as a study by Finkelstein and Ruhm argued. Longer working hours also play a part, as they lead more people to eat in restaurants and fast food joints.

I find that this kind of exploration raises more and more questions every time, and I could go on and on trying to find papers on each region and on different variables, as I mentioned before, such as education, urban growth (not only total population), differences by latitude, and so on.

## Exploring datasets: Bikes in Madrid and education expenditure in Argentina

Over the last few weeks I've been taking a MOOC on Data visualization for storytelling and discovery with Alberto Cairo, which I highly recommend. I'll post here some of the findings I've got from it. The studies are not totally finished, as they would need more work to be presented as a journalistic piece, so they shouldn't be taken as more than an exercise in the learning process.

First, I wanted to go local, and I live in Madrid. In my city we have a relatively new public bike rental service, and they have their datasets available, so I got a dataset with the data on the new daily users.

In the histogram I can see the concentration and the spread of the data. There's a curious outlier that corresponds to the maximum value of the dataset, 1446, and there's another isolated value around 700. I find those two points worth more research. They probably correspond to the day the service started or opened to the public.

The x axis represents the number of new annual-pass users per day. The y axis represents the number of days on which that many new users were registered. The distribution is skewed to the right, due to the few days (2-6) with much higher numbers of new annual passes.

The box plot shows the concentration of what could be a usual number of new users per day. The median is 132 and the mean is 133, so during that year (2014) that is roughly the typical number of new users per day of this service. It could be useful to compare it with datasets from other years and with other kinds of information, to see what variables make people decide to hop on bikes as a way of getting around the city.
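The way a box plot singles out points like the 1446 value can be sketched in a few lines. This assumes the common convention of flagging anything beyond 1.5 times the interquartile range from the quartiles; I haven't checked which rule the charting tool actually applies. The daily counts below are invented for illustration, apart from the two isolated values mentioned above (700 and 1446).

```python
import statistics


def box_plot_summary(values):
    """Quartiles plus the points a 1.5 * IQR rule would flag as outliers."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4)  # three quartile cut points
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": q2, "q3": q3, "outliers": outliers}


# Invented daily counts of new users, plus the two isolated values from the text
daily_new_users = [110, 118, 121, 125, 128, 130, 132, 134, 137, 140,
                   143, 150, 155, 700, 1446]
summary = box_plot_summary(daily_new_users)
mean_value = statistics.mean(daily_new_users)
median_value = statistics.median(daily_new_users)
```

In this toy sample the two extreme days pull the mean well above the median, which is the signature of a right-skewed distribution. In the real dataset the mean and median are almost equal, so the extreme days weigh much less over a full year.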

## Second case: Comparing education expenditure (%) with the GINI index in recent years in Argentina

I was born in Argentina, where official statistics have not been very good in terms of transparency in recent years, so getting a good analysis out of that kind of data is usually extremely complicated.

So I used World Bank data on three variables: total government expenditure on education, private primary school enrollment, and the GINI index. I know the GINI index reflects much more than education, but I wanted to give it a try and see how they correlate.

I used data from 1980 to 2015. The highest expenditure on education overall was in 2015, with 5.875% of GDP. In 1980 there is an outlier point with 2.6% of GDP spent, followed by a dark period of 15 years for which either there are no records or the figures we have fall below 2.6%.

From 1996 the line rises and shows a positive evolution until the last year in the series (2015), with a hiccup between 2002 and 2005, the years of the default crisis and political instability in Argentina. The overall trend is positive, with a rank correlation of 0.86 (using Spearman's rank correlation).
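For reference, Spearman's rank correlation is just Pearson's correlation computed on the ranks of the values, with tied values sharing their average rank. Here is a minimal sketch using only the standard library. The function names are mine and the four data points are invented to show the mechanics; they are not the World Bank series and do not reproduce the 0.86 figure.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread = sqrt(sum((a - mean_x) ** 2 for a in rx) * sum((b - mean_y) ** 2 for b in ry))
    if spread == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / spread


# Invented example: years against spending as % of GDP
years = [1996, 1997, 1998, 1999]
spending = [3.0, 3.2, 3.1, 3.9]
rho = spearman(years, spending)
```

Because it only looks at ranks, the coefficient measures how consistently spending rises from one year to the next, not how large the increases are, which makes it a reasonable fit for a series with a few irregular jumps.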

The GINI index is the most commonly used measurement of inequality. A Gini coefficient of 1 (or 100%) expresses maximal inequality among values, so when the GINI index goes down the country is moving towards equality. For OECD countries, in the late 20th century, considering the effect of taxes and transfer payments, the income Gini coefficient ranged between 0.24 and 0.49.
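To make the scale concrete, here is a small sketch of how a Gini coefficient can be computed from a list of incomes, using the standard sorted-values formula. The function name and the sample incomes are mine and purely illustrative. The World Bank publishes the index on a 0 to 100 scale, which corresponds to multiplying this coefficient by 100.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfect equality.

    For a finite sample of n values the maximum is (n - 1) / n,
    which approaches 1 as n grows.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or xs[0] < 0 or total <= 0:
        raise ValueError("need non-negative values with a positive sum")
    # Weight each value by its 1-based position in the sorted order
    weighted = sum(position * x for position, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


equal_incomes = gini([5, 5, 5, 5])      # everyone has the same income
one_takes_all = gini([0, 0, 0, 1])      # one person holds everything
```

With four people, the "one takes all" case gives 0.75 rather than 1, because of the finite sample size. With a whole population the maximum gets arbitrarily close to 1.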

When I added the GINI index using the colors of the values, I found that spending and equality move together: in the last years, where the expenditure on education is higher, the GINI index goes down (which means that Argentina gets closer to equality). Strictly speaking, that is a negative correlation between expenditure and the GINI index. There are some quite interesting periods, though, when this relationship does not hold.

One is 1980-1990, when expenditure was lower, well below 2.6%, and yet the GINI index stayed below 45. It should be said that some values are missing for those years, so we should investigate further before reaching any conclusion.

The other is an outlier in 2001, when government expenditure on education was 4.83% of GDP, the highest in the period until 2009, yet the GINI index that year is the highest of all the observations, which is very bad for equality in the country. I find this observation interesting, as 2001 was one of the worst years of the crisis, when Argentina went into financial default.