Artificial intelligence, like fire

AI is one of the most important things humanity is working on. It is more profound than electricity or fire… We have learned to harness fire for the benefits of humanity but we had to overcome its downsides too. AI is really important, but we have to be concerned about it.

Sundar Pichai, CEO of Google, at NBC News

Newtral

In the middle of this genuine pack of talents is where I find myself as of this week, talking with one person and another, listening to ideas and sketching out projects so that truthful information travels farther than noise. In journalism today, that is a luxury.

I am very happy to be back in a newsroom that is also a tech startup focused on information and fact-checking. I will be coordinating the digital area. Thank you, Ana Pastor and the wonderful Newtral team, for counting on me. There will be news; for now we have a lot of work ahead.

Measles: When it’s rich countries that don’t get immunized

This entry is part of some findings in the exercises for the MOOC
Data visualization for storytelling and discovery. 

 

In the last few years the spread of viral illnesses that are completely preventable by vaccines has been rising. Measles is one of them, and I’ve downloaded the World Bank dataset for recent years to analyze that information by country and by groups of countries. The most recent data is from 2015.

Measles is a highly contagious infectious disease caused by a virus, and it can lead to complications including pneumonia and encephalitis. In 2016, there were 89,780 measles deaths globally – marking the first year measles deaths have fallen below 100,000 per year.

The World Health Organization has recommended that to achieve herd immunity, more than 95 % of the community must be vaccinated. As a result of widespread vaccination, the disease was declared eliminated from the Americas in 2016. It, however, occurred again in 2017 and 2018 in this region. 

Studies have shown that if an unvaccinated minority (around 5-10%) remains small, herd immunity can still be effective. A problem arises when the minority begins to grow.

A view to the world map

Graph: Measles world map

The world map shows the countries on a sequential colour scale. Vivid red marks the areas where the percentage of children immunized is under 85 %. The orangish medium tone marks the countries where this rate sits between 85 % and 95 %, which is not enough to prevent the spread of measles. Only countries with more than 95 % of children vaccinated are safe from measles; they are the lightest hue on the map.
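The three colour bands described above can be sketched as a small classification function. This is only an illustration: the function name is hypothetical, and treating exactly 95 % as still belonging to the middle band is an assumption based on the "more than 95 %" wording. The sample values are figures quoted elsewhere in this post.

```python
def coverage_band(pct_immunized: float) -> str:
    """Return the map band for a measles immunization percentage.

    Assumption: exactly 95 % falls in the middle band, because the
    text only calls countries with MORE than 95 % safe.
    """
    if pct_immunized < 85:
        return "under 85 % (vivid red)"
    if pct_immunized <= 95:
        return "85-95 % (orange)"
    return "over 95 % (lightest hue)"


# Values quoted in the post: South Sudan, the overall median, Tanzania.
for label, pct in [("South Sudan", 20), ("median", 92), ("Tanzania", 99)]:
    print(label, "->", coverage_band(pct))
```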

Most of the countries with the lowest share of children protected against measles are in Africa, with many in Oceania as well. But even a continent with traditionally good healthcare policies, such as Europe, is not completely safe.

Ukraine, Bosnia, Serbia and Macedonia are under 85 %, and countries such as France, the United Kingdom, Ireland, Italy, the Netherlands, Denmark, Croatia, Slovenia, Switzerland, Finland, Estonia, Latvia, Lithuania, Belarus, Moldova, Romania and Bulgaria stay below 95 % vaccination. In Europe, eighteen countries (Austria, Belgium, Iceland, Luxembourg, the Netherlands, Spain and Sweden among them) reported more cases of measles during the first half of 2017 than during the same period in 2016, according to the European Centre for Disease Prevention and Control.

Rich countries in America, such as the United States and Canada, also fail to reach 95 % immunization.

The distribution shows a median of 92, which falls 3 points short of the WHO recommendation. There’s a definite outlier with only 20 % of children vaccinated: South Sudan. It’s a new country that has suffered ethnic violence and has been in a civil war since 2013, and is acknowledged to have some of the worst health indicators in the world.

Equatorial Guinea is the next outlier, with 27 % vaccination. In spite of being one of sub-Saharan Africa’s largest oil producers, its wealth is distributed extremely unevenly. The country’s authoritarian government is cited as having one of the worst human rights records in the world. Less than half of the population has access to clean drinking water, and 20 % of children die before reaching the age of five.

Countries – Measles & health expenditure per capita

Graph: Measles & health expenditure per capita

If we consider health expenditure per capita in USD, we can explore some interesting cases. There’s an outlier here too: San Marino. The mean health expenditure across all countries is USD 1,005 per capita. San Marino spends USD 3,243 on public health, more than three times the mean, and still has a very low share of its population immunized against measles. It does not seem to be a problem of money.

The relationship between public expenditure on health and vaccination is interesting because it shows that most of the countries above the immunization levels recommended by the WHO don’t necessarily spend more on public health. Tanzania, with the lowest amount, only 37 USD, reaches 99 % immunization, and the pattern is similar in other countries: Russia, Mexico, Turkey, Vietnam, Georgia, Latvia, Poland, El Salvador, Rwanda, Seychelles, Nauru, and others that stay below the mean and still reach the WHO immunization target.

Cuba is perhaps the most cited example of efficiency in public health policy, and it fits here too: with only 817 USD it got 99 % of its population immunized. As I said before, it’s striking that rich countries with higher GDP and higher health expenditure per person, such as Canada, the United States, Denmark or France, don’t reach 95 % immunization.

Exploring regions

Graph: Measles & health expenditure per capita per regions 


In a scatterplot that shows groups of countries or continents, there are other observations worth remarking on or taking as a clue for further research. The mean for the whole world in these variables is 1,001.66 USD of health expenditure per capita and 84 % of children vaccinated. So we can see that there’s still work to do in this area, because that is 11 points below what the WHO recommends.

South Asia and Sub-Saharan Africa are the least immunized groups of countries. Fragile and conflict-affected states, low-income countries, heavily indebted poor countries, and least developed countries as per the UN classification are the groups where we see a strong correlation with a lower percentage of children vaccinated.

No continent is sufficiently immunized, though Europe and Central Asia come closest to 95 % without reaching it. OECD members stand at 94.48 %. The countries that reach the WHO measles vaccination goal have only one group in common: they are all upper middle income countries.

These explorations are first observations, intended to bring up some clues for further research. More variables should be considered in a big study like this, as well as the particular economic, demographic and social situation of each country. An interesting variable would be to somehow track anti-vaccine groups in some countries or states and their influence in media or social networks. I couldn’t find this kind of data, but I guess that education and information would be interesting variables to take into account here.

Other findings in this series:

Body mass index: not (only) a matter of income

Exploring datasets: Bikes in Madrid and education expenditure in Argentina

The children’s click

“Children are born already knowing how to use the internet”, we often hear. What they surely don’t know is how to protect the privacy of a personality they are still forming, because there are already apps and trackers gathering data on their browsing. That is what a group of researchers at the International Computer Science Institute (ICSI) of the University of Berkeley discovered.

Why would anyone want to build a behavioural profile of a small child? That was the first thing I asked Narseo Vallina, one of the researchers, and his answers were not very reassuring. Disney has already faced lawsuits over similar issues, and not long ago YT Kids faced a huge debate on social networks among worried parents. There was a series of videos produced by algorithms programmed to create videos that were terribly attractive to minors, though certainly strange. The worst thing about all these apps is that most of them sent the data to third parties, in most cases fairly obscure companies. I have covered all this and more in this report for El País: Más de la mitad de aplicaciones infantiles envía datos a terceros (More than half of children’s apps send data to third parties).

Photo: Petras Gagilas

Body mass index: not (only) a matter of income

This entry is part of some findings in the exercises for the MOOC
Data visualization for storytelling and discovery. 

Excess body weight is an important risk factor for mortality and morbidity from cardiovascular diseases, diabetes, cancers, and musculoskeletal disorders. It causes nearly 3 million deaths worldwide each year. Several studies at different levels show that adiposity, as measured by body mass index (BMI, calculated as weight in kg divided by the square of height in m), has increased in recent decades in many populations, although BMI seems to have been stable or even decreased in some groups.

Body mass index is a value derived from the mass (weight) and height of an individual. The BMI is defined as the body mass divided by the square of the body height. 

Commonly accepted BMI ranges are: underweight: under 18.5 kg/m², normal weight: 18.5 to 25, overweight: 25 to 30, obese: over 30. The World Health Organization also adheres to this classification. Those ranges are highlighted as lines on the Y axis of the graphs, to show which and how many countries fall into each of them.
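As a quick illustration of the definition and ranges above, here is a minimal sketch. The function names are hypothetical, and because the quoted ranges share their boundary values, the choice to put exactly 25 in "overweight" and exactly 30 in "obese" is an assumption.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: mass divided by the square of height (kg/m^2)."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to the commonly accepted ranges.

    Assumption: boundary values go to the higher category
    (25 -> overweight, 30 -> obese).
    """
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


# Hypothetical person: 70 kg and 1.75 m tall.
value = bmi(70, 1.75)
print(round(value, 2), bmi_category(value))
```

The same categories can be applied to a country’s mean BMI to see which band of the graphs it falls into.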

Correlation with income

I used the Gapminder 2012 dataset to explore a bit. 

The mean BMI provides a simplified measure of the comparative weight of populations on a country-by-country basis, and my first hunch was to compare the mean BMI of each country with income per person to see how they correlate. Maps didn’t show the gradients well, as the countries with higher BMI values are few and very small on the map. So I used a scatterplot to see countries, with continents marked by colour, and spot the trends.

Graph 1: BMI vs. income (men)

 

All the countries with an obese population (Nauru, Tonga, Samoa, Palau, French Polynesia) are Pacific islands, which raises the question of whether there is an ethnic factor or whether different parameters are needed when studying this area.

 

Graph 2: BMI vs. income (women)

 

Considering the data for women, more countries have a higher BMI, and more of them fall into the obesity category. Besides those mentioned before, there are Kiribati and the Marshall Islands in the Pacific, Egypt and Kuwait in the Middle East, and Puerto Rico, Saint Kitts and Nevis, and Bermuda in the Americas. This may have some kind of relationship with climate and hot temperatures (?), as all of them are located near the equator (latitude 0). A possible clue to keep on searching.

We can see that BMI and income don’t show a clear correlation in general, so I thought it would be better to filter and analyse by continent and country in more detail.

There are several studies stating that wealth doesn’t have a direct correlation with BMI as there are more factors involved. “The persistence and emergence of income gradients suggests that disparities in weight status are only partially attributable to poverty and that efforts aimed at reducing disparities need to consider a much broader array of contributing factors”, as per Wang and Lauderdale.

A study from the University of North Carolina used microdata from China to examine theoretically and test empirically the predictions linking household income to adult BMI, using both cross-sectional and panel data analysis. The results show an inverted-U shaped relationship between BMI and family income. Additional income brings about a higher BMI and a higher probability of being overweight or obese for the poor than for the rich.

 

The median income per person in the Gapminder data for 2012 is only 14,460, and most African countries are under that median. But the rest of the countries are quite dispersed, especially in East Asia and Pacific and in South Asia.

The discrepancy in Asia has a particular cause. The WHO has determined that at any given BMI, Asians, including Singaporeans, generally have a higher percentage of body fat than do Caucasians. The BMI cut-off levels for Singaporeans have been revised such that a BMI of 23 kg/m² or higher marks a moderate increase in risk, while a BMI of 27.5 kg/m² or more represents high risk for diabetes and cardiovascular diseases.

 

Besides that, and coming back to all the continents data, a histogram showed that the median for BMI is 25.56, similar to the mean, 25.14. 

So in our analysis, most of the countries fall into the overweight or obese classification, and according to several experts that is the biggest nutrition problem we have. More than being underweight, the issue is that we eat bad food and don’t keep a good metabolic balance. Also, if you are poor and lack education, you cannot resolve this situation and get the best nutrients and sustainable food within reach. Education is one of many other variables, such as ethnicity, that can influence a higher BMI, and we cannot establish a serious correlation without digging deeper into other variables.

Correlation with urban population

So I wanted to see how urban population correlates with BMI. Some national-level studies identify the urban lifestyle as one of the main causes of higher obesity levels in cities, independently of income. That is the case of a study in Brazil that found that urbanization and the more developed geographic regions were positively associated with the prevalence of overweight/obesity and negatively associated with the prevalence of underweight.

Graph: Body mass index vs. urban population 

In the grid of scatterplots by continent, we can indeed see a positive correlation for every group. The Asian countries still look very spread out, though. I’d study them separately, after reviewing more papers on their specificities, and wouldn’t include them in a general analysis like this. But for the rest, the correlation is positive.

There are a number of reasons for the association between obesity and economic growth in many economies. Technological changes that lead to lower food prices and increased food consumption are some of the factors that explain economic growth and obesity, according to a study by Finkelstein and Ruhm. Those factors increase working hours, which makes more people eat in restaurants and fast food joints.

I find that this kind of exploration makes us pose more and more questions every time, and I could go on and on trying to find papers on each region and on different variables, as I mentioned before, such as education, urban growth (not only total population), differences by latitude, and so on.

Exploring datasets: Bikes in Madrid and education expenditure in Argentina

Over the last few weeks I’ve been doing a MOOC on Data visualization for storytelling and discovery with Alberto Cairo, which I highly recommend. I’ll post here some of the findings I’ve got from it. The studies are not totally finished, as they would need more work to be presented as a journalistic piece, so they shouldn’t be taken as more than an exercise in the learning process.

 

1. BiciMad dataset

First, I wanted to go local, and I live in Madrid. In my city we have a relatively new public bike rental service, and they have their datasets available, so I got a dataset with the data on the new daily users.

 

In the histogram I can see the concentration and the spread of the data. There’s a curious outlier that corresponds to the maximum value of the dataset, 1446, and there’s another isolated value around 700. I find those two points worth more research. They probably correspond to the day the service started or opened to the public.

The x axis represents the number of new annual-pass users per day. The y axis represents the number of days on which that many users were registered. The distribution is skewed to the right, due to the outliers: a few days (2-6) with much higher numbers of new annual passes.

The box plot shows the concentration of what could be a usual number of new users per day. The median is 132 and the mean is 133, so during that year (2014) that was the typical number of new users per day for this service. It could be useful to compare it with datasets from other years and with other kinds of information, to see what variables make people decide to hop on bikes as a means of transport in the city.
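The kind of summary a box plot gives can be sketched with Python’s standard library. This is not the tool used for the exercise. The sample counts are invented for illustration (only the two extreme values echo the ones mentioned above), and the 1.5 × IQR fence is the usual box plot convention, assumed here. With so few points the outliers pull the mean far above the median, unlike in the full-year dataset.

```python
import statistics


def summarize(daily_new_users):
    """Mean, median and upper outliers using the 1.5 * IQR box plot rule."""
    q1, _, q3 = statistics.quantiles(daily_new_users, n=4)
    upper_fence = q3 + 1.5 * (q3 - q1)
    return {
        "mean": statistics.mean(daily_new_users),
        "median": statistics.median(daily_new_users),
        "upper_fence": upper_fence,
        "outliers": [v for v in daily_new_users if v > upper_fence],
    }


# Hypothetical daily counts, NOT the real BiciMad data.
sample = [110, 118, 125, 128, 130, 132, 134, 138, 142, 700, 1446]
print(summarize(sample))
```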

 

2. Second case: Comparing education expenditure (%) with GINI Index in the last years in Argentina

I was born in Argentina, where official statistics have not been very good in recent years in terms of transparency, so getting good analysis out of that kind of data is usually extremely complicated.

So I used World Bank data on three variables: total government expenditure on education, private primary school enrollment, and the GINI index. I know GINI is made up of several indicators and not only education, but I wanted to give it a try and see how they correlate.

 

I used data from 1980 to 2015. The highest expenditure on education overall was in 2015, with 5.875 % of GDP. In 1980 there is an outlier point with 2.6 % of GDP spent, before a dark period of 15 years where there are either no records or the data we have falls below 2.6 %.

From 1996 the line rises and shows a positive evolution until the last year in the series (2015), with a hiccup between 2002 and 2005, the years of the default crisis and political instability in Argentina. The overall trend is positive, with a rank correlation of 0.86 (using Spearman’s rank correlation).
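For readers unfamiliar with Spearman’s rank correlation, here is a minimal sketch of how it can be computed: rank both series (ties get their average rank) and take the Pearson correlation of the ranks. The helper names are hypothetical, the expenditure values are invented for illustration, and this is not necessarily how the 0.86 above was obtained.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank series."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical series, NOT the World Bank figures.
years = [1996, 1997, 1998, 1999, 2000, 2001]
spending_pct_gdp = [3.0, 3.2, 3.1, 3.6, 3.9, 4.8]
print(round(spearman(years, spending_pct_gdp), 3))
```

A value close to 1 means expenditure rises almost every time the year increases, even if not by a constant amount.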

The GINI index is the most commonly used measurement of inequality. A Gini coefficient of 1 (or 100%) expresses maximal inequality among values. So if the GINI index goes down it’s best in terms of equality for the country. For OECD countries, in the late 20th century, considering the effect of taxes and transfer payments, the income Gini coefficient ranged between 0.24 and 0.49.
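To make the scale concrete, here is a small sketch of a Gini coefficient computed from a list of incomes, using the standard formula on sorted values. The function name and the sample incomes are hypothetical, and official figures such as the World Bank’s are estimated from survey data rather than from a toy list like this.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality).

    For a finite sample of n values the maximum is (n - 1) / n,
    which approaches 1 as n grows.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical incomes for four households.
print(gini([1, 1, 1, 1]))   # everyone earns the same
print(gini([0, 0, 0, 10]))  # one household holds everything
```

Multiplying the result by 100 gives the percentage scale mentioned above.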

When I added the GINI index as the colour of the values, I found a negative correlation between the two: in the last years, where the expenditure on education is higher, the GINI index goes down (which means that Argentina gets closer to equality). There are some quite interesting periods of time, though, when this correlation does not hold.

One is 1980-1990, when expenditure was lower, well under 2.6 %, and the GINI index stayed below 45. It should be said that some values are missing for those years, and we should investigate further before reaching any conclusion.

The other is an outlier in 2001, when government expenditure on education is 4.83 % of GDP, the highest in the period until 2009, but the GINI index that year is the highest of all the observations, which is very bad for equality in the country. I find this observation interesting, as 2001 is one of the worst years of the crisis, when Argentina went into financial default.
