11 Feb 2015

Excel Statistics for 20Minutos Blog Awards 2014


#32 Excel Statistics for 20Minutos Blog Awards 2014

NB: Si prefieres leer este blog en español, recuerda que con Google-Blogger (el herramiento usado para este blog) es posible traducir este blog de ingles a español. El 'Google Traductor' esta en el menu a la derecha.

The Spanish newspaper 20Minutos, the 2nd newspaper in Spain, with a daily (free) edition of approx. 1 million, organizes the 9th edition of its Blog Awards ("Premios 20Blogs"), the biggest blog-contest in the Spanish speaking world. The 2014 edition has 7223 participating blogs from (mostly) Spanish speaking bloggers, and they are not only from Spain but e.g. also from Latin America (where BTW 20Minutos is also published in 2 countries: USA and Mexico). Until 2nd of March everybody with a 20Minutos account can vote 20 times, one time for every blog-category, and the blog with the most votes wins the public prize, although the jury decides who will be the winner of the 1st prize: 5000 euro.
My blog exists over 1 year now, and although it´s not in Spanish, I also signed-up my blog, and 20Minutos accepted it. Maybe they accepted that my blog is in English because English is so popular in Spain nowadays, there are so many people doing courses, even MP Rajoy seems to have done the course 'learn the 1000 most used English words'. Or maybe my blog was accepted because I have several post which have something to do with Spain, or Madrid where I live, e.g.:

* 2015 year-calendar with holidays in Spain and Madrid ('calendario laboral'), with Spanish names of weekdays and months, see:
http://worktimesheet2014.blogspot.com.es/2015/02/month-calendar-for-january-2015-in-excel.html

* timesheet to track worked hours (´plantilla de control de horas trabajado', 'los partes'), with integrated Spanish calendar, and e.g Spanish summer-schedule (´horario intensivo'), see:
http://worktimesheet2014.blogspot.com.es/2015/01/timesheet-for-worked-hours-2015.html

* Excel with statistics results 10K run 'Corre por el niño' (organized by children hospital Niño Jesus), see:
http://worktimesheet2014.blogspot.com.es/2014/11/statistics-result-10-km-run-corre-por.html

* Infographic win/loss record of Rafa Nadal, see:
http://worktimesheet2014.blogspot.com.es/2014/10/infographic-winloss-record-rafa-nadal.html

* Match schedule 2014 FIBA Worldcup Basketball Spain (´Mundial baloncesto') , see:
http://worktimesheet2014.blogspot.com.es/2014/08/worldcup-basketball-2014-match-schedule.html

20Minutos allows you to promote your own blog to attract votes in the Blog Awards, so: if you like my blog, you can vote for it here:

http://lablogoteca.20minutos.es/excel-ejemplos-para-tu-trabajo-deportes-y-mas-50022/0/

To do so, you have to register here:

http://www.20minutos.es/usuarios/registro/

But enough self-promotion, let´s start with what this blog is about: MS Excel. And I thought it could be interesting for all the participants of the Blog Awards to see some statistics about it. Of course 20Minutos also made one, about the number of participating blogs per year, see:

http://blogs.20minutos.es/premios20blogs/vota-ya-por-tus-favoritos-en-los-premios-20blogs-tienes-7-221-candidaturas-entre-las-que-elegir-y-20-votos-para-repartir/

And this is 1(of 4) statistics in my Excel, see fig.1.
NB: in their graphic, the total of 2013 is not correct (6790), or better said, not corresponding to their data on this page (my source):           
http://lablogoteca.20minutos.es/premios-20blogs/ediciones-anteriores/2013/

This graph shows that the number of blogs participating in the Blog Awards increased almost every year, from a bit more than 2000 in 2006 to over 7000 in 2014.

                                               fig.1


                                      fig.2


The chart in fig.2 shows for the 2014 Blog Awards the number of blogs for all 20 blog-categories (except 'Best Blog' and 'Public Prize', because of course all blogs participate in these 2 'categories'). My blog is in category 'Blogosfera' (a 'Rest'-category, so for blogs which don´t fit in any other category) with 933 blogs, although I thought I signed up for category 'Internet etc.', with 502 blogs, so with 'less' competition then in category 'Blogosfera', but still, 501 competitors... Maybe next year I´ll participate with a blog about motors .. (this category has only 48 blogs).


fig.3


fig.4

The charts in fig.3 and 4 are similar, both have the 'dimensions' (axis) time and blog-category, but they differ on which axis (x or y) these dimensions are placed. Both have a filter: Rank blog-category <= 3, so it shows per year only the top 3 blog-categories (excluding the 'Rest-category'). In fig.3 you can see that there are 5 categories which every year have the most blogs, and that in the beginning the 'Personal-category' was the most popular but this decreased during the years. And in fig.4 you can e.g. see that before 2011 the 'Personal-category' was the one with ranking 1, but that since then, the 'Culture-category' took over.

How did I make this Excel? I used this page as my data-source:
http://lablogoteca.20minutos.es/premios-20blogs/ediciones-anteriores/2013/

Maybe I could have used the 'copy/paste' method to get the data from this (tabbed) HTML-table in Excel, but I thought it would be nicer to use the Power BI option of Excel, and use the function "Get external data from website", but this didn´t work. But from this site:

https://eagereyes.org/data/scrape-tables-using-google-docs

I learned that Google-Sheets (Google´s Excel..) is better in 'scraping' data from websites then Excel. So I created a spreadsheet with Google-Sheets on Google-Drive (so in the cloud) and this function did the job perfectly:

=IMPORTHTML("http://lablogoteca.20minutos.es/premios-20blogs/ediciones-anteriores/2013/", "table",1)

I shared this Google-spreadsheet here:

http://goo.gl/At8RkU

see also fig.5 

fig.5:  Google-spreadsheet with data imported from website 20Minutos

And then I exported (downloaded) this cloud-spreadsheet from Google-Sheets to my 'good-old' desktop Excel (version 2013). And this Excel is the database (´server´) for the Excel with the statistics (´client´), and to 'bind' them, I used an Excel-query, and to get the data of every year, with 1 worksheet (and 'named range') per year in 1 table, I used SQL´s union-operator:

--
SELECT R_2006.Categorias, R_2006.Ganador, R_2006.Concursantes, R_2006.Rank, R_2006.Año
FROM `D:\projects\20Minutos_BlogCompetition2014_v4_input.xlsx`.R_2006 R_2006
union
SELECT R_2007.Categorias, R_2007.Ganador, R_2007.Concursantes, R_2007.Rank, R_2007.Año
FROM `D:\projects\20Minutos_BlogCompetition2014_v4_input.xlsx`.R_2007 R_2007
union


ETC
--

For more info about using Excel as a database, see my post:
http://worktimesheet2014.blogspot.com.es/2014/04/excel-and-relational-databases.html

The 4 statistics in my Excel are made with pivot-tables, about which I wrote in several posts, e.g.:
http://worktimesheet2014.blogspot.com.es/2014/01/timesheet.html


And to conclude this post, I´d like to say good luck to all my fellow-bloggers in the "Premios 20Blogs" Blog Awards.


Downloads:

#Mirror 1: Scribd.com (PDF file):
https://es.scribd.com/doc/255471225/20Minutos-Blog-Awards-2014-some-Statistics

#Mirror 2: MS Onedrive (Excel file):
NB: this site has 'Excel-Online', so you can view my Excel-doc if you don´t have MS Excel on your PC

-server:
http://1drv.ms/1z8iMEn

-client (with statistics):
http://1drv.ms/1MdCR5o

#Mirror 3: Google Drive (1 zip file with Excel and PDF files):
http://goo.gl/30xyB1

No comments: