19 Dec 2016

Carrera de la Empresas 2016 Madrid Statistics made with R

#59: 10K Carrera de las Empresas 2016 Madrid: Statistics made with R + included in Power BI

Note 20/12/2016: I updated this post with a Power BI solution (with R-Stats and Q&A), see part 2 of this post.

Part 1: Statistics with R 

Today the 'business run' ("Carrera de las Empresas") was held here in Madrid. I participated with Raet, a Dutch ISV of a SAAS-solution for Human Resources, which since almost 2 years now has a software development centre in Madrid.

For an earlier post of this run (with more details), see:

This time, there were over 14.000 runners, from 800 companies, for more details, see:

The results of the 10K run (in which I participated) I got from here:

and for the statistics, see this page (a social/runners network): http://www.runedia.com/cursa/201628906/carrera-de-las-empresas-10k-actualidad-economica/2016/

The results (10K-Men, 5563 runners)  are in PDF format, which I converted to Excel, with help of this nice (free) online tool:

After some 'cleansing' (see added yellow-columns in Excel file) I had the data ready for analysis, for which I used R, the language for statistical computing.
NB: For an earlier post in which I used R, see:


The R-script I made looked basically like this:

sv10km2 <- read.csv("10Km2.csv")
plot(density(Tfinish,na.rm=TRUE),lwd=3,xlab="Finish Time (mins)", main="Madrid Business Run 2016 10k Results")
boxplot(Tfinish, data=csv10km2)
newdata <-  sqldf('select * from csv10km2 where Team like %Raet%')

Input for the script has to be in the MyDocument-folder, in my case: D:\Persoonlijk\maartenree\Documents\10Km2.csv

See figures below for the result of my analysis.

And for those who don't have R, but want to do the same as I did for my company (in fig.3), so filter the results, I also added the Excel where the finish-results are in table-format (so easy to filter).

Fig.1: Plot with finish-times and Five-number summary

Fig.2: Boxplot  with finish-times

Fig.3: Filter dataset with finish-times for (teams of) company Raet

Part 2: Power BI + R Stats/Charts

Power BI has an option to include R Stats/Charts, and so I wanted to add the above R-chart 'Plot finish-times' (fig.1) to Power BI, and the filtering-option in Power BI would make it easy to 'slice-and-dice' this chart, in this case: filtering for a team, and showing the runners in that team. To be able to do this, I had to convert the Excel-column with the finish-times in 'duration-format' ([h]:mm:ss) to seconds, see column Tfinish2 in attached file 10Km3.xlsx and fig.4.

Fig.4: Excel with new column Tfinish2 (time in seconds)

And for the final result in Power BI (Desktop), see fig.5.
Note that slicer for team 'dimension' has a pre-filter (in this case with value 'Raet') to limit the number of items in the slicer (very helpful when there are over 1000 teams). And to filter for >1 team (like I did, with 3 Raet-teams 'checked'): press Control + click in checks for filter values. And note that the chart is reacting on the filter (so Power BI filter and R-chart are connected).

Fig.5 Power BI Desktop: report finish-times (in second) filtered for teams of Raet

I also uploaded the Power BI Desktop report to Power BI online, see:


To be able to use R-Stats/Charts in Power BI online, you need to have a Power BI Pro account. About how to get a free trial-account, see:


With Power BI Online it is also possible to do Q&A in natural language (English). See fig.6-7. For this example to work, I had to rename the Excel-columns Runner -> RunnerName and Team -> TeamName, see:


Fig.6 Power BI and Q&A 1/2

Fig.7 Power BI and Q&A 2/2





v1: https://goo.gl/gdjPRk



v2b: (better chart than v2)

9 Dec 2016

PowerBI Dashboard for Messi´s worldrecord of 91 goals in one year

#58 PowerBI Dashboard for Messi´s worldrecord of 91 goals in one year

- This post is not about an Excel I made (but about a Microsoft Power BI dashboard I made, which is (almost) a copy of an Excel workbook I blogged about earlier), so I tagged it with the 'More' category.

Once I made an Excel with a PowerView dashboard for Messi's worldrecord of #goals in one year (91), see:


In this post, I made a new version of this dashboard, but now in Microsoft PowerBI, and also with some improvements, e.g. the treemap-view for all matches in which Messi scored, aggregated by the team and competition where he made his goals. NB: For more info about the Treemap-chart, see my previous post:


See fig.1 for the end-result, in PowerBI Desktop.

Fig.1 Messi-Dashboard in PowerBI Desktop

The nice thing of PowerBI-dashboards is that they are suited for any device, so apart from PC also smartphone or tablet, see fig.2, in PowerBI-online (cloud-service), so this is a site for sharing your dashboards, for mine, see:


This URL Power BI gave me after choosing menu-option: "File > Publish to web", which should make the dashboard 'public', see:

Or you can use this URL: https://app.powerbi.com/groups/me/dashboards/729276ac-365c-4473-9d42-1969ddd16dac

There is also a Power BI Community page which has a 'Data Stories Gallery', and I posted my 'Messi-dashboard' here:

And here I found another nice Messi-dashboard:

So this is similar to this page of the Tableau-community:

Fig.2: Messi-Dashboard in PowerBI (online) on PC and smartphone

2 Nice build-in features of Power BI are:
- Quick Insight in the dataset (source of dashboard), which generates several 'data-profile' charts, see fig.2b
- Natural language Q&A, user types his question (in English), using some 'keywords' like 'by' to specify the 'slice-and-dice' dimensions, and Power BI answers (with charts) , see fig.2c.

For more details about these features (and Power BI in general), see:

Fig.2b: Quick Insight

Fig.2c: Natural language Q&A

As said in my previous blog-post, my Messi-dashboard was inspired by one made with BI/Dashboard-tool Tableau  (for which PowerBI was Microsofts answer/alternative), see fig.3 for the this version.

Fig.3: original Messi-dashboard in (free trial version of ) Tableau

To see another really nice example of a PowerBI-report with the results of the Spanish soccer League made by a friend, see fig.4 and his site: http://Ligasantanderbi.azurewebsites.net

Fig.4: PowerBI (with Azure) example with results Spanish soccer League

So with Power BI you can make really nice dashboards, and with the frequent updates/releases of Power BI, much more frequent than that of MS Office (Excel), one could wonder if for 'self-service BI' Power BI is the way to go (and not Excel with its 'Power-plugins' like Power View, Power Map etc.) Some say it is, see e.g.:


while others say you can/should use the 'pack' 'Power BI + Exel', see:

And what do you think? Comments are welcome (but please not anonymously, because I get a lot of 'fake' comments (publicity) from anonymous users, so I don't read anonymous comments.

Note 12/12/2016:
Although I haven't got any comments on this post (yet), apparently the 'editor' of The #PowerBI Daily (on www.paper.li), liked this post, because he published it here today, see fig.5.
NB: Paper.li is a site to create your own (virtual) newspaper of tweets of persons you follow, for more info, see:

Fig.5: Blog-post referenced on  The #PowerBI Daily (on www.paper.li) 


-1: Good explanation of what is "Business Intelligence (BI): Multidimensional Analysis"

-2: A 'BI-solution' actually consists of 2 components: a Data Warehouse (DW), the (invisible) back-room ('the database') and BI, the (visual) front-room (the dashboard), where DW is normally about 80% of the solution, and BI 20 %, so it is the top of the ice-berg, as described here:

Pbix-file, so PowerBI-report:


17 Jul 2016

Treemap and Sunburst Excel2016 charts for Tour De France

#57 Treemap and Sunburst Excel2016 charts for Tour De France

In 2015 Tour de France contracted the company Dimension Data to make the event as high tech as possible. As DD said themselves: 

It has completed its big data analytics and digital delivery platform from which the company will deliver real-time information on individual riders for the first time in the history of professional cycling. This major innovation was made possible by Amaury Sport Organisation (ASO) in partnership with the 22 teams participating in the 2015 Tour de France. Highly accurate data will be collected through the use of live trackers under the saddle of each rider. Dimension Data will then process and analyse the data, and make it available to cycling fans, commentators, broadcasters and the media.

When the Tour de France begins on Saturday, the viewing public around the world will be able to follow all 198 riders in 22 teams real-time, and be able to track the speed at which each cyclist is riding, exactly where he’s positioned in the race in relation to other cyclists, and the distance between each rider - all via a beta live tracking website.

And I think they did an excellent job, and also in this year's edition.
On one of their Twitter-posts:
 I once saw a graphic (data: total % of time a cyclist was leading in the breakaway group) which I didn't know, and which was something which could not be made with Excel (2013), but a bit later, which the launch of Excel-2016, I learnt that this graph was a Treemap, so now supported by Excel, and I tried to reproduce, see e.g. fig.1

fig.1 Treemap chart wit data from teams and cyclists

Besides the Treemap chart, Excel 2016 had another new graph, the Sunburst chart, see fig.2 for an example

fig.2 Sunburst chart wit data from countries and cyclists

Both the Treemap and the Sunburst-chart are for displaying hierarchical data, so e.g. in Tour de France, cyclists are part of a team, so you can compare the teams performances (total times) and breaking it down to the indivual cyclists performances, as done in fig.1. Another possible grouping is cyclists per country as in fig.2, although this is more artificial, so not something which is considered by the Tour de France, while grouping by teams is of course (there is a prize for the best team). 

And if fig.3 you can see the equivalent of fig.2, but now in Treemap format. Maybe this format is handier if you have less space, like e.g. Dimension Data had, where this chart was part of a dashboard with other charts.

 fig.3 Treemap chart wit data from countries and cyclists

And in fig.4 you can see the original Dimension Data Treemap, which in fact did not have any hierarchical data, just data of cyclists. In this chart it is of course the easiest to compare data of 2 riders, e.g. the 2 best riders in the breakaway-group were Roland and Rodriguez (both in left of chart, with same square-size).

fig.4: Treemap charta with data from cycists.

If you want to read more about Big Data brought by Dimension Data to Tour de France, see e.g.:

And for a video explaining more about Treemap charts in Excel, see: 

For me, the new high-tech dimension (also by using onboard cameras) makes the Tour de France much more interesting to follow, and of course the fact that the Dutch are doing very well in TDF2016 (Tom Dumoulin 2 stage victories and Bouke
Mollema #2 in general ranking).




5 Jun 2016

Month-calendar for June 2016 in Excel

#56 Month-calendar for June 2016 in Excel

In this post the new month-calendar for June, see fig.1. The Excel also has the year-calendar for 2016 and a 2016-year-planner. Event of this month is UEFA Euro 2016 France, hold on 10/6 - 10/7. Unfortunately the Netherlands did not qualify, so this time I was not so motivated to make a new version of Match for this Eurocup, or a new Excel for an office-pool, like I did for last FIFA Worldcup. But of course you can do it by yourself, by adapting these 2 files (Access and Excel):



For a very nice Excel with a EURO 2016 tracker, which automatically calculates the points, who go to next round etc (by using formulas), see:


fig.1: Month calendar 201606



2 May 2016

Month-calendar for May 2016 in Excel

#55 Month-calendar for May 2016 in Excel

In this post the new month-calendar for May, see fig.1. The Excel also has the year-calendar for 2016 and a 2016-year-planner. Event of this month is Roland Garros, the French Open ATP tennis tournament, 22/5-5/6.

 fig.1: month-calendar May 2016



2 Apr 2016

Month-calendar for April 2016 in Excel

#54 Month-calendar for April 2016 in Excel

In this post the new month-calendar for April, see fig.1. The Excel also has the year-calendar for 2016 and a 2016-year-planner. Event of this month is 'El Clasico', Barcelona vs Real Madrid, Sat. April 2, in the stadium of Barca, Nou Camp, in which there will be a 'remembrance-mosaic' with the Barca-shirt with number 14´, the number of Johan Cruyff, the Dutch legendary player and coach of this club, who died last March 24. Cruyff was besides for his football also famous for his oracle-like aphorisms, like "every disavantage has its advantage" and his logic like : I do not believe. In Spain  all 22 players make the sign of the cross before they enter the field, and if it would work, it would always be a draw." , see:


Fig.1: calendar April 2016

12 Mar 2016

Month-calendar for March2016 in Excel

#53 Month-calendar for March2016 in Excel

In this post the new month-calendar for March2016, see fig.1. The Excel also has the year-calendar for 2016 and a 2016-year-planner. Event of this month is la Volta Ciclista a Catalunya 2016 (in Spain). For more info about this event , see:

fig.1: calendar March 2016


29 Feb 2016

20Minutos Blog Awards 2015 Statistics with Excel and R

#52: 20Minutos Blog Awards 2015 Statistics with Excel and R

NB: Si prefieres leer este blog en español, recuerda que con Google-Blogger (el herramiento usado para este blog) es posible traducir este blog de ingles a español. El 'Google Traductor' esta en el menu a la derecha.

The Spanish newspaper 20Minutos, organizes the 10th edition of its Blog Awards ("Premios 20Blogs"), with 7652 participating blogs from (mostly) Spanish speaking bloggers. Until 2nd of March everybody with a 20Minutos account can vote 20 times, one time for every blog-category, and the blog with the most votes wins the public prize, although the jury decides who will be the winner of the 1st prize. And like last year, I'm participating with this blog. 20Minutos allows you to promote your own blog to attract votes in the Blog Awards, so: if you like my blog, you can vote for it here:


To do so, you have to register here:


But enough self-promotion, let´s start with what this blog is about: MS Excel. For the 2014-blog-competition I made an Excel with some statistics, see:

and for this year's edition I did the same, but this time I made other statistics, and not only with Excel but also with R. My idea was to get into 1 Excel all the blog-records (with data like URL, Author, Title, Description and Category in which blog is participating), and to get all this data, I used the (free) tool https://www.import.io/ for web-scraping the 20Min. website for the blog-contest: http://lablogoteca.20minutos.es/premios-20blogs/

For more info about this great data-extract tool, see e.g.:



The result of the import had some 300 duplicate records, but with Excel it's easy to eliminate duplicates, see e.g.:


To this dataset I added some extra calculated fields like country of blog (only posible if domain-name of website has a country-code like "es" (Spain) and blog-platform (Blogger, Wordpress etc), see fig.2a.
NB: My (V)lookup-table with the mapping country-code - country I got from here:

And with this dataset, I made the statistics which you can see in the figures 2b, 3 etc below, using pivot-tables and pivot-charts. But before getting into more detail about these stats I present in fig.1 a Word Cloud for which I used R (the script is included in the Download-section below). I used R to make the world cloud (instead of using a site like http://www.wordle.net/) to learn a bit about R, and because of it's extensive Text Mining capabilities. Some sites which helped me with this are:



To make this word cloud, I used the column Description (of a blog), a 'memo-field'.
Not suprisingly, the most used word is... 'Blog', and then comes 'World', 'Life'', 'Ínformation'.


fig.1 Word Cloud of Blog-descriptions of all blogs participating in 20Min. Blog-contest.

fig.2a: Data imported in Excel from 20Min. (formula-columns which are not interesting for user are hidden).

fig.2b: total Blogs per category'.

NB: Note that the totals in this chart correspond almost 100% to the ones on the 20Min.site (http://lablogoteca.20minutos.es/premios-20blogs/ ), as it should, but with a difference of 1 (in my stats, category Viajes has 404 records, not 405 as on the 20Min. site, I wander if that is because of an error in my data-extraction from this site or maybe an error by 20Min.?)

fig.3: Total Blogs per Platform

NB: Google's Blogger (domain Blogspot) appear more popular then Wordpress.

fig.4: Total Blogs per Country

NB: for those of you who have never heard of Tokelau: this (is)land offers a 1 year free (tk) domain (for everyone, so not only for the people of TK), after that you have to pay, or they close your site, which happened to my site, http://www.eigersoftware.tk ..

fig.5: Total Blogs per Author

- this pivot-table report filters authors with >= 7 blogs, for how you can do this filtering, see:

- 'Grupo LM' has a total of 28 blogs in the competition, and I found out that this is a company which pays bloggers if they send them a blog-post. To see all 28 blogs, double-click on measure-value '28' in the pivot-table, and then Excels opens a new worksheet with the 28 records.
I wonder why the blog-authors who work for LM did not sign up for the blog-contest with their own name.

And to conclude this post, I´d like to say good luck ('mucho suerte') to all my fellow-bloggers in the "Premios 20Blogs" Blog Awards.
BTW: To decide on which blog to vote, I also searched for Éxcel' in my Excel and found several blogs, e.g. this excelent blog, from Argentina (on which I voted):



#Mirror 1: Google Drive (zip file with Excel and R-script files):


#Mirror 2: Microsoft Onedrive (Excel file):


#Mirror 3: Scribd.com (PDF file):


2 Feb 2016

Month-calendar for February 2016 in Excel

#51 Month-calendar for February 2016 in Excel

In this post the new month-calendar for February 2016, see fig.1. The Excel also has the year-calendar for 2016 and a 2016-year-planner. Event of this month is Lillehammer 2016 Youth Olympic Games (YOG), the Wintergames for children from 15-18 years. For more info about this event , see:


fig,1 Calendar for month February 2016



10 Jan 2016

Timesheet for worked hours 2016 and PowerView report with stats

#50 Timesheet for worked hours 2016 and PowerView report with stats

 This post is about the new version (V3) of the Excel I made  to track your worked hours, the "Timesheet 2016". For my post about the previous version (V2), see:


Timesheet-V3 does not have any new features comparing with V2, the only difference is that V3 has the 2016-calendar (V2 was for 2015). So in this post I won´t repeat the explanation about the timesheet, for this see the post of Timesheet-V2 (2015).

But there is one thing new for the 2016-Timesheet: I made a new Excel with a PowerView report with statistics about worked days and holidays at year-, month- and week-level. The Excel with the Timesheet also has a worksheet with statistics, but these are using the 'traditional' pivot-tables, while the new Excel with PowerView uses PowerPivot Tables which has some new nice features which I´ll show below.

To show how this Timesheet works, I made a demo in which I registered check-in and check-out times for all (working-)days in 2016, and also the days off (vacations, sick, personal days).
If you want to use this Excel to track your worked hours, you should delete all my input for this demo (green columns in sheet-2: 'Tabla').

In fig.1 you can see the Excel with the PowerView report.
- this report shows Month-stat. for Dec. and Week-stat. for week 50 (which starts on Monday (Spanish: Lunes) 22 (Dec.) (see filter-pane, to the right of report-canvas)
- the report-titles are generated, and have some Spanish:
'por' = by
 'y' = and

And in fig.2 you can see the Excel with the Timesheet-data (fig.2) and -statistics (fig.3).

In the rest of this post, the figures are from the Excel with the PowerView report.

fig.1 Excel with Powerview-report for Timesheet

fig.2 : Excel with Timesheet (table)

fig.3 : Excel with Timesheet statistics

In fig.4 you can see that if you filter in the bar-chart Hours Worked per Month for 1 month , e.g. Dec., then also the other report-element at year-level, the pie-chart Days Off, is filtered for this month.

fig.4: filtering in report (month = Dec.)

In fig.5 you can see the result after I clicked in fig.1 in pie-chart Days Off on the blue-part of the pie-chart, labeled Code2: 'VA' (= vacations) and in fig.6-7 you can see how to drill-down in this pie-chart to Code3: VA0 and VA1 (VA0: vacations-days of last year, VA1: vacations-days of actual year).

fig.5: Pie-chart Days Off (not worked)

fig.6: Pie-chart Days Off , config. drill-down Code2-> Code3 (see field: Color)

fig.7: Pie-chart Days Off , result drill-down Code2-> Code3

To get the data from the Timesheet-Excel (worksheet-2, data-table 'Tabla2')  into the PowerView Report-Excel I used PowerQuery (see fig. 8) and I imported the data in the PowerPivot datamodel. After that I added some DAX-formulas in this datamodel, e.g. to calculate the worked time in hours, with decimals (in the Timesheet-Excel, the time is registered in minutes):

HoursWorked := ROUND([Ttotal3]/60;2)

and to get some more 'friendly names' for fields and field-values in the pivot-table, and in English rather than Spanish, I used e.g. this formula:


And in the PowerPivot datamodel I also renamed some fields to more 'friendly names' and marked some fields as 'not visible in client', so that the final-user (you) only has the fields which are usefull in pivot tables (e.g. WorkedHours, but not 'Notes').

fig.8: PowerQuery to import Timesheet data  

For the resulting PowerPivot datamodel, see fig.9-10.

fig.9: PowerPivot datamodel with DAX-formula

fig.10: PowerPivot datamodel with friendly field-names and hidden fields

As you can see, the new Timesheet statistics, with PowerView, are better then the old ones. Some people call Excel with the PowerBI-plugins (PowerPivot, PowerView, PowerQuery and PowerMap) 'Power Excel', and say 'Power Excel' is the best tool for all kinds of data-analysis, which results in an interesting discussion, see:


Some references: 

* For a (free) course about data-analysis with MS Excel, by my old university, TU Delft (Delft University of Technoloy), see:


* For a video with more details about how to create a timesheet with Excel, see:


*For a good video and blog about MS Excel and Powerview (and more Microsoft software), see:




NB: The Excel with the Timesheet-data, which is input for the Excel with the PowerView-report, should be in this dir:


#Mirror 1: Google Drive (zip file with 2 Excel files):

zip-file with 2 Excel files:


#Mirror 2: Microsoft Onedrive (1 Excel file: Timesheet):

NB: this site has 'Excel-Online', so you can view my Excel-doc if you don´t have MS Excel on your PC


#Mirror 3: Scribd.com (1 PDF file: PowerView report):


2 Jan 2016

2016 Year calendar and planner in Excel

#49 2016 Year calendar and planner in Excel

In this post the new year calendar and planner for 2016 and the month-calendar for January, see fig.1-3. For the Spanish readers: the calendar includes all holidays in Madrid-capital, so it´s the 'calendario laboral 2016', see:


or this site, which also has a year-calender (in PDF):


My calendar also has the (ISO) week-numbers (week 1 is the week with the 1st Thursday of the year). And it is multi-language (English, Spanish, French, German, Dutch, Italian, Korean, Chinese).
NB: For more about the Korean and Chinese variant, see earlier 'Calendar-posts' (see my blog-posts with tag/label 'Calendar' at the end of this post).

The graphic I used in this Excel, this time about the new year 2016, which is the year of the monkey in the Chinese horoscope, is from this site :


And for more info about the Monkey- (leap) year 2016, see:


Happy 2016!

fig.1 Month Calendar January 2016

fig.2: Year Calendar 2016


#Mirror 1: Google Drive (zip file with  Excel and PDF files):

#Mirror 2: Microsoft Onedrive (1 Excel file):
NB: this site has 'Excel-Online', so you can view my Excel-doc if you don´t have MS Excel on your PC


#Mirror 3: Scribd.com (1 PDF file):