Free eBook: Bayesian Reasoning and Machine Learning

While I was studying for the Coursera Machine Learning class I took last year, my learning partner Dimitris L. recommended that we use the book Bayesian Reasoning and Machine Learning by Prof. David Barber as complementary literature. David Barber is currently a professor in Information Processing in the Department of Computer Science at UCL, where he develops novel information processing schemes, mainly based on the application of probabilistic reasoning. As the title suggests, the book is all about the concepts and techniques behind Bayesian reasoning and machine learning:

Abstract

Machine learning methods extract value from vast data sets quickly and with modest resources. They are established tools in a wide range of industrial applications, including search engines, DNA sequencing, stock market analysis, and robot locomotion, and their use is spreading rapidly.

What is the Machine Learning class by Prof Ng on Coursera like? My experiences

Sometime last October, I decided to learn more about big data, machine learning and predictive analytics. I gave Coursera a try and enrolled in the 10-week Machine Learning class by Prof. Andrew Ng from Stanford University [1-4]. Prof. Ng is one of the world-renowned experts in the field of machine learning, the director of the Stanford AI Lab, a truly amazing teacher and one of the co-founders of Coursera.

For those who do not know Coursera: Coursera is an educational technology company that offers free massive open online courses. It cooperates with universities all around the globe and offers courses in computer science, engineering, physics, humanities, medicine, biology, social sciences, mathematics and business.

Using OpenRefine to gain insights into, cluster, clean and enrich messy data

Imagine the following scenario: You get a file (Excel, CSV, text, XML, …) containing a list with lots of customer, vendor or project data, and you want to structure and clean the data before you can use it for analytics, reporting or other processing steps. There are a lot of duplicate entries, names are spelled in different ways, everything is a big mess, and a manual cleanup will cost you a few hours of your precious time…

Solution

OpenRefine (formerly Google Refine) is a free and open source application that allows you to explore data (generate insights), clean and transform it using powerful scripting possibilities, and reconcile or match it with data from any kind of web service or database, such as Freebase. The possibilities are endless, since you can extend your dataset with all kinds of data available through web services. In addition to the core OpenRefine product, a growing list of extensions and plugins is available. [2]
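
To give a feel for how differently spelled names can be grouped, here is a rough Python sketch of the key-collision ("fingerprint") idea that OpenRefine's clustering feature is built around. It is a simplified illustration, not OpenRefine's actual code, and the function names and sample values are made up for this example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-identical spellings collide."""
    # Trim, lowercase, and strip accents by decomposing and dropping non-ASCII.
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, then sort and de-duplicate the remaining tokens.
    text = re.sub(r"[^\w\s]", "", text)
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep only real clusters."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical messy vendor names.
names = ["Acme Corp.", "acme corp", "Corp Acme", "Müller GmbH", "Muller GmbH", "Initech"]
clusters = cluster(names)
for group in clusters:
    print(group)
```

In OpenRefine itself you would pick the clustering method in the user interface and then decide per cluster which spelling to keep; the sketch only shows why the candidates end up in the same group.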

Using SQL WITH clause to create temporary static tables at query time

A few days ago, I came across the following problem: I currently work on a project where I am responsible for an application that writes an entry to a log table every time a job is executed. This table contains a lot of information on job statuses, possible problems, exceptions, durations, and so on. I was working on some analytics on this data and needed to enrich it with the version of the software that generated each log entry (since we were not capturing this in the log table). From our configuration management tool, I was able to extract the dates on which each version of the software was deployed in production.

Problem

My intention was to create a temporary table to join onto the logged entries, but I didn't want to create tables on the Oracle server (mainly because they would have been just temporary tables, and because the schema user I was using didn't have the rights to create tables).
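
The general idea can be sketched as follows. This is a minimal, runnable illustration that uses Python's built-in sqlite3 module as a stand-in for the Oracle server; the table job_log, its columns, the version numbers and the dates are all made up for the example. The WITH clause defines the release dates as static rows inside the query, so no CREATE TABLE privilege is needed for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical log table standing in for the real one (columns are assumptions).
conn.execute("CREATE TABLE job_log (job_id INTEGER, executed_on TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO job_log VALUES (?, ?, ?)",
    [
        (1, "2013-01-15", "OK"),
        (2, "2013-02-20", "FAILED"),
        (3, "2013-03-05", "OK"),
    ],
)

query = """
WITH releases AS (
    -- Static rows that exist only for the duration of this query.
    SELECT '1.0' AS version, '2013-01-01' AS deployed_from, '2013-02-01' AS deployed_until
    UNION ALL
    SELECT '1.1', '2013-02-01', '2013-03-01'
    UNION ALL
    SELECT '2.0', '2013-03-01', '9999-12-31'
)
SELECT l.job_id, l.status, r.version
FROM job_log l
JOIN releases r
  ON l.executed_on >= r.deployed_from
 AND l.executed_on <  r.deployed_until
ORDER BY l.job_id
"""

# Each log entry is matched to the release that was live on its execution date.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The ISO date strings compare correctly as text, which keeps the sketch short. On Oracle the same pattern is usually written with each static row selected FROM dual and with real DATE values instead of strings.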

MySQL: group_concat allows you to easily concatenate the grouped values of a row

Last week I stumbled upon a really useful function in MySQL: group_concat allows you to concatenate the values of one column across multiple rows by grouping them by another field. You can choose the separator to use for the concatenation. The full syntax is as follows:

GROUP_CONCAT([DISTINCT] expr [,expr ...]
             [ORDER BY {unsigned_integer | col_name | expr}
                 [ASC | DESC] [,col_name ...]]
             [SEPARATOR str_val])

According to the MySQL documentation, the function returns a string result with the concatenated non-NULL values from a group. It returns NULL if there are no non-NULL values. To eliminate duplicate values, use the DISTINCT clause. To sort values in the result, use the ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause.
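
To see the grouping behaviour in action without a MySQL server, here is a small sketch using Python's built-in sqlite3 module. SQLite ships a similar but not identical group_concat function (the separator is passed as a second argument instead of the SEPARATOR keyword), so treat it as a stand-in; the table and the names in it are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: several employees per department, one NULL name.
conn.execute("CREATE TABLE employees (department TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("IT", "Alice"), ("IT", "Bob"), ("Sales", "Carol"), ("Sales", None)],
)

# One output row per department, with all non-NULL names joined by '; '.
rows = conn.execute(
    """
    SELECT department, group_concat(name, '; ')
    FROM employees
    GROUP BY department
    ORDER BY department
    """
).fetchall()

# The order inside each concatenated string is not guaranteed here,
# so split and sort before comparing or displaying.
result = {department: sorted(names.split("; ")) for department, names in rows}
print(result)
```

In MySQL, the corresponding query would use the syntax shown above, for example GROUP_CONCAT(name ORDER BY name SEPARATOR '; '), which also fixes the order of the values inside each group.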

The influence of software quality requirements on the suitability of software cost estimation methods

Today I gave a talk at the 24th International Forum on COCOMO and Systems/Software Cost Modeling held at MIT in Cambridge, MA. I presented the intermediate results of the research for my master's thesis, which is supervised by Dr. Stefan Wagner from TUM and Dr. Ricardo Valerdi from MIT. Here are the slides, as well as the abstract of the work.

Download

Abstract

4th of July 2009 in Boston – Neil Diamond Sweet Caroline

This week has been rather busy for me, because I moved from my temporary place in Cambridge to my new place in Somerville. Additionally, this Saturday was the US holiday, the 4th of July. On the eve of the holiday, I went to the Boston Pops Concert 2009 with the Boston Symphony Orchestra and famous artists, among them Neil Diamond.

On the 4th of July I was lucky to see the fireworks from the roof garden of Stefano's place; he is also a visiting student at MIT. I took a few videos of the fireworks and of Neil Diamond performing Sweet Caroline.

Neil Diamond performing Sweet Caroline on 4th of July 2009 in Boston
