Archive for February, 2014

What is the Machine Learning class by Prof Ng on Coursera like? My experiences

In October of last year, I decided to learn more about big data, machine learning and predictive analytics. I gave Coursera a try and enrolled in the 10-week Machine Learning class by Prof. Andrew Ng from Stanford University [1-4]. Prof. Ng is one of the world-renowned experts in the field of machine learning, the director of the Stanford AI Lab, a truly amazing teacher and one of the co-founders of Coursera.

For those who do not know Coursera: it is an educational technology company that offers free massive open online courses. It partners with universities all around the globe and offers courses in computer science, engineering, physics, humanities, medicine, biology, social sciences, mathematics and business.


Using OpenRefine to gain insights into, cluster, clean and enrich messy data

Imagine the following scenario: you get a file (Excel, CSV, text, XML, …) containing a list with lots of customer, vendor or project data, and you want to structure and clean the data before you can use it for analytics, reporting, or other processing steps. There are a lot of duplicate entries, names are spelled in different ways, everything is a big mess, and a manual cleanup will cost you a few hours of your precious time…

Solution

OpenRefine (formerly Google Refine) is a free and open source application that lets you explore data (generate insights), clean and transform it using powerful scripting possibilities, and reconcile or match it with data from any kind of web service or database, such as Freebase. The possibilities are endless, since you can extend your dataset with all kinds of data available through web services. In addition to the core OpenRefine product, a growing list of extensions and plugins is available. [2]


Using SQL WITH clause to create temporary static tables at query time

A few days ago, I came across the following problem: I currently work on a project where I am responsible for an application that writes entries to a log table every time a job is executed. This table contains a lot of information on job statuses, possible problems, exceptions, durations, and so on. I was working on some analytics on this data and needed to enrich it with the version of the software that generated each log entry (since we were not capturing this in the log table). From our configuration management tool, I was able to extract the dates on which each version of the software was deployed to production.

Problem

My intention was to create a temporary table to join onto the logged entries, but I didn't want to create the table on the Oracle server (mainly because it would have been just a temporary table and because the schema user I was using didn't have the rights to create tables).
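To illustrate the general idea, here is a minimal sketch of a WITH clause that defines a small static table inline and joins it onto log entries. It is not the original query: it runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for Oracle, and the table name job_log, the columns, the version numbers and all dates are made up for the example. Oracle syntax differs in places (for example, each inline SELECT has traditionally needed FROM dual, and LIMIT is not available there).

```python
import sqlite3

# Hypothetical log table and rows; a real log table will look different.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_log (job_id INTEGER, executed_on TEXT)")
conn.executemany(
    "INSERT INTO job_log VALUES (?, ?)",
    [(1, "2014-01-05"), (2, "2014-01-20"), (3, "2014-02-10")],
)

QUERY = """
WITH releases (version, deployed_on) AS (
    -- Static rows defined at query time; no CREATE TABLE privilege needed.
    SELECT '1.0', '2014-01-01' UNION ALL
    SELECT '1.1', '2014-01-15' UNION ALL
    SELECT '1.2', '2014-02-01'
)
SELECT l.job_id,
       -- Pick the latest release deployed on or before the log entry date.
       (SELECT r.version
          FROM releases r
         WHERE r.deployed_on <= l.executed_on
         ORDER BY r.deployed_on DESC
         LIMIT 1) AS version
  FROM job_log l
 ORDER BY l.job_id
"""

rows = conn.execute(QUERY).fetchall()
for job_id, version in rows:
    print(job_id, version)
```

The dates are stored as ISO-formatted strings so that plain string comparison orders them correctly; with a real DATE column the comparison would work on the dates directly.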
