Tag Archive for 'data'

What is the Machine Learning class by Prof Ng on Coursera like? My experiences

machine-learning-logoSometime last year in October, I decided to learn more about big data, machine learning and predictive analytics. I gave Coursera a try and enrolled in the 10 weeks  Machine Learning class by Prof Andrew Ng. from Stanford University [1-4]. Prof Ng. is one of the world renowned experts in the field of machine learning, the director of the Stanford AI Lab,  a truly amazing teacher and one of the co-founder of Coursera.

For those who do not know Coursera: Coursera is an educational technology company which is offering free massive open online courses. It has cooperations with universities all around the globe and offers courses in computer science, engineering, physics, humanities, medicine, biology, social sciences, mathematics and business.

Continue reading ‘What is the Machine Learning class by Prof Ng on Coursera like? My experiences’

Using OpenRefine to gain insights into, cluster, clean and enrich messy data

OpenRefine logoImagine the following scenario: You get this file (Excel, CSV, Text, XML,…) containing a list with lots of customer, vendor or project data and you want to structure and clean the data before you can use it to do some analytics, reporting, or other processing steps on it . There are a lot of duplicate entries, names are spelled in different ways, everything is a big mess and a manual clean up will cost you a few hours of your precious time…

Solution

OpenRefine (formerly Google Refine) is a free and open source application which allows you to explore data (generate insights), clean and transform it using powerful scripting possibilities and to reconcile or match it with data from any kind of webservice or databases like Freebase. The possibilities are endless since it is possible to extend your dataset with all kind of data available through webservices. In addition to the core OpenRefine product, a growing list of extensions and plugins  is available. [2]

Continue reading ‘Using OpenRefine to gain insights into, cluster, clean and enrich messy data’