Tag Archive for 'software'

Using OpenRefine to gain insights into, cluster, clean and enrich messy data

Imagine the following scenario: You get a file (Excel, CSV, text, XML, …) containing a long list of customer, vendor or project data, and you want to structure and clean the data before you can use it for analytics, reporting, or other processing steps. There are a lot of duplicate entries, names are spelled in different ways, everything is a big mess, and a manual cleanup will cost you a few hours of your precious time…

Solution

OpenRefine (formerly Google Refine) is a free and open source application that lets you explore data (generate insights), clean and transform it using powerful scripting capabilities, and reconcile or match it with data from any kind of web service or from databases like Freebase. Because you can extend your dataset with all kinds of data available through web services, the possibilities are nearly endless. In addition to the core OpenRefine product, a growing list of extensions and plugins is available. [2]

Continue reading ‘Using OpenRefine to gain insights into, cluster, clean and enrich messy data’

Using SQL WITH clause to create temporary static tables at query time

A few days ago, I came across the following problem: I currently work on a project where I am responsible for an application that writes an entry to a log table every time a job is executed. This table contains a lot of information on job statuses, possible problems, exceptions, durations, and so on. I was working on some analytics on this data and needed to enrich it with the version of the software that generated each log entry (since we were not capturing this in the log table). From our configuration management tool, I was able to extract the dates on which each version of the software was deployed to production.

Problem

My intention was to create a temporary table to join onto the logged entries, but I didn't want to create the table on the Oracle server (mainly because it would have been just a temporary table and because the schema user I was using didn't have the rights to create tables).
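As a rough illustration of the idea in the title, a static lookup table can be built inside the query itself with a WITH clause and then joined to the log data. This is only a sketch for Oracle: the table name `job_log`, its columns, and the version numbers and dates are hypothetical placeholders, not the real schema or deployment history.

```sql
-- Hypothetical names: job_log, job_id, started_at, status.
-- The version numbers and dates below are placeholders.
WITH releases AS (
    -- Each SELECT ... FROM dual produces one row of the static table
    SELECT '1.0' AS version,
           DATE '2013-01-07' AS deployed_from,
           DATE '2013-03-04' AS deployed_to
      FROM dual
    UNION ALL
    SELECT '1.1', DATE '2013-03-04', DATE '2013-06-10' FROM dual
    UNION ALL
    SELECT '1.2', DATE '2013-06-10', DATE '9999-12-31' FROM dual
)
SELECT l.job_id,
       l.started_at,
       l.status,
       r.version
  FROM job_log l
  LEFT JOIN releases r
    ON l.started_at >= r.deployed_from
   AND l.started_at <  r.deployed_to;
```

Since the `releases` rows exist only for the duration of the query, no CREATE TABLE privilege is needed. The LEFT JOIN keeps log entries that fall outside all listed deployment windows; their `version` is then NULL.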

Continue reading ‘Using SQL WITH clause to create temporary static tables at query time’

MySQL: group_concat allows you to easily concatenate the grouped values of a row

Last week I stumbled upon a really useful function in MySQL: group_concat lets you concatenate the values of one column across multiple rows that are grouped by another field. You can choose the separator to use for the concatenation. The full syntax is as follows:

GROUP_CONCAT([DISTINCT] expr [,expr ...]
             [ORDER BY {unsigned_integer | col_name | expr}
                 [ASC | DESC] [,col_name ...]]
             [SEPARATOR str_val])

According to the MySQL documentation, the function returns a string result with the concatenated non-NULL values from a group. It returns NULL if there are no non-NULL values. To eliminate duplicate values, use the DISTINCT clause. To sort values in the result, use the ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause.
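To make the syntax more concrete, here is a small sketch of how a call might look. The table `orders` and its columns `customer_id` and `product_name` are assumptions made up for this example.

```sql
-- Hypothetical table: orders(customer_id, product_name)
-- Returns one row per customer with all distinct product names
-- in alphabetical order, separated by a comma and a space.
SELECT customer_id,
       GROUP_CONCAT(DISTINCT product_name
                    ORDER BY product_name ASC
                    SEPARATOR ', ') AS products
  FROM orders
 GROUP BY customer_id;
```

If the SEPARATOR clause is omitted, MySQL uses a comma. Note also that the result is truncated to the length given by the group_concat_max_len system variable, so very long lists may need a larger value.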

Continue reading ‘MySQL: group_concat allows you to easily concatenate the grouped values of a row’

Cost/Benefit-Aspects of Software Quality Assurance

As software becomes more and more pervasive, high software quality and the ability to produce good software cost estimates become more and more important. Business owners want the software to run smoothly and deliver value, and they want to know upfront what building or adapting a software system will cost.

This is why, in summer 2008, I took part in a seminar on software quality at the chair of Univ.-Prof. Dr. Dr. h.c. Manfred Broy, Technische Universität München. I did extensive research on software quality in general and wrote a paper on the Cost/Benefit aspects of Software Quality Assurance, which I would like to present here. The paper points out several interesting aspects of how to optimize investments in various software quality assurance techniques and thus in software quality.

Because of the high quality of the papers written by the seminar participants, the seminar supervisors decided to officially publish the results as a working paper of the Technische Universität München. You can find the link to the publication in the links section at the end of this article.

Please feel free to share your thoughts on this paper.

Cost/Benefit-Aspects of Software Quality Assurance – Abstract:

Along with the ever more apparent importance and criticality of software systems for modern societies arises the urgent need to deal efficiently with the quality assurance of these systems. Even though the necessity of investments into software quality should not be underestimated, it seems economically unwise to invest seemingly random amounts of money into quality assurance. The precise prediction of the costs and benefits of various software quality assurance techniques within a particular project allows for economically sound decision-making.

This paper presents the cost estimation models COCOMO, its successor COCOMO II, and COQUALMO, a quality estimation model derived from COCOMO II. Furthermore, an analytical idealized model of defect detection techniques is presented. It provides a range of metrics, for example the return on investment (ROI) of software quality assurance. The method of ROI calculation is exemplified in this paper.

In conclusion, an overview of the debate concerning the ascertainment of quality and cost in general will be given. Although there are a number of techniques today to verify the cost-effectiveness of quality assurance, the results are thus far often unsatisfactory. Since all known models make heavy use of empirically gained data, it is very important to question their results judiciously and avoid misreadings.

Download the software cost estimation and quality assurance paper:

Cost/Benefit-Aspects of Software Quality Assurance

Continue reading ‘Cost/Benefit-Aspects of Software Quality Assurance’

SQL Connection Strings

In the last couple of weeks I have been working a lot with different databases that I had to connect to from Java. It is sometimes stressful to look up the right format of the connection string for each database. Even though these strings are meant to be standardized, they are not.

I found the very helpful website ConnectionStrings.com, which lists the connection strings for open-source as well as commercial databases. The list includes, among others, the connection strings for Microsoft SQL Server 2008, MySQL, Oracle, IBM DB2, Informix, PostgreSQL, Caché, SQLite, …

As an example, if you want to connect to a server in a replicated server configuration and do not care which server is used, use the following connection string:

Server=serverAddress1, serverAddress2, serverAddress3;Database=myDataBase;
Uid=myUsername;Pwd=myPassword;

I want to share this information with you because it can save you a lot of time otherwise spent looking up those strings in tutorials or in the documentation of the different databases.

Continue reading ‘SQL Connection Strings’

Nokia E51 – VPN Installation Tutorial

As promised, I will also blog in German from time to time, so here is my first German blog post: This week I treated myself to a new toy, the Nokia E51. I don't want to lecture about the advantages of the Nokia E51 or explain the phone in detail here. Instead, I want to provide a short tutorial on how to install a VPN client on the Nokia E51 and how to connect to a VPN network. Since I could not find a decent description of the installation and configuration of the VPN client on the Internet and had to invest quite some time, I decided to write this tutorial, including screenshots, to make the installation easier for you.

Continue reading ‘Nokia E51 – VPN Installation Tutorial’

Evaluating the Architectural Coverage of Runtime Traces

  • This post contains a downloadable version of the Bachelor thesis I wrote to complete my studies in computer science at the Technical University of Kaiserslautern. The thesis was conducted externally at the Product-Line Engineering Department of the Fraunhofer Institute for Experimental Software Engineering in Kaiserslautern.

    Continue reading ‘Evaluating the Architectural Coverage of Runtime Traces’