
Using a regular expression to specify stop words in Weka Machine Learning from Java

In this article, I want to share some Java code that shows how to define stop words with a regular expression in Weka. Weka is a collection of machine learning algorithms for data mining tasks, written in Java. The algorithms can either be applied directly to a dataset or called from your own Java code [1]. This article covers calling the algorithms from Java code, not from the Weka Explorer.

Problem: Extremely common words that appear to be of little value in helping select documents matching a user need are sometimes excluded from the vocabulary entirely. These words are called stop words [2]. Weka offers several ways to specify stop words, but none of the default StopwordsHandler implementations accepts a single regular expression.

[Figure: Implementations of StopwordsHandler]
Solution: The following simple implementation of the StopwordsHandler solves the problem:

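import weka.core.stopwords.StopwordsHandler;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Treats every token in which the regular expression matches as a stop word. */
public class RegExStopwords implements StopwordsHandler {

    // Compiled once in the constructor and reused for every token.
    private final Pattern pattern;

    public RegExStopwords(String regexString) {
        pattern = Pattern.compile(regexString);
    }

    @Override
    public boolean isStopword(String s) {
        // find() succeeds if the pattern occurs anywhere in the token,
        // so the expression does not have to match the token as a whole.
        Matcher matcher = pattern.matcher(s);
        return matcher.find();
    }
}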
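
Because isStopword relies on Matcher.find(), a token counts as a stop word as soon as the expression matches anywhere inside it. The following sketch uses only java.util.regex to contrast that with Matcher.matches(), which requires the whole token to match. The class name, the helper names and the sample token "route66" are made up for illustration and are not part of Weka.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, independent of Weka.
class FindVersusMatches {

    // Same check as RegExStopwords.isStopword: a match anywhere in the token.
    static boolean containsMatch(String regex, String token) {
        return Pattern.compile(regex).matcher(token).find();
    }

    // Stricter alternative: the whole token has to match the expression.
    static boolean wholeMatch(String regex, String token) {
        return Pattern.compile(regex).matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("[0-9]", "route66"));    // true
        System.out.println(wholeMatch("[0-9]", "route66"));       // false
        System.out.println(containsMatch("^[0-9]+$", "route66")); // false
        System.out.println(containsMatch("^[0-9]+$", "66"));      // true
    }
}
```

If only tokens that consist entirely of the pattern should be dropped, anchoring the expression with ^ and $ as in the last two calls is one way to get that behaviour without changing the handler.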

You can then pass the regular-expression-based stop word handler to different filters, in this case a StringToWordVector:

       StringToWordVector filter = new StringToWordVector();
       filter.setStopwordsHandler(new RegExStopwords("([0-9]|@|n\\/a|[\\%\\€\\$\\£])"));
       ...
       filter.setIDFTransform(true);
       filter.setTFTransform(true);
       ...
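
To get a feeling for which tokens this expression removes, here is a small sketch that applies it to a handful of sample tokens without Weka. The class name and the token list are invented for illustration. The expression is written with Unicode escapes for the euro and pound signs and without the redundant backslashes inside the character class, which should match the same tokens as the expression used above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class, independent of Weka.
class RegexStopwordDemo {

    // Digits, '@', the literal "n/a", or one of the symbols % € $ £.
    static final Pattern STOP = Pattern.compile("[0-9]|@|n/a|[%\u20ac$\u00a3]");

    // Mirrors RegExStopwords.isStopword: true if the pattern occurs anywhere in the token.
    static boolean isStopword(String token) {
        return STOP.matcher(token).find();
    }

    // Returns the tokens that are not flagged as stop words, in their original order.
    static List<String> keep(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!isStopword(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "invoice", "2015", "user@example.com", "n/a", "50%", "total");
        System.out.println(keep(tokens)); // prints [invoice, total]
    }
}
```

Note that any token containing a digit or one of the symbols is dropped completely, not just the matching characters.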

Version: This code has been tested with the following development version of Weka, available through this Maven dependency:

<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>weka-dev</artifactId>
    <version>3.7.13</version>
</dependency>

References
[1] Weka 3: Data Mining with Open Source Machine Learning Software in Java, http://www.cs.waikato.ac.nz/ml/weka/
[2] Stop Words, http://nlp.stanford.edu/IR-book/html/htmledition/dropping-common-terms-stop-words-1.html

Nokia E51 – VPN Installation Tutorial

As promised, I will also blog in German from time to time, so here is my first German blog post: This week I treated myself to a new toy, the Nokia E51. I do not want to lecture on the merits of the Nokia E51 or explain the phone in detail here. Instead, I want to provide a short tutorial on how to install a VPN client on the Nokia E51 and how to connect to a VPN. I could not find a decent description of the installation and configuration of the VPN client on the internet and spent quite some time on it, so I decided to write this tutorial, including screenshots, to make the installation easier for you.
