The main results on the limiting distributions of incomplete U-statistics were developed in Blom (1976) and Janson (1984); Lee (1990) gives a summary. However, for my taste, the proofs in Janson (1984) are somewhat hard to read. Lee (1990) improves upon those but has some inaccuracies---the main …
When preparing a table with experimental results for publication, one often wishes to highlight the output of extreme cells, for example by putting them in bold.
However, pandas does not readily support this use case. The closest one gets with stock pandas is table.style.highlight_max(axis=1) which highlights …
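One possible workaround, sketched here under assumptions and not necessarily the approach taken in the post, is to format the cells as strings and wrap each row's maximum in \textbf{} by hand. The helper name bold_row_max, the format string, and the example table are all hypothetical.

```python
import pandas as pd


def bold_row_max(table, fmt="{:.2f}"):
    # Hypothetical helper: returns a DataFrame of strings in which the
    # largest value of every row is wrapped in a LaTeX \textbf{} command.
    is_max = table.eq(table.max(axis=1), axis=0)           # True where a cell equals its row maximum
    formatted = table.apply(lambda col: col.map(fmt.format))  # numbers -> formatted strings
    bolded = "\\textbf{" + formatted + "}"
    return formatted.where(~is_max, bolded)                # keep plain text except at the maxima


# Made-up example data, for illustration only.
results = pd.DataFrame(
    {"model_a": [0.81, 0.64], "model_b": [0.79, 0.70]},
    index=["dataset_1", "dataset_2"],
)
bold = bold_row_max(results)
```

Working on strings keeps the numeric table untouched, so the same helper can be applied just before export. Ties are all marked bold in this sketch.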
A short note on how to apply an affine transformation succinctly in Python.
Code
import numpy as np
import matplotlib.pyplot as plt
data = np.ones((1000,3)) # needed for the transformation
data[:,0:2] = np.random.uniform(low=-1,high=1,size=(len(data),2)) # create …
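The column of ones suggests homogeneous coordinates, where rotation and translation combine into a single 3x3 matrix. The following is a minimal, self-contained sketch of that idea; the matrix A, the angle, and the translation are assumptions chosen for illustration and are not taken from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Points in homogeneous coordinates: (x, y, 1).
points = np.ones((1000, 3))
points[:, 0:2] = rng.uniform(low=-1, high=1, size=(len(points), 2))

# Hypothetical affine map: rotate by 45 degrees, then translate by (2, 1).
theta = np.pi / 4
A = np.array([
    [np.cos(theta), -np.sin(theta), 2.0],
    [np.sin(theta),  np.cos(theta), 1.0],
    [0.0,            0.0,           1.0],
])

# Each row is a point, so multiply by the transpose of A.
transformed = points @ A.T
```

Because the last row of A is (0, 0, 1), the third coordinate stays 1 and further affine maps can be chained by multiplying their matrices.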
This article explains the well-known LOF algorithm. We provide intuition for density-based outlier detection, show the problems inherent to this task, and then look at how LOF solves these problems.
Motivation
Outlier detection is important for many real-world applications. Typical examples include fraud detection, network intrusion detection or …
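To make the density-based idea concrete, here is a minimal NumPy sketch of the LOF score. It is a simplification under stated assumptions: every point gets exactly k neighbours (ties at the k-distance are broken arbitrarily instead of being included), the data contain no duplicate points, and the function name and example data are hypothetical.

```python
import numpy as np


def local_outlier_factor(X, k=3):
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances; a point is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]           # indices of the k nearest neighbours
    k_distance = dist[np.arange(n), neighbours[:, -1]]     # distance to the k-th neighbour
    # reach_dist(p, o) = max(k_distance(o), d(p, o)) for each neighbour o of p
    d_to_neighbours = dist[np.arange(n)[:, None], neighbours]
    reach = np.maximum(k_distance[neighbours], d_to_neighbours)
    lrd = 1.0 / reach.mean(axis=1)                         # local reachability density
    # LOF: average density of the neighbours relative to the point's own density.
    return lrd[neighbours].mean(axis=1) / lrd


# Made-up data: a regular 5x5 grid plus one far-away point.
grid = np.array([(x, y) for x in range(5) for y in range(5)], dtype=float)
X = np.vstack([grid, [10.0, 10.0]])
scores = local_outlier_factor(X, k=3)
```

Scores close to 1 indicate a point whose density matches that of its neighbours, while values well above 1 indicate a point in a sparser region than its neighbours.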
This post summarizes how one uses the repertoire method (as presented in Concrete Mathematics by Graham, Knuth and Patashnik). First we look at the repertoire method without the need for a radix-based solution and afterwards we discuss the solution given in the book for Exercise 16.
General method
Suppose we …
Professionally, I am a heavy Eclipse user. I have heard that IntelliJ is better by now, but I have yet to make the switch. However, the focus of Eclipse is clearly Java programming. For my own projects I tend to use Python, and while Vim is awesome I prefer the support …
For my thesis I want the plots from Jupyter notebooks to integrate well with the rest of the LaTeX document. The article shows the settings necessary to achieve this look consistently by modifying the matplotlibrc. With this approach the correct settings are always applied and I can't forget them.
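As a rough illustration of the kind of settings involved, the sketch below applies a few rcParams from Python. The concrete values (serif font, 10 pt text, a 5.9 x 3.6 inch figure) are assumptions for a typical thesis layout and are not the settings from the post.

```python
import matplotlib as mpl

# Hypothetical values; adjust them to the font and text width of the document.
thesis_style = {
    "font.family": "serif",        # match a serif LaTeX body font
    "font.size": 10,               # same size as the body text
    "axes.labelsize": 10,
    "legend.fontsize": 8,
    "xtick.labelsize": 8,
    "ytick.labelsize": 8,
    "figure.figsize": (5.9, 3.6),  # width and height in inches
    "savefig.bbox": "tight",       # trim surrounding whitespace on export
}
mpl.rcParams.update(thesis_style)
```

The same keys can be stored permanently in the matplotlibrc file, one `key: value` entry per line; `matplotlib.matplotlib_fname()` reports which file is currently in use.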
As part of a practical course from IPD we took part in this year's Data Mining Cup (DMC) sponsored by prodsys.
The DMC is a yearly competition where teams from universities around the world try to solve a data mining task. To quote the task from the official website:
The …
How do you write a fault-tolerant chat for a university project? The post details a possible implementation of a distributed chat system based on Redis with a Java client. The load balancer remains a single point of failure; one possible solution is Elastic IP addresses.
A short summary of the steps necessary to install Vagrant on a Linux box and to start the first virtual machine with a precise32 base image.
Since I am using Vagrant for a university project, I tried putting the virtual machine onto a RAM disk for faster startup times. The post shows that loading the machine from a RAM disk does improve the startup time. However, the machine itself does not run faster.