Articles

4. April 2024
#statistics

New record on Oslo Børs

(Norwegian) Media often report on time series: traffic deaths, electricity prices, unemployment, support for EU membership, birth rates, exchange rates, interest rates, and so on. New records are frequently reported, but how unusual is it really for a new record to be set?

1. March 2024
#datascience

Visualizing software development skills with embeddings

We study a dataset consisting of software developers and their skills. The data is a large binary matrix, which we factorize to predict the probability that each developer has each skill. From the model we obtain an embedding, which we use to visualize skills in a two-dimensional space.

12. February 2024
#datascience

An opinionated guide to scikit-learn

A complete notebook with Python code, showing how to write advanced scikit-learn code, using custom Transformers, custom metrics, cross validators, etc.

21. January 2024
#optimization, #datascience

Ridge regression with a link function

Suppose you have a target variable that’s bounded, such as a score between 0 and 1. How can Ridge regression be generalized to predict values in this interval? We create some optimization routines and present full Python code.

3. January 2024
#datascience

The ten commandments of data science

Here are the ten commandments of data science, encapsulating what I’ve learned in my career so far.

3. December 2023
#datascience

Betting on football - results

Throughout 2023 I’ve been using math to make bets on football matches. I created one statistical model to predict outcome probabilities and one optimization model to size bets.

3. December 2023
#optimization

Betting on football with the Kelly criterion

Given a set of football (soccer) matches and three outcomes per match to bet on, how should we place our money for maximal long-term growth? We generalize John Kelly’s idea from 1956 and solve it with modern optimization methods.
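
As a point of reference, the classic single-bet Kelly fraction can be sketched in a few lines of Python. This is only the textbook binary case, not the generalized multi-match model from the article; the function name `kelly_fraction` and the example numbers are assumptions made for illustration.

```python
def kelly_fraction(probability, decimal_odds):
    """Classic Kelly fraction for a single bet with a binary outcome.

    probability: estimated chance that the bet wins
    decimal_odds: total payout per unit staked, so net odds are decimal_odds - 1
    """
    net_odds = decimal_odds - 1.0
    fraction = probability - (1.0 - probability) / net_odds
    # A negative fraction means the bet has no edge, so stake nothing
    return max(fraction, 0.0)


# Hypothetical example: a 50% win probability at decimal odds of 2.2
stake = kelly_fraction(probability=0.5, decimal_odds=2.2)
print(f"Stake {stake:.1%} of the bankroll")
```

The result is the share of the bankroll to stake. Bets on several matches with three outcomes each interact with one another, which is presumably why the article turns to numerical optimization instead of a closed-form expression.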

2. November 2023
#datascience

Norwegian developer salaries in 2023

(Norwegian) Which factors determine a developer's salary? We analyze a dataset from kode24. The highest-paid developers have long experience and a high level of education, work as consultants, and are based in Oslo.

11. October 2023
#datascience, #strength

Relative strength - Wilks, IPF GL and allometry

In powerlifting, formulas are used to assess and compare lifters with different body weights. In this article we evaluate two such formulas: Wilks and IPF GL. Then we generalize them and propose an alternative. We also create a formula that accounts for age and training experience.

19. September 2023
#statistics

Wanna see my collection of random numbers?

Presentation slides, YouTube recording and source code for a talk about Monte Carlo simulation and computation-based statistics.

24. August 2023
#datascience

Politics and the opinion space

(Norwegian) We look at data from NRK's valgomat (a voting advice quiz). How similar are the parties? Which parties can be grouped together? Which directions best describe the political opinion space?

23. July 2023
#datascience

Fair ticket controls

A public transportation company in Norway uses AI in their ticket control process. They claim to have put special focus on the ethical use of AI. But does an obvious ethical solution to such a problem exist? I don’t think so.

1. July 2023
#datascience

Fictional Norwegian names

(Norwegian) Digfrid, Emmund, Harbjørg, Joannica, Olfine and Trestina might sound like Norwegian names, but they are not. In this article we download 5703 Norwegian names and train a language model to generate new, fictional names.

3. June 2023
#statistics

Explaining group differences

(Norwegian) We examine the difference between boys' and girls' grunnskolepoeng (lower secondary school grade points). There is a large overlap between the groups, but at the same time large parts do not overlap. How can we build an intuition for how large the difference is?

6. May 2023
#datascience, #statistics

Ranking doctors

The Norwegian website Legelisten.no contains reviews and ratings for doctors (general practitioners). How should doctors be ranked, considering that some of them have very few ratings?

13. April 2023
#statistics

Bayesian fishing lakes

(Norwegian) In Oslomarka there are hundreds of fishing lakes. Can we use reported catches to find out which lakes have the biggest fish, even though there are very few observations per lake?

11. March 2023
#datascience

Random Rotation Ensembles in Python

An implementation of the Random Rotation Ensemble in Python, using scikit-learn. The idea is to train an ensemble of trees on rotations of the data set, then average their predictions.
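
The idea can be sketched roughly as follows. This is a minimal illustration and not the article's implementation: the helper names (`random_rotation`, `fit_rotation_ensemble`, `predict_proba_ensemble`), the synthetic data set and the choice of 25 trees are all assumptions, and the random orthogonal matrices drawn here may include a reflection in addition to a rotation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def random_rotation(n_features, rng):
    """Draw a random orthogonal matrix via QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(n_features, n_features)))
    # Flip column signs so the result does not depend on the QR sign convention
    return Q * np.sign(np.diag(R))


def fit_rotation_ensemble(X, y, n_trees=25, seed=0):
    """Fit each tree on its own randomly rotated copy of the data."""
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_trees):
        rotation = random_rotation(X.shape[1], rng)
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1_000_000)))
        tree.fit(X @ rotation, y)
        ensemble.append((rotation, tree))
    return ensemble


def predict_proba_ensemble(ensemble, X):
    """Average the class probabilities over all (rotation, tree) pairs."""
    return np.mean([tree.predict_proba(X @ rot) for rot, tree in ensemble], axis=0)


# Synthetic data for illustration only
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
ensemble = fit_rotation_ensemble(X, y)
probabilities = predict_proba_ensemble(ensemble, X)
```

Each tree splits along the axes of its own rotated coordinate system, so the averaged ensemble is not restricted to axis-aligned decision boundaries in the original feature space.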

3. February 2023
#statistics

Three Monte Carlo permutation tests

We use Monte Carlo permutation tests to investigate three problems: (1) rats on restricted and free diets (differences between groups), (2) crime rates (differences between paired observations) and (3) the relationship between height and pulse (correlation between paired observations).
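
For the first kind of problem (a difference between two groups), a Monte Carlo permutation test might look like the sketch below. It uses only the Python standard library; the function name `permutation_test` and the numbers are made up for illustration and are not the data from the article.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero
    return (extreme + 1) / (n_permutations + 1)


# Made-up illustrative numbers, not data from the article
group_a = [10.2, 11.1, 9.8, 12.4, 10.9, 11.7]
group_b = [8.9, 9.4, 10.1, 8.7, 9.9, 9.2]
p_value = permutation_test(group_a, group_b, n_permutations=2_000)
print(f"Estimated p-value: {p_value:.3f}")
```

The paired and correlation variants follow the same pattern with a different shuffling step: randomly flipping the sign of each paired difference, or shuffling one of the two variables.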

5. January 2023
#statistics

A bag of Twist

(Norwegian) I opened a bag of Twist and got one licorice and six caramels with chocolate filling. Is it still reasonable to assume that every type is equally likely?

13. December 2022
#mathematics

Cracking a puzzle

We use group theory to solve a coding competition puzzle.
