Learning and Model Fitting


1. Classification

Classification is the problem of assigning the correct class label to an observation based only on a set of input variables. Machine learning algorithms learn to complete this task from a training set of observations with known class labels (outputs), each described by a set of input attributes. Our studies have focused on building an instance space for benchmark classification problems from the UCI and OpenML repositories, using a comprehensive set of features to characterise the variations in instances that may present challenges for a set of 10 classification algorithms. Our instance space shows that the existing benchmarks are not as diverse as desired for insightful analysis of algorithm strengths and weaknesses, and we have proposed some new ideas for generating more diverse test instances.
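Simple statistical and information-theoretic features, such as dataset size, number of classes and the entropy of the class distribution, are typical starting points for characterising a classification instance. The sketch below computes a few such features in plain Python. The function name and the choice of features are illustrative assumptions; this is not the feature set used in the instance space study.

```python
import math
from collections import Counter


def simple_meta_features(X, y):
    """Compute a few illustrative meta-features of a classification dataset.

    X is a list of rows (each a list of numeric attributes) and y is the list
    of class labels. Hypothetical helper, for illustration only.
    """
    n = len(y)
    counts = Counter(y)
    k = len(counts)
    probs = [c / n for c in counts.values()]
    # Shannon entropy of the class distribution, in bits.
    entropy = -sum(p * math.log2(p) for p in probs)
    # Normalise by the maximum possible entropy, log2(k), so balanced data scores 1.
    norm_entropy = entropy / math.log2(k) if k > 1 else 0.0
    return {
        "n_instances": n,
        "n_attributes": len(X[0]),
        "n_classes": k,
        "class_entropy": norm_entropy,
        # Size of the smallest class relative to the largest one.
        "imbalance_ratio": min(counts.values()) / max(counts.values()),
    }


X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
y = ["a", "a", "b", "a"]
print(simple_meta_features(X, y))
```

Here three of the four labels are "a", so the normalised class entropy is about 0.81 and the imbalance ratio is 1/3, flagging a mildly imbalanced instance.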

Research Publications
Muñoz, M. A., Villanova, L., Baatar, D. and Smith-Miles, K. A., "Instance Spaces for Machine Learning Classification", Machine Learning, vol. 107, no. 1, pp. 109-147, 2018.

Downloads
Metadata
Instances
Code for Feature Selection
  1. Statistical and Information Theory Features
  2. DCoL

2. Regression

Regression is a supervised learning approach that aims to predict a continuous-valued dependent (target) variable from a set of independent input variables, or attributes. A variety of statistical, mathematical and computer science methods are available, each of which makes different assumptions about the underlying relationship between the dependent and independent variables. Our studies have focused on building an instance space to assess whether the existing benchmarks can adequately explain the variation in performance between approaches, and on converting problems from other fields into regression problems to increase the diversity of the instance space.
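As an example of how a regression method encodes an assumption about the relationship between variables, ordinary least squares with a single input assumes the target varies linearly with that input, plus noise. The minimal sketch below fits y ≈ a + b·x in closed form; the function name is a hypothetical illustration, and other regression methods make quite different assumptions.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y ≈ a + b * x for one input variable.

    Hypothetical helper: returns the intercept a and slope b.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # The fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
a, b = fit_simple_linear_regression(x, y)
print(f"y ≈ {a:.2f} + {b:.2f} * x")
```

For this data the fit is y ≈ 1.15 + 1.94·x. If the true relationship were strongly nonlinear, this model's linearity assumption would limit its accuracy, which is one source of the performance variation between methods.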

Research Publications
Coming Soon

Downloads
Coming Soon

3. Anomaly Detection

Anomaly detection methods are used to identify unusual patterns, known as outliers, that do not conform to expected behaviour. Our studies have focused on characterising the benchmark instances using novel features, and on exploring how the choice of normalisation scheme affects the success of various outlier detection methods.
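Why normalisation matters can be seen with a distance-based detector, since attributes on large scales dominate the distances. The sketch below scores each point by the distance to its k-th nearest neighbour and compares raw data with min-max and z-score normalisation. The detector, the two schemes and the toy data are illustrative assumptions, not the specific methods or benchmarks used in the cited study.

```python
import math


def min_max(X):
    """Rescale each attribute (column) to the range [0, 1]."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in X]


def z_score(X):
    """Centre each attribute at 0 with unit (population) standard deviation."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
            for row in X]


def knn_outlier_scores(X, k=2):
    """Distance from each point to its k-th nearest neighbour (larger = more outlying)."""
    scores = []
    for i, p in enumerate(X):
        dists = sorted(math.dist(p, q) for j, q in enumerate(X) if j != i)
        scores.append(dists[k - 1])
    return scores


def most_outlying(scores):
    """Index of the highest-scoring point."""
    return max(range(len(scores)), key=scores.__getitem__)


# Point 4 is unusual in the first attribute only; the second attribute has a much larger scale.
X = [[1.0, 100.0], [1.2, 110.0], [0.9, 95.0], [1.1, 105.0], [5.0, 102.0]]
for name, data in [("raw", X), ("min-max", min_max(X)), ("z-score", z_score(X))]:
    print(name, most_outlying(knn_outlier_scores(data)))
```

On this toy data the unnormalised scores are dominated by the second attribute and flag point 1, while both normalised versions flag point 4, so the same detector reaches a different conclusion depending only on the normalisation scheme.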

Research Publications
Kandanaarachchi, S., Muñoz, M. A., Hyndman, R. J. et al., "On normalization and algorithm selection for unsupervised outlier detection", Data Mining and Knowledge Discovery, 2019, doi:10.1007/s10618-019-00661-z.

Downloads
Metadata (Used in the paper)
Metadata (Complete)
Code for Feature Selection

4. Time Series Forecasting

A time series is a sequence of observations recorded at successive points in discrete time. Time series forecasting builds a model to predict future values from previously observed values. Our studies of time series forecasting have focused on developing useful features that globally characterise time series, and then using these features to construct an instance space of the well-studied M3 competition time series. We have filled this instance space with 10,000 new time series exhibiting a wide range of characteristics, enabling the strengths and weaknesses of forecasting methods to be described more precisely.
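A common example of a global time series feature is the sample autocorrelation at lag 1, which measures how strongly each value is related to the one before it. The sketch below computes the lag-k autocorrelation in plain Python. It illustrates the general idea of a global feature only; the function name is hypothetical and this is not the feature extraction code offered for download.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given lag.

    Returns 0.0 for a constant series, where the autocorrelation is undefined.
    """
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    # Denominator: total squared deviation of the whole series.
    denom = sum((v - mean) ** 2 for v in series)
    if denom == 0:
        return 0.0
    # Numerator: products of deviations `lag` steps apart.
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


trending = [float(t) for t in range(20)]
alternating = [1.0, -1.0] * 10
print(autocorrelation(trending), autocorrelation(alternating))
```

The trending series gives a strong positive value (0.85) and the alternating series a strong negative one (-0.95), so even this single feature separates the two series into different regions of a feature space.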

Research Publications
Kang, Y., Hyndman, R. and Smith-Miles, K., "Visualising Forecasting Algorithm Performance using Time Series Instance Spaces", International Journal of Forecasting, vol. 33, no. 2, pp. 345-358, 2017.

Wang, X., Smith, K. A. and Hyndman, R., "Characteristic-based Clustering for Time Series Data", Data Mining and Knowledge Discovery, vol. 13, no. 3, pp. 335-364, 2006.

Downloads
Metadata
Instances (M3)

Metadata (Evolved Time Series)
Instances (Evolved Time Series)
Feature Extraction Code

5. Facial Age Estimation

Facial images contain a great deal of information about an individual: their identity, gender, mood and age. Various methods have been proposed for estimating a person's age from their face, using databases with known age labels such as FG-NET, MORPH and MORPH2. Our early study in 2007 focused on developing new facial age estimation methods and comparing them with state-of-the-art approaches, including both tailored methods and generic machine learning approaches. We are currently revisiting this study using instance space analysis to understand how the performance of algorithms depends on the characteristics of the face.
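Comparisons between age estimation methods are commonly reported with the mean absolute error (MAE) in years, often alongside a cumulative score: the fraction of faces whose error falls within a given number of years. The minimal sketch below computes both from lists of true and predicted ages; the function names are hypothetical, and the metrics are shown as common practice rather than as the exact evaluation protocol of the cited study.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference, in years, between predicted and true ages."""
    errors = [abs(p - t) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


def cumulative_score(true_ages, predicted_ages, max_error):
    """Fraction of faces whose absolute age error is at most max_error years."""
    hits = sum(1 for t, p in zip(true_ages, predicted_ages) if abs(p - t) <= max_error)
    return hits / len(true_ages)


true_ages = [20, 35, 50, 8]
predicted = [22, 30, 51, 8]
print(mean_absolute_error(true_ages, predicted), cumulative_score(true_ages, predicted, 2))
```

Here the errors are 2, 5, 1 and 0 years, giving an MAE of 2.0 and a cumulative score of 0.75 at a 2-year threshold. Computing such errors per face, rather than only on average, is what allows performance to be related to the characteristics of individual faces.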

Research Publications
Geng, X., Zhou, Z.-H. and Smith-Miles, K. A., "Automatic Age Estimation Based on Facial Aging Patterns", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2234-2240, 2007.

Downloads
Metadata
Datasets, Feature and Algorithm Code