Learning and Model Fitting

Optimization Problems Search-based Software Testing

1. Classification

Classification is the problem of assigning the correct class label to an observation based only on a set of input variables. Machine learning algorithms learn this task from a training set of observations with known class labels (outputs), each described by a set of (input) attributes. Our studies have focused on building an instance space for benchmark classification problems (from the UCI and OpenML repositories), using a comprehensive set of features to characterise the variations in instances that may present challenges for a set of 10 classification algorithms. Our instance space shows that the existing benchmarks are not as diverse as desired for insightful analysis of algorithm strengths and weaknesses, and we have proposed some new ideas for generating more diverse test instances.
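As a rough illustration of what "features to characterise instances" can mean, the sketch below computes a handful of simple statistical and information-theoretic meta-features for a labelled dataset. The function names and the toy data are hypothetical, and these features are only examples of the general idea, not the feature set used in the study.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class-label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def simple_meta_features(X, y):
    """Return a few illustrative meta-features for a dataset (X, y).

    X is a list of rows (one list of attribute values per observation)
    and y is the list of known class labels.
    """
    counts = Counter(y)
    return {
        "n_instances": len(X),
        "n_attributes": len(X[0]),
        "n_classes": len(counts),
        "class_entropy": class_entropy(y),
        # Ratio of the largest class to the smallest class.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }


# Toy dataset (made-up values): four observations, two attributes, two classes.
X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
y = ["a", "a", "b", "b"]
features = simple_meta_features(X, y)
```

Each dataset then becomes a single point described by its feature vector, which is the starting point for placing it in an instance space.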

Research Publications
Muñoz, M. A., Villanova, L., Baatar, D. and Smith-Miles, K. A., "Instance Spaces for Machine Learning Classification", Machine Learning, vol. 107, no. 1, pp. 109-147, 2018.

Downloads
Metadata
Code for Feature Selection
  1. Statistical and Information Theory Features
  2. DCoL

2. Regression

Regression is a supervised machine learning approach that aims to predict a continuous-valued dependent (target) variable from a set of independent input variables or attributes. A variety of statistical, mathematical and computer science methods are available, each of which makes different assumptions about the underlying relationship between the dependent and independent variables. Our studies have focused on building an instance space to show whether the existing benchmarks can adequately explain variation in the performance of these approaches, and on converting problems from other fields into regression problems to augment the diversity of the instance space.

Research Publications
Muñoz, M. A., Yan, T., Leal, M. R., Smith-Miles, K. A., Lorena, A. C., Pappa, G. L. and Rodrigues, R. M., "An Instance Space Analysis of Regression Problems", ACM Transactions on Knowledge Discovery from Data.

Downloads
Metadata

3. Anomaly Detection

Anomaly detection methods are used to identify unusual patterns, called outliers, that do not conform to expected behaviour. Our studies have focused on characterising the benchmark instances using novel features, and on exploring the impact of normalisation schemes on the success of various outlier detection methods.
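To see why the choice of normalisation can matter, the sketch below scores each point by the distance to its k-th nearest neighbour and compares which point is ranked as the top outlier on unnormalised, min-max scaled and mean-SD scaled data. The scoring rule, the function names and the toy data are assumptions chosen for illustration; they are not the methods or datasets evaluated in the study.

```python
import math
import statistics


def min_max(column):
    """Rescale a column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def mean_sd(column):
    """Centre a column on its mean and divide by its standard deviation."""
    mu = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(v - mu) / sd for v in column]


def normalise(rows, scheme):
    """Apply a per-column normalisation scheme to a list of rows."""
    columns = [scheme(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def knn_scores(rows, k=1):
    """Outlier score of each row: distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(rows):
        dists = sorted(math.dist(p, q) for j, q in enumerate(rows) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outlier(rows, k=1):
    """Index of the row with the largest outlier score."""
    scores = knn_scores(rows, k)
    return max(range(len(rows)), key=scores.__getitem__)


# Toy data (made-up values): the second attribute has a much larger scale,
# and the last row is unusual only in the first attribute.
data = [[1.0, 100.0], [1.1, 110.0], [0.9, 90.0], [1.0, 105.0], [5.0, 100.0]]

raw_top = top_outlier(data)
minmax_top = top_outlier(normalise(data, min_max))
meansd_top = top_outlier(normalise(data, mean_sd))
```

On this toy data the unnormalised scores are dominated by the large-scale attribute, so the point flagged as most anomalous changes once the attributes are rescaled.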

Research Publications
Kandanaarachchi, S., Muñoz, M. A., Hyndman, R. J. et al., "On normalization and algorithm selection for unsupervised outlier detection", Data Mining and Knowledge Discovery, 2019, doi:10.1007/s10618-019-00661-z

Downloads
Metadata (Used in the paper)
Metadata (Complete)
Code for Feature Selection

4. Time Series Forecasting

A time series is a sequence of data points indexed at discrete times. Time series forecasting builds a model to predict future values based on previously observed values. Our studies of time series forecasting have focused on developing useful features to globally characterise time series, and then using these features to construct an instance space of the well-studied M3 competition time series. We have filled the instance space with 10,000 new time series exhibiting a wide range of characteristics, enabling the strengths and weaknesses of forecasting methods to be better described.
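A minimal sketch of what "global" time series features might look like is given below: each series is summarised by a few numbers computed from the whole series. The two features shown (lag-1 autocorrelation and the slope of a least-squares linear trend) and the function names are illustrative assumptions, not the feature set developed in the studies.

```python
def lag1_autocorrelation(series):
    """Sample autocorrelation at lag 1 (assumes a non-constant series)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t + 1] for t in range(n - 1)) / denom


def trend_slope(series):
    """Slope of the least-squares line fitted against time t = 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    x_mean = sum(series) / n
    cov = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(series))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return cov / var


def global_features(series):
    """Summarise a whole series as a small feature dictionary."""
    return {
        "length": len(series),
        "lag1_acf": lag1_autocorrelation(series),
        "trend_slope": trend_slope(series),
    }


# Two toy series (made-up values) with very different characteristics.
trending = [1.0, 2.0, 3.0, 4.0, 5.0]
alternating = [1.0, -1.0, 1.0, -1.0]

trending_features = global_features(trending)
alternating_features = global_features(alternating)
```

Series with different behaviour map to different feature vectors, which is what allows them to be laid out as distinct points in an instance space.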

Research Publications
Kang, Y., Hyndman, R. and Smith-Miles, K., "Visualising Forecasting Algorithm Performance using Time Series Instance Spaces", International Journal of Forecasting, vol. 33, no. 2, pp. 345-358, 2017.

Wang, X., Smith, K. A. and Hyndman, R., "Characteristic-based Clustering for Time Series Data", Data Mining and Knowledge Discovery, vol. 13, no. 3, pp. 335-364, 2006.

Downloads
Instances (M3)

Metadata (Evolved Time Series)
Instances (Evolved Time Series)
Features Extraction Code

5. Facial Age Estimation

Facial images contain much information about an individual, including their identity, gender, mood and age. Various methods have been proposed for estimating the age of a person based on their face, using databases with known age labels including FG-NET, MORPH and MORPH2. Our early study in 2007 focused on developing new facial age estimation methods and comparing them to state-of-the-art approaches, including tailored methods and generic machine learning approaches. We are currently revisiting this study in light of instance space analysis to understand how the performance of algorithms depends on the characteristics of the face.

Research Publications
Geng, X., Zhou, Z.-H. and Smith-Miles, K. A., "Automatic Age Estimation Based on Facial Aging Patterns", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2234-2240, 2007.

Smith-Miles, K. A. and Geng, X., "Revisiting Facial Age Estimation with New Insights from Instance Space Analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence.

Downloads
Datasets, Feature and Algorithm Code

6. Clustering

The definition of a cluster in the literature is not unique, and each algorithm may adopt a different clustering criterion. These criteria create biases for different algorithms, affecting their suitability for identifying clustering structures in datasets, depending on the dataset properties. The challenge for Instance Space Analysis (ISA) is to explain how an algorithm's clustering criterion affects performance on a variety of datasets with various cluster structures. Here we have developed a set of 20 meta-features that aim to reveal different types of structures within a dataset. Since there are multiple cluster definitions, we have selected 10 popular partitional and hierarchical clustering algorithms that employ different clustering criteria. To evaluate algorithm performance in the absence of ground truth for clustering results, we have combined 12 validation indices into a ranking that scores each algorithm's success. In this work, two ISA experiments were carried out. In the first, 380 artificial datasets were tested, while in the second, 219 real datasets were added to the metadata.
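One simple way to combine several validation indices into a single ranking is to rank the algorithms under each index and then average the ranks. The sketch below does this. The mean-rank rule, the index names and all numeric values are assumptions made for illustration, since the text does not state how the 12 indices were combined.

```python
def mean_ranks(scores, higher_is_better):
    """Average rank of each algorithm across validation indices.

    scores maps algorithm -> {index name -> value}; higher_is_better maps
    index name -> True if larger values indicate better clusterings.
    Rank 1 is best. Ties within an index are not treated specially.
    """
    algorithms = list(scores)
    totals = {a: 0.0 for a in algorithms}
    for index, better_high in higher_is_better.items():
        ordered = sorted(
            algorithms, key=lambda a: scores[a][index], reverse=better_high
        )
        for rank, algorithm in enumerate(ordered, start=1):
            totals[algorithm] += rank
    n_indices = len(higher_is_better)
    return {a: totals[a] / n_indices for a in algorithms}


# Hypothetical index values for three algorithms on one dataset.
scores = {
    "kmeans": {"silhouette": 0.6, "davies_bouldin": 0.8, "calinski_harabasz": 120.0},
    "single_linkage": {"silhouette": 0.3, "davies_bouldin": 1.2, "calinski_harabasz": 40.0},
    "ward": {"silhouette": 0.5, "davies_bouldin": 0.7, "calinski_harabasz": 100.0},
}
higher_is_better = {
    "silhouette": True,
    "davies_bouldin": False,  # lower values indicate better separation
    "calinski_harabasz": True,
}

ranking = mean_ranks(scores, higher_is_better)
best = min(ranking, key=ranking.get)
```

Averaging ranks rather than raw index values avoids having to put indices with different scales and directions on a common footing.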

Research Publications
Fernandes, L. H. d. S., Lorena, A. C. and Smith-Miles, K., "Towards Understanding Clustering Problems and Algorithms: an Instance Space Analysis", Special Issue on Benchmarking, Selecting and Configuring Learning and Optimization, Algorithms, vol. 14, no. 3, article 95, 2021.

Downloads
Metadata (Artificial Instances)
Features and Algorithms Code