1. Classification
Classification is the problem of
assigning the correct class label to an observation based only
on a set of input variables. Machine learning algorithms learn
to complete this task based on a training set of data containing
observations with known class labels (outputs), which are
described by a set of (input) attributes. Our studies have
focused on building an instance space for benchmark
classification problems (from the UCI and OpenML repositories)
using a comprehensive set of features to characterise variations
in instances that may present challenges for a set of 10
classification algorithms. Our instance space shows that the
existing benchmarks are not as diverse as desired for insightful
analysis of algorithm strengths and weaknesses, and we have
proposed some new ideas for generating more diverse test
instances.
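As a rough illustration of what statistical and information-theoretic features of a classification instance can look like, the sketch below computes a handful of simple meta-features for a small labelled dataset. The function names and the toy data are hypothetical, and these few features are only examples, not the comprehensive feature set used in the study.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def simple_meta_features(X, y):
    """Return a few illustrative meta-features of a labelled dataset.

    X is a list of equal-length attribute rows, y the list of class labels.
    """
    n_classes = len(set(y))
    entropy = class_entropy(y)
    return {
        "n_instances": len(X),
        "n_attributes": len(X[0]),
        "n_classes": n_classes,
        "class_entropy": entropy,
        # 1.0 means perfectly balanced classes; set to 0.0 for a single class.
        "normalised_class_entropy": (
            entropy / math.log2(n_classes) if n_classes > 1 else 0.0
        ),
    }


# Toy data, made up for illustration only.
X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
y = ["a", "a", "b", "b"]
features = simple_meta_features(X, y)
```

Each dataset is thereby summarised as one feature vector, which is the kind of representation an instance space is built from.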
Research Publications |
Downloads |
Instance Space Analysis |
Muñoz,
M. A., Villanova, L., Baatar, D. and Smith-Miles, K. A.,
"Instance Spaces for Machine Learning Classification",
Machine Learning, vol. 107, no. 1, pp. 109-147, 2018. |
Metadata
Instances
Code
for Feature Selection
- Statistical and
Information Theory Features
- DCoL
|
|
2. Regression
Regression is a supervised machine learning
approach that aims to predict a continuous-valued dependent
(target) variable from a set of
independent input variables or attributes. A variety of
statistical, mathematical and computer science methods are
available, each of which makes different assumptions about the
underlying relationship between the dependent and independent
variables. Our studies have focused on building an instance
space to show whether the existing benchmarks can adequately
explain variation in approaches, and converting problems from
other fields into regression problems to augment the diversity
of the instance space.
Research Publications |
Downloads |
Instance Space Analysis |
Muñoz, M. A., Yan, T., Leal, M. R., Smith-Miles, K., Lorena, A. C., Pappa, G. L. and Rodrigues, R. M., "An Instance Space Analysis of Regression Problems", ACM Transactions on Knowledge Discovery from Data, vol. 15, no. 2, article 28, 2021.
Dos Santos Fernandes, L. H., Lorena, A. C. and Smith-Miles, K., "Towards Understanding Clustering Problems and Algorithms: an Instance Space Analysis", Algorithms, vol. 14, no. 3, article 95, 2021.
|
Metadata
|
|
3. Anomaly Detection
Anomaly detection methods are
used to identify unusual patterns, called outliers, that do not
conform to expected behavior. Our studies have focused on
characterising the benchmark instances using novel features, and
exploring the impact of normalisation schemes on the success of
various methods for outlier detection.
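To make the role of normalisation concrete, the following sketch applies two common schemes (min-max and median-IQR scaling) to the same toy data and then scores each point by its distance to its nearest neighbour. Both the schemes and the nearest-neighbour score are illustrative choices, not necessarily those examined in the study, and all names and data are hypothetical.

```python
import math
import statistics


def min_max(column):
    """Rescale a column to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


def median_iqr(column):
    """Centre a column on its median and scale by its interquartile range."""
    med = statistics.median(column)
    q1, _, q3 = statistics.quantiles(column, n=4)
    iqr = q3 - q1
    return [(x - med) / iqr if iqr else 0.0 for x in column]


def normalise(points, scheme):
    """Apply a per-column normalisation scheme to a list of points."""
    columns = [scheme(list(col)) for col in zip(*points)]
    return [tuple(row) for row in zip(*columns)]


def nearest_neighbour_scores(points):
    """Outlier score of a point: Euclidean distance to its closest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


# Toy data, made up for illustration; the last point is far from the rest.
points = [(1, 10), (2, 12), (3, 11), (2, 13), (10, 12)]
scores_min_max = nearest_neighbour_scores(normalise(points, min_max))
scores_median_iqr = nearest_neighbour_scores(normalise(points, median_iqr))
```

The two score lists are on different scales and need not order the points identically, which is why the choice of normalisation can change which observations a distance-based method flags.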
Research Publications |
Downloads |
Instance Space Analysis |
Kandanaarachchi, S., Muñoz, M. A., Hyndman, R. and Smith-Miles, K., "On normalization and algorithm selection for unsupervised outlier detection", Data Mining and Knowledge Discovery, vol. 34, pp. 309-354, 2020.
|
Metadata (Used in the paper)
Metadata (Complete)
Code
for Feature Selection
|
|
4. Time Series Forecasting
A time series is a sequence of
discrete-time data. Time series forecasting builds a model to
predict future values based on previously observed values. Our
studies of time series forecasting have focused on developing
useful features to globally characterise time series, and then
using these features to construct an instance space of the
well-studied M3 competition time series. We have filled the
instance space with 10,000 new time series exhibiting a wide
range of characteristics, enabling the strengths and weaknesses
of forecasting methods to be better described.
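A global characterisation of a time series reduces the whole series to a short vector of numbers. The sketch below computes two such numbers, the lag-1 autocorrelation and the slope of a least-squares linear trend. These are illustrative features chosen for simplicity, not necessarily those used in the publications listed below, and the function names are hypothetical.

```python
def lag1_autocorrelation(series):
    """Sample autocorrelation at lag 1 (0.0 for a constant series)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0
    return sum(dev[t] * dev[t + 1] for t in range(n - 1)) / denom


def linear_trend_slope(series):
    """Slope of the least-squares line fitted against time steps 0, 1, 2, ...

    Requires at least two observations.
    """
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def feature_vector(series):
    """Summarise a whole series as a small tuple of global features."""
    return (lag1_autocorrelation(series), linear_trend_slope(series))


# Toy series, made up for illustration only.
features = feature_vector([1, 2, 3, 4, 5])
```

Series of different lengths map to vectors of the same dimension, so they can be compared and placed in a common space.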
Research Publications |
Downloads |
Instance Space Analysis |
Kang, Y., Hyndman, R. and Smith-Miles, K., "Visualising Forecasting Algorithm Performance using Time Series Instance Spaces", International Journal of Forecasting, vol. 33, no. 2, pp. 345-358, 2017.
Wang, X., Smith, K. A. and Hyndman, R., "Characteristic-based Clustering for Time Series Data", Data Mining and Knowledge Discovery, vol. 13, no. 3, pp. 335-364, 2006. |
Metadata
Instances (M3)
Metadata
(Evolved Time Series)
Instances
(Evolved Time Series)
Features
Extraction Code
|
|
5. Facial Age Estimation
Facial images contain a great deal of information about an
individual, including identity, gender, mood and age. Various
methods have been proposed for estimating the age of a person
based on their face, using databases with known age labels
including FG-NET, MORPH and MORPH2. Our early study in 2007
focused on developing new facial age estimation methods and
comparing to state-of-the-art approaches including tailored
methods and generic machine learning approaches. We are
currently revisiting this study in light of instance space
analysis to understand how the performance of algorithms depends
on the characteristics of the face.
Research Publications |
Downloads |
Instance Space Analysis |
Geng,
X., Zhou, Z.-H., and Smith-Miles, K. A., “Automatic Age
Estimation Based on Facial Aging Patterns”, IEEE
Transactions on Pattern Analysis and Machine Intelligence,
vol. 29, no. 12, pp. 2234-2240, 2007.
Smith-Miles, K. A. and Geng, X., “Revisiting Facial
Age Estimation with New Insights from Instance Space
Analysis”, IEEE Transactions on Pattern Analysis and Machine
Intelligence, doi:10.1007/s10618-019-00661-z
|
Metadata
Datasets, Feature and Algorithm
Code
|
|
6. Clustering
The definition of
a cluster in the literature is not unique, and each algorithm
may adopt a different clustering criterion. These criteria
create biases for different algorithms, affecting their
suitability for identifying clustering structures in datasets,
depending on the dataset properties. The challenge for Instance
Space Analysis (ISA) is to explain how an algorithm's clustering
criterion affects performance on a variety of datasets with
various cluster structures. Here we have developed a set of 20
meta-features aiming to reveal different types of structures
within a dataset. Since there are multiple cluster definitions,
we have selected 10 popular partitional and hierarchical
clustering algorithms employing different clustering criteria.
In order to evaluate algorithm performance, noting the absence
of ground truth for clustering results, we have combined 12
validation indexes into a ranking to score each algorithm's
success. In this work, two ISA experiments were carried out. In
the first, 380 artificial datasets were tested while in the
second, 219 real datasets were added to the meta-data.
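One simple way to combine several validation indexes into a single ranking is to rank the algorithms under each index separately and then average the ranks. The sketch below does this for two well-known indexes (silhouette, where higher is better, and Davies-Bouldin, where lower is better). It is a minimal sketch under stated assumptions: the aggregation rule in the published study may differ, and the function name and score values are made up for illustration.

```python
def mean_ranks(scores, higher_is_better):
    """Average rank of each algorithm across validation indexes (1 = best).

    scores maps algorithm -> {index name -> value}; higher_is_better maps
    index name -> bool. Ties are not shared: the stable sort keeps tied
    algorithms in their input order.
    """
    algorithms = list(scores)
    totals = {a: 0 for a in algorithms}
    for index, higher in higher_is_better.items():
        ordered = sorted(
            algorithms, key=lambda a: scores[a][index], reverse=higher
        )
        for rank, algorithm in enumerate(ordered, start=1):
            totals[algorithm] += rank
    n_indexes = len(higher_is_better)
    return {a: totals[a] / n_indexes for a in algorithms}


# Hypothetical index values for three algorithms on one dataset.
scores = {
    "k_means": {"silhouette": 0.6, "davies_bouldin": 0.9},
    "ward": {"silhouette": 0.5, "davies_bouldin": 0.8},
    "single_linkage": {"silhouette": 0.3, "davies_bouldin": 1.5},
}
higher_is_better = {"silhouette": True, "davies_bouldin": False}
ranking = mean_ranks(scores, higher_is_better)
```

Working with ranks rather than raw index values avoids having to put indexes with different scales and directions on a common footing.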
Research Publications |
Downloads |
Instance Space Analysis |
Fernandes, L. H. d. S., Lorena, A. C. and Smith-Miles, K., "Towards
Understanding Clustering Problems and Algorithms: an
Instance Space Analysis", Special Issue on Benchmarking,
Selecting and Configuring Learning and Optimization,
Algorithms, vol. 14, no. 3, article 95, 2021.
|
Metadata
Metadata
(Artificial Instances)
Features and
Algorithms Code
|
|