Machine Learning-Based Classification of Market Phases


Recent experience, research results, and regulatory requirements all suggest that market regimes should be taken into account. Nevertheless, most of today’s financial risk management still assumes constant market conditions.
Currently, neither “stressed” market phases nor potential bubbles are determined in an objective way.
Machine learning procedures, however, enable a grouping according to risk aspects and a classification of the current market situation.
RiskDataScience has already developed procedures to identify market phases.
Market regimes can be determined from historical time series on the basis of flexible criteria, and the current market conditions can be assigned to the respective phases. Thus, it is possible to determine whether the current situation corresponds to past stress or bubble phases. In addition, historical stress scenarios can be detected in a systematic way.

Market Phases

Contrary to the efficient market hypothesis, markets are characterized by exaggerations and panics (new economy, real estate bubbles, …).
Crises follow their own rules, such as increased correlations, and behave differently from “normal” phases. In the course of the crises since 2007/2008, the situation has changed dramatically several times (negative interest rates, quantitative easing, …).

Regulators have recognized that market situations can differ significantly and require stressed market phases to be considered, e.g. in the

  • determination of “stressed VaR” periods
  • definition of relevant stress scenarios

In the conventional market risk management of financial institutions, however, only uniform market conditions are still considered (e.g. in conventional Monte Carlo simulations).
Historical simulations implicitly cover market phases, but they do not indicate which phase applies to a specific situation.
Finally, models like GARCH or ARIMA have not become established outside academic research.

Neglecting market phases implies several problems and risks.
First, a non-objective determination of stressed market phases for regulatory purposes can lead to remarks and findings by internal and external auditors. As a result, potentially sensible capital relief can be denied, since a less conservative approach can’t be justified in an objective way.
Also, ignoring possibly dangerous current market situations increases the risk of losses from market price fluctuations. In addition, bubbles are not detected in a timely manner and the “rules” of crises (like increased correlations) are not considered in an appropriate way.
On the other hand, an overly cautious approach may result in missed opportunities.

Machine Learning Approaches

For the analysis of the relevant market data, several data science / machine learning algorithms can be considered and implemented with tools like Python, R, Weka or RapidMiner. Here, the following groups of algorithms can be discerned:

  • Unsupervised learning algorithms: These algorithms can be used to determine “natural” clusters and to group market data according to predefined similarity criteria. This requires appropriate algorithms like k-means or DBSCAN as well as economic and financial domain expertise. Outlier detection algorithms can also be used to identify anomalous market situations, e.g. as a basis for stress test scenarios.
  • Supervised learning algorithms: The algorithms (e.g. Naive Bayes) are “trained” with known data sets to classify market situations. Then, new data – and especially the current situation – can be assigned to the market phases.

For a risk-oriented analysis, differences (e.g. in the case of interest rates) or returns (e.g. in the case of stock prices) must first be calculated from the market data time series as a basis for the further analysis. In addition, a “windowing” must be conducted, i.e. the relevant values of the previous days must be included as additional variables.
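
The preprocessing described above could be sketched as follows. This is a minimal illustration in Python with NumPy, not the procedure actually used: the function names (simple_returns, differences, window), the sample numbers, and the choice of simple rather than logarithmic returns are all assumptions made for the example.

```python
import numpy as np


def simple_returns(prices):
    """Daily simple returns for price-like series (e.g. stock index, FX rate)."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0


def differences(rates):
    """Daily absolute changes for rate-like series (e.g. interest rates)."""
    return np.diff(np.asarray(rates, dtype=float))


def window(changes, n_lags):
    """Append the changes of the previous n_lags days to each day's changes.

    changes: array of shape (n_days, n_series)
    returns: array of shape (n_days - n_lags, n_series * (n_lags + 1));
             each row starts with the most recent day, followed by older days.
    """
    changes = np.asarray(changes, dtype=float)
    n_days = changes.shape[0]
    lagged = [changes[n_lags - k : n_days - k] for k in range(n_lags + 1)]
    return np.hstack(lagged)


# Made-up sample values, purely for illustration.
index_level = np.array([100.0, 102.0, 101.0, 103.0, 104.0])
interest_rate = np.array([0.50, 0.48, 0.49, 0.45, 0.44])

changes = np.column_stack([simple_returns(index_level), differences(interest_rate)])
features = window(changes, n_lags=2)
print(features.shape)
```

Each row of features then describes one day together with its recent history, so that a clustering or classification algorithm sees short-term dynamics rather than isolated daily moves.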

Use Case: Analysis of Illustrative Market Data

The analysis described below was based on a market data set consisting of the DAX 30 index, the EURIBOR 3M interest rate, and the EURUSD FX rate. The time period ran from the end of 2000 to the end of 2016. Daily closing prices were used consistently as the basis for the return (DAX 30, EURUSD) and difference (EURIBOR 3M) calculations. Any structural breaks were adjusted, and missing return values were replaced by zeros. The windowing extended over the last 20 days.

Figure: Time series of the analyzed market data

The data set was analyzed with the clustering algorithms k-means and DBSCAN. As a result, most points in time could be assigned to a large “normal” cluster. The rest of the data points fell into a smaller “crisis” cluster.
Since crisis phases were observed to often precede “real” crashes, the procedure could be helpful as a “bubble detector”.
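
A clustering step of this kind might look roughly like the sketch below, here with scikit-learn. It does not reproduce the analysis above: the data are synthetic, and the number of clusters, the scaling step, and the DBSCAN parameters (eps, min_samples) are illustrative assumptions that would need tuning and economic validation on real market data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for windowed market features (assumption, not real data):
# 500 calm days and 40 turbulent days with lower, more volatile changes.
calm = rng.normal(loc=0.0, scale=0.01, size=(500, 6))
turbulent = rng.normal(loc=-0.05, scale=0.02, size=(40, 6))
features = np.vstack([calm, turbulent])

# Put all variables on a comparable scale before measuring distances.
scaled = StandardScaler().fit_transform(features)

# k-means with two clusters; the smaller cluster is read as the "crisis" phase.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
normal_cluster = np.bincount(kmeans_labels).argmax()
kmeans_crisis = kmeans_labels != normal_cluster

# DBSCAN marks points in sparse regions as noise (label -1), which serves
# here as an outlier flag. eps and min_samples are illustrative guesses.
dbscan_labels = DBSCAN(eps=1.5, min_samples=10).fit_predict(scaled)
dbscan_outlier = dbscan_labels == -1

print("k-means crisis days:", int(kmeans_crisis.sum()))
print("DBSCAN outlier days:", int(dbscan_outlier.sum()))
```

Note that k-means separates groups by the location of their centers, so it only isolates a crisis cluster if the crisis features differ in level and not merely in variance; DBSCAN instead flags days that lie in sparsely populated regions of the feature space.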

Figure: Identified market phases

The main identified outliers were:

  • spring 2001: burst of the dotcom bubble
  • autumn 2001: September 11
  • autumn 2008: Lehman insolvency

The current time period is not classified as a crisis; however, the extraordinary situation of negative interest rates calls for caution.

The classification algorithms were trained on a set of 3,000 points in time and then applied to a test set of 1,000 points.
Naive Bayes proved to be an appropriate simple algorithm; it reached accuracies of over 90% in both in-sample and out-of-sample tests.
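
The train-and-test procedure can be illustrated with a short sketch using scikit-learn's Gaussian Naive Bayes. Everything here is an assumption made for the example: the data and phase labels are synthetic, the helper synthetic_phase_data is hypothetical, and only the split sizes (3,000 and 1,000) follow the text. The accuracies it prints say nothing about the results reported above.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)


def synthetic_phase_data(n_normal, n_crisis, n_features=6):
    """Hypothetical labeled features: 0 = normal phase, 1 = crisis phase."""
    normal = rng.normal(loc=0.0, scale=0.01, size=(n_normal, n_features))
    crisis = rng.normal(loc=-0.05, scale=0.02, size=(n_crisis, n_features))
    X = np.vstack([normal, crisis])
    y = np.concatenate([np.zeros(n_normal, dtype=int), np.ones(n_crisis, dtype=int)])
    order = rng.permutation(len(y))  # mix both phases across the sample
    return X[order], y[order]


X, y = synthetic_phase_data(n_normal=3600, n_crisis=400)

# 3,000 points for training, 1,000 points held out for testing.
X_train, y_train = X[:3000], y[:3000]
X_test, y_test = X[3000:], y[3000:]

model = GaussianNB().fit(X_train, y_train)

in_sample_accuracy = accuracy_score(y_train, model.predict(X_train))
out_of_sample_accuracy = accuracy_score(y_test, model.predict(X_test))

# Assign the most recent observation to a phase.
latest_phase = model.predict(X_test[-1:])[0]

print("in-sample accuracy:", in_sample_accuracy)
print("out-of-sample accuracy:", out_of_sample_accuracy)
print("latest phase:", latest_phase)
```

In practice, the phase labels for training would have to come from a prior step such as the clustering, and crisis days are typically rare, so accuracy alone can look high even when few crisis days are recognized; checking the results per phase is advisable.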

Hence, an efficient distinction of market phases has already been achieved, and use as a bubble detector is possible after economically and financially sound validation.


The methods can be extended to capture more complex cases and issues, e.g. specialized markets like the electricity market, as well as patterns and rules characteristic of high-frequency trading (HFT).

We are developing corresponding methods and tools and support our customers in obtaining an overall view of the data in use.


Dr. Dimitrios Geromichalos
Founder / CEO
RiskDataScience UG (haftungsbeschränkt)
Theresienhöhe 28, 80339 München
Phone: +4989244407277, Fax: +4989244407001
Twitter: @riskdatascience