Nowcasting of Covid-19 Cases with Alternative Data?

At present, the novel coronavirus SARS-CoV-2 has a firm grip on large parts of the earth. The number of cases in individual countries is still rising or, in some cases, has stabilised after drastic measures.
However, these case numbers are distorted and often outdated by up to two weeks! The reasons for this are manifold and range from overloaded authorities, slow systems and tests to cyber attacks. In addition, many people probably only call the doctor when they have been ill for a while.

This raises the question to what extent it is possible to obtain the most up-to-date data possible and perhaps even “predict” the number of cases. “Alternative data” could provide one possibility; in the financial sector especially, this means data from outside the actual financial markets, such as entries on social media, search interest on Google, or satellite images.

In the present case, the extent to which Google searches can be helpful was investigated. Through Google Trends, Google offers the possibility of determining the search interest for arbitrary terms and topics over time and for individual countries. This makes it easy to see how interest in a topic is developing in a particular country.

First of all, the term “coronavirus” would of course be obvious here; however, it is distorted by the enormous amount of reporting and would probably provide little new information.
Instead, the term “fever” was analysed. Fever is a main symptom of Covid-19 and is googled far less often than “coronavirus”. One might even assume that many people who google “fever” actually have a fever (or are relatives of someone who does).

We have investigated for four different countries – Italy, Spain, Germany and the USA – to what extent Google’s search interest in “fever” is a pre-indicator of reported corona case numbers.
To do this, we first obtained the search interest from Google Trends for the respective countries. We compared this with the case numbers provided by Johns Hopkins University; from these we determined the new cases per day and scaled them (division by the maximum and multiplication by 100).
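The scaling step can be sketched as follows (a minimal illustration in Python; the actual retrieval of the Google Trends and Johns Hopkins data is not shown, and the function name is ours):

```python
def daily_new_cases_scaled(cumulative):
    """Turn cumulative case counts into daily new cases,
    then scale by dividing by the maximum and multiplying by 100."""
    new_cases = [max(today - yesterday, 0)
                 for yesterday, today in zip(cumulative, cumulative[1:])]
    peak = max(new_cases) or 1  # avoid division by zero for flat series
    return [100 * n / peak for n in new_cases]

# Example: cumulative totals over five days
print(daily_new_cases_scaled([0, 10, 30, 70, 90]))  # [25.0, 50.0, 100.0, 50.0]
```

The resulting series is on the same 0–100 scale as the Google Trends index, so both curves can be plotted in one diagram.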
The individual curves per country could then be displayed in diagrams. As it turned out, the Google Trends curves always had a lead time of at least a few days – a trend that was visible for many countries (see charts below). In addition, at this point in time the interest in “fever” seems to have declined again in many places. To draw hasty conclusions here would be negligent, of course!

Overall, procedures such as this could help to improve the information situation and put decisions on a broader basis.


Dr. Dimitrios Geromichalos
Founder / CEO
RiskDataScience UG (haftungsbeschränkt)
Theresienhöhe 28, 80339 München
Phone: +4989244407277, Fax: +4989244407001
Twitter: @riskdatascience

Sound AI Startup Guide


Artificial intelligence (AI) methods and processes are fundamentally transforming the economy and society. The future business potential of AI startups is correspondingly high. It is to be expected that many of them will turn out to be “unicorns”.

It is therefore very important for venture capitalists, banks, and insurers to engage early with the often capital-intensive AI startups.
However, the procedures in use are new, complex, and often opaque. It is correspondingly important to be able to judge their correctness.

We have many years of experience in the areas of Deep Learning / Artificial Intelligence and have supported well-known banks and insurers on these topics.
On this basis, we have developed a guide enabling the efficient pre-audit of AI start-ups concerning the most relevant topics.
With it, it is possible to assess the solidity of AI start-ups in an effective, questionnaire-based way and to identify potentially critical issues.
Risks can be mitigated and losses from “wrong“ engagements correspondingly be reduced.

Transformative Developments

The processes known as “artificial intelligence” (“AI”) are already in use in many ways and are fundamentally transforming the economy and society. Their fields of application are as diverse as they are promising and include, among others

  • recommendation systems for customer-specific products
  • autonomous driving for cars and drones
  • automated recognition of clinical pictures
  • automated high quality translation services

Despite the progress already made, the pace and quality of new developments remain high, and the number of start-ups founded in this field continues to be large.

High Complexity

Though the components are often available “out-of-the-box”, the complexity of AI procedures is generally very high and requires relevant expertise. Thus, e.g., careful attention must be paid to

  • which AI model to use
  • how to select and prepare the data
  • how to ensure that potentially self-generated data leads to realistic results
  • how to ensure that the quality of the results is sufficient

This is made more difficult by the fact that many procedures are ultimately “black boxes” and are difficult to validate.

In addition, some special features must also be taken into account in the implementation and the chosen infrastructure (like the use of GPUs, Cloud solutions, etc).

Potential Issues

The necessity to commit quickly in order not to be left behind, combined with the high complexity of the subject, means high risks for venture capitalists as well as for banks and insurance companies. These are reinforced by the fact that, e.g.

  • many startups take advantage of the hype and call themselves (or let themselves be called) “AI startups”, although this is not always accurate. (According to one study, 40% of all European “AI start-ups” have nothing to do with AI.)
  • many procedures are still very new and their effectiveness is doubtful
  • complex AI procedures are often used, although they are not necessary and/or the data situation is not sufficient; often “classical Machine Learning” is preferable to Deep Learning
  • many founders are very inexperienced and therefore initiate fragile processes
  • unauthorized data is used, or at least data from sources that are not contractually bound to provide it

For potential investors and contractual partners, there is thus a considerable risk of possible misinvestments and unrecognised financial losses as well as damage to their reputation.

Our Range of Services

We provide our guide, which covers the following relevant topics

  • data for training, testing & production
  • methodology use and validation
  • processes in place
  • used systems and hardware
  • license situation

For each of these topics, we have identified relevant issues for which we provide a list of questions and risks as well as short descriptions.

In addition, we can introduce the guide in the scope of a workshop and provide support on general questions concerning AI / Machine Learning / Deep Learning.

If desired, we also offer advice on special cases and support you in identifying possible risks and formulating and evaluating questions to be asked.

Your Benefits

  • insight into the critical points and the questions to be asked regarding AI start-ups
  • effective prior clarification of critical points
  • identification of start-ups where an engagement is out of the question
  • use of freed-up time resources to review more suitable cases
  • quicker decisions and continued access to promising technologies
  • if required, in-depth analysis of specific start-ups in question
  • reduction of risks of bad investments and damage to reputation
  • saving of money and resources

Depending on customer requirements, a flexible procedure is possible.

You are welcome to contact us, preferably via email.



AI as a Module – A New Kind of Service

Motivation and Concept

In order to meet the customers’ need for effective solutions in the field of Artificial Intelligence, various approaches – and combinations of them – are possible

  • Consulting: the complete development takes place on site at the customer’s, who also receives all rights
  • Software Development (“Software”): the customer acquires licences for a product and operates it on his own infrastructure
  • Software as a Service (SaaS): the customer acquires licences for a product that is operated on the remote infrastructure of the SaaS provider

For the customers, each of these approaches has specific advantages and disadvantages.

Today’s technological possibilities make a further prototypical approach – AI as a Module (AIaaM) – possible

  • the client acquires a pre-trained model (such as a neural network) for specific tasks (such as the translation of texts), which he operates independently
  • the model is not software in the narrow sense, but requires e.g. a Python environment. The procedures and data for the training remain with the supplier

As shown below, this approach combines several advantages of consulting, software development, and SaaS for customers.

Advantages and Disadvantages from the Customer Point of View

For the customers, each of the mentioned service types has specific pros and cons

  • Consulting
    • Pros
      • Very high flexibility: Consulting projects are extremely tailored to customer needs
      • Know-how build-up: Customers can acquire knowledge from the consultants and use it for further tasks
      • Contact persons: There are contacts available for every requirement at least throughout the project
    • Cons
      • Time consumption: Consulting projects are often extremely time-demanding and can take years
      • Expensiveness: Correspondingly, the costs are also very high – and can even continue to rise
  • Software
    • Pros
      • Low price: In general, software is relatively cheap – in some cases even free
      • Standardization: Software for specific tasks is often standardized and allows the customer to use best practice approaches
    • Cons
      • “Black box“: Often, customers have no possibility to examine how the used procedures in the software really work. (This is not the case for Open Source software, of course.)
      • Lengthy approval processes: Companies in regulated industries – like banks – have often lengthy approval processes in place. The effort can even reach proportions of projects.
      • Security risks: Complex software may have unknown security vulnerabilities and open the door, e.g., for hackers
  • SaaS
    • Pros
      • Low price: In general, SaaS is relatively cheap
      • Resource saving: Customers need only very limited resources for operating the service
    • Cons
      • “Black box“: As with software, customers often have no possibility to examine how the used procedures of the service really work
      • Supplier dependency: Customers depend on the service of third parties and are directly affected, e.g. in the case of an insolvency
      • Hurdles due to outsourcing: SaaS is often seen as a form of outsourcing. Depending on industry and legislation, legal hurdles can arise
  • AIaaM
    • Pros
      • High flexibility: Pre-trained models can be used extremely flexibly inside an organization’s processes. E.g., a translator module can easily be connected upstream of a classification routine
      • Build-up of relevant know-how: Customers can learn how to apply the modules with own, customized procedures
      • Lower regulatory hurdles: Since AIaaM is not full software in the narrow sense, hurdles should be significantly lower
      • Transparent tools: Customers can integrate the modules into their own tools and maintain overall transparency
      • Low price: The price of a module is generally even lower than that of commercial software since there is no superstructure.
      • Efficiency: Customers are not forced to purchase unnecessary features that are already covered by other tools
    • Cons
      • No access to training procedures: Customers just acquire the trained model and not the training data or the training procedures. This is not necessarily a disadvantage, however, if only limited resources are available and the core business is a different one
      • Medium implementation effort: The installation requires some minimal programming, e.g. in Python. The necessary knowledge should be available in any medium-sized or large organization, however

In summary, AIaaM combines several advantages of consulting, software, and SaaS – and avoids most disadvantages.

We are happy to support our customers with related issues.



Association Rules Analyzer

Our free association rules analyzer can be accessed via this link

General information

The main task of association analysis is the identification and validation of rules for the joint occurrence of variables on the basis of past observation histories (“item lists”).
The variables can be of many types, such as jointly purchased products in (online) commerce (“market basket analysis”). Accordingly, the determined rules can be used in many ways, such as for book buying recommendations or shelf compositions in supermarkets.

Association analysis has become established in recent years, especially in online and retail trade. In addition, however, it can also be applied to countless other areas ranging from the analysis of co-occurring characters in television series to the identification of cause-and-effect relationships of operational loss events.

Popular association analysis methods are based on powerful rule-finding algorithms such as “Apriori”.
In addition, some helpful metrics have been established to further investigate the rules found. The most common are:

  • Support. This is the frequency of common occurrence of variables. For example, 15% support for the combination (milk, eggs) means that milk and eggs were purchased in 15% of all observed purchases. The support does not depend on the order of the variables and can be between 0% and 100%.
  • Confidence. This denotes the reliability of a determined rule. An 80% confidence for the rule “Milk -> Eggs” would mean, for example, that in 80% of the cases where milk was bought, eggs were also purchased (for example, to bake cakes). Confidence is directional and can range from 0% to 100%.
  • Lift. This refers to the factor by which the joint occurrence of variables is more frequent than would be expected if they were independent of each other. A lift of e.g. 3 for the combination (milk, eggs) would mean that this combination is three times more common than would be expected by chance. The lift can in principle assume any value greater than or equal to zero. A value greater than one implies that the variables tend to occur together; a value less than one implies that they tend to be mutually exclusive.

Since the number of determined rules is often very large, it is usually essential to limit the rules in advance to a reasonable number. In particular, this can be done by setting lower limits for support and confidence and thus determining only the most important rules. The remaining rules can then be tabulated or graphically examined.
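The three measures can be computed directly from a list of item sets. The following is our own minimal sketch (not the code behind our app); function names and the example baskets are invented:

```python
def support(transactions, items):
    """Fraction of transactions containing all given items."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Directional reliability of the rule antecedent -> consequent."""
    joint = set(antecedent) | set(consequent)
    return support(transactions, joint) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """How much more often the rule holds than expected under independence."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

baskets = [{"milk", "eggs"}, {"milk"}, {"eggs", "flour"}, {"milk", "eggs", "flour"}]
print(support(baskets, {"milk", "eggs"}))       # 0.5
print(confidence(baskets, {"milk"}, {"eggs"}))  # about 0.67
```

Algorithms such as Apriori avoid computing these measures for every conceivable item combination by pruning candidates whose support is already below the chosen threshold.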

Our App


The goal of this free web app is to enable simple association analysis using uploadable item lists and adjustable measure thresholds.
From this, the corresponding rules are automatically determined and provided as a downloadable table and graphically.

We offer all of the methods available here also offline and in various extensions to B2B customers, for example for the analysis of operational loss events. We are happy to assist you with related questions.


The use of our association rules website is straightforward.

After the reCAPTCHA test, your own CSV file (a comma-separated UTF-8 file with no special characters, no header, and no index) can be uploaded, and the minimum values for support (in %) and confidence (in %) as well as the lift boundaries for common and uncommon co-occurrences (in absolute numbers) can be set. If no CSV file is uploaded, a sample file is used instead.
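A hypothetical input file of this kind might look as follows (item names are invented; one comma-separated item list per line, with no header and no index column):

```
milk,eggs,bread
milk,eggs
eggs,flour
milk,bread
```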

After clicking on “RUN” the calculation is started and the association rules are obtained and displayed.

Please note that the values of the support, confidence and lift boundaries have a significant effect on the number of determined rules. Values that are too low may result in a time-out due to too many rules being obtained – values that are too high may result in no rules at all. It is therefore recommended to test different parameter combinations for new datasets until the required results are obtained.

As an output of the calculation, the top rules – i.e. those with the highest support and confidence – are obtained (see below). The rule components “antecedents” and “consequents” are shown as separate columns, along with the corresponding support, confidence and lift.

In order to obtain more “interesting” rules, it is also possible to retrieve exclusively the rules with a minimum lift (“Lift 1”). These rules are shown in a separate table (see below) and indicate items that co-occur much more often than would be expected by chance.

The rules for common co-occurrences are also displayed graphically (see below) to enable a more efficient analysis.

Similarly, rules for rare common occurrence (i.e., elements that normally exclude each other) are also calculated and displayed. These rules are filtered by setting an upper limit for “Lift 2”.

The rules can also be downloaded as an XLS file.



Data Science-based identification of co-occurring operational damage events

Overview Challenge and Offer

Operational risk is as great a threat as it is hard to analyze, for financial services and industrial companies alike.
Despite the complex models in use, connections between different OpRisk events can hardly be identified in practice, and underlying causes often remain unrecognized.
On the other hand, data science methods have already been established for similar questions and allow large amounts of diverse data to be analyzed in order to identify interdependencies, e.g. in the buying behavior of customers in online trading.

RiskDataScience has adapted existing data science methods to the requirements of operational risk management and has developed algorithms to identify interdependencies between operational losses.
With these, companies are able to identify causal relationships between damage events and spend less time searching for common causes. The entire accumulated knowledge can be used efficiently in order to prevent future damage as far as possible or to anticipate it at an early stage.

Operational Risks


Operational risks can be assigned to the following categories, depending on the cause

  • People: e.g. fraud, lack of knowledge, employee turnover
  • Processes: e.g. transaction errors, project risks, reporting errors, valuation errors
  • Systems: e.g. programming errors, crashes
  • External events: e.g. lawsuits, theft, fire, flooding


Usually, operational risks are categorized according to extent of damage and probability. Accordingly, suitable management strategies are:

  • Avoidance: for big, unnecessary risks
  • Insurance: for big, necessary risks
  • Mitigation: esp. for smaller risks with a high probability of occurrence
  • Acceptance: for risks that are part of the business model

Methods and Problem

The handling of operational risks is strictly regulated, especially in the financial services sector. For example, under Basel II / III, banks must underpin operational risks with equity capital. There are compulsory calculation schemes such as the Standardized Approach (SA) based on flat-rate factors and the Advanced Measurement Approach (AMA). The latter is based on distribution assumptions and will in future be replaced by the SA.

In terms of methodology, the following distinctions are made, among others, in the treatment of operational risks:

  • Questionnaires and self-assessment: probabilities and extents are determined in a rather qualitative way
  • Actuarial procedures: these are based on distribution assumptions based on past damage
  • Key risk indicator procedures: easily observable measures are identified that serve for early warning
  • Causal networks: interdependencies are mapped using Bayesian statistics

With these methods, interdependencies between operational risks and their underlying causes can either not be determined at all, or only in a very complex and error-prone manner.

Detecting relationships using data science techniques

Association analysis

For the analysis of connections between several different events (“items”), methods from the field of association analysis are suitable.
The respective “market basket analysis” methods have been established for several years and are used in particular in online commerce (for example, book recommendations), in search engine suggestions, and in retail (product placement on shelves).
Using association analysis, the common occurrence of different events can be identified directly and without distributional assumptions.
The enormous number of possible conclusions can be efficiently and properly limited by means of specially developed measures such as support, confidence and lift.
The analyses require programs based on appropriate analysis tools, e.g. Python, R or RapidMiner.

In addition, we offer a free web app for simple association analysis based on CSV files.

Analysis preparation

First, the damage data must be brought into a format usable for the analysis.
Depending on the type of damage, temporal aggregations (for example on a daily or weekly basis) must also be carried out.
Damage types that occur too frequently or are already explained must be removed on the basis of expert assessments.
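The temporal aggregation can be sketched as follows (a minimal illustration; the function name, event structure, and damage types are invented):

```python
from collections import defaultdict
from datetime import date

def to_item_lists(events, period="daily"):
    """Group (date, damage_type) events into one item list per period,
    as input for the association analysis."""
    buckets = defaultdict(set)
    for day, damage_type in events:
        if period == "daily":
            key = day
        elif period == "weekly":
            key = (day.isocalendar()[0], day.isocalendar()[1])  # (year, week)
        else:  # monthly
            key = (day.year, day.month)
        buckets[key].add(damage_type)
    return [sorted(items) for _, items in sorted(buckets.items())]

events = [(date(2020, 1, 1), "system crash"),
          (date(2020, 1, 1), "wrong valuation"),
          (date(2020, 1, 8), "late report")]
print(to_item_lists(events))  # [['system crash', 'wrong valuation'], ['late report']]
```

Each resulting item list plays the same role as a market basket in retail analysis: events that land in the same period are treated as having occurred together.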

Conducting the analysis

Before the start of the analysis, the criteria for the relevant inference rules should be set in terms of support and confidence. The determination of the criteria can be supported by graphics.
Subsequently, the found rules must be checked for plausibility by experts.
The steps should be repeated for all relevant time aggregations.

Use Case: analysis of a fictitious damage database

As an application example, a fictitious loss database of a bank was constructed for an entire year.
There were a total of 23 possible types of damage, including e.g. a flu epidemic, late reports, wrong valuations, and complaints about wrong advice. The following assumptions underlie the test example:

  • Bad transactions are very common
  • Deficiencies at the outsourced hotline become apparent through requests about PC hard-disk head crashes
  • Reporting staff usually drive by car and are affected by a snowstorm
  • After a valuation system crashes, wrong valuations occur
  • Thefts occur during work after a fire in the meeting room
  • Staff shortages at suppliers lead to failed projects
  • Massive customer complaints after experienced employees leave customer service

Because the bad transactions were very frequent and unrelated to other events, they were removed first:

Damage frequency

First of all, all determined rules were graphically displayed to find the relevant support and confidence measurements.

Display of the rules on a daily basis

The restriction of the confidence to a minimum of 0.6 gives the list shown below.

Identified interdependencies on a daily basis

Of the coincidences found, the green ones turn out to be valid after a plausibility check.

On a weekly and monthly basis, the procedure was analogous:

Display of the rules on a weekly basis


Identified interdependencies on a weekly basis


Possible interdependencies on a monthly basis

After a plausibility check of possible causal relationships, all assumptions used in the preparation could be identified in the data.

Offer levels for using association analysis in OpRisk

RiskDataScience enables customers to use and develop the described processes efficiently and company-specifically. According to the respective requirements, the following three expansion stages are proposed.

Stage 1: Methodology

  • Introduction to the methodology of association analysis
  • Handover and installation of existing solutions based on Python, R and RapidMiner – or, depending on customer requirements, support of the on-site implementation
  • Transfer and documentation of the visualization and evaluation techniques

The customer is able to use and develop the methodology independently.

Stage 2: Customizing

  • Stage 1 and additionally
  • Adaptation and possibly creation of criteria for rule selection according to circumstances of the respective customer
  • Analysis of specific risks, processes and systems to identify optimal applications
  • Development of a process description for an efficient use
  • Communication and documentation of results to all stakeholders

The customer has customized procedures and processes for operational risk analysis.

Stage 3: IT Solution

  • Stage 1, Stage 2, and additionally
  • Specification of all requirements for an automated IT solution
  • Suggestion and contacting of potential providers
  • Support in provider and tool selection
  • Assistance in planning the implementation
  • Professional and coordinative support of the implementation project
  • Technical support after implementation of the IT solution

The customer has an automated IT solution for the efficient association analysis of operational risks.

Depending on customer requirements, a flexible design is possible. We are happy to explain our approach as part of a preliminary workshop.



Machine Learning-Based Credit Rating Early Warning

Overview Challenge and Offer

As an important type of risk, credit risks are quantified using sophisticated rating procedures. Due to the time-consuming preparation and the lack of up-to-date balance sheet data, ratings are only available with a delay. Banks have therefore already introduced market-data-based early-warning systems for current credit risk signals, but these cannot provide any indications when market data is missing.
On the other hand, corporate news and press articles often provide important information about problems and imbalances.
RiskDataScience has developed algorithms for the automatic detection and classification of news texts with regard to bankruptcy relevance (News-Based Early Warning).
This allows banks to extract valuable additional information about imminent insolvencies from news sources. An early recognition of credit risks is thus also possible for non-listed companies without direct market data.

Credit Risk Measurement


Credit risk is the risk of credit events such as default, late payment, credit downgrade or a currency freeze.
A further distinction is made between issuer risk (for bonds), counterparty risk (for derivative transactions) and the credit default risk of borrowers, which is considered in the following.
Credit risks are often the biggest bank risk and, in addition to market and operational risks, must be backed by equity under Basel II / III.

A frequently used indicator for quantifying credit risks is the expected loss of a loan. In the simplest case, it results as the product of

  • PD: Probability of Default
  • LGD: Loss Given Default
  • EaD: Exposure at Default

External and internal credit ratings mainly measure the PD (and partly also the LGD) and are determined using complex procedures.
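In code, the simplest-case formula is just the product of the three factors (a trivial illustration; the parameter values are invented):

```python
def expected_loss(pd, lgd, ead):
    """Expected loss as the product PD * LGD * EaD (simplest case)."""
    return pd * lgd * ead

# Hypothetical loan: 2% default probability, 45% loss given default,
# EUR 1,000,000 exposure at default -> expected loss of about EUR 9,000
el = expected_loss(0.02, 0.45, 1_000_000)
```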

Determination and Early Detection

The methods for determining the PD require well-founded statistical analyses based on

  • quantitative balance sheet ratios such as debt ratio, equity ratio and EBIT
  • qualitative analyst key figures such as quality of management, future prospects and market position
  • general market data such as interest rates, inflation and exchange rates.

The rating models must be regularly validated against actual credit events and adjusted if necessary.
Credit ratings are therefore usually updated only with a delay – often only annually.
To address this issue, market-data-based early-warning systems have been introduced that provide signals based on significant changes in stock prices, credit spreads or other correlated market data. In general, however, only systematic risks or the risks of listed companies can be identified this way.

Information from News Texts


The reasons for bankruptcies are often company-specific (idiosyncratic) and cannot be derived from general market developments. Examples of this are

  • Fraud cases by management
  • Bankruptcy of an important customer or supplier
  • Appearance of a new competitor

Negative events such as plant closures, short-time work, investigations and indictments are sometimes several months ahead of the actual bankruptcy.

In the case of non-listed companies, however, no market-data-based early warning is possible. On the other hand, news also provides up-to-date and often insolvency-relevant information in these cases.
News articles, blogs, social media and in particular local newspapers inform online and offline about problems of companies.
The efficient use of online texts makes it possible to extend the early warning to non-listed companies.

Efficient News Analysis

Methods for the efficient analysis of texts are a prerequisite for identifying the relevant news and, based on this, anticipating possible bankruptcies. This requires

  • a timely identification of hundreds of data sources (newspapers, RSS feeds, etc.) taking into account the legal aspects
  • an automatic reading of the relevant messages about all customers based on given mandatory and exclusion criteria
  • a timely classification of the relevant texts on the basis of possible insolvency risks
  • an immediate analysis and visualization of the risk identification results

Already implemented machine learning algorithms serve as a basis for this seemingly impossible task.

Knowledge use through machine learning procedures

Automated Reading

As a first step, all relevant news sources (e.g., newspaper articles from specialized providers) must be identified on the basis of a sufficiently large sample of companies to be examined and irrelevant sources must be excluded wherever possible.

The news must then be filtered according to relevance. In order to avoid confusion due to names or erroneous parts of the text (for example regarding equities), word filters and possibly complex text analyses are necessary.


For the classification of the extracted news texts, different text mining methods from the field of data science / machine learning come into consideration. Supervised learning proceeds as follows

  • first, the words that are irrelevant for the classification are determined manually (“stop words”)
  • the algorithms are then “trained” with known data records to associate texts with categories
  • new texts can then be assigned to known categories with specific confidences

Methodically, the following steps are to be carried out

  • from the filtered texts, significant word stems / word stem combinations (“n-grams“) are determined
  • the texts are mapped as points in a high-dimensional space (with the n-grams as dimensions)
  • machine learning procedures identify rules for separating the points into categories. For this purpose, dedicated algorithms such as naive Bayes, W-Logistic or Support Vector Machines are available

The analyses require programs based on appropriate analysis tools, e.g. R or Python.
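The steps above can be sketched with a minimal bag-of-words naive Bayes classifier. This is our own toy illustration, not the actual RapidMiner or production setup; the stop words, class labels, and training texts are invented:

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "and", "of", "in"}  # step 1: chosen manually

def tokens(text):
    return [w for w in re.findall(r"[a-zäöüß]+", text.lower())
            if w not in STOP_WORDS]

class NaiveBayesText:
    """Multinomial naive Bayes over word counts (Laplace smoothing)."""

    def fit(self, texts, labels):
        self.priors = Counter(labels)                      # class frequencies
        self.counts = {c: Counter() for c in self.priors}  # word counts per class
        for text, label in zip(texts, labels):
            self.counts[label].update(tokens(text))
        self.vocab = {w for c in self.counts for w in self.counts[c]}
        return self

    def predict(self, text):
        def log_prob(c):
            total = sum(self.counts[c].values()) + len(self.vocab)
            lp = math.log(self.priors[c] / sum(self.priors.values()))
            for w in tokens(text):
                lp += math.log((self.counts[c][w] + 1) / total)
            return lp
        return max(self.priors, key=log_prob)

clf = NaiveBayesText().fit(
    texts=["insolvency filed after plant closure",
           "short-time work and investigations announced",
           "record profit and new product line",
           "dividend raised on strong growth"],
    labels=["critical", "critical", "neutral", "neutral"])
print(clf.predict("plant closure and short-time work"))  # critical
```

The word counts play the role of the n-gram dimensions described above; a production system would additionally use word stems, n-grams of length greater than one, and a much larger training sample.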

Sample Case

For about 50 insolvent companies and 50 non-insolvent reference companies, (German) message snippets were collected for a multi-month time horizon (3M-3W) before the respective bankruptcy.
The illustrated tag clouds provide an exemplary overview of the content of the texts.
With a RapidMiner prototype, the message texts were classified with regard to possible bankruptcies, and the results were examined with in-sample and out-of-sample tests.

Tag cloud of news about companies that went bankrupt
Tag cloud of news about companies that did not go bankrupt

The tag clouds alone show a clear difference between the news about insolvent and non-insolvent companies.

The RapidMiner solution was trained with a training sample (70% of the texts) and applied to a test sample (30% of the texts).
Both the training sample (in-sample) and the test sample yielded accuracy rates of about 80%. The area under the curve (AUC) was 90% in the in-sample case.
Based on the confidences reported by RapidMiner and the actual insolvencies, a PD (probability of default) calibration could also be performed.

Even with the relatively small training sample, a significant early detection of insolvencies could be achieved. Further improvements are to be expected with an extension of the training data.

Cost-Effective Implementation

Starting Position

Since no uniform market for internet news delivery has yet emerged, prices are often inconsistent. Different requirements for the cleaning routines and different technical approaches lead to large price ranges.
On the other hand, high-quality analysis tools such as R or RapidMiner (version 5.3) are currently even available for free.
In addition, about half of all online newspapers offer their headlines in the form of standardized RSS feeds.

Cost Drivers

The implementation and running costs of message-based early warning systems can increase significantly, in particular for the following reasons:

  • An evaluation of news texts requires royalties to collecting societies (e.g., VG Wort in Germany) or a direct purchase
  • Automated reading is technically complicated
  • Maintaining advanced NLP (natural language processing) algorithms to identify relevant texts is costly

It is therefore necessary to examine to what extent the points mentioned are actually necessary, at least for a basic implementation.

Cost-Efficient Basic Solution

The cost-efficient RiskDataScience basic solution, which has already been developed, is based on the following assumptions:

  • information contained in headlines and short snippets is sufficient for bankruptcy warnings
  • there are enough free RSS feeds to provide a sufficiently good overview of the situation of (medium-sized) companies
  • the relevance of the news snippets can be determined by simple text searches

Hundreds of news sources can be searched and bankruptcy signals can be identified for potentially thousands of companies within minutes.
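A basic scan of this kind can be sketched with the Python standard library alone. The feed content, company names, and signal-word list below are illustrative assumptions, not part of the actual RiskDataScience solution; relevance is decided by the simple text search the basic solution assumes (headline mentions the company name and an insolvency-related word).

```python
import xml.etree.ElementTree as ET

# Illustrative insolvency-related signal words (German feeds would use
# e.g. "Insolvenz" or "Pleite" instead)
SIGNAL_WORDS = {"insolvency", "bankruptcy", "administration"}

def scan_rss(xml_text, companies):
    """Return {company: [matching headlines]} for a single RSS feed.

    A headline counts as a hit if it mentions the company name AND
    contains at least one signal word (simple text search).
    """
    hits = {c: [] for c in companies}
    for item in ET.fromstring(xml_text).iter("item"):
        title = (item.findtext("title") or "").strip()
        lowered = title.lower()
        for company in companies:
            if company.lower() in lowered and any(w in lowered for w in SIGNAL_WORDS):
                hits[company].append(title)
    return hits

# Hypothetical feed content for illustration
feed = """<rss><channel>
  <item><title>Example AG files for insolvency</title></item>
  <item><title>Example AG launches new product line</title></item>
  <item><title>Sample GmbH reports record quarter</title></item>
</channel></rss>"""

print(scan_rss(feed, ["Example AG", "Sample GmbH"]))
# → {'Example AG': ['Example AG files for insolvency'], 'Sample GmbH': []}
```

Looping such a scan over hundreds of feed URLs and a full company watch list is a straightforward extension.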

Copyright Issues

When implementing message-based early-warning systems, it is imperative to comply with the legal requirements that arise, in particular, from copyright law (e.g. UrhG in Germany).

This places narrow limits on the duplication and processing of news texts.
Problems may occur in some jurisdictions, in particular in the case of databases and further publication.

On the other hand, there are many exceptions, especially with regard to temporary acts of reproduction, newspaper articles and radio commentary.

Although the processing of message snippets should generally be safe, legal advice is recommended due to the high complexity of the relevant laws.

Offer levels for using machine learning techniques for credit risk detection

RiskDataScience enables banks to use and develop the described procedures efficiently and institution-specifically. According to the respective requirements, the following three expansion stages are proposed.

Stage 1: Methodology

  • briefing in text classification methodology
  • transfer and installation of the existing solution for tag cloud generation
  • handover and installation of the existing solution – or, depending on customer requirements, support of the on-site implementation
  • transfer and documentation of the visualization and evaluation techniques
    The bank is able to use and develop the methodology independently

Stage 2: Customizing

  • stage 1 and additionally
  • adjustment and possibly creation of reference groups according to portfolios of the respective bank
  • performing analyses and method optimization based on the portfolios and customer history of the bank
  • adaptation of RSS sources
  • development of a process description for an efficient use
  • communication and documentation of results to all stakeholders
    Customer has customized procedures and processes for analyzing message texts

Stage 3: IT Solution

  • stage 1, stage 2 and additionally
  • specification of all requirements for an automated, possibly web-based IT solution
  • suggestion of and contact with potential providers
  • support in provider and tool selection
  • assistance in planning the implementation
  • professional and coordinative support of the implementation project
  • technical support after implementation of the IT solution
    Bank has an automated IT solution for message-based early detection of insolvency signals.

Depending on customer requirements, a flexible design is possible. We are happy to explain our approach as part of a preliminary workshop.


Dr. Dimitrios Geromichalos
Founder / CEO
RiskDataScience UG (haftungsbeschränkt)
Theresienhöhe 28, 80339 München
Phone: +4989244407277, Fax: +4989244407001
Twitter: @riskdatascience

Forex Risk Calculator

Our free forex risk calculator can be accessed via this link.

General information

Especially in today’s internationally connected business world, it is often unavoidable even for smaller companies to engage in a wide variety of transactions in foreign currencies. Many orders are, for example, only possible or significantly cheaper in other currencies.

However, for each position entered into in a foreign currency – for assets as well as for liabilities – there are considerable risks from fluctuations in exchange rates which are characteristic for the foreign exchange or forex market.

Several procedures from Financial Risk Management have proven their worth in quantifying these so-called foreign exchange rate risks (FX risks) and have been established with financial service providers and larger companies for years.

Two key indicators are sensitivity and Value at Risk (VaR).

The sensitivity indicates how much the value of a foreign currency account changes if the exchange rate of the corresponding foreign currency increases by (here) 1%.

The VaR, on the other hand, indicates the extent of the loss that will not be exceeded at a certain confidence level (e.g. 95%) over a certain time horizon (“holding period”; e.g. 10 days). A key challenge in determining VaR is to take account of correlations between exchange rates; in the case of uncorrelated exchange rates, this tends to lead to diversification and thus to a reduction in the overall risk compared with pure addition. The following two common methods are used here to determine VaR:

  • Delta-normal approach: variances and correlations are determined from the history, and the VaR is calculated under a normal distribution assumption. This approach is easy to implement but underestimates unlikely events.
  • Historical simulations: The historically observed changes are used as simulation scenarios. This method implicitly takes correlations and possible shocks into account, but its quality depends strongly on the underlying history.

For our app we use the histories of the last 1000 days for both methods.
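The two VaR methods can be sketched as follows. This is a simplified illustration with invented positions, spot rates, and randomly generated return histories; the app's actual currency universe, data feeds, and calibration are not reproduced here.

```python
import math
import random
import statistics

def fx_risk(positions, spots, returns, confidence=0.95, holding_days=10):
    """Delta-normal and historical VaR for an FX portfolio.

    positions: {ccy: units held in foreign currency (negative = short)}
    spots:     {ccy: current rate, local currency per foreign unit}
    returns:   {ccy: list of historical daily relative rate changes}
    """
    pv = sum(positions[c] * spots[c] for c in positions)
    # Sensitivity: value change for a 1% rise of each foreign currency
    sens = {c: 0.01 * positions[c] * spots[c] for c in positions}

    # Portfolio P&L per historical day (correlations enter implicitly)
    n = len(next(iter(returns.values())))
    scenarios = [
        sum(positions[c] * spots[c] * returns[c][i] for c in positions)
        for i in range(n)
    ]

    # Delta-normal: scale the daily P&L volatility by sqrt(holding period)
    z = statistics.NormalDist().inv_cdf(confidence)
    var_dn = z * statistics.stdev(scenarios) * math.sqrt(holding_days)

    # Historical: empirical loss quantile, sqrt-of-time scaled
    k = max(0, int((1 - confidence) * n) - 1)
    var_hist = -sorted(scenarios)[k] * math.sqrt(holding_days)

    return {"pv": pv, "sensitivity": sens,
            "delta_normal_var": var_dn, "historical_var": var_hist}

# Invented example: long USD, short GBP, synthetic 1000-day histories
random.seed(0)
returns = {"USD": [random.gauss(0, 0.006) for _ in range(1000)],
           "GBP": [random.gauss(0, 0.005) for _ in range(1000)]}
result = fx_risk({"USD": 100_000, "GBP": -50_000},
                 {"USD": 0.92, "GBP": 1.17}, returns)
print(round(result["pv"], 2))  # → 33500.0
```

Note that real return histories would also capture shocks and cross-currency correlations, which the independent synthetic series above deliberately lack.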

Our App

Our FX Risk calculator enables the determination of exchange rate risks for portfolios of up to 19 foreign currency positions from the perspective of 5 local currencies — thus considering cross-currency dependencies and correlations for long as well as short positions.

With our app, even smaller companies without sophisticated financial risk procedures can obtain an indication of the possible FX risks of planned or actual deals or transactions.

A batch run determines the currency pair exchange rates of the latest day for which data were obtained (value date); please note that all the calculations refer to this day. After selecting the local currency and entering the foreign currency positions (each in foreign currency units), the holding period (in days), and the confidence level (in %) for the value at risk, the calculation can be started.

Forex risk calculation results: PV, sensitivity, VaR, simulation scenarios
Exemplary calculation results

The results are as shown above and include

  • the total cash value (money amount) in local currency
  • the delta normal VaR and the historical VaR

In addition,

  • the present values of the foreign currency position in local currency
  • the sensitivities of the foreign currency position (1% increase in foreign currency) in local currency
  • and the total NPV scenarios from the historical simulation

are graphically displayed.

We support our customers with these as well as further financial risk methods, e.g. for calculating interest rate or credit risks, and with questions regarding more current data sources.



Dr. Dimitrios Geromichalos
Founder / CEO
RiskDataScience UG (haftungsbeschränkt)
Theresienhöhe 28, 80339 München
Phone: +4989244407277, Fax: +4989244407001
Twitter: @riskdatascience

Regulytics® Demo: Building Codes (Bauordnungen)


The Regulytics® application from RiskDataScience enables the automated analysis of legal and internal texts in terms of content.
The app provided on the “Bauordnungen” page of the website is a freely usable version of Regulytics with a limited scope. As an example, it focuses on the building codes of the German federal states.
These can thus be examined objectively for similarities. Similar paragraphs in the different texts can be identified, as can differences.

Further explanations of the background and motivation of semantic analyses can be found here.

B2B customers can request a free trial account for the extended online version of Regulytics, which focuses on regulations relevant to the financial services sector.
The offline version of Regulytics additionally offers the possibility of including arbitrary further texts.

Furthermore, we gladly support our customers in consulting projects on the introduction of semantic analyses of complex and extensive texts.


The solution is simple and straightforward to use. The languages as well as the source and target texts can be selected via drop-down menus.
A further input is the number of topics for the natural language processing model (we recommend values between 100 and 500).
Based on these inputs, the overall similarities are computed after a few seconds of processing time and displayed graphically as a radar chart (see diagram below); the measure used here is cosine similarity.

Building codes most similar to the Bavarian building code

After selecting a “start paragraph” in the source text, the complete similarity matrix for all paragraphs is computed. Bright areas correspond to similar paragraphs between the source text (x-coordinate) and the target text (y-coordinate; see image below).

Paragraph-by-paragraph comparison of the building codes of Bavaria (x-axis) and Baden-Württemberg (y-axis)

In addition, excerpts of the top 10 paragraphs (of the target text) most similar to the start paragraph are displayed, as are excerpts of the paragraphs of the source text with the highest and lowest similarities to the target text. Overlaps and differences between regulations can thus be identified easily.

Important note: when determining similar and differing paragraphs, the quantitative cosine similarity measure should always be kept in mind.
If, for example, the paragraph most similar to a given paragraph has a similarity below 0.5, it can in most cases be assumed that there is no actual similarity, i.e., that no similar paragraph exists.
The same applies to the determination of differences.
Setting a lower cosine-similarity bound (for similarities) and an upper bound (for differences) is possible and sensible, but depends on the respective question.
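The cosine similarity measure and the lower-bound logic can be illustrated as follows; the topic-space vectors and the 0.5 cutoff below are illustrative assumptions, not values from the app.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two topic-space vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def best_match(start_vec, target_vecs, lower_bound=0.5):
    """Index and score of the most similar target paragraph,
    or (None, score) if even the best match falls below the bound."""
    best = max(range(len(target_vecs)),
               key=lambda i: cosine_similarity(start_vec, target_vecs[i]))
    score = cosine_similarity(start_vec, target_vecs[best])
    return (best, score) if score >= lower_bound else (None, score)

# Illustrative 3-topic vectors standing in for mapped paragraphs
paragraphs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]]
print(best_match([0.8, 0.2, 0.1], paragraphs))
```

Here the first candidate paragraph is returned with a score well above 0.5, while an orthogonal query would yield `None`.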

Further analyses building on this procedure are easily possible.

We support our customers with related questions. We are also happy to explain our approach in a preliminary workshop.


Dr. Dimitrios Geromichalos
Founder / CEO
RiskDataScience UG (haftungsbeschränkt)
Theresienhöhe 28, 80339 München
Phone: +4989244407277, Fax: +4989244407001
Twitter: @riskdatascience

Regulytics® is a trademark registered with the German Patent and Trade Mark Office.


An Automated Comparison of the Police Laws of Different German Federal States


In Germany, police law is regulated differently in each federal state. Nevertheless, great similarities are to be expected due to the analogous range of tasks.

The aim of the present analysis was to use the possibilities of digitization, namely machine learning and natural language processing (NLP) methods, to identify semantic similarities and differences between the various laws (as of 11 May 2018).
The starting point was the Bavarian police tasks act (BayPAG), which was automatically compared with the corresponding laws of the other federal states.
In addition to overall similarities to the other laws, a paragraph-by-paragraph comparison was carried out with the police laws of Thuringia (PAG), Baden-Württemberg (PolG), and Hamburg (SOG).

Methods and Tools

The NLP methods used here enable semantic analyses of texts based on the topics occurring in them, identifying similarities at any desired granularity.
In the method used, latent semantic analysis, the considered terms are reduced to a given number of topics, thereby mapping the texts onto a “semantic space”.
New texts and text components can then be examined for semantic similarities.
The analyses require programs based on appropriate analysis tools, e.g., Python or R.


First, the units by which the texts are to be examined are determined (sentences, paragraphs, etc.).
Using a “training text”, a mapping onto a given number of topics is determined (the “model”).
The texts to be examined are likewise mapped with the model and then examined quantitatively for similarities.
The procedure can be automated and applied to a large number of texts.

Advantages of an Automated Analysis

Automated analyses can, by definition, be carried out very quickly and in a standardized way.
Even with ordinary laptops, semantic similarities in dozens of complex laws can be analyzed within minutes.

The personnel effort for use and possible further development is extremely low and largely independent of the number of laws considered.

The similarities between the laws at the overall and paragraph level are quantitatively available and reproducible at any time.
Discrepancies due to subjective preferences are thus practically ruled out.
Analyses can be documented in a comprehensible way.

Results of the Analysis

The analysis at the overall level shows similarities of varying degree between the Bavarian police law and those of the other federal states (see radar plot below).
The greatest similarity is to the police law of Thuringia, while the corresponding law of Hamburg differs most strongly from that of Bavaria.

Similarity of the Bavarian police law to those of the other federal states

A paragraph-by-paragraph comparison between the laws of Bavaria and Thuringia demonstrates the great similarity impressively.
The bright diagonal of the similarity matrix (columns: Bavarian police law; rows: Thuringian police law) indicates strong similarities for most of the laws as well as an almost identical overall structure.
For example, BayPAG, Art. 66 (general provisions on the use of firearms) and PAG, § 64 (general provisions on the use of firearms) are almost identical.
A relative exception, however, is BayPAG, Art. 73 (legal recourse), for which no directly semantically identifiable paragraph exists in the PAG.
Download: list of similar paragraphs BayPAG – PAG

Similarity matrix of the police laws of Bavaria and Thuringia

As expected, the similarity matrix between the BayPAG and the Baden-Württemberg PolG shows stronger differences. Here the main diagonal is still recognizable, but interrupted and partly offset, which suggests a different overall structure.
The most similar paragraphs identified here were BayPAG, Art. 41 (data transmission to persons or bodies outside the public sector) and PolG, § 44 (data transmission to persons or bodies outside the public sector).
No counterpart was found for, among others, BayPAG, Art. 57 (substitute coercive detention, “Ersatzzwangshaft”).
Download: list of similar paragraphs BayPAG – PolG

Similarity matrix of the police laws of Bavaria and Baden-Württemberg

The great differences between the BayPAG and the police law of Hamburg examined here (SOG) are also clearly visible in the similarity matrix. The main diagonal is only fragmentarily preserved, with large areas showing no notable correspondence.
Similar paragraphs here are, in particular, BayPAG, Art. 24 (procedure for searching dwellings) and SOG, § 16 a (procedure for searching dwellings), as well as BayPAG, Art. 64 (threat of direct coercion) and SOG, § 22 (threat of direct coercion).
Download: list of similar paragraphs BayPAG – SOG

Similarity matrix of the police laws of Bavaria and Hamburg

In conclusion, the machine learning and NLP methods available to us made it straightforward to identify similarities between the state-specific police laws at both the overall and the paragraph level.
The overall structure of selected laws could be compared graphically, and similar paragraphs could be determined as efficiently as differences.

Further analyses building on this procedure are easily possible.

We support our customers with related questions. We are also happy to explain our approach in a preliminary workshop.

In addition, with Regulytics® we offer a web application for the automated analysis of regulations at the overall and paragraph level.


Dr. Dimitrios Geromichalos
Founder / CEO
RiskDataScience UG (haftungsbeschränkt)
Theresienhöhe 28, 80339 München
Phone: +4989244407277, Fax: +4989244407001
Twitter: @riskdatascience

Regulytics® is a trademark registered with the German Patent and Trade Mark Office.

Automated Semantic Analysis of Regulatory Texts


The extent and complexity of regulatory requirements pose significant challenges to banks even at the analysis stage. In addition, banks must – often ad hoc – process inquiries from regulators and comment on consultation papers.
On the other hand, procedures from the areas NLP and Machine Learning enable the effective and efficient usage of available knowledge resources.

Our application Regulytics® enables the automated analysis of regulatory and internal texts in terms of content.
The app does not provide cumbersome general reports, but concise, tailored and immediate information about content-related similarities.
In this way, regulatorily relevant texts can be placed into their overall context. Similar paragraphs in regulations and internal documents can be determined, as can differences between document versions.
Financial service providers can request a free trial account for the online version of Regulytics.

Regulatory Challenges

Regulations like IFRS 9, BCBS 239, FRTB, or IRRBB require fundamental changes to banks’ methods, processes, and/or systems.
Many regulations have far-reaching impacts on the risks, the equity capital and thereby the business model of the affected banks.
Also, the large amount of the final and consultative regulations renders a monitoring of the requirements and the effects difficult.

Regulations can generally affect different, interconnected areas of banks, like risk, treasury, finance, or IT.
In addition, there are also connections between the regulations; gaps with respect to one requirement generally correspond to additional gaps with respect to further requirements.
The diverse legislation in different jurisdictions increases the complexity once again.

Inside banks, several impact analyses and prestudies are conducted in order to classify the relevance of regulations.
Several consulting firms conduct prestudies as well as the actual implementation projects, which are often characterized by long durations and high resource requirements.
Projects bind considerable internal resources and exacerbate bottlenecks.
External support is expensive and increases the coordination effort, especially in the case of several suppliers.
Errors in prestudies and project starting phases can hardly be corrected.
Due to the high complexity, there is the risk that impacts and interdependencies are not recognized in time.

Available Knowledge Resources

Original texts of the regulations and the consultation papers are normally freely available on the internet and – in the case of EU directives – exist in several languages.
Regulators and committees provide further information, e.g. in the form of circular letters and consultation papers.
Several institutes, portals, and consulting firms supply banks with partially free articles, white papers, and newsletters.

In addition, banks have collected extensive experience from already finalized or ongoing projects (project documentation, lessons learned).
Banks also have available documentation of the methods, processes, and systems used, as well as of responsibilities and organizational circumstances.
Internal blogs and similar channels concentrate the expertise of the employees.

Advantages of an Automated Analysis

Speed Increase

Automated analyses can, by definition, be done in a very fast and standardized way.
Even with ordinary laptops, semantic similarities of dozens of regulations can be analyzed within minutes.
Thereby, responsibilities and impacts can be recognized in time – e.g. in the case of consultation papers – and incorporated into statements.

Resource Conservation

Our solution runs without expensive hardware and software requirements.
The human effort for usage and possible enhancements is extremely low and practically independent of the number of regulations considered.
Bottlenecks are reduced and experts can focus on the demanding tasks.
Thus, project costs can be minimized.


The similarities between regulations at the total and paragraph level are quantitatively available and reproducible at any time.
Discrepancies caused by subjective preferences can be practically ruled out.
Analyses can be documented in a comprehensible way.
Prestudy results and statements of external suppliers can be checked without bias.

Error Reduction

Automated analyses provide an efficient additional control.
Non-trivial – and potentially ignored – interdependencies between regulations can be identified and considered.
Especially clerical errors and the overlooking of potentially important paragraphs can be minimized.
Also, potentially ignored gaps and impacts can be detected.

Knowledge Usage via Topic Analysis

Methods and Tools

The methods of natural language processing (NLP) enable a semantic analysis of texts on the basis of the topics contained in them, identifying similarities at any required granularity.
In the method used here, “Latent Semantic Analysis” (LSA, also “Latent Semantic Indexing”, LSI), the considered terms are mapped onto a given number of topics; accordingly, texts are mapped onto a “semantic space”.
The topic determination is equivalent to an unsupervised learning process on the basis of the available documents.
New texts and text components can then be analyzed in terms of semantic similarities.
The analyses require programs based on appropriate languages, e.g. Python or R.


At first, the levels by which the texts are to be analyzed are determined (sentences, paragraphs, etc.).
Via a training text, a mapping onto a given number of topics is determined (the “model”).
The texts to be analyzed are also mapped with the model and then quantitatively analyzed in terms of similarities.
As shown in the sketch, the process can be automated and efficiently applied to a large number of texts.

Identification of similar paragraphs
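The pipeline described above (term-document matrix, truncated SVD, mapping into the topic space, cosine similarity) can be sketched as follows. The two-topic reduction and the toy documents are illustrative assumptions, not a production configuration; real setups use far more topics and TF-IDF weighting.

```python
import numpy as np

def lsi_model(train_docs, n_topics=2):
    """Fit a tiny latent-semantic-indexing model: vocabulary + truncated SVD."""
    vocab = sorted({w for d in train_docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document matrix of raw counts (TF-IDF weighting would be usual)
    tdm = np.zeros((len(vocab), len(train_docs)))
    for j, doc in enumerate(train_docs):
        for w in doc.lower().split():
            tdm[index[w], j] += 1
    u, s, _ = np.linalg.svd(tdm, full_matrices=False)
    proj = u[:, :n_topics]          # term-to-topic mapping (the "model")
    return index, proj

def to_topic_space(text, index, proj):
    """Map a new text onto the semantic space learned from the training docs."""
    vec = np.zeros(len(index))
    for w in text.lower().split():
        if w in index:
            vec[index[w]] += 1
    return vec @ proj

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented toy "paragraphs" for illustration
train = ["credit risk capital requirements",
         "market risk trading book capital",
         "liquidity coverage ratio reporting"]
index, proj = lsi_model(train)
a = to_topic_space("credit risk capital", index, proj)
b = to_topic_space("capital requirements for credit risk", index, proj)
c = to_topic_space("liquidity reporting", index, proj)
print(cosine(a, b) > cosine(a, c))  # → True
```

The two credit-risk snippets land close together in the topic space, while the liquidity snippet does not, which is exactly the comparison the similarity matrices visualize.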

Approach for an Analysis of Regulation-Related Texts

The approach for the analysis at total and paragraph level is determined by the bank’s goals. We support you with detailed questions and in the development of specific solutions.

Analysis of regulation-related texts


In the following, three possible analyses of regulation texts are outlined, which differ in their objectives. The analyses can easily be conducted with internal texts as well.

Use Case 1: Identification of Similarities

In this analysis, the regulation Basel II and the regulation Basel III: Finalising post-crisis reforms (often called “Basel IV”) were considered.
The general comparison already indicates a strong cosine similarity between the two texts (see radar plot).
The matrix comparison over all paragraphs yields high similarities over wide areas (bright diagonal, see matrix plot).
The analysis at paragraph level yields numerous nearly identical sections concerning credit risks (see table).

Radar plot at total level
Similarity plot at paragraph level
Similar paragraphs

Use Case 2: Determination of Differences

A comparison between the German MaRisk regulations of the years 2017 and 2012 was conducted.
As can already be seen at the general level (see radar plot) and in the matrix plot over all paragraphs (bright diagonal), the texts are nearly identical.
However, disruptions in the main diagonal (red arrow, matrix plot) indicate some changes.
A corresponding analysis over all paragraphs yields the section “AT 4.3.4” (stemming from BCBS 239) as the biggest novelty.

Radar diagram at total level
Similarity matrix at paragraph level
“Novel” paragraphs

Use Case 3: Finding Similar Paragraphs

The regulations Basel III: Finalising post-crisis reforms (“Basel IV”) and Basel III (BCBS 189) were considered.
Despite differences, an area of relatively high similarities can be recognized at paragraph level (red arrow, matrix plot).
For an analysis of this area, a paragraph from “Basel IV” was selected and the paragraphs from Basel III most similar to it were determined.
As shown in the table, the respective paragraphs from the texts refer to the Credit Value Adjustments (CVA).

Similarity matrix at paragraph level
Similar target paragraphs

Our Offer – On-Site Implementation

RiskDataScience enables banks to use and enhance the described procedures in an efficient and institution-specific way. According to the requirements, we propose the following three configuration levels.

Level 1: Methodology

  • Introduction to the Latent Semantic Indexing methodology with a focus on regulatory texts
  • Handover and installation of the existing Python solution for the automated loading and splitting of documents as well as the semantic analysis via LSI – or, depending on customer requirements, support of the on-site implementation
  • Handover and documentation of the visualization and analysis methods

Bank has available enhanceable processes for the analysis of regulatory requirements.

Level 2: Customization

  • Level 1, and additionally
  • Adaptations of analysis entities (e.g. document groups) according to the analysis goals of the bank
  • Analysis of the concrete regulations, projects, methods, processes, and systems for an identification of the optimal use possibilities
  • Development of processes for the achievement of the document goals
  • Documentation and communication of the results to all stakeholders

Bank has available customized processes for the analysis of regulatory requirements, e.g. in terms of responsibilities or methods.

Level 3: IT Solution

  • Levels 1 and 2, and additionally
  • Specification of all requirements for a comprehensive IT solution
  • Proposal of and contact with possible suppliers
  • Support in the supplier and tool selection
  • Support in the planning and implementation
  • Methodological and coordinative support during the implementation
  • Contact for methodological questions after the implementation

Bank has available an automated IT solution for an efficient semantic comparison of regulatorily relevant text components.

According to the customers’ requirements, a flexible arrangement is possible.

In addition, with our web app Regulytics® we offer a solution for an automated analysis of regulatory texts on total and paragraph level.


Dr. Dimitrios Geromichalos
Founder / CEO
RiskDataScience UG (haftungsbeschränkt)
Theresienhöhe 28, 80339 München
Phone: +4989244407277, Fax: +4989244407001
Twitter: @riskdatascience