Automated Semantic Analysis of Regulatory Texts


The extent and complexity of regulatory requirements pose significant challenges to banks, starting with the analysis of the texts themselves. In addition, banks must respond to inquiries from regulators and comment on consultation papers, often on an ad hoc basis.
On the other hand, methods from natural language processing (NLP) and machine learning make it possible to use the available knowledge resources effectively and efficiently.

Our application Regulytics® enables the automated content analysis of regulatory and internal texts.
Instead of cumbersome general reports, the app provides concise, tailored and immediate information about content-related similarities.
This allows regulatory texts to be placed in their overall context. Similar paragraphs in regulations and internal documents can be identified, as can differences between document versions.
Financial service providers can request a free trial account for the online version of Regulytics.

Regulatory Challenges

Regulations like IFRS 9, BCBS 239, FRTB or IRRBB require fundamental changes to banks’ methods, processes and/or systems.
Many regulations have far-reaching impacts on the risks, the equity capital and thus the business model of the affected banks.
In addition, the large number of final and consultative regulations makes it difficult to monitor the requirements and their effects.

Regulations can affect different, interconnected areas of a bank, such as risk, treasury, finance or IT.
In addition, the regulations themselves are interconnected; gaps with respect to one requirement generally correspond to gaps with respect to further requirements.
Diverging legislation in different jurisdictions increases the complexity further.

Within banks, several impact analyses and prestudies are conducted in order to assess the relevance of regulations.
Consulting firms conduct prestudies as well as the actual implementation projects, which are often characterized by long durations and high resource requirements.
Projects tie up considerable internal resources and exacerbate bottlenecks.
External support is expensive and increases the coordination effort, especially when several suppliers are involved.
Errors in prestudies and project start phases are hard to correct later.
Due to the high complexity, there is a risk that impacts and interdependencies are not recognized in time.

Available Knowledge Resources

The original texts of regulations and consultation papers are normally freely available on the Internet and, in the case of EU directives, exist in several languages.
Regulators and committees provide further information, e.g. in the form of circular letters and consultation papers.
Several institutes, portals and consulting firms supply banks with articles, white papers and newsletters, some of them free of charge.

In addition, banks have gathered extensive experience from completed or ongoing projects (project documentation, lessons learned).
Banks also have documentation of the methods, processes and systems in use, as well as of responsibilities and organizational structures.
Internal blogs and similar channels pool the expertise of the employees.

Advantages of an Automated Analysis

Speed Increase

Automated analyses are by definition very fast and standardized.
Even on ordinary laptops, the semantic similarities of dozens of regulations can be analyzed within minutes.
This way, responsibilities and impacts can be recognized in time, e.g. in the case of consultation papers, and included in statements.

Resource Conservation

Our solution runs without expensive hardware and software requirements.
The human effort for usage and possible enhancements is very low and practically independent of the number of regulations considered.
Bottlenecks are reduced and experts can focus on the demanding tasks.
Thus, project costs can be minimized.


Objectivity

The similarities between regulations at total and paragraph level are available quantitatively and can be reproduced at any time.
Discrepancies caused by subjective preferences can be practically ruled out.
Analyses can be documented in a comprehensible way.
Prestudy results and statements of external suppliers can be checked without bias.

Error Reduction

Automated analyses provide an efficient additional control.
Non-trivial, and potentially ignored, interdependencies between regulations can be identified and considered.
In particular, clerical errors and overlooked but potentially important paragraphs can be minimized.
Also, potentially ignored gaps and impacts can be detected.

Knowledge Usage via Topic Analysis

Methods and Tools

The methods of natural language processing (NLP) enable a semantic analysis of texts based on the topics they contain, so that similarities can be identified at any required granularity.
In the method used here, Latent Semantic Analysis (LSA, also called Latent Semantic Indexing, LSI), the considered terms are mapped onto a given number of topics; accordingly, texts are mapped onto a “semantic space”.
The topic determination corresponds to an unsupervised learning process based on the available documents.
New texts and text components can then be analyzed in terms of semantic similarities.
The analyses require programs written in suitable languages, e.g. Python or R.
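
The mapping of terms and texts onto topics can be illustrated with a minimal sketch. It assumes NumPy and a plain truncated singular value decomposition of a term-document count matrix; the toy corpus, the variable names and the choice of two topics are invented for illustration and do not reflect the actual Regulytics implementation.

```python
import numpy as np

# Toy corpus (invented for illustration, not taken from any regulation).
docs = [
    "credit risk capital requirement",
    "credit risk capital exposure",
    "data aggregation reporting systems",
    "data aggregation reporting governance",
]

# Term-document matrix: rows are terms, columns are documents, entries are counts.
vocab = sorted({term for doc in docs for term in doc.split()})
index = {term: i for i, term in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for term in doc.split():
        A[index[term], j] += 1.0

# Truncated SVD: keep only the k strongest "topics" (k is an assumption here).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]  # term-to-topic mapping

# Map each document into the k-dimensional semantic space.
doc_vecs = (U_k.T @ A).T  # shape: (number of documents, k)

# Cosine similarity between all pairs of documents.
unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
similarity = unit @ unit.T

print(np.round(similarity, 2))
```

In this toy example the two credit risk sentences end up with a similarity close to 1, although they do not share all terms, while their similarity to the data aggregation sentences is close to 0.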


First, the levels at which the texts are to be analyzed are determined (sentences, paragraphs, etc.).
Using a training text, a mapping onto a given number of topics is determined (the “model”).
The texts to be analyzed are then mapped with the model and quantitatively analyzed in terms of similarities.
As shown in the sketch, the process can be automated and applied efficiently to a large number of texts.

Figure: Identification of similar paragraphs
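
The three steps described above (choosing the analysis level, training a model, mapping new texts) could be sketched as follows. This is an illustration under assumptions, not the actual solution: it uses NumPy, splits at blank lines, and all function names (split_paragraphs, train_model, map_text, cosine) as well as the training text are hypothetical.

```python
import re
import numpy as np

def split_paragraphs(text):
    """Split a text into paragraphs at blank lines (one possible analysis level)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def tokenize(paragraph):
    """Lowercase a paragraph and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", paragraph.lower())

def train_model(paragraphs, num_topics):
    """Learn a term-to-topic mapping (the "model") from training paragraphs."""
    vocab = sorted({t for p in paragraphs for t in tokenize(p)})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(paragraphs)))
    for j, p in enumerate(paragraphs):
        for t in tokenize(p):
            A[index[t], j] += 1.0
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return index, U[:, :num_topics]

def map_text(model, paragraph):
    """Map one paragraph onto the topics; terms unknown to the model are ignored."""
    index, U_k = model
    x = np.zeros(len(index))
    for t in tokenize(paragraph):
        if t in index:
            x[index[t]] += 1.0
    return U_k.T @ x

def cosine(u, v):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

# Invented training text; a real run would load regulation texts instead.
training_text = """Credit risk capital requirement.

Credit risk capital exposure.

Data aggregation reporting systems.

Data aggregation reporting governance."""

paragraphs = split_paragraphs(training_text)
model = train_model(paragraphs, num_topics=2)
trained = [map_text(model, p) for p in paragraphs]

# Map a new, unseen text with the same model and compare it to the training paragraphs.
new_vec = map_text(model, "Exposure to credit risk")
scores = [cosine(new_vec, v) for v in trained]
best = int(np.argmax(scores))
print(best, paragraphs[best])
```

Because the model is only a vocabulary index and a matrix, it can be trained once and reused for any number of texts, which is what makes the process easy to automate.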

Approach for an Analysis of Regulation-Related Texts

The approach for the analysis at total and paragraph level is determined by the bank’s goals. We support you with detailed questions and with the development of specific solutions.

Figure: Analysis of regulation-related texts


In the following, three possible analyses of regulatory texts are outlined, which differ in their objectives. The analyses can just as easily be conducted with internal texts.

Use Case 1: Identification of Similarities

In this analysis, the regulations Basel II and Basel III: Finalising post-crisis reforms (often called “Basel IV”) were compared.
The general comparison already indicates a strong cosine similarity between the two texts (see radar plot).
The matrix comparison over all paragraphs yields high similarities over wide areas (bright diagonal, see matrix plot).
The analysis at paragraph level yields numerous nearly identical sections concerning credit risks (see table).

Figure: Radar plot at total level
Figure: Similarity plot at paragraph level
Table: Similar paragraphs
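
A paragraph-level comparison of this kind can be sketched as a cosine similarity matrix between the paragraph vectors of two documents, followed by a filter for nearly identical pairs. The sketch assumes NumPy and that the paragraphs have already been mapped onto topics; the vectors, the function names and the threshold of 0.95 are hypothetical and are not results from the Basel texts.

```python
import numpy as np

def similarity_matrix(X, Y):
    """Cosine similarities between the rows of X and the rows of Y."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T

def near_identical_pairs(S, threshold=0.95):
    """Return (i, j, similarity) for all pairs at or above the threshold, best first."""
    rows, cols = np.where(S >= threshold)
    pairs = [(int(i), int(j), float(S[i, j])) for i, j in zip(rows, cols)]
    return sorted(pairs, key=lambda p: -p[2])

# Hypothetical topic vectors: one row per paragraph, one column per topic.
doc_a = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
doc_b = np.array([[0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.1],
                  [0.5, 0.5, 0.5]])

S = similarity_matrix(doc_a, doc_b)
for i, j, sim in near_identical_pairs(S):
    print(f"paragraph {i} of A ~ paragraph {j} of B (similarity {sim:.3f})")
```

Plotting S as an image would give a matrix plot in which high values along the main diagonal indicate that the two documents largely follow the same structure.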

Use Case 2: Determination of Differences

The 2017 and 2012 versions of the German regulation MaRisk were compared.
As can already be seen at the total level (see radar plot) and in the matrix plot over all paragraphs (bright diagonal), the texts are nearly identical.
However, disruptions in the main diagonal (red arrow, matrix plot) indicate some changes.
A corresponding analysis over all paragraphs identifies section “AT 4.3.4” (stemming from BCBS 239) as the biggest novelty.

Figure: Radar diagram at total level
Figure: Similarity matrix at paragraph level
Table: “Novel” paragraphs
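
One way to find “novel” paragraphs is to look, for every paragraph of the new version, at its best match in the old version and to rank the paragraphs by that value. The following sketch assumes NumPy and precomputed topic vectors; the function name novelty_ranking and the example vectors are hypothetical and do not represent MaRisk data.

```python
import numpy as np

def novelty_ranking(new_vecs, old_vecs):
    """Rank paragraphs of the new version by how little they resemble the old one."""
    new_n = new_vecs / np.linalg.norm(new_vecs, axis=1, keepdims=True)
    old_n = old_vecs / np.linalg.norm(old_vecs, axis=1, keepdims=True)
    best_match = (new_n @ old_n.T).max(axis=1)  # closest old paragraph per new one
    order = np.argsort(best_match)              # least similar (most novel) first
    return [(int(i), float(best_match[i])) for i in order]

# Hypothetical topic vectors: one row per paragraph, one column per topic.
old_version = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])
new_version = np.array([[1.0, 0.1, 0.0],   # nearly unchanged
                        [0.1, 0.1, 1.0],   # mostly a new topic
                        [0.0, 1.0, 0.0]])  # unchanged

ranking = novelty_ranking(new_version, old_version)
for i, sim in ranking:
    print(f"new paragraph {i}: best match in old version {sim:.3f}")
```

Paragraphs at the top of this ranking have no close counterpart in the old version and are therefore candidates for new or substantially changed requirements.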

Use Case 3: Finding Similar Paragraphs

The regulations Basel III: Finalising post-crisis reforms (“Basel IV”) and Basel III (BCBS 189) were compared.
Despite differences, an area of relatively high similarities can be recognized at paragraph level (red arrow, matrix plot).
To analyze this area, a paragraph from “Basel IV” was selected and the paragraphs from Basel III most similar to it were determined.
As shown in the table, the respective paragraphs from both texts refer to the credit valuation adjustment (CVA).

Figure: Similarity matrix at paragraph level
Table: Similar target paragraphs
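
Searching for the paragraphs most similar to a selected one amounts to a nearest-neighbour query in the semantic space. A minimal sketch, assuming NumPy and precomputed topic vectors; the function name most_similar and all vectors are hypothetical.

```python
import numpy as np

def most_similar(target_vec, candidate_vecs, top_n=3):
    """Return (index, similarity) of the candidates closest to the target."""
    t = target_vec / np.linalg.norm(target_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    sims = c @ t
    order = np.argsort(-sims)[:top_n]  # highest cosine similarity first
    return [(int(i), float(sims[i])) for i in order]

# Hypothetical topic vector of the selected paragraph from the first document.
target = np.array([0.2, 0.9, 0.1])

# Hypothetical topic vectors of the paragraphs of the second document.
candidates = np.array([[1.0, 0.0, 0.0],
                       [0.1, 1.0, 0.0],
                       [0.0, 0.8, 0.6],
                       [0.0, 0.0, 1.0]])

hits = most_similar(target, candidates, top_n=2)
for i, sim in hits:
    print(f"candidate paragraph {i}: similarity {sim:.3f}")
```

The returned indices can then be used to look up the original paragraph texts and present them side by side, as in a table of similar target paragraphs.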

Our Offer – On-Site Implementation

RiskDataScience enables banks to use and enhance the described procedures in an efficient and institution-specific way. Depending on the requirements, we propose the following three configuration levels.

Level 1: Methodology

  • Introduction to the Latent Semantic Indexing methodology with a focus on regulatory texts
  • Handover and installation of the existing Python solution for the automated loading and splitting of documents as well as the semantic analysis via LSI – or, depending on customer requirements, support of the on-site implementation
  • Handover and documentation of the visualization and analysis methods

The bank has extensible processes for the analysis of regulatory requirements at its disposal.

Level 2: Customization

  • Level 1 and additionally
  • Adaptations of analysis entities (e.g. document groups) according to the analysis goals of the bank
  • Analysis of the concrete regulations, projects, methods, processes, and systems for an identification of the optimal use possibilities
  • Development of processes for the achievement of the document goals
  • Documentation and communication of the results to all stakeholders

The bank has customized processes for the analysis of regulatory requirements at its disposal, e.g. in terms of responsibilities or methods.

Level 3: IT Solution

  • Level 1, Level 2 and additionally
  • Specification of all requirements for a comprehensive IT solution
  • Proposal of and contact with possible suppliers
  • Support in the supplier and tool selection
  • Support in the planning and implementation
  • Methodological and coordinative support during the implementation
  • Contact for methodological questions after the implementation

The bank has an automated IT solution for an efficient semantic comparison of regulatory text components at its disposal.

According to the customers’ requirements, a flexible arrangement is possible.

In addition, our web app Regulytics® offers a solution for the automated analysis of regulatory texts at total and paragraph level.


Dr. Dimitrios Geromichalos
Founder / CEO
RiskDataScience GmbH
Nördliche Münchner Straße 47, 82031 Grünwald
Phone: +4989322096365
Twitter: @riskdatascience