Ráiteas Enterprise Server

It’s been a busy few months for us, and the first successful application of Ráiteas Enterprise Analytics has launched as part of a Nordea Bank AB data warehouse project.

We are now excited to make Ráiteas Enterprise Server generally available: http://bit.ly/1jMtuJv.

Ráiteas Enterprise Server will now be available in several forms:

  • Enterprise Solution
    • Bespoke Enterprise Solution
    • Client Side Installation
    • Support and Training
  • As-a-service
    • Data Warehouse Health-Check/Audit
    • Data Warehouse Team Insight

Ráiteas is a platform for analytics on the operational aspects of managing a data warehouse.

One of our first applications is akin to Google Analytics, but for data warehouse teams.

Enterprise companies often employ off-site resources when developing data warehouse applications. This, however, complicates communication and often leads to longer development iterations. By having a visual representation of the data model you are developing, based on actual facts, you save development time and improve the quality of inter-team communication, which can lead to substantial savings and enables you to react more quickly.

  • Save time and money by shortening the length of each development iteration.
  • Reverse-engineer the data lineage and data model, which automatically become part of your documentation.
  • Improve communication between development team members by visualising and sharing complicated data.
  • Enable project managers, technical leads and architects to reason about technical elements central to developing data warehouses.

Updated features for data lineage

Improved support for MS SQL Server.

Following extensive testing with SQL Server trace files containing tens of thousands of records, we have identified a number of performance improvements, which we will release gradually over the coming weeks.

We are very proud of our release of column-level data flow analysis (data lineage). The UI will soon be updated with better navigation and more control over which elements are shown.

CTO, Henrik Gudbrand

Taking our own medicine

While debugging our application I was looking for a case to investigate – and where better to start than with our own application? It’s fairly complex: it uses an RDBMS as the central datastore, and we make use of a large number of different SQL features, from basics like selects and inserts to more ‘complex’ features like views and temporary tables. Furthermore, all of this executes across a number of different processes and threads.

An Overview Of Our Application

Today our application would probably be categorised as fairly ‘traditional’. It consists of a JavaScript client built with AngularJS, served by Node.js, and backed by a database plus a backend written in Python and C++. The client queries our data model indirectly through a REST API; the queries are a mix of simple and more complicated analytical queries, depending on which page you are on and what functionality is available.

A key feature of Ráiteas is uploading log files and/or raw SQL files – really this is a requirement to get started – no data, no analysis. After uploading a log file, the backend starts analysing and loading the data into our data model and notifies the client that a new data stream is available. Obviously, uploading a log file can result in quite a lot of rows being added, and we have tried to balance performance with the extensibility of the data model to make all analytical questions possible while keeping the data model performing well – not an easy balance.

Start Simple and Make a Plan

Taking my own medicine, I turned to our guide and followed its advice on starting simple. I began with a brief overview of what the data model looks like, which gave me an idea of where to go next. From there I compiled a list of potential improvements to investigate in turn, splitting the analysis between the backend and the frontend: one part looking at uploading and processing, the other at a typical workflow.

While Ráiteas supports quite a few dialects (MSSQL, MySQL, Teradata, among others), we use PostgreSQL, so I turned to the guide page on how to get a log file from PostgreSQL on OS X, as this is my primary development environment.
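For reference, enabling statement logging comes down to a few settings in postgresql.conf. This is a minimal sketch of the standard PostgreSQL options; the exact file location and defaults vary between Homebrew, Postgres.app and other OS X installs, so treat the values as assumptions about a default setup:

```
# postgresql.conf – minimal statement-logging setup (illustrative sketch)
logging_collector = on                       # write logs to files (requires restart)
log_destination = 'stderr'
log_directory = 'pg_log'                     # relative to the data directory
log_filename = 'postgresql-%Y-%m-%d.log'
log_statement = 'all'                        # log every SQL statement
```

After restarting the server, the log files appear under pg_log in the data directory, ready to be zipped and uploaded.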

Looking at the Backend

After enabling logging and cleaning up the local log files – to give me a clear snapshot – I uploaded a file to get the backend started. Once processing completed, I copied and zipped the log file, now ready for analysis.

A top tip: give the zip file a meaningful name, as this becomes the name of your dataset; it makes it easier to remember later what you were doing.

Visual Inspection

The first thing to do is get an overview of the data model we are dealing with. There are several options available, one of which is the Top Down Overview workflow – this gives us an understanding of the joins and other relationships. A quick look confirms that everything is connected the way I expected it to be. No surprises, and the temporary tables show up as belonging to the UNKNOWN default schema – also as expected.

Queries Report

What I’m really interested in is the Complex Queries report which gives you an overview of the statements in your dataset, their relative complexity and the frequency of execution.

Right away I can tell something is a little off: we have a query that executes repeatedly, just with different parameters. I can see that this is an insert, and the complexity is very low – more or less a straightforward single-row insert. This puts a lot of pressure on the database, slowing the backend down significantly.

Visualising the data model of the query shows that this insert could be batched without any impact on the rest of the application. Spinning up an extra thread to deal with these inserts without delaying the main thread is easy – and thanks to C++11, actually very easy.
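The production fix lives in our C++11 backend, but the underlying producer/consumer batching pattern is simple enough to sketch in a few lines. Here is a minimal illustration in Python – the class name, batch size and flush callback are all invented for this example, not taken from our codebase:

```python
import threading
import queue

class BatchInserter:
    """Collects single-row inserts and flushes them in batches on a
    background thread, so the producing thread never waits on the database.
    Hypothetical sketch; `flush` stands in for a real bulk-insert call."""

    def __init__(self, flush, batch_size=100):
        self._flush = flush              # callable taking a list of rows
        self._batch_size = batch_size
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def insert(self, row):
        self._q.put(row)                 # cheap: just enqueue, never blocks on the DB

    def close(self):
        self._q.put(None)                # sentinel: flush the remainder and stop
        self._worker.join()

    def _run(self):
        batch = []
        while True:
            row = self._q.get()
            if row is None:
                break
            batch.append(row)
            if len(batch) >= self._batch_size:
                self._flush(batch)       # one round trip for the whole batch
                batch = []
        if batch:                        # final partial batch on shutdown
            self._flush(batch)

# Demo: 250 rows flushed as two full batches of 100 plus a final batch of 50.
batches = []
ins = BatchInserter(batches.append, batch_size=100)
for i in range(250):
    ins.insert(("stmt", i))
ins.close()
```

The same shape translates directly to C++11 with std::thread, std::mutex and std::condition_variable guarding the queue.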

After some testing this batching of simple inserts into larger chunks more than doubled our backend’s performance on larger datasets.

Conclusion

Having the right tools makes all the difference when you are trying to understand a complex application. Perhaps the most remarkable discovery is that you wouldn’t need to understand our backend in the first place to undertake this investigation and come to the same conclusion. And you don’t need to be a SQL wizard – Ráiteas does that for you.

And to top it all off, I now have the start of an important piece of documentation in place. Our data flows are clearly shown, along with the data model and all its foreign key relationships and important joins. Everything is interactive and ready for exploration, and to update it I just have to import another log file.

CTO, Henrik Gudbrand

Ráiteas Ltd.

…Now I think there is a report in the frontend that needs looking at – more on that later.