Taking our own medicine
While looking for a case study to investigate, where better to start than with our own application? It’s fairly complex; it uses an RDBMS as the central datastore, and we make use of a large number of different SQL features, from basics like selects and inserts to more ‘complex’ features like views and temporary tables. Furthermore, all of this is executed across a number of different processes and threads.
An Overview Of Our Application
A key feature of Ráiteas is uploading log files and/or raw SQL files – really this is a requirement to get started: no data, no analysis. After a log file is uploaded, the backend starts analysing and loading the data into our data model, then notifies the client that a new data stream is available. Obviously, uploading a log file can result in quite a lot of rows being added, and we have tried to balance performance with the extensibility of the data model – making all analytical questions possible while keeping the data model fast is not an easy balance.
Start Simple and Make a Plan
Taking my own medicine, I turned to our guide and followed its advice on starting simple. I will start with a brief overview of what the data model looks like; this in turn will give me an idea of where to go next. From there I will compile a list of potential improvements and investigate each in turn. I will also split the analysis between the backend and the frontend: one part looking at uploading and processing, the other at a typical workflow.
While Ráiteas supports quite a few dialects (MSSQL, MySQL, Teradata, among others), we are using PostgreSQL ourselves, so I turned to the guide page on how to get a log file from PostgreSQL on OS X, as this is my primary development environment.
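For reference, the relevant settings are standard postgresql.conf parameters; this is an illustrative fragment only, and the file’s location varies by installation:

```ini
# postgresql.conf – illustrative statement-logging settings
logging_collector = on      # write logs to files rather than only stderr
log_directory = 'pg_log'    # relative to the data directory
log_statement = 'all'       # capture every executed statement for analysis
```

Remember to restart (or reload, where applicable) the server after changing these settings.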
Looking at The Backend
After enabling logging and cleaning up the local log files – to give me a clear snapshot – I uploaded a file to get the backend started. Once processing had completed, I copied and zipped the log file, now ready for analysis.
A top tip is to give the zip file a meaningful name, as this becomes the name of your dataset; it will make it easier later to remember what you were doing.
The first thing to do is to get an overview of the data model we are dealing with. There are several options available, one of which is the Top Down Overview workflow – this gives us an understanding of the joins and other relationships. A quick look confirms that everything is connected the way I expected it to be. No surprises, and the temporary tables show up as belonging to the UNKNOWN default schema – also as we expect.
What I’m really interested in is the Complex Queries report, which gives you an overview of the statements in your dataset, their relative complexity, and their frequency of execution.
Right away I can tell something is a little off: we have a query that executes repeatedly, just with different parameters. I can see that this is an insert, and the complexity is very low – more or less a straightforward insert. Executed row by row, it puts a lot of pressure on the database, slowing the backend down significantly.
Visualising the data model of the query shows that this insert could be batched without affecting the rest of the application. Spinning up an extra thread to deal with these inserts without delaying the main thread is straightforward – especially with the standard threading support in C++11.
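The pattern can be sketched in a few lines of C++11. Everything below (the `BatchInserter` name, the `Row` alias, the `flush` callback) is illustrative rather than our actual backend code: the main thread enqueues rows and returns immediately, while a worker thread drains the queue and hands over chunks to a flush callback, which in the real system would issue one multi-row INSERT per batch.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical row type; the real backend binds parameters via the DB driver.
using Row = std::string;

class BatchInserter {
public:
    BatchInserter(std::size_t batch_size,
                  std::function<void(const std::vector<Row>&)> flush)
        : batch_size_(batch_size), flush_(std::move(flush)),
          worker_([this] { run(); }) {}

    ~BatchInserter() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // drains any remaining rows before returning
    }

    // Called from the main thread; never blocks on the database.
    void enqueue(Row row) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(row));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::vector<Row> batch;
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !pending_.empty(); });
            while (!pending_.empty() && batch.size() < batch_size_) {
                batch.push_back(std::move(pending_.front()));
                pending_.pop();
            }
            if (!batch.empty()) {
                lock.unlock();
                flush_(batch);  // e.g. one multi-row INSERT instead of N single ones
                batch.clear();
                lock.lock();
            }
            if (done_ && pending_.empty()) return;
        }
    }

    std::size_t batch_size_;
    std::function<void(const std::vector<Row>&)> flush_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Row> pending_;
    bool done_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

The destructor drains whatever is still queued before joining, so no rows are lost on shutdown; choosing the batch size is then just a trade-off between latency and per-statement overhead.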
After some testing, batching these simple inserts into larger chunks more than doubled our backend’s performance on larger datasets.
Having the right tools makes all the difference when you are trying to understand a complex application. Perhaps the most remarkable part is that you wouldn’t need to understand our backend to undertake this investigation and reach the same conclusion. And you don’t need to be a SQL wizard – Ráiteas does that for you.
And to top it all off, I now have the start of an important piece of documentation in place. Our data flows are clearly shown, along with the data model and all its foreign key relationships and important joins. It is all interactive and ready for exploration, and to update it I just have to import another log file.
CTO, Henrik Gudbrand
…Now I think there is a report in the frontend that needs looking at – more on that later.