Wednesday 6 February 2019

Where Should That Code Go? (Back-end vs Front-end)


Introduction

 

In this post, we will be discussing the advantages and disadvantages of having code in the visualisation tool versus having the hard logic done on the source side, or back-end. The premise: should all the complicated code, such as creating complicated measures, be moved to the back-end script, or can it be written in the front-end? A simplified example: if you have a long “if” statement, should it be moved to the back-end script, and what is the performance cost of keeping it in the front-end?
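To make this concrete, here is a minimal sketch in Qlikview script, assuming hypothetical fields OrderID and OrderAmount. The long “if” can either be evaluated in the front-end as a chart expression, or pre-computed once in the back-end load script:

// Front-end option: evaluated on the fly every time a chart refreshes
Sum(If(OrderAmount > 1000, OrderAmount, 0))

// Back-end option: compute a flag once, at reload time
Orders:
LOAD
    OrderID,
    OrderAmount,
    If(OrderAmount > 1000, 1, 0) AS IsLargeOrder
FROM [..\QVD\Orders.qvd] (qvd);

With the flag pre-computed, the chart expression reduces to Sum({<IsLargeOrder={1}>} OrderAmount), which is far cheaper for the tool to aggregate.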

Front-end

 

For visualisation tools that act as a front-end, it is of the utmost importance that the tool runs smoothly and performance is high. The front-end is where users interact with the data, so the experience needs to be streamlined. It should deliver data to the user as quickly as possible, without lag, because chances are the tool will be used by all sorts of users on various platforms, and it is imperative that they are not held back by performance hits.

Back-end

 

The back-end is generally where the data sits. This could be in the form of databases or, in the case of tools such as Qlikview, QVD files. The front-end reads data from here, and the data is then modelled into a structure that makes more sense. Once modelled, we can build visuals from the data that fulfil the business needs of the users. The back-end can be any storage format that makes sense for the tool in use. Storing data on the back-end has some advantages, which are listed below.
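As a small illustration (table and path names are hypothetical), a back-end script typically reads from a source, models the table, and stores it as a QVD for the front-end to consume:

Sales:
LOAD
    OrderID,
    CustomerID,
    Amount
FROM [..\Source\Sales.xlsx] (ooxml, embedded labels, table is Sales);

// Persist the modelled table so front-end apps can read it quickly
STORE Sales INTO [..\QVD\Sales.qvd] (qvd);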

Advantages of Front-end

 

Below are the advantages of having complicated code in the front-end:

·        The back-end script runs faster

·        The back-end data model stays smaller


Disadvantages of Front-end

 

·        A complicated measure or calculated column could cause performance issues

·        Makes the code harder to maintain

·        Visualisation tools usually use RAM to render visuals and keep queries in memory; a complicated measure or calculated column therefore uses more memory


Advantages of Back-end

 

·        Can do the heavy lifting of complicated code

·        Depending on how the data is used, it can afford to take time refreshing new data

·        Makes the code maintainable

·        Acts as a custodian of the data and allows for interchangeable visualisation tools


Disadvantages of Back-end

 

·        Depending on the need, it could slow down the refreshing of data for live reporting

·        Adding extra columns could lead to bloated tables, which could affect performance when those tables are loaded into the visualisation tool (see the sketch below)
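One way to limit the bloat is to load only the fields the dashboard actually uses, rather than LOAD *. A minimal sketch, assuming a hypothetical Sales.qvd:

// Load only the fields the visuals need, not the whole table
Sales:
LOAD
    OrderID,
    CustomerID,
    Amount
FROM [..\QVD\Sales.qvd] (qvd);

As long as no transformations are applied, a field-level load from a QVD can still run as an optimised load in Qlikview, so this keeps both the app and the reload lean.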

Conclusion

 

In the end, we would say that it depends on the situation you are in. In many cases, it is not feasible to have the code in the back-end, and it must live in the front-end. But a general rule of thumb is to keep code in the back-end so that the heavy lifting is done on that side, and the visualisation tool does what it is supposed to do: display data that can be quickly sliced and diced for whatever purpose. Always keep your audience in mind.

Tuesday 13 March 2018

Qlikview Architecture - One Tier Solution

Introduction
Many things have been said about which architecture to follow when developing Qlikview Apps. The overall winner so far has been the 3-tier architecture. This is simply three apps that do different things: the first two apps act as an ETL process and the third app is the visualisation layer. The first app, Tier 1, only gets source data from wherever it lives and stores it into different QVD's. The second app uses the QVD's from Tier 1, does all the modelling and transformations, and stores the result in a more star-schema-like structure. Tier 3 will then just read these QVD's and display the analytics as we need.
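A minimal sketch of the three scripts, with hypothetical connection, paths and field names:

// Tier 1 - Extract: pull raw source data and store it untouched
ODBC CONNECT TO [MySource];
Orders:
SQL SELECT * FROM dbo.Orders;
STORE Orders INTO [..\QVD\Extract\Orders.qvd] (qvd);

// Tier 2 - Transform: read the raw QVD, model it, store the star schema
FactOrders:
LOAD
    OrderID,
    CustomerID,
    OrderAmount
FROM [..\QVD\Extract\Orders.qvd] (qvd);
STORE FactOrders INTO [..\QVD\Model\FactOrders.qvd] (qvd);

// Tier 3 - Presentation: just read the modelled QVDs and build the visuals
FactOrders:
LOAD * FROM [..\QVD\Model\FactOrders.qvd] (qvd);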

The above is the basic 3-tier process followed when developing Qlikview Apps that are reusable, mostly for long-term projects that will need maintenance. But what if you need to complete a quick POC for someone? One wonders if this process is still necessary. In this post, we will discuss the merits of using a 1-tier architecture, as well as the situations in which it would be needed.

When to Use
When is it advantageous to use a 1-tier architecture? The answer should be never, as we are trying to build scalable, maintainable Qlikview Apps and we need to separate processes. It is also important that Apps are separated for debugging purposes, as well as for code refactoring and future re-use. However, the points above apply once a project has been accepted and is ready to be implemented. What happens if you must create a simple POC for a client?

This changes everything, because you should ask whether you really need a proper 3-tier architecture, especially if the source is simple Excel sheets. In this case, a 1-tier architecture is warranted, especially if the code will not be re-used and you just need to show the client some of the benefits of Qlikview. This is the elevator pitch of Qlikview. It also highlights the power of Qlikview: you can get an App up and working relatively quickly.

Another place where a 1-tier architecture can be used is when the source is already in a good schema state: basically, when we do not need to make any transformations to the data and would just like to display the data in the database in a business-friendly manner.
So, back to the question of using a 1-tier architecture, the answer should be: “Sometimes, when needed”. A minimal sketch of such an app follows.
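As a sketch, assuming a hypothetical Sales.xlsx, the whole 1-tier app is a single script that loads straight from the source, with the charts built directly on top:

// One app: extract and presentation in the same place, no QVD layer
Sales:
LOAD
    OrderID,
    CustomerID,
    Amount
FROM [Sales.xlsx] (ooxml, embedded labels, table is Sheet1);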

Advantages
·        To get a POC off the ground

·        When there is limited data to analyse

·        Perfect when dealing with Excel data

·        When just loading from one data source (when transformations are not needed)

Disadvantages
·        Depending on the data, performance could be impacted

·        Maintainability and re-use of the script could become complicated

·        Many points of failure, without being able to tell exactly what failed

·        Script could become complicated if transformations are needed to be done on the data

·        Transformations will impact overall user experience

Conclusion
In conclusion, there is a use for a 1-tier architecture. It is useful when conducting a POC, as well as when the database is already in a good schema state: the app simply sits on top of the database for visualisation purposes, and all that is needed is to import the data and use it to answer business questions visually. It is not useful when we need to conduct an ETL process, because this could make the process cumbersome and the code unreadable and difficult to maintain.

Tuesday 30 May 2017

One Table To Rule Them All


Introduction


In Business Intelligence and data science, there has always been a debate about whether a star schema is better than a snowflake schema or vice versa. This post is not about that argument. This is a discussion that comes after the choice has been made.

Once you have chosen whether to go star or snowflake, the next question is: "Is there a case whereby lumping everything into the Fact with a few dimensions is better than having a streamlined Fact table with many dimensions?"

In this discussion, I will try to argue for both cases from my experience thus far. For my debate regarding star vs snowflake, you may want to look at:


Big Fact, few dimensions


Many people argue for putting everything into the fact table. This is not necessarily wrong per se: it takes all the hard work out of modelling correctly and making sure the data links together properly. There is a case for this, the case to join everything into one gigantic, monstrous Fact table.

For example, if an ultra-normalised database is hard to de-normalise cleanly, the pressure is there to just lump almost everything into the Fact and keep a few dimensions. There are advantages and disadvantages, and a sketch of this approach follows.
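As a sketch of this approach (table and key names are hypothetical), the dimensions are simply joined into one wide table in the load script:

Fact:
LOAD OrderID, CustomerID, ProductID, Amount
FROM [..\QVD\Orders.qvd] (qvd);

// De-normalise: fold the dimension fields straight into the Fact
LEFT JOIN (Fact)
LOAD CustomerID, CustomerName, Region
FROM [..\QVD\Customers.qvd] (qvd);

LEFT JOIN (Fact)
LOAD ProductID, ProductName, Category
FROM [..\QVD\Products.qvd] (qvd);

Note that if Customers.qvd held more than one row per CustomerID, the join would duplicate Fact rows and inflate Sum(Amount), which is exactly the duplicate risk listed in the disadvantages below.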

Advantages

-        All needed data is found in the Fact
-        It is easier to query one table for fields instead of having many hops through dimensions.
Disadvantages

-        Increases the size of the Fact table
-        Reduces performance of a dashboard
-        Reload time increases due to the huge volumes of the Fact table
-        Unused fields are loaded into the dashboard
-        Incorrect measures could be calculated due to potential duplicates arising from joins

Small Fact, Many Dimensions

 

As mentioned above, best practice is to have a Fact table with links/keys to dimensions and the measures that will be used on the dashboard. The Business Intelligence tool Power BI does this well, although other tools and cubes should be built in this manner too. Building your model like this also promotes separation of concerns.

We should always keep in mind that dimensions are for describing entities. Just as, in development, a class should describe an object, so too, in data warehousing, dimensions describe entities and the Fact links them all together along with its measures. A sketch of this shape follows.
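A minimal sketch (names hypothetical): in Qlikview the association happens automatically on shared field names, so the dimensions are loaded as separate tables rather than joined in:

Fact:
LOAD OrderID, CustomerID, ProductID, Amount
FROM [..\QVD\Orders.qvd] (qvd);

// Dimensions stay as their own tables, linked to the Fact by key fields
DimCustomer:
LOAD CustomerID, CustomerName, Region
FROM [..\QVD\Customers.qvd] (qvd);

DimProduct:
LOAD ProductID, ProductName, Category
FROM [..\QVD\Products.qvd] (qvd);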

Advantages

-        The size of the Fact table is smaller
-        Performance of a dashboard is faster
-        Reload time decreases due to the low volumes of the Fact table
-        Unused fields are not loaded into the dashboard
-        Duplicates in dimensions should not be allowed, which makes measures more accurate

Disadvantages

-        Linking to the Fact table must be precise
-        Some measures could be incorrect if the linking is not done well

Conclusion

 

In hindsight, it should be stated that lumping everything into the Fact is not best practice. A Fact table should consist of several keys linking to different dimensions, together with the measures. This takes the burden off the Fact table.
 

The dimensions should be used to describe an entity, and the Fact should be freed of this. The role of the Fact table should be to contain keys and measures. These measures will then be used to determine KPIs on the front-end of the dashboard.

Modelling is 80% of the whole work; creating the dashboard is only 20%.