A common issue for many organisations is how to streamline the production of their dashboards. Their data can easily be exported from whatever system holds it to a .csv file or spreadsheet, and the relevant employees are usually fairly proficient at constructing KPIs and other metrics using spreadsheet formulae and the like. Moreover, their employees have the experience and skills to construct dashboards from these spreadsheets using a dashboarding package, e.g. Tableau or Google Data Studio. What they commonly lack is a method to automate the production process.
Why it’s necessary to automate the production of dashboards
Many organisations are happy to continue with the manual approach to dashboard creation outlined above. The spreadsheets have already been set up; the data can be uploaded from them fairly easily to the dashboarding package, and the results can then be viewed by all relevant parties. The tasks don’t take that long to accomplish each period.
However, the problem is not only the time it takes to perform the data extraction, reformatting and dashboard population; it is also the introduction of unnecessary error. It is far too easy to make mistakes whilst performing manual data manipulation on a regular basis, and no matter how many error checks you put in, errors invariably creep in. Over time, the spreadsheets also become unwieldy, and eventually what seemed a simple task becomes a data nightmare.
In another blog post, we discussed how we at JumpData developed a system for a client to automate the production of social media dashboards using Google, YouTube and Facebook Analytics. In this post, we briefly explain how JumpData and Techccino developed another system that extracts bookings and point-of-sale data from different systems, cleans it, constructs the relevant metrics and then populates the company’s dashboards.
Data Sources
The company used multiple data sources for its bookings, sales and inventory data. The most important were Roller, which stored most of the bookings data, and GoodTill, which stored most of the sales and inventory data; these were the two we focussed on.
Both of these packages had “friendly” GUIs that the company had been using to track all their bookings and other information. Both also permitted data exports into .csv files, which the company used to construct simple KPIs in Google Sheets (often with some additional manually entered data). The company could then import these metrics from the worksheet into Google Data Studio and populate rudimentary dashboards.
The process
The company was proficient at producing dashboards in Google Data Studio; moreover, they knew exactly how they wanted these to look. Our task was to automate the rest of the dashboard creation process, namely the data acquisition, metric production and upload to the dashboarding software.
Naturally, we had to choose a relational database in which to hold all relevant data (both current and historical); we chose a MySQL database hosted in Google Cloud.
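For illustration, a minimal table for holding both current and historical bookings records might look something like the sketch below. The column names, connection details and schema are assumptions made for this post, not the client’s actual setup.

```python
# Illustrative only: a minimal bookings table holding current and historical
# records. Columns and connection details are assumptions, not the client's schema.
import mysql.connector  # pip install mysql-connector-python

BOOKINGS_DDL = """
CREATE TABLE IF NOT EXISTS bookings (
    booking_id    VARCHAR(64) PRIMARY KEY,
    site          VARCHAR(100),
    booking_date  DATE,
    party_size    INT,
    revenue       DECIMAL(10, 2),
    retrieved_at  DATETIME            -- when the daily job pulled the record
);
"""

conn = mysql.connector.connect(
    host="example-cloud-sql-host",    # hypothetical Cloud SQL instance
    user="dashboards",
    password="change-me",
    database="reporting",
)
conn.cursor().execute(BOOKINGS_DDL)
conn.commit()
conn.close()
```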
Both Roller and GoodTill have APIs that allow authorised users to download the same data that is otherwise accessible via their web applications / GUIs. We programmed a Python-based server running in Google Cloud to retrieve the data from these APIs on a daily basis. Our application then cleaned and validated the data before saving it in the MySQL database. This process was cost-effective, as the server itself consumes resources only when retrieving and processing the data, ensuring that the customer is billed only for the few minutes a day when that happens. (This is known as serverless hosting. It allows customers to minimise their cloud hosting bill, especially when workloads run in short bursts or batches.)
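To make the shape of this concrete, here is a heavily simplified sketch of what such a daily retrieval job can look like. The endpoint, field names, credentials and validation rules are placeholders for illustration; they are not the real Roller or GoodTill APIs, nor the client’s actual code.

```python
import datetime

import mysql.connector
import requests

API_URL = "https://api.example.com/v1/bookings"  # placeholder, not the real Roller/GoodTill endpoint
API_TOKEN = "change-me"                          # kept in a secret manager in production

DB_CONFIG = dict(host="example-cloud-sql-host", user="dashboards",
                 password="change-me", database="reporting")


def fetch_bookings(for_date):
    """Download one day's bookings as JSON from the (hypothetical) API."""
    resp = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        params={"date": for_date.isoformat()},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def clean(record):
    """Basic validation: drop records missing an id or a usable revenue figure."""
    if not record.get("booking_id"):
        return None
    try:
        revenue = round(float(record.get("revenue", 0)), 2)
    except (TypeError, ValueError):
        return None
    return {
        "booking_id": record["booking_id"],
        "site": record.get("site", "unknown"),
        "booking_date": record.get("booking_date"),
        "party_size": int(record.get("party_size") or 0),
        "revenue": revenue,
        "retrieved_at": datetime.datetime.utcnow(),
    }


def save(rows):
    """Upsert the cleaned rows into the bookings table sketched above."""
    conn = mysql.connector.connect(**DB_CONFIG)
    cur = conn.cursor()
    cur.executemany(
        """REPLACE INTO bookings
               (booking_id, site, booking_date, party_size, revenue, retrieved_at)
           VALUES (%(booking_id)s, %(site)s, %(booking_date)s,
                   %(party_size)s, %(revenue)s, %(retrieved_at)s)""",
        rows,
    )
    conn.commit()
    conn.close()


def run(event=None, context=None):
    """Entry point for a scheduled job, e.g. a Google Cloud Function."""
    yesterday = datetime.date.today() - datetime.timedelta(days=1)
    cleaned = [c for c in (clean(r) for r in fetch_bookings(yesterday)) if c]
    if cleaned:
        save(cleaned)
```

Because the job only runs for a few minutes a day, hosting it as a scheduled serverless function keeps the cloud bill to a minimum.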
We considered using Python Pandas to conduct the majority of the data manipulations and the upload to the MySQL database. In the end, we decided to use native SQL in the final application. However, since Pandas makes it easier to manipulate data on the fly and produce summary outputs, we first produced all the relevant metrics using Pandas. This let us catch errors early, and it proved much quicker and easier to discuss, change and enhance the various metrics with the company before we productionised the entire process.
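As a rough illustration of that prototyping stage, the snippet below computes one plausible KPI (weekly revenue and average party size per site) from a few toy records. The field names match the hypothetical schema above rather than the client’s real data.

```python
import pandas as pd

# Toy records standing in for the cleaned API output; the fields are the
# same hypothetical ones used in the earlier sketches.
records = [
    {"booking_id": "A1", "site": "North", "booking_date": "2023-06-01",
     "party_size": 4, "revenue": 120.00},
    {"booking_id": "A2", "site": "North", "booking_date": "2023-06-02",
     "party_size": 2, "revenue": 60.00},
    {"booking_id": "B1", "site": "South", "booking_date": "2023-06-01",
     "party_size": 6, "revenue": 180.00},
]

bookings = pd.DataFrame(records)
bookings["booking_date"] = pd.to_datetime(bookings["booking_date"])

# Example KPI: weekly revenue and average party size per site.
weekly_kpis = (
    bookings
    .groupby(["site", pd.Grouper(key="booking_date", freq="W")])
    .agg(total_revenue=("revenue", "sum"),
         avg_party_size=("party_size", "mean"))
    .reset_index()
)
print(weekly_kpis)
```

Prototyping like this makes it easy to print, tweak and re-run a metric during a conversation with the client, before committing to the production SQL.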
In summary, we:
- Used the APIs from Roller and GoodTill to download the relevant data into .json files
- Manipulated the data from both sources with Python Pandas to construct the required KPIs and other metrics
- Discussed the outputs with the client, enhanced the metrics and reviewed the processes
- Created a MySQL database in Google Cloud to hold all relevant metrics
- Modelled the Pandas manipulations from the second step in SQL so they could run automatically inside the MySQL database (see the sketch after this list)
- Linked the MySQL database to Google Data Studio to allow dashboard construction
- Scheduled all processes to update the dashboards weekly
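Once the metrics were agreed, the Pandas logic was re-expressed in SQL. A hedged illustration of what that can look like, using the same hypothetical table and columns as above, is a view that Google Data Studio can then read directly:

```python
# Illustrative only: the weekly KPI from the Pandas sketch, re-expressed as a
# MySQL view so the dashboarding tool can query it directly.
import mysql.connector

WEEKLY_KPI_VIEW = """
CREATE OR REPLACE VIEW weekly_kpis AS
SELECT
    site,
    YEARWEEK(booking_date, 3) AS iso_week,   -- ISO week number, e.g. 202322
    SUM(revenue)              AS total_revenue,
    AVG(party_size)           AS avg_party_size
FROM bookings
GROUP BY site, YEARWEEK(booking_date, 3);
"""

conn = mysql.connector.connect(
    host="example-cloud-sql-host", user="dashboards",
    password="change-me", database="reporting",
)
conn.cursor().execute(WEEKLY_KPI_VIEW)
conn.commit()
conn.close()
```

Defining each KPI as a view keeps the dashboard pointed at a single, stable object while the underlying query can evolve as the metrics are refined.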
The system operates smoothly with no user intervention required. In addition, with minor tweaks, it can be applied to similar bookings and point-of-sale software.
Our client is now able to confidently track their various KPIs across all of their sites. The whole dashboard production process is automated, reliable and free from human error. Moreover, we have given them easy access to all of their historical data, so they can create new custom-built reports as their business needs change.
Our role: data analysis, modelling and automation.