What’s in a name? : Renaming grouped data in tables with SAQL

Requirement: Our client wants to show a summary table grouped by the top 20 accounts, a group of those accounts outside the top 20 and a final total row for all accounts.

Summary Revenue
Top 20 12345
Others 12345
Total 12345

The groups our customer would like to summarise by are not defined as dimensions in the dataset and would need to be calculated at run time as the data is going to be updated regularly. But it’s ok because we have SAQL. Using SAQL, we can create custom groupings of our account revenue data and then display it in a table. Here’s how.

To help with this demonstration download the sample dataset and import this into your Einstein developer org.

Locate the Mock_Data dataset you've just uploaded in Analytics Studio and create a new lens from it. Save it as 'Account summary'.

Screen-Shot-2018-01-21-at-22.09.52

To save some time and get to the good stuff, launch the SAQL editor by clicking on the SAQL button.

saql

Delete any SAQL code in the editor. Then paste in the code below and click 'Run Query', this will produce a ranked table of Accounts by Revenue like the one below.

q = load "MOCK_DATA";
q = filter q by date('order_Date_Year', 'order_Date_Month', 'order_Date_Day') in ["1 year ago".."1 year ago"];

-- Create primary stream to base our subsequent streams on

result = group q by 'Account';
result = foreach result generate q.'Account' as 'Account', sum(q.'Revenue') as 'Revenue';
result = group result by 'Account';
result = foreach result generate 'Account', sum(Revenue) as 'Revenue', rank() over([..] partition by all order by sum(Revenue) desc) as 'Rank';
result = order result by ('Rank' asc);
result = limit result 2000;

Screen-Shot-2018-01-21-at-22.17.15

We have a stream of all accounts listed in order of revenue, so our first task is to create a row for our summary table that is just comprised of the Top 20 accounts. This is where we use SAQL to filter and group just those accounts. We achieve this by creating a new stream called q_top20.

We first have to project the 'result' stream as a foreach statement, before we can filter.

Copy and paste the following statements.

-- Aggregate the revenue for all rows ranked <= 20
q_top20 = foreach result generate 'Account', sum(Revenue) as 'Revenue', 'Rank' as 'Rank';

Now that we have data from our primary stream assigned to our q_top20 stream, we only want the results that are less than or equal to a rank of 20.

q_top20 = filter q_top20 by 'Rank' <= 20;

Then we need to aggregate all of the values in those filtered rows. Using a group statement. So for this we group by all.

q_top20 = group q_top20 by all;

Now this is the important part for our custom grouping. We need our dimension to be named something other than the name of the account to which the revenue belongs. So for this we can assign a string value to a dimension. As this is our Top 20 stream, then we name it "Top 20".

q_top20 = foreach q_top20 generate "Top 20" as 'Account', sum(Revenue) as 'Revenue';

Once you've added all the statements above click 'Run Query'. And you should have something like this:

Screen-Shot-2018-01-21-at-22.20.22

So that gives us our first custom grouping. Now our next grouping is a variation on that theme. Copy and paste the SAQL below into your editor underneath the q_top20 code block.

-- Aggregate the revenue for other rows ranked > 20

q_others = foreach result generate 'Account', sum(Revenue) as 'Revenue', 'Rank' as 'Rank';
q_others = filter q_others by 'Rank' > 20;
q_others = group q_others by all;
q_others = foreach q_others generate "Others" as 'Account', sum(Revenue) as 'Revenue';

So here we are doing much the same as we were in the previous stream, except filtering rows greater than 20 and stating that this row should be called "Others".

If you click 'Run Query' you should see an "Others" table like the one below:

Screen-Shot-2018-01-21-at-22.21.42

And for our final summary row, this is an aggegation of all account revenue values, so it's similar again but this time without a filter. And we set the group name to "Total".

-- Aggregate the revenue of all results to give a grand total

q_total = foreach result generate 'Account', sum(Revenue) as 'Revenue', 'Rank' as 'Rank';
q_total = group q_total by all;
q_total = foreach q_total generate "Total" as 'Account', sum(Revenue) as 'Revenue';

Screen-Shot-2018-01-21-at-22.23.30

With our three summary streams ready, next we have to combine them into one nice summary table. For this we use the union statement. This enables us to combine three streams of data to form one, provided they have the same dimensions and mesaures. So create a new stream called q_summary and union the summary data streams like this:

q_summary = union q_top20, q_others, q_total;

Then all that remains is to group the stream by the new custom 'Account' field values we've assigned.

q_summary = group q_summary by 'Account';

And then to give our colum a more suitable dimension label, we'll assign the value of "Summary".

q_summary = foreach q_summary generate 'Account' as 'Summary', sum(Revenue) as 'Revenue';

Then with those final queries in place, click 'Run Query'. And you should see a table with your summarised rows, in one neat table.

Screen-Shot-2018-01-21-at-22.28.26

But wait, there's something not quite right. We have all the right data but the rows are in the wrong order. And sadly there is no way to override this easily, as Einstein Analytics always sorts the dimensions alphabetically. But it doesn't have to end there, find out how you can take control of your dimension ordering.

Tip: You can extend this to grouping by any dimensions as long as they are available to you in your primary stream.

Show Comments

Get the latest posts delivered right to your inbox.