A first look at Azure Purview and how to connect local data sources

Azure Purview is the next generation of the Azure Data Catalog. The service is now in Public Preview, and there are some changes in how local data sources are catalogued.
Yesterday I started with Power BI as a source, to see whether it is a good way to create a catalogue covering all data source types. Currently the Power BI support is a little bit limited: for an import mode report you only see the dataset, without any tables, columns and so on. If you use a DirectQuery dataset, you see the source, but again without tables and columns. Another issue was that you get every workspace and report, regardless of whether it was deleted or is still active. But we are talking about a preview, and with the Data Catalog there was no way at all to get information about Power BI.

Okay, let’s go back to the main topic … With Purview you can register online and on-premises data sources. For the online sources there is no need to install or register additional components. For on-premises sources you need to install a self-hosted integration runtime; it is the same runtime we know from Data Factory. The link for the runtime gives you more information about how this setup behaves.
When you register the runtime in the Purview portal, you get the keys and the download link:

The download is a little bit confusing because you can download different versions; I used the latest one. After starting the setup and providing the key, the setup finds the runtime that is configured in the portal.

Currently you can discover SQL Server sources, but not sources of the type Analysis Services. I think one reason for this behavior is that the runtime comes from Data Factory, which does not support Analysis Services as a source? I’m not sure, but I think a connector will come.
Next we need a SQL authenticated login to discover the SQL Server. The login needs elevated permissions; you can check the details via the link above. After that, you need to store the password of this account in your Azure Key Vault:

It is so great that you can only use credentials from the Key Vault – no passwords stored in the code :)
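If you want to script this step instead of clicking through the portal, here is a minimal sketch with the Az PowerShell module (the vault name, secret name and password are placeholders for your own values):

# Assumption: the Az.KeyVault module is installed and you are signed in via Connect-AzAccount
$password = ConvertTo-SecureString "<sql-login-password>" -AsPlainText -Force
Set-AzKeyVaultSecret -VaultName "purview-demo-kv" -Name "sql-scan-login" -SecretValue $password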

After that I created a new data source collection in the Purview portal, with a data source of the type SQL Server.

For the scheduling you need the Key Vault entry:

After the discovery of my source I get all tables of the database:

It is a little bit confusing that my SQL Server name is shown as MSSQLSERVER; the reason is that I installed the server as an unnamed (default) instance. Now you can browse the tables and columns, set up classifications and so on.
I think this tool gives us more flexibility than the Data Catalog. Inventorying local data sources as well as online sources works a little bit differently, though. Hopefully we will get the GA version very soon 😊

DirectQuery Realtime Datasets and the composite model

I really like the Realtime datasets in Power BI, especially because it is really easy to create one 😊 In the past you created one and used it in the service, or used it via the Desktop through the datasets connector. But there were not many possibilities to do more with the data, e.g. to merge it with other sources. So you built “complex” processes via Azure, pushed the Realtime data into a SQL DB, and then you were able to merge the data.
With the DirectQuery mode for Power BI datasets it is much easier: you can connect to the Realtime dataset, import other data and analyze everything together.

A small example:

Create a streaming dataset in Power BI

With the provided examples for PowerShell and other tools it is very easy to add data to this dataset.
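For example, here is a minimal PowerShell sketch that pushes one row into the streaming dataset; the push URL comes from the dataset's API info page, and the column names must match the ones you defined for the dataset (here I assume a timestamp and a value column):

# Push URL of the streaming dataset (copy your own from the Power BI service)
$endpoint = "https://api.powerbi.com/beta/<tenant-id>/datasets/<dataset-id>/rows?key=<push-key>"

# One sample row; the property names must match the columns of the streaming dataset
$row = @{
    timestamp = (Get-Date).ToString("o")
    value     = Get-Random -Minimum 0 -Maximum 100
}

Invoke-RestMethod -Method Post -Uri $endpoint -ContentType "application/json" -Body (ConvertTo-Json @($row))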

Now you can connect to this dataset via Power BI Desktop (you must enable the preview feature) and load other data from different data sources into your model:

Then you can create relationships between the tables and analyze the data.

Unfortunately it is not possible to modify the metadata of the streaming dataset, for example to add translations via Tabular Editor; that is only possible for the data from the other sources.

Building a Slowly Changing Dimension with Power BI

Okay, I think this is normally not a topic for Power BI, because I usually build an SCD in a data mart with T-SQL and not with M/Power Query in a self-service environment. But in some cases it can be very useful, for example for a PoC. Another reason in a self-service scenario can be that you have a small amount of data and want to analyze it with history and a correct mapping.

Well, let’s take a look at the data:

On the left you see the employees and which employee is assigned to which department, together with a validity date. On the right side you have the facts, so Wilma has a sales amount on the 5th of November; the correct department for this date is the Sales department. When you build a Power BI data model with these tables, you can only join them via the EmpId, so you cannot see which department made the sale, only the employee, and you do not get the correct historical assignment. So you need a unique surrogate key, created per validity period, which you can use to merge the two tables.

We need a data model like this:

… with this result:

The Counter Table

First we must import the Emp table and group it by EmpId with a row count. We need this to check how many rows we have per employee and to mark the latest record.

The name of this table is “MaxIndex”

This table is only needed for our ETL process, so we can disable the loading, because we don’t need it in our data model.

The Emp-Table

We start with the Employee (Emp) table, load it into Power Query and sort it by EmpId and FromDate:

  1. To sort by two or more columns you need to do some M scripting:

#"Sorted by EmpId and FromDate" = Table.Sort(#"Changed Type",{{"EmpId", Order.Ascending}, {"FromDate", Order.Ascending} }),

  2. Next we must group the data by the EmpId.
  3. Then we must add an index to each group.
  4. Then we can remove all other columns except the new Custom column and expand it.
  5. Next we can merge it with the MaxIndex table and expand the Row Count (MaxIndex) column.
  6. Now we must determine the next row, so we add a new column with the index increased by one.
  7. After that we can join the table with itself, matching on the EmpId and on the index against the new column. Then we expand the joined column; we only need the FromDate column, which we rename to EndDate.
  8. Then we can check whether the index equals the MaxIndex column; if it matches, we know this is the last entry.
  9. Next we can adjust the EndDate: if it is null, we set it to 12/31/9999.
  10. Now we can add the surrogate key.
  11. After that there is only some renaming left to do, and we subtract one day from the end date so that the start and end dates do not overlap.

The fact table

  1. After you have loaded the facts into Power Query, you can merge them with the Emp table by using the EmpId. You can expand the columns Start Date, End Date and the Surrogate Key.
  2. After that you can add an M step to filter the data:

// keep only the Emp row whose FromDate/EndDate window contains the fact's Date
#"Filtered Rows" = Table.SelectRows(#"Expanded Emp", each [Date] >= [Emp.FromDate] and [Date] < [Emp.EndDate]),

  3. Now you can remove the Start and End Date columns and load the data into your model.

That’s it! There are many steps to do, and you will find the PBIX file for download. I think the steps could be simplified further, but for this approach it’s okay. But please be aware of the amount of data: if you have more than 1 million fact rows, please build it with T-SQL or another database engine.

Power BI: Translations and the XMLA Endpoint

When you need translations for your Power BI reporting in a multilingual scenario, you need a Premium capacity. When you have edited your PBIX with Tabular Editor and added translations, they work in Power BI Desktop, but unfortunately not with Power BI Pro or Free. If you use an SSAS cube with translations as your data source, you don’t have this problem and you can use Pro licenses. 😊

But in most cases in a self-service BI environment, Excel, CSV and so on are used as data sources. These sources are not able to provide translations, so you have to do something in your data model with M or DAX. With the Tabular Editor from Daniel Otykier you can add the needed translations to your data model.

With Power BI Premium you can use the XMLA endpoint. This endpoint gives you the possibility to read the data from your model and re-use it, for example for Reporting Services, and now you are also able to write data back to your model: you can write back perspectives, translations and measure groups. To try this, you must enable the Read/Write option for your Premium capacity. You can also try it with an Azure SKU for Power BI Embedded – with an A4 or higher.


After that, and after assigning the capacity to your workspace, you can consume the translations; otherwise you get the original labels from your model.

In the properties dialogue of your workspace or of your dataset, you get the connection URL for your XMLA endpoint. With this URL you can connect to your Power BI data model via Tabular Editor.
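The connection URL follows a fixed pattern, so you can also type it in directly (the workspace name is a placeholder):

powerbi://api.powerbi.com/v1.0/myorg/<Workspace Name>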

Now you can add the translations for your tables, columns and measures.

If you want to add a default translation for every object in your data model, Daniel has written an awesome script to do this: https://tabulareditor.com/2019/10/23/TabularEditorTricks3.html
From SQLBI you can get a good editor to edit the exported translation file very easily without an active connection to the model: https://www.sqlbi.com/tools/ssas-tabular-translator/
So another department can do the translations for you, and afterwards you can re-import the file to add them to the model.

After saving your modifications, you can consume your report:

And here is the report with the added translations:

Power BI Service Principals

With the Power BI Embedded onboarding tool you can easily create an embedding application. But the standard mode uses a user with a password. This user requires a Power BI Pro license, and normally such a user is subject to organizational standards, such as a password that must be changed every X days. But the biggest issue is that the password is stored unencrypted in the web.config file.
These issues can be resolved by using a “service principal” account. Microsoft has documented the approach on this page: https://docs.microsoft.com/en-us/power-bi/developer/embedded/embed-service-principal

But some hints are missing there to get the principal working, such as the API permissions. This blog post will show you step by step how to configure the account and set up a demo application.

First you must enable the use of service principals in the admin portal of your Power BI tenant (in a production scenario you should allow dedicated security groups, not the entire organization).

To configure our embedded application we need our workspace and report IDs (they are needed for the config file).

1 = workspace id
2 = report id
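If you prefer PowerShell over copying the IDs from the browser URL, a minimal sketch with the MicrosoftPowerBIMgmt module lists them as well (the workspace name is a placeholder):

# Assumption: the MicrosoftPowerBIMgmt module is installed
Connect-PowerBIServiceAccount
$workspace = Get-PowerBIWorkspace -Name "EmbeddedDemo"                   # Id property = workspace id
Get-PowerBIReport -WorkspaceId $workspace.Id | Select-Object Name, Id    # Id column = report id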

Next we must create an “App registration” in the Azure tenant:
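If you prefer to script this part, here is a minimal sketch with the Az PowerShell module that creates the app registration and its service principal (the display name is a placeholder; the client secret and the API permissions are still configured in the portal as described below):

# Assumption: the Az.Resources module is installed and you are signed in via Connect-AzAccount
$app = New-AzADApplication -DisplayName "PowerBI-Embedded-Demo"
$sp  = New-AzADServicePrincipal -ApplicationId $app.AppId
$app.AppId   # the application (client) id, needed later for the web.config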

Then we must create the client secret

Important: After you leave this page, you cannot retrieve the secret again!

Now we must give the service principal API permissions for Power BI

Check that you only grant the required permissions!
You must grant admin consent for the permissions in your tenant!

After that, you must give the app registration admin access to the workspace:

Now you must set up the web.config:

  1. Set the authentication type to “Service Principal”
  2. Set the application ID; you will find it on the overview page of the created service principal
  3. Set the workspace and report IDs (steps 3 and 4)
  4. Paste the secret here
  5. Set your tenant ID; you will also find it on the overview page of the service principal

Finally we can run our app by using the principal
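For a quick check that the principal works, you can request a token with the client credentials flow yourself; here is a minimal PowerShell sketch (tenant ID, application ID and secret are placeholders):

$tenantId = "<tenant-id>"
$tokenBody = @{
    client_id     = "<application-id>"
    client_secret = "<client-secret>"
    scope         = "https://analysis.windows.net/powerbi/api/.default"
    grant_type    = "client_credentials"
}
# The access token in the response is what the application sends to the Power BI REST API
$response = Invoke-RestMethod -Method Post -Uri "https://login.microsoftonline.com/$tenantId/oauth2/v2.0/token" -Body $tokenBody
$response.access_token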

That’s it! Enjoy!

Connect your on-premises AS Cube with Power BI Embedded and re-use your existing security

What if you have deployed your SSAS cubes on premises, you don’t want to move them into the cloud, but you still want Power BI? One option is Power BI Report Server, but there you don’t have all the capabilities that the Power BI service offers: you can only embed the reports with an iFrame, and you also don’t get the latest features.
With Power BI Embedded you can integrate your reports, dashboards and also tiles into your own application, and you can design the app to your needs. Power BI Embedded can only be used with the cloud environment, so there is no option to run this technology on your own on-premises infrastructure.
When you also have some requirements from your management, like:

  • GDPR
  • Compliance
  • Row Level Security on your Cubes
  • save money; don’t give every user in your organisation a Power BI Pro licence or buy a Premium capacity

… it can be very complicated to use Power BI Embedded.

For the first two points you can implement a hybrid scenario: you keep your own infrastructure, install a Power BI data gateway and connect your Power BI tenant to your local environment. But this does not solve the two other problems. If you use a Power BI Embedded application in the “Embed for your customers” mode, you pay for the capacity and for one Power BI Pro license, but the application has only one technical user, so you get into trouble with the row-level security. So do you have to buy a license for every user, or a capacity for the app? No! You can set up the gateway with an impersonation mode. With this setting you can push the user names through to the cubes and re-use the row-level security.

So, GDPR and compliance can be addressed by the gateway, because only runtime data is in the cloud. The row-level security can be re-used via the impersonation, and you can also save money by using Power BI Embedded “Embed for your customers”, while designing your own application with this technology.

Now I want to demonstrate how to achieve this:

To create an embedded app you need:

  1. a service user
  2. a Power BI Pro license
  3. set up a data gateway
  4. Analysis Services in our Infrastructure as a Service environment
  5. set up the gateway connection
  6. invoke the gateway impersonation
  7. the Power BI Embedded app

1. Service User

The user only needs to be assigned to a group, and this group is synchronized to our local server.

2. Power BI Pro

To assign a Pro trial license to this user very easily, log in to Power BI with this account and create a new workspace; no other Office 365 licenses are needed!

3. Set up a Data Gateway

Now we must install a data gateway in our on-premises environment; this is also a reason why we need a Pro license. The data gateway connects our AS cube to the cloud.

4. Analysis Services on our Infrastructure as a Service environment

First we must add the gateway user as an SSAS administrator; this permission is needed to change the user context.
In the cube we have a role named “RLS” which filters the data. The members are AD accounts which do not have a Power BI or O365 license.

5. Set up the Gateway

I’ve created a Power BI report which uses a live connection to the cube. The report displays a sales amount, to test the RLS, and the user name of the current viewer.
After the report is published to the service, we must set up the gateway connection.
The dataset must be connected to the gateway. After that you can view the report with the data from the on-premises cube.

6. Invoke the Gateway for the Impersonation

Here comes the magic: we must set up the gateway connection to impersonate the users. It’s really no magic, but it’s very cool :) The setting cannot be done via the portal, it must be done via the REST API.

The magic is in the Power BI REST API documentation on the Microsoft docs site: there you can find how to set up the impersonation, using a request body like this:


"emailAddress": "HybridService001
@plenz.onmicrosoft.com",
"datasourceAccessRight": "ReadOverrideEffectiveIdentity"
}

You can configure the gateway without Postman or any other tool: the “Try it” button on the documentation page will set the properties. You need the data source and the gateway IDs; you can get them from the gateway configuration page.

1: Gateway Id, 2: Connection Id
After a successful run you should get a 200 status code.
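If you prefer to script the call instead of using the “Try it” button, here is a minimal sketch with the MicrosoftPowerBIMgmt PowerShell module (the IDs and the user principal name are placeholders for your own values):

# Assumption: the MicrosoftPowerBIMgmt module is installed; sign in with an account that administers the gateway
Connect-PowerBIServiceAccount
$gatewayId    = "<gateway-id>"       # 1 in the screenshot above
$datasourceId = "<connection-id>"    # 2 in the screenshot above
$body = '{ "emailAddress": "<service-user>@<tenant>.onmicrosoft.com", "datasourceAccessRight": "ReadOverrideEffectiveIdentity" }'
Invoke-PowerBIRestMethod -Method Post -Url "gateways/$gatewayId/datasources/$datasourceId/users" -Body $body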

7. Power BI Embedded App
You can register an app very easily with the onboarding tool.
Or you can download the code and configure the app manually: https://github.com/microsoft/PowerBI-Developer-Samples/tree/master/PowerShell%20Scripts

After completing the wizard you can download a preconfigured Visual Studio solution.
Now we must change the report ID in the web.config to our live report.

Now we can test the report by providing another user name together with the role name.

Conclusion

With the impersonation you can connect Power BI to your cubes that use row-level security. Hybrid scenarios are not very new, but when you combine Power BI Embedded, a data gateway and this setting, you can provide Power BI solutions very easily and cheaply, because you can use the A-SKU for embedding. But be aware that you must code an application which handles the authentication.