Azure Purview is the next generation of the Azure Data Catalog. Now we have a Public Preview of this service and there are some changes how to catalogue local data sources.
Yesterday I started with Power Bis is a good way to create a catalogue with all data source types. Actually Power BI is a little bit limited, because you only see the dataset on a import mode report without any tables, columns and so on. If you use a direct query dataset, you see the source, but also without tables and columns. Another things was, that you get every workspace and report regardless if it was deleted or active. But we talk about a preview and with the Data Catalog there was no way to get information about Power BI.
Okay, let’s go back to the main topic … With Purview you can register online and on premises data sources. For the online sources there is no need to install or register additional components. For on premises sources you need to install a self hosted integration runtime. It is the runtime which we know from the Data Factory. On the link for the runtime you will get more information about the behavior of this setup.
By registering the Runtime at the Purview portal, you will get the keys and the download link:
It is a little bit confusing by the download, because you can download different versions. I used the latest one. After starting the setup and providing the key, the setup finds the runtime which is configured at the portal.
Actually you can discover SQL Server sources, but no sources of the type of Analysis Services. I think a reason of this behavior is, that the runtime is for the data factory and this will write the sources? I’m now sure, but I think there will come a connector.
Next we need a SQL authenticated login to discover the SQL Server. The user will need high permissions. You can check this on the link above. After that, you need to store the login information of this account in your Azure Key Vault:
After that I created on the Purview Page a new data source collection with a data source with the type of SQL Server.
For the scheduling you need the Key Vault entry:
After a discovery of my source I get all tables of database:
It is a little bit confusing that my SQL Server name is MSSQLSERVER. I installed the server as an unnamed instance. This is the reason why. Now you can discover the tables, columns and setup classifications and so on.
I think this tool gives us more flexibility as the Data Catalog. To inventory local data sources and also online sources, it is a little bit different. Hopefully we will get the GA version very soon 😊