Process Event Hub data with Stream Analytics and push it into a dimensional model

In my scenario, I receive 1,000 JSON files per day that must be analyzed with Power BI. That adds up to approximately 380 million rows per year (roughly one million rows per day), on top of an initial load of approximately 500 million rows. With these volumes, a proper data model is mandatory to support both analysis and live monitoring of the data. To achieve this, I need dimensions that can be connected to the fact data.

I push the data via a C# script (at this point, many thanks to a very good colleague, Stephan Tomke, for providing me with this awesome program) to an Azure Event Hub and stage it inside a SQL database. The staging table does not yet have any relationships to the dimensions. Resolving them with SSIS, ADF, or another ETL tool is not an option, because the business needs real-time data.
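
The exact layout of the staging and target tables depends on the payload, but based on the columns used in the trigger below, they could look roughly like this (table and column names are taken from the trigger; the data types are assumptions):

CREATE TABLE dbo.eh            -- staging table filled by the Event Hub consumer
(
    Id                UNIQUEIDENTIFIER NOT NULL,
    TotalNumber       INT,
    SuccessfulUploads INT,
    FailedUploads     INT,
    ProcessDate       DATETIME2,
    UploadType        NVARCHAR(50),   -- textual type, resolved to UploadTypeId by the trigger
    UploadStatus      NVARCHAR(50),
    ElapsedTime       INT,
    UploadSize        BIGINT,
    ErrorCode         INT,
    ErrorMessage      NVARCHAR(4000)
);

CREATE TABLE dbo.eh2           -- fact table, same columns but with the dimension key resolved
(
    Id                UNIQUEIDENTIFIER NOT NULL,
    TotalNumber       INT,
    SuccessfulUploads INT,
    FailedUploads     INT,
    ProcessDate       DATETIME2,
    UploadTypeId      INT,            -- foreign key to dbo.uploadtype
    UploadStatus      NVARCHAR(50),
    ElapsedTime       INT,
    UploadSize        BIGINT,
    ErrorCode         INT,
    ErrorMessage      NVARCHAR(4000)
);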

To achieve real-time processing, I decided to add a table trigger inside the SQL database that looks up the required identifiers from the dimensions:

-- Trigger on the staging table: for every inserted batch, resolve the
-- UploadTypeId from the dimension and write the rows into the fact table.
ALTER TRIGGER TR_Audit_Employees
ON dbo.eh
FOR INSERT
AS
INSERT INTO dbo.eh2
(
    Id,
    TotalNumber,
    SuccessfulUploads,
    FailedUploads,
    ProcessDate,
    UploadTypeId,
    UploadStatus,
    ElapsedTime,
    UploadSize,
    ErrorCode,
    ErrorMessage
)
SELECT d.Id,
       d.TotalNumber,
       d.SuccessfulUploads,
       d.FailedUploads,
       d.ProcessDate,
       -- look up the surrogate key in the UploadType dimension
       (
           SELECT x.Id FROM dbo.uploadtype x WHERE d.UploadType = x.UploadType
       ),
       d.UploadStatus,
       d.ElapsedTime,
       d.UploadSize,
       d.ErrorCode,
       d.ErrorMessage
FROM inserted d;

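One note on the trigger: the correlated subquery returns NULL when an upload type is not found in the dimension, and it is evaluated per row. A sketch of an alternative, assuming the same tables as above, that resolves the key with a set-based LEFT JOIN and makes the possible NULL explicit:

ALTER TRIGGER TR_Audit_Employees
ON dbo.eh
FOR INSERT
AS
INSERT INTO dbo.eh2
(
    Id, TotalNumber, SuccessfulUploads, FailedUploads, ProcessDate,
    UploadTypeId, UploadStatus, ElapsedTime, UploadSize, ErrorCode, ErrorMessage
)
SELECT d.Id,
       d.TotalNumber,
       d.SuccessfulUploads,
       d.FailedUploads,
       d.ProcessDate,
       x.Id,                -- NULL if the upload type is not (yet) in the dimension
       d.UploadStatus,
       d.ElapsedTime,
       d.UploadSize,
       d.ErrorCode,
       d.ErrorMessage
FROM inserted d
LEFT JOIN dbo.uploadtype x
       ON x.UploadType = d.UploadType;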

Now, I can connect to the tables and build an efficient model.
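
Before setting up the relationships in Power BI, a quick sanity check on the SQL side shows whether the fact rows actually resolve against the dimension. A minimal sketch, assuming the tables from above:

-- Row counts per upload type; rows with an unresolved key show up under NULL.
SELECT ut.UploadType,
       COUNT(*) AS FactRows
FROM dbo.eh2 f
LEFT JOIN dbo.uploadtype ut
       ON ut.Id = f.UploadTypeId
GROUP BY ut.UploadType
ORDER BY FactRows DESC;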

If you add automatic page refresh or use a change detection measure, you can provide near real-time analysis.
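
With change detection, Power BI polls the DirectQuery source at the configured interval and only refreshes the visuals when the value of the measure changes. A hedged sketch of the kind of lightweight query such a measure, based for example on the latest ProcessDate, would roughly translate to against this model:

-- Polled at the change detection interval; visuals refresh only
-- when the returned value differs from the previous poll.
SELECT MAX(ProcessDate) AS LastProcessDate
FROM dbo.eh2;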
