In my scenario, I receive 1,000 JSON files per day that must be analyzed with Power BI, which adds up to approximately 380 million rows per year (roughly one million rows per day). On top of that, the system has an initial load of around 500 million rows. With these volumes, a proper data model is mandatory for analyzing and live-monitoring the data, so I need dimensions that are connected to the fact data.
I push the data to an Azure Event Hub via a C# script (at this point, many thanks to a very good colleague, Stephan Tomke, for providing me this awesome program) and stage it in a SQL database. This staging table has no relationships to the dimensions yet. Building them with SSIS, ADF, or another ETL tool is not an option, because the business needs real-time data.
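For context, here is a minimal sketch of how the staging table and the upload type dimension could look. The column names follow the trigger shown below; the data types and the exact dimension layout are assumptions, not the actual DDL of my database.

-- Staging table filled from the Event Hub (data types are assumptions)
CREATE TABLE dbo.eh
(
    Id                UNIQUEIDENTIFIER NOT NULL,
    TotalNumber       INT              NULL,
    SuccessfulUploads INT              NULL,
    FailedUploads     INT              NULL,
    ProcessDate       DATETIME2(3)     NULL,
    UploadType        NVARCHAR(100)    NULL,  -- textual value, resolved to a key by the trigger
    UploadStatus      NVARCHAR(50)     NULL,
    ElapsedTime       INT              NULL,
    UploadSize        BIGINT           NULL,
    ErrorCode         INT              NULL,
    ErrorMessage      NVARCHAR(MAX)    NULL
);

-- Dimension that maps the textual upload type to a surrogate key
CREATE TABLE dbo.uploadtype
(
    id         INT IDENTITY(1,1) PRIMARY KEY,
    uploadtype NVARCHAR(100) NOT NULL
);

-- dbo.eh2 mirrors dbo.eh, but carries UploadTypeId INT (referencing dbo.uploadtype.id)
-- instead of the textual UploadType column.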
For real-time processing, I decided to add a table trigger in the SQL database that looks up the needed identifiers from the dimension tables:
ALTER TRIGGER TR_Audit_Employees
ON dbo.eh
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy every newly staged row into dbo.eh2 and resolve the textual
    -- upload type to its surrogate key from the dimension table.
    INSERT INTO dbo.eh2
    (
        Id,
        TotalNumber,
        SuccessfulUploads,
        FailedUploads,
        ProcessDate,
        UploadTypeId,
        UploadStatus,
        ElapsedTime,
        UploadSize,
        ErrorCode,
        ErrorMessage
    )
    SELECT
        D.Id,
        D.TotalNumber,
        D.SuccessfulUploads,
        D.FailedUploads,
        D.ProcessDate,
        -- Look up the dimension key for the upload type of the inserted row
        (SELECT x.id FROM dbo.uploadtype x WHERE x.uploadtype = D.UploadType),
        D.UploadStatus,
        D.ElapsedTime,
        D.UploadSize,
        D.ErrorCode,
        D.ErrorMessage
    FROM inserted AS D;
END
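As a quick smoke test, you could insert a single row into the staging table and check that the textual upload type was resolved to its dimension key in dbo.eh2. The sample values below, including the upload type 'Delta', are invented for illustration.

-- Hypothetical test row (values are made up); the upload type must already exist in dbo.uploadtype
INSERT INTO dbo.eh (Id, TotalNumber, SuccessfulUploads, FailedUploads, ProcessDate,
                    UploadType, UploadStatus, ElapsedTime, UploadSize, ErrorCode, ErrorMessage)
VALUES (NEWID(), 100, 98, 2, SYSUTCDATETIME(),
        N'Delta', N'Completed', 1250, 524288, NULL, NULL);

-- The trigger should have written the row to dbo.eh2 with the resolved UploadTypeId
SELECT TOP (1) Id, TotalNumber, UploadTypeId, UploadStatus, ProcessDate
FROM dbo.eh2
ORDER BY ProcessDate DESC;

Note that the scalar subquery in the trigger returns NULL when the upload type is not yet present in dbo.uploadtype, so new types have to be added to the dimension first (or handled with a fallback value).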
Now I can connect to the tables and build an efficient model:

If you add an automatic page refresh or use a change detection measure, you can provide near real-time analysis.