Following up the investigation of my colleague Kaj, the past few weeks I have been looking into ADF (Azure Data Factory) Mapping Data Flow, which is now in public preview, to see if Valid can start using this new technology for ETL- and data warehousing purposes.The Valid BI Framework is a standardized way to build data warehouse solutions. Within this framework we currently use SSIS (SQL Server Integration Services) to run ELT-processes. The framework also supports the automatic generation of database objects (tables, views and stored procedures) and SSIS packages based on metadata from source systems.
The main goal was to look at the three different aspects listed below. With the outcome of these aspects we can compare our current solutions with the ADF Mapping Data Flow technology, and are able to judge whether the latter is ready to use for completely Cloud based solutions.
- SSIS Comparison: can ADF Mapping Data Flow do what SSIS can?
- Performance: do ADF Mapping Data Flow solutions perform up to par?
- Automation: is it possible to generate ADF Mapping Data Flow solutions based on source system metadata?
Here are my findings on each aspect:
Currently not all SSIS components that are being used in our Valid BI Framework have an equivalent activity in ADF Mapping Data Flow, or support the same functionalities. I also discovered that not all activity combinations are possible. E.g. a Conditional Split activity cannot be followed by a New Branch activity (Multicast in SSIS).
Up until now the performance of ADF pipelines that contain Mapping Data Flow activities is below par. Loading 1.250 records from a CSV file into an Azure SQL Database table took over 6 minutes. This is probably due to the fact that it is still in public preview. Once ADF Mapping Data Flow becomes generally available we need to reassess the performance.
Like almost every other Azure component ADF Mapping Data Flow works with ARM (Azure Resource Manager) templates. These templates consist of JSON code. Once the necessary functionalities are added and we can manually build all the necessary types of data flows (e.g. CSV2SA, SA2HIS) we can start generating the JSON code on a larger scale.
Because of the lack of crucial ADF Mapping Data Flow activities and functionalities, and the overall bad performance, it is my opinion that ADF Mapping Data Flow is not yet ready to be incorporated within our customer Cloud solutions. We are currently talking to Microsoft to see which functionalities will be added to ADF Mapping Data Flow, and in what period that will be.
So that's it for now! If you have any questions about this topic, don't hesitate to leave comment!