• Welcome to CloudMonks
  • +91 96660 64406
  • info@thecloudmonks.com

Azure Data Engineering With Microsoft Fabric

Microsoft Fabric Analytics Ecosystem:

Microsoft Fabric is an end-to-end analytics platform that unifies all data and analytics workloads — data engineering, data integration, data warehousing, real-time analytics, BI, and governance — into one ecosystem. It brings together the best parts of the Azure data services under one unified SaaS environment.


Core Idea of Microsoft Fabric:
  • Microsoft Fabric provides:
    • One unified lake (OneLake)
    • One engine (Lakehouse & Warehouse powered by Delta/Parquet)
    • One security & governance model (Purview)
    • One workspace for all analytics personas
    • One platform for end-to-end analytics (from ingestion → transformation → reporting)

Why Microsoft Fabric Is Powerful:
  • Unified platform
    • All analytics services under one umbrella.
  • SaaS & serverless
    • No need to manage clusters or compute.
  • Shared storage
    • Everything stored in OneLake using open Delta format.
  • Cost efficient
    • Single capacity-based pricing.
  • One security model
    • End-to-end governance with Purview.

Microsoft Fabric Analytics Master Program


  • Module 1: Introduction to Microsoft Fabric
  • 1.1 Overview of Fabric
    • What is Microsoft Fabric?
    • Why Fabric? Evolution from Azure Synapse, Power BI & Lakehouse
    • Fabric architecture: OneLake + Workloads
    • Key capabilities and unified analytics approach
  • 1.2 Fabric Workspaces & User Interface
    • Navigating Fabric UI
    • Creating workspaces
    • Licensing & capacity models
  • Module 2: OneLake – Unified Data Lake
  • 2.1 OneLake Concepts
    • What is OneLake?
    • Delta format & Lakehouse foundation
    • OneLake vs ADLS Gen2
  • 2.2 OneLake Shortcuts
    • ADLS Gen2 and Amazon S3 shortcuts
    • Managing distributed data across clouds
  • 2.3 OneLake Security & Governance
    • Role-based access
    • Cross-workspace security
    • Data lineage in OneLake
  • Module 3: Introduction to Microsoft Fabric Data Factory
    • Understanding Data Factory in Fabric (How it differs from Azure Data Factory)
    • Key capabilities: Pipelines, Dataflows, Git integration, Lakehouse connectivity
    • Real-time vs batch data processing in Fabric
    • Role of Data Factory in the end-to-end Analytics workflow
    • Data Factory home page overview
    • Navigation through Pipeline, Dataflows, and Data Pipelines
  • Module 4: Pipelines in Microsoft Fabric
  • 4.1 Understanding Pipelines
    • What is a Fabric Pipeline?
    • Comparison: ADF Pipelines vs Fabric Pipelines
    • Key components: Activities, Objects, Triggers
  • 4.2 Pipeline Activities
    • Move Data Activities
      • Copy activity
      • Data movement sources & sinks
      • File format support
    • Control Flow Activities
      • If condition
      • ForEach
      • Wait
      • Switch
      • Set Variables
    • Transformation Activities
      • Dataflow Gen2
      • Notebook activity
      • Spark job activity
  • 4.3 Working with Pipeline Parameters
    • Pipeline parameters
    • Variable types
    • Dynamic content using expressions
    • Pipeline debugging & validation
  • Module 5: Dataflow Gen2 (Power Query in Fabric)
  • 5.1 Introduction to Dataflow Gen2
    • What is Dataflow Gen2?
    • Differences from ADF Mapping Dataflows
    • Low-code/no-code ETL using Power Query
  • 5.2 Dataflow Components
    • Connectors & source options
    • Power Query Editor (M code basics)
    • Transformations:
      • Filter, Sort
      • Merge, Append
      • Group by
      • Pivot/Unpivot
      • Derived columns
  • 5.3 Dataflow Gen2 Outputs
    • Writing data to Lakehouse (Delta tables)
    • Write modes: Append, Overwrite
    • Scheduling Dataflows
    • Performance optimization techniques
  • Module 6: Connecting to Lakehouse, Warehouse & Data Sources
  • 6.1 Data Sources Supported
    • ADLS Gen2
    • Azure SQL Database
    • APIs & REST endpoints
    • On-prem SQL (using Gateway)
  • 6.2 Data Integration Patterns
    • Ingesting data into Lakehouse
    • Writing to Tables, Files, Folders in OneLake
    • Using Warehouse connectors
    • Uploading Excel/CSV files
  • Module 7: Orchestration with Triggers & Schedules
  • Triggers in Fabric
    • Types of Triggers in Fabric
      • Manual
      • Scheduled
      • Event-Based (upcoming features)
    • Setting Recurrence
    • Time Zone configuration
    • Managing Trigger history
    • Best practices for scheduling workloads
  • Module 8: Monitoring & Activity Debugging
    • Pipeline run view
    • Dataflow execution logs
    • Failure & Success output messages
    • Data Lineage & tracking
    • Performance dashboard
    • Error handling & retries
    • Logs and alerts
  • Module 9: Introduction to Spark Architecture in Microsoft Fabric
    • What is Apache Spark?
    • Spark in Microsoft Fabric (Workspace, Lakehouse, Notebook compute)
    • Components of Spark:
      • Driver
      • Executors
      • Cluster Manager
      • Spark Context & Session
    • How Spark executes jobs in Fabric
    • Stages, Tasks, DAG
    • Lazy evaluation concept
    • Storage formats used by Fabric (Delta, Parquet)
    • Understanding Fabric’s compute environment for Spark
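
A minimal sketch of lazy evaluation in a Fabric notebook, where a spark session is pre-created: transformations only extend the DAG, and nothing executes until an action runs. The DataFrame below is synthetic.

    from pyspark.sql import functions as F

    df = spark.range(1_000_000)                         # transformation: nothing runs yet
    doubled = df.withColumn("double", F.col("id") * 2)  # still lazy; Spark only extends the DAG
    evens = doubled.filter(F.col("id") % 2 == 0)        # still lazy

    evens.explain()        # inspect the physical plan before execution
    print(evens.count())   # action: triggers the job (stages + tasks)
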
  • Module 10: Reading Data from CSV Files
    • Reading CSV files using PySpark
    • Options: header, delimiter, inferSchema, multiline
    • Using Spark’s file path in Lakehouse / OneLake
    • Handling corrupt/unwanted records
    • Saving CSV data into tables
    • Best practices for handling large CSV files
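
A minimal PySpark sketch of the CSV options listed above, run in a Fabric notebook with a default Lakehouse attached; the file path and table name are hypothetical.

    df = (spark.read
          .option("header", "true")        # first row holds column names
          .option("delimiter", ",")        # field separator
          .option("inferSchema", "true")   # sample the file to guess column types
          .option("multiLine", "true")     # allow quoted fields that span lines
          .option("mode", "PERMISSIVE")    # tolerate corrupt records instead of failing
          .csv("Files/raw/sales.csv"))     # relative path into the attached Lakehouse

    df.write.mode("overwrite").saveAsTable("sales_csv")  # persist as a Delta table
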
  • Module 11: Reading Data from JSON Files
    • Reading multiline JSON
    • When to use spark.read.json() vs spark.read.format("json")
    • Working with nested JSON structures
    • Exploding arrays and nested structures
    • Converting JSON into structured DataFrames
    • Writing processed JSON into Delta tables
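
A minimal sketch of multiline JSON ingestion and array flattening; the path and the customer/orders field names are hypothetical. Note that spark.read.json(path) and spark.read.format("json").load(path) are equivalent; the format() form is convenient when the format is chosen dynamically.

    from pyspark.sql import functions as F

    raw = (spark.read
           .option("multiLine", "true")    # whole file is one JSON document or array
           .json("Files/raw/orders.json"))

    flat = (raw
            .select("customer.id", "customer.name",
                    F.explode("orders").alias("order"))    # one row per array element
            .select("id", "name", "order.order_id", "order.amount"))

    flat.write.format("delta").mode("append").saveAsTable("orders_flat")
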
  • Module 12: Reading Data from XML Files
    • Installing and using Spark XML library in Fabric Notebook
    • Reading XML using:
      • spark.read.format("xml")
    • XML options: rowTag, attributePrefix, valueTag
    • Flattening XML hierarchical data
    • Handling missing / irregular XML structures
    • Saving XML output to Delta tables
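
A minimal sketch, assuming the spark-xml library (com.databricks:spark-xml) has been added to the Fabric environment; the file path and rowTag value are hypothetical.

    df = (spark.read
          .format("xml")
          .option("rowTag", "book")        # XML element that maps to one row
          .option("attributePrefix", "_")  # prefix for columns built from XML attributes
          .option("valueTag", "_value")    # column name for bare element text
          .load("Files/raw/catalog.xml"))

    df.write.mode("overwrite").saveAsTable("catalog_xml")
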
  • Module 13: Reading Data from Excel Files
    • Using Spark Excel library
    • Reading .xlsx and .xls files
    • Options: header, sheetName, dataAddress
    • Handling multiple sheets
    • Converting Excel data to DataFrame
    • Writing Excel processed data to Delta tables
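
A minimal sketch, assuming the spark-excel library (com.crealytics:spark-excel) is installed in the environment; the path, sheet name, and table name are hypothetical.

    df = (spark.read
          .format("com.crealytics.spark.excel")
          .option("header", "true")              # first row holds column names
          .option("dataAddress", "'Sheet1'!A1")  # sheet and top-left cell to read from
          .load("Files/raw/budget.xlsx"))

    df.write.mode("overwrite").saveAsTable("budget_excel")
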
  • Module 14: DataFrame Operations
    • Creating DataFrames (manual + from files)
    • Schema definition (manual + struct types)
    • Column operations: add, drop, rename
    • Filtering, sorting, null handling
    • Type casting & conversions
    • Handling duplicates
    • Aggregations, groupBy operations
    • Joins: inner, left, right, outer, semi, anti
    • Window functions
    • Statistical operations
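
A minimal sketch tying several of the operations above together on made-up data: deduplication, casting, renaming, filtering, null handling, a groupBy aggregate, a window rank, and a join.

    from pyspark.sql import functions as F, Window

    df = spark.createDataFrame(
        [("A", "east", 100), ("B", "east", 250), ("C", "west", 175), ("C", "west", 175)],
        ["customer", "region", "amount"],
    )

    cleaned = (df.dropDuplicates()                                      # drop duplicate rows
                 .withColumn("amount", F.col("amount").cast("double"))  # type cast
                 .withColumnRenamed("customer", "customer_id")          # rename a column
                 .filter(F.col("amount") > 0)                           # row filter
                 .na.fill({"region": "unknown"}))                       # null handling

    w = Window.partitionBy("region").orderBy(F.desc("amount"))
    ranked = cleaned.withColumn("rank_in_region", F.rank().over(w))     # window function
    totals = cleaned.groupBy("region").agg(F.sum("amount").alias("total_amount"))

    ranked.join(totals, on="region", how="inner").show()                # join detail + aggregate
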
  • Module 15: DataFrame Transformations
    • Narrow vs Wide transformations
    • Common transformations:
      • select(), withColumn(), filter(), drop(), distinct()
      • join(), union(), repartition(), coalesce()
    • Chaining transformations
    • Transformation functions: map, flatMap
    • Using Spark SQL functions (F module)
    • Repartitioning strategies for performance
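
A minimal sketch contrasting narrow transformations (no shuffle) with a wide one, and coalesce() with repartition(); the data and column names are made up.

    from pyspark.sql import functions as F

    orders = spark.createDataFrame(
        [(1, "east", 20.0), (2, "west", 35.5), (3, "east", 12.25)],
        ["order_id", "region", "amount"],
    )

    summary = (orders
               .select("region", "amount")                      # narrow
               .filter(F.col("amount") > 15)                    # narrow
               .withColumn("amount_inr", F.col("amount") * 83)  # narrow
               .groupBy("region")                               # wide: forces a shuffle
               .agg(F.sum("amount_inr").alias("total_inr")))

    # repartition(n) does a full shuffle; coalesce(n) only merges existing
    # partitions, so it is the cheaper choice when reducing partitions for a write.
    summary.coalesce(1).show()
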
  • Module 16: Delta Lake Fundamentals
    • What is Delta Lake?
    • Delta vs Parquet
    • ACID Transactions
    • Schema Enforcement & Schema Evolution
    • Delta Logs and File Structure
    • Compaction & Optimization
    • Time Travel operations
    • Using Delta in Fabric Lakehouse
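
A minimal sketch of schema evolution and time travel on a hypothetical Delta table in the Lakehouse.

    df_v0 = spark.createDataFrame([(1, "alpha")], ["id", "name"])
    df_v0.write.format("delta").mode("overwrite").saveAsTable("demo_delta")

    # Schema evolution: the new score column is merged into the table schema
    df_v1 = spark.createDataFrame([(2, "beta", 9.5)], ["id", "name", "score"])
    (df_v1.write.format("delta")
          .mode("append")
          .option("mergeSchema", "true")
          .saveAsTable("demo_delta"))

    spark.sql("DESCRIBE HISTORY demo_delta").show()               # versions in the Delta log
    spark.sql("SELECT * FROM demo_delta VERSION AS OF 0").show()  # time travel to version 0
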
  • Module 17: Working with Delta Tables
    • Creating managed & unmanaged Delta tables
    • Converting existing data to Delta format
    • CRUD operations on Delta
      • Insert, Update, Delete
      • Merge (UPSERT)
    • Versioning & Rollback
    • Vacuum & Optimize commands
    • Partitioning Delta tables
    • Auto Loader with Delta (optional)
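
A minimal sketch of the CRUD and MERGE (UPSERT) operations listed above, using the standard DeltaTable API on hypothetical table and column names.

    from delta.tables import DeltaTable

    spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "name"]) \
         .write.format("delta").mode("overwrite").saveAsTable("dim_demo")

    updates = spark.createDataFrame([(2, "beta-v2"), (3, "gamma")], ["id", "name"])
    target = DeltaTable.forName(spark, "dim_demo")

    (target.alias("t")
           .merge(updates.alias("s"), "t.id = s.id")
           .whenMatchedUpdate(set={"name": "s.name"})                      # update existing keys
           .whenNotMatchedInsert(values={"id": "s.id", "name": "s.name"})  # insert new keys
           .execute())

    spark.sql("DELETE FROM dim_demo WHERE id = 1")     # delete
    spark.sql("OPTIMIZE dim_demo")                     # compact small files
    spark.sql("VACUUM dim_demo RETAIN 168 HOURS")      # purge stale files (default retention)
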
  • Module 18: Spark Compute in Microsoft Fabric
    • Fabric Runtime for Apache Spark
    • Types of compute environments
    • How Compute is allocated
    • Session-level compute vs Job compute
    • Autoscaling behavior
    • Managing Spark jobs in Monitoring Hub
    • Optimizing compute usage & cost
    • When to use SQL Engine vs Spark Engine
  • Module 19: Spark SQL
    • Creating SQL temporary views
    • Running SQL queries in notebooks
    • SQL functions for analytics
    • Using Spark SQL to query Delta tables
    • Joins, aggregations, window functions
    • Optimizing Spark SQL queries
    • Saving SQL results as tables
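
A minimal sketch: registering a temporary view over made-up data, querying it with an aggregate plus a window function, and persisting the result as a table.

    sales = spark.createDataFrame(
        [("east", "2024-01", 100), ("east", "2024-02", 150), ("west", "2024-01", 80)],
        ["region", "month", "amount"],
    )
    sales.createOrReplaceTempView("sales_v")

    result = spark.sql("""
        SELECT region,
               month,
               SUM(amount) AS total,
               RANK() OVER (PARTITION BY region ORDER BY SUM(amount) DESC) AS rnk
        FROM sales_v
        GROUP BY region, month
    """)

    result.write.mode("overwrite").saveAsTable("sales_summary")  # persist results
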
  • Module 20: Types of Views in Spark / Fabric
    • Local Temporary Views
    • Global Temporary Views
    • When to use each
    • Limitations of temporary views
    • Accessing global views across notebooks
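
A minimal sketch of the two view scopes; the data is made up. Local temp views are visible only to the session that created them, while global temp views live in the reserved global_temp schema and are visible to other sessions on the same cluster.

    df = spark.range(5)

    df.createOrReplaceTempView("local_v")         # session-scoped
    df.createOrReplaceGlobalTempView("shared_v")  # cluster-scoped, in schema global_temp

    spark.sql("SELECT * FROM local_v").show()
    spark.sql("SELECT * FROM global_temp.shared_v").show()
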
  • Module 21: Types of Tables in Fabric/Spark
    • Managed Tables
      • Storage in Lakehouse managed location
    • External Tables
      • Storage outside managed area
    • Delta Tables
    • Parquet / Other format tables
    • SQL Warehouse Tables vs Lakehouse Tables
    • Table creation using Spark SQL
    • Best practices for table selection
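
A minimal Spark SQL sketch of managed vs external (unmanaged) table creation; the OneLake ABFS path is hypothetical. Dropping the external table removes only metadata, not the underlying files.

    spark.sql("""
        CREATE TABLE IF NOT EXISTS managed_orders (id INT, amount DOUBLE)
        USING DELTA
    """)  # data lives in the Lakehouse-managed Tables area

    spark.sql("""
        CREATE TABLE IF NOT EXISTS external_orders (id INT, amount DOUBLE)
        USING DELTA
        LOCATION 'abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse.Lakehouse/Files/external/orders'
    """)  # storage outside the managed area
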
  • Module 22: Introduction to Real-Time Analytics in Fabric
  • 22.1 Overview
    • What is Real-Time Analytics?
    • Role of KQL Database in Microsoft Fabric
    • Difference between Warehouse, Lakehouse & KQL DB
    • Architecture of RTA in Fabric
  • 22.2 Components
    • KQL Database
    • Event Streams
    • Data Connections
    • Real-time dashboards
  • Module 23: Introduction to Microsoft Fabric & Data Warehouse
    • Understanding Lakehouse vs Warehouse vs KQL DB
    • Role of the Fabric Data Warehouse in the ecosystem
    • Differences between Fabric Warehouse and Azure Synapse Dedicated SQL Pool
    • Architecture overview:
      • OneLake
      • Delta format
      • SQL Endpoint
      • Direct Lake mode
  • Module 24: Fabric Warehouse Architecture
    • Storage layer (Delta tables in OneLake)
    • Compute layer (distributed SQL compute)
    • Metadata & cataloging
    • Distributed query processing model
    • How Fabric Warehouse achieves performance
    • Warehouse item structure: Schemas, Tables, Views, Procedures
  • Module 25: Setting Up Fabric Warehouse
    • Creating a new Warehouse in Fabric Workspace
    • Understanding capacities & licensing
    • Warehouse UI & navigation
    • Creating schemas and managing objects
    • Permissions & role-based access control (RBAC)
    • Integrating Warehouse with Lakehouse
  • Module 26: Tables & Data Modeling
  • 26.1 Table Types
    • Managed tables
    • External tables (Delta Lake / Parquet / Lakehouse)
    • Differences & use cases
  • 26.2 Data Modeling Concepts
    • Star schema vs Snowflake schema
    • Dimensions & Facts
    • Primary keys & surrogate keys
    • Slowly Changing Dimensions (SCD) Types 0–3
  • 26.3 Table Operations
    • CREATE / ALTER / DROP table commands
    • Defining data types for Warehouse
    • Working with constraints (PK, FK, Unique, Not Null)
    • Partitioning considerations in Fabric
  • Module 27: Ingesting Data into Warehouse
  • Ingestion Methods
    • Data Pipelines (Fabric Data Factory)
    • Dataflow Gen2
    • Notebooks & PySpark
    • COPY INTO Command
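
A minimal sketch of the "Notebooks & PySpark" ingestion path: landing a raw file as a Lakehouse Delta table that the Warehouse can then query. The path and names are hypothetical; COPY INTO is the T-SQL equivalent run directly in the Warehouse.

    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("Files/landing/transactions.csv"))

    (raw.write
        .format("delta")
        .mode("append")
        .saveAsTable("staging_transactions"))
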
  • Module 28: Integration with Power BI
    • Connecting Warehouse to Power BI
    • DirectQuery, Direct Lake & Import modes
    • Automatic model creation from Warehouse
    • Performance benefits of Direct Lake
    • Publishing & refreshing reports
  • Module 29: Performance Optimization
    • Query performance tuning
    • Delta file optimization (Optimize, Vacuum)
    • Index-like concepts in Fabric (Clustered storage ordering)
    • Techniques to speed up ingestion
    • Using materialized views for performance
    • Avoiding long-running queries
  • Module 30: Power BI in Microsoft Fabric
    • Creating datasets
    • Relationships, measures, DAX basics
  • 30.1 Visualization & Reporting
    • Interactive dashboards
    • Power BI Desktop vs Fabric Power BI
  • 30.2 Direct Lake Mode
    • DirectQuery vs Import vs Direct Lake
    • Performance optimization

    Train your teams on the theory and build hands-on mastery of enterprise-essential cloud computing skills such as security, compliance, and migration on AWS, Azure, and Google Cloud Platform.

    Talk With Us