Microsoft Fabric Incremental Load Techniques Explained

Mitra

5th Aug 2025


Reloading full datasets every time data updates?

That’s a recipe for bloated pipelines, wasted compute, and missed insights. Incremental loading solves that by moving only what’s changed, nothing more, nothing less. It’s faster, leaner, and built for modern data needs.

This blog breaks down Microsoft Fabric incremental load strategies that actually work. You'll learn how to implement Change Data Capture (CDC), use timestamp filtering, configure incremental refresh with Dataflow Gen2, and build efficient Data Factory pipelines. Every technique is explained with hands-on clarity: no fluff, just what you need to build smarter, scalable data systems.

Key Takeaways:

  • Incremental load reduces processing by moving only new or changed records.
  • Microsoft Fabric supports techniques like Change Data Capture (CDC), timestamp filters, and Dataflow Gen2 refresh.
  • It boosts performance, lowers costs, and supports near-real-time analytics.
  • Data Factory pipelines can automate incremental workflows at scale.
  • WaferWire can help you design and implement these strategies efficiently.

What is Incremental Loading?

Incremental loading refers to the process of updating only the new or changed data in a destination, rather than performing a full data reload. It's especially useful when working with large datasets, time-sensitive applications, or frequent data refreshes.

Key Benefits:

  • Efficiency: Reduces compute time and resource usage.
  • Near Real-Time Syncs: Frequent updates are possible without long refresh windows.
  • Scalability: Easily handles big data scenarios with minimal overhead.

Common techniques include:

  • Timestamps: Filter by last modified column.
  • Change Data Capture (CDC): Detects changes via the database's transaction logs.
  • Control tables: Maintain metadata for tracking previous loads.

Incremental Load from Data Warehouse to Lakehouse in Microsoft Fabric

Incremental load in Microsoft Fabric helps you move only new or updated data, so you don't have to reload everything. This approach boosts performance, cuts compute costs, and delivers faster insights. Let’s break down how to set up incremental data movement from a Data Warehouse to a Lakehouse using Fabric’s Data Factory pipeline.

Step-by-Step Overview

At a high level, the process involves selecting a watermark column, creating a watermark tracking table, building a Data Factory pipeline, and updating the watermark value after each successful load.

1. Select a Watermark Column

The first step is choosing a column (like LastModifytime) that indicates changes in the source table. This column acts as a watermark, identifying new or updated records since the last pipeline run.

2. Create a Watermark Tracking Table

A simple table (watermarktable) in your Data Warehouse stores the last known watermark value. This acts as a reference point for your next incremental load.

CREATE TABLE watermarktable (
    TableName varchar(255),
    WatermarkValue DATETIME2(6)
);

You’ll insert an initial default value, such as:

INSERT INTO watermarktable VALUES ('data_source_table','1/1/2010 12:00:00 AM');

3. Develop a Pipeline in Data Factory

Build a pipeline in Microsoft Fabric's Data Factory using the following activities:

  • Lookup Activity 1: Fetch the previous watermark value from watermarktable.
  • Lookup Activity 2: Retrieve the latest watermark using a query like:
    SELECT MAX(LastModifytime) as NewWatermarkvalue FROM data_source_table
  • Copy Activity: Pull records where LastModifytime is greater than the old watermark and less than or equal to the new watermark:
    SELECT * FROM data_source_table
    WHERE LastModifytime > '@{activity('LookupOldWaterMarkActivity').output.firstRow.WatermarkValue}'
    AND LastModifytime <= '@{activity('LookupNewWaterMarkActivity').output.firstRow.NewWatermarkvalue}'

  • Store to Lakehouse: Copy results to your target Lakehouse folder using dynamic file names such as:
    @CONCAT('Incremental-', pipeline().RunId, '.txt')

4. Update the Watermark

After a successful copy, use a Stored Procedure Activity to update the watermark value in watermarktable, setting it to the newly captured maximum value.

CREATE PROCEDURE usp_write_watermark @LastModifiedtime datetime, @TableName varchar(50)
AS
BEGIN
    UPDATE watermarktable
    SET WatermarkValue = @LastModifiedtime
    WHERE TableName = @TableName
END
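In the Stored Procedure activity itself, the procedure's parameters can be mapped to values captured earlier in the run. One possible wiring, shown as a sketch that reuses the activity names from the Copy query above:

  • @LastModifiedtime: @activity('LookupNewWaterMarkActivity').output.firstRow.NewWatermarkvalue
  • @TableName: data_source_table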

5. Run and Monitor

Trigger the pipeline. After execution, check your Lakehouse folder to verify that a new data file with the incremental records has been created.

6. Test by Adding New Data

To validate incremental logic:

  • Insert new records into data_source_table
  • Re-run the pipeline
  • Confirm that only new records are picked up and written as a new file in the Lakehouse
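For instance, a quick way to generate a test row (the PersonID and Name columns here are purely illustrative; the pipeline only assumes LastModifytime exists):

INSERT INTO data_source_table (PersonID, Name, LastModifytime)
VALUES (100, 'Test Person', CURRENT_TIMESTAMP);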

Incremental data loads in Microsoft Fabric offer a resource-efficient way to keep your Lakehouse in sync with source systems.

Ideal for time-series data, logs, and transactional records, this approach keeps your analytics fresh without the cost of full reloads.

Automating Incremental Loads Using Config Files and Notebooks in Microsoft Fabric

While a basic pipeline setup helps you handle incremental loads for a single table, scaling that logic across multiple datasets can become tedious. Microsoft Fabric supports a more modular and automated approach using a combination of Lakehouses, JSON-based config files, notebooks, and dynamic pipeline activities, which is ideal for large-scale or growing data environments.

This method enables data engineers to maintain a clean, extensible structure that scales with evolving business requirements.

High-Level Workflow

Here’s a breakdown of the scalable incremental load process using Microsoft Fabric Data Factory and notebooks:

1. Use Lakehouse as Both Source and Staging

Rather than relying on an external database, this method demonstrates how to use two Fabric Lakehouses, one as the source and the other as a staging area. This keeps everything within Fabric’s unified environment, optimizing speed and simplifying governance.

2. Create a Config File for Load Control

Instead of hardcoding table names or watermark values, you maintain a JSON config file that tracks the last_updated timestamp for each table. This file is stored in the Lakehouse (not as a table, but as a file) and allows pipelines to loop over multiple tables without manual changes.

Example config.json:

[
  {
    "table_name": "customers",
    "last_updated": "2024-01-04 08:11:07"
  },
  {
    "table_name": "orders",
    "last_updated": "2024-01-04 08:11:07"
  }
]

This approach dramatically reduces maintenance overhead: to onboard new tables, you simply add entries to the config file.

3. Pipeline with Dynamic Lookup and Looping

In the Data Factory pipeline:

  • A lookup activity fetches the JSON config.
  • A "For Each" loop iterates through each table entry.
  • Variables like starttime, table_name, and last_updated are dynamically assigned per iteration.
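Inside the loop, each field of the current config entry is read with @item(). A minimal sketch of the per-iteration assignments, using the variable names above (capturing starttime with utcNow() is one option, not prescribed by the pipeline):

  • table_name: @item().table_name
  • last_updated: @item().last_updated
  • starttime: @utcNow()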

4. Execute a Notebook for Each Table

Within the loop, a notebook activity is invoked with parameters:

  • table – the table name
  • last_updated – previous max watermark
  • max_updated – current max watermark (calculated within the notebook)

Sample notebook logic using PySpark:

delta_table_path = f"abfss://<your_workspace_name>@onelake.dfs.fabric.microsoft.com/<Lakehouse>.Lakehouse/Tables/{table}"
df = spark.read.format("delta").load(delta_table_path) \
    .filter(f"last_updated >= '{last_updated}' AND last_updated < '{max_updated}'")

This filters only the new or updated records between two points in time.
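The max_updated value itself can be computed in the same notebook before filtering, and the filtered slice then written to staging. A minimal sketch, where the staging table name is an assumption for illustration:

from pyspark.sql import functions as F

# Current high-water mark: the latest last_updated value in the source table
df_source = spark.read.format("delta").load(delta_table_path)
max_updated = df_source.agg(F.max("last_updated")).collect()[0][0]

# df is the filtered slice from the snippet above; append it to a staging table
df.write.format("delta").mode("append").saveAsTable(f"staging_{table}")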

5. Update the Config File Post Load

After successful processing, the pipeline updates the same config.json with the latest last_updated timestamp, ensuring the next run starts from the correct point.

for item in data:
    item["last_updated"] = new_time

This automated approach keeps your incremental loads accurate and continuous, with no manual fixes required. With a config-driven setup, notebooks, and Fabric's Lakehouse, teams can scale across datasets without hardcoding or duplication.

Whether migrating existing workloads or building new ones, this technique delivers the flexibility and automation today’s data platforms need.

Stop wasting resources on full reloads. WaferWire helps you implement CDC, timestamp filters, and Dataflow Gen2 for real-time, efficient data sync in Microsoft Fabric.

Incremental Data Refresh Using Dataflow Gen2 in Microsoft Fabric

Dataflow Gen2 in Microsoft Fabric simplifies incremental loading by automating the detection of new or changed records, with no manual tracking of timestamps or watermarks needed. Ideal for large-scale data ingestion and transformation, it streamlines performance and reduces engineering overhead with built-in refresh logic.

What is Incremental Refresh in Dataflow Gen2?

Incremental refresh in Dataflow Gen2 means only new or changed data is processed and loaded, rather than reloading the entire dataset each time. This offers:

  • Faster refreshes by skipping unchanged data
  • Lower compute and memory usage
  • Improved reliability through shorter run times

It’s particularly well-suited for scenarios like transactional systems, time-series data, or large-scale reporting pipelines.

Prerequisites and Supported Destinations

To use this feature, you must:

  • Be on a Microsoft Fabric capacity
  • Have a Date or DateTime column for change tracking
  • Ensure query folding is supported (where logic is pushed to the source system)

Supported data destinations include:

  • Fabric Lakehouse (preview)
  • Fabric Warehouse
  • Azure SQL Database
  • Azure Synapse Analytics

Other destinations can still benefit from this feature if paired with a staging query.

How It Works: Behind the Scenes

Incremental refresh in Dataflow Gen2 divides data into buckets based on a chosen DateTime column, detects which buckets have changed since the last refresh, processes only those buckets in parallel, and replaces the outdated data while leaving everything else untouched. Here's how the process unfolds:

1. Evaluate Changes

  1. Fabric checks if the max value in the DateTime column has changed since the last refresh.
  2. Only changed “buckets” are marked for processing.

2. Retrieve Data in Parallel

  1. Dataflow fetches data only from changed buckets, improving speed and reducing load.
  2. This retrieval happens in parallel, capped at 10 concurrent evaluations for Lakehouse.

3. Replace Data in Destination

  1. Dataflow replaces the old bucket data with fresh data.
  2. All other historical data outside that bucket remains untouched.

Key Configuration Settings

When setting up incremental refresh, you’ll define:

  • DateTime column: Used to determine changes.
  • Historical range: e.g., load only the past month, 3 quarters, etc.
  • Bucket size: Controls how granular each refresh unit is (days, months, etc.)
  • Change detection: Refresh only when the max value in the column changes.
  • Concluded periods (optional): Ensures only completed periods (like full months) are processed.

There are also advanced options, such as requiring the query to fully fold (highly recommended for performance).

Limitations to Note

Some caveats apply when using Dataflow Gen2 with Lakehouse:

  • Max 10 concurrent evaluations for Lakehouse
  • No support for dynamic schema, destination schema must be fixed
  • No support for multiple writers, avoid Spark or parallel writes during refresh
  • Update method = Replace only, partial updates aren’t supported
  • Bucket limits: Max 50 per query, 150 per dataflow

What’s New vs. Gen1?

Dataflow Gen2 makes incremental refresh a first-class feature:

  • Setup timing: configured after publishing in Gen1; configured directly in the editor in Gen2.
  • Historical range required: yes in Gen1; no in Gen2.
  • Parameter handling: manual in Gen1; automated via query folding and filters in Gen2.

Dataflow Gen2 incremental refresh is a code-free, intuitive solution for managing Microsoft Fabric incremental load. Whether you're building dashboards, running ML pipelines, or maintaining operational reporting, this feature keeps your data current without overloading systems.

If you’re already using Power BI or Fabric Lakehouse, this approach offers a smart, low-friction way to stay efficient as your data grows.

Using Change Data Capture (CDC) with Microsoft Fabric Pipelines

One of the most reliable ways to implement Microsoft Fabric incremental load patterns, especially for SQL-based sources, is by leveraging Change Data Capture (CDC). With CDC, you track inserts, updates, and deletes at the source without needing to scan the entire table.

In Microsoft Fabric (via integrated Data Factory pipelines), you can configure a CDC-based pipeline that:

  • Identifies changes in an SQL source using cdc.fn_cdc_get_net_changes_* functions.
  • Passes parameters dynamically (such as tumbling window timeframes).
  • Loads only the changed records into a destination like Fabric Lakehouse or Azure Blob Storage.

Key Steps in CDC-Based Incremental Load

Enable CDC on the source database and table using:

EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = 'dbo',
    @source_name = 'customers',
    @role_name = NULL,
    @supports_net_changes = 1;
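In the Lookup and Copy activities, the trigger window is typically mapped to log sequence numbers and passed to the net-changes function. A sketch of such a query, assuming the default capture instance name dbo_customers and illustrative @windowStart / @windowEnd datetime values:

DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @windowStart);
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @windowEnd);

-- The row count feeds the If Condition; a SELECT * over the same function feeds the Copy activity
SELECT COUNT(1) AS changecount
FROM cdc.fn_cdc_get_net_changes_dbo_customers(@from_lsn, @to_lsn, 'all');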

1. Set up the pipeline in Data Factory (inside Microsoft Fabric):

  1. Use Lookup activity to detect changes within a time window.
  2. Add If Condition to proceed only if changes exist.
  3. Add a Copy activity to extract delta data and write it to storage.

2. Configure dynamic parameters for window start and end

  • Use @pipeline().parameters together with @trigger().outputs.windowStartTime and @trigger().outputs.windowEndTime to bound each run.
  • Store output in time-based partitions like:
    customers/incremental/yyyy/MM/dd/filename.csv
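One way to build that folder and file path dynamically in the Copy activity sink, shown as a sketch rather than the only option:

@{concat('customers/incremental/', formatDateTime(trigger().outputs.windowStartTime, 'yyyy/MM/dd'), '/', pipeline().RunId, '.csv')}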

3. Schedule with Tumbling Window Triggers, so the pipeline auto-runs at set intervals, ensuring new changes are regularly ingested with minimal latency.

Even though this was originally an Azure Data Factory setup, Fabric offers a streamlined version of this exact process:

  • No need to leave the workspace for configuring pipelines.
  • CDC works seamlessly with Fabric-native destinations like Lakehouse or Warehouse.
  • With tumbling window triggers and parameterized SQL queries, your loads are both precise and automated.

This method is ideal when you want a reliable and low-latency sync between operational databases and analytical storage without constant full refreshes.


Foundational Approaches to Incremental Loading in the Microsoft Ecosystem

Foundational approaches to incremental loading in the Microsoft ecosystem include using watermark columns, SQL Change Tracking, LastModifiedDate for file-based sources, and time-partitioned folders. These methods help load only new or updated data, improving performance. With Microsoft Fabric, these techniques are unified into a single platform with enhanced capabilities, including Delta Lake compatibility, seamless connections, and simplified orchestration.

Before Microsoft Fabric unified tools like Data Factory, Synapse, and Power BI into a single platform, several incremental load techniques were commonly used across Azure services. These strategies still form the core principles behind Microsoft Fabric's incremental load capabilities today.

Here’s a quick overview of the traditional methods that Fabric now builds upon:

1. Watermark Columns

One of the most common patterns involves using a watermark column, such as a timestamp (LastModifiedTime) or incremental ID. This method compares old vs. new values to copy only changed data.

Example Use Cases:

  • Loading delta rows from Azure SQL to Blob Storage.
  • Using a control table to track watermark values.

2. SQL Change Tracking

SQL Server and Azure SQL Database offer a lightweight Change Tracking feature. This enables you to identify rows that were inserted, updated, or deleted without full table scans. Fabric can now ingest from sources using Change Tracking when configured via Data Factory.
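To illustrate the mechanism (outside Fabric itself), a Change Tracking read compares a stored sync version against CHANGETABLE; the table name and key column below are illustrative:

-- Rows inserted, updated, or deleted since the last stored version
SELECT CT.PersonID, CT.SYS_CHANGE_OPERATION
FROM CHANGETABLE(CHANGES dbo.data_source_table, @last_sync_version) AS CT;

-- Current version, stored away for the next incremental run
SELECT CHANGE_TRACKING_CURRENT_VERSION();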

3. LastModifiedDate for File-Based Sources

When dealing with file storage (like Azure Blob or Data Lake), you can use the LastModifiedDate of files to detect and load only new or updated files. However, scanning large volumes of files, even if copying just a few, can impact performance.

4. Time-Partitioned Folders

If your source systems organize files in a time-based folder structure (like /2024/07/08/filename.csv), Fabric pipelines can efficiently copy only the most recent partitioned data. This is often the fastest strategy for large-scale file movement.

With the integration of Data Factory into Fabric, all of the above techniques are still relevant, but with improved accessibility. You now configure these methods directly in your Fabric workspace, taking advantage of:

  • Unified UI for pipelines
  • Built-in Delta Lake compatibility
  • Seamless connection to Lakehouse or Warehouse destinations
  • Fabric-native scheduling and orchestration

Microsoft Fabric doesn't reinvent incremental loading; it simplifies, centralizes, and enhances it for modern, scalable analytics.

Make Incremental Loads Effortless with WaferWire

Implementing Microsoft Fabric incremental load techniques requires more than just technical know-how; it demands precision, scalability, and alignment with your data goals.

WaferWire simplifies this process.

Our team helps you choose the right approach (CDC, timestamps, or Dataflow Gen2), set up efficient Data Factory pipelines, and optimize Lakehouse-Warehouse sync for real-time insights and lower costs.

As a Microsoft Solutions Partner, WaferWire ensures your incremental load setup is not just effective but enterprise-ready.

Conclusion

As data grows and demands for real-time insights increase, incremental loading becomes essential not just for performance, but for smarter, cost-effective data operations. Microsoft Fabric makes this seamless with built-in support for Change Data Capture, timestamp-based filtering, and Dataflow Gen2’s incremental refresh.

By loading only what’s changed, you cut down on compute, speed up pipelines, and keep your analytics fresh.

Looking to streamline your data architecture on Microsoft Fabric?

WaferWire helps enterprises design and implement modern data pipelines that scale.

Talk to WaferWire today to modernize with confidence.

FAQs

1. What is incremental load in Microsoft Fabric?
Incremental load refers to updating only the data that has changed since the last load, avoiding a full reload of datasets. This improves performance and reduces compute costs.

2. Which Microsoft Fabric components support incremental load?
Data Factory, Dataflow Gen2, Lakehouse, and Warehouse all support incremental loading through various methods like CDC and timestamp filtering.

3. What is Change Data Capture (CDC)?
CDC is a technique that captures inserts, updates, and deletes in a source system, enabling precise and efficient incremental data movement.

4. Can I use incremental refresh with non-Microsoft data sources?
Yes, as long as the source supports query folding and includes a suitable date or timestamp column for filtering changes.

5. What’s the best way to manage multiple tables for incremental load?
Using a config file with table names and last_updated timestamps allows dynamic and scalable orchestration without hardcoding logic for each table.
