Delta Loader Overview

The DeltaLoader performs incremental data loads on Delta tables. It provides multiple strategies for reading only new or changed data, and it maintains an audit trail of every load through automated metadata management.

Key Capabilities

  • Multiple strategies: Choose between Change Data Feed (CDF) and timestamp-based loading
  • Automatic metadata tracking: Seamless continuation across load operations
  • Flexible configuration: Customize behavior for different use cases

How Delta Loading Works

Delta loading processes only the data that has changed since your last load. For large datasets, this avoids rereading unchanged rows, which shortens load times and reduces resource consumption.

Core Workflow

  1. Configure: Choose your loading strategy and set options
  2. Create: Use DeltaLoaderFactory to instantiate the appropriate loader
  3. Load: Call get_data() to retrieve incremental data
  4. Process: Apply transformations and business logic
  5. Write: Update your target tables
  6. Commit: Mark the load as processed to update metadata
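The six steps above can be sketched with a small in-memory stand-in. This is an illustration only: `InMemoryLoader`, its `commit()` method, and the `last_version` metadata key are hypothetical names chosen for this example and are not the library's API. Only `get_data()` is named in the workflow itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InMemoryLoader:
    """Hypothetical stand-in for a delta loader (not the library's class)."""

    source: list[dict]  # rows tagged with an increasing "version"
    metadata: dict = field(default_factory=lambda: {"last_version": -1})
    _pending: int | None = None  # highest version read but not yet committed

    def get_data(self) -> list[dict]:
        """Return only rows newer than the last committed version."""
        last = self.metadata["last_version"]
        rows = [r for r in self.source if r["version"] > last]
        self._pending = max((r["version"] for r in rows), default=None)
        return rows

    def commit(self) -> None:
        """Mark the pending load as processed by advancing the metadata."""
        if self._pending is not None:
            self.metadata["last_version"] = self._pending
            self._pending = None


source = [{"version": 0, "id": 1}, {"version": 1, "id": 2}]
target: list[dict] = []

loader = InMemoryLoader(source)                             # steps 1-2: configure, create
batch = loader.get_data()                                   # step 3: load
processed = [{**r, "id_x10": r["id"] * 10} for r in batch]  # step 4: process
target.extend(processed)                                    # step 5: write
loader.commit()                                             # step 6: commit

source.append({"version": 2, "id": 3})
next_batch = loader.get_data()  # only the row added after the commit
```

Committing after the write, rather than inside `get_data()`, means a failed write leaves the metadata untouched, so the same batch is returned on the next attempt.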

Component            Purpose
DeltaLoadOption      Configuration object defining strategy and options
DeltaLoaderFactory   Factory for creating appropriate loader instances
DeltaLoader          Base interface for all loading strategies
Metadata Table       Tracks loading progress and state

  • Performance: Process only changed data
  • Reliability: Automatic progress tracking
  • Scalability: Handle large datasets efficiently
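How a configuration object, a factory, and a shared loader interface can fit together is sketched below. All names ending in `Sketch`, the `strategy` and `options` fields, and the `create_loader` function are assumptions made for this example; the real DeltaLoadOption and DeltaLoaderFactory signatures are documented in the Delta Loader Reference and may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class LoadOptionSketch:
    """Hypothetical configuration object: which strategy, with which options."""

    strategy: str  # "cdf" or "timestamp" in this sketch
    options: dict = field(default_factory=dict)


class LoaderSketch(Protocol):
    """Hypothetical common interface shared by every strategy."""

    def get_data(self) -> list[dict]: ...


class CdfLoaderSketch:
    def __init__(self, options: dict) -> None:
        self.options = options

    def get_data(self) -> list[dict]:
        return []  # placeholder: a real loader would read the change feed


class TimestampLoaderSketch:
    def __init__(self, options: dict) -> None:
        self.options = options

    def get_data(self) -> list[dict]:
        return []  # placeholder: a real loader would filter by timestamp


_REGISTRY = {"cdf": CdfLoaderSketch, "timestamp": TimestampLoaderSketch}


def create_loader(option: LoadOptionSketch) -> LoaderSketch:
    """Pick the loader class that matches the configured strategy."""
    try:
        loader_cls = _REGISTRY[option.strategy]
    except KeyError:
        raise ValueError(f"Unknown strategy: {option.strategy!r}") from None
    return loader_cls(option.options)


cdf_loader = create_loader(LoadOptionSketch("cdf"))
ts_loader = create_loader(
    LoadOptionSketch("timestamp", {"timestamp_column": "event_time"})
)
```

Because callers depend only on the shared `get_data()` interface, switching strategies is a configuration change rather than a code change.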

Available Strategies

Choose the right delta loading strategy based on your data characteristics and requirements.

CDF Strategy

Best for: Tables with frequent updates, deletes, and complex change patterns

The DeltaCDFLoader leverages Delta Lake's Change Data Feed to capture all table changes at the transaction level.

Key Features:

  • Tracks change types: INSERT, UPDATE, and DELETE
  • Version-based progress tracking
  • Automatic deduplication support
  • Handles complex change scenarios

Requirements:

Change Data Feed must be enabled on the source table:

ALTER TABLE your_table SET TBLPROPERTIES (delta.enableChangeDataFeed = true)

When to Use CDF

  • Data undergoes frequent updates
  • You need to capture all change types
  • Source table supports Change Data Feed
  • Complex merge operations are required
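To make the deduplication and version tracking concrete, here is a minimal pure-Python sketch of collapsing a change feed to the latest change per key. The `_change_type` and `_commit_version` columns and the values `insert`, `update_preimage`, `update_postimage`, and `delete` are what Delta Lake's Change Data Feed emits. The `latest_changes` helper and the `id` key column are hypothetical, and how DeltaCDFLoader deduplicates internally is not described here, so treat this as one plausible approach rather than its implementation.

```python
from __future__ import annotations


def latest_changes(changes: list[dict], key: str = "id") -> list[dict]:
    """Collapse a change feed to one row per key: the most recent change."""
    latest: dict = {}
    for row in changes:
        # The pre-image holds the old values of an updated row; skip it.
        if row["_change_type"] == "update_preimage":
            continue
        current = latest.get(row[key])
        if current is None or row["_commit_version"] >= current["_commit_version"]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


feed = [
    {"id": 1, "value": "a", "_change_type": "insert", "_commit_version": 1},
    {"id": 2, "value": "b", "_change_type": "insert", "_commit_version": 1},
    {"id": 1, "value": "a", "_change_type": "update_preimage", "_commit_version": 2},
    {"id": 1, "value": "a2", "_change_type": "update_postimage", "_commit_version": 2},
    {"id": 2, "value": "b", "_change_type": "delete", "_commit_version": 3},
]

result = latest_changes(feed)
upserts = [r for r in result if r["_change_type"] != "delete"]
deletes = [r for r in result if r["_change_type"] == "delete"]

# Version-based progress: the next load starts after the highest version seen.
next_start_version = max(r["_commit_version"] for r in feed) + 1
```

Splitting the result into upserts and deletes mirrors what a merge into the target table needs: rows to insert or update, and keys to remove.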

Timestamp Strategy

Best for: Time-series data and append-only scenarios

The DeltaTimestampLoader filters data based on timestamp columns to identify new records.

Key Features:

  • Timestamp-based filtering
  • Ideal for time-series data
  • Simple append operations
  • Custom time range support

Requirements:

  • One or more timestamp columns in your data
  • Timestamps should be monotonically increasing for new records

When to Use Timestamp

  • Working with time-series data
  • Append-only data patterns
  • No updates to historical records
  • Simple incremental processing needs
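The following pure-Python sketch shows the idea behind timestamp-based filtering and why the monotonicity requirement matters. The `rows_since` helper, the `event_time` column, and the optional `until` bound are hypothetical names for this example, not the DeltaTimestampLoader API.

```python
from __future__ import annotations

from datetime import datetime


def rows_since(rows, ts_col, last_loaded, until=None):
    """Return rows with last_loaded < ts <= until, plus the new watermark."""
    selected = [
        r for r in rows
        if r[ts_col] > last_loaded and (until is None or r[ts_col] <= until)
    ]
    # If nothing new arrived, the watermark stays where it was.
    new_watermark = max((r[ts_col] for r in selected), default=last_loaded)
    return selected, new_watermark


rows = [
    {"id": 1, "event_time": datetime(2024, 1, 1, 8)},
    {"id": 2, "event_time": datetime(2024, 1, 1, 9)},
]
batch, watermark = rows_since(rows, "event_time", datetime(2024, 1, 1, 0))

# A late-arriving record with an older timestamp, and a genuinely new one.
rows.append({"id": 3, "event_time": datetime(2024, 1, 1, 8, 30)})
rows.append({"id": 4, "event_time": datetime(2024, 1, 1, 10)})
batch2, watermark2 = rows_since(rows, "event_time", watermark)
```

The second call returns only the row with `id` 4. The row with `id` 3 arrived late with a timestamp below the watermark and is silently skipped, which is why this strategy suits append-only data whose timestamps only move forward.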

Reference Documentation

For detailed API documentation, see the Delta Loader Reference.

Additional Resources