Index
            Pipeline
¶
    
              Bases: LoggerMixin
A Pipeline represents the logical unit of one ETL process.
This class manages a directed acyclic graph (DAG) of steps, ensuring that each step is executed in the correct order based on dependencies.
Attributes:
| Name | Type | Description | 
|---|---|---|
| name | str | The name of the pipeline. | 
| steps | OrderedDict[str, PipelineStep] | An ordered dictionary of PipelineSteps that are part of the pipeline. | 
Source code in src/cloe_nessy/pipeline/pipeline.py
                | 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 |  | 
            graph
  
      property
  
¶
    Get the pipeline graph.
            _create_graph()
¶
    Creates a directed acyclic graph (DAG) representing the pipeline steps and their dependencies.
Each node in the graph represents a single step in the pipeline, and each edge represents a dependency.
Source code in src/cloe_nessy/pipeline/pipeline.py
              
            _get_ready_to_run_steps(remaining_steps, g)
¶
    Identifies and returns the steps that are ready to run.
This method checks the directed acyclic graph (DAG) to find steps that have no predecessors, indicating that they are ready to be executed. It logs the remaining steps and the steps that are ready to run.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| remaining_steps | list[str] | A list of step IDs that are yet to be executed. | required | 
| g | DiGraph | The directed acyclic graph representing the pipeline. | required | 
Returns:
| Type | Description | 
|---|---|
| set[str] | A set of step IDs that are ready to be executed. | 
Source code in src/cloe_nessy/pipeline/pipeline.py
              
            _handle_completed_tasks(futures, g, remaining_steps)
¶
    Handles the completion of tasks in the pipeline.
This method processes the futures that have completed execution. It removes the corresponding steps from the directed acyclic graph (DAG) and checks if new steps are ready to run. If new steps are ready, it returns True to indicate that the pipeline can continue execution.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| futures | A dictionary mapping futures to their corresponding steps. | required | |
| g | The directed acyclic graph representing the pipeline. | required | |
| remaining_steps | A list of steps that are yet to be executed. | required | 
Returns:
| Type | Description | 
|---|---|
| True if new steps are ready to run, False otherwise. | 
Source code in src/cloe_nessy/pipeline/pipeline.py
              
            _run_step(step_name)
¶
    Executes the run method of the corresponding step in the pipeline.
Source code in src/cloe_nessy/pipeline/pipeline.py
              
            _submit_ready_steps(ready_to_run, remaining_steps, executor, futures)
¶
    Submits the ready-to-run steps to the executor for execution.
This method takes the steps that are ready to run, removes them from the list of remaining steps, and submits them to the executor for concurrent execution. It also updates the futures dictionary to keep track of the submitted tasks.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| ready_to_run | set[str] | A set of steps that are ready to be executed. | required | 
| remaining_steps | list[str] | A list of steps that are yet to be executed. | required | 
| executor | ThreadPoolExecutor | The executor that manages the concurrent execution of steps. | required | 
| futures | dict | A dictionary mapping futures to their corresponding step ID. | required | 
Source code in src/cloe_nessy/pipeline/pipeline.py
              
            _trim_graph(graph, until)
¶
    Trims the pipeline graph to only include steps up to the specified 'until' step (excluding).
This method first verifies that the given step exists in the graph. It then finds all ancestors (i.e., steps that precede the 'until' step) and creates a subgraph consisting solely of those steps.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| graph | The complete directed acyclic graph representing the pipeline. | required | |
| until | The identifier of the step up to which the graph should be trimmed. | required | 
Returns:
| Type | Description | 
|---|---|
| A subgraph containing only the steps leading to (and including) the 'until' step. | 
Raises:
| Type | Description | 
|---|---|
| ValueError | If the specified 'until' step is not found in the graph. | 
Source code in src/cloe_nessy/pipeline/pipeline.py
              
            plot_graph(save_path=None)
¶
    Generates a visual representation of the pipeline steps and their dependencies.
This method uses the PipelinePlottingService to create a plot of the pipeline graph. If a save path is specified, the plot will be saved to that location; otherwise, it will be displayed interactively.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| save_path | str | None | Optional; the file path where the plot should be saved. If None, the plot is displayed interactively. | None | 
Source code in src/cloe_nessy/pipeline/pipeline.py
              
            run(until=None)
¶
    Executes the pipeline steps in the correct order based on dependencies.
This method creates a directed acyclic graph (DAG) of the pipeline steps and, if specified, trims the graph to only include steps up to the given 'until' step (excluding: the step specified as 'until' will not be executed). It then concurrently executes steps with no pending dependencies using a ThreadPoolExecutor, ensuring that all steps are run in order. If a cyclic dependency is detected, or if any step fails during execution, the method raises an error.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| until | str | None | Optional; the identifier of the step up to which the pipeline should be executed. | None | 
Raises:
| Type | Description | 
|---|---|
| RuntimeError | If a cyclic dependency is detected. | 
| Exception | Propagates any error raised during the execution of a step. | 
Source code in src/cloe_nessy/pipeline/pipeline.py
              
            PipelineAction
¶
    
              Bases: ABC, LoggerMixin
Models the operation being executed against an Input.
Attributes:
| Name | Type | Description | 
|---|---|---|
| name | str | The name of the action. | 
Source code in src/cloe_nessy/pipeline/pipeline_action.py
                
            __init__(tabular_logger=None)
¶
    Initializes the PipelineAction object.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| tabular_logger | Logger | None | The tabular logger to use for dependency injection. | None | 
Source code in src/cloe_nessy/pipeline/pipeline_action.py
              
            PipelineContext
¶
    A class that models the context of a pipeline.
The context consists of Table Metadata (the Table definition) and the actual data as a DataFrame.
Attributes:
| Name | Type | Description | 
|---|---|---|
| table_metadata | The Nessy-Table definition. | |
| data | The data of the context. | |
| runtime_info | Additional runtime information, e.g. streaming status. | |
| status | The status of the context. Can be "initialized", "successful" or "failed". | 
Note
This is not a pydantic class, because Fabric does not support the type ConnectDataFrame.
Source code in src/cloe_nessy/pipeline/pipeline_context.py
                
            from_existing(table_metadata=None, data=None, runtime_info=None)
¶
    Creates a new PipelineContext from an existing one.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| table_metadata | Table | None | The metadata of the new context. | None | 
| data | DataFrame | None | The data of the new context. | None | 
| runtime_info | dict[str, Any] | None | The runtime_info of the new context. | None | 
Returns:
| Type | Description | 
|---|---|
| PipelineContext | The new PipelineContext. | 
Source code in src/cloe_nessy/pipeline/pipeline_context.py
              
            PipelineParsingService
¶
    A service class that parses a YAML document or string into a Pipeline object.
Source code in src/cloe_nessy/pipeline/pipeline_parsing_service.py
                | 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 |  | 
            _replace_variables(yaml_str)
  
      staticmethod
  
¶
    Replace variable placeholders in a YAML string.
Replaces environment variables with the pattern {{env:var-name}}. Where
the var-name is the name of the environment variable. Replaces secret
references with the pattern {{secret-scope-name:secret-key}}. Where
scope-name is the name of the secret scope and secret-key is the key of
the secret.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| yaml_str | str | A string that can be parsed in YAML format. | required | 
Returns:
| Type | Description | 
|---|---|
| str | The same YAML string with environment variable placeholders replaced. | 
Source code in src/cloe_nessy/pipeline/pipeline_parsing_service.py
              
            parse(path=None, yaml_str=None)
  
      staticmethod
  
¶
    Reads the YAML from a given Path and returns a Pipeline object.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| path | Path | None | Path to the YAML document. | None | 
| yaml_str | str | None | A string that can be parsed in YAML format. | None | 
Raises:
| Type | Description | 
|---|---|
| ValueError | If neither 'path' nor 'yaml_str' has been provided. | 
Returns:
| Name | Type | Description | 
|---|---|---|
| Pipeline | Pipeline | The resulting Pipeline instance. | 
Source code in src/cloe_nessy/pipeline/pipeline_parsing_service.py
              
            register_pipeline_action(pipeline_action_class)
  
      staticmethod
  
¶
    Registers a custom pipeline action class.
Note
Registering an action enables the custom action to be used in the pipeline YAML definition. This is automatically called, when the PipelineParsingService is instantiated with (a list of) custom actions.
Source code in src/cloe_nessy/pipeline/pipeline_parsing_service.py
              
            PipelineStep
  
      dataclass
  
¶
    A PipelineStep is a logical step within a Pipeline.
The step stores the PipelineContext and offers an interface to interact with the Steps DataFrame.
Attributes:
| Name | Type | Description | 
|---|---|---|
| name | str | The name of the step. | 
| action | PipelineAction | The action to be executed. | 
| is_successor | PipelineAction | A boolean indicating if the step is a successor and takes the previous steps context. | 
| context | PipelineContext | The context of the step. | 
| options | dict[str, Any] | Additional options for the step | 
| _predecessors | set[str] | A list of names of the steps that are predecessors to this step. | 
| _context_ref | str | None | Reference to the previous steps context | 
| _table_metadata_ref | str | None | Reference to the previous steps metadata |