transform_hash_columns
HashConfig
Bases: BaseModel
A configuration model for defining hash settings for specific columns.
Attributes:
| Name | Type | Description | 
|---|---|---|
| hash_config | dict[str, HashSettings] | A dictionary where the keys are column names (as strings) and the values are HashSettings objects. |
Methods:
| Name | Description | 
|---|---|
| validate_config | Validates the hash configuration to ensure it contains at least one entry and that all column names are valid strings. Raises a ValueError if validation fails. |
Source code in src/cloe_nessy/pipeline/actions/transform_hash_columns.py
                
validate_config(values)
Validates the hash configuration provided in the model.
This method is executed in "before" mode to ensure that the hash_config
field in the input values meets the required criteria:
- It must be a dictionary.
- It must contain at least one entry.
- Each key in the dictionary must be a non-empty string.
Raises:
| Type | Description | 
|---|---|
| ValueError | If hash_config is not a dictionary, is empty, or contains a key that is not a non-empty string. |
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| cls | | The class to which this validator is applied. | required |
| values | | The input values to validate. | required |
Returns:
| Type | Description | 
|---|---|
| | The validated input values. |
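The checks listed above can be sketched in plain Python. This is only an illustration of the documented criteria, not the library's implementation: the function name `validate_hash_config` is hypothetical, and the real validator runs as a Pydantic "before" validator on the model.

```python
def validate_hash_config(values: dict) -> dict:
    """Hypothetical stand-in that mirrors the documented checks on hash_config."""
    config = values.get("hash_config")

    # It must be a dictionary.
    if not isinstance(config, dict):
        raise ValueError("hash_config must be a dictionary")

    # It must contain at least one entry.
    if not config:
        raise ValueError("hash_config must contain at least one entry")

    # Each key must be a non-empty string.
    for key in config:
        if not isinstance(key, str) or not key:
            raise ValueError(f"invalid column name: {key!r}")

    # A "before" validator hands the (unchanged) input back for model construction.
    return values
```

Returning the input unchanged reflects the documented return value ("The validated input values").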
HashSettings
Bases: BaseModel
Represents the settings for hashing columns.
Attributes:
| Name | Type | Description | 
|---|---|---|
| columns | list[str] | List of column names to hash. | 
| algorithm | str | Hashing algorithm to use. Must be one of "hash", "md5", "sha1", "sha2", "xxhash64", or "crc32". | 
| bits | int \| None | Bit length for the 'sha2' algorithm. Optional. |
validate_all(values)
Validates the input values for a hashing operation before model instantiation.
This method performs the following checks:
- Ensures the specified hashing algorithm is supported.
- Validates that at least one column is provided and that the columns parameter is a non-empty list.
- Ensures that multiple columns are hashed only with the 'hash' or 'xxhash64' algorithm.
- For the 'sha2' algorithm, ensures that the 'bits' parameter is one of the valid options.
- Ensures that the 'bits' parameter is not provided for algorithms other than 'sha2'.
Raises:
| Type | Description | 
|---|---|
| ValueError | If the algorithm is unsupported, no columns are provided, the columns parameter is invalid, or the 'bits' parameter is invalid for the specified algorithm. | 
| NotImplementedError | If multiple columns are provided and the algorithm does not support hashing multiple columns. | 
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| cls | | The class being validated. | required |
| values | | A dictionary of input values containing 'algorithm', 'columns', and 'bits'. | required |
Returns:
| Type | Description | 
|---|---|
| | The validated input values. |
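As a rough sketch of the rules above, the following plain-Python function applies the same checks in the same order. It is not the library's code. The function name and constants are hypothetical, the set of accepted 'bits' values is an assumption (the page does not list them; the values used here are the ones Spark's sha2 function documents), and accepting a missing 'bits' for 'sha2' is also an assumption based on the attribute being described as optional.

```python
SUPPORTED_ALGORITHMS = {"hash", "md5", "sha1", "sha2", "xxhash64", "crc32"}
MULTI_COLUMN_ALGORITHMS = {"hash", "xxhash64"}
ASSUMED_SHA2_BITS = {224, 256, 384, 512}  # assumption, not stated on this page


def validate_hash_settings(values: dict) -> dict:
    """Hypothetical stand-in that mirrors the documented validate_all checks."""
    algorithm = values.get("algorithm")
    columns = values.get("columns")
    bits = values.get("bits")

    # The algorithm must be one of the supported names.
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")

    # columns must be a non-empty list.
    if not isinstance(columns, list) or not columns:
        raise ValueError("columns must be a non-empty list")

    # Only 'hash' and 'xxhash64' accept more than one column.
    if len(columns) > 1 and algorithm not in MULTI_COLUMN_ALGORITHMS:
        raise NotImplementedError(
            f"hashing multiple columns is not supported for {algorithm!r}"
        )

    if algorithm == "sha2":
        # Assumption: bits may be omitted; if given, it must be a known length.
        if bits is not None and bits not in ASSUMED_SHA2_BITS:
            raise ValueError(f"invalid bits for sha2: {bits!r}")
    elif bits is not None:
        # bits is only meaningful for 'sha2'.
        raise ValueError("bits may only be set for the 'sha2' algorithm")

    return values
```

The split between ValueError and NotImplementedError follows the Raises table: invalid input is a value error, while multi-column hashing with an unsupported algorithm is reported as not implemented.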
TransformHashColumnsAction
Bases: PipelineAction
Hashes specified columns in a DataFrame using a chosen algorithm.
Example
Given a hash_config that defines hashed_column1 and hashed_column2, and a DataFrame df with the following structure:
| column1 | column2 | column3 | 
|---|---|---|
| foo | bar | baz | 
After running the action, the resulting DataFrame will look like:
| column1 | column2 | column3 | hashed_column1 | hashed_column2 | 
|---|---|---|---|---|
| foo | bar | baz | 17725b837e9c896e7123b142eb980131dcc0baa6160db45d4adfdb21 | 1670361220 | 
Hash values might vary
The actual hash values will depend on the hashing algorithm used and the input data.
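For orientation, a hash_config that would produce columns shaped like the ones above might look like the sketch below. The concrete settings are assumptions: the page does not show the original configuration, so the choice of 'sha2' with 224 bits (inferred from the 56-character hex value) and 'crc32' (inferred from the integer value) is a guess, as is treating the dictionary keys as the names of the output columns. The standard-library calls only illustrate the shape of such values; the action itself operates on a DataFrame.

```python
import hashlib
import zlib

# Hypothetical configuration; keys are assumed to name the new hashed columns.
hash_config = {
    "hashed_column1": {"columns": ["column1"], "algorithm": "sha2", "bits": 224},
    "hashed_column2": {"columns": ["column2"], "algorithm": "crc32"},
}

# A SHA-224 digest is 224 bits, i.e. 56 hexadecimal characters,
# which matches the length of the hashed_column1 value in the table.
digest = hashlib.sha224(b"foo").hexdigest()
print(len(digest))

# CRC32 yields an unsigned 32-bit integer, the same kind of value
# shown for hashed_column2.
checksum = zlib.crc32(b"bar")
print(0 <= checksum < 2**32)
```

The values printed by this snippet are not expected to match the table, since the exact input encoding and algorithm settings behind the documented example are not known.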
run(context, *, hash_config=None, **_)
Hashes the specified columns in the DataFrame.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| context | PipelineContext | Context in which this Action is executed. | required | 
| hash_config | HashConfig \| None | Dictionary that contains the configuration for executing the hashing. | None |
Returns:
| Type | Description | 
|---|---|
| PipelineContext | Updated PipelineContext with hashed columns. | 
Raises:
| Type | Description | 
|---|---|
| ValueError | If columns are missing, data is None, or algorithm/bits are invalid. | 
| ValueError | If the hash configuration is invalid. |