transform_json_normalize
TransformJsonNormalize
Bases: PipelineAction
Normalizes and flattens the DataFrame by exploding array columns and flattening struct columns.
The method performs recursive normalization on the DataFrame present in the context, ensuring that the order of columns is retained and new columns created by flattening structs are appended after existing columns.
Example
Example Input Data:
| id | name | coordinates | attributes | 
|---|---|---|---|
| 1 | Alice | [10.0, 20.0] | {"age": 30, "city": "NY"} | 
| 2 | Bob | [30.0, 40.0] | {"age": 25, "city": "LA"} | 
Example Output Data:
| id | name | coordinates | attributes_age | attributes_city | 
|---|---|---|---|---|
| 1 | Alice | [10.0, 20.0] | 30 | NY | 
| 2 | Bob | [30.0, 40.0] | 25 | LA | 
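The column-ordering rule described above can be illustrated without Spark. The following is a minimal sketch that uses plain Python dicts as stand-ins for DataFrame rows; the helper `flatten_structs_once` is hypothetical and is not part of cloe_nessy. It assumes the rule is read literally: non-struct columns keep their relative order, and struct fields are appended at the end as `<parent>_<field>`.

```python
def flatten_structs_once(row: dict) -> dict:
    """Flatten one level of struct (dict) columns in a single row.

    Hypothetical helper for illustration only, not the library's code.
    """
    kept = {}      # non-struct columns, in their original relative order
    appended = {}  # struct fields, renamed to <parent>_<field>
    for name, value in row.items():
        if isinstance(value, dict):
            for field, inner in value.items():
                appended[f"{name}_{field}"] = inner
        else:
            kept[name] = value
    # Existing columns first, flattened struct fields after them.
    return {**kept, **appended}


# The struct sits in the middle here to make the ordering rule visible.
row = {"id": 1, "attributes": {"age": 30, "city": "NY"}, "name": "Alice"}
flat = flatten_structs_once(row)
print(list(flat))  # ['id', 'name', 'attributes_age', 'attributes_city']
```

In this sketch `name` stays ahead of the flattened fields even though `attributes` came before it in the input.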
Source code in src/cloe_nessy/pipeline/actions/transform_json_normalize.py
_normalize(df, exclude_columns)
staticmethod
Recursively normalizes the given DataFrame by exploding arrays and flattening structs.
This method performs two primary operations:
1. Explodes any array columns, unless they are in the list of excluded columns.
2. Flattens any struct columns, renaming nested fields and appending them to the top-level DataFrame.
The method repeats these operations in a loop until no array or struct columns remain, apart from any excluded columns.
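The explode-then-flatten loop can be sketched in plain Python, with a list of dicts standing in for the DataFrame. This is an illustration of the described behaviour under stated assumptions, not the library's Spark implementation: `normalize_rows` and `_columns_of` are hypothetical names, every row is assumed to have the same columns and value types, and rows with an empty list disappear when exploded (as with Spark's `explode`; which explode variant the library uses is not stated here).

```python
from __future__ import annotations

from typing import Any


def _columns_of(rows: list[dict[str, Any]], kind: type, excluded: set[str]) -> list[str]:
    """Names of non-excluded columns that hold a value of `kind` in any row."""
    names: list[str] = []
    for row in rows:
        for name, value in row.items():
            if name in excluded or name in names:
                continue
            if isinstance(value, kind):
                names.append(name)
    return names


def normalize_rows(
    rows: list[dict[str, Any]], exclude_columns: list[str] | None = None
) -> list[dict[str, Any]]:
    """Hypothetical stand-in for the loop: explode lists, then flatten dicts."""
    excluded = set(exclude_columns or [])
    while True:
        arrays = _columns_of(rows, list, excluded)
        structs = _columns_of(rows, dict, excluded)
        if not arrays and not structs:
            return rows  # nothing left to explode or flatten
        for name in arrays:
            # Step 1: one output row per array element, column position kept.
            rows = [{**row, name: item} for row in rows for item in row[name]]
        for name in structs:
            # Step 2: drop the struct column and append <parent>_<field> columns.
            rows = [
                {
                    **{k: v for k, v in row.items() if k != name},
                    **{f"{name}_{k}": v for k, v in row[name].items()},
                }
                for row in rows
            ]


rows = [
    {"id": 1, "name": "Alice", "coordinates": [10.0, 20.0], "attributes": {"age": 30, "city": "NY"}},
    {"id": 2, "name": "Bob", "coordinates": [30.0, 40.0], "attributes": {"age": 25, "city": "LA"}},
]
kept = normalize_rows(rows, exclude_columns=["coordinates"])  # arrays left intact
exploded = normalize_rows(rows)  # one row per coordinate value
print(len(kept), len(exploded))  # 2 4
```

In this sketch, excluding `coordinates` yields two rows with the array intact, while omitting the exclusion yields one row per coordinate value. An array of structs takes two passes: the first pass explodes it into struct values, and the next pass flattens those.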
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| df | | The input DataFrame to normalize. | required | 
| exclude_columns | | A list of column names to exclude from the normalization process. These columns will not be exploded or flattened. | required | 
Returns:
| Type | Description | 
|---|---|
| pyspark.sql.DataFrame | The normalized DataFrame with no array or struct columns. | 
run(context, *, exclude_columns=None, **_)
Executes the normalization process on the DataFrame present in the context.
Columns retain their relative order during normalization; new columns created by flattening structs are appended after the existing columns.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| context | PipelineContext | The pipeline context that contains the DataFrame to be normalized. | required | 
| exclude_columns | list[str] \| None | A list of column names to exclude from the normalization process. These columns will not be exploded or flattened. | None | 
| **_ | Any | Additional keyword arguments (not used). | {} | 
Returns:
| Type | Description | 
|---|---|
| PipelineContext | A new pipeline context with the normalized DataFrame. | 
Raises:
| Type | Description | 
|---|---|
| ValueError | If the DataFrame in the context is  |