read_api
ReadAPIAction
Bases: PipelineAction
Reads data from an API and loads it into a Spark DataFrame.
This action uses the provided API parameters to make a request with the APIReader and returns a DataFrame containing the response data.
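As a rough illustration of the last step, the sketch below turns a JSON response body into a list of row dictionaries. The helper `to_records` is hypothetical and not part of cloe_nessy. It assumes the `key` option of `run` selects a top-level field of the response, and the real action may shape the data differently.

```python
import json
from typing import Any, Optional


def to_records(body: str, key: Optional[str] = None) -> list[dict[str, Any]]:
    """Turn a JSON response body into a list of row dicts (hypothetical helper)."""
    payload = json.loads(body)
    if key is not None:
        # Assumption: `key` names a top-level field that holds the actual data.
        payload = payload[key]
    if isinstance(payload, dict):
        return [payload]  # a single JSON object becomes a single row
    if isinstance(payload, list):
        return payload  # a JSON array becomes one row per element
    raise TypeError(f"Unsupported payload type: {type(payload).__name__}")


rows = to_records('{"items": [{"id": 1}, {"id": 2}]}', key="items")
print(rows)  # [{'id': 1}, {'id': 2}]
```

With an active SparkSession `spark`, records like these could then be loaded with `spark.createDataFrame([Row(**r) for r in rows])`, using `pyspark.sql.Row`.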
Example
```yaml
Read API:
    action: READ_API
    options:
        base_url: https://some_url.com/api/
        endpoint: my/endpoint/
        method: GET
        timeout: 90
        auth:
            - type: basic
              username: my_username
              password: my_password
            - type: secret_scope
              secret_scope: my_secret_scope
              header_template:
                "header_key_1": "<ENVIRONMENT_VARIABLE_NAME>"
            - type: secret_scope
              secret_scope: my_secret_scope
              header_template:
                "header_key_2": "<SECRET_NAME>"
            - type: secret_scope
              secret_scope: my_other_secret_scope
              header_template:
                "header_key_3": "<SECRET_NAME>"
            - type: azure_oauth
              client_id: my_client_id
              client_secret: my_client_secret
              tenant_id: my_tenant_id
              scope: <entra-id-client-id>
```
The example above combines the headers produced by the different auth types into a single set of request headers.
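A minimal sketch of that combination, assuming each auth entry has already been resolved to a plain header dictionary and that later entries overwrite earlier ones with the same header name (the actual precedence in cloe_nessy is not stated here). `combine_headers` and the header values are illustrative only.

```python
from typing import Mapping


def combine_headers(*header_sets: Mapping[str, str]) -> dict[str, str]:
    """Merge header dicts from several auth entries (hypothetical helper)."""
    combined: dict[str, str] = {}
    for headers in header_sets:
        # Assumption: a later entry overwrites an earlier header of the same name.
        combined.update(headers)
    return combined


headers = combine_headers(
    {"header_key_1": "value-from-env"},
    {"header_key_2": "value-from-secret"},
    {"header_key_3": "value-from-other-scope"},
)
print(headers)
```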
Secret information
Don't write sensitive information like passwords or tokens directly in the pipeline configuration. Use secret scopes or environment variables instead.
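One way to follow this advice is to keep only a placeholder in the configuration and resolve it when the pipeline runs. The sketch below assumes a `"<NAME>"` placeholder names an environment variable; `resolve_header_template` is a hypothetical helper, not the library's own resolver. On Databricks, a secret-scope lookup would use `dbutils.secrets.get(scope=..., key=...)` at the same point instead of `os.environ`.

```python
import os
import re
from typing import Mapping

# Matches a whole value of the form "<NAME>" and captures NAME.
_PLACEHOLDER = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def resolve_header_template(template: Mapping[str, str]) -> dict[str, str]:
    """Replace "<NAME>" values with environment variables (hypothetical helper)."""
    resolved: dict[str, str] = {}
    for header, value in template.items():
        match = _PLACEHOLDER.fullmatch(value)
        if match:
            name = match.group(1)
            if name not in os.environ:
                raise KeyError(f"Environment variable {name!r} is not set")
            resolved[header] = os.environ[name]
        else:
            resolved[header] = value  # literal values pass through unchanged
    return resolved


os.environ["MY_API_TOKEN"] = "s3cr3t"  # set here for demonstration only
print(resolve_header_template({"header_key_1": "<MY_API_TOKEN>"}))
```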
Source code in src/cloe_nessy/pipeline/actions/read_api.py
run(context, *, base_url=None, auth=None, default_headers=None, endpoint='', method='GET', key=None, timeout=30, params=None, headers=None, data=None, json=None, max_retries=0, options=None, **_)
staticmethod
Reads data from an API into a DataFrame.
This method uses an APIClient to fetch data from the API and load it into a Spark DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context | PipelineContext | The pipeline context containing information about the pipeline. | required |
| base_url | str \| None | The base URL for the API to be called. | None |
| auth | AuthBase \| dict[str, str] \| None | The authentication credentials for the API. | None |
| default_headers | dict[str, str] \| None | Default headers to include in the API request. | None |
| endpoint | str | The specific API endpoint to call. | '' |
| method | str | The HTTP method to use for the request. | 'GET' |
| key | str \| None | Key for accessing specific data in the response. | None |
| timeout | int | Timeout for the API request, in seconds. | 30 |
| params | dict[str, str] \| None | URL parameters to include in the API request. | None |
| headers | dict[str, str] \| None | Additional headers to include in the request. | None |
| data | dict[str, str] \| None | Data to send with the request for POST methods. | None |
| json | dict[str, str] \| None | JSON data to send with the request for POST methods. | None |
| max_retries | int | Maximum number of retries for the API request. | 0 |
| options | dict[str, str] \| None | Additional options for the API request. | None |
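To illustrate how `base_url`, `endpoint` and `params` relate, the hypothetical helper below joins them into one URL with the standard library. This is an assumption about the behavior, not the library's code; with `requests`, one would normally pass `params` separately, e.g. `requests.request(method, url, params=params, headers=headers, timeout=timeout)`, and let it build the query string.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode, urljoin


def build_url(base_url: str, endpoint: str = "", params: Optional[Mapping[str, str]] = None) -> str:
    """Combine base_url, endpoint and query params into one URL (hypothetical helper)."""
    # urljoin keeps the whole base path only if base_url ends with "/" and
    # endpoint has no leading "/", e.g. "https://some_url.com/api/" + "my/endpoint/".
    url = urljoin(base_url, endpoint)
    if params:
        url = f"{url}?{urlencode(params)}"
    return url


print(build_url("https://some_url.com/api/", "my/endpoint/", {"page": "1"}))
# https://some_url.com/api/my/endpoint/?page=1
```

Note that `urljoin` drops the last path segment of `base_url` when it has no trailing slash, so under this approach the trailing `/` in the example configuration matters.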
Returns:
| Type | Description | 
|---|---|
| PipelineContext | The updated pipeline context containing the DataFrame with the API response data. | 
Raises:
| Type | Description | 
|---|---|
| ValueError | If the base_url is not specified. | 
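The sketch below shows one plausible reading of `max_retries` together with the `ValueError` check: validate `base_url` first, then make up to `max_retries + 1` attempts with no backoff. `call_with_retries` is hypothetical and the retry-count semantics are an assumption. The sketch catches Python's built-in `ConnectionError`; code built on `requests` would catch `requests.exceptions.RequestException` (or a subclass) instead.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(send: Callable[[], T], base_url: Optional[str], max_retries: int = 0) -> T:
    """Validate base_url, then call `send`, retrying on failure (hypothetical helper)."""
    if not base_url:
        raise ValueError("base_url must be specified")
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    # Assumption: max_retries counts retries, so there are max_retries + 1 attempts.
    for attempt in range(max_retries + 1):
        try:
            return send()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: re-raise the last error
    raise AssertionError("unreachable")


attempts = []


def flaky_send() -> str:
    """Fails twice, then succeeds, to exercise the retry loop."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(call_with_retries(flaky_send, "https://some_url.com/api/", max_retries=2))  # ok
```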
process_auth(auth)
Processes the auth parameter to create an AuthBase object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| auth | Mapping[str, str \| Mapping[str, str] \| list[Mapping[str, str]]] \| AuthBase \| None | The auth parameter to be processed. | required |
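The accepted shapes suggest a normalization step before the individual auth types are handled. A minimal sketch, assuming a mapping is a single auth definition, a list holds several definitions like the YAML example, and any other object is an already-built `AuthBase`. `normalize_auth` is hypothetical and only illustrates the dispatch, not how cloe_nessy builds the final `AuthBase`.

```python
from typing import Any, Mapping


def normalize_auth(auth: Any) -> list[Any]:
    """Turn the accepted auth shapes into a list of entries (hypothetical helper)."""
    if auth is None:
        return []  # no authentication configured
    if isinstance(auth, Mapping):
        return [auth]  # a single auth definition
    if isinstance(auth, (list, tuple)):
        entries = list(auth)
        for entry in entries:
            # Each definition in a list is expected to declare its `type`.
            if not isinstance(entry, Mapping) or "type" not in entry:
                raise ValueError(f"Each auth entry needs a 'type' key: {entry!r}")
        return entries
    # Anything else is assumed to be a ready-made requests AuthBase object.
    return [auth]


print(normalize_auth([{"type": "basic"}, {"type": "azure_oauth"}]))
```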