api_reader
            APIReader
              Bases: BaseReader
Utility class for reading an API into a DataFrame.
This class uses an APIClient to fetch data from an API and load it into a Spark DataFrame.
Attributes:
| Name | Type | Description | 
|---|---|---|
| api_client | | The client for making API requests. |
Source code in src/cloe_nessy/integration/reader/api_reader.py
            __init__(base_url, auth, default_headers=None)
    Initializes the APIReader object.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| base_url | | The base URL for the API. | required |
| auth | AuthBase \| None | The authentication method for the API. | required |
| default_headers | dict[str, str] \| None | Default headers to include in requests. | None |
            _add_metadata_column(df, response)
    Adds a metadata column to a DataFrame.
This method appends a column named __metadata to the given DataFrame, containing a map
of metadata related to an API response. The metadata includes the current timestamp,
the base URL of the API, the URL of the request, the HTTP status code, the reason phrase,
and the elapsed time of the request in seconds.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| df | DataFrame | The DataFrame to which the metadata column will be added. | required | 
| response | APIResponse | The API response object containing the metadata to be added. | required | 
Returns:
| Name | Type | Description | 
|---|---|---|
| DataFrame | DataFrame | The original DataFrame with an added __metadata column. |
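The metadata fields listed above can be pictured as a plain mapping. The following is a minimal sketch of that mapping, not the library's implementation: `FakeResponse`, `build_metadata`, the dictionary keys, and the choice to store every value as a string are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FakeResponse:
    """Hypothetical stand-in for APIResponse; the field names are assumptions."""

    url: str
    status_code: int
    reason: str
    elapsed: timedelta


def build_metadata(base_url: str, response: FakeResponse) -> dict[str, str]:
    """Collect the six documented metadata fields into one string-valued map."""
    return {
        # Current timestamp, here in UTC and ISO 8601 form (assumed format).
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "base_url": base_url,
        "url": response.url,
        "status_code": str(response.status_code),
        "reason": response.reason,
        # Elapsed time of the request in seconds.
        "elapsed": str(response.elapsed.total_seconds()),
    }
```

Values are kept as strings here because a map column needs a single value type; the real column may be typed differently.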
            read(*, endpoint='', method='GET', key=None, timeout=30, params=None, headers=None, data=None, json_body=None, max_retries=0, options=None, add_metadata_column=False, **kwargs)
    Reads data from an API endpoint and returns it as a DataFrame.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| endpoint | str | The endpoint to send the request to. | '' | 
| method | str | The HTTP method to use for the request. | 'GET' | 
| key | str \| None | The key to extract from the JSON response. | None |
| timeout | int | The timeout for the request in seconds. | 30 | 
| params | dict[str, str] \| None | The query parameters for the request. | None |
| headers | dict[str, str] \| None | The headers to include in the request. | None |
| data | dict[str, str] \| None | The form data to include in the request. | None |
| json_body | dict[str, str] \| None | The JSON data to include in the request. | None |
| max_retries | int | The maximum number of retries for the request. | 0 |
| options | dict[str, str] \| None | Additional options for the createDataFrame function. | None |
| add_metadata_column | bool | If set, adds a __metadata column containing metadata about the API response. | False | 
| **kwargs | Any | Additional keyword arguments to maintain compatibility with the base class method. | {} | 
Returns:
| Name | Type | Description | 
|---|---|---|
| DataFrame | DataFrame | The Spark DataFrame containing the read data in the json_object column. | 
Raises:
| Type | Description | 
|---|---|
| RuntimeError | If there is an error with the API request or reading the data. |
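Because every parameter of `read` is keyword-only, a call might look like the sketch below. The helper `read_users`, the endpoint name, the `"data"` key, and the example URL are hypothetical; the import path in the comment is inferred from the source file location and has not been verified against the package.

```python
from typing import Any


def read_users(reader: Any) -> Any:
    """Call read() using only keyword arguments from the documented signature."""
    return reader.read(
        endpoint="users",          # hypothetical endpoint, relative to base_url
        method="GET",
        key="data",                # hypothetical key to extract from the JSON response
        timeout=30,
        params={"page": "1"},
        max_retries=2,
        add_metadata_column=True,  # request the __metadata column
    )


# Assumed usage with a Spark session available:
# from cloe_nessy.integration.reader.api_reader import APIReader
# reader = APIReader(base_url="https://api.example.com", auth=None)
# try:
#     df = read_users(reader)
# except RuntimeError as err:  # documented failure mode of read()
#     print(f"API read failed: {err}")
```

Passing the reader in as an argument keeps the call easy to exercise with a stub object when no API or Spark session is available.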