file_reader
FileReader
Bases: BaseReader
Utility class for reading files into a Spark DataFrame.
Source code in src/cloe_nessy/integration/reader/file_reader.py
__init__()

_add_metadata_column(df)
Add all metadata columns to the DataFrame.

_get_reader()

_get_stream_reader()

read(location, *, spark_format=None, extension=None, schema=None, search_subdirs=True, options=None, add_metadata_column=False, delta_load_options=None, **kwargs)
Reads files from a specified location and returns a DataFrame.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| location | str | Location of files to read. | required | 
| spark_format | str \| None | Format of files to read. If not provided, it is inferred from the extension. | None |
| extension | str \| None | File extension (csv, json, parquet, txt). Used if spark_format is not provided. | None |
| schema | str \| None | Schema of the file. If None, the schema is inferred. | None |
| search_subdirs | bool | Whether to include files in subdirectories. | True |
| options | dict \| None | Spark DataFrame reader options. | None |
| add_metadata_column | bool | Whether to include the `__metadata` column in the DataFrame. | False |
| delta_load_options | DeltaLoadOptions \| None | Options for delta loading, if applicable. When provided and spark_format is 'delta', the delta loader is used to load Delta Lake tables incrementally. | None |
| **kwargs | Any | Additional keyword arguments to maintain compatibility with the base class method. | {} |
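The `options` and `search_subdirs` parameters both end up as Spark reader options. The sketch below shows one plausible way to merge them. It assumes that `search_subdirs=True` maps to Spark's built-in `recursiveFileLookup` file-source option; `build_reader_options` is a hypothetical helper, not part of FileReader's documented API.

```python
from __future__ import annotations

from typing import Any


def build_reader_options(
    options: dict[str, Any] | None = None,
    search_subdirs: bool = True,
) -> dict[str, Any]:
    """Merge user-supplied reader options with the subdirectory flag.

    Hypothetical helper: FileReader may handle subdirectories differently.
    """
    # Copy so the caller's dictionary is never mutated.
    merged: dict[str, Any] = dict(options or {})
    if search_subdirs:
        # Assumption: subdirectory search is expressed through Spark's
        # recursiveFileLookup option; an explicit user value is kept as-is.
        merged.setdefault("recursiveFileLookup", "true")
    return merged
```

With an active `SparkSession`, the result could be passed on as `spark.read.format("csv").options(**build_reader_options({"header": "true"})).load(location)`. Spark documents that `recursiveFileLookup` disables partition inference, so an implementation that must keep partition columns might rely on path globs instead.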
Raises:
| Type | Description | 
|---|---|
| ValueError | If neither spark_format nor extension is provided. | 
| ValueError | If the provided extension is not supported. | 
| Exception | If there is an error while reading the files. | 
Note
- The `spark_format` parameter specifies the format of the files to be read.
- If `spark_format` is not provided, the method tries to infer it from `extension`.
- The `extension` parameter specifies the file extension (e.g., 'csv', 'json').
- If both `spark_format` and `extension` are provided, `spark_format` takes precedence.
- The method raises an error if neither `spark_format` nor `extension` is provided.
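The precedence rules in the note above can be expressed as a small resolution function. This is a sketch under assumptions: `resolve_spark_format` and `EXTENSION_TO_FORMAT` are hypothetical names, the supported extensions are taken from the `extension` parameter description, and mapping `txt` to Spark's built-in `text` source is an assumption about how FileReader treats plain text files.

```python
from __future__ import annotations

# Assumption: the extensions listed for the `extension` parameter, mapped to
# Spark data source names. Spark reads plain text files with the "text" source.
EXTENSION_TO_FORMAT: dict[str, str] = {
    "csv": "csv",
    "json": "json",
    "parquet": "parquet",
    "txt": "text",
}


def resolve_spark_format(
    spark_format: str | None = None,
    extension: str | None = None,
) -> str:
    """Pick the Spark format following the documented precedence (hypothetical helper)."""
    if spark_format:
        # An explicit spark_format always takes precedence over extension.
        return spark_format
    if not extension:
        raise ValueError("Either spark_format or extension must be provided.")
    # Sketch-only convenience: tolerate a leading dot and upper-case input.
    normalized = extension.lower().lstrip(".")
    try:
        return EXTENSION_TO_FORMAT[normalized]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {extension!r}") from None
```

Checking `spark_format` first is what gives it precedence; the two `ValueError` branches correspond to the first two rows of the Raises table.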
Returns:
| Type | Description | 
|---|---|
| DataFrame | A DataFrame containing the data from the files. | 
read_stream(location='', schema=None, format='delta', add_metadata_column=False, options=None, **_)
Reads the specified location as a stream and returns a streaming DataFrame.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| location | str | Location of files to read. | '' |
| format | str | Format of files to read. | 'delta' |
| schema | StructType \| str \| None | Schema of the file. | None |
| add_metadata_column | bool | Whether to include the `__metadata` column in the DataFrame. | False |
| options | dict[str, Any] \| None | Spark DataFrame reader options. | None |
Raises:
| Type | Description | 
|---|---|
| ValueError | If location is not provided. | 
Returns:
| Type | Description | 
|---|---|
| DataFrame | A streaming DataFrame. |
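One way to picture what `read_stream` does with its arguments is to build the chain of `DataStreamReader` calls explicitly. In this sketch, `plan_stream_read` and `apply_steps` are hypothetical helpers: the first validates `location` (mirroring the documented `ValueError`) and records the calls, the second replays them on any builder object such as `spark.readStream`. Handling of `add_metadata_column` is omitted because its implementation is not documented here.

```python
from __future__ import annotations

from typing import Any

# A recorded builder call: (method name, positional args, keyword args).
Step = tuple[str, tuple[Any, ...], dict[str, Any]]


def plan_stream_read(
    location: str = "",
    schema: Any = None,
    format: str = "delta",  # shadows the builtin to mirror read_stream's signature
    options: dict[str, Any] | None = None,
) -> list[Step]:
    """Validate the arguments and record the reader calls (hypothetical helper)."""
    if not location:
        raise ValueError("location must be provided.")
    steps: list[Step] = [("format", (format,), {})]
    if schema is not None:
        # Spark accepts either a StructType or a DDL string here.
        steps.append(("schema", (schema,), {}))
    if options:
        steps.append(("options", (), dict(options)))
    # load() is the terminal call that returns the streaming DataFrame.
    steps.append(("load", (location,), {}))
    return steps


def apply_steps(reader: Any, steps: list[Step]) -> Any:
    """Replay recorded calls on a builder object, e.g. spark.readStream."""
    for name, args, kwargs in steps:
        reader = getattr(reader, name)(*args, **kwargs)
    return reader
```

With a live session, `apply_steps(spark.readStream, plan_stream_read("/mnt/raw/events"))` would return a streaming DataFrame (the path is illustrative). Separating planning from execution keeps the validation testable without a Spark cluster.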