read_excel
ReadExcelAction
Bases: PipelineAction
Reads data from an Excel file or directory of Excel files and returns a DataFrame.
The action uses the ExcelDataFrameReader to read either a single file or
the files in a directory path. It can read specific sheets, filter files by
extension, and offers options to customize how the data is read, such as
specifying headers, index columns, and the handling of missing values. The
resulting data is returned as a DataFrame, and metadata about the files
that were read can be included in the context.
Example
More Options
The READ_EXCEL action supports additional options that can be passed to the
run method. For more information, refer to the method documentation.
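Because the documented `run` signature ends in `**_`, a misspelled option name would be absorbed by that catch-all rather than rejected with an error. The following is a minimal sketch of a guard against such typos, assuming the options are collected in a plain dict before the call. The helper `check_options` and the example values are hypothetical and not part of cloe_nessy; only the parameter names are taken from the signature documented on this page.

```python
# Parameter names copied from the documented `run` signature (excluding `context`).
DOCUMENTED_OPTIONS = {
    "file", "path", "extension", "recursive", "sheet_name",
    "sheet_name_as_column", "header", "index_col", "usecols", "dtype",
    "fillna", "true_values", "false_values", "nrows", "na_values",
    "keep_default_na", "parse_dates", "date_parser", "thousands",
    "include_index", "options", "add_metadata_column", "load_as_strings",
}


def check_options(options: dict) -> dict:
    """Hypothetical helper: reject keys that `run` does not document."""
    unknown = sorted(set(options) - DOCUMENTED_OPTIONS)
    if unknown:
        raise KeyError(f"Undocumented option(s): {unknown}")
    return options


# Example values are illustrative only.
options = check_options(
    {
        "path": "data/excel",
        "extension": "xlsx",
        "recursive": True,
        "sheet_name": 0,
        "sheet_name_as_column": True,
    }
)
```

A typo such as `sheetname` would raise `KeyError` here instead of being dropped without notice.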
Source code in src/cloe_nessy/pipeline/actions/read_excel.py
`run(context, *, file=None, path=None, extension='xlsx', recursive=False, sheet_name=0, sheet_name_as_column=False, header=0, index_col=None, usecols=None, dtype=None, fillna=None, true_values=None, false_values=None, nrows=None, na_values=None, keep_default_na=True, parse_dates=False, date_parser=None, thousands=None, include_index=False, options=None, add_metadata_column=True, load_as_strings=False, **_)`
Reads data from an Excel file or directory of Excel files and returns a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context | `PipelineContext` | The context in which the action is executed. | required |
| file | `str \| None` | The path to a single Excel file. Either `file` or `path` must be specified. | `None` |
| path | `str \| None` | The directory path containing multiple Excel files. Either `file` or `path` must be specified. | `None` |
| extension | `str` | The file extension to look for when reading from a directory. | `'xlsx'` |
| recursive | `bool` | Whether to include subdirectories when reading from a directory path. | `False` |
| sheet_name | `str \| int \| list` | The sheet name(s) or index(es) to read from the Excel file. | `0` |
| sheet_name_as_column | `bool` | Whether to add a column with the sheet name to the DataFrame. | `False` |
| header | `int \| list[int]` | Row number(s) to use as the column labels. | `0` |
| index_col | `int \| list[int] \| None` | Column(s) to use as the index of the DataFrame. | `None` |
| usecols | `int \| str \| list \| Callable \| None` | Subset of columns to parse. Can be an integer, string, list, or function. | `None` |
| dtype | `str \| None` | Data type for the columns. | `None` |
| fillna | `str \| dict[str, list[str]] \| dict[str, str] \| None` | Method or value to use to fill NaN values. | `None` |
| true_values | `list \| None` | Values to consider as True. | `None` |
| false_values | `list \| None` | Values to consider as False. | `None` |
| nrows | `int \| None` | Number of rows to parse. | `None` |
| na_values | `list[str] \| dict[str, list[str]] \| None` | Additional strings to recognize as NaN/NA. | `None` |
| keep_default_na | `bool` | Whether to append default NaN values when custom `na_values` are specified. | `True` |
| parse_dates | `bool \| list \| dict` | Options for parsing date columns. | `False` |
| date_parser | `Callable \| None` | Function to use for converting strings to datetime objects. | `None` |
| thousands | `str \| None` | Thousands separator to use when parsing numeric columns. | `None` |
| include_index | `bool` | Whether to include an index column in the output DataFrame. | `False` |
| options | `dict \| None` | Additional options to pass to the DataFrame reader. | `None` |
| add_metadata_column | `bool` | Whether to add a metadata column with file information to the DataFrame. | `True` |
| load_as_strings | `bool` | Whether to load all columns as strings. | `False` |
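The interplay of `path`, `extension`, and `recursive` can be illustrated with a small standard-library sketch. This is an assumption about the kind of file discovery those parameters describe, not the ExcelDataFrameReader implementation; `find_excel_files` is a hypothetical helper name.

```python
from pathlib import Path


def find_excel_files(path: str, extension: str = "xlsx", recursive: bool = False) -> list[Path]:
    """Hypothetical helper: list files under `path` that end in `.extension`.

    With recursive=False only the top-level directory is searched;
    with recursive=True subdirectories are searched as well.
    """
    root = Path(path)
    # Accept both "xlsx" and ".xlsx" as the extension argument.
    pattern = f"*.{extension.lstrip('.')}"
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    # Sort for a deterministic order; skip directories that happen to match.
    return sorted(p for p in matches if p.is_file())
```

For a directory holding `a.xlsx`, `notes.txt`, and `sub/b.xlsx`, the non-recursive call would return only `a.xlsx`, while `recursive=True` would also pick up `sub/b.xlsx`.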
Raises:
| Type | Description | 
|---|---|
| ValueError | Raised if both `file` and `path` are specified, or if neither is provided. |
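The `file`/`path` check behind this error can be sketched as follows. The sketch assumes that exactly one of the two arguments must be supplied; `resolve_source` and the error messages are hypothetical and are not taken from the cloe_nessy source.

```python
from typing import Optional, Tuple


def resolve_source(file: Optional[str] = None, path: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper: return ("file", value) or ("path", value).

    Assumption: exactly one of `file` and `path` may be given.
    """
    if file is not None and path is not None:
        raise ValueError("Specify either 'file' or 'path', not both.")
    if file is None and path is None:
        raise ValueError("One of 'file' or 'path' must be specified.")
    if file is not None:
        return ("file", file)
    return ("path", path)
```

Validating the arguments before any file access keeps the failure early and independent of what is on disk.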
Returns:
| Type | Description | 
|---|---|
| PipelineContext | The updated context, with the read data as a DataFrame. | 