DataFrameHandler

class aito.sdk.data_frame_handler.DataFrameHandler

Bases: object

Pandas DataFrame handler

allowed_format = ['csv', 'json', 'excel', 'ndjson']
static apply_functions_on_df(df: pandas.DataFrame, functions: List[Callable]) → pandas.DataFrame

Apply a list of partial functions to a DataFrame

Parameters
  • df (pd.DataFrame) – input pandas DataFrame

  • functions (List[Callable]) – list of partial functions to apply to the input DataFrame

Returns

output DataFrame

Return type

pd.DataFrame
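
The idea of applying a list of partial functions can be sketched as a simple fold over the DataFrame. This is a minimal illustration, not the SDK's implementation: `apply_functions` and `add_constant` are hypothetical names, and the assumption that each function receives the previous function's output is ours.

```python
from functools import partial, reduce
from typing import Callable, List

import pandas as pd


def apply_functions(df: pd.DataFrame, functions: List[Callable]) -> pd.DataFrame:
    # Hypothetical stand-in: feed the output of each function into the next one.
    return reduce(lambda acc, func: func(acc), functions, df)


def add_constant(df: pd.DataFrame, column: str, value: int) -> pd.DataFrame:
    # Return a modified copy so the input DataFrame is left untouched.
    out = df.copy()
    out[column] = out[column] + value
    return out


df = pd.DataFrame({"price": [1, 2, 3]})
steps = [
    # partial() fixes every argument except the DataFrame itself.
    partial(add_constant, column="price", value=10),
    partial(pd.DataFrame.rename, columns={"price": "price_eur"}),
]
result = apply_functions(df, steps)
```

Binding the extra arguments with `functools.partial` leaves each step as a one-argument callable, which is what makes the list composable.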

convert_df_using_aito_table_schema(df: pandas.DataFrame, table_schema: Dict) → pandas.DataFrame

Convert a pandas DataFrame to match a given Aito table schema

Parameters
  • df (pd.DataFrame) – input pandas DataFrame

  • table_schema (Dict) – input table schema

Raises
  • ValueError – input table schema is invalid

  • e – the underlying exception, raised when the conversion fails

Returns

converted DataFrame

Return type

pd.DataFrame
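
As a rough sketch of what schema-driven conversion can look like, the example below casts columns according to a table schema. The schema layout (`{"type": "table", "columns": {...}}`), the type-to-dtype mapping, and the function name `convert_with_schema` are assumptions made for illustration; the SDK's actual conversion rules may differ.

```python
from typing import Dict

import pandas as pd

# Assumed mapping from Aito column types to pandas dtypes (illustrative only).
AITO_TO_PANDAS = {
    "Boolean": bool,
    "Decimal": "float64",
    "Int": "int64",
    "String": str,
    "Text": str,
}


def convert_with_schema(df: pd.DataFrame, table_schema: Dict) -> pd.DataFrame:
    columns = table_schema.get("columns")
    # Reject schemas that do not have the assumed overall shape.
    if table_schema.get("type") != "table" or not isinstance(columns, dict):
        raise ValueError("input table schema is invalid")
    # Build a {column name: dtype} mapping and cast all columns at once.
    dtypes = {name: AITO_TO_PANDAS[spec["type"]] for name, spec in columns.items()}
    return df.astype(dtypes)


schema = {
    "type": "table",
    "columns": {
        "id": {"type": "Int"},
        "price": {"type": "Decimal"},
        "name": {"type": "String"},
    },
}
raw = pd.DataFrame({"id": ["1", "2"], "price": ["9.5", "3"], "name": ["a", "b"]})
converted = convert_with_schema(raw, schema)
```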

convert_file(read_input: Union[str, pathlib.Path, IO], write_output: Union[str, pathlib.Path, IO], in_format: str, out_format: str, read_options: Dict = None, convert_options: Dict = None, apply_functions: List[Callable[..., pandas.DataFrame]] = None, use_table_schema: Dict = None) → pandas.DataFrame

Convert the input file to the expected format, generating or using an Aito table schema if specified

Parameters
  • read_input (FilePathOrBuffer) – read input

  • write_output (FilePathOrBuffer) – write output

  • in_format (str) – input format

  • out_format (str) – output format

  • read_options (Dict, optional) – dictionary of arguments passed to the pandas read function, defaults to None

  • convert_options (Dict, optional) – dictionary of arguments passed to the pandas write function, defaults to None

  • apply_functions (List[Callable[..., pd.DataFrame]], optional) – list of partial functions that will be applied to the loaded pd.DataFrame, defaults to None

  • use_table_schema (Dict, optional) – Aito table schema that dictates the data types used to convert the data, defaults to None

Returns

converted DataFrame

Return type

pd.DataFrame
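
The read, then write flow that this method describes can be approximated with plain pandas calls. The sketch below converts semicolon-separated CSV to newline-delimited JSON using in-memory buffers; it does not call the SDK, and which defaults the SDK sets for each format is not covered here, so the writer options are passed explicitly.

```python
import io
import json

import pandas as pd

csv_input = io.StringIO("id;name\n1;apple\n2;banana\n")
ndjson_output = io.StringIO()

# Plays the role of read_options: keyword arguments for the pandas reader.
read_options = {"sep": ";"}
# Plays the role of convert_options: keyword arguments for the pandas writer.
convert_options = {"orient": "records", "lines": True}

df = pd.read_csv(csv_input, **read_options)
df.to_json(ndjson_output, **convert_options)

# Parse the output back to confirm that each line is one JSON record.
records = [
    json.loads(line)
    for line in ndjson_output.getvalue().splitlines()
    if line
]
```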

static datetime_to_string(df: pandas.DataFrame) → pandas.DataFrame

Convert pandas datetime type to string

Parameters

df (pd.DataFrame) – input pandas DataFrame

Returns

converted pandas DataFrame

Return type

pd.DataFrame
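
One way to picture this conversion is to format every datetime column as text and leave the other columns alone. The helper name `datetime_columns_to_string` and the ISO-like format string are assumptions for illustration; the string format the SDK actually produces is not stated here.

```python
import pandas as pd


def datetime_columns_to_string(
    df: pd.DataFrame, fmt: str = "%Y-%m-%dT%H:%M:%S"
) -> pd.DataFrame:
    # Work on a copy so the caller's DataFrame is not modified.
    out = df.copy()
    for column in out.columns:
        # Only touch columns that hold a datetime dtype.
        if pd.api.types.is_datetime64_any_dtype(out[column]):
            out[column] = out[column].dt.strftime(fmt)
    return out


df = pd.DataFrame(
    {
        "id": [1, 2],
        "created": pd.to_datetime(["2020-01-02 03:04:05", "2021-06-07 08:09:10"]),
    }
)
converted = datetime_columns_to_string(df)
```

Converting datetimes to strings before writing avoids relying on each output format's own datetime serialization.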

df_to_format(df: pandas.DataFrame, out_format: str, write_output: Union[str, pathlib.Path, IO], convert_options: Dict = None)

Write a pandas DataFrame to the given output in the specified format

Parameters
  • df (pd.DataFrame) – input DataFrame

  • out_format (str) – output format

  • write_output (FilePathOrBuffer) – write output

  • convert_options (Dict, optional) – dictionary of arguments passed to the pandas write function, defaults to None

read_file_to_df(read_input: Union[str, pathlib.Path, IO], in_format: str, read_options: Dict = None) → pandas.DataFrame

Read the input into a pandas DataFrame

Parameters
  • read_input (FilePathOrBuffer) – read input

  • in_format (str) – input format

  • read_options (Dict, optional) – dictionary of arguments passed to the pandas read function, defaults to None

Returns

read DataFrame

Return type

pd.DataFrame

validate_in_out_format(in_format: str, out_format: str)

Validate the input and output format parameters of the converter

Parameters
  • in_format (str) – input format

  • out_format (str) – output format

Raises

ValueError – Unexpected input format or output format
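
A check of this kind can be sketched as a membership test against the allowed formats listed for the class. The function name `validate_formats` and the error message wording are hypothetical; the SDK method may perform additional checks.

```python
from typing import List

# Same values as the class attribute allowed_format documented above.
ALLOWED_FORMATS: List[str] = ["csv", "json", "excel", "ndjson"]


def validate_formats(in_format: str, out_format: str) -> None:
    # Check both parameters the same way and name the offending one in the error.
    for label, value in (("input", in_format), ("output", out_format)):
        if value not in ALLOWED_FORMATS:
            raise ValueError(
                f"Unexpected {label} format {value!r}; "
                f"expected one of {ALLOWED_FORMATS}"
            )


validate_formats("csv", "ndjson")  # passes silently

try:
    validate_formats("xml", "json")
except ValueError as error:
    message = str(error)
```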