Quickstart

This section explains how to upload data to Aito with either the CLI or the Python SDK.

Essentially, uploading data into Aito can be broken down into the following steps:

  1. Infer a Table Schema cli | sdk

  2. Change the inferred schema if needed cli | sdk

  3. Create a table cli | sdk

  4. Convert the data cli | sdk

  5. Upload the data cli | sdk

Note

Skip steps 1, 2, and 3 if you are uploading data to an existing table. Skip step 4 if the data is already in the appropriate format for uploading or already matches the table schema.

If you don’t have a data file, you can download our example file and follow the guide.

Upload Data with the CLI

Note

You can use the Quick Add Table Operation instead of performing the upload step by step if you want to upload to a new table and don't need to adjust the inferred schema.

The CLI supports all steps needed to upload data:

Infer a Table Schema

For example, infer a table schema from a CSV file:

$ aito infer-table-schema csv < path/to/myCSVFile.csv > path/to/inferredSchema.json
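The inferred schema follows Aito's table-schema format: a "table" object whose columns map names to types. For a CSV with, say, id, name, price, and description columns, inferredSchema.json might look like the following (the column names and types here are illustrative):

```json
{
  "type": "table",
  "columns": {
    "id": { "type": "Int" },
    "name": { "type": "String" },
    "price": { "type": "Decimal" },
    "description": { "type": "Text", "analyzer": "English" }
  }
}
```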

Change the Schema

You might want to change the ColumnType, e.g., the id column should be of type String instead of Int, or add an Analyzer to a Text column. In that case, simply edit the inferred schema JSON file.

The example below uses jq to change the id column type:

$ jq '.columns.id.type = "String"' < path/to/schemaFile.json > path/to/updatedSchemaFile.json
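If you prefer Python over jq, the same edit is a one-liner on the parsed JSON. A minimal sketch using only the standard library, on an illustrative schema dictionary (in practice you would json.load the schema file, apply the change, and json.dump it back out):

```python
import json

# An illustrative inferred schema, as it would appear in the schema file.
schema = {
    "type": "table",
    "columns": {
        "id": {"type": "Int"},
        "name": {"type": "String"},
    },
}

# The same change the jq command makes: force the id column to String.
schema["columns"]["id"]["type"] = "String"

# Serialize the updated schema, ready to be written back to a file.
updated = json.dumps(schema, indent=2)
```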

Create a Table

You need a table name and a table schema to create a table:

$ aito database create-table tableName path/to/tableSchema.json

Convert the Data

If you made changes to the inferred schema or have an existing schema, pass the schema with the -s flag to make sure that the converted data matches the schema:

$ aito convert csv -s path/to/updatedSchema.json path/to/myCSVFile.csv > path/to/myConvertedFile.ndjson

You can convert the data to either:

  • A list of entries in JSON format for Batch Upload:

    $ aito convert csv --json path/to/myCSVFile.csv > path/to/myConvertedFile.json
    
  • An NDJSON file for File Upload:

    $ aito convert csv < path/to/myFile.csv > path/to/myConvertedFile.ndjson
    

    Remember to gzip the NDJSON file:

    $ gzip path/to/myConvertedFile.ndjson
    

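NDJSON is simply one JSON object per line, and the file upload expects it gzipped. A stdlib-only sketch of that format and the gzip step (the file name and entries are illustrative):

```python
import gzip
import json

# Illustrative table entries: one dict per row.
entries = [
    {"id": 1, "name": "radiator"},
    {"id": 2, "name": "window"},
]

# Write NDJSON: one JSON object per line, gzip-compressed.
with gzip.open("myConvertedFile.ndjson.gz", "wt", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")

# Reading it back recovers the original entries, one per line.
with gzip.open("myConvertedFile.ndjson.gz", "rt", encoding="utf-8") as f:
    restored = [json.loads(line) for line in f]
```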
Upload the Data

You can upload data with the CLI by using the database command.

First, Set Up Aito Credentials. The easiest way is by using the environment variables:

$ export AITO_INSTANCE_NAME=your-instance-name
$ export AITO_API_KEY=your-api-key

You can then upload the data by either:

  • Batch Upload:

    $ aito database upload-batch tableName < tableEntries.json
    
  • File Upload:

    $ aito database upload-file tableName tableEntries.ndjson.gz
    

Upload Data with the SDK

The Aito Python SDK uses Pandas DataFrames for multiple operations.

The example below shows how to load a CSV file into a DataFrame; please read the official Pandas guide for further instructions.

import pandas as pd

data_frame = pd.read_csv('reddit_sample.csv')

Infer a Table Schema

The SchemaHandler can infer table schema from a DataFrame:

from aito.utils.schema_handler import SchemaHandler
schema_handler = SchemaHandler()
inferred_schema = schema_handler.infer_table_schema_from_pandas_data_frame(data_frame)

Change the Schema

You might want to change the ColumnType, e.g., the id column should be of type String instead of Int, or add an Analyzer to a Text column.

The inferred schema returned by the SchemaHandler is a Python dictionary and can therefore be updated in place:

inferred_schema['columns']['id']['type'] = 'String'
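Adding an analyzer works the same way. On a small illustrative schema dictionary (shaped like the SchemaHandler's output), both edits look like this:

```python
# An illustrative inferred schema, as returned by the SchemaHandler.
inferred_schema = {
    "type": "table",
    "columns": {
        "id": {"type": "Int"},
        "description": {"type": "Text"},
    },
}

# Change the id column type and add an analyzer to the Text column.
inferred_schema["columns"]["id"]["type"] = "String"
inferred_schema["columns"]["description"]["analyzer"] = "English"
```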

Create a Table

The AitoClient can create a table using a table name and a table schema:

from aito.utils.aito_client import AitoClient
table_schema = {
  "type": "table",
  "columns": {
    "id": { "type": "Int" },
    "name": { "type": "String" },
    "price": { "type": "Decimal" },
    "description": { "type": "Text", "analyzer": "English" }
  }
}
aito_client = AitoClient(instance_name='your_aito_instance_name', api_key='your_rw_api_key')
aito_client.put_table_schema(table_name='your-table-name', table_schema=table_schema)

Convert the Data

The DataFrameHandler can convert a DataFrame to match an existing schema:

from aito.utils.data_frame_handler import DataFrameHandler
data_frame_handler = DataFrameHandler()
converted_data_frame = data_frame_handler.convert_df_from_aito_table_schema(
  df=data_frame,
  table_schema=table_schema
)

A DataFrame can be converted to:

  • A list of entries in JSON format for Batch Upload:

    entries = data_frame.to_dict(orient="records")
    
  • A gzipped NDJSON file for File Upload using the DataFrameHandler:

    from aito.utils.data_frame_handler import DataFrameHandler
    data_frame_handler = DataFrameHandler()
    data_frame_handler.df_to_format(
      df=data_frame,
      out_format='ndjson',
      write_output='path/to/myConvertedFile.ndjson.gz',
      convert_options={'compression': 'gzip'}
    )
    
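For reference, to_dict(orient="records") produces one dictionary per row, and the Batch Upload payload is simply that list serialized as a JSON array. A stdlib-only sketch with illustrative entries:

```python
import json

# The shape that data_frame.to_dict(orient="records") produces: one dict per row.
entries = [
    {"id": 1, "name": "radiator", "price": 19.9},
    {"id": 2, "name": "window", "price": 54.5},
]

# Serialized, this is the JSON array sent in a Batch Upload.
payload = json.dumps(entries)
```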

Upload the Data

The AitoClient can upload the data with either Batch Upload or File Upload:

from aito.utils.aito_client import AitoClient
aito_client = AitoClient(instance_name="your_aito_instance_name", api_key="your_rw_api_key")

# Batch upload
aito_client.populate_table_entries(table_name='reddit', entries=entries)

# File Upload
from pathlib import Path

file_path = Path('path/to/myConvertedFile.ndjson.gz')
with file_path.open(mode='rb') as in_f:
  aito_client.populate_table_by_file_upload(table_name='reddit', binary_file_object=in_f)