Preparer Chaining Example

This example chains multiple preparers together with auto processing, using the ChunkText and SummarizeText operations in AI Accelerator.

Create the first Preparer to chunk text

-- Create source test table
CREATE TABLE source_table__1321
(
    id      INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    content TEXT NOT NULL
);

SELECT aidb.create_table_preparer(
    name => 'chunking_preparer__1321',
    operation => 'ChunkText',
    source_table => 'source_table__1321',
    source_key_column => 'id',
    source_data_column => 'content',
    destination_table => 'chunked_data__1321',
    destination_data_column => 'chunk',
    destination_key_column => 'id',
    options => '{"desired_length": 1, "max_length": 1000}'::JSONB  -- Configuration for the ChunkText operation
);

Create the second Preparer to summarize the chunked text

-- Create the model. It must support the decode_text and decode_text_batch operations.
SELECT aidb.create_model('model__1321', 't5_local');
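
Before wiring the model into a preparer, you can optionally sanity-check it by calling it directly. This is only a sketch: it assumes the aidb.decode_text model primitive is available in your AI Accelerator version and accepts a model name and an input text.

-- Assumed primitive call: run the model directly on a sample text and return the decoded (summarized) output.
SELECT aidb.decode_text('model__1321', 'A longer passage of text to summarize as a quick test of the model.');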

SELECT aidb.create_table_preparer(
    name => 'summarizing_preparer__1321',
    operation => 'SummarizeText',
    source_table => 'chunked_data__1321',         -- Reference the output from the ChunkText preparer
    source_key_column => 'unique_id',             -- Reference the unique column from the output of the ChunkText preparer
    source_data_column => 'chunk',                -- Reference the output from the ChunkText preparer
    destination_table => 'summarized_data__1321',
    destination_data_column => 'summary',
    destination_key_column => 'chunk_unique_id',
    options => '{"model": "model__1321"}'::JSONB  -- Configuration for the SummarizeText operation
);

Tip: This operation transforms the shape of the data, automatically unnesting collections by introducing a part_id column. See the unnesting concept for more detail.
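
As a minimal illustration of that shape change, each source row fans out into one destination row per chunk, keyed as "<id>.part.<part_id>". Once data has been inserted and processed (shown later in this example), you can confirm the fan-out with a plain query against the destination table:

-- Illustrative only: count the chunk rows produced for each source row.
SELECT id, count(*) AS chunk_count
FROM chunked_data__1321
GROUP BY id;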

Set both Preparers to Live automatic processing

SELECT aidb.set_auto_preparer('chunking_preparer__1321', 'Live');
SELECT aidb.set_auto_preparer('summarizing_preparer__1321', 'Live');
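
Live mode picks up changes as they happen, so rows that were already in the source table before enabling it aren't reprocessed retroactively. As a hedged sketch, if your version provides a bulk-processing call for preparers (the function name aidb.bulk_data_preparer below is an assumption; check the preparer reference for your release), you could run each preparer once over existing data, in pipeline order:

-- Assumed function name: process pre-existing rows once, chunking first, then summarizing.
SELECT aidb.bulk_data_preparer('chunking_preparer__1321');
SELECT aidb.bulk_data_preparer('summarizing_preparer__1321');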

Insert data for processing

Now, when we insert data into the source table, processed results flow through both preparers automatically:

INSERT INTO source_table__1321
VALUES (1, 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters. This enables processing or storage of data in manageable parts.');

Chunks calculated automatically:

SELECT * FROM chunked_data__1321;
Output
 id | part_id | unique_id |                                                                       chunk
----+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  |       0 | 1.part.0  | This is a significantly longer text example that might require splitting into smaller chunks.
 1  |       1 | 1.part.1  | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence is 145 characters.
 1  |       2 | 1.part.2  | This enables processing or storage of data in manageable parts.
(3 rows)

Summaries of the chunks calculated automatically:

SELECT * FROM summarized_data__1321;
Output
 chunk_unique_id |                                               summary
-----------------+------------------------------------------------------------------------------------------------------
 1.part.0        | text example might require splitting into smaller chunks .
 1.part.1        | the purpose of this function is to partition text data into segments of a specified maximum length .
 1.part.2        | enables processing or storage of data in manageable parts .
(3 rows)
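
To see the whole chain in one result, you can join the two destination tables on the key produced by the unnesting step. This is a plain SQL sketch over the tables created above; unique_id on the chunk side matches chunk_unique_id on the summary side:

-- Join each chunk to its summary via the unnesting key.
SELECT c.id, c.part_id, c.chunk, s.summary
FROM chunked_data__1321 c
JOIN summarized_data__1321 s ON s.chunk_unique_id = c.unique_id
ORDER BY c.id, c.part_id;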

The same automatic processing applies to deletions:

DELETE FROM source_table__1321 WHERE id = 1;
SELECT * FROM chunked_data__1321;
Output
 id | part_id | unique_id | chunk
----+---------+-----------+-------
(0 rows)

SELECT * FROM summarized_data__1321;
Output
 chunk_unique_id | summary
-----------------+---------
(0 rows)
