Preparers Auto Processing Example

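This example shows how to use preparer auto processing with the ChunkText operation in AI Accelerator. Once auto processing is set to Live, inserts and deletes on the source table are reflected in the destination table.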

Tip

This operation transforms the shape of the data: one source row can produce several destination rows, and the collection of chunks is automatically unnested by introducing a part_id column. See the unnesting concept for more detail.

Preparer with table data source

-- Create source test table
CREATE TABLE source_table__1628
(
    id      INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    content TEXT NOT NULL
);

SELECT aidb.create_table_preparer(
    name => 'preparer__1628',
    operation => 'ChunkText',
    source_table => 'source_table__1628',
    source_data_column => 'content',
    destination_table => 'chunked_data__1628',
    destination_data_column => 'chunk',
    source_key_column => 'id',
    destination_key_column => 'id',
    options => '{"desired_length": 1, "max_length": 1000}'::JSONB  -- Configuration for the ChunkText operation
);

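-- Enable live auto processing so that changes to the source table are reflected in the destination table
SELECT aidb.set_auto_preparer('preparer__1628', 'Live');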

INSERT INTO source_table__1628
VALUES (1, 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.');
SELECT * FROM chunked_data__1628;
Output
 id | part_id | unique_id |                                                                       chunk
----+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  |       0 | 1.part.0  | This is a significantly longer text example that might require splitting into smaller chunks.
 1  |       1 | 1.part.1  | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters.
 1  |       2 | 1.part.2  | This enables processing or storage of data in manageable parts.
(3 rows)
INSERT INTO source_table__1628
VALUES (2, 'This sentence should be its own chunk. This too.');
SELECT * FROM chunked_data__1628;
Output
 id | part_id | unique_id |                                                                       chunk
----+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  |       0 | 1.part.0  | This is a significantly longer text example that might require splitting into smaller chunks.
 1  |       1 | 1.part.1  | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters.
 1  |       2 | 1.part.2  | This enables processing or storage of data in manageable parts.
 2  |       0 | 2.part.0  | This sentence should be its own chunk.
 2  |       1 | 2.part.1  | This too.
(5 rows)
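
To inspect the unnesting at this point in the example, the destination rows can be summarized and stitched back together with plain PostgreSQL. This is a minimal sketch rather than part of the AI Accelerator API. It assumes the two tables created above, and it assumes that joining chunks with a single space is an acceptable approximation of the original text. The key columns are cast to text because the type of the destination id column is not stated here.

```sql
-- Count how many chunks each source row produced.
SELECT id, count(*) AS chunk_count
FROM chunked_data__1628
GROUP BY id
ORDER BY id;

-- Rebuild an approximation of each source text from its chunks,
-- ordered by part_id. The single-space separator is an assumption;
-- the chunker may have trimmed different whitespace.
SELECT s.id,
       string_agg(c.chunk, ' ' ORDER BY c.part_id) AS reassembled
FROM source_table__1628 AS s
JOIN chunked_data__1628 AS c
  ON c.id::text = s.id::text  -- cast both sides in case the key types differ
GROUP BY s.id
ORDER BY s.id;
```

With the five rows shown above, the first query should report three chunks for id 1 and two chunks for id 2.
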
-- Deleting a source row also removes its chunks from the destination table
DELETE FROM source_table__1628 WHERE id = 1;
SELECT * FROM chunked_data__1628;
Output
 id | part_id | unique_id |                 chunk
----+---------+-----------+----------------------------------------
 2  |       0 | 2.part.0  | This sentence should be its own chunk.
 2  |       1 | 2.part.1  | This too.
(2 rows)
-- Turn auto processing off again
SELECT aidb.set_auto_preparer('preparer__1628', 'Disabled');
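
With auto processing disabled, new source rows are presumably no longer chunked automatically. The following is a hedged sketch of processing them on demand. It assumes that aidb.bulk_data_preparation is available in your AI Accelerator version and accepts the preparer name, so check the preparer reference before relying on it. The row with id 3 is made up for illustration.

```sql
-- Insert a row while auto processing is disabled (hypothetical sample data).
INSERT INTO source_table__1628
VALUES (3, 'Added while auto processing is off. It needs a manual run.');

-- Assumption: aidb.bulk_data_preparation runs the preparer over the
-- source table on demand.
SELECT aidb.bulk_data_preparation('preparer__1628');

-- Inspect the destination table after the manual run.
SELECT * FROM chunked_data__1628 ORDER BY id, part_id;
```
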
