# Preparers chunk text operation examples
These examples use preparers with the ChunkText operation in AI Accelerator.
> **Tip:** This operation transforms the shape of the data, automatically unnesting collections by introducing a `part_id` column. See the unnesting concept for more detail.
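As a loose illustration in plain Python (not aidb internals), unnesting means a single input value that yields several chunks comes back as several rows, each tagged with its own `part_id`:

```python
# Hypothetical sketch of unnesting: one input produces a list of chunks,
# which becomes one output row per chunk, tagged with a part_id column.
chunks = ["This is a", "simple", "test", "sentence."]
rows = [{"part_id": i, "chunk": c} for i, c in enumerate(chunks)]
for row in rows:
    print(row["part_id"], row["chunk"])
```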
## Primitive
```sql
-- Only specify a desired length
SELECT * FROM aidb.chunk_text('This is a simple test sentence.', '{"desired_length": 10}');
```
Output
```
 part_id |   chunk
---------+-----------
       0 | This is a
       1 | simple
       2 | test
       3 | sentence.
(4 rows)
```
```sql
-- Specify a desired length and a maximum length
SELECT * FROM aidb.chunk_text('This is a simple test sentence.', '{"desired_length": 10, "max_length": 15}');
```
Output
```
 part_id |    chunk
---------+-------------
       0 | This is a
       1 | simple test
       2 | sentence.
(3 rows)
```
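To build intuition for how `desired_length` and `max_length` interact, here is a rough greedy word-packing sketch in Python. This is an illustration under stated assumptions, not the actual aidb splitter (which works on semantic boundaries): it targets `desired_length` characters per chunk but lets whole words overflow up to `max_length`.

```python
def greedy_chunks(text, desired_length, max_length=None):
    """Illustrative only -- NOT the aidb algorithm. Pack whole words into
    chunks, flushing once a chunk reaches desired_length characters or the
    next word would push it past max_length."""
    if max_length is None:
        max_length = desired_length
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and (len(current) >= desired_length or len(candidate) > max_length):
            chunks.append(current)  # flush the finished chunk
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

print(greedy_chunks("This is a simple test sentence.", 10, 15))
# ['This is a', 'simple test', 'sentence.']
```

For this tiny input the sketch happens to mirror the output above, but don't rely on that: the real operation chooses split points semantically, not by word packing alone.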
```sql
-- Named parameters
SELECT * FROM aidb.chunk_text(
    input => 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.',
    options => '{"desired_length": 40}'
);
```
Output
```
 part_id |                 chunk
---------+----------------------------------------
       0 | This is a significantly longer text
       1 | example that might require splitting
       2 | into smaller chunks.
       3 | The purpose of this function is to
       4 | partition text data into segments of a
       5 | specified maximum length, for example,
       6 | this sentence 145 is characters.
       7 | This enables processing or storage of
       8 | data in manageable parts.
(9 rows)
```
```sql
-- Semantic chunking: split into the largest continuous semantic chunks that fit within max_length
SELECT * FROM aidb.chunk_text('This sentence should be its own chunk. This too.', '{"desired_length": 1, "max_length": 1000}');
```
Output
```
 part_id |                 chunk
---------+----------------------------------------
       0 | This sentence should be its own chunk.
       1 | This too.
(2 rows)
```
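A `desired_length` of 1 with a generous `max_length` effectively asks for the smallest semantic units — here, individual sentences — as separate chunks. A minimal Python sketch of sentence-boundary splitting (an assumption for illustration; aidb's actual segmentation rules may differ):

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace.
    Illustrative only; aidb's semantic chunking is more sophisticated."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

print(split_sentences("This sentence should be its own chunk. This too."))
# ['This sentence should be its own chunk.', 'This too.']
```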
## Preparer with table data source
```sql
-- Create source test table
CREATE TABLE source_table__1628 (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    content TEXT NOT NULL
);
INSERT INTO source_table__1628 VALUES
    (1, 'This is a significantly longer text example that might require splitting into smaller chunks. The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters. This enables processing or storage of data in manageable parts.'),
    (2, 'This sentence should be its own chunk. This too.');

SELECT aidb.create_table_preparer(
    name => 'preparer__1628',
    operation => 'ChunkText',
    source_table => 'source_table__1628',
    source_data_column => 'content',
    destination_table => 'chunked_data__1628',
    destination_data_column => 'chunks',
    source_key_column => 'id',
    destination_key_column => 'id',
    options => '{"desired_length": 1, "max_length": 1000}'::JSONB -- Configuration for the ChunkText operation
);

SELECT aidb.bulk_data_preparation('preparer__1628');

SELECT * FROM chunked_data__1628;
```
Output
```
 id | part_id | unique_id |                                                                      chunks
----+---------+-----------+---------------------------------------------------------------------------------------------------------------------------------------------------
  1 |       0 | 1.part.0  | This is a significantly longer text example that might require splitting into smaller chunks.
  1 |       1 | 1.part.1  | The purpose of this function is to partition text data into segments of a specified maximum length, for example, this sentence 145 is characters.
  1 |       2 | 1.part.2  | This enables processing or storage of data in manageable parts.
  2 |       0 | 2.part.0  | This sentence should be its own chunk.
  2 |       1 | 2.part.1  | This too.
(5 rows)
```
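Note how the preparer keeps the source key (`id`), adds the `part_id` from unnesting, and derives a `unique_id` per chunk. Judging from the output above, that key appears to join the source key and part number as `<id>.part.<part_id>`; a trivial sketch of that assumption:

```python
def unique_id(source_key, part_id):
    # Hypothetical reconstruction of the key format seen in the output above;
    # the actual derivation is internal to aidb.
    return f"{source_key}.part.{part_id}"

print(unique_id(1, 0))  # 1.part.0
print(unique_id(2, 1))  # 2.part.1
```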