Dataset management - keras-text Documentation

Dataset

__init__(self, inputs, labels, test_indices=None, **kwargs)

Encapsulates all the pieces of data needed to run an experiment. This is essentially a bag of items that makes it easy to serialize and deserialize everything as a unit.

Args:

inputs: The raw model inputs. This can be set to None if you don't want to serialize this value when you save the dataset.
labels: The raw output labels.
test_indices: The optional test indices to use. Ideally, these should be generated once and reused across experiments to make results comparable. generate_test_indices can be used to generate the indices the first time.
**kwargs: Additional key-value items to store.
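
Example (illustrative sketch): the "bag of items" idea can be pictured with a small stand-in class. SimpleDataset below is hypothetical and is not the keras-text implementation. In particular, storing the extra **kwargs items as plain attributes is an assumption made for this sketch.

```python
class SimpleDataset:
    """Hypothetical stand-in for a Dataset-like container (not the keras-text class)."""

    def __init__(self, inputs, labels, test_indices=None, **kwargs):
        self.inputs = inputs              # raw model inputs; may be None
        self.labels = labels              # raw output labels
        self.test_indices = test_indices  # optional; meant to be reused across experiments
        # Assumption: additional key-value items are kept as plain attributes.
        for key, value in kwargs.items():
            setattr(self, key, value)


ds = SimpleDataset(
    inputs=["good movie", "bad movie", "fine movie"],
    labels=[1, 0, 1],
    test_indices=[2],
    tokenizer_name="word",  # arbitrary extra item, purely illustrative
)
```

Keeping everything on one object means a single save call can capture the inputs, labels, split indices, and any extra items together.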

save(self, file_path)

Serializes this dataset to a file.
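
Example (illustrative sketch): one way to write a bundle of experiment data to a file and read it back is a pickle round trip. The on-disk format used by keras-text is not stated here, so pickle, the helper names save_bundle and load_bundle, and the file name are all assumptions.

```python
import os
import pickle
import tempfile


def save_bundle(bundle, file_path):
    """Write the whole bundle to disk in one call (pickle is an assumed format)."""
    with open(file_path, "wb") as f:
        pickle.dump(bundle, f)


def load_bundle(file_path):
    """Read back a bundle previously written by save_bundle."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# inputs is None here, mirroring the option to skip serializing the raw inputs.
bundle = {"inputs": None, "labels": [1, 0, 1], "test_indices": [2]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dataset.bin")
    save_bundle(bundle, path)
    restored = load_bundle(path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.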

Args:

train_val_split(self, split_ratio=0.1)

Generates train and validation sets from the training indices.

Args:

Returns:

The stratified train and val subsets. Multi-label outputs are handled as well.
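
Example (illustrative sketch): a stratified split keeps the class proportions roughly the same in both subsets. The function below is a minimal standard-library version for single-label data only. The name stratified_split and the seed parameter are assumptions, and the multi-label handling mentioned above is not covered.

```python
import random
from collections import defaultdict


def stratified_split(indices, labels, split_ratio=0.1, seed=0):
    """Split indices into (train, val), keeping class proportions roughly equal.

    Assumes single-label data: labels[i] is the class of sample i.
    """
    # Group the candidate indices by their class label.
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)

    rng = random.Random(seed)
    train, val = [], []
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        # Take the same fraction of every class for validation.
        n_val = round(len(class_indices) * split_ratio)
        val.extend(class_indices[:n_val])
        train.extend(class_indices[n_val:])
    return sorted(train), sorted(val)


labels = [0] * 80 + [1] * 20
train_idx, val_idx = stratified_split(range(100), labels, split_ratio=0.1)
```

With an 80/20 class balance and split_ratio=0.1, the validation subset here receives 8 samples of class 0 and 2 of class 1. In practice, the indices passed in would be the training indices only, with the test indices already excluded.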

update_test_indices(self, test_size=0.1)

Updates the test_indices property with indices covering a test_size proportion of the data.
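
Example (illustrative sketch): choosing a fixed proportion of indices for testing can be as simple as a seeded random sample. Whether keras-text samples this way (for example, with or without stratification) is not stated here, so pick_test_indices and its seed parameter are hypothetical.

```python
import random


def pick_test_indices(n_samples, test_size=0.1, seed=42):
    """Return a sorted random sample of indices covering test_size of the data."""
    n_test = int(n_samples * test_size)
    rng = random.Random(seed)  # fixed seed so the same indices can be regenerated
    return sorted(rng.sample(range(n_samples), n_test))


test_indices = pick_test_indices(n_samples=50, test_size=0.1)

# Everything not held out for testing remains available for training.
held_out = set(test_indices)
train_indices = [i for i in range(50) if i not in held_out]
```

Generating the indices once and storing them with the dataset keeps the test set identical across experiments, which is what makes results comparable.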

Args: