Source: keras_text/data.py#L0


Dataset

Dataset.labels

Dataset.num_classes

Dataset.test_indices

Dataset.train_indices


Dataset.__init__

__init__(self, inputs, labels, test_indices=None, **kwargs)

Encapsulates all pieces of data needed to run an experiment. This is basically a bag of items that makes it easy to serialize and deserialize everything as a unit.

Args:

  • inputs: The raw model inputs. This can be set to None if you don't want to serialize this value when you save the dataset.
  • labels: The raw output labels.
  • test_indices: The optional test indices to use. Ideally, these should be generated once and reused across experiments to make results comparable. generate_test_indices can be used to generate the indices the first time.
  • **kwargs: Additional key-value items to store.
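
To make the "bag of items" idea concrete, here is a minimal sketch of a container with the same constructor signature. DatasetSketch is a hypothetical stand-in, not the keras_text class. It assumes that extra keyword arguments become attributes, that the training indices are simply the indices not reserved for testing, and that num_classes counts distinct labels in the single-label case. None of these details are confirmed by the text above.

```python
class DatasetSketch:
    """Hypothetical stand-in for illustration; not the keras_text class."""

    def __init__(self, inputs, labels, test_indices=None, **kwargs):
        self.inputs = inputs
        self.labels = labels
        self.test_indices = test_indices
        # Assumption: extra key-value items are stored as plain attributes.
        for key, value in kwargs.items():
            setattr(self, key, value)

    @property
    def train_indices(self):
        # Assumption: training indices are all indices not held out for testing.
        held_out = set(self.test_indices or [])
        return [i for i in range(len(self.labels)) if i not in held_out]

    @property
    def num_classes(self):
        # Assumption: single-label case, one class per distinct label value.
        return len(set(self.labels))


ds = DatasetSketch(
    inputs=["good film", "bad film", "great plot", "dull plot"],
    labels=[1, 0, 1, 0],
    test_indices=[3],
    tokenizer_name="whitespace",  # arbitrary extra item passed via **kwargs
)
print(ds.train_indices, ds.num_classes, ds.tokenizer_name)
```

Keeping the held-out indices on the object itself is what lets the same test set travel with the data from one experiment to the next.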

Dataset.save

save(self, file_path)

Serializes this dataset to a file.

Args:

  • file_path: The file path to use.
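
The serialization format is not stated here, so the following is only a sketch of a save/load round trip using the standard pickle module as an assumed backend. The payload dictionary is a hypothetical stand-in for a dataset object; inputs is left as None to mirror the option of not serializing the raw inputs.

```python
import os
import pickle
import tempfile

# Hypothetical payload standing in for a dataset object.
payload = {"inputs": None, "labels": [1, 0, 1, 0], "test_indices": [3]}

# Write to a temporary location so the example leaves no files behind
# in the working directory.
file_path = os.path.join(tempfile.mkdtemp(), "dataset.pkl")
with open(file_path, "wb") as f:
    pickle.dump(payload, f)

# Reading the file back restores labels and test indices together.
with open(file_path, "rb") as f:
    restored = pickle.load(f)
```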

Dataset.train_val_split

train_val_split(self, split_ratio=0.1)

Generates train and validation sets from the training indices.

Args:

  • split_ratio: The split proportion in [0, 1] (Default value: 0.1)

Returns:

The stratified train and val subsets. Multi-label outputs are handled as well.
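
A stratified split samples each class separately so that the validation set keeps roughly the same class balance as the training pool. The function below is a minimal single-label sketch of that idea using only the standard library. The name stratified_split, the seed parameter, and the rounding rule are assumptions made for illustration, and the multi-label case mentioned above is not covered.

```python
import random
from collections import defaultdict


def stratified_split(indices, labels, split_ratio=0.1, seed=0):
    """Split `indices` into (train, val), sampling each class separately."""
    # Group the candidate indices by their class label.
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)

    rng = random.Random(seed)
    train, val = [], []
    for members in by_class.values():
        rng.shuffle(members)
        # Take the same proportion from every class.
        n_val = round(len(members) * split_ratio)
        val.extend(members[:n_val])
        train.extend(members[n_val:])
    return sorted(train), sorted(val)


labels = [0] * 20 + [1] * 10
train_idx, val_idx = stratified_split(range(30), labels, split_ratio=0.1)
```

With 20 samples of class 0 and 10 of class 1, a ratio of 0.1 places two class-0 samples and one class-1 sample in the validation set, preserving the 2:1 balance.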


Dataset.update_test_indices

update_test_indices(self, test_size=0.1)

Updates the test_indices property with indices covering a test_size proportion of the data.

Args:

  • test_size: The test proportion in [0, 1] (Default value: 0.1)
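
How the indices are chosen is not described here. As a sketch under stated assumptions, the helper below draws a fixed-size random sample with a seeded generator, which shows why generating the test indices once and reusing them keeps experiments comparable. The name make_test_indices and the seed parameter are hypothetical, and the sample is not stratified.

```python
import random


def make_test_indices(num_samples, test_size=0.1, seed=42):
    """Pick round(num_samples * test_size) distinct indices, reproducibly."""
    n_test = round(num_samples * test_size)
    # A dedicated, seeded generator makes the selection repeatable.
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_samples), n_test))


first = make_test_indices(50, test_size=0.1)
second = make_test_indices(50, test_size=0.1)
```

Both calls return the same five indices, so two experiments that use this helper with the same seed evaluate on the same held-out samples.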