Source: keras_text/data.py#L0
Dataset
Dataset.labels
Dataset.num_classes
Dataset.test_indices
Dataset.train_indices
Dataset.__init__
__init__(self, inputs, labels, test_indices=None, **kwargs)
Encapsulates all pieces of data needed to run an experiment. This is basically a bag of items that makes it easy to serialize and deserialize everything as a unit.
Args:
- inputs: The raw model inputs. This can be set to None if you don't want to serialize this value when you save the dataset.
- labels: The raw output labels.
- test_indices: The optional test indices to use. Ideally, these should be generated once and reused across experiments to make results comparable. generate_test_indices can be used to generate the indices the first time.
- **kwargs: Additional key-value items to store.
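The "bag of items" idea can be illustrated with a small stand-alone class. This is a minimal sketch, not keras_text's implementation: the class name BagDataset, the attribute names X, y and _test_indices, and the choice to expose **kwargs as attributes are all assumptions made for illustration. It also shows one plausible relationship between test_indices and train_indices (train = every index not held out for testing).

```python
class BagDataset:
    """Hypothetical stand-in for Dataset; illustrates the bag-of-items idea only."""

    def __init__(self, inputs, labels, test_indices=None, **kwargs):
        self.X = inputs          # may be None if the raw inputs should not be saved
        self.y = list(labels)
        self._test_indices = None if test_indices is None else sorted(test_indices)
        # Assumption: extra key/value items become attributes so they travel with the dataset.
        for key, value in kwargs.items():
            setattr(self, key, value)

    @property
    def labels(self):
        return self.y

    @property
    def test_indices(self):
        return self._test_indices

    @property
    def train_indices(self):
        # Every index that is not reserved for testing is available for training.
        held_out = set(self._test_indices or [])
        return [i for i in range(len(self.y)) if i not in held_out]


ds = BagDataset(inputs=["a", "b", "c", "d"], labels=[0, 1, 0, 1],
                test_indices=[3], tokenizer_name="char")
print(ds.train_indices)   # [0, 1, 2]
print(ds.tokenizer_name)  # char
```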
Dataset.save
save(self, file_path)
Serializes this dataset to a file.
Args:
- file_path: The file path to use.
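A minimal sketch of file-based serialization, assuming the standard-library pickle module is used (the text above only says the dataset is serialized to a file; the format is an assumption). The helpers save_dataset and load_dataset are hypothetical, and a plain dict stands in for a dataset object. Note how inputs set to None simply round-trip as None, which is what keeps large raw inputs out of the saved file.

```python
import os
import pickle
import tempfile


def save_dataset(dataset, file_path):
    # Pickle the whole object so inputs, labels, indices and extra items are saved together.
    with open(file_path, "wb") as f:
        pickle.dump(dataset, f)


def load_dataset(file_path):
    # Restore the object exactly as it was pickled.
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Usage: inputs=None means the raw inputs are not stored in the file.
bag = {"inputs": None, "labels": [0, 1, 1], "test_indices": [2]}
path = os.path.join(tempfile.mkdtemp(), "dataset.pkl")
save_dataset(bag, path)
restored = load_dataset(path)
print(restored == bag)  # True
```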
Dataset.train_val_split
train_val_split(self, split_ratio=0.1)
Generates train and validation sets from the training indices.
Args:
- split_ratio: The split proportion in [0, 1] (Default value: 0.1)
Returns:
The stratified train and val subsets. Multi-label outputs are handled as well.
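A stratified split keeps each class's proportion roughly the same in the train and validation subsets. The sketch below is a hypothetical stdlib-only helper, not keras_text's code; in particular, handling multi-label rows by stratifying on the full label combination (label powerset) is an assumption about how "multi-label outputs are handled". The fixed seed makes the split reproducible.

```python
import random
from collections import defaultdict


def stratified_train_val_split(train_indices, labels, split_ratio=0.1, seed=0):
    """Hypothetical helper: split train_indices into (train, val) per label group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx in train_indices:
        label = labels[idx]
        # Assumption: multi-label rows are grouped by their full label combination.
        key = tuple(label) if isinstance(label, (list, tuple)) else label
        groups[key].append(idx)

    train, val = [], []
    for members in groups.values():
        rng.shuffle(members)
        n_val = int(round(len(members) * split_ratio))  # split_ratio of each group goes to val
        val.extend(members[:n_val])
        train.extend(members[n_val:])
    return sorted(train), sorted(val)


labels = [0] * 10 + [1] * 10
train, val = stratified_train_val_split(range(20), labels, split_ratio=0.2)
print(len(train), len(val))  # 16 4
```

Each class contributes two of the four validation indices here, so the 50/50 class balance is preserved in both subsets.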
Dataset.update_test_indices
update_test_indices(self, test_size=0.1)
Updates the test_indices property with indices for a test_size proportion of the samples.
Args:
- test_size: The test proportion in [0, 1] (Default value: 0.1)
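A minimal sketch of drawing a test_size proportion of indices, assuming simple random sampling without replacement (the text above does not say whether keras_text stratifies here). The helper make_test_indices and its seed parameter are hypothetical; a fixed seed lets the same indices be regenerated and reused across experiments, in line with the advice on test_indices.

```python
import random


def make_test_indices(num_samples, test_size=0.1, seed=0):
    """Hypothetical helper: pick a test_size fraction of indices without replacement."""
    rng = random.Random(seed)
    n_test = int(round(num_samples * test_size))
    return sorted(rng.sample(range(num_samples), n_test))


test_idx = make_test_indices(100, test_size=0.1)
print(len(test_idx))  # 10
```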