TransformedDataset
minnt.TransformedDataset
Bases: Dataset
A dataset capable of applying transformations to its items and batches.
Assuming the TransformedDataset is used within a DataLoader,
batches are produced as follows:
- First, every element is retrieved from the source dataset using __getitem__ (or a list of all batch elements at the same time using __getitems__).
- Then, if transform is defined, it is applied to each individual item.
- Next, if collate is defined, it is applied to the list of items to form a batch.
- Finally, if transform_batch is defined, it is applied to the batch.
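The order of these steps can be illustrated with a small pure-Python sketch. It is not the minnt implementation: the helper names `call_unpacked` and `make_batch` are hypothetical, the source is a plain list, and when collate is not given the sketch simply keeps the list of items, whereas a real DataLoader would apply its own default collation.

```python
from typing import Any, Callable, Optional, Sequence


def call_unpacked(fn: Callable, value: Any) -> Any:
    """Call fn(*value) when value is a tuple or a list, fn(value) otherwise."""
    if isinstance(value, (tuple, list)):
        return fn(*value)
    return fn(value)


def make_batch(
    source: Sequence,
    indices: Sequence[int],
    transform: Optional[Callable] = None,
    collate: Optional[Callable] = None,
    transform_batch: Optional[Callable] = None,
) -> Any:
    # Step 1: retrieve the raw items from the source dataset.
    items = [source[i] for i in indices]
    # Step 2: apply transform to each individual item.
    if transform is not None:
        items = [call_unpacked(transform, item) for item in items]
    # Step 3: apply collate to the list of items to form a batch.
    # Assumption: without collate, the list of items is kept unchanged.
    batch = collate(items) if collate is not None else items
    # Step 4: apply transform_batch to the whole batch.
    if transform_batch is not None:
        batch = call_unpacked(transform_batch, batch)
    return batch


source = [(1, "a"), (2, "b"), (3, "c")]
batch = make_batch(
    source,
    indices=[0, 2],
    transform=lambda x, label: (x * 10, label.upper()),
    collate=lambda items: tuple(zip(*items)),
    transform_batch=lambda xs, labels: {"xs": list(xs), "labels": list(labels)},
)
print(batch)
```

Because every item here is a tuple, transform receives it unpacked as `(x, label)`, and because the collated batch is a tuple, transform_batch receives it unpacked as `(xs, labels)`.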
Warning
Given how PyTorch torch.utils.data.DataLoader works, when specifying
collate and/or transform_batch,
the collate_fn of the DataLoader must be set to self.collate_fn.
This is automatically done when using the dataloader
method of this class to create the DataLoader. However,
if you create a DataLoader manually, you must pass
collate_fn=transformed_dataset.collate_fn; otherwise, collate
and transform_batch will be ignored.
Source code in minnt/transformed_dataset.py
__init__
Create a new transformed dataset using the provided dataset with an optional limit.
Parameters:
- dataset (Dataset) – The source dataset implementing __len__ and __getitem__.
- dataset_limit (int | None, default: None) – If given, limits the length of the dataset to this value.
Environment variables: The following environment variable can be used to override the method parameters:
MINNT_DATASET_LIMIT: If set to a positive integer, overrides the dataset_limit parameter.
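The interplay of the dataset_limit parameter and the environment variable could look roughly as follows. This is a sketch under stated assumptions, not the library code: the function name `effective_length` is hypothetical, values of MINNT_DATASET_LIMIT that are not positive integers are assumed to be ignored, and a limit larger than the source dataset is assumed to leave its length unchanged.

```python
import os
from typing import Mapping, Optional


def effective_length(
    dataset_len: int,
    dataset_limit: Optional[int] = None,
    environ: Mapping[str, str] = os.environ,
) -> int:
    """Return the length a limited dataset would report (hypothetical helper)."""
    try:
        env_limit = int(environ.get("MINNT_DATASET_LIMIT", ""))
    except ValueError:
        env_limit = None
    # A positive integer in the environment overrides the parameter.
    # Assumption: any other value is silently ignored.
    if env_limit is not None and env_limit > 0:
        dataset_limit = env_limit
    if dataset_limit is None:
        return dataset_len
    # Assumption: the limit can only shorten the dataset, never extend it.
    return min(dataset_len, dataset_limit)


print(effective_length(100, 10, {"MINNT_DATASET_LIMIT": "5"}))
```

Passing the environment as an explicit mapping keeps the helper easy to try out without modifying the real process environment.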
__len__
__len__() -> int
Return the number of items in the dataset.
__getitem__
Return the item at the specified index.
__getitems__
Return a batch of items at the specified indices.
transform
class-attribute
instance-attribute
transform: Callable | None = None
If given, transform is called on each item before returning it.
If the dataset item is a tuple or a list, transform is called with it unpacked.
collate
class-attribute
instance-attribute
collate: Callable | None = None
If given, collate is called on a list of items before returning them as a batch.
transform_batch
class-attribute
instance-attribute
transform_batch: Callable | None = None
If given, transform_batch is called on a batch before returning it.
If the batch is a tuple or a list, transform_batch is called with it unpacked.
collate_fn
A function for a DataLoader to collate a batch of items using collate and/or transform_batch.
This function is used as the collate_fn parameter of a DataLoader when collate or transform_batch is set.
Parameters:
dataloader
dataloader(
batch_size=1, *, shuffle=False, seed=None, num_workers=0, **kwargs: Any
) -> DataLoader
Create a DataLoader for this dataset.
This method is a convenience wrapper around torch.utils.data.DataLoader setting up the required parameters. Most arguments are passed directly to the torch.utils.data.DataLoader, with a few exceptions:
- When seed is given, it is used to construct the generator argument for the DataLoader using torch.Generator().manual_seed(seed); the generator option must not be specified in kwargs.
- When shuffle is False and no generator is given, torch.Generator() is passed as generator. Otherwise, the global random number generator would be used during every construction of an iterator, i.e. during every iter(dataloader) call.
- When num_workers is greater than 0, persistent_workers is set to True.
- When collate or transform_batch is set, self.collate_fn is passed as the collate_fn parameter.
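The four rules above can be summarized as a function that assembles the keyword arguments for torch.utils.data.DataLoader. This is only an illustrative sketch, not the actual method body: `build_dataloader_kwargs`, `default_generator`, `make_generator` and `fake_generator` are hypothetical names, the `collate_fn` argument stands in for self.collate_fn being set, and raising ValueError when both seed and generator are given is an assumption about how the conflict is reported.

```python
from typing import Any, Callable, Optional


def default_generator(seed: Optional[int] = None) -> Any:
    import torch  # imported lazily so the rest of the sketch runs without PyTorch

    generator = torch.Generator()
    return generator if seed is None else generator.manual_seed(seed)


def build_dataloader_kwargs(
    batch_size: int = 1,
    *,
    shuffle: bool = False,
    seed: Optional[int] = None,
    num_workers: int = 0,
    collate_fn: Optional[Callable] = None,
    make_generator: Callable[..., Any] = default_generator,
    **kwargs: Any,
) -> dict:
    if seed is not None:
        # Rule 1: seed builds the generator, so an explicit generator is a conflict.
        if "generator" in kwargs:
            raise ValueError("pass either seed or generator, not both")
        kwargs["generator"] = make_generator(seed)
    elif not shuffle and kwargs.get("generator") is None:
        # Rule 2: a private generator keeps the global RNG untouched on iter().
        kwargs["generator"] = make_generator()
    if num_workers > 0:
        # Rule 3: keep worker processes alive between epochs.
        kwargs["persistent_workers"] = True
    if collate_fn is not None:
        # Rule 4: batch-level hooks are routed through the dataset's collate_fn.
        kwargs["collate_fn"] = collate_fn
    return dict(batch_size=batch_size, shuffle=shuffle, num_workers=num_workers, **kwargs)


def fake_generator(seed: Optional[int] = None) -> tuple:
    """Stand-in for torch.Generator so the example runs without PyTorch."""
    return ("generator", seed)


print(build_dataloader_kwargs(8, seed=42, num_workers=2, make_generator=fake_generator))
```

The resulting dictionary would be passed as `DataLoader(dataset, **kwargs)`; with the default `make_generator`, real torch.Generator objects are created instead of the stand-in tuples.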