torchtext
Source code: sensai/torch/torchtext.py
- class TorchtextDataSetFromDataFrame(*args: Any, **kwargs: Any)
Bases: Dataset
A specialisation of torchtext.data.Dataset in which the data is taken from a pandas.DataFrame.
- Parameters:
df – the data frame from which to obtain the data
fields – a mapping from column names in the given data frame to torchtext fields, i.e. the keys are the columns to read and the values are the fields to use for generated Example instances
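The role of the fields mapping can be illustrated without torchtext or pandas. The following is a minimal sketch, not the sensAI implementation: the rows are plain dictionaries standing in for data frame rows, and build_examples as well as the callables stored in fields are hypothetical stand-ins for torchtext fields and for the construction of Example instances.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the rows of a data frame: one dict per row.
rows: List[Dict[str, Any]] = [
    {"text": "Good movie", "label": "pos", "id": 1},
    {"text": "Bad plot", "label": "neg", "id": 2},
]

# Hypothetical stand-in for torchtext fields: column name -> preprocessing callable.
# Only the columns listed here are read; "id" is deliberately left out.
fields: Dict[str, Callable[[Any], Any]] = {
    "text": lambda s: s.lower().split(),
    "label": lambda s: s,
}


def build_examples(rows: List[Dict[str, Any]],
                   fields: Dict[str, Callable[[Any], Any]]) -> List[Dict[str, Any]]:
    """Reads only the columns named in `fields` and applies the associated callable."""
    return [{col: field(row[col]) for col, field in fields.items()} for row in rows]


examples = build_examples(rows, fields)
print(examples[0])
```

Columns that do not appear as keys in the mapping (here "id") are simply not read.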
- class TorchDataSetFromTorchtextDataSet(dataSet: torchtext.data.Dataset, inputField: str, outputField: Optional[str], cuda: bool)
Bases: TorchDataSet
- iter_batches(batch_size: int, shuffle: bool = False, input_only=False) -> Generator[Union[Tuple[torch.Tensor, torch.Tensor], torch.Tensor], None, None]
Provides an iterator over batches from the data set.
- Parameters:
batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If true, provide only inputs, where inputs can either be a tensor or a tuple of tensors. If false, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.
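The batching contract described by these parameters can be sketched in plain Python. This is an illustration under stated assumptions, not the sensAI code: iter_batches_sketch is a hypothetical function, lists stand in for tensors, the seed parameter is added only to make shuffling reproducible, and the choice of ValueError for the missing-outputs case is an assumption (the documentation only says that an exception should be raised).

```python
import random
from typing import Any, Generator, List, Optional, Sequence, Tuple, Union


def iter_batches_sketch(
    inputs: Sequence[Any],
    outputs: Optional[Sequence[Any]],
    batch_size: int,
    shuffle: bool = False,
    input_only: bool = False,
    seed: Optional[int] = None,  # hypothetical, for reproducible shuffling only
) -> Generator[Union[Tuple[List[Any], List[Any]], List[Any]], None, None]:
    """Yields batches of at most `batch_size` items (lists stand in for tensors)."""
    if not input_only and outputs is None:
        # Assumed exception type; raised when the generator is first advanced.
        raise ValueError("No outputs available; use input_only=True")
    indices = list(range(len(inputs)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_idx = indices[start:start + batch_size]
        batch_in = [inputs[i] for i in batch_idx]
        if input_only:
            yield batch_in
        else:
            yield batch_in, [outputs[i] for i in batch_idx]


# The last batch is smaller because batch_size is only an upper bound.
batches = list(iter_batches_sketch([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], batch_size=2))
print(batches)
```

Because batch_size is a maximum, the final batch may contain fewer items than the others.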
- class TorchDataSetProviderFromTorchtextDataSet(dataSet: torchtext.data.Dataset, inputField: str, outputField: str, cuda: bool, model_output_dim, input_dim=None)
Bases: TorchDataSetProvider
- provide_split(fractional_size_of_first_set: float) -> Tuple[TorchDataSet, TorchDataSet]
Provides two data sets, which could, for example, serve as training and validation sets.
- Parameters:
fractional_size_of_first_set – the fractional size of the first data set
- Returns:
a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data
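The meaning of the fractional size can be sketched as follows. This is a hypothetical illustration, not the sensAI implementation: provide_split_sketch operates on a plain list, and both the shuffling before the split and the seed parameter are assumptions, since the documentation does not state how items are assigned to the two sets.

```python
import random
from typing import Any, List, Sequence, Tuple


def provide_split_sketch(
    items: Sequence[Any],
    fractional_size_of_first_set: float,
    seed: int = 42,  # hypothetical, for a reproducible split
) -> Tuple[List[Any], List[Any]]:
    """Splits `items` into (A, B), where A holds roughly the given fraction."""
    if not 0.0 <= fractional_size_of_first_set <= 1.0:
        raise ValueError("fractional_size_of_first_set must be in [0, 1]")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment
    # Rounding is why the first set has only approximately the requested size.
    n_first = round(len(shuffled) * fractional_size_of_first_set)
    return shuffled[:n_first], shuffled[n_first:]


train, validation = provide_split_sketch(list(range(10)), 0.8)
print(len(train), len(validation))
```

The two returned sets are disjoint and together cover all of the data, which matches the description of B as the remainder.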