torchtext


class TorchtextDataSetFromDataFrame(*args: Any, **kwargs: Any)

Bases: Dataset

A specialisation of torchtext.data.Dataset in which the data is taken from a pandas.DataFrame.

Parameters:
  • df – the data frame from which to obtain the data

  • fields – a mapping from column names in the given data frame to torchtext fields, i.e. the keys are the columns to read and the values are the fields to use for generated Example instances
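To illustrate the shape of the fields mapping, here is a minimal stdlib-only sketch. It is not the library's implementation: a list of dicts stands in for the rows of the data frame, plain callables stand in for torchtext fields, and rows_to_examples is a hypothetical helper introduced only for this illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins: a "row" is a dict and a "field" is a preprocessing
# callable. The real class expects a pandas.DataFrame and torchtext fields.
Row = Dict[str, Any]
FieldLike = Callable[[Any], Any]


def rows_to_examples(rows: List[Row], fields: Dict[str, FieldLike]) -> List[Row]:
    """Build one example per row, reading only the columns named in `fields`."""
    examples = []
    for row in rows:
        # The keys of `fields` select the columns to read;
        # the values decide how each selected column is processed.
        examples.append({column: field(row[column]) for column, field in fields.items()})
    return examples


rows = [
    {"text": "Good movie", "label": "pos", "id": 1},
    {"text": "Bad plot", "label": "neg", "id": 2},
]
fields = {"text": lambda s: s.lower().split(), "label": str}
examples = rows_to_examples(rows, fields)
```

Columns that do not appear as keys in the mapping (here, id) are simply not read.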

class TorchDataSetFromTorchtextDataSet(dataSet: torchtext.data.Dataset, inputField: str, outputField: Optional[str], cuda: bool)

Bases: TorchDataSet

iter_batches(batch_size: int, shuffle: bool = False, input_only=False) -> Generator[Union[Tuple[torch.Tensor, torch.Tensor], torch.Tensor], None, None]

Provides an iterator over batches from the data set.

Parameters:
  • batch_size – the maximum size of each batch

  • shuffle – whether to shuffle the data set

  • input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If true, provide only inputs, where inputs can be either a tensor or a tuple of tensors. If false, provide a pair (i, o) of inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.
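The batching contract described by these parameters can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the class's actual code: lists stand in for tensors, and the function name iter_batches_sketch, the seed parameter, and the choice of ValueError are hypothetical.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple, Union

Batch = Union[List[float], Tuple[List[float], List[float]]]


def iter_batches_sketch(
    inputs: Sequence[float],
    outputs: Optional[Sequence[float]],
    batch_size: int,
    shuffle: bool = False,
    input_only: bool = False,
    seed: int = 0,  # hypothetical parameter, used only to make shuffling reproducible
) -> Iterator[Batch]:
    # A data set without outputs can only serve input-only batches.
    if not input_only and outputs is None:
        raise ValueError("outputs are unavailable; call with input_only=True")
    indices = list(range(len(inputs)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # batch_size is a maximum: the last batch may be smaller.
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        batch_inputs = [inputs[i] for i in batch]
        if input_only:
            yield batch_inputs
        else:
            yield batch_inputs, [outputs[i] for i in batch]


pairs = list(iter_batches_sketch([1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 0.0, 1.0, 0.0], batch_size=2))
inputs_only = list(iter_batches_sketch([1.0, 2.0, 3.0], None, batch_size=2, input_only=True))
```

With five data points and batch_size=2, the sketch yields two full batches followed by one batch holding the single remaining point.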

size() -> Optional[int]

Returns the total size of the data set (number of data points) if it is known.

Returns:

the number of data points, or None if the size is not known.

class TorchDataSetProviderFromTorchtextDataSet(dataSet: torchtext.data.Dataset, inputField: str, outputField: str, cuda: bool, model_output_dim, input_dim=None)

Bases: TorchDataSetProvider

provide_split(fractional_size_of_first_set: float) -> Tuple[TorchDataSet, TorchDataSet]

Provides two data sets, which could, for example, serve as training and validation sets.

Parameters:

fractional_size_of_first_set – the fractional size of the first data set

Returns:

a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data
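The meaning of the fractional size can be illustrated with a small stand-alone sketch. It is an assumption-laden illustration rather than the provider's implementation: split_by_fraction is a hypothetical helper, a list stands in for the data set, and whether the real class shuffles before splitting is not stated here, so this sketch does not.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_fraction(items: Sequence[T], fractional_size_of_first_set: float) -> Tuple[List[T], List[T]]:
    """Split `items` into (A, B), where A holds roughly the given fraction."""
    if not 0.0 <= fractional_size_of_first_set <= 1.0:
        raise ValueError("fractional_size_of_first_set must lie in [0, 1]")
    # Rounding makes the size of A only approximately match the fraction.
    cut = round(len(items) * fractional_size_of_first_set)
    return list(items[:cut]), list(items[cut:])


# Example: 80% of the data for training, the remainder for validation.
train, validation = split_by_fraction(list(range(10)), 0.8)
```

Every item ends up in exactly one of the two sets, so B is always the remainder of A.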