torch_data

Source code: sensai/torch/torch_data.py

to_tensor(d: Union[torch.Tensor, numpy.ndarray, list], cuda=False)[source]#

class TensorScaler[source]#

Bases: ABC

abstract cuda()[source]#

Makes this scaler’s components use CUDA

abstract normalise(tensor: torch.Tensor) → torch.Tensor[source]#

Applies scaling/normalisation to the given tensor

Parameters:: tensor – the tensor to scale/normalise
Returns:: the scaled/normalised tensor

abstract denormalise(tensor: torch.Tensor) → torch.Tensor[source]#

Applies the inverse of method normalise to the given tensor

Parameters:: tensor – the tensor to denormalise
Returns:: the denormalised tensor

class TensorScalerCentreAndScale(centre: Optional[torch.Tensor] = None, scale: Optional[torch.Tensor] = None)[source]#

Bases: TensorScaler

cuda()[source]#

Makes this scaler’s components use CUDA

normalise(tensor: torch.Tensor) → torch.Tensor[source]#

Applies scaling/normalisation to the given tensor

Parameters:: tensor – the tensor to scale/normalise
Returns:: the scaled/normalised tensor

denormalise(tensor: torch.Tensor) → torch.Tensor[source]#

Applies the inverse of method normalise to the given tensor

Parameters:: tensor – the tensor to denormalise
Returns:: the denormalised tensor
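
The round-trip contract between normalise and denormalise can be illustrated without any dependencies. The following is a minimal sketch, not the library's implementation: it uses plain Python lists in place of torch.Tensor, the class name CentreAndScaleSketch is hypothetical, and it assumes that normalisation subtracts the centre and then multiplies by the scale factor (this page does not state the actual order or whether the scale is a multiplier or a divisor).

```python
from typing import List, Optional


class CentreAndScaleSketch:
    """Hypothetical stand-in for a centre-and-scale scaler, operating on lists of floats."""

    def __init__(self, centre: Optional[float] = None, scale: Optional[float] = None):
        self.centre = centre
        self.scale = scale

    def normalise(self, values: List[float]) -> List[float]:
        # Assumed order: shift by the centre first, then apply the scale factor.
        result = list(values)
        if self.centre is not None:
            result = [v - self.centre for v in result]
        if self.scale is not None:
            result = [v * self.scale for v in result]
        return result

    def denormalise(self, values: List[float]) -> List[float]:
        # Inverse of normalise: undo the scale factor first, then the shift.
        result = list(values)
        if self.scale is not None:
            result = [v / self.scale for v in result]
        if self.centre is not None:
            result = [v + self.centre for v in result]
        return result


scaler = CentreAndScaleSketch(centre=10.0, scale=0.5)
normalised = scaler.normalise([8.0, 10.0, 14.0])
restored = scaler.denormalise(normalised)
```

Whatever the concrete scaling rule, the property that matters is that denormalise applied to the result of normalise recovers the original values.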

class TensorScalerFromVectorDataScaler(vector_data_scaler: VectorDataScaler, cuda: bool)[source]#

Bases: TensorScalerCentreAndScale

class TensorScalerIdentity[source]#

Bases: TensorScaler

cuda()[source]#

Makes this scaler’s components use CUDA

normalise(tensor: torch.Tensor) → torch.Tensor[source]#

Applies scaling/normalisation to the given tensor

Parameters:: tensor – the tensor to scale/normalise
Returns:: the scaled/normalised tensor

denormalise(tensor: torch.Tensor) → torch.Tensor[source]#

Applies the inverse of method normalise to the given tensor

Parameters:: tensor – the tensor to denormalise
Returns:: the denormalised tensor

class TensorScalerFromDFTSkLearnTransformer(dft: DFTSkLearnTransformer)[source]#

Bases: TensorScalerCentreAndScale

class Tensoriser[source]#

Bases: ABC

Represents a method for transforming a data frame into one or more tensors to be processed by a neural network model

tensorise(df: pandas.DataFrame) → Union[torch.Tensor, List[torch.Tensor]][source]#

abstract fit(df: pandas.DataFrame, model=None)[source]#

Parameters:

df – the data frame with which to fit this tensoriser
model – the model in the context of which the fitting takes place (if any). The fitting process may set parameters within the model that can only be determined from the (pre-tensorised) data.
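
The fit-then-tensorise contract described above can be sketched as follows. This is a dependency-free illustration under stated assumptions, not sensAI code: a list of dicts stands in for the pandas data frame, a list of integers stands in for the resulting tensor, a plain dict stands in for the model, and the class name CategoryIndexTensoriserSketch and the num_categories key are hypothetical.

```python
from typing import Dict, List, Optional


class CategoryIndexTensoriserSketch:
    """Hypothetical fit-based tensoriser: maps a categorical column to integer indices."""

    def __init__(self, column: str):
        self.column = column
        self.index_of: Optional[Dict[str, int]] = None

    def fit(self, rows: List[dict], model=None) -> None:
        # Learn the vocabulary from the (pre-tensorised) data.
        categories = sorted({row[self.column] for row in rows})
        self.index_of = {c: i for i, c in enumerate(categories)}
        # Assumed example of a model parameter that only the data can determine.
        if model is not None:
            model["num_categories"] = len(categories)

    def tensorise(self, rows: List[dict]) -> List[int]:
        if self.index_of is None:
            raise RuntimeError("fit must be called before tensorise")
        return [self.index_of[row[self.column]] for row in rows]


rows = [{"colour": "red"}, {"colour": "blue"}, {"colour": "red"}]
model_config: dict = {}
tensoriser = CategoryIndexTensoriserSketch("colour")
tensoriser.fit(rows, model=model_config)
indices = tensoriser.tensorise(rows)
```

Passing the model to fit lets the tensoriser record data-dependent settings (here, the vocabulary size) before the model is constructed or trained.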

class RuleBasedTensoriser[source]#

Bases: Tensoriser, ABC

Base class for tensorisers which transform data frames into tensors based on a predefined set of rules and do not require fitting

fit(df: pandas.DataFrame, model=None)[source]#

Parameters:

df – the data frame with which to fit this tensoriser
model – the model in the context of which the fitting takes place (if any). The fitting process may set parameters within the model that can only be determined from the (pre-tensorised) data.

class TensoriserDataFrameFloatValuesMatrix[source]#

Bases: RuleBasedTensoriser

class TensoriserClassLabelIndices[source]#

Bases: RuleBasedTensoriser

class DataUtil[source]#

Bases: ABC

Interface for DataUtil classes, which are used to process data for neural networks

abstract split_into_tensors(fractional_size_of_first_set) → Tuple[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor]][source]#

Splits the data set

Parameters:: fractional_size_of_first_set – the desired fractional size in
Returns:: a tuple (A, B) where A and B are tuples (in, out) with input and output data
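
The nested return structure ((in, out), (in, out)) can be illustrated with a small sketch. It is not the library's splitting logic: lists stand in for tensors, the function name split_into_pairs_sketch is hypothetical, and it assumes the first set simply takes the leading rows, with its size rounded down to a whole number of rows.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[List[float], List[float]]


def split_into_pairs_sketch(inputs: Sequence[float], outputs: Sequence[float],
                            fractional_size_of_first_set: float) -> Tuple[Pair, Pair]:
    """Hypothetical split returning ((in_a, out_a), (in_b, out_b))."""
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    # Assumption: the first set takes the leading rows, rounded down.
    n_first = int(len(inputs) * fractional_size_of_first_set)
    a = (list(inputs[:n_first]), list(outputs[:n_first]))
    b = (list(inputs[n_first:]), list(outputs[n_first:]))
    return a, b


(train_in, train_out), (val_in, val_out) = split_into_pairs_sketch(
    [1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0], 0.8)
```

Unpacking the result as shown keeps inputs and outputs of each subset paired, which is the point of the nested tuple.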

abstract get_output_tensor_scaler() → TensorScaler[source]#

Gets the scaler with which to scale model outputs

Returns:: the scaler

abstract get_input_tensor_scaler() → TensorScaler[source]#

Gets the scaler with which to scale model inputs

Returns:: the scaler

abstract model_output_dim() → int[source]#

Returns:: the dimensionality that is to be output by the model to be trained

abstract input_dim()[source]#

Returns:: the dimensionality of the input or None if it is variable

class VectorDataUtil(inputs: pandas.DataFrame, outputs: pandas.DataFrame, cuda: bool, normalisation_mode=NormalisationMode.NONE, differing_output_normalisation_mode=None, input_tensoriser: Optional[Tensoriser] = None, output_tensoriser: Optional[Tensoriser] = None, data_frame_splitter: Optional[DataFrameSplitter] = None)[source]#

Bases: DataUtil

Parameters:

inputs – the data frame of inputs
outputs – the data frame of outputs
cuda – whether to apply CUDA
normalisation_mode – the normalisation mode to use for inputs and (unless differing_output_normalisation_mode is specified) outputs
differing_output_normalisation_mode – the normalisation mode to apply to outputs, overriding normalisation_mode; if None, use normalisation_mode
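
The override rule for the two normalisation parameters can be sketched as below. The enum ModeSketch and the function resolve_modes are hypothetical names used for illustration only; they stand in for the library's NormalisationMode and for whatever internal logic applies the rule.

```python
from enum import Enum
from typing import Optional, Tuple


class ModeSketch(Enum):
    """Hypothetical stand-in for the library's NormalisationMode enum."""
    NONE = "none"
    MAX_ALL = "max_all"


def resolve_modes(normalisation_mode: ModeSketch,
                  differing_output_normalisation_mode: Optional[ModeSketch] = None
                  ) -> Tuple[ModeSketch, ModeSketch]:
    """Returns the (input mode, output mode) pair implied by the two parameters."""
    if differing_output_normalisation_mode is None:
        # No override given: outputs use the same mode as inputs.
        output_mode = normalisation_mode
    else:
        output_mode = differing_output_normalisation_mode
    return normalisation_mode, output_mode


same = resolve_modes(ModeSketch.MAX_ALL)
overridden = resolve_modes(ModeSketch.MAX_ALL, ModeSketch.NONE)
```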

get_output_tensor_scaler()[source]#

Gets the scaler with which to scale model outputs

Returns:: the scaler

get_input_tensor_scaler()[source]#

Gets the scaler with which to scale model inputs

Returns:: the scaler

split_into_tensors(fractional_size_of_first_set)[source]#

Splits the data set

Parameters:: fractional_size_of_first_set – the desired fractional size in
Returns:: a tuple (A, B) where A and B are tuples (in, out) with input and output data

split_into_data_sets(fractional_size_of_first_set, cuda: bool, tensorise_dynamically=False) → Tuple[TorchDataSet, TorchDataSet][source]#

input_dim()[source]#

Returns:: the dimensionality of the input or None if it is variable

output_dim()[source]#

Returns:: the dimensionality of the outputs (ground truth values)

model_output_dim()[source]#

Returns:: the dimensionality that is to be output by the model to be trained

class ClassificationVectorDataUtil(inputs: pandas.DataFrame, outputs: pandas.DataFrame, cuda, num_classes, normalisation_mode=NormalisationMode.NONE, input_tensoriser: Optional[Tensoriser] = None, output_tensoriser: Optional[Tensoriser] = None, data_frame_splitter: Optional[DataFrameSplitter] = None)[source]#

Bases: VectorDataUtil

Parameters:

inputs – the data frame of inputs
outputs – the data frame of outputs
cuda – whether to apply CUDA
normalisation_mode – the normalisation mode to use for inputs and (unless differing_output_normalisation_mode is specified) outputs
differing_output_normalisation_mode – the normalisation mode to apply to outputs, overriding normalisation_mode; if None, use normalisation_mode

model_output_dim()[source]#

Returns:: the dimensionality that is to be output by the model to be trained

class TorchDataSet[source]#

Bases: object

abstract iter_batches(batch_size: int, shuffle: bool = False, input_only=False) → Iterator[Union[Tuple[torch.Tensor, torch.Tensor], Tuple[Sequence[torch.Tensor], torch.Tensor], torch.Tensor, Sequence[torch.Tensor]]][source]#

Provides an iterator over batches from the data set.

Parameters:

batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If true, provide only inputs, where inputs can either be a tensor or a tuple of tensors. If false, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.
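
The semantics of the three parameters can be sketched with a small generator. This is an illustration under stated assumptions rather than the library's code: lists stand in for tensors, the function name iter_batches_sketch is hypothetical, and the choice of ValueError for the input-only case is an assumption (the text above only says an exception should result).

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple, Union

Batch = Union[List[float], Tuple[List[float], List[float]]]


def iter_batches_sketch(inputs: Sequence[float], outputs: Optional[Sequence[float]],
                        batch_size: int, shuffle: bool = False,
                        input_only: bool = False) -> Iterator[Batch]:
    """Hypothetical batch iterator over list-based data."""
    if not input_only and outputs is None:
        # A data set without outputs cannot serve (input, output) pairs.
        raise ValueError("outputs are required unless input_only=True")
    order = list(range(len(inputs)))
    if shuffle:
        random.shuffle(order)
    # batch_size is a maximum: the final batch may be smaller.
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch_in = [inputs[i] for i in idx]
        if input_only:
            yield batch_in
        else:
            yield batch_in, [outputs[i] for i in idx]


batches = list(iter_batches_sketch(
    [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0], batch_size=2))
inputs_only = list(iter_batches_sketch(
    [1.0, 2.0, 3.0], None, batch_size=2, input_only=True))
```

Because the function is a generator, the exception for missing outputs is raised when iteration starts, not when the function is called.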

abstract size() → Optional[int][source]#

Returns the total size of the data set (number of data points) if it is known.

Returns:: the number of data points or None if the size is not known.

class TorchDataSetProvider(input_tensor_scaler: Optional[TensorScaler] = None, output_tensor_scaler: Optional[TensorScaler] = None, input_dim: Optional[int] = None, model_output_dim: Optional[int] = None)[source]#

Bases: object

abstract provide_split(fractional_size_of_first_set: float) → Tuple[TorchDataSet, TorchDataSet][source]#

Provides two data sets, which could, for example, serve as training and validation sets.

Parameters:: fractional_size_of_first_set – the fractional size of the first data set
Returns:: a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data

get_output_tensor_scaler() → TensorScaler[source]#

get_input_tensor_scaler() → TensorScaler[source]#

get_model_output_dim() → int[source]#

Returns:: the number of output dimensions that would be required to be generated by the model to match this dataset.

get_input_dim() → Optional[int][source]#

Returns:: the number of input dimensions that the model must accept to match this dataset. For models that accept variable input sizes (such as RNNs), this may be None.

class TensorTuple(tensors: Union[torch.Tensor, Sequence[torch.Tensor]])[source]#

Bases: object

Represents a tuple of tensors (or a single tensor) and can be used to manipulate the contained tensors simultaneously

cuda() → TensorTuple[source]#

tuple() → Sequence[torch.Tensor][source]#

item() → Union[torch.Tensor, Sequence[torch.Tensor]][source]#

concat(other: TensorTuple) → TensorTuple[source]#
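
The idea of manipulating several contained tensors simultaneously can be sketched as follows. This is a hypothetical analogue, not the TensorTuple implementation: lists of floats stand in for tensors, the class name SequenceTupleSketch is invented, and two behaviours are assumptions, namely that item() returns a single contained element unwrapped and that concat joins the contained elements position by position.

```python
from typing import List, Sequence, Union

Vector = List[float]


class SequenceTupleSketch:
    """Hypothetical analogue of a tensor tuple, holding lists of floats instead of tensors."""

    def __init__(self, items: Union[Vector, Sequence[Vector]]):
        # A bare list of floats counts as a single item; otherwise a sequence of items.
        self.is_single = len(items) == 0 or not isinstance(items[0], (list, tuple))
        self.items: List[Vector] = [list(items)] if self.is_single else [list(v) for v in items]

    def tuple(self) -> Sequence[Vector]:
        # Always a sequence, even when only one item is held.
        return tuple(self.items)

    def item(self) -> Union[Vector, Sequence[Vector]]:
        # Assumption: a single contained item is returned unwrapped.
        return self.items[0] if self.is_single else tuple(self.items)

    def concat(self, other: "SequenceTupleSketch") -> "SequenceTupleSketch":
        # Concatenate position by position so the contained sequences stay aligned.
        if len(self.items) != len(other.items):
            raise ValueError("both tuples must hold the same number of items")
        merged = [a + b for a, b in zip(self.items, other.items)]
        return SequenceTupleSketch(merged[0] if self.is_single else merged)


pair = SequenceTupleSketch([[1.0, 2.0], [10.0, 20.0]]).concat(
    SequenceTupleSketch([[3.0], [30.0]]))
single = SequenceTupleSketch([1.0, 2.0]).concat(SequenceTupleSketch([3.0]))
```

Wrapping single tensors and tuples of tensors in one type lets batching code treat both cases uniformly and unwrap only at the end.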

class TorchDataSetFromTensors(x: Union[torch.Tensor, Sequence[torch.Tensor]], y: Optional[torch.Tensor], cuda: bool)[source]#

Bases: TorchDataSet

Parameters:

x – the input tensor(s); if more than one, they must be of the same length (and a slice of each shall be provided to the model as an input in each batch)
y – the output tensor
cuda – whether any generated tensors shall be moved to the selected CUDA device

iter_batches(batch_size: int, shuffle: bool = False, input_only=False) → Iterator[Union[Tuple[torch.Tensor, torch.Tensor], Tuple[Sequence[torch.Tensor], torch.Tensor], torch.Tensor, Sequence[torch.Tensor]]][source]#

Provides an iterator over batches from the data set.

Parameters:

batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If true, provide only inputs, where inputs can either be a tensor or a tuple of tensors. If false, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.

size()[source]#

Returns the total size of the data set (number of data points) if it is known.

Returns:: the number of data points or None if the size is not known.

class TorchDataSetFromDataFramesPreTensorised(input_df: pandas.DataFrame, output_df: Optional[pandas.DataFrame], cuda: bool, input_tensoriser: Optional[Tensoriser] = None, output_tensoriser: Optional[Tensoriser] = None)[source]#

Bases: TorchDataSetFromTensors

Parameters:

x – the input tensor(s); if more than one, they must be of the same length (and a slice of each shall be provided to the model as an input in each batch)
y – the output tensor
cuda – whether any generated tensors shall be moved to the selected CUDA device

class TorchDataSetFromDataFramesDynamicallyTensorised(input_df: pandas.DataFrame, output_df: Optional[pandas.DataFrame], cuda: bool, input_tensoriser: Optional[Tensoriser] = None, output_tensoriser: Optional[Tensoriser] = None)[source]#

Bases: TorchDataSet

size() → Optional[int][source]#

Returns the total size of the data set (number of data points) if it is known.

Returns:: the number of data points or None if the size is not known.

iter_batches(batch_size: int, shuffle: bool = False, input_only=False)[source]#

Provides an iterator over batches from the data set.

Parameters:

batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If true, provide only inputs, where inputs can either be a tensor or a tuple of tensors. If false, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.

class TorchDataSetFromDataFrames(input_df: pandas.DataFrame, output_df: Optional[pandas.DataFrame], cuda: bool, input_tensoriser: Optional[Tensoriser] = None, output_tensoriser: Optional[Tensoriser] = None, tensorise_dynamically=False)[source]#

Bases: TorchDataSet

iter_batches(batch_size: int, shuffle: bool = False, input_only=False)[source]#

Provides an iterator over batches from the data set.

Parameters:

batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If true, provide only inputs, where inputs can either be a tensor or a tuple of tensors. If false, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.

size() → Optional[int][source]#

Returns the total size of the data set (number of data points) if it is known.

Returns:: the number of data points or None if the size is not known.

class TorchDataSetProviderFromDataUtil(data_util: DataUtil, cuda: bool)[source]#

Bases: TorchDataSetProvider

provide_split(fractional_size_of_first_set: float) → Tuple[TorchDataSet, TorchDataSet][source]#

Provides two data sets, which could, for example, serve as training and validation sets.

Parameters:: fractional_size_of_first_set – the fractional size of the first data set
Returns:: a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data

class TorchDataSetProviderFromVectorDataUtil(data_util: VectorDataUtil, cuda: bool, tensorise_dynamically=False)[source]#

Bases: TorchDataSetProvider

provide_split(fractional_size_of_first_set: float) → Tuple[TorchDataSet, TorchDataSet][source]#

Provides two data sets, which could, for example, serve as training and validation sets.

Parameters:: fractional_size_of_first_set – the fractional size of the first data set
Returns:: a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data

class TensorTransformer[source]#

Bases: ABC

abstract transform(t: torch.Tensor) → torch.Tensor[source]#