torch_data#
Source code: sensai/torch/torch_data.py
- class TensorScaler[source]#
Bases:
ABC
- class TensorScalerCentreAndScale(centre: Optional[torch.Tensor] = None, scale: Optional[torch.Tensor] = None)[source]#
Bases:
TensorScaler
- class TensorScalerFromVectorDataScaler(vector_data_scaler: VectorDataScaler, cuda: bool)[source]#
Bases:
TensorScalerCentreAndScale
- class TensorScalerIdentity[source]#
Bases:
TensorScaler
- class TensorScalerFromDFTSkLearnTransformer(dft: DFTSkLearnTransformer)[source]#
Bases:
TensorScalerCentreAndScale
- class Tensoriser[source]#
Bases:
ABC
Represents a method for transforming a data frame into one or more tensors to be processed by a neural network model
- abstract fit(df: DataFrame, model=None)[source]#
- Parameters:
df – the data frame with which to fit this tensoriser
model – the model in the context of which the fitting takes place (if any). The fitting process may set parameters within the model that can only be determined from the (pre-tensorised) data.
- class RuleBasedTensoriser[source]#
Bases:
Tensoriser, ABC
Base class for tensorisers which transform data frames into tensors based on a predefined set of rules and do not require fitting
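The relationship between a fitted tensoriser and a rule-based one can be sketched as follows. This is a minimal, dependency-free illustration rather than the sensAI implementation: all class names are hypothetical, the `tensorise` method name is an assumption, a list of dictionaries stands in for a data frame, and nested lists stand in for tensors.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence

Row = Dict[str, float]  # stand-in for one data frame row


class SketchTensoriser(ABC):
    """Hypothetical mirror of the Tensoriser interface; not the sensAI class."""

    @abstractmethod
    def fit(self, rows: Sequence[Row], model: Optional[object] = None) -> None:
        """Learn whatever is needed from the data before tensorising."""

    @abstractmethod
    def tensorise(self, rows: Sequence[Row]) -> List[List[float]]:
        """Convert rows to a matrix (nested lists stand in for a tensor)."""


class SketchRuleBasedTensoriser(SketchTensoriser, ABC):
    """Rule-based variant: the conversion is fixed, so fit does nothing."""

    def fit(self, rows: Sequence[Row], model: Optional[object] = None) -> None:
        pass  # no fitting required


class SketchFloatValuesMatrix(SketchRuleBasedTensoriser):
    """Rule: take every value of every row, in column order, as a float."""

    def tensorise(self, rows: Sequence[Row]) -> List[List[float]]:
        return [[float(v) for v in row.values()] for row in rows]


rows = [{"a": 1, "b": 2.5}, {"a": 3, "b": 4.0}]
tensoriser = SketchFloatValuesMatrix()
tensoriser.fit(rows)  # a no-op for rule-based tensorisers
matrix = tensoriser.tensorise(rows)
```

Keeping `fit` on the rule-based base class lets callers treat both kinds of tensoriser uniformly, even though only the fitted kind learns anything from the data.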
- class TensoriserDataFrameFloatValuesMatrix[source]#
Bases:
RuleBasedTensoriser
- class TensoriserClassLabelIndices[source]#
Bases:
RuleBasedTensoriser
- class DataUtil[source]#
Bases:
ABC
Interface for DataUtil classes, which are used to process data for neural networks
- abstract split_into_tensors(fractional_size_of_first_set) Tuple[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor]] [source]#
Splits the data set
- Parameters:
fractional_size_of_first_set – the desired fractional size of the first set
- Returns:
a tuple (A, B) where A and B are tuples (in, out) with input and output data
- abstract get_output_tensor_scaler() TensorScaler [source]#
Gets the scaler with which to scale model outputs
- Returns:
the scaler
- abstract get_input_tensor_scaler() TensorScaler [source]#
Gets the scaler with which to scale model inputs
- Returns:
the scaler
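The return shape of `split_into_tensors` may be easier to follow with a small example. The sketch below is an assumption-laden illustration, not the library's code: the function name, the `seed` parameter, and the shuffling step are hypothetical, and plain lists stand in for tensors. It returns a tuple (A, B) in which A and B are each an (inputs, outputs) pair.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[List[float], List[float]]  # (inputs, outputs)


def split_into_pairs(inputs: Sequence[float], outputs: Sequence[float],
                     fractional_size_of_first_set: float,
                     seed: int = 42) -> Tuple[Pair, Pair]:
    """Hypothetical helper: split aligned inputs/outputs into two parts."""
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    if not 0.0 <= fractional_size_of_first_set <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    indices = list(range(len(inputs)))
    random.Random(seed).shuffle(indices)  # assumption: the split is random
    n_first = int(round(fractional_size_of_first_set * len(indices)))
    first, second = indices[:n_first], indices[n_first:]
    a = ([inputs[i] for i in first], [outputs[i] for i in first])
    b = ([inputs[i] for i in second], [outputs[i] for i in second])
    return a, b


xs = [float(i) for i in range(10)]
ys = [2.0 * x for x in xs]
(train_in, train_out), (val_in, val_out) = split_into_pairs(xs, ys, 0.8)
```

Indexing inputs and outputs with the same index list keeps each input aligned with its output in both parts.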
- class VectorDataUtil(inputs: DataFrame, outputs: DataFrame, cuda: bool, normalisation_mode=NormalisationMode.NONE, differing_output_normalisation_mode=None, input_tensoriser: Optional[Tensoriser] = None, output_tensoriser: Optional[Tensoriser] = None, data_frame_splitter: Optional[DataFrameSplitter] = None)[source]#
Bases:
DataUtil
- Parameters:
inputs – the data frame of inputs
outputs – the data frame of outputs
cuda – whether to apply CUDA
normalisation_mode – the normalisation mode to use for inputs and (unless differing_output_normalisation_mode is specified) outputs
differing_output_normalisation_mode – the normalisation mode to apply to outputs, overriding normalisation_mode; if None, use normalisation_mode
- get_output_tensor_scaler()[source]#
Gets the scaler with which to scale model outputs
- Returns:
the scaler
- get_input_tensor_scaler()[source]#
Gets the scaler with which to scale model inputs
- Returns:
the scaler
- split_into_tensors(fractional_size_of_first_set)[source]#
Splits the data set
- Parameters:
fractional_size_of_first_set – the desired fractional size of the first set
- Returns:
a tuple (A, B) where A and B are tuples (in, out) with input and output data
- split_into_data_sets(fractional_size_of_first_set, cuda: bool, tensorise_dynamically=False) Tuple[TorchDataSet, TorchDataSet] [source]#
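The interplay of the two normalisation parameters of `VectorDataUtil` described above can be sketched as a small resolution function. The enum and function here are hypothetical stand-ins (only the member name `NONE` appears in the signature above; `STANDARDISED` is purely illustrative), so treat this as a reading of the parameter documentation rather than the actual implementation.

```python
from enum import Enum
from typing import Optional, Tuple


class SketchNormalisationMode(Enum):
    """Hypothetical stand-in for the library's NormalisationMode."""
    NONE = "none"
    STANDARDISED = "standardised"  # illustrative member, assumed for the sketch


def resolve_modes(
    normalisation_mode: SketchNormalisationMode,
    differing_output_normalisation_mode: Optional[SketchNormalisationMode] = None,
) -> Tuple[SketchNormalisationMode, SketchNormalisationMode]:
    """Return (input_mode, output_mode) following the documented override rule."""
    input_mode = normalisation_mode
    if differing_output_normalisation_mode is None:
        output_mode = normalisation_mode  # outputs follow the inputs
    else:
        output_mode = differing_output_normalisation_mode  # explicit override
    return input_mode, output_mode


same = resolve_modes(SketchNormalisationMode.STANDARDISED)
overridden = resolve_modes(SketchNormalisationMode.STANDARDISED,
                           SketchNormalisationMode.NONE)
```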
- class ClassificationVectorDataUtil(inputs: DataFrame, outputs: DataFrame, cuda, num_classes, normalisation_mode=NormalisationMode.NONE, input_tensoriser: Optional[Tensoriser] = None, output_tensoriser: Optional[Tensoriser] = None, data_frame_splitter: Optional[DataFrameSplitter] = None)[source]#
Bases:
VectorDataUtil
- Parameters:
inputs – the data frame of inputs
outputs – the data frame of outputs
cuda – whether to apply CUDA
normalisation_mode – the normalisation mode to use for inputs and (unless differing_output_normalisation_mode is specified) outputs
differing_output_normalisation_mode – the normalisation mode to apply to outputs, overriding normalisation_mode; if None, use normalisation_mode
- class TorchDataSet[source]#
Bases:
object
- abstract iter_batches(batch_size: int, shuffle: bool = False, input_only=False) Iterator[Union[Tuple[torch.Tensor, torch.Tensor], Tuple[Sequence[torch.Tensor], torch.Tensor], torch.Tensor, Sequence[torch.Tensor]]] [source]#
Provides an iterator over batches from the data set.
- Parameters:
batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If true, provide only inputs, where inputs can either be a tensor or a tuple of tensors. If false, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.
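The `iter_batches` contract described above (batches of at most `batch_size` items, optional shuffling, and an error when outputs are requested but unavailable) can be illustrated with a minimal sketch. The class below is hypothetical and uses plain lists in place of tensors; the choice of `ValueError` is also an assumption, since the documentation only says that an exception should result.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple, Union

Batch = Union[List[float], Tuple[List[float], List[float]]]


class SketchDataSet:
    """Hypothetical data set holding inputs and, optionally, outputs."""

    def __init__(self, x: Sequence[float], y: Optional[Sequence[float]] = None):
        self.x = list(x)
        self.y = None if y is None else list(y)

    def iter_batches(self, batch_size: int, shuffle: bool = False,
                     input_only: bool = False) -> Iterator[Batch]:
        if not input_only and self.y is None:
            # inputs-only data set asked for (input, output) pairs
            raise ValueError("no outputs available; use input_only=True")
        indices = list(range(len(self.x)))
        if shuffle:
            random.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch_idx = indices[start:start + batch_size]
            inputs = [self.x[i] for i in batch_idx]
            if input_only:
                yield inputs
            else:
                yield inputs, [self.y[i] for i in batch_idx]


data_set = SketchDataSet([1, 2, 3, 4, 5], [10, 20, 30, 40, 50])
batches = list(data_set.iter_batches(batch_size=2))
inputs_only = list(SketchDataSet([1, 2, 3]).iter_batches(2, input_only=True))
```

Because `iter_batches` is written as a generator here, the exception is raised when iteration starts rather than when the method is called. The last batch may be smaller than `batch_size`, which is why the parameter is documented as a maximum.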
- class TorchDataSetProvider(input_tensor_scaler: Optional[TensorScaler] = None, output_tensor_scaler: Optional[TensorScaler] = None, input_dim: Optional[int] = None, model_output_dim: Optional[int] = None)[source]#
Bases:
object
- abstract provide_split(fractional_size_of_first_set: float) Tuple[TorchDataSet, TorchDataSet] [source]#
Provides two data sets, which could, for example, serve as training and validation sets.
- Parameters:
fractional_size_of_first_set – the fractional size of the first data set
- Returns:
a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data
- get_output_tensor_scaler() TensorScaler [source]#
- get_input_tensor_scaler() TensorScaler [source]#
- class TensorTuple(tensors: Union[torch.Tensor, Sequence[torch.Tensor]])[source]#
Bases:
object
Represents a tuple of tensors (or a single tensor) and can be used to manipulate the contained tensors simultaneously
- cuda() TensorTuple [source]#
- concat(other: TensorTuple) TensorTuple [source]#
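The idea behind `TensorTuple`, namely wrapping one or several tensors so that an operation is applied to all of them at once, can be sketched without any tensor library. The class below is hypothetical: lists of numbers stand in for tensors, and the `map` method is an invented stand-in for operations such as `cuda()` that act on every contained tensor.

```python
from typing import Callable, List, Sequence, Union

Vector = List[float]


def _is_vector(obj: Sequence) -> bool:
    """True if obj contains only numbers, i.e. it is a single vector."""
    return all(isinstance(v, (int, float)) for v in obj)


class SketchTensorTuple:
    """Hypothetical wrapper around one vector or a sequence of vectors."""

    def __init__(self, items: Union[Vector, Sequence[Vector]]):
        # a single vector is wrapped so that both cases are handled uniformly
        if _is_vector(items):
            self.items = [list(items)]
        else:
            self.items = [list(v) for v in items]

    def map(self, fn: Callable[[Vector], Vector]) -> "SketchTensorTuple":
        """Apply fn to every contained vector and wrap the results."""
        return SketchTensorTuple([fn(v) for v in self.items])

    def concat(self, other: "SketchTensorTuple") -> "SketchTensorTuple":
        """Concatenate position-wise with another tuple of the same arity."""
        if len(self.items) != len(other.items):
            raise ValueError("both tuples must contain the same number of items")
        return SketchTensorTuple([a + b for a, b in zip(self.items, other.items)])


t = SketchTensorTuple(([1.0, 2.0], [10.0, 20.0]))
u = SketchTensorTuple(([3.0], [30.0]))
joined = t.concat(u)
doubled = t.map(lambda v: [2 * x for x in v])
single = SketchTensorTuple([1.0, 2.0])
```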
- class TorchDataSetFromTensors(x: Union[torch.Tensor, Sequence[torch.Tensor]], y: Optional[torch.Tensor], cuda: bool)[source]#
Bases:
TorchDataSet
- Parameters:
x – the input tensor(s); if more than one, they must be of the same length (and a slice of each shall be provided to the model as an input in each batch)
y – the output tensor
cuda – whether any generated tensors shall be moved to the selected CUDA device
- iter_batches(batch_size: int, shuffle: bool = False, input_only=False) Iterator[Union[Tuple[torch.Tensor, torch.Tensor], Tuple[Sequence[torch.Tensor], torch.Tensor], torch.Tensor, Sequence[torch.Tensor]]] [source]#
Provides an iterator over batches from the data set.
- Parameters:
batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If true, provide only inputs, where inputs can either be a tensor or a tuple of tensors. If false, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.
- class TorchDataSetFromDataFramesPreTensorised(input_df: DataFrame, output_df: Optional[DataFrame], cuda: bool, input_tensoriser: Optional[Tensoriser] = None, output_tensoriser: Optional[Tensoriser] = None)[source]#
Bases:
TorchDataSetFromTensors
- Parameters:
x – the input tensor(s); if more than one, they must be of the same length (and a slice of each shall be provided to the model as an input in each batch)
y – the output tensor
cuda – whether any generated tensors shall be moved to the selected CUDA device
- class TorchDataSetFromDataFramesDynamicallyTensorised(input_df: DataFrame, output_df: Optional[DataFrame], cuda: bool, input_tensoriser: Optional[Tensoriser] = None, output_tensoriser: Optional[Tensoriser] = None)[source]#
Bases:
TorchDataSet
- size() Optional[int] [source]#
Returns the total size of the data set (number of data points) if it is known.
- Returns:
the number of data points or None if the size is not known.
- iter_batches(batch_size: int, shuffle: bool = False, input_only=False)[source]#
Provides an iterator over batches from the data set.
- Parameters:
batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If true, provide only inputs, where inputs can either be a tensor or a tuple of tensors. If false, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.
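Since `size()` may return None, code that depends on it needs to handle the unknown case. As one possible illustration, the hypothetical helper below estimates how many batches an iteration would yield; it relies only on the documented fact that `batch_size` is the maximum size of each batch.

```python
import math
from typing import Optional


def num_batches(size: Optional[int], batch_size: int) -> Optional[int]:
    """Hypothetical helper: number of batches, or None if the size is unknown."""
    if size is None:
        return None  # nothing can be said without a known size
    # the final batch may be partial, hence rounding up
    return math.ceil(size / batch_size)


known = num_batches(10, 4)
unknown = num_batches(None, 4)
```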
- class TorchDataSetFromDataFrames(input_df: DataFrame, output_df: Optional[DataFrame], cuda: bool, input_tensoriser: Optional[Tensoriser] = None, output_tensoriser: Optional[Tensoriser] = None, tensorise_dynamically=False)[source]#
Bases:
TorchDataSet
- iter_batches(batch_size: int, shuffle: bool = False, input_only=False)[source]#
Provides an iterator over batches from the data set.
- Parameters:
batch_size – the maximum size of each batch
shuffle – whether to shuffle the data set
input_only – whether to provide only inputs (rather than inputs and corresponding outputs). If true, provide only inputs, where inputs can either be a tensor or a tuple of tensors. If false, provide a pair (i, o) with inputs and corresponding outputs (o is always a tensor). Some data sets may only be able to provide inputs, in which case input_only=False should lead to an exception.
- class TorchDataSetProviderFromDataUtil(data_util: DataUtil, cuda: bool)[source]#
Bases:
TorchDataSetProvider
- provide_split(fractional_size_of_first_set: float) Tuple[TorchDataSet, TorchDataSet] [source]#
Provides two data sets, which could, for example, serve as training and validation sets.
- Parameters:
fractional_size_of_first_set – the fractional size of the first data set
- Returns:
a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data
- class TorchDataSetProviderFromVectorDataUtil(data_util: VectorDataUtil, cuda: bool, tensorise_dynamically=False)[source]#
Bases:
TorchDataSetProvider
- provide_split(fractional_size_of_first_set: float) Tuple[TorchDataSet, TorchDataSet] [source]#
Provides two data sets, which could, for example, serve as training and validation sets.
- Parameters:
fractional_size_of_first_set – the fractional size of the first data set
- Returns:
a tuple of data sets (A, B) where A has (approximately) the given fractional size and B encompasses the remainder of the data