Template Class DataLoader

Defined in File dataloader.hpp

Class Documentation

template<typename DatasetX = arma::mat, typename DatasetY = arma::mat, class ScalerType = mlpack::data::MinMaxScaler> class mlpack::models::DataLoader

Dataloader class to load popular datasets.

// Create a dataloader for any popular dataset.
// Set parameters for dataset.
const std::string datasetName = "mnist";
bool shuffleData = true;
double ratioForTrainTestSplit = 0.75;

// Create the DataLoader object.
DataLoader<> dataloader(datasetName, shuffleData,
   ratioForTrainTestSplit);

// Use the dataloader for training.
model.Train(dataloader.TrainFeatures(), dataloader.TrainLabels());

// Use the dataloader for prediction.
model.Predict(dataloader.TestFeatures(), dataloader.TestLabels());

Template Parameters

DatasetX – Datatype for loading input features.
DatasetY – Datatype for prediction features.
ScalerType – mlpack’s scaler object for scaling features.
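
To illustrate what a scaler in the ScalerType role does, here is a minimal sketch of min-max scaling that learns its range from training values and then reuses that range on other data. SimpleMinMaxScaler, Fit and Transform are hypothetical names chosen for this sketch; it is not mlpack::data::MinMaxScaler and works on a plain std::vector rather than an Armadillo matrix.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a feature scaler; NOT mlpack::data::MinMaxScaler.
// It handles a single feature stored as a vector of doubles.
struct SimpleMinMaxScaler
{
  double min = 0.0;
  double max = 1.0;

  // Learn the value range from the training values only.
  void Fit(const std::vector<double>& train)
  {
    assert(!train.empty());
    const auto range = std::minmax_element(train.begin(), train.end());
    min = *range.first;
    max = *range.second;
  }

  // Map values using the range learned in Fit(). Training values land in
  // [0, 1]; values outside the training range land outside [0, 1].
  std::vector<double> Transform(const std::vector<double>& values) const
  {
    // Guard against a zero range when all training values are equal.
    const double width = (max > min) ? (max - min) : 1.0;
    std::vector<double> scaled;
    scaled.reserve(values.size());
    for (const double v : values)
      scaled.push_back((v - min) / width);
    return scaled;
  }
};
```

Fitting on the training values and only transforming the test values keeps information from the test set out of the learned range.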

Public Functions

DataLoader(): Create DataLoader object.

DataLoader(const std::string &dataset, const bool shuffle, const double validRatio = 0.25, const bool useScaler = true, const std::vector<std::string> augmentation = std::vector<std::string>(), const double augmentationProbability = 0.2)

Constructor for DataLoader. This is used for loading popular datasets such as MNIST, ImageNet, Pascal VOC and many more.

Parameters

dataset – Path or name of dataset.
shuffle – Whether or not to shuffle the data.
validRatio – Ratio of dataset to be used for validation set.
useScaler – Use feature scaler for pre-processing the dataset.
augmentation – Adds augmentation to training data only.
augmentationProbability – Probability of applying augmentation on dataset.
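
The interaction between shuffle and validRatio can be sketched as a shuffle of sample indices followed by a split. This is only an illustration under stated assumptions: SplitIndices is a hypothetical helper, the fixed seed and the rounding-down of the validation count are choices made for this sketch, and mlpack's own splitting code may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper, not part of mlpack.
// Returns {trainIndices, validIndices} for numSamples samples.
std::pair<std::vector<std::size_t>, std::vector<std::size_t>>
SplitIndices(const std::size_t numSamples, const double validRatio,
             const bool shuffle, const unsigned seed = 42)
{
  assert(validRatio >= 0.0 && validRatio <= 1.0);

  // Start with the identity order 0, 1, ..., numSamples - 1.
  std::vector<std::size_t> indices(numSamples);
  std::iota(indices.begin(), indices.end(), std::size_t{0});

  if (shuffle)
  {
    std::mt19937 rng(seed);
    std::shuffle(indices.begin(), indices.end(), rng);
  }

  // Assumption: the validation count is rounded down.
  const std::size_t numValid = static_cast<std::size_t>(
      validRatio * static_cast<double>(numSamples));
  const std::ptrdiff_t numTrain =
      static_cast<std::ptrdiff_t>(numSamples - numValid);

  // The first numTrain indices train the model; the rest validate it.
  std::vector<std::size_t> train(indices.begin(), indices.begin() + numTrain);
  std::vector<std::size_t> valid(indices.begin() + numTrain, indices.end());
  return {std::move(train), std::move(valid)};
}
```

With 8 samples and validRatio = 0.25, this sketch places 6 indices in the training part and 2 in the validation part.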

void LoadCSV(const std::string &datasetPath, const bool loadTrainData = true, const bool shuffle = true, const double validRatio = 0.25, const bool useScaler = false, const int startInputFeatures = -1, const int endInputFeatures = -1, const int startPredictionFeatures = -1, const int endPredictionFeatures = -1, const std::vector<std::string> augmentation = std::vector<std::string>(), const double augmentationProbability = 0.2)

Function to load and preprocess train or test data stored in CSV files.

Parameters

datasetPath – Path to the dataset.
loadTrainData – Boolean to determine whether data will be stored for training or testing. If true, data will be loaded for training. Note: When false, augmentation is set to NULL, the ratio is set to 1, and the scaler is used only to transform the test data.
shuffle – Boolean to determine whether or not to shuffle the data.
validRatio – Ratio of dataset to be used for validation set.
useScaler – Fits the scaler on training data and transforms dataset.
startInputFeatures – First index which will be fed into the model as input. Note: Indices are wrapped and -1 implies last column.
endInputFeatures – Last index which will be fed into the model as input. Note: Indices are wrapped and -1 implies last column.
startPredictionFeatures – First index which will be predicted by the model as output. Note: Indices are wrapped and -1 implies last column.
endPredictionFeatures – Last index which will be predicted by the model as output. Note: Indices are wrapped and -1 implies last column.
augmentation – Vector of strings of augmentations supported by mlpack.
augmentationProbability – Probability of applying augmentation to a particular cell.
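
The index wrapping mentioned for the start/end feature parameters can be sketched as follows. WrapIndex is a hypothetical helper, and the sketch assumes that negative indices count back from the end of the column range (so -1 is the last column, -2 the one before it); the source only states that -1 implies the last column, so treat the rest as an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not mlpack's implementation.
// Assumption: negative indices count back from the end, so -1 maps to the
// last column and -2 to the second-to-last column.
std::size_t WrapIndex(const int index, const std::size_t numCols)
{
  assert(numCols > 0);
  const long long n = static_cast<long long>(numCols);
  // C++ '%' keeps the sign of the dividend, so add n before the second
  // modulo to bring negative remainders into [0, n).
  const long long wrapped = ((index % n) + n) % n;
  return static_cast<std::size_t>(wrapped);
}
```

Under this assumption, a CSV with 10 columns and endPredictionFeatures = -1 would resolve to column 9.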

void LoadObjectDetectionDataset(const std::string &pathToAnnotations, const std::string &pathToImages, const std::vector<std::string> &classes, const double validRatio = 0.2, const bool shuffle = true, const std::vector<std::string> &augmentation = std::vector<std::string>(), const double augmentationProbability = 0.2, const bool absolutePath = false, const std::string &baseXMLTag = "annotation", const std::string &imageNameXMLTag = "filename", const std::string &sizeXMLTag = "size", const std::string &objectXMLTag = "object", const std::string &bndboxXMLTag = "bndbox", const std::string &classNameXMLTag = "name", const std::string &x1XMLTag = "xmin", const std::string &y1XMLTag = "ymin", const std::string &x2XMLTag = "xmax", const std::string &y2XMLTag = "ymax")

Loads an object detection dataset. It requires one annotation file in XML format per image; each XML file should correspond to a single image in the images folder.

Each XML file should contain the following:

Each XML file should be wrapped in an annotation tag.
The filename of the image in the images folder is given by the filename tag.
Each object tag describes one bounding box.
Each object tag should contain a name tag, i.e. the class of the object.
Each object tag should contain a bndbox tag containing xmin, ymin, xmax and ymax.

Note: Labels are assigned using the classes vector. Set verbose to 1 to print labels and their corresponding classes. The labels type should be a field type here.
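
As an illustration of the annotation layout described above, the sketch below embeds one example annotation that uses the default tag names, plus a naive tag lookup. The file name, class name and coordinates are made-up values, the size tag is left out because its contents are not described here, and ExtractTag is a hypothetical helper based on plain string search rather than the XML parsing mlpack performs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Example annotation using the default tag names.
// The file name, class name and coordinates are made-up values.
const std::string kExampleAnnotation = R"(<annotation>
  <filename>image_0001.jpg</filename>
  <object>
    <name>cat</name>
    <bndbox>
      <xmin>12</xmin>
      <ymin>30</ymin>
      <xmax>140</xmax>
      <ymax>200</ymax>
    </bndbox>
  </object>
</annotation>)";

// Hypothetical helper: returns the text between the first <tag> and the
// next </tag>, or an empty string if either is missing. This is a naive
// string search for illustration, not a real XML parser.
std::string ExtractTag(const std::string& xml, const std::string& tag)
{
  assert(!tag.empty());
  const std::string open = "<" + tag + ">";
  const std::string close = "</" + tag + ">";

  const std::size_t begin = xml.find(open);
  if (begin == std::string::npos)
    return "";

  const std::size_t start = begin + open.size();
  const std::size_t end = xml.find(close, start);
  if (end == std::string::npos)
    return "";

  return xml.substr(start, end - start);
}
```

If a dataset uses different tag names, the corresponding XMLTag parameters of LoadObjectDetectionDataset are the place to override them.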

Parameters

pathToAnnotations – Path to the folder containing XML type annotation files.
pathToImages – Path to folder containing images corresponding to annotations.
classes – Vector of strings containing list of classes. Labels are assigned according to this vector.
validRatio – Ratio of dataset that will be used for validation.
shuffle – Boolean to determine whether the dataset is shuffled.
augmentation – Vector of strings of augmentations supported by mlpack.
augmentationProbability – Probability of applying augmentation to a particular image.
absolutePath – Boolean to determine if absolute path is used. Defaults to false.
baseXMLTag – XML tag name which wraps around the annotation file.
imageNameXMLTag – XML tag name which holds the value of image filename.
sizeXMLTag – XML tag name which holds the size of the image.
objectXMLTag – XML tag name which holds details of bounding box i.e. class and coordinates of bounding box.
bndboxXMLTag – XML tag name which holds coordinates of bounding box.
classNameXMLTag – XML tag name inside objectXMLTag which holds the name of the class of bounding box.
x1XMLTag – XML tag name inside bndboxXMLTag which holds the value of the lowest x coordinate of the bounding box.
y1XMLTag – XML tag name inside bndboxXMLTag which holds the value of the lowest y coordinate of the bounding box.
x2XMLTag – XML tag name inside bndboxXMLTag which holds the value of the highest x coordinate of the bounding box.
y2XMLTag – XML tag name inside bndboxXMLTag which holds the value of the highest y coordinate of the bounding box.
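
Since labels are assigned according to the classes vector, one plausible mapping is the position of the class name in that vector. The sketch below assumes exactly that; LabelForClass is a hypothetical helper, and both the zero-based numbering and the handling of unknown names are assumptions rather than documented mlpack behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper, not mlpack's implementation.
// Assumption: a class's label is its zero-based position in `classes`.
// Returns classes.size() when the name is not in the list.
std::size_t LabelForClass(const std::vector<std::string>& classes,
                          const std::string& className)
{
  assert(!classes.empty());
  const auto it = std::find(classes.begin(), classes.end(), className);
  return static_cast<std::size_t>(std::distance(classes.begin(), it));
}
```

Under this assumption, the order of the classes vector matters: swapping two entries swaps their labels.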

void LoadAllImagesFromDirectory(const std::string &imagesPath, DatasetX &dataset, DatasetY &labels, const size_t imageWidth, const size_t imageHeight, const size_t imageDepth, const size_t label = 0)

Load all images from directory.

Parameters

imagesPath – Path to all images.
dataset – Armadillo type where images will be loaded.
labels – Armadillo type where labels will be loaded.
imageWidth – Width of images in dataset.
imageHeight – Height of images in dataset.
imageDepth – Depth of images in dataset.
label – Label which will be assigned to image.

void LoadImageDatasetFromDirectory(const std::string &pathToDataset, const size_t imageWidth, const size_t imageHeight, const size_t imageDepth, const bool trainData = true, const double validRatio = 0.2, const bool shuffle = true, const std::vector<std::string> &augmentation = std::vector<std::string>(), const double augmentationProbability = 0.2)

Load an image dataset from a directory of folders containing images.

Parameters

pathToDataset – Path to all folders containing all images.
imageWidth – Width of images in dataset.
imageHeight – Height of images in dataset.
imageDepth – Depth of images in dataset.
trainData – Determines whether data is training set or test set.
validRatio – Ratio of dataset to be used for validation set.
shuffle – Boolean to determine whether or not to shuffle the data.
augmentation – Vector of strings of augmentations supported by mlpack.
augmentationProbability – Probability of applying augmentation to a particular image.

inline DatasetX TrainFeatures() const: Get the training dataset features.

inline DatasetX &TrainFeatures(): Modify the training dataset features.

inline DatasetY TrainLabels() const: Get the training dataset labels.

inline DatasetY &TrainLabels(): Modify the training dataset labels.

inline DatasetX TestFeatures() const: Get the test dataset features.

inline DatasetX &TestFeatures(): Modify the test dataset features.

inline DatasetY TestLabels() const: Get the test dataset labels.

inline DatasetY &TestLabels(): Modify the test dataset labels.

inline DatasetX ValidFeatures() const: Get the validation dataset features.

inline DatasetX &ValidFeatures(): Modify the validation dataset features.

inline DatasetY ValidLabels() const: Get the validation dataset labels.

inline DatasetY &ValidLabels(): Modify the validation dataset labels.

inline std::tuple<DatasetX, DatasetY> TrainSet() const: Get the training dataset.

inline std::tuple<DatasetX, DatasetY> ValidSet() const: Get the validation dataset.

inline std::tuple<DatasetX, DatasetY> TestSet() const: Get the testing dataset.

inline ScalerType Scaler() const: Get the Scaler.

inline ScalerType &Scaler(): Modify the Scaler.
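
The accessors above come in pairs: a const overload that returns by value and a non-const overload that returns a reference. The sketch below mirrors that pattern with a hypothetical MiniLoader class, which is not mlpack::models::DataLoader and stores std::vector instead of Armadillo types, to show which overload modifies the stored data and how a std::tuple result can be unpacked.

```cpp
#include <cassert>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in that mirrors the accessor pattern only; it is not
// mlpack::models::DataLoader.
class MiniLoader
{
 public:
  MiniLoader(std::vector<double> features, std::vector<int> labels) :
      trainFeatures(std::move(features)), trainLabels(std::move(labels))
  {
    assert(trainFeatures.size() == trainLabels.size());
  }

  // Const overload: returns a copy, so the caller cannot alter the loader.
  std::vector<double> TrainFeatures() const { return trainFeatures; }

  // Non-const overload: returns a reference for in-place modification.
  std::vector<double>& TrainFeatures() { return trainFeatures; }

  // Returns copies of the features and labels bundled in a tuple.
  std::tuple<std::vector<double>, std::vector<int>> TrainSet() const
  {
    return std::make_tuple(trainFeatures, trainLabels);
  }

 private:
  std::vector<double> trainFeatures;
  std::vector<int> trainLabels;
};
```

On a non-const object, a call such as loader.TrainFeatures()[0] = 5.0 selects the reference overload and changes the stored data, while a call through a const reference yields an independent copy. The tuple returned by TrainSet() can be unpacked with std::tie or, in C++17, with structured bindings.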