megatron.data.image_folder.make_dataset

megatron.data.image_folder.make_dataset(directory: str, class_to_idx: Dict[str, int], data_per_class_fraction: float, extensions: Tuple[str, ...] | None = None, is_valid_file: Callable[[str], bool] | None = None) → List[Tuple[str, int]]

Generates a list of samples of the form (path_to_sample, class).

Parameters:

directory (str) – Root dataset directory.

class_to_idx (Dict[str, int]) – Dictionary mapping class name to class index.

extensions (Tuple[str, ...], optional) – Allowed file extensions. Either extensions or is_valid_file should be passed, but not both. Defaults to None.

is_valid_file (Callable[[str], bool], optional) – A function that takes the path of a file and returns whether it is a valid file (used to check for corrupt files). Must not be passed together with extensions. Defaults to None.
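As an illustration of an is_valid_file callable used to filter out corrupt files, the sketch below accepts a file only if its first bytes match a JPEG or PNG signature. The function name `is_readable_image` is hypothetical and not part of Megatron. A header check only rejects files that are unreadable, empty, or damaged at the start; a file truncated later in its body would still pass, which would need a full decode to detect.

```python
def is_readable_image(path: str) -> bool:
    """Hypothetical is_valid_file callable: accept files whose first bytes
    match a JPEG or PNG signature, reject anything unreadable."""
    try:
        with open(path, "rb") as f:
            header = f.read(8)
    except OSError:
        # Missing or unreadable files are treated as invalid.
        return False
    is_jpeg = header.startswith(b"\xff\xd8\xff")  # JPEG start-of-image marker
    is_png = header == b"\x89PNG\r\n\x1a\n"       # 8-byte PNG signature
    return is_jpeg or is_png
```

Because this callable is passed as is_valid_file, extensions must be left as None.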

Raises:

ValueError – If extensions and is_valid_file are both None or both not None.

Returns:

A list of samples of the form (path_to_sample, class).

Return type:

List[Tuple[str, int]]
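The following is a minimal sketch of how a function with this signature can produce the documented return value: walk each class subdirectory of directory, keep files that pass the validity check, and pair each path with its index from class_to_idx. It is not Megatron's source. It assumes torchvision-style, case-insensitive suffix matching for extensions, and it assumes (from the parameter name only) that data_per_class_fraction keeps the first fraction of each class's sorted files. The name `make_dataset_sketch` is hypothetical.

```python
import os
from typing import Callable, Dict, List, Optional, Tuple


def make_dataset_sketch(
    directory: str,
    class_to_idx: Dict[str, int],
    data_per_class_fraction: float,
    extensions: Optional[Tuple[str, ...]] = None,
    is_valid_file: Optional[Callable[[str], bool]] = None,
) -> List[Tuple[str, int]]:
    """Hypothetical re-implementation; returns (path_to_sample, class) pairs."""
    # Exactly one of the two filters must be given (see Raises).
    if (extensions is None) == (is_valid_file is None):
        raise ValueError("Pass exactly one of extensions and is_valid_file")

    if is_valid_file is not None:
        check = is_valid_file
    else:
        allowed = tuple(ext.lower() for ext in extensions)

        def has_allowed_extension(path: str) -> bool:
            # Assumption: case-insensitive suffix match, as in torchvision.
            return path.lower().endswith(allowed)

        check = has_allowed_extension

    samples: List[Tuple[str, int]] = []
    for class_name in sorted(class_to_idx):
        class_dir = os.path.join(directory, class_name)
        if not os.path.isdir(class_dir):
            continue  # Assumption: classes without a folder are skipped.
        class_samples = []
        # Sorting makes the file order, and so the kept subset, deterministic.
        for root, _, fnames in sorted(os.walk(class_dir, followlinks=True)):
            for fname in sorted(fnames):
                path = os.path.join(root, fname)
                if check(path):
                    class_samples.append((path, class_to_idx[class_name]))
        # Assumption: keep the first data_per_class_fraction of each class.
        keep = int(len(class_samples) * data_per_class_fraction)
        samples.extend(class_samples[:keep])
    return samples
```

With data_per_class_fraction=0.5 and four valid files per class, this sketch keeps the first two files of each class in sorted order.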