megatron.data.ict_dataset.get_ict_dataset
- megatron.data.ict_dataset.get_ict_dataset(use_titles=True, query_in_block_prob=1)
Returns a dataset that uses block sample mappings to retrieve ICT/block indexing data (via get_block()). The dataset is intended for indexing rather than training, because it is built with only a single-epoch sample mapping.
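
The idea of a single-epoch block sample mapping can be illustrated with a small, self-contained sketch. This is not Megatron's implementation: `ToyBlockDataset`, `build_single_epoch_mapping`, the `(start, end, doc_id)` tuple layout, and the two-argument `get_block(start, end)` signature are all hypothetical names and shapes chosen for illustration. The sketch assumes a corpus of tokenized sentences, each tagged with a document id, and packs consecutive sentences from the same document into blocks so that every sentence is covered exactly once.

```python
from typing import List, Tuple

Mapping = List[Tuple[int, int, int]]  # (start sentence, end sentence, doc id)


def build_single_epoch_mapping(
    sentences: List[List[int]], doc_ids: List[int], max_block_len: int
) -> Mapping:
    """Greedily pack consecutive same-document sentences into blocks.

    Hypothetical helper: one pass over the corpus, so each sentence
    lands in exactly one block (a "single epoch" of samples).
    """
    mapping: Mapping = []
    start, length = 0, 0
    for i, sent in enumerate(sentences):
        new_doc = i > start and doc_ids[i] != doc_ids[start]
        too_long = i > start and length + len(sent) > max_block_len
        if new_doc or too_long:
            # Close the current block before sentence i.
            mapping.append((start, i, doc_ids[start]))
            start, length = i, 0
        length += len(sent)
    if sentences:
        mapping.append((start, len(sentences), doc_ids[start]))
    return mapping


class ToyBlockDataset:
    """Hypothetical stand-in for an indexing-only block dataset."""

    def __init__(self, sentences, doc_ids, max_block_len):
        self.sentences = sentences
        self.samples_mapping = build_single_epoch_mapping(
            sentences, doc_ids, max_block_len
        )

    def __len__(self) -> int:
        return len(self.samples_mapping)

    def get_block(self, start: int, end: int) -> List[int]:
        """Concatenate the token ids of sentences[start:end]."""
        return [tok for sent in self.sentences[start:end] for tok in sent]


# Toy corpus: five tokenized sentences across two documents.
sentences = [[1, 2], [3], [4, 5, 6], [7], [8, 9]]
doc_ids = [0, 0, 0, 1, 1]
dataset = ToyBlockDataset(sentences, doc_ids, max_block_len=4)

# Walk the mapping once to collect every block for indexing.
blocks = [dataset.get_block(s, e) for s, e, _ in dataset.samples_mapping]
print(blocks)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Because the mapping covers the corpus exactly once, iterating over it yields each block a single time, which suits building an index but provides no repeated or shuffled samples for multi-epoch training.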