Publication number: 20250045316
Abstract: An example method includes providing, to a sequence model (i) a plurality of few-shot prompts, wherein each prompt comprises a demonstration passage, a demonstration task, and a demonstration query, wherein the demonstration task describes a type of retrieval, and wherein the demonstration query is relevant to the demonstration task, and (ii) a plurality of passages sampled from a corpus of passages. The method also includes receiving, from the sequence model and for the plurality of passages and based on the plurality of few-shot prompts, a respective plurality of predicted task-query pairs, the sequence model having been prompted to predict a task based on an input passage, and predict an output query relevant to the predicted task. The method further includes generating a synthetic training dataset comprising the plurality of passages and the respective plurality of predicted task-query pairs. The method also includes providing the synthetic training dataset.
Type: Application
Filed: July 30, 2024
Publication date: February 6, 2025
Inventors: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Iftekhar Naim, Yi Luan, Blair Yuxin Chen, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Daniel Matthew Cer, Gustavo Adolfo Hernandez Abrego, Jeremy Robert Cole, Colin Hearne Evans, Yuzhe Zhao, Pranay Bhatia, Rajvi Kapadia, Riham Hassan Abdel-Moneim Mansour, Raphael Dominik Hoffman, Simon Kunio Tokumine, Scott Bradley Huffman, Stephen Zachary Karukas, Michael Yiupun Kwong, Shu Zheng, Yan Qiao, Lukas Rutishauser, Anand Rajan Iyer
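
Illustrative sketch: The steps summarized in the abstract (few-shot prompting, sampling passages from a corpus, collecting predicted task-query pairs, and assembling a synthetic training dataset) might be sketched in Python as below. This is a minimal illustration under stated assumptions, not the implementation described in the application. The names Demo, build_prompt, parse_task_query, generate_dataset, and toy_model are hypothetical, as is the "Passage: / Task: / Query:" prompt layout, and toy_model is only a stand-in for a real sequence model.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Demo:
    """One few-shot demonstration: a passage, a retrieval task, and a relevant query."""
    passage: str
    task: str
    query: str


def build_prompt(demos: List[Demo], passage: str) -> str:
    """Lay out the demonstrations, then the new passage with an open 'Task:' slot.

    The field layout is an assumption made for this sketch.
    """
    parts = [f"Passage: {d.passage}\nTask: {d.task}\nQuery: {d.query}\n" for d in demos]
    parts.append(f"Passage: {passage}\nTask:")
    return "\n".join(parts)


def parse_task_query(completion: str) -> Tuple[str, str]:
    """Split a completion shaped like '<task>\\nQuery: <query>' into (task, query)."""
    task, sep, rest = completion.partition("Query:")
    query_lines = rest.strip().splitlines()
    if not sep or not task.strip() or not query_lines:
        raise ValueError("completion does not contain both a task and a query")
    return task.strip(), query_lines[0].strip()


def generate_dataset(
    model: Callable[[str], str],
    demos: List[Demo],
    corpus: List[str],
    n_samples: int,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Sample passages, prompt the model for each, and collect the predicted pairs."""
    rng = random.Random(seed)
    passages = rng.sample(corpus, n_samples)  # sampling without replacement
    dataset = []
    for passage in passages:
        task, query = parse_task_query(model(build_prompt(demos, passage)))
        dataset.append({"passage": passage, "task": task, "query": query})
    return dataset


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a sequence model; it returns a fixed task and a trivial query."""
    last_passage = prompt.rsplit("Passage: ", 1)[1].split("\nTask:")[0]
    first_word = last_passage.split()[0].lower()
    return (
        " Given a question, retrieve a passage that answers it.\n"
        f"Query: what is {first_word}"
    )


if __name__ == "__main__":
    demos = [
        Demo(
            passage="Paris is the capital of France.",
            task="Given a question, retrieve a passage that answers it.",
            query="capital of france",
        )
    ]
    corpus = ["Mitochondria produce ATP.", "Rust enforces ownership rules."]
    for row in generate_dataset(toy_model, demos, corpus, n_samples=2):
        print(row)
```

In this sketch the model is passed in as a plain callable, so an actual sequence model client could be substituted for toy_model without changing the rest of the pipeline. Each resulting row pairs a sampled passage with its predicted task and query, which corresponds to the synthetic training dataset the abstract describes.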