Abstract: The present disclosure relates to a system, method, and non-transitory computer-readable medium for training language models. The exemplary method includes obtaining a first language model and using a determined set of weights of the first language model to initialize a second language model. The first and second language models are different model types. The method includes applying the second language model to perform an operation.
Type: Application
Filed: November 30, 2022
Publication date: June 8, 2023
Applicant: Cohere Inc.
Inventors: Nicholas Myles Wisener FROSST, Rozhina GHANAVI, Christopher Alexander CREMER
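The weight-initialization step in the abstract for this application can be illustrated with a minimal sketch. The abstract does not say which weights are transferred or how the two model types differ, so everything below is an assumption for illustration: models are represented as plain dictionaries of parameter lists, and the function name `init_from_first_model` and the parameter names (`embed`, `layer0`, `lm_head`, `pool`) are hypothetical.

```python
import copy


def init_from_first_model(first_weights, second_weights, shared_names):
    """Return a copy of second_weights with selected entries copied from first_weights.

    Assumption: both models store parameters as {name: list of floats}, and
    `shared_names` is the "determined set of weights" chosen for transfer.
    """
    initialized = copy.deepcopy(second_weights)
    for name in shared_names:
        source = first_weights[name]
        if name not in initialized or len(initialized[name]) != len(source):
            raise ValueError(f"parameter {name!r} is missing or has a different size")
        # Copy the values so later updates to one model do not affect the other.
        initialized[name] = list(source)
    return initialized


# Hypothetical first model (e.g. a generative model with an output head).
first = {"embed": [0.1, 0.2], "layer0": [1.0, 2.0, 3.0], "lm_head": [9.0]}
# Hypothetical second model of a different type (e.g. with a pooling head).
second = {"embed": [0.0, 0.0], "layer0": [0.0, 0.0, 0.0], "pool": [0.5]}

second_init = init_from_first_model(first, second, ["embed", "layer0"])
```

In this sketch, parameters that exist only in the second model (`pool`) keep their fresh initialization, and parameters that exist only in the first model (`lm_head`) are not carried over.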
Abstract: A method of training a neural network model and related systems are disclosed. The method includes training the neural network model by factorising, based on a singular value decomposition scheme, a first plurality of nodes of the neural network model into a low-rank neural network model comprising a second plurality of nodes. Each node of the second plurality of nodes is defined at least in part by at least one weight matrix, and the factorisation is based on a matrix decomposition scheme constrained by one or more directionality criteria.
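As a rough illustration of the factorisation described in the abstract above, the sketch below splits one weight matrix into two low-rank factors using a truncated singular value decomposition. It is a simplified sketch and not the disclosed method: the abstract does not define the directionality criteria, so they are omitted here, and the function name `low_rank_factors` and the example matrix are hypothetical. NumPy is assumed to be available.

```python
import numpy as np


def low_rank_factors(weight, rank):
    """Factor `weight` (out x in) into A (out x rank) and B (rank x in) via truncated SVD.

    Assumption: plain truncated SVD, with no directionality constraint applied.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # scale each kept left singular vector by its singular value
    b = vt[:rank, :]
    return a, b


# Example: a 3 x 4 matrix built from two outer products, so its rank is 2.
w = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 1.0, 0.0]) + np.outer(
    [0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]
)
a, b = low_rank_factors(w, rank=2)
reconstruction = a @ b  # matches w up to floating-point error because rank(w) == 2
```

Replacing one `out x in` matrix with the pair `(A, B)` stores `rank * (out + in)` values instead of `out * in`, which is smaller whenever `rank` is well below both dimensions.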
Abstract: A system and method are provided for generating a trained model to filter hate speech from data sets. The method includes obtaining an unfiltered corpus of data, obtaining a set of trigger phrases, and using the set of trigger phrases to generate a trained model which comprises at least one conditional likelihood of the trigger phrases conditioned on documents in the corpus of data. A system and method are also provided for filtering data sets for hate speech using pretrained models. The method includes obtaining a pretrained model generated using a set of trigger phrases and which comprises at least one conditional likelihood of the trigger phrases conditioned on documents in a corpus of data used to generate the pretrained model; using the pretrained model to filter an unfiltered dataset and generate a filtered dataset; and outputting the filtered dataset.
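The filtering idea in the abstract above, scoring each document by the likelihood of a trigger phrase conditioned on that document, can be sketched as follows. This is an illustrative stand-in only: the abstract does not specify the model, so a smoothed unigram model estimated from each document is assumed in place of a trained language model. The function names, the threshold value, and the placeholder trigger word are all hypothetical.

```python
import math
from collections import Counter


def trigger_log_likelihood(trigger, document, vocab_size, alpha=1.0):
    """Average per-token log P(trigger | document) under an add-alpha unigram model.

    Assumption: the unigram model is a toy substitute for a trained language model.
    `trigger` must contain at least one token.
    """
    doc_tokens = document.lower().split()
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    trigger_tokens = trigger.lower().split()
    log_p = 0.0
    for token in trigger_tokens:
        log_p += math.log((counts[token] + alpha) / (total + alpha * vocab_size))
    return log_p / len(trigger_tokens)


def filter_corpus(corpus, triggers, threshold):
    """Keep documents whose highest trigger score is below `threshold`."""
    vocab = {tok for doc in corpus for tok in doc.lower().split()}
    vocab.update(tok for trig in triggers for tok in trig.lower().split())
    kept = []
    for doc in corpus:
        score = max(trigger_log_likelihood(t, doc, len(vocab)) for t in triggers)
        if score < threshold:
            kept.append(doc)
    return kept


corpus = [
    "the weather is nice today",
    "badword badword you are a badword",  # "badword" is a neutral placeholder
    "cats sleep all day",
]
filtered = filter_corpus(corpus, triggers=["badword"], threshold=-2.0)
```

With these inputs, only the second document scores above the threshold, so `filtered` keeps the first and third documents. In practice the threshold would need to be tuned against labelled examples.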