Patents Assigned to Cohere Inc.

System and method for filtering datasets using conditional-likelihood filtration

Patent number: 12619872

Abstract: A system and method are provided for generating a trained model to filter hate speech from data sets. The method includes obtaining an unfiltered corpus of data, obtaining a set of trigger phrases, and using the set of trigger phrases to generate a trained model which comprises at least one conditional likelihood of the trigger phrases conditioned on documents in the corpus of data. A system and method are also provided for filtering data sets for hate speech using pretrained models. The method includes obtaining a pretrained model generated using a set of trigger phrases and which comprises at least one conditional likelihood of the trigger phrases conditioned on documents in a corpus of data used to generate the pretrained model; using the pretrained model to filter an unfiltered dataset and generate a filtered dataset; and outputting the filtered dataset.
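
The filtering step described in the abstract can be illustrated with a small sketch. It is not the patented implementation: the scorer `toy_log_likelihood` is a hypothetical stand-in (a smoothed unigram model built from the document itself) for a real language model's conditional likelihood, and the rule of dropping documents whose score meets a threshold is an assumption. The names `filter_corpus`, `score`, and `threshold` are illustrative only.

```python
import math
from collections import Counter
from typing import Callable, List


def toy_log_likelihood(trigger: str, document: str) -> float:
    """Hypothetical stand-in for log p(trigger | document).

    Uses an add-one-smoothed unigram model estimated from the document.
    A real system would query a trained language model instead.
    """
    doc_tokens = document.lower().split()
    counts = Counter(doc_tokens)
    vocab_size = len(counts) + 1  # one extra slot for unseen tokens
    total = len(doc_tokens)
    trigger_tokens = trigger.lower().split()
    if not trigger_tokens:
        raise ValueError("trigger phrase must contain at least one token")
    log_probs = [
        math.log((counts[tok] + 1) / (total + vocab_size))
        for tok in trigger_tokens
    ]
    # Average per token so long and short triggers are comparable.
    return sum(log_probs) / len(log_probs)


def filter_corpus(
    documents: List[str],
    triggers: List[str],
    score: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Keep documents whose highest trigger score stays below the threshold."""
    kept = []
    for doc in documents:
        highest = max(score(trigger, doc) for trigger in triggers)
        if highest < threshold:  # assumed rule: high likelihood means drop
            kept.append(doc)
    return kept


docs = [
    "the weather is mild today",
    "bad phrase bad phrase bad phrase",
]
filtered = filter_corpus(docs, ["bad phrase"], toy_log_likelihood, threshold=-1.5)
print(filtered)
```

Passing the scorer as a parameter keeps the thresholding logic separate from the model, so the toy scorer could be swapped for a real model's conditional likelihood without touching `filter_corpus`.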

Type: Grant

Filed: June 22, 2022

Date of Patent: May 5, 2026

Assignee: Cohere Inc.

Inventors: Helen Ngo, Nicholas Frosst

Training Transformers Using Sliceout

Publication number: 20260093985

Abstract: A system for training a neural network using dropout with slicing operations preserves the regularization effects of dropout, while speeding up computations and reducing the memory requirements of training the neural network. Instead of randomly dropping weights connected to neurons in a neural network, the system slices contiguous memory segments of weight matrices. For transformer models, the approach first receives input data that consist of a sequence of elements. Based on the input data, input embedding vectors with positional encoding are generated. Then the transformer model is trained by passing the input embedding vectors through various neural network layers. While passing through linear layers, some of the weight matrices are sliced (e.g., masked) such that a contiguous section of a weight matrix is kept unsliced and used for training and the rest of the weight matrix is not accessed.
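
The slicing idea in the abstract can be sketched for a single linear layer in NumPy. This is only an illustration under stated assumptions, not the patented method: the function name `sliceout_linear`, the `(d_out, d_in)` weight layout, the uniformly random start index, and the rescaling by `d_out / keep` (borrowed from inverted dropout) are all assumptions.

```python
import numpy as np


def sliceout_linear(x, weight, drop_rate, rng):
    """Forward pass through a linear layer using one contiguous weight slice.

    x:       (batch, d_in) inputs
    weight:  (d_out, d_in) weights stored in C order
    Returns the (batch, keep) outputs and the start index of the slice.
    """
    d_out = weight.shape[0]
    keep = max(1, int(round(d_out * (1.0 - drop_rate))))
    # Pick where the kept block of output units begins.
    start = int(rng.integers(0, d_out - keep + 1))
    # A row slice of a C-ordered array is a contiguous view, so the
    # remaining rows are never read or copied.
    w_slice = weight[start:start + keep]
    y = x @ w_slice.T
    # Assumed rescaling, analogous to inverted dropout.
    return y * (d_out / keep), start


rng = np.random.default_rng(0)
x = np.ones((2, 4))
w = np.arange(24.0).reshape(6, 4)
y, start = sliceout_linear(x, w, drop_rate=0.5, rng=rng)
print(y.shape, start)
```

Unlike a dropout mask, which still multiplies the full weight matrix, the sliced version produces a smaller output tensor, which is where the compute and memory savings mentioned in the abstract would come from.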

Type: Application

Filed: December 8, 2025

Publication date: April 2, 2026

Applicant: Cohere Inc.

Inventors: Aidan GOMEZ, Seoyeon YOO

Training transformers using sliceout

Patent number: 12518158

Abstract: A system for training a neural network using dropout with slicing operations preserves the regularization effects of dropout, while speeding up computations and reducing the memory requirements of training the neural network. Instead of randomly dropping weights connected to neurons in a neural network, the system slices contiguous memory segments of weight matrices. For transformer models, the approach first receives input data that consist of a sequence of elements. Based on the input data, input embedding vectors with positional encoding are generated. Then the transformer model is trained by passing the input embedding vectors through various neural network layers. While passing through linear layers, some of the weight matrices are sliced (e.g., masked) such that a contiguous section of a weight matrix is kept unsliced and used for training and the rest of the weight matrix is not accessed.

Type: Grant

Filed: November 19, 2021

Date of Patent: January 6, 2026

Assignee: Cohere Inc.

Inventors: Aidan Gomez, Seoyeon Yoo

System and Method for Training Language Models Using Already Trained Language Models

Publication number: 20230177279

Abstract: The present disclosure relates to a system, method and non-transitory computer readable medium for training language models. The exemplary method includes obtaining a first language model. The method includes using a determined set of weights of the first language model to initialize a second language model. The first and second language models are of different model types. The method includes applying the second language model to perform an operation.

Type: Application

Filed: November 30, 2022

Publication date: June 8, 2023

Applicant: Cohere Inc.

Inventors: Nicholas Myles Wisener FROSST, Rozhina GHANAVI, Christopher Alexander CREMER

System and Method for Low Rank Training of Neural Networks

Publication number: 20230057387

Abstract: A method of training a neural network model and related systems are disclosed. The method includes training the neural network model by factorising, based on a singular value decomposition scheme, a first plurality of nodes of the neural network model into a low rank neural network model comprising a second plurality of nodes. Each node of the second plurality of nodes is defined at least in part by at least one weight matrix, and the factorisation is based on a matrix decomposition scheme constrained by one or more directionality criteria.
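
The factorisation step can be illustrated with a truncated singular value decomposition of one weight matrix. This is a minimal sketch only: the function name `low_rank_factors` and the even split of singular values between the two factors are assumptions, and the directionality criteria mentioned in the abstract are not specified there and are not modelled here.

```python
import numpy as np


def low_rank_factors(weight, rank):
    """Split an (m, n) weight matrix into factors a (m, rank) and b (rank, n).

    The product a @ b is the best rank-`rank` approximation of `weight`
    in the least-squares sense.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    root = np.sqrt(s[:rank])
    a = u[:, :rank] * root            # scale the kept left singular vectors
    b = root[:, None] * vt[:rank]     # scale the kept right singular vectors
    return a, b


# A 3x4 matrix built as a product of 3x2 and 2x4 matrices, so its rank is 2.
w = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]) @ np.array(
    [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
)
a, b = low_rank_factors(w, rank=2)
print(a.shape, b.shape, np.allclose(a @ b, w))
```

Replacing one m-by-n layer with the two factors stores `rank * (m + n)` parameters instead of `m * n`, which is smaller whenever the chosen rank is low relative to the matrix dimensions.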

Type: Application

Filed: July 21, 2022

Publication date: February 23, 2023

Applicant: Cohere Inc.

Inventors: Siddhartha Rao KAMALAKARA, Bharat VENKITESH, Aidan N. GOMEZ, Acyr Flavio Neto Locatelli

System And Method for Filtering Datasets Using Conditional-Likelihood Filtration

Publication number: 20220414467

Abstract: A system and method are provided for generating a trained model to filter hate speech from data sets. The method includes obtaining an unfiltered corpus of data, obtaining a set of trigger phrases, and using the set of trigger phrases to generate a trained model which comprises at least one conditional likelihood of the trigger phrases conditioned on documents in the corpus of data. A system and method are also provided for filtering data sets for hate speech using pretrained models. The method includes obtaining a pretrained model generated using a set of trigger phrases and which comprises at least one conditional likelihood of the trigger phrases conditioned on documents in a corpus of data used to generate the pretrained model; using the pretrained model to filter an unfiltered dataset and generate a filtered dataset; and outputting the filtered dataset.

Type: Application

Filed: June 22, 2022

Publication date: December 29, 2022

Applicant: Cohere Inc.

Inventors: Helen NGO, Nicholas FROSST
