Patents Assigned to COGNIGO RESEARCH LTD.

Token matching in large document corpora

Patent number: 10248646

Abstract: A method comprising receiving a dictionary comprising a plurality of entities, wherein each entity has a length of between 1 and n tokens; constructing a probabilistic data representation model comprising n Bloom filter (BF) pairs indexed from 1 to n; populating said probabilistic data representation model with a data representation of said entities, wherein, with respect to each BF pair indexed i: (i) a first BF is populated with the first i tokens of all said entities having at least i+1 tokens, and (ii) a second BF is populated with all said entities having exactly i tokens; receiving a text corpus, wherein said text corpus is segmented into tokens; and automatically matching each token in said text corpus against said populated probabilistic data representation model, wherein said matching comprises sequentially querying each said BF pair in the order of said indexing, to determine a match.
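
The populate-and-query scheme in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The BloomFilter class, its default sizes, the SHA-256-based hashing, whitespace tokenization, and the names build_model and find_matches are all assumptions made for illustration. For each pair i, the first filter holds i-token prefixes of longer entities and the second holds complete i-token entities; matching walks the pairs in index order and stops as soon as the prefix filter reports no longer candidate.

```python
import hashlib


class BloomFilter:
    """Small illustrative Bloom filter (sizes and hashing are assumptions)."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive k bit positions by salting SHA-256 with a seed.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


def build_model(entities, n):
    """Return n (prefix_bf, exact_bf) pairs; pair i is stored at index i - 1."""
    pairs = [(BloomFilter(), BloomFilter()) for _ in range(n)]
    for entity in entities:
        tokens = entity.split()
        if not 1 <= len(tokens) <= n:
            raise ValueError(f"entity must have 1..{n} tokens: {entity!r}")
        # (i) first i tokens of every entity that has at least i + 1 tokens
        for i in range(1, len(tokens)):
            pairs[i - 1][0].add(" ".join(tokens[:i]))
        # (ii) the whole entity goes into the exact filter for its length
        pairs[len(tokens) - 1][1].add(" ".join(tokens))
    return pairs


def find_matches(tokens, pairs):
    """Query the pairs in index order from each start position."""
    matches = []
    for start in range(len(tokens)):
        for i in range(1, len(pairs) + 1):
            if start + i > len(tokens):
                break
            key = " ".join(tokens[start:start + i])
            prefix_bf, exact_bf = pairs[i - 1]
            if key in exact_bf:
                matches.append((start, key))  # probable match (may be a false positive)
            if key not in prefix_bf:
                break  # no longer entity starts with these tokens
    return matches


model = build_model(["new york", "new york city", "paris"], n=3)
corpus_tokens = "i flew from new york city to paris".split()
matches = find_matches(corpus_tokens, model)
```

Because Bloom filters can return false positives but never false negatives, a reported match in this sketch is only probable; a real system would likely confirm candidates against the original dictionary.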

Type: Grant

Filed: August 22, 2018

Date of Patent: April 2, 2019

Assignee: COGNIGO RESEARCH LTD.

Inventor: Guy Leibovitz

Multi-modal electronic document classification

Patent number: 10223586

Abstract: A method comprising operating at least one hardware processor for: receiving, as input, a plurality of electronic documents, training a machine learning classifier based, at least in part, on a training set comprising: (i) labels associated with the electronic documents, (ii) raw text from each of said plurality of electronic documents, and (iii) a rasterized version of each of said plurality of electronic documents, and applying said machine learning classifier to classify one or more new electronic documents.
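
The training set described in the abstract (labels, raw text, and a rasterized version of each document) can be illustrated with a toy Python sketch. The abstract does not specify a model, so a nearest-centroid classifier over concatenated bag-of-words and pixel features is swapped in here purely for illustration. The function names, the tiny 2x2 grayscale rasters, and the sample documents are all hypothetical, and every raster is assumed to have the same shape.

```python
import math
from collections import Counter


def text_features(text, vocab):
    # Bag-of-words counts over a fixed vocabulary (assumed feature choice).
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def raster_features(raster):
    # Flatten a 2D grid of 0..255 grayscale pixels and scale to 0..1.
    return [pixel / 255.0 for row in raster for pixel in row]


def featurize(text, raster, vocab):
    # Combine both modalities into a single feature vector.
    return text_features(text, vocab) + raster_features(raster)


def train(docs, labels):
    """docs is a list of (raw_text, raster) tuples; returns (vocab, centroids)."""
    vocab = sorted({word for text, _ in docs for word in text.lower().split()})
    sums, counts = {}, Counter()
    for (text, raster), label in zip(docs, labels):
        vec = featurize(text, raster, vocab)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, value in enumerate(vec):
            acc[j] += value
        counts[label] += 1
    centroids = {
        label: [value / counts[label] for value in acc]
        for label, acc in sums.items()
    }
    return vocab, centroids


def classify(model, text, raster):
    # Assign the label whose centroid is closest in Euclidean distance.
    vocab, centroids = model
    vec = featurize(text, raster, vocab)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Hypothetical training documents: (raw text, 2x2 rasterized page).
docs = [
    ("invoice total amount", [[0, 255], [0, 255]]),
    ("dear sir regards", [[255, 255], [0, 0]]),
]
labels = ["invoice", "letter"]
model = train(docs, labels)
predicted = classify(model, "invoice total due", [[0, 250], [10, 255]])
```

In practice the raster would be a full page image and the classifier would more plausibly be a learned model over both inputs; the sketch only shows how the three training inputs fit together.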

Type: Grant

Filed: July 17, 2018

Date of Patent: March 5, 2019

Assignee: COGNIGO RESEARCH LTD.

Inventors: Guy Leibovitz, Adam Bali
