Abstract: Systems and methods receive a selection of documents and filter them for an electronic document search. A semantic vector analysis module is applied to text of the filtered documents to build a respective floating point vector for each of the documents, and the respective floating point vectors are stored to a vector database. A textual query is received and vectorized to produce a query vector representing search criteria. The vector database is searched in accordance with the query vector to identify similar documents that satisfy a similarity measure. A ranked list of the vectors having the highest similarity scores relative to the query vector is generated, and the similar documents are identified. A neural network is used to process document data of the similar documents to generate a textual response to the textual query, and control signal(s) are transmitted to a user device to initiate displaying the generated textual response and question-answer events.
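The search-and-rank step in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not name a similarity measure, so cosine similarity is assumed here, and the in-memory dictionary `doc_vectors`, the function names, and the `min_score` and `top_k` parameters are all hypothetical stand-ins for the vector database and its query interface.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_similar(query_vec, doc_vectors, min_score=0.5, top_k=3):
    """Return up to top_k (doc_id, score) pairs with score >= min_score, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vectors.items()
    ]
    kept = [pair for pair in scored if pair[1] >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical stand-in for the vector database: document id -> stored vector.
doc_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.8, 0.6, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}

# Hypothetical query vector, as if produced by vectorizing a textual query.
ranked = rank_similar([1.0, 0.0, 0.0], doc_vectors)
print(ranked)
```

In this toy data, `doc_c` is orthogonal to the query and falls below the threshold, so only `doc_a` and `doc_b` are returned, in that order.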
Abstract: The technology implements diversity sampling for a technology-assisted review of documents. An apparatus obtains an unlabeled set of documents and constructs a first batch of documents. The apparatus obtains labels for the documents and constructs a classification model using the labeled documents. The apparatus logs a found rate of a subsequent batch of documents from the unlabeled set of documents, the subsequent batch of documents being selected based on a comparison to the classification model. The apparatus determines that the classification model requires further training based on the found rate of the subsequent batch of documents and constructs a second batch of documents that includes an amount of diversity, which may be based on the found rate. The apparatus obtains labels for the second batch of documents and updates the model using the labeled second batch of documents. The method may be repeated to continue to refine the classification model.
Type: Grant
Filed: August 27, 2020
Date of Patent: October 17, 2023
Assignee: Consilio, LLC
Inventors: Jeffrey A. Johnson, Md Ahsan Habib, Chandler L. Burgess
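The batch-construction loop described in the diversity sampling abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed method: the abstract says only that the amount of diversity "may be based on the found rate", so the rule in `diversity_fraction` (less found, more diversity), the `target_rate` parameter, and the use of uniform random sampling as a proxy for diverse selection are all hypothetical.

```python
import random


def found_rate(labels):
    """Fraction of a labeled batch marked relevant (1 = relevant, 0 = not)."""
    return sum(labels) / len(labels) if labels else 0.0


def diversity_fraction(rate, target_rate=0.5):
    """Hypothetical rule: the lower the found rate, the more diversity to add."""
    if rate >= target_rate:
        return 0.0
    return 1.0 - rate / target_rate


def build_batch(scored_docs, batch_size, diversity, rng):
    """Mix top-scored documents with randomly drawn ones from the remainder.

    scored_docs is a list of (doc_id, model_score) pairs; random sampling is an
    assumed proxy for whatever diversity criterion a real system would use.
    """
    n_diverse = round(batch_size * diversity)
    n_top = batch_size - n_diverse
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    top = [doc_id for doc_id, _ in ranked[:n_top]]
    rest = [doc_id for doc_id, _ in ranked[n_top:]]
    diverse = rng.sample(rest, min(n_diverse, len(rest)))
    return top + diverse


# Hypothetical model scores for six unlabeled documents.
scored = [("d1", 0.9), ("d2", 0.8), ("d3", 0.4),
          ("d4", 0.3), ("d5", 0.1), ("d6", 0.05)]

rate = found_rate([1, 0, 0, 0])  # one relevant document in the last batch of four
batch = build_batch(scored, 4, diversity_fraction(rate), random.Random(0))
print(rate, batch)
```

With a found rate of 0.25 against the assumed target of 0.5, half of the next batch is drawn from outside the model's top-ranked documents; the labeled batch would then be used to update the classifier, and the cycle repeats.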
Abstract: Systems and methods enable convenient and accurate searching, filtering, reviewing, and classification of electronic documents without the loss of metadata. A communication data source file is parsed into conversation-specific files that include message content and metadata. The message content and metadata are displayed on a computing device operated by a reviewer. To streamline the review process, the reviewer can filter display of the message content according to various metadata categories as well as search conversation-specific files using the metadata categories.
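The parse-then-filter flow in this abstract can be sketched in a few lines. The record layout below (dictionaries with `conversation_id`, `sender`, `channel`, and `text` keys) is purely hypothetical, as are the function names; the abstract does not specify the source file format or the metadata categories.

```python
from collections import defaultdict


def split_by_conversation(messages):
    """Group parsed messages into per-conversation lists, keeping all metadata."""
    conversations = defaultdict(list)
    for message in messages:
        conversations[message["conversation_id"]].append(message)
    return dict(conversations)


def filter_messages(conversation, **criteria):
    """Keep messages whose metadata matches every given key/value pair."""
    return [
        message for message in conversation
        if all(message.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical records, as if already parsed from a communication data source file.
messages = [
    {"conversation_id": "c1", "sender": "alice", "channel": "general", "text": "Hi"},
    {"conversation_id": "c1", "sender": "bob", "channel": "general", "text": "Hello"},
    {"conversation_id": "c2", "sender": "alice", "channel": "legal", "text": "Draft?"},
]

conversations = split_by_conversation(messages)
from_alice = filter_messages(conversations["c1"], sender="alice")
print(sorted(conversations), from_alice)
```

Because each message keeps its full metadata dictionary after grouping, the same keys remain available for both filtering a displayed conversation and searching across conversations.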