Abstract: Systems and methods are provided for classifying text based on language using one or more computer servers and storage devices. A computer-implemented method includes receiving a training set of elements, each element in the training set being assigned to one of a plurality of categories and having one of a plurality of content profiles associated therewith; receiving a population set of elements, each element in the population set having one of the plurality of content profiles associated therewith; and calculating using at least one of a stacked regression algorithm, a bias formula algorithm, a noise elimination algorithm, and an ensemble method consisting of a plurality of algorithmic methods the results of which are averaged, based on the content profiles associated with and the categories assigned to elements in the training set and the content profiles associated with the elements of the population set, a distribution of elements of the population set over the categories.
Type: Grant
Filed: March 14, 2013
Date of Patent: November 1, 2016
Assignee: Crimson Hexagon, Inc.
Inventors: Aykut Firat, Mitchell Brooks, Christopher Bingham, Amac Herdagdelen, Gary King
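
The abstract above describes estimating how a population of elements is distributed over categories, using content profiles observed in a labeled training set. The following is only a minimal sketch of that general idea, not the patented method: it assumes profiles and categories are encoded as small integer indices, estimates P(profile | category) from the training set, and solves a least-squares system against the profile frequencies seen in the population. The function names `estimate_category_distribution` and `average_estimates` are hypothetical, and the plain least-squares solve stands in for the stacked regression, bias formula, and noise elimination algorithms, which the abstract names but does not specify.

```python
import numpy as np

def estimate_category_distribution(train_profiles, train_categories,
                                   population_profiles, n_profiles, n_categories):
    """Estimate the share of each category in the population (illustrative only)."""
    # Count how often each profile appears within each category in the training set.
    cond = np.zeros((n_profiles, n_categories))
    for p, c in zip(train_profiles, train_categories):
        cond[p, c] += 1
    # Normalize columns to get P(profile | category); leave empty categories at zero.
    col_sums = cond.sum(axis=0, keepdims=True)
    cond = cond / np.where(col_sums == 0, 1, col_sums)
    # Observed P(profile) in the unlabeled population.
    pop = np.bincount(population_profiles, minlength=n_profiles) / len(population_profiles)
    # Solve cond @ dist ~= pop in the least-squares sense.
    dist, *_ = np.linalg.lstsq(cond, pop, rcond=None)
    # Assumption: clip negatives and renormalize so the result is a distribution.
    dist = np.clip(dist, 0, None)
    total = dist.sum()
    if total == 0:
        return np.full(n_categories, 1.0 / n_categories)
    return dist / total

def average_estimates(estimates):
    """Average several category-distribution estimates, as an ensemble would."""
    mean = np.mean(np.asarray(estimates, dtype=float), axis=0)
    return mean / mean.sum()
```

For example, `average_estimates` could combine the output of this estimator with estimates produced by other methods; which methods to combine, and how to weight them, is left open here.
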
Abstract: Systems and methods are provided for classifying text based on language using one or more computer servers and storage devices. In general, the systems and methods can include a language classification module for classifying text of an input data set using the output of a training module. In an exemplary embodiment, a bootstrapping step feeds the output of the language classification module back into the training module to increase the accuracy of the language classification module. By iterating the language classification and training modules with input data having certain features, a user can tailor the language classification module for use with text having those or similar features.
Type: Grant
Filed: March 13, 2013
Date of Patent: May 31, 2016
Assignee: Crimson Hexagon, Inc.
Inventors: Amac Herdagdelen, Aykut Firat, Christopher Bingham
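
The bootstrapping loop described in the abstract above, where classifier output is fed back into training, can be illustrated with a small sketch. This is an assumption-laden toy, not the patented system: it represents each language by character trigram counts, scores text by cosine similarity, and adds a text to the training data only when the best language beats the runner-up by a margin. The names `train`, `classify`, `bootstrap`, and the `min_margin` threshold are all hypothetical.

```python
from collections import Counter
import math

def ngrams(text, n=3):
    """Character n-grams of the lowercased text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def train(labeled_texts):
    """Build one n-gram count profile per language from (text, language) pairs."""
    model = {}
    for text, lang in labeled_texts:
        model.setdefault(lang, Counter()).update(ngrams(text))
    return model

def cosine(a, b):
    """Cosine similarity between two Counters; 0.0 if either is empty."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(model, text):
    """Return (best_language, margin over the second-best language)."""
    profile = Counter(ngrams(text))
    scores = {lang: cosine(profile, counts) for lang, counts in model.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    runner_up = scores[ranked[1]] if len(ranked) > 1 else 0.0
    return ranked[0], scores[ranked[0]] - runner_up

def bootstrap(seed, unlabeled, rounds=2, min_margin=0.2):
    """Retrain on confidently classified texts for a fixed number of rounds."""
    labeled, remaining = list(seed), list(unlabeled)
    model = train(labeled)
    for _ in range(rounds):
        uncertain = []
        for text in remaining:
            lang, margin = classify(model, text)
            if margin >= min_margin:
                labeled.append((text, lang))  # feed output back into training
            else:
                uncertain.append(text)
        remaining = uncertain
        model = train(labeled)
    return model
```

Seeding `bootstrap` with a few labeled sentences per language and a pool of unlabeled text from the target domain is one way to tailor the profiles to that domain. The margin threshold trades coverage against the risk of reinforcing early mistakes.
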
Abstract: Improved searching of digital content using a large corpus of content collected from content generating websites is described. A search query received from a user is compared to the collected content to determine how often the elements of the search query are repeated in the collected content and whether these elements have frequently co-occurred with other elements in the content. Co-occurring elements are presented to the user so that the user can select one or more elements that best describe her intent in conducting the search. An updated search query is formed based on the information received from the user. The updated query is used to retrieve a number of documents and the retrieved documents are classified to distinguish relevant documents from those irrelevant to the user's intent. Documents classified as relevant are presented to the user.
Type: Application
Filed: September 4, 2015
Publication date: March 10, 2016
Applicant: Crimson Hexagon, Inc.
Inventors: Aykut Firat, Mitchell Brooks, Christopher Bingham, Francesco Liuzzi
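
The query-refinement step described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the application's actual procedure: documents are tokenized by whitespace, co-occurrence is counted at the document level, and ties are broken alphabetically. The function names `suggest_cooccurring_terms` and `refine_query` are hypothetical, and the later step of classifying retrieved documents as relevant or irrelevant is not covered.

```python
from collections import Counter

def suggest_cooccurring_terms(query, corpus, top_k=3):
    """Return the terms that most often share a document with any query term."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for doc in corpus:
        tokens = set(doc.lower().split())
        if query_terms & tokens:
            # Count each co-occurring term once per matching document.
            counts.update(tokens - query_terms)
    # Sort by descending count, then alphabetically, so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]

def refine_query(query, selected_terms):
    """Form the updated query from the original query plus the user's selections."""
    return " ".join([query, *selected_terms])
```

With a corpus such as `["jaguar car speed", "jaguar cat jungle", "jaguar car dealer"]`, the query `jaguar` would surface `car` first, and a user who selects it would produce the updated query `jaguar car`.
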
Abstract: Systems and methods are provided for classifying text based on language using one or more computer servers and storage devices. A computer-implemented method includes receiving a training set of elements, each element in the training set being assigned to one of a plurality of categories and having one of a plurality of content profiles associated therewith; receiving a population set of elements, each element in the population set having one of the plurality of content profiles associated therewith; and calculating using at least one of a stacked regression algorithm, a bias formula algorithm, a noise elimination algorithm, and an ensemble method consisting of a plurality of algorithmic methods the results of which are averaged, based on the content profiles associated with and the categories assigned to elements in the training set and the content profiles associated with the elements of the population set, a distribution of elements of the population set over the categories.
Type: Application
Filed: March 14, 2013
Publication date: January 9, 2014
Applicant: CRIMSON HEXAGON, INC.
Inventors: Aykut Firat, Mitchell Brooks, Christopher Bingham, Amac Herdagdelen, Gary King