Patents Assigned to MSC Intellectual Properties B.V.
-
Patent number: 10565502
Abstract: A system, method and computer program product for automatic document classification, including an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information; a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classificatio
Type: Grant
Filed: January 7, 2016
Date of Patent: February 18, 2020
Assignee: MSC INTELLECTUAL PROPERTIES B.V.
Inventor: Johannes Cornelis Scholtes
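The extract, model, and match steps named in this abstract can be illustrated with a toy sketch. This is not the patented method: the bag-of-words features, the per-category centroid model, the cosine match, and the names `features`, `train`, and `classify` are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def features(text):
    """Normalize text to lowercase tokens and count them (assumed feature vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(labeled_docs):
    """Build one summed feature vector per category from (text, label) pairs."""
    model = {}
    for text, label in labeled_docs:
        model.setdefault(label, Counter()).update(features(text))
    return model


def classify(model, text):
    """Return the category whose vector is most similar to the document."""
    vec = features(text)
    return max(model, key=lambda label: cosine(vec, model[label]))
```

A call such as `classify(train(pairs), "some unclassified text")` returns one of the labels seen during training; a production system would use richer features and a trained classifier.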
-
Patent number: 9477750
Abstract: A system, method and computer program product for validating a document classification process, including a document collection; a document classification process performed on the document collection; a random selection module configured to automatically generate a random validation set of documents from the document collection; and a document review process performed on the random validation set of documents to validate results of the document classification process. The system, method and computer program product are configured to dynamically and in real-time measure and display on a computer display device a best case estimate of a quality of the results of the document classification process based on the documents that are validated, and given a size of a total data set of the document collection.
Type: Grant
Filed: October 26, 2015
Date of Patent: October 25, 2016
Assignee: MSC INTELLECTUAL PROPERTIES B.V.
Inventors: Johannes Cornelis Scholtes, Yuriy Pasichnyk
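As a rough illustration of the random validation set described above, the sketch below draws a sample without replacement and computes an optimistic quality figure. The abstract does not define how its best case estimate is calculated, so the formula here (treat every document not yet found wrong as correct) is an assumption, as are the function names.

```python
import random


def draw_validation_set(doc_ids, sample_size, seed=None):
    """Pick sample_size distinct documents uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), sample_size)


def best_case_accuracy(total_docs, errors_found):
    """Optimistic bound (assumed definition): only the errors reviewers
    have found so far count against the collection of total_docs."""
    return (total_docs - errors_found) / total_docs
```

Recomputing `best_case_accuracy` after each reviewed document gives a figure that can only fall as review proceeds, which is one way to read "dynamically and in real-time".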
-
Patent number: 9264387
Abstract: A system, method and computer program product for authorship determination, and alias resolution, including a document collection; a Jaro-Winkler similarity module configured for performing authorship determination and alias resolution based on at least one of email addresses, user identification numbers (IDs) on social networks, names written in text, and proper names, including countries and cities in the document collection; an authorship Support Vector Machine (SVM) module configured for performing authorship determination and alias resolution based on content of documents in the document collection, including at least one of emails, and social networks information; and a Jaccard similarity module configured for performing authorship determination and alias resolution based on link networks in the document collection.
Type: Grant
Filed: February 6, 2013
Date of Patent: February 16, 2016
Assignee: MSC INTELLECTUAL PROPERTIES B.V.
Inventors: Johannes Cornelis Scholtes, Freek Peter Elisabeth Maes
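Of the three modules named in this abstract, the Jaccard similarity is the simplest to show. The sketch below computes it over two sets of contacts, on the assumption that a "link network" can be represented as the set of addresses each alias corresponds with; that representation and the function name are illustrative, not taken from the patent.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)
```

Two aliases whose contact sets overlap heavily score close to 1 and are candidates for being the same author; the decision threshold would have to be tuned on real data.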
-
Patent number: 9235812
Abstract: A system, method and computer program product for automatic document classification, including an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information; a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classificatio
Type: Grant
Filed: December 4, 2012
Date of Patent: January 12, 2016
Assignee: MSC INTELLECTUAL PROPERTIES B.V.
Inventor: Johannes Cornelis Scholtes
-
Patent number: 9171072
Abstract: A system, method and computer program product for validating a document classification process, including a document collection; a document classification process performed on the document collection; a random selection module configured to automatically generate a random validation set of documents from the document collection; and a document review process performed on the random validation set of documents to validate results of the document classification process. The system, method and computer program product are configured to dynamically and in real-time measure and display on a computer display device a best case estimate of a quality of the results of the document classification process based on the documents that are validated, and given a size of a total data set of the document collection.
Type: Grant
Filed: March 13, 2013
Date of Patent: October 27, 2015
Assignee: MSC INTELLECTUAL PROPERTIES B.V.
Inventors: Johannes Cornelis Scholtes, Yuriy Pasichnyk
-
Patent number: 9135252
Abstract: A system, method and computer program product for identifying near and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M near and exact-duplicate documents are identified in the document collection.
Type: Grant
Filed: August 5, 2013
Date of Patent: September 15, 2015
Assignee: MSC INTELLECTUAL PROPERTIES B.V.
Inventors: Johannes C. Scholtes, Siebe Bloembergen
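The steps listed in this abstract (filter, take the N most frequent words, run a quorum search with threshold M, sort by relevancy) can be sketched in a few lines. This is a simplified reading, not the patented implementation: the stopword set stands in for "user settings", relevancy is assumed to be the number of matched query words, and all function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_words(text, n, stopwords=frozenset()):
    """Return the n most frequent words after filtering out stopwords."""
    counts = Counter(w for w in tokenize(text) if w not in stopwords)
    return [word for word, _ in counts.most_common(n)]


def quorum_search(query_words, docs, m):
    """Return (doc_id, matched) for documents containing at least m query words,
    sorted by number of matched words, highest first."""
    hits = []
    for doc_id, text in docs.items():
        vocab = set(tokenize(text))
        matched = sum(1 for w in query_words if w in vocab)
        if matched >= m:
            hits.append((doc_id, matched))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


def find_duplicates(docs, n, m, stopwords=frozenset()):
    """For each document, list the other documents that pass the quorum search."""
    result = {}
    for doc_id, text in docs.items():
        query = top_words(text, n, stopwords)
        result[doc_id] = [h for h in quorum_search(query, docs, m) if h[0] != doc_id]
    return result
```

In this sketch, setting M equal to N demands that every one of the N frequent words be present, which narrows matches toward exact duplicates, while a lower M admits near-duplicates. A real system would run the quorum search against a full-text index rather than rescanning every document.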
-
Publication number: 20140280173
Abstract: A system, method and computer program product for validating a document classification process, including a document collection; a document classification process performed on the document collection; a random selection module configured to automatically generate a random validation set of documents from the document collection; and a document review process performed on the random validation set of documents to validate results of the document classification process. The system, method and computer program product are configured to dynamically and in real-time measure and display on a computer display device a best case estimate of a quality of the results of the document classification process based on the documents that are validated, and given a size of a total data set of the document collection.
Type: Application
Filed: March 13, 2013
Publication date: September 18, 2014
Applicant: MSC INTELLECTUAL PROPERTIES B.V.
Inventors: Johannes Cornelis Scholtes, Yuriy Pasichnyk
-
Publication number: 20140222928
Abstract: A system, method and computer program product for authorship determination, and alias resolution, including a document collection; a Jaro-Winkler similarity module configured for performing authorship determination and alias resolution based on at least one of email addresses, user identification numbers (IDs) on social networks, names written in text, and proper names, including countries and cities in the document collection; an authorship Support Vector Machine (SVM) module configured for performing authorship determination and alias resolution based on content of documents in the document collection, including at least one of emails, and social networks information; and a Jaccard similarity module configured for performing authorship determination and alias resolution based on link networks in the document collection.
Type: Application
Filed: February 6, 2013
Publication date: August 7, 2014
Applicant: MSC INTELLECTUAL PROPERTIES B.V.
Inventors: Johannes Cornelis Scholtes, Freek Peter Elisabeth Maes
-
Publication number: 20140156567
Abstract: A system, method and computer program product for automatic document classification, including an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information; a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classificatio
Type: Application
Filed: December 4, 2012
Publication date: June 5, 2014
Applicant: MSC INTELLECTUAL PROPERTIES B.V.
Inventor: Johannes Cornelis Scholtes
-
Publication number: 20130318054
Abstract: A system, method and computer program product for identifying near and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M near and exact-duplicate documents are identified in the document collection.
Type: Application
Filed: August 5, 2013
Publication date: November 28, 2013
Applicant: MSC INTELLECTUAL PROPERTIES B.V.
Inventors: Johannes C. Scholtes, Siebe Bloembergen
-
Patent number: 8504578
Abstract: A system, method and computer program product for identifying near and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M near and exact-duplicate documents are identified in the document collection.
Type: Grant
Filed: August 16, 2012
Date of Patent: August 6, 2013
Assignee: MSC Intellectual Properties B.V.
Inventors: Johannes C. Scholtes, Siebe Bloembergen
-
Publication number: 20120317126
Abstract: A system, method and computer program product for identifying near and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M near and exact-duplicate documents are identified in the document collection.
Type: Application
Filed: August 16, 2012
Publication date: December 13, 2012
Applicant: MSC INTELLECTUAL PROPERTIES B.V.
Inventors: Johannes C. Scholtes, Siebe Bloembergen
-
Patent number: 8250079
Abstract: A system, method and computer program product for identifying near and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M near and exact-duplicate documents are identified in the document collection.
Type: Grant
Filed: March 30, 2011
Date of Patent: August 21, 2012
Assignee: MSC Intellectual Properties B.V.
Inventors: Johannes C. Scholtes, Siebe Bloembergen
-
Publication number: 20110191354
Abstract: A system, method and computer program product for identifying near and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M near and exact-duplicate documents are identified in the document collection.
Type: Application
Filed: March 30, 2011
Publication date: August 4, 2011
Applicant: MSC INTELLECTUAL PROPERTIES B.V.
Inventors: Johannes C. Scholtes, Siebe Bloembergen
-
Patent number: 7930306
Abstract: A system, method and computer program product for identifying near and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M near and exact-duplicate documents are identified in the document collection.
Type: Grant
Filed: April 30, 2008
Date of Patent: April 19, 2011
Assignee: MSC Intellectual Properties B.V.
Inventors: Johannes C. Scholtes, Siebe Bloembergen