Patents Assigned to MSC Intellectual Properties B.V.
  • Patent number: 10565502
    Abstract: A system, method and computer program product for automatic document classification, including an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information; a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classification
    Type: Grant
    Filed: January 7, 2016
    Date of Patent: February 18, 2020
    Assignee: MSC INTELLECTUAL PROPERTIES B.V.
    Inventor: Johannes Cornelis Scholtes
  • Patent number: 9477750
    Abstract: A system, method and computer program product for validating a document classification process, including a document collection; a document classification process performed on the document collection; a random selection module configured to automatically generate a random validation set of documents from the document collection; and a document review process performed on the random validation set of documents to validate results of the document classification process. The system, method and computer program product are configured to dynamically and in real time measure and display on a computer display device a best case estimate of a quality of the results of the document classification process based on the documents that are validated, and given a size of a total data set of the document collection.
    Type: Grant
    Filed: October 26, 2015
    Date of Patent: October 25, 2016
    Assignee: MSC INTELLECTUAL PROPERTIES B.V.
    Inventors: Johannes Cornelis Scholtes, Yuriy Pasichnyk
  • Patent number: 9264387
    Abstract: A system, method and computer program product for authorship determination and alias resolution, including a document collection; a Jaro-Winkler similarity module configured for performing authorship determination and alias resolution based on at least one of email addresses, user identification numbers (IDs) on social networks, names written in text, and proper names, including countries and cities in the document collection; an authorship Support Vector Machine (SVM) module configured for performing authorship determination and alias resolution based on content of documents in the document collection, including at least one of emails and social network information; and a Jaccard similarity module configured for performing authorship determination and alias resolution based on link networks in the document collection.
    Type: Grant
    Filed: February 6, 2013
    Date of Patent: February 16, 2016
    Assignee: MSC INTELLECTUAL PROPERTIES B.V.
    Inventors: Johannes Cornelis Scholtes, Freek Peter Elisabeth Maes
  • Patent number: 9235812
    Abstract: A system, method and computer program product for automatic document classification, including an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information; a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classification
    Type: Grant
    Filed: December 4, 2012
    Date of Patent: January 12, 2016
    Assignee: MSC INTELLECTUAL PROPERTIES B.V.
    Inventor: Johannes Cornelis Scholtes
  • Patent number: 9171072
    Abstract: A system, method and computer program product for validating a document classification process, including a document collection; a document classification process performed on the document collection; a random selection module configured to automatically generate a random validation set of documents from the document collection; and a document review process performed on the random validation set of documents to validate results of the document classification process. The system, method and computer program product are configured to dynamically and in real time measure and display on a computer display device a best case estimate of a quality of the results of the document classification process based on the documents that are validated, and given a size of a total data set of the document collection.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: October 27, 2015
    Assignee: MSC INTELLECTUAL PROPERTIES B.V.
    Inventors: Johannes Cornelis Scholtes, Yuriy Pasichnyk
  • Patent number: 9135252
    Abstract: A system, method and computer program product for identifying near- and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M, near- and exact-duplicate documents are identified in the document collection.
    Type: Grant
    Filed: August 5, 2013
    Date of Patent: September 15, 2015
    Assignee: MSC INTELLECTUAL PROPERTIES B.V.
    Inventors: Johannes C. Scholtes, Siebe Bloembergen
  • Publication number: 20140280173
    Abstract: A system, method and computer program product for validating a document classification process, including a document collection; a document classification process performed on the document collection; a random selection module configured to automatically generate a random validation set of documents from the document collection; and a document review process performed on the random validation set of documents to validate results of the document classification process. The system, method and computer program product are configured to dynamically and in real time measure and display on a computer display device a best case estimate of a quality of the results of the document classification process based on the documents that are validated, and given a size of a total data set of the document collection.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Applicant: MSC INTELLECTUAL PROPERTIES B.V.
    Inventors: Johannes Cornelis Scholtes, Yuriy Pasichnyk
  • Publication number: 20140222928
    Abstract: A system, method and computer program product for authorship determination and alias resolution, including a document collection; a Jaro-Winkler similarity module configured for performing authorship determination and alias resolution based on at least one of email addresses, user identification numbers (IDs) on social networks, names written in text, and proper names, including countries and cities in the document collection; an authorship Support Vector Machine (SVM) module configured for performing authorship determination and alias resolution based on content of documents in the document collection, including at least one of emails and social network information; and a Jaccard similarity module configured for performing authorship determination and alias resolution based on link networks in the document collection.
    Type: Application
    Filed: February 6, 2013
    Publication date: August 7, 2014
    Applicant: MSC INTELLECTUAL PROPERTIES B.V.
    Inventors: Johannes Cornelis Scholtes, Freek Peter Elisabeth Maes
  • Publication number: 20140156567
    Abstract: A system, method and computer program product for automatic document classification, including an extraction module configured to extract structural, syntactical and/or semantic information from a document and normalize the extracted information; a machine learning module configured to generate a model representation for automatic document classification based on feature vectors built from the normalized and extracted semantic information for supervised and/or unsupervised clustering or machine learning; and a classification module configured to select a non-classified document from a document collection, and via the extraction module extract normalized structural, syntactical and/or semantic information from the selected document, and generate via the machine learning module a model representation of the selected document based on feature vectors, and match the model representation of the selected document against the machine learning model representation to generate a document category, and/or classification
    Type: Application
    Filed: December 4, 2012
    Publication date: June 5, 2014
    Applicant: MSC INTELLECTUAL PROPERTIES B.V.
    Inventor: Johannes Cornelis Scholtes
  • Publication number: 20130318054
    Abstract: A system, method and computer program product for identifying near- and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M, near- and exact-duplicate documents are identified in the document collection.
    Type: Application
    Filed: August 5, 2013
    Publication date: November 28, 2013
    Applicant: MSC INTELLECTUAL PROPERTIES B.V.
    Inventors: Johannes C. Scholtes, Siebe Bloembergen
  • Patent number: 8504578
    Abstract: A system, method and computer program product for identifying near- and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M, near- and exact-duplicate documents are identified in the document collection.
    Type: Grant
    Filed: August 16, 2012
    Date of Patent: August 6, 2013
    Assignee: MSC Intellectual Properties B.V.
    Inventors: Johannes C. Scholtes, Siebe Bloembergen
  • Publication number: 20120317126
    Abstract: A system, method and computer program product for identifying near- and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M, near- and exact-duplicate documents are identified in the document collection.
    Type: Application
    Filed: August 16, 2012
    Publication date: December 13, 2012
    Applicant: MSC INTELLECTUAL PROPERTIES B.V.
    Inventors: Johannes C. Scholtes, Siebe Bloembergen
  • Patent number: 8250079
    Abstract: A system, method and computer program product for identifying near- and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M, near- and exact-duplicate documents are identified in the document collection.
    Type: Grant
    Filed: March 30, 2011
    Date of Patent: August 21, 2012
    Assignee: MSC Intellectual Properties B.V.
    Inventors: Johannes C. Scholtes, Siebe Bloembergen
  • Publication number: 20110191354
    Abstract: A system, method and computer program product for identifying near- and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M, near- and exact-duplicate documents are identified in the document collection.
    Type: Application
    Filed: March 30, 2011
    Publication date: August 4, 2011
    Applicant: MSC INTELLECTUAL PROPERTIES B.V.
    Inventors: Johannes C. Scholtes, Siebe Bloembergen
  • Patent number: 7930306
    Abstract: A system, method and computer program product for identifying near- and exact-duplicate documents in a document collection, including for each document in the collection, reading textual content from the document; filtering the textual content based on user settings; determining N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting results from the quorum search based on relevancy. Based on the values of N and M, near- and exact-duplicate documents are identified in the document collection.
    Type: Grant
    Filed: April 30, 2008
    Date of Patent: April 19, 2011
    Assignee: MSC Intellectual Properties B.V.
    Inventors: Johannes C. Scholtes, Siebe Bloembergen
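The duplicate-detection steps in the abstract of patent 7930306 (shared by the related entries listed above) can be illustrated with a short Python sketch. It is not the patented implementation. The tokenizer, the stopword set standing in for the "user settings" filter, and the use of the matched-word count as the relevancy score are all assumptions, and the helper names (tokens, top_n_words, quorum_search, find_duplicates) are hypothetical.

```python
import re
from collections import Counter


def tokens(text, stopwords=frozenset()):
    # Assumed "user settings" filter: lowercase, keep alphanumeric runs,
    # and drop words in a caller-supplied stopword set.
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stopwords]


def top_n_words(text, n, stopwords=frozenset()):
    # The N most frequent words of one document (ties keep first-seen order).
    return [w for w, _ in Counter(tokens(text, stopwords)).most_common(n)]


def quorum_search(query_words, docs, m, stopwords=frozenset()):
    """Return (doc_id, hits) for every doc containing at least m query words."""
    results = []
    for doc_id, text in docs.items():
        vocab = set(tokens(text, stopwords))
        hits = sum(1 for w in query_words if w in vocab)
        if hits >= m:
            results.append((doc_id, hits))
    # Relevancy is approximated by the hit count; ties are broken by doc id.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


def find_duplicates(docs, n, m, stopwords=frozenset()):
    """Map each doc id to the other doc ids that pass the N/M quorum test."""
    groups = {}
    for doc_id, text in docs.items():
        query = top_n_words(text, n, stopwords)
        matches = quorum_search(query, docs, m, stopwords)
        groups[doc_id] = [d for d, _ in matches if d != doc_id]
    return groups


docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumped over the lazy dog",
    "c": "completely unrelated text about patents",
}
print(find_duplicates(docs, n=5, m=4, stopwords={"the", "over"}))
# {'a': ['b'], 'b': ['a'], 'c': []}
```

In this sketch, raising M toward N restricts matches to documents that share almost all of the query words, while lowering M admits looser near-duplicates.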
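Patent 9264387 (publication 20140222928) names two similarity measures for alias resolution: Jaro-Winkler over names, email addresses and IDs, and Jaccard over link networks. The sketch below implements the common textbook forms of both (Winkler prefix scale 0.1, prefix capped at four characters, boost applied unconditionally) as an illustration only. The SVM content module is omitted, and the example strings and contact sets are invented.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in s2.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, scale=0.1):
    """Boost the Jaro score by the length of the common prefix (max 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * scale * (1 - j)


def jaccard(a, b):
    """Size of the intersection over size of the union of two contact sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
print(jaccard({"ann", "bob", "cy"}, {"bob", "cy", "dee"}))  # 0.5
```

A pipeline in the spirit of the abstract might flag two aliases as one author when either score passes a threshold; any such threshold would need tuning and is not given in the source.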
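The validation process in patents 9171072 and 9477750 draws a random validation set and updates a quality estimate as reviewers confirm or reject classification results. The abstract does not state the estimator, so this sketch assumes the estimate is the observed agreement rate plus a normal-approximation margin with a finite population correction based on the collection size. It is not the patented "best case" formula, and draw_validation_set and running_estimate are hypothetical names.

```python
import math
import random


def draw_validation_set(doc_ids, k, seed=None):
    """Simple random sample of k documents, without replacement."""
    return random.Random(seed).sample(list(doc_ids), k)


def running_estimate(review_outcomes, population_size, z=1.96):
    """Yield (n_reviewed, observed_quality, margin) after each reviewed doc.

    review_outcomes: iterable of bools, True when the reviewer agrees with
    the classification. The margin is an assumed estimator: a normal-
    approximation half-width scaled by a finite population correction.
    """
    agreed = 0
    for n, ok in enumerate(review_outcomes, start=1):
        agreed += bool(ok)
        p = agreed / n
        if population_size > 1:
            fpc = math.sqrt(max(population_size - n, 0) / (population_size - 1))
        else:
            fpc = 0.0
        margin = z * math.sqrt(p * (1 - p) / n) * fpc
        yield n, p, margin


sample = draw_validation_set(range(10_000), k=5, seed=42)
outcomes = [True, True, False, True, True]  # invented decisions for the 5 sampled docs
for n, p, margin in running_estimate(outcomes, population_size=10_000):
    print(f"{n} reviewed: {p:.2f} +/- {margin:.2f}")
```

Because running_estimate is a generator, a display could refresh after every reviewed document, which loosely mirrors the "dynamically and in real time" wording; the correction term shrinks the margin to zero once the whole collection has been reviewed.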
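For the classification pipeline described in patents 9235812 and 10565502 (extract and normalize, build feature vectors, match a new document against a learned model), a minimal sketch is a bag-of-words feature vector matched against per-category centroids by cosine similarity. This is a deliberately simple, swapped-in technique: the patents' structural, syntactical and semantic extraction and their choice of learner are not reproduced, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text):
    # Stand-in for extraction + normalization: lowercase word counts only.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labelled_docs):
    """Sum feature vectors per category; cosine ignores scale, so no averaging is needed."""
    sums = defaultdict(Counter)
    for text, label in labelled_docs:
        sums[label].update(extract_features(text))
    return dict(sums)


def classify(text, centroids):
    """Return the category whose centroid is most similar to the document."""
    vec = extract_features(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


training = [
    ("invoice payment due amount", "finance"),
    ("invoice total amount paid", "finance"),
    ("court ruling appeal judge", "legal"),
    ("judge hearing court case", "legal"),
]
centroids = train_centroids(training)
print(classify("the judge issued a court ruling", centroids))  # legal
```

Nearest-centroid matching is used here only because it makes the "match the model representation" step explicit in a few lines; it is one supervised option among many and is not claimed to be the method in the patents.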