Patents by Inventor Kenneth Heafield

Kenneth Heafield has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8713034
    Abstract: The present invention provides systems and methods for identifying similar documents. In an embodiment, the present invention identifies similar documents by (1) receiving document text for a current document that includes at least one word; (2) calculating a prominence score and a descriptiveness score for each word and each pair of consecutive words; (3) calculating a comparison metric for the current document; (4) finding at least one potential document, where document text for each potential document includes at least one of the words; and (5) analyzing each potential document to identify at least one similar document.
    Type: Grant
    Filed: June 3, 2011
    Date of Patent: April 29, 2014
    Assignee: Google Inc.
    Inventors: Taylor Curtis, Kenneth Heafield
  • Patent number: 8209665
    Abstract: Topics in source code can be identified using Latent Dirichlet Allocation (LDA) by receiving source code, identifying domain specific keywords from the source code, generating a keyword matrix, processing the keyword matrix and the source code using LDA, and outputting a list of topics. The list of topics is output as collections of domain specific keywords. Probabilities of domain specific keywords belonging to their respective topics can also be output. The keyword matrix comprises weighted sums of occurrences of domain specific keywords in the source code.
    Type: Grant
    Filed: September 17, 2008
    Date of Patent: June 26, 2012
    Assignee: Infosys Limited
    Inventors: Girish Maskeri Rama, Kenneth Heafield, Santonu Sarkar
  • Patent number: 7958136
    Abstract: The present invention provides systems and methods for identifying similar documents. In an embodiment, the present invention identifies similar documents by (1) receiving document text for a current document that includes at least one word; (2) calculating a prominence score and a descriptiveness score for each word and each pair of consecutive words; (3) calculating a comparison metric for the current document; (4) finding at least one potential document, where document text for each potential document includes at least one of the words; and (5) analyzing each potential document to identify at least one similar document.
    Type: Grant
    Filed: March 18, 2008
    Date of Patent: June 7, 2011
    Assignee: Google Inc.
    Inventors: Taylor Curtis, Kenneth Heafield
  • Publication number: 20090254884
    Abstract: Topics in source code can be identified using Latent Dirichlet Allocation (LDA) by receiving source code, identifying domain specific keywords from the source code, generating a keyword matrix, processing the keyword matrix and the source code using LDA, and outputting a list of topics. The list of topics is output as collections of domain specific keywords. Probabilities of domain specific keywords belonging to their respective topics can also be output. The keyword matrix comprises weighted sums of occurrences of domain specific keywords in the source code.
    Type: Application
    Filed: September 17, 2008
    Publication date: October 8, 2009
    Applicant: Infosys Technologies Ltd.
    Inventors: Girish Maskeri Rama, Kenneth Heafield, Santonu Sarkar
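
The five steps in the similar-document abstracts (patents 8713034 and 7958136) can be illustrated with a minimal sketch. The abstracts do not define the prominence score, the descriptiveness score, or the comparison metric, so everything below is an assumption made for illustration: prominence is taken as a term's relative frequency in the document, descriptiveness as an IDF-style rarity weight, and the comparison metric as cosine similarity. All function names and the threshold parameter are hypothetical and are not taken from the patents.

```python
import math
from collections import Counter


def terms(text):
    """Return every word and every pair of consecutive words, lowercased."""
    words = text.lower().split()
    pairs = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + pairs


def score_terms(text, doc_freq, n_docs):
    """Score each term as prominence * descriptiveness (both assumed definitions)."""
    counts = Counter(terms(text))
    total = sum(counts.values())
    scores = {}
    for term, count in counts.items():
        prominence = count / total  # assumption: relative frequency in this document
        # assumption: smoothed IDF-style weight, higher for rarer terms
        descriptiveness = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0
        scores[term] = prominence * descriptiveness
    return scores


def cosine(a, b):
    """Cosine similarity between two sparse term-score dictionaries."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_similar(current, corpus, threshold=0.2):
    """Return (doc_id, similarity) pairs for documents resembling `current`."""
    # Count how many corpus documents contain each term.
    doc_freq = Counter()
    for text in corpus.values():
        doc_freq.update(set(terms(text)))
    n_docs = len(corpus)

    current_vec = score_terms(current, doc_freq, n_docs)
    current_words = set(current.lower().split())

    results = []
    for doc_id, text in corpus.items():
        # A potential document must share at least one word with the current one.
        if not current_words & set(text.lower().split()):
            continue
        similarity = cosine(current_vec, score_terms(text, doc_freq, n_docs))
        if similarity >= threshold:
            results.append((doc_id, similarity))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    corpus = {
        "a": "the cat sat on the mat",
        "b": "stock prices fell sharply",
        "c": "the cat sat on the sofa",
    }
    for doc_id, similarity in find_similar("the cat sat on the mat", corpus):
        print(doc_id, round(similarity, 3))
```

Filtering on shared words before scoring keeps the expensive comparison limited to plausible candidates, which mirrors the order of steps (4) and (5) in the abstract.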