Patents by Inventor Robert Charles Paulsen, Jr.

Robert Charles Paulsen, Jr. has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 6704698
    Abstract: A technique for identifying a language in which a computer document is written. Words from the document are compared to words in a plurality of word tables. Each of the word tables is associated with a respective candidate language and contains a selection of the most frequently used words in the language. The words in each word table are selected based on the frequency of occurrence in a candidate language so that each word table covers an equivalent percentage of the associated candidate language. A count is accumulated for each candidate language each time one of the plurality of words from the document is present in the associated word table. In the simple counting embodiment of the invention, the count is incremented by one. The language of the document is identified as the language associated with the count having the highest value.
    Type: Grant
    Filed: August 19, 1996
    Date of Patent: March 9, 2004
    Assignee: International Business Machines Corporation
    Inventors: Robert Charles Paulsen, Jr., Michael John Martino
  • Patent number: 6216102
    Abstract: Comparing the short and truncated words of a document to word tables of most frequently used words in each of the respective candidate languages to identify the language in which the document is written. First, a plurality of words from a document is read into a computer memory. Then, words within the plurality of words which exceed a predetermined length are truncated to produce a set of short and truncated words. The set of short and truncated words is compared to words in a plurality of word tables. Each word table is associated with and contains a selection of most frequently used words in a respective candidate language. Although the most frequently used words in most languages tend to be short, those which exceed the predetermined length may be truncated in the word tables. A respective count is accumulated for each candidate language each time one of the set of short and truncated words from the document matches a word in a word table associated with the candidate language.
    Type: Grant
    Filed: September 30, 1996
    Date of Patent: April 10, 2001
    Assignee: International Business Machines Corporation
    Inventors: Michael John Martino, Robert Charles Paulsen, Jr.
  • Patent number: 6078917
    Abstract: A method of retrieving documents from a document database is disclosed. A set of documents is retrieved according to a first search statement. A signature is developed for a first retrieved document, and preferably other documents, by searching for words in the first document and removing common words which occur in a relatively high frequency in a natural language in which the first document is written. The document for which the signature was developed is displayed. Responsive to a user indication that a second search is to be made, a second search statement is derived from the signature of the document. In the preferred embodiment, a "spectrum" of documents is prepared and presented to the user. The signatures of a plurality of documents from the documents retrieved according to the first search statement are developed by searching for words in the documents and removing common words which occur in a relatively high frequency in a natural language in which the documents are written.
    Type: Grant
    Filed: December 18, 1997
    Date of Patent: June 20, 2000
    Assignee: International Business Machines Corporation
    Inventors: Robert Charles Paulsen, Jr., Michael John Martino
  • Patent number: 6061646
    Abstract: The method for providing information in response to a question in one of a plurality of natural spoken languages begins by recognizing a detected utterance with a speech recognition engine equipped with a plurality of small dictionaries. Each of the small dictionaries is for a respective one of the plurality of languages. Each small dictionary includes speech data for a selected few common words in the respective language. Next, the method selects one of the plurality of languages as the language of the detected utterance based on a number of recognized words for each language from the small dictionaries. Next, a more thorough recognition of the detected utterance is performed using a large dictionary for the language of the detected utterance which contains information on a much larger vocabulary. Finally, the method responds to the user in the selected language, i.e. the language of the detected utterance, either aurally or visually. Once the language of a first utterance is identified, a timer is started.
    Type: Grant
    Filed: December 18, 1997
    Date of Patent: May 9, 2000
    Assignee: International Business Machines Corp.
    Inventors: Michael John Martino, Robert Charles Paulsen, Jr.
  • Patent number: 6023670
    Abstract: The language in which a computer document is written is identified. A plurality of words from the document are compared to words in a word list associated with a candidate language. The words in the word list are a selection of the most frequently used words in the candidate language. A count of matches between words in the document and words in the word list is accumulated for each word in the word list to produce a sample count. The sample count is correlated to a reference count for the candidate language to produce a correlation score for the candidate language. The language of the document is identified based on the correlation score. Generally, there are a plurality of candidate languages. Thus, the comparing, accumulating, correlating and identifying processes are practiced for each language. The language of the document is identified as the candidate language having a reference count which generates the highest correlation score.
    Type: Grant
    Filed: December 20, 1996
    Date of Patent: February 8, 2000
    Assignee: International Business Machines Corporation
    Inventors: Michael John Martino, Robert Charles Paulsen, Jr.
  • Patent number: 6009382
    Abstract: A language in which a document is written is identified through the use of sets of most frequently used words in each of a plurality of candidate languages. Each set of most frequently used words is stored in a respective set of word tables for a respective candidate language according to letter pairs in each set of most frequently used words. In the preferred embodiment, each word table is an N×N bit table, where each bit represents a given letter pair at a particular place in one of the most frequently used words in one of the candidate languages. Words from the document are compared to the most frequently used words stored in the word tables. A count of the number of matches between the words from the document and the words stored in each respective set of word tables is kept for each respective language. The language of the document is identified as the respective candidate language having the greatest number of matches.
    Type: Grant
    Filed: September 30, 1996
    Date of Patent: December 28, 1999
    Assignee: International Business Machines Corporation
    Inventors: Michael John Martino, Robert Charles Paulsen, Jr.
  • Patent number: 6002998
    Abstract: A language in which a document is written is identified by comparing the words of a document to the most frequently used words in a plurality of candidate languages. The words are stored in a plurality of sets of word tables, each set of word tables for storing a selected set of most frequently used words in a respective candidate language according to letter pairs in the words. In the preferred embodiment, each of the word tables is an N×N bit table, where each bit represents a given letter pair at a particular place in one of the most frequently used words in a respective candidate language. A set of table access registers is used for accessing a respective set of word tables to compare words from the document to words stored in the word tables; each table access register accesses word tables for a respective candidate language. One or more word counting registers count a number of matches for a respective candidate language.
    Type: Grant
    Filed: September 30, 1996
    Date of Patent: December 14, 1999
    Assignee: International Business Machines Corporation
    Inventors: Michael John Martino, Robert Charles Paulsen, Jr.
  • Patent number: 5913185
    Abstract: Language shift points in a computer document written in a plurality of natural languages are determined. An interval is defined on and moved through a text document in a computer memory; the interval contains a portion of the text in the document. As the interval is moved through the document, for each position of the interval, a probability that the text in the interval is written in each of a plurality of candidate languages is determined for the position. For the first position of the interval, generally the beginning of the document, a first candidate language is classified as the current language if it has the highest probability of all the candidate languages within the interval. A language shift point in the document is identified where the relative probability of a second candidate language is higher than that of the current language at a new position of the interval. At this point, the second candidate language is classified as the current language in the document after the language shift point.
    Type: Grant
    Filed: December 20, 1996
    Date of Patent: June 15, 1999
    Assignee: International Business Machines Corporation
    Inventors: Michael John Martino, Robert Charles Paulsen, Jr.