Patents by Inventor Robert Boman

Robert Boman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for detecting generic items in image sequence

Patent number: 9064161

Abstract: A system and method for detecting the presence of known or unknown objects based on visual features is disclosed. In the preferred embodiment, the system is a checkout system for detecting items of merchandise on a shopping cart. The merchandise checkout system preferably includes a feature extractor for extracting visual features from a plurality of images; a motion detector configured to detect one or more groups of the visual features present in at least two of the plurality of images; a classifier to classify each of said groups of the visual features based on one or more parameters, wherein each of the one or more parameters is associated with one of said groups of visual features; and an alarm configured to generate an alert if the one or more parameters for any of said groups of the visual features satisfy one or more classification criteria.

Type: Grant

Filed: June 8, 2007

Date of Patent: June 23, 2015

Assignee: Datalogic ADC, Inc.

Inventors: Robert Boman, Luis Goncalves, James Ostrowski
Systems and methods for object recognition using a large database

Publication number: 20110286628

Abstract: A method of organizing a set of recognition models of known objects stored in a database of an object recognition system includes determining a classification model for each known object and grouping the classification models into multiple classification model groups. Each classification model group identifies a portion of the database that contains the recognition models of the known objects having classification models that are members of the classification model group. The method also includes computing a representative classification model for each classification model group. Each representative classification model is derived from the classification models that are members of the classification model group. When a target object is to be recognized, the representative classification models are compared to a classification model of the target object to enable selection of a subset of the recognition models of the known objects for comparison to a recognition model of the target object.

Type: Application

Filed: May 13, 2011

Publication date: November 24, 2011

Inventors: Luis F. Goncalves, Jim Ostrowski, Robert Boman
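
The coarse-to-fine lookup described in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each classification model is a numeric feature vector, that the groups are already given, that the representative model of a group is the mean of its members, and that similarity is Euclidean distance. None of these choices is taken from the application, and all names and data are hypothetical.

```python
import math

def representative(models):
    """Mean vector of a group's classification models (an assumed choice)."""
    n = len(models)
    return [sum(column) / n for column in zip(*models)]

def select_candidates(groups, target, top_k=1):
    """Return object ids from the top_k groups whose representative
    classification model lies closest to the target's classification model."""
    ranked = sorted(
        groups,
        key=lambda name: math.dist(
            representative(list(groups[name].values())), target
        ),
    )
    chosen = []
    for name in ranked[:top_k]:
        chosen.extend(groups[name].keys())
    return chosen

# Hypothetical database: group name -> {object id: classification model}
groups = {
    "boxes": {"cereal": [1.0, 0.9], "crackers": [0.8, 1.1]},
    "cans": {"soup": [5.0, 5.2], "beans": [5.4, 4.8]},
}

# Only the recognition models of the returned ids would then be compared
# against the target object's recognition model.
candidates = select_candidates(groups, target=[5.1, 5.0])
```

In this sketch the target is compared against two representatives rather than four individual models, which is the point of grouping: the saving grows with the size of the database.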
Voice tagging, voice annotation, and speech recognition for portable devices with optional post processing

Patent number: 7324943

Abstract: A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.

Type: Grant

Filed: October 2, 2003

Date of Patent: January 29, 2008

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Luca Rigazio, Robert Boman, Patrick Nguyen, Jean-Claude Junqua
Enhanced automotive monitoring system using sound

Publication number: 20050240324

Abstract: The vehicular monitoring system obtains audio information from the speech of occupants within the vehicle and then processes that audio information to extract information about the behavior of the vehicle occupants. Using the behavioral information the system then assesses whether said behavior is in compliance with a predefined set of rules. Severe violations are reported to a third party via cellular telephone or other means. Less severe violations are recorded in a log file that is subsequently uploaded to a networked computer system for review and analysis. The behavioral rules used to assess violations may be modified by the administrative user.

Type: Application

Filed: April 26, 2004

Publication date: October 27, 2005

Inventors: Robert Boman, Roland Kuhn, Brian Hanson
Media production system using time alignment to scripts

Publication number: 20050228663

Abstract: A media production system includes a textual alignment module aligning multiple speech recordings to textual lines of a script based on speech recognition results. A navigation module responds to user navigation selections respective of the textual lines of the script by communicating to the user corresponding, line-specific portions of the multiple speech recordings. An editing module responds to user associations of multiple speech recordings with textual lines by accumulating line-specific portions of the multiple speech recordings in a combination recording based on at least one of relationships of textual lines in the script to the combination recording, and temporal alignments between the multiple speech recordings and the combination recording.

Type: Application

Filed: March 31, 2004

Publication date: October 13, 2005

Inventors: Robert Boman, Patrick Nguyen, Jean-Claude Junqua
Personal item monitor using radio frequency identification

Publication number: 20050148339

Abstract: A personal item monitoring system includes a monitor having a transmitter and a receiver located therein. At least one radio identification tag is adapted to be coupled to a personal item. Alternatively, the radio identification tag may be pre-installed into the personal item. The monitor emits a radio frequency received by the radio frequency identification tag, and the radio frequency identification tag emits a responding signal if within a detection range. The monitor then alerts a user if the radio identification tag leaves the range of detection.

Type: Application

Filed: January 6, 2004

Publication date: July 7, 2005

Inventors: Robert Boman, Brian Hanson
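
The monitoring behavior described in the abstract above (poll the tags, alert when one stops responding) can be sketched as follows. This is only an illustration under stated assumptions: the tag ids, item names, and function names are hypothetical, and the radio exchange is reduced to a set of tag ids that answered the most recent poll.

```python
def missing_tags(registered, responses):
    """Tag ids registered with the monitor that did not answer the last poll."""
    return sorted(set(registered) - set(responses))

def poll_and_alert(registered, responses):
    """Build one alert message per tag that is no longer in detection range."""
    return [
        f"ALERT: '{registered[tag]}' is out of range"
        for tag in missing_tags(registered, responses)
    ]

# Hypothetical registration table: tag id -> personal item
registered = {"tag-01": "wallet", "tag-02": "keys", "tag-03": "laptop bag"}

# Tag ids whose responding signal was received in the latest poll (assumed input)
responses = {"tag-01", "tag-03"}

alerts = poll_and_alert(registered, responses)
```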
Collaborative media indexing system and method

Publication number: 20050114357

Abstract: An indexing system for tagging a media stream is provided. The indexing system includes a plurality of inputs for defining at least one tag. A tagging system assigns the tag to the media stream. A tag analysis system selectively distributes tags for review and editing by members of the collaborative group. A tag database stores the tag and the media stream. Retrieval architecture can search the database using the tags.

Type: Application

Filed: November 20, 2003

Publication date: May 26, 2005

Inventors: Rathinavelu Chengalvarayan, Philippe Morin, Robert Boman, Ted Applebaum
Personalized agent for portable devices and cellular phone

Patent number: 6895257

Abstract: Personalized agent services are provided in a personal messaging device, such as a cellular telephone or personal digital assistant, through services of a speech recognizer that converts speech into text and a text-to-speech synthesizer that converts text to speech. Both recognizer and synthesizer may be server-based or locally deployed within the device. The user dictates an e-mail message which is converted to text and stored. The stored text is sent back to the user as text or as synthesized speech, to allow the user to edit the message and correct transcription errors before sending as e-mail. The system includes a summarization module that prepares short summaries of incoming e-mail and voice mail. The user may access these summaries, and retrieve and organize email and voice mail using speech commands.

Type: Grant

Filed: February 18, 2002

Date of Patent: May 17, 2005

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Robert Boman, Kirill Stoimenov, Roland Kuhn, Jean-Claude Junqua
Speech recognizer performance in car and home applications utilizing novel multiple microphone configurations

Patent number: 6889189

Abstract: System speakers are switched to function as sound input transducers to improve recognizer performance and to support recognizer features. A crossbar switch is selectively activated, either manually or under software control, to allow system loudspeakers to function as sound input transducers that supplement the recognition system microphone or microphone array. Using loudspeakers as “microphones” improves speech recognition in noisy environments, thus attaining better recognition performance with little added system cost. The loudspeakers, positioned in physically separate locations, also provide spatial information that can be used to determine the location of the person speaking and thereby offer different functionality for different persons. Acoustic models are selected based on environmental and vehicle operating conditions and may be adapted dynamically using ambient information obtained using the loudspeakers as sound input transducers.

Type: Grant

Filed: September 26, 2003

Date of Patent: May 3, 2005

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Robert Boman, Luca Rigazio, Brian Hanson, Rathinavelu Chengalvarayan
Voice tagging, voice annotation, and speech recognition for portable devices with optional post processing

Publication number: 20050075881

Abstract: A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.

Type: Application

Filed: October 2, 2003

Publication date: April 7, 2005

Inventors: Luca Rigazio, Robert Boman, Patrick Nguyen, Jean-Claude Junqua
Speech recognizer performance in car and home applications utilizing novel multiple microphone configurations

Publication number: 20050071159

Abstract: System speakers are switched to function as sound input transducers to improve recognizer performance and to support recognizer features. A crossbar switch is selectively activated, either manually or under software control, to allow system loudspeakers to function as sound input transducers that supplement the recognition system microphone or microphone array. Using loudspeakers as “microphones” improves speech recognition in noisy environments, thus attaining better recognition performance with little added system cost. The loudspeakers, positioned in physically separate locations, also provide spatial information that can be used to determine the location of the person speaking and thereby offer different functionality for different persons. Acoustic models are selected based on environmental and vehicle operating conditions and may be adapted dynamically using ambient information obtained using the loudspeakers as sound input transducers.

Type: Application

Filed: September 26, 2003

Publication date: March 31, 2005

Inventors: Robert Boman, Luca Rigazio, Brian Hanson, Rathinavelu Chengalvarayan
Speech data mining for call center management

Publication number: 20050010411

Abstract: A speech data mining system for use in generating a rich transcription having utility in call center management includes a speech differentiation module differentiating between speech of interacting speakers, and a speech recognition module improving automatic recognition of speech of one speaker based on interaction with another speaker employed as a reference speaker. A transcript generation module generates a rich transcript based on recognized speech of the speakers. Focused, interactive language models improve recognition of a customer on a low quality channel using context extracted from speech of a call center operator on a high quality channel with a speech model adapted to the operator. Mined speech data includes number of interaction turns, customer frustration phrases, operator politeness, interruptions, and/or contexts extracted from speech recognition results, such as topics, complaints, solutions, and resolutions.

Type: Application

Filed: July 9, 2003

Publication date: January 13, 2005

Inventors: Luca Rigazio, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
Speaker verification and speaker identification based on a priori knowledge

Patent number: 6697778

Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.

Type: Grant

Filed: July 5, 2000

Date of Patent: February 24, 2004

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Roland Kuhn, Olivier Thyes, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
Method for additive and convolutional noise adaptation in automatic speech recognition using transformed matrices

Patent number: 6691091

Abstract: A noise adaptation system and method provide for noise adaptation in a speech recognition system. The method includes the steps of generating a reference model based on a training speech signal, and compensating the reference model for additive noise in the cepstral domain. The reference model is also compensated for convolutional noise in the cepstral domain. In one embodiment, the convolutional noise is compensated for by estimating a convolutional bias between the reference model and a target speech signal. The estimated convolutional bias is transformed with a channel adaptation matrix, and the transformed convolutional bias is added to the reference model in the cepstral domain.

Type: Grant

Filed: July 31, 2000

Date of Patent: February 10, 2004

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Christophe Cerisara, Luca Rigazio, Robert Boman, Jean-Claude Junqua
Personalized agent for portable devices and cellular phone

Publication number: 20030157968

Abstract: Personalized agent services are provided in a personal messaging device, such as a cellular telephone or personal digital assistant, through services of a speech recognizer that converts speech into text and a text-to-speech synthesizer that converts text to speech. Both recognizer and synthesizer may be server-based or locally deployed within the device. The user dictates an e-mail message which is converted to text and stored. The stored text is sent back to the user as text or as synthesized speech, to allow the user to edit the message and correct transcription errors before sending as e-mail. The system includes a summarization module that prepares short summaries of incoming e-mail and voice mail. The user may access these summaries, and retrieve and organize email and voice mail using speech commands.

Type: Application

Filed: February 18, 2002

Publication date: August 21, 2003

Inventors: Robert Boman, Kirill Stoimenov, Roland Kuhn, Jean-Claude Junqua
Method for noise adaptation in automatic speech recognition using transformed matrices

Patent number: 6529872

Abstract: The improved noise adaptation technique employs a linear or non-linear transformation to the set of Jacobian matrices corresponding to an initial noise condition. An α-adaptation parameter or artificial intelligence operation is employed in a linear or non-linear way to increase the adaptation bias added to the speech models. This corrects shortcomings of conventional Jacobian adaptation, which tends to underestimate the effect of noise. The improved adaptation technique is further enhanced by a reduced dimensionality, principal component analysis technique that reduces the computational burden, making the adaptation technique beneficial in embedded recognition systems.

Type: Grant

Filed: April 18, 2000

Date of Patent: March 4, 2003

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Christophe Cerisara, Luca Rigazio, Robert Boman, Jean-Claude Junqua
Automatic search of audio channels by matching viewer-spoken words against closed-caption/audio content for interactive television

Patent number: 6480819

Abstract: A method and apparatus are provided to enable a user watching and/or listening to a program to search for new information in a stream of telecommunications data. The apparatus includes a voice recognition system that recognizes the user's request and causes a search to be performed in the long stream of data of at least one other telecommunication channel. The system includes a storage device for storing and processing the request. Upon recognition of the request, the incoming signal or signals are scanned for matches with the request. Upon finding a match between the request and the incoming signal, information related to the data is brought to the viewer's attention. This can be accomplished either by changing the viewer's station or by bringing a split-screen display forward into the display.

Type: Grant

Filed: February 25, 1999

Date of Patent: November 12, 2002

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Robert Boman, Jean-Claude Junqua
Speaker verification and speaker identification based on eigenvoices

Patent number: 6141644

Abstract: Speech models are constructed and trained upon the speech of known client speakers (and also impostor speakers, in the case of speaker verification). Parameters from these models are concatenated to define supervectors and a linear transformation upon these supervectors results in a dimensionality reduction yielding a low-dimensional space called eigenspace. The training speakers are then represented as points or distributions in eigenspace. Thereafter, new speech data from the test speaker is placed into eigenspace through a similar linear transformation and the proximity in eigenspace of the test speaker to the training speakers serves to authenticate or identify the test speaker.

Type: Grant

Filed: September 4, 1998

Date of Patent: October 31, 2000

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
Esophageal speech injection noise detection and rejection

Patent number: 5946649

Abstract: The present invention eliminates injection noise in speech produced by esophageal speakers. A speech input signal is digitized. One copy of the digitized signal is used for analysis and the other is passed through a gain switch to an amplifier as output. A Fast Fourier Transform (FFT) and a mean value of the digitized speech input signal are calculated. The FFT is passed through a morphological filter to produce a filtered spectrum. An occurrence of injection noise is detected by calculating a derivative of the filtered spectrum and determining from the mean value and the derivative a location and value of a largest peak and a second largest peak in the filtered spectrum. If the largest peak is lower in frequency than the second largest peak, and if all points above 2 kHz are less than the mean, then an occurrence of injection noise has been detected.

Type: Grant

Filed: April 16, 1997

Date of Patent: August 31, 1999

Assignee: Technology Research Association of Medical Welfare Apparatus

Inventors: Hector Raul Javkin, Michael Galler, Nancy Niedzielski, Robert Boman
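
The decision rule at the end of the abstract above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes the filtered spectrum is already available as a list of magnitudes at evenly spaced frequency bins, it locates peaks from sign changes in a first difference (standing in for the derivative), and it takes the mean value as a given number on the same scale as the spectrum. The FFT and morphological filtering stages are omitted, and all function and parameter names are hypothetical.

```python
def find_peaks(spectrum):
    """Indices where the first difference turns from positive to non-positive."""
    diff = [b - a for a, b in zip(spectrum, spectrum[1:])]
    return [
        i + 1
        for i in range(len(diff) - 1)
        if diff[i] > 0 and diff[i + 1] <= 0
    ]

def is_injection_noise(spectrum, bin_hz, mean_value, cutoff_hz=2000.0):
    """Apply the two conditions from the abstract to a filtered spectrum.

    spectrum   -- filtered magnitudes, bin i centered at i * bin_hz (assumed layout)
    mean_value -- mean used as the high-band threshold (assumed comparable scale)
    """
    peaks = sorted(find_peaks(spectrum), key=lambda i: spectrum[i], reverse=True)
    if len(peaks) < 2:
        return False  # the rule needs a largest and a second largest peak
    largest, second = peaks[0], peaks[1]
    high_band_quiet = all(
        value < mean_value
        for i, value in enumerate(spectrum)
        if i * bin_hz > cutoff_hz
    )
    # Largest peak lower in frequency than the second largest, and all
    # points above the cutoff below the mean.
    return largest < second and high_band_quiet

# Hypothetical filtered spectrum with 250 Hz bins (0 Hz to 2750 Hz)
noise_like = [1, 9, 2, 1, 5, 1, 1, 1, 1, 0.5, 0.4, 0.3]
speech_like = [1, 5, 2, 1, 9, 1, 1, 1, 1, 0.5, 0.4, 0.3]

noise_flag = is_injection_noise(noise_like, bin_hz=250.0, mean_value=2.0)
speech_flag = is_injection_noise(speech_like, bin_hz=250.0, mean_value=2.0)
```

In the first spectrum the dominant peak sits at 250 Hz, below the second peak at 1000 Hz, and the bins above 2 kHz stay under the mean, so the frame is flagged. In the second the peak order is reversed and the frame is not flagged.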