Patents by Inventor Robert Boman
Robert Boman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9064161
Abstract: A system and method for detecting the presence of known or unknown objects based on visual features is disclosed. In the preferred embodiment, the system is a checkout system for detecting items of merchandise on a shopping cart. The merchandise checkout system preferably includes a feature extractor for extracting visual features from a plurality of images; a motion detector configured to detect one or more groups of the visual features present in at least two of the plurality of images; a classifier to classify each of said groups of the visual features based on one or more classification criteria, wherein each of the one or more parameters is associated with one of said groups of visual features; and an alarm configured to generate an alert if the one or more parameters for any of said groups of the visual features satisfy one or more classification criteria.
Type: Grant
Filed: June 8, 2007
Date of Patent: June 23, 2015
Assignee: Datalogic ADC, Inc.
Inventors: Robert Boman, Luis Goncalves, James Ostrowski
-
Publication number: 20110286628
Abstract: A method of organizing a set of recognition models of known objects stored in a database of an object recognition system includes determining a classification model for each known object and grouping the classification models into multiple classification model groups. Each classification model group identifies a portion of the database that contains the recognition models of the known objects having classification models that are members of the classification model group. The method also includes computing a representative classification model for each classification model group. Each representative classification model is derived from the classification models that are members of the classification model group. When a target object is to be recognized, the representative classification models are compared to a classification model of the target object to enable selection of a subset of the recognition models of the known objects for comparison to a recognition model of the target object.
Type: Application
Filed: May 13, 2011
Publication date: November 24, 2011
Inventors: Luis F. Goncalves, Jim Ostrowski, Robert Boman
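The grouping-and-selection idea in the abstract above can be illustrated with a minimal sketch. It is not the method claimed in the application: it assumes each classification model is a plain feature vector, that a group's representative is the mean of its members, and that groups are compared by squared Euclidean distance. The names `representative`, `sq_dist`, and `select_candidates` are hypothetical.

```python
def representative(models):
    """Mean vector of a group's classification models (assumed representative)."""
    n = len(models)
    return [sum(col) / n for col in zip(*models)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_candidates(groups, target):
    """Pick the group whose representative is closest to the target model.

    groups: dict mapping group id -> dict of object name -> feature vector.
    Returns the chosen group id and the sorted names of its member objects,
    i.e. the subset whose recognition models would be compared next.
    """
    reps = {gid: representative(list(members.values()))
            for gid, members in groups.items()}
    best = min(reps, key=lambda gid: sq_dist(reps[gid], target))
    return best, sorted(groups[best])
```

Only the members of the selected group need a full recognition-model comparison, which is the point of comparing against representatives first.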
-
Patent number: 7324943
Abstract: A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.
Type: Grant
Filed: October 2, 2003
Date of Patent: January 29, 2008
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Luca Rigazio, Robert Boman, Patrick Nguyen, Jean-Claude Junqua
-
Publication number: 20050240324
Abstract: The vehicular monitoring system obtains audio information from the speech of occupants within the vehicle and then processes that audio information to extract information about the behavior of the vehicle occupants. Using the behavioral information the system then assesses whether said behavior is in compliance with a predefined set of rules. Severe violations are reported to a third party via cellular telephone or other means. Less severe violations are recorded in a log file that is subsequently uploaded to a networked computer system for review and analysis. The behavioral rules used to assess violations may be modified by the administrative user.
Type: Application
Filed: April 26, 2004
Publication date: October 27, 2005
Inventors: Robert Boman, Roland Kuhn, Brian Hanson
-
Publication number: 20050228663
Abstract: A media production system includes a textual alignment module aligning multiple speech recordings to textual lines of a script based on speech recognition results. A navigation module responds to user navigation selections respective of the textual lines of the script by communicating to the user corresponding, line-specific portions of the multiple speech recordings. An editing module responds to user associations of multiple speech recordings with textual lines by accumulating line-specific portions of the multiple speech recordings in a combination recording based on at least one of relationships of textual lines in the script to the combination recording, and temporal alignments between the multiple speech recordings and the combination recording.
Type: Application
Filed: March 31, 2004
Publication date: October 13, 2005
Inventors: Robert Boman, Patrick Nguyen, Jean-Claude Junqua
-
Publication number: 20050148339
Abstract: A personal item monitoring system includes a monitor having a transmitter and a receiver located therein. At least one radio identification tag is adapted to be coupled to a personal item. Alternatively, the radio identification tag may be pre-installed into the personal item. The monitor emits a radio frequency received by the radio frequency identification tag, and the radio frequency identification tag emits a responding signal if within a detection range. The monitor then alerts a user if the radio identification tag leaves the range of detection.
Type: Application
Filed: January 6, 2004
Publication date: July 7, 2005
Inventors: Robert Boman, Brian Hanson
-
Publication number: 20050114357
Abstract: An indexing system for tagging a media stream is provided. The indexing system includes a plurality of inputs for defining at least one tag. A tagging system assigns the tag to the media stream. A tag analysis system selectively distributes tags for review and editing by members of the collaborative group. A tag database stores the tag and the media stream. Retrieval architecture can search the database using the tags.
Type: Application
Filed: November 20, 2003
Publication date: May 26, 2005
Inventors: Rathinavelu Chengalvarayan, Philippe Morin, Robert Boman, Ted Applebaum
-
Patent number: 6895257
Abstract: Personalized agent services are provided in a personal messaging device, such as a cellular telephone or personal digital assistant, through services of a speech recognizer that converts speech into text and a text-to-speech synthesizer that converts text to speech. Both recognizer and synthesizer may be server-based or locally deployed within the device. The user dictates an e-mail message which is converted to text and stored. The stored text is sent back to the user as text or as synthesized speech, to allow the user to edit the message and correct transcription errors before sending as e-mail. The system includes a summarization module that prepares short summaries of incoming e-mail and voice mail. The user may access these summaries, and retrieve and organize email and voice mail using speech commands.
Type: Grant
Filed: February 18, 2002
Date of Patent: May 17, 2005
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Robert Boman, Kirill Stoimenov, Roland Kuhn, Jean-Claude Junqua
-
Patent number: 6889189
Abstract: System speakers are switched to function as sound input transducers to improve recognizer performance and to support recognizer features. A crossbar switch is selectively activated, either manually or under software control, to allow system loudspeakers to function as sound input transducers that supplement the recognition system microphone or microphone array. Using loudspeakers as “microphones” improves speech recognition in noisy environments, thus attaining better recognition performance with little added system cost. The loudspeakers, positioned in physically separate locations also provide spatial information that can be used to determine the location of the person speaking and thereby offer different functionality for different persons. Acoustic models are selected based on environmental and vehicle operating conditions and may be adapted dynamically using ambient information obtained using the loudspeakers as sound input transducers.
Type: Grant
Filed: September 26, 2003
Date of Patent: May 3, 2005
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Robert Boman, Luca Rigazio, Brian Hanson, Rathinavelu Chengalvarayan
-
Publication number: 20050075881
Abstract: A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.
Type: Application
Filed: October 2, 2003
Publication date: April 7, 2005
Inventors: Luca Rigazio, Robert Boman, Patrick Nguyen, Jean-Claude Junqua
-
Publication number: 20050071159
Abstract: System speakers are switched to function as sound input transducers to improve recognizer performance and to support recognizer features. A crossbar switch is selectively activated, either manually or under software control, to allow system loudspeakers to function as sound input transducers that supplement the recognition system microphone or microphone array. Using loudspeakers as “microphones” improves speech recognition in noisy environments, thus attaining better recognition performance with little added system cost. The loudspeakers, positioned in physically separate locations also provide spatial information that can be used to determine the location of the person speaking and thereby offer different functionality for different persons. Acoustic models are selected based on environmental and vehicle operating conditions and may be adapted dynamically using ambient information obtained using the loudspeakers as sound input transducers.
Type: Application
Filed: September 26, 2003
Publication date: March 31, 2005
Inventors: Robert Boman, Luca Rigazio, Brian Hanson, Rathinavelu Chengalvarayan
-
Publication number: 20050010411
Abstract: A speech data mining system for use in generating a rich transcription having utility in call center management includes a speech differentiation module differentiating between speech of interacting speakers, and a speech recognition module improving automatic recognition of speech of one speaker based on interaction with another speaker employed as a reference speaker. A transcript generation module generates a rich transcript based on recognized speech of the speakers. Focused, interactive language models improve recognition of a customer on a low quality channel using context extracted from speech of a call center operator on a high quality channel with a speech model adapted to the operator. Mined speech data includes number of interaction turns, customer frustration phrases, operator polity, interruptions, and/or contexts extracted from speech recognition results, such as topics, complaints, solutions, and resolutions.
Type: Application
Filed: July 9, 2003
Publication date: January 13, 2005
Inventors: Luca Rigazio, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
-
Patent number: 6697778
Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.
Type: Grant
Filed: July 5, 2000
Date of Patent: February 24, 2004
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Roland Kuhn, Olivier Thyes, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
-
Patent number: 6691091
Abstract: A noise adaptation system and method provide for noise adaptation in a speech recognition system. The method includes the steps of generating a reference model based on a training speech signal, and compensating the reference model for additive noise in the cepstral domain. The reference model is also compensated for convolutional noise in the cepstral domain. In one embodiment, the convolutional noise is compensated for by estimating a convolutional bias between the reference model and a target speech signal. The estimated convolutional bias is transformed with a channel adaptation matrix, and the transformed convolutional bias is added to the reference model in the cepstral domain.
Type: Grant
Filed: July 31, 2000
Date of Patent: February 10, 2004
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Christophe Cerisara, Luca Rigazio, Robert Boman, Jean-Claude Junqua
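The convolutional-noise step described in the abstract above (estimate a bias, transform it with a channel adaptation matrix, add it to the reference model) can be sketched as follows. This is only an illustration under stated assumptions: the reference model is reduced to a single cepstral mean vector, and the bias is estimated as the difference between the average target cepstrum and that mean, which the abstract does not specify. The names `estimate_bias`, `mat_vec`, and `compensate` are hypothetical.

```python
def estimate_bias(reference_mean, target_frames):
    """Assumed estimator: mean target cepstrum minus the reference mean."""
    n = len(target_frames)
    target_mean = [sum(col) / n for col in zip(*target_frames)]
    return [t - r for t, r in zip(target_mean, reference_mean)]


def mat_vec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def compensate(reference_mean, bias, channel_matrix):
    """Add the matrix-transformed bias to the reference model mean."""
    transformed = mat_vec(channel_matrix, bias)
    return [r + b for r, b in zip(reference_mean, transformed)]
```

Because a convolutional channel becomes an additive offset in the cepstral domain, the compensation reduces to a vector addition once the bias has been transformed.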
-
Publication number: 20030157968
Abstract: Personalized agent services are provided in a personal messaging device, such as a cellular telephone or personal digital assistant, through services of a speech recognizer that converts speech into text and a text-to-speech synthesizer that converts text to speech. Both recognizer and synthesizer may be server-based or locally deployed within the device. The user dictates an e-mail message which is converted to text and stored. The stored text is sent back to the user as text or as synthesized speech, to allow the user to edit the message and correct transcription errors before sending as e-mail. The system includes a summarization module that prepares short summaries of incoming e-mail and voice mail. The user may access these summaries, and retrieve and organize email and voice mail using speech commands.
Type: Application
Filed: February 18, 2002
Publication date: August 21, 2003
Inventors: Robert Boman, Kirill Stoimenov, Roland Kuhn, Jean-Claude Junqua
-
Patent number: 6529872
Abstract: The improved noise adaptation technique employs a linear or non-linear transformation to the set of Jacobian matrices corresponding to an initial noise condition. An α-adaptation parameter or artificial intelligence operation is employed in a linear or non-linear way to increase the adaptation bias added to the speech models. This corrects shortcomings of conventional Jacobian adaptation, which tend to underestimate the effect of noise. The improved adaptation technique is further enhanced by a reduced dimensionality, principal component analysis technique that reduces the computational burden, making the adaptation technique beneficial in embedded recognition systems.
Type: Grant
Filed: April 18, 2000
Date of Patent: March 4, 2003
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Christophe Cerisara, Luca Rigazio, Robert Boman, Jean-Claude Junqua
-
Patent number: 6480819
Abstract: A method and apparatus is provided to enable a user watching and/or listening to a program to search for new information in the stream of a telecommunications data. The apparatus includes a voice recognition system that recognizes the user's request and causes a search to be performed in the long stream of data of at least one other telecommunication channel. The system includes a storage device for storing and processing the request. Upon recognition of the request, the incoming signal or signals are scanned for matches with the request. Upon finding the match between the request and the incoming signal, information related to the data is brought to the viewer's attention. This can be accomplished by either changing the viewer's station or by bringing in a split screen display forward into the display.
Type: Grant
Filed: February 25, 1999
Date of Patent: November 12, 2002
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Robert Boman, Jean-Claude Junqua
-
Patent number: 6141644
Abstract: Speech models are constructed and trained upon the speech of known client speakers (and also impostor speakers, in the case of speaker verification). Parameters from these models are concatenated to define supervectors and a linear transformation upon these supervectors results in a dimensionality reduction yielding a low-dimensional space called eigenspace. The training speakers are then represented as points or distributions in eigenspace. Thereafter, new speech data from the test speaker is placed into eigenspace through a similar linear transformation and the proximity in eigenspace of the test speaker to the training speakers serves to authenticate or identify the test speaker.
Type: Grant
Filed: September 4, 1998
Date of Patent: October 31, 2000
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Roland Kuhn, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
-
Patent number: 5946649
Abstract: The present invention eliminates injection noise in speech produced by esophageal speakers. A speech input signal is digitized. One copy of the digitized signal is used for analysis and the other is passed through a gain switch to an amplifier as output. A Fast Fourier Transform and a mean value of the digitized speech input signal is calculated. The Fast Fourier Transform (FFT) is passed through a morphological filter to produce a filtered spectrum. An occurrence of injection noise is detected by calculating a derivative of the filtered spectrum and determining from the mean value and the derivative a location and value of a largest peak and a second largest peak in the filtered spectrum. If the largest peak is lower in frequency than the second largest peak, and if all points above 2 KHz are less than the mean, then an occurrence of injection noise has been detected.
Type: Grant
Filed: April 16, 1997
Date of Patent: August 31, 1999
Assignee: Technology Research Association of Medical Welfare Apparatus
Inventors: Hector Raul Javkin, Michael Galler, Nancy Niedzielski, Robert Boman
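The final decision rule in the abstract above can be sketched in a few lines. This is a rough illustration, not the patented implementation: it assumes the filtered spectrum is already available as a list of magnitudes on evenly spaced frequency bins, omits the FFT and morphological filtering steps, and treats the mean value as a threshold supplied by the caller. The names `find_peaks`, `is_injection_noise`, `bin_hz`, and `cutoff_hz` are hypothetical.

```python
def find_peaks(spectrum):
    """Indices where the discrete derivative goes from positive to non-positive."""
    deriv = [b - a for a, b in zip(spectrum, spectrum[1:])]
    return [i + 1 for i in range(len(deriv) - 1)
            if deriv[i] > 0 and deriv[i + 1] <= 0]


def is_injection_noise(spectrum, bin_hz, mean_value, cutoff_hz=2000.0):
    """Apply the two conditions from the abstract to a filtered spectrum.

    spectrum: magnitudes, where index i corresponds to frequency i * bin_hz.
    mean_value: threshold for the band above cutoff_hz (assumed given).
    """
    peaks = find_peaks(spectrum)
    if len(peaks) < 2:
        return False  # the rule needs a largest and a second largest peak
    ranked = sorted(peaks, key=lambda i: spectrum[i], reverse=True)
    largest, second = ranked[0], ranked[1]
    high_band_quiet = all(v < mean_value
                          for i, v in enumerate(spectrum)
                          if i * bin_hz > cutoff_hz)
    # Largest peak must lie at a lower frequency than the second largest.
    return largest < second and high_band_quiet
```

In the system the abstract describes, a positive detection would presumably drive the gain switch that mutes the output copy of the signal; that wiring is not shown here.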