Patents by Inventor Dimitrios Dimitriadis

Dimitrios Dimitriadis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Sensor enhanced speech recognition

Patent number: 10083350

Abstract: A system for sensor enhanced speech recognition is disclosed. The system may obtain visual content or other content associated with a user and an environment of the user. Additionally, the system may obtain, from the visual content, metadata associated with the user and the environment of the user. The system may also include determining, based on the visual content and metadata, if the user is speaking. If the user is determined to be speaking, the system may obtain audio content associated with the user and the environment. The system may then adapt, based on the visual content, audio content, and metadata, one or more acoustic models that match the user and the environment. Once the one or more acoustic models are adapted and loaded, the system may enhance a speech recognition process or other process associated with the user.

Type: Grant

Filed: January 11, 2018

Date of Patent: September 25, 2018

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Dimitrios Dimitriadis, Donald J. Bowen, Mazin E. Gilbert, Horst J. Schroeter
PRE-DISTORTION SYSTEM FOR CANCELLATION OF NONLINEAR DISTORTION IN MOBILE DEVICES

Publication number: 20180262623

Abstract: A pre-distortion system for improved mobile device communications via cancellation of nonlinear distortion is disclosed. The pre-distortion system may transmit an acoustic signal from a network to a device, wherein the acoustic signal includes a linear signal and a nonlinear cancellation signal that cancels at least a portion of nonlinear distortions created once a loudspeaker in the device emits the linear signal. Thus, when a loudspeaker of a mobile device is operating and nonlinear distortions are generated by the loudspeaker or adjacent components of the mobile device in close proximity to the loudspeaker, the pre-distortion system may create one or more nonlinear cancellation signals in the network. The nonlinear cancellation signal may be combined with the linear signal sent to the loudspeaker to cancel the nonlinear distortion signal created by the loudspeaker emitting acoustic sounds from the linear signal. Thus, the nonlinear cancellation signal becomes a pre-distortion signal.

Type: Application

Filed: May 14, 2018

Publication date: September 13, 2018

Applicant: AT&T Intellectual Property I, L.P.

Inventors: Horst J. Schroeter, Donald J. Bowen, Dimitrios Dimitriadis, Lusheng Ji
AUGMENTED MULTI-TIER CLASSIFIER FOR MULTI-MODAL VOICE ACTIVITY DETECTION

Publication number: 20180182415

Abstract: Disclosed herein are systems, methods, and computer-readable storage media for detecting voice activity in a media signal in an augmented, multi-tier classifier architecture. A system configured to practice the method can receive, from a first classifier, a first voice activity indicator detected in a first modality for a human subject. Then, the system can receive, from a second classifier, a second voice activity indicator detected in a second modality for the human subject, wherein the first voice activity indicator and the second voice activity indicator are based on the human subject at a same time, and wherein the first modality and the second modality are different. The system can concatenate, via a third classifier, the first voice activity indicator and the second voice activity indicator with original features of the human subject, to yield a classifier output, and determine voice activity based on the classifier output.

Type: Application

Filed: February 12, 2018

Publication date: June 28, 2018

Inventors: Dimitrios DIMITRIADIS, Eric ZAVESKY, Matthew BURLICK
Exploiting Visual Information For Enhancing Audio Signals Via Source Separation And Beamforming

Publication number: 20180181812

Abstract: A system for exploiting visual information for enhancing audio signals via source separation and beamforming is disclosed. The system may obtain visual content associated with an environment of a user, and may extract, from the visual content, metadata associated with the environment. The system may determine a location of the user based on the extracted metadata. Additionally, the system may load, based on the location, an audio profile corresponding to the location of the user. The system may also load a user profile of the user that includes audio data associated with the user. Furthermore, the system may cancel, based on the audio profile and user profile, noise from the environment of the user. Moreover, the system may include adjusting, based on the audio profile and user profile, an audio signal generated by the user so as to enhance the audio signal during a communications session of the user.

Type: Application

Filed: February 26, 2018

Publication date: June 28, 2018

Applicant: AT&T Intellectual Property I, L.P.

Inventors: Dimitrios Dimitriadis, Donald J. Bowen, Lusheng Ji, Horst J. Schroeter
SENSOR ENHANCED SPEECH RECOGNITION

Publication number: 20180137348

Abstract: A system for sensor enhanced speech recognition is disclosed. The system may obtain visual content or other content associated with a user and an environment of the user. Additionally, the system may obtain, from the visual content, metadata associated with the user and the environment of the user. The system may also include determining, based on the visual content and metadata, if the user is speaking. If the user is determined to be speaking, the system may obtain audio content associated with the user and the environment. The system may then adapt, based on the visual content, audio content, and metadata, one or more acoustic models that match the user and the environment. Once the one or more acoustic models are adapted and loaded, the system may enhance a speech recognition process or other process associated with the user.

Type: Application

Filed: January 11, 2018

Publication date: May 17, 2018

Applicant: AT&T Intellectual Property I, L.P.

Inventors: Dimitrios Dimitriadis, Donald J. Bowen, Mazin E. Gilbert, Horst J. Schroeter
Pre-distortion system for cancellation of nonlinear distortion in mobile devices

Patent number: 9973633

Abstract: A pre-distortion system for improved mobile device communications via cancellation of nonlinear distortion is disclosed. The pre-distortion system may transmit an acoustic signal from a network to a device, wherein the acoustic signal includes a linear signal and a nonlinear cancellation signal that cancels at least a portion of nonlinear distortions created once a loudspeaker in the device emits the linear signal. Thus, when a loudspeaker of a mobile device is operating and nonlinear distortions are generated by the loudspeaker or adjacent components of the mobile device in close proximity to the loudspeaker, the pre-distortion system may create one or more nonlinear cancellation signals in the network. The nonlinear cancellation signal may be combined with the linear signal sent to the loudspeaker to cancel the nonlinear distortion signal created by the loudspeaker emitting acoustic sounds from the linear signal. Thus, the nonlinear cancellation signal becomes a pre-distortion signal.

Type: Grant

Filed: November 17, 2014

Date of Patent: May 15, 2018

Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.

Inventors: Horst J. Schroeter, Donald J. Bowen, Dimitrios Dimitriadis, Lusheng Ji
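
The cancellation idea in this abstract can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, that the loudspeaker adds a cubic term `a * x**3` to the signal it emits; the function names, the cubic model, and the coefficient `a` are hypothetical and do not come from the patent.

```python
def loudspeaker(x: float, a: float = 0.1) -> float:
    """Toy loudspeaker (assumed model): emits the signal plus a cubic distortion."""
    return x + a * x ** 3


def pre_distort(x: float, a: float = 0.1) -> float:
    """Combine the linear signal with a cancellation term computed upstream."""
    cancellation = -a * x ** 3  # estimate of the distortion, with opposite sign
    return x + cancellation


def residual_error(x: float, a: float = 0.1, use_pre_distortion: bool = True) -> float:
    """Absolute difference between the emitted signal and the intended one."""
    sent = pre_distort(x, a) if use_pre_distortion else x
    return abs(loudspeaker(sent, a) - x)


if __name__ == "__main__":
    for sample in (0.2, 0.5, 0.8):
        print(sample,
              residual_error(sample, use_pre_distortion=False),
              residual_error(sample, use_pre_distortion=True))
```

Under this toy model the cancellation is only approximate, because the loudspeaker also distorts the cancellation term itself; the residual shrinks but does not vanish, which is consistent with the abstract's wording of cancelling "at least a portion" of the distortion.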
Exploiting visual information for enhancing audio signals via source separation and beamforming

Patent number: 9904851

Abstract: A system for exploiting visual information for enhancing audio signals via source separation and beamforming is disclosed. The system may obtain visual content associated with an environment of a user, and may extract, from the visual content, metadata associated with the environment. The system may determine a location of the user based on the extracted metadata. Additionally, the system may load, based on the location, an audio profile corresponding to the location of the user. The system may also load a user profile of the user that includes audio data associated with the user. Furthermore, the system may cancel, based on the audio profile and user profile, noise from the environment of the user. Moreover, the system may include adjusting, based on the audio profile and user profile, an audio signal generated by the user so as to enhance the audio signal during a communications session of the user.

Type: Grant

Filed: June 11, 2014

Date of Patent: February 27, 2018

Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.

Inventors: Dimitrios Dimitriadis, Donald J. Bowen, Lusheng Ji, Horst J. Schroeter
Augmented multi-tier classifier for multi-modal voice activity detection

Patent number: 9892745

Abstract: Disclosed herein are systems, methods, and computer-readable storage media for detecting voice activity in a media signal in an augmented, multi-tier classifier architecture. A system configured to practice the method can receive, from a first classifier, a first voice activity indicator detected in a first modality for a human subject. Then, the system can receive, from a second classifier, a second voice activity indicator detected in a second modality for the human subject, wherein the first voice activity indicator and the second voice activity indicator are based on the human subject at a same time, and wherein the first modality and the second modality are different. The system can concatenate, via a third classifier, the first voice activity indicator and the second voice activity indicator with original features of the human subject, to yield a classifier output, and determine voice activity based on the classifier output.

Type: Grant

Filed: August 23, 2013

Date of Patent: February 13, 2018

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Dimitrios Dimitriadis, Eric Zavesky, Matthew Burlick
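
The multi-tier arrangement described in this abstract can be sketched as follows. The sketch assumes the two modalities are audio and video (the abstract only says they differ), and every classifier, feature, weight, and threshold below is a hypothetical placeholder rather than anything specified in the patent.

```python
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], float]


def augmented_vad(
    audio_features: Sequence[float],
    visual_features: Sequence[float],
    audio_clf: Classifier,
    visual_clf: Classifier,
    fusion_clf: Classifier,
    decision_threshold: float = 0.5,  # assumed value
) -> bool:
    """Decide voice activity from two first-tier indicators plus the raw features."""
    audio_indicator = audio_clf(audio_features)      # first modality
    visual_indicator = visual_clf(visual_features)   # second modality, same time
    # Augmentation step: the indicators are concatenated with the original features.
    stacked: List[float] = [audio_indicator, visual_indicator,
                            *audio_features, *visual_features]
    return fusion_clf(stacked) >= decision_threshold


# Toy stand-ins for trained classifiers (hypothetical).
def toy_audio_clf(features: Sequence[float]) -> float:
    return 1.0 if features[0] > 0.5 else 0.0   # features[0]: frame energy


def toy_visual_clf(features: Sequence[float]) -> float:
    return 1.0 if features[0] > 0.3 else 0.0   # features[0]: lip motion


def toy_fusion_clf(stacked: Sequence[float]) -> float:
    # Weighted sum over both indicators and the raw audio energy.
    return 0.4 * stacked[0] + 0.4 * stacked[1] + 0.2 * stacked[2]


if __name__ == "__main__":
    print(augmented_vad([0.9], [0.6], toy_audio_clf, toy_visual_clf, toy_fusion_clf))
    print(augmented_vad([0.1], [0.1], toy_audio_clf, toy_visual_clf, toy_fusion_clf))
```

Passing the original features to the final classifier alongside the two indicators is what distinguishes this layout from a plain vote over first-tier outputs.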
Sensor enhanced speech recognition

Patent number: 9870500

Abstract: A system for sensor enhanced speech recognition is disclosed. The system may obtain visual content or other content associated with a user and an environment of the user. Additionally, the system may obtain, from the visual content, metadata associated with the user and the environment of the user. The system may also include determining, based on the visual content and metadata, if the user is speaking. If the user is determined to be speaking, the system may obtain audio content associated with the user and the environment. The system may then adapt, based on the visual content, audio content, and metadata, one or more acoustic models that match the user and the environment. Once the one or more acoustic models are adapted and loaded, the system may enhance a speech recognition process or other process associated with the user.

Type: Grant

Filed: June 11, 2014

Date of Patent: January 16, 2018

Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.

Inventors: Dimitrios Dimitriadis, Donald J. Bowen, Mazin E. Gilbert, Horst J. Schroeter
SYSTEM AND METHOD OF USING NEURAL TRANSFORMS OF ROBUST AUDIO FEATURES FOR SPEECH PROCESSING

Publication number: 20170358298

Abstract: A system and method for processing speech includes receiving a first information stream associated with speech, the first information stream comprising micro-modulation features and receiving a second information stream associated with the speech, the second information stream comprising features. The method includes combining, via a non-linear multilayer perceptron, the first information stream and the second information stream to yield a third information stream. The system performs automatic speech recognition on the third information stream. The third information stream can also be used for training HMMs.

Type: Application

Filed: August 29, 2017

Publication date: December 14, 2017

Inventors: Enrico Luigi BOCCHIERI, Dimitrios DIMITRIADIS
System and method of using neural transforms of robust audio features for speech processing

Patent number: 9754587

Abstract: A system and method for processing speech includes receiving a first information stream associated with speech, the first information stream comprising micro-modulation features and receiving a second information stream associated with the speech, the second information stream comprising features. The method includes combining, via a non-linear multilayer perceptron, the first information stream and the second information stream to yield a third information stream. The system performs automatic speech recognition on the third information stream. The third information stream can also be used for training HMMs.

Type: Grant

Filed: February 29, 2016

Date of Patent: September 5, 2017

Assignee: Nuance Communications, Inc.

Inventors: Enrico Luigi Bocchieri, Dimitrios Dimitriadis
METHOD AND APPARATUS FOR PROCESSING COMMANDS DIRECTED TO A MEDIA CENTER

Publication number: 20170244997

Abstract: A system that incorporates teachings of the subject disclosure may include, for example, a method for controlling a steering of a plurality of cameras to identify a plurality of potential sources, identifying the plurality of potential sources according to image data provided by the plurality of cameras, assigning a beam of a plurality of beams of a plurality of microphones to each of the plurality of potential sources, detecting a first command comprising one of a first audible cue based on signals from a portion of the plurality of microphones, a first visual cue based on image data from one of the plurality of cameras, or both for controlling a media center, and configuring the media center according to the first command. Other embodiments are disclosed.

Type: Application

Filed: May 8, 2017

Publication date: August 24, 2017

Inventors: DIMITRIOS DIMITRIADIS, HORST JUERGEN SCHROETER
SYSTEM AND METHOD FOR NETWORK BANDWIDTH MANAGEMENT FOR ADJUSTING AUDIO QUALITY

Publication number: 20170236527

Abstract: Disclosed herein are systems, methods, and computer-readable storage devices for processing audio signals. An example system configured to practice the method receives audio at a device to be transmitted to a remote speech processing system. The system analyzes one of noise conditions, need for an enhanced speech quality, and network load to yield an analysis. Based on the analysis, the system determines to bypass user-defined options for enhancing audio for speech processing. Then, based on the analysis, the system can modify an audio transmission parameter used to transmit the audio from the device to the remote speech processing system. The audio transmission parameter can be one of an amount of coding, a chosen codec, or a number of audio channels, for example.

Type: Application

Filed: May 4, 2017

Publication date: August 17, 2017

Inventors: Dimitrios DIMITRIADIS, John CROCKETT, Horst Juergen SCHROETER
System and method for combining frame and segment level processing, via temporal pooling, for phonetic classification

Patent number: 9728183

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.

Type: Grant

Filed: November 10, 2015

Date of Patent: August 8, 2017

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
Method and apparatus for processing commands directed to a media center

Patent number: 9678713

Abstract: A system that incorporates teachings of the subject disclosure may include, for example, a method for controlling a steering of a plurality of cameras to identify a plurality of potential sources, identifying the plurality of potential sources according to image data provided by the plurality of cameras, assigning a beam of a plurality of beams of a plurality of microphones to each of the plurality of potential sources, detecting a first command comprising one of a first audible cue based on signals from a portion of the plurality of microphones, a first visual cue based on image data from one of the plurality of cameras, or both for controlling a media center, and configuring the media center according to the first command. Other embodiments are disclosed.

Type: Grant

Filed: October 9, 2012

Date of Patent: June 13, 2017

Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.

Inventors: Dimitrios Dimitriadis, Horst Juergen Schroeter
Navigation route updates

Patent number: 9664525

Abstract: Concepts and technologies are disclosed herein for providing navigation routes and/or providing navigation route updates. According to various embodiments of the concepts and technologies disclosed herein, a navigation application can be configured to obtain route data from a routing service. The routing service can be configured to use navigation data locally stored and/or obtained from a number of sources to generate navigation routes and/or to update navigation routes. The generated and/or updated navigation routes can be provided to the user device as route data that can be used to provide navigation directions to a user.

Type: Grant

Filed: August 27, 2014

Date of Patent: May 30, 2017

Assignee: AT&T Intellectual Property I, L.P.

Inventor: Dimitrios Dimitriadis
System and method for network bandwidth management for adjusting audio quality

Patent number: 9646626

Abstract: Disclosed herein are systems, methods, and computer-readable storage devices for processing audio signals. An example system configured to practice the method receives audio at a device to be transmitted to a remote speech processing system. The system analyzes one of noise conditions, need for an enhanced speech quality, and network load to yield an analysis. Based on the analysis, the system determines to bypass user-defined options for enhancing audio for speech processing. Then, based on the analysis, the system can modify an audio transmission parameter used to transmit the audio from the device to the remote speech processing system. The audio transmission parameter can be one of an amount of coding, a chosen codec, or a number of audio channels, for example.

Type: Grant

Filed: November 22, 2013

Date of Patent: May 9, 2017

Assignees: AT&T Intellectual Property I, L.P., AT&T Mobility II LLC

Inventors: Dimitrios Dimitriadis, John Crockett, Horst Juergen Schroeter
Real-time emotion tracking system

Patent number: 9570092

Abstract: Devices, systems, methods, media, and programs for detecting an emotional state change in an audio signal are provided. A number of segments of the audio signal are analyzed based on separate lexical and acoustic evaluations, and, for each segment, an emotional state and a confidence score of the emotional state are determined. A current emotional state of the audio signal is tracked for each of the number of segments. For a particular segment, it is determined whether the current emotional state of the audio signal changes to another emotional state based on the emotional state and a comparison of the confidence score of the particular segment to a predetermined threshold.

Type: Grant

Filed: April 26, 2016

Date of Patent: February 14, 2017

Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.

Inventors: Dimitrios Dimitriadis, Mazin E. Gilbert, Taniya Mishra, Horst J. Schroeter
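
The tracking rule in this abstract can be sketched in a few lines. The sketch assumes each segment has already been scored with an emotional state and a confidence, and that the tracked state changes only when a differing state meets or exceeds the threshold; the function name, the starting state, the threshold value, and the direction of the comparison are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def track_emotion(
    segments: Iterable[Tuple[str, float]],
    initial_state: str = "neutral",   # assumed starting state
    threshold: float = 0.6,           # assumed predetermined threshold
) -> List[str]:
    """Return the tracked emotional state after each (state, confidence) segment."""
    current = initial_state
    history: List[str] = []
    for state, confidence in segments:
        # Switch only when the segment disagrees with the tracked state
        # and its confidence clears the threshold.
        if state != current and confidence >= threshold:
            current = state
        history.append(current)
    return history


if __name__ == "__main__":
    scored = [("neutral", 0.9), ("angry", 0.4), ("angry", 0.8), ("happy", 0.5)]
    print(track_emotion(scored))
```

In the example, the low-confidence "angry" and "happy" segments leave the tracked state unchanged, so a single uncertain segment does not cause the state to flip.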
System and method for speech recognition modeling for mobile voice search

Patent number: 9558738

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating an acoustic model for use in speech recognition. A system configured to practice the method first receives training data and identifies non-contextual lexical-level features in the training data. Then the system infers sentence-level features from the training data and generates a set of decision trees by node-splitting based on the non-contextual lexical-level features and the sentence-level features. The system decorrelates training vectors, based on the training data, for each decision tree in the set of decision trees to approximate full-covariance Gaussian models, and then can train an acoustic model for use in speech recognition based on the training data, the set of decision trees, and the training vectors.

Type: Grant

Filed: March 8, 2011

Date of Patent: January 31, 2017

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Enrico Bocchieri, Diamantino Antonio Caseiro, Dimitrios Dimitriadis
System and method for building and evaluating automatic speech recognition via an application programmer interface

Patent number: 9484018

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for building an automatic speech recognition system through an Internet API. A network-based automatic speech recognition server configured to practice the method receives feature streams, transcriptions, and parameter values as inputs from a network client independent of knowledge of internal operations of the server. The server processes the inputs to train an acoustic model and a language model, and transmits the acoustic model and the language model to the network client. The server can also generate a log describing the processing and transmit the log to the client. On the server side, a human expert can intervene to modify how the server processes the inputs. The inputs can include an additional feature stream generated from speech by algorithms in the client's proprietary feature extraction.

Type: Grant

Filed: November 23, 2010

Date of Patent: November 1, 2016

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Enrico Bocchieri, Dimitrios Dimitriadis, Horst J. Schroeter
