Patents by Inventor Ross G. Cutler
Ross G. Cutler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240428768
Abstract: This document relates to distributed teleconferencing. Some implementations can employ adaptive audio or video enhancement to address scenarios where audio enhancement can tend to remove desirable sounds. For instance, adaptive audio enhancement can involve detecting the presence of a sound, such as clapping, and modifying audio enhancement so that the sound is retained in an enhanced audio signal. Adaptive video processing can involve detecting the presence of the sound and adding a graphical identifier to a video signal that conveys the presence of that sound.
Type: Application
Filed: June 20, 2023
Publication date: December 26, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Ross G. CUTLER, Harishchandra DUBEY, Vishak GOPAL
-
Publication number: 20240428380
Abstract: This document relates to personalized image or video processing. For example, the disclosed implementations can identify a designated user of a computing device that participates in a video call with other users. When another person appears in a video feed captured by the computing device, the other person can be removed. This can avoid distractions that can be caused, for example, by family members or pets that inadvertently walk into the field of view while a designated user is participating in a video call. Similar techniques can be employed to remove people other than designated users from still images.
Type: Application
Filed: June 20, 2023
Publication date: December 26, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ross G. CUTLER
-
Publication number: 20240428803
Abstract: This document relates to active speaker detection using distributed devices. For example, the disclosed implementations can employ personal devices of one or more users to detect when those users are speaking during a call with other users. Then, a camera on the personal device can be employed to obtain a front-facing view of the user, which can be provided to other call participants. In some cases, a microphone and/or camera on the user's device are employed to detect when the user is actively speaking.
Type: Application
Filed: June 20, 2023
Publication date: December 26, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ross G. CUTLER
-
Publication number: 20240406621
Abstract: This document relates to distributed devices teleconferencing. Some implementations can employ adaptive microphone selection based on signal characteristics such as signal-to-noise ratios or speech quality, and/or based on a microphone affinity approach. The selected microphone signals can be synchronized and mixed to generate a playback signal that is sent to a remote device. Further implementations can perform proximity-based mixing, where microphone signals received from devices in a particular room can be omitted from playback signals transmitted to other devices in the same room. These techniques can allow enhanced call quality for teleconferencing sessions where co-located users can employ their own devices to participate in a call with other users.
Type: Application
Filed: May 31, 2023
Publication date: December 5, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Ross G. CUTLER, Hong Wang SODOMA, Robert Andreas AICHNER, Vinod PRAKASH, Warren Michael LAM
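Illustrative note: the abstract above combines two ideas, selecting microphones by signal quality and leaving a room's own microphones out of the mix played back in that room. The following is a minimal sketch of that combination, not the method claimed in the application. It assumes the signals are already synchronized and of equal length, uses signal-to-noise ratio as the only selection criterion, and keeps one microphone per room. The names MicSignal, select_best_per_room, and mix_for_room are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MicSignal:
    """One synchronized microphone frame (hypothetical structure)."""
    device_id: str
    room: str
    snr_db: float
    samples: List[float]


def select_best_per_room(signals: List[MicSignal]) -> Dict[str, MicSignal]:
    """Keep, for each room, the microphone with the highest SNR."""
    best: Dict[str, MicSignal] = {}
    for sig in signals:
        current = best.get(sig.room)
        if current is None or sig.snr_db > current.snr_db:
            best[sig.room] = sig
    return best


def mix_for_room(selected: Dict[str, MicSignal], listener_room: str) -> List[float]:
    """Sum the selected signals, omitting the listener's own room.

    This is the proximity-based part: people in the same room already
    hear each other directly, so their microphones are left out.
    """
    sources = [s for room, s in selected.items() if room != listener_room]
    if not sources:
        return []
    length = len(sources[0].samples)
    return [sum(s.samples[i] for s in sources) for i in range(length)]
```

A production mixer would also need gain normalization and clipping protection, which this sketch leaves out.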
-
Publication number: 20240005939
Abstract: Systems, methods, and computer-readable storage devices are disclosed for personalizing speech enhancement components without enrollment in speech communication systems. One method including: receiving audio data, the audio data including speech, and the audio data to be processed by at least one speech enhancement component; determining, without requiring a user to enroll, whether the speech of the audio data includes one or both of near-field speech and far-field speech; and changing one or more of the at least one speech enhancement component based on determining the speech of the audio data includes one or both of near-field speech and far-field speech.
Type: Application
Filed: June 30, 2022
Publication date: January 4, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ross G. CUTLER
-
Publication number: 20230419986
Abstract: Systems, methods, and computer-readable storage devices are disclosed for optimizing speech enhancement components to use in speech communication systems using non-intrusive speech quality assessment. One method including: receiving audio data, the audio data including speech; and the audio data having been processed by at least one speech enhancement component; detecting a first quality of the speech of the audio data using a trained non-intrusive speech quality assessment (NISQA) model, the trained NISQA model trained to detect quality of speech automatically; and changing one or more of the at least one speech enhancement component based on the detected first quality of the speech.
Type: Application
Filed: June 24, 2022
Publication date: December 28, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Ross G. CUTLER, William D. FALLAS CORDERO
-
Publication number: 20230419987
Abstract: Systems, methods, and computer-readable storage devices are disclosed for optimizing speech enhancement components to use in speech communication systems using non-intrusive speech quality assessment. One method including: receiving, from a computing device over a network, audio data, the audio data including speech; detecting a first quality of the speech of the audio data using a trained non-intrusive speech quality assessment (NISQA) model, the trained NISQA model trained to detect quality of speech automatically; determining whether the computing device is a low-quality endpoint based on the first quality of speech of the audio data; and transferring, from the computing device over the network, at least one speech enhancement component to at least one server device when the computing device is determined to be a low-quality endpoint.
Type: Application
Filed: December 1, 2022
Publication date: December 28, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Ross G. CUTLER, William D. FALLAS CORDERO
-
Patent number: 10930262
Abstract: A device for communicating with a remote device is disclosed, which includes a processor and a memory in communication with the processor. The memory includes executable instructions that, when executed, cause the processor to control the device to perform functions of establishing, via a communication network, a communication session with the remote device; capturing a speech spoken by a user and generating audio data representing the captured speech by the user; encoding the audio data for transmission to the remote device via the communication network; converting the audio data to text data representing the captured speech; and transmitting, during the communication session, the encoded audio data and the text data to the remote device via the communication network. The device thus can provide the text data representing the captured speech when a quality of the encoded audio signal received by the remote device is below a predetermined level.
Type: Grant
Filed: September 30, 2018
Date of Patent: February 23, 2021
Assignee: Microsoft Technology Licensing, LLC.
Inventors: Ross G. Cutler, Sriram Srinivasan, Ramin Mehran, Karlton David Sequeira, Jayant Ajit Gupchup, Senthil K. Velayutham
-
Publication number: 20190073993
Abstract: A device is disclosed, which includes a processor and a memory in communication with the processor. The memory includes executable instructions that, when executed by the processor, cause the processor to control the device to perform functions of capturing a speech by a user; generating audio data representing the captured speech by a user; generating, based on the audio data, text data representing at least a portion of the captured speech; and transmitting, via a communication channel, the audio data and text data to the remote device. The device thus can provide the text data representing the captured speech when a quality of the audio signal received by the remote device is below a predetermined level.
Type: Application
Filed: October 31, 2018
Publication date: March 7, 2019
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Ross G. Cutler, Sriram Srinivasan, Ramin Mehran, Karlton David Sequeira, Jayant Ajit Gupchup, Senthil K. Velayutham
-
Publication number: 20190035383
Abstract: A device for communicating with a remote device is disclosed, which includes a processor and a memory in communication with the processor. The memory includes executable instructions that, when executed, cause the processor to control the device to perform functions of establishing, via a communication network, a communication session with the remote device; capturing a speech spoken by a user and generating audio data representing the captured speech by the user; encoding the audio data for transmission to the remote device via the communication network; converting the audio data to text data representing the captured speech; and transmitting, during the communication session, the encoded audio data and the text data to the remote device via the communication network. The device thus can provide the text data representing the captured speech when a quality of the encoded audio signal received by the remote device is below a predetermined level.
Type: Application
Filed: September 30, 2018
Publication date: January 31, 2019
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Ross G. Cutler, Sriram Srinivasan, Ramin Mehran, Karlton David Sequeira, Jayant Ajit Gupchup, Senthil K. Velayutham
-
Publication number: 20180365809
Abstract: A privacy image generation system may use a light field camera that includes an array of cameras, or an RGBZ camera(s), to capture images and display images according to a selected privacy mode. The privacy mode may include a blur background mode that can be automatically selected based on the meeting type, participants, location, and device type. A region of interest and/or an object(s) of interest (e.g. one or more persons in a foreground) is determined and the privacy image generation system is configured to clearly show the region/object of interest and obscure or replace the background by combining multiple images. The displayed image includes the region/object(s) of interest clearly shown (e.g. in focus) and any objects in a background of the combined image shown having a limited depth of field (e.g. blurry/not in focus) and/or blurred due to the combination of the multiple images.
Type: Application
Filed: August 24, 2018
Publication date: December 20, 2018
Inventors: Ross G. Cutler, Ramin Mehran
-
Publication number: 20180367446
Abstract: Technologies are described for enhancement of call quality in online communications through deployment of two or more network interface devices. Endpoint-to-endpoint communications, or multiple-endpoint communications managed by a multipoint control unit (MCU), may be facilitated using two or more network interface devices on either or both ends of a communication path. Received signals may be aggregated to improve signal quality. Network interface devices may be integrated to an endpoint, external modules, or available through combination of two endpoints (e.g., a computer connected to an online communication speaker phone). Network interface device configuration and activation may be automatically performed for a seamless operation transparent to a user.
Type: Application
Filed: June 16, 2017
Publication date: December 20, 2018
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Ross G. CUTLER
-
Patent number: 10147415
Abstract: Content is received at a receiving equipment from a transmitting user terminal over a network in a communication session between a transmitting user and a receiving user. The received content comprises audio data representing speech spoken by a voice of the transmitting user, and further comprises text data generated from speech spoken by the voice of the transmitting user during the communication session. At the receiving equipment, at least a portion of the received text data is converted to artificially-generated audible speech based on a model of the transmitting user's voice stored at the receiving equipment (and in embodiments in dependence on the receive audio quality). The received audio data and the artificially-generated speech are supplied to be played out to the receiving user through one or more speakers.
Type: Grant
Filed: February 2, 2017
Date of Patent: December 4, 2018
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Ross G. Cutler, Sriram Srinivasan, Ramin Mehran, Karlton David Sequeira, Jayant Ajit Gupchup, Senthil K. Velayutham
-
Publication number: 20180218727
Abstract: Content is received at a receiving equipment from a transmitting user terminal over a network in a communication session between a transmitting user and a receiving user. The received content comprises audio data representing speech spoken by a voice of the transmitting user, and further comprises text data generated from speech spoken by the voice of the transmitting user during the communication session. At the receiving equipment, at least a portion of the received text data is converted to artificially-generated audible speech based on a model of the transmitting user's voice stored at the receiving equipment (and in embodiments in dependence on the receive audio quality). The received audio data and the artificially-generated speech are supplied to be played out to the receiving user through one or more speakers.
Type: Application
Filed: February 2, 2017
Publication date: August 2, 2018
Applicant: Microsoft Technology Licensing, LLC
Inventors: Ross G. Cutler, Sriram Srinivasan, Ramin Mehran, Karlton David Sequeira, Jayant Ajit Gupchup, Senthil K. Velayutham
-
Publication number: 20170301067
Abstract: A privacy camera (such as a light field camera that includes an array of cameras or an RGBZ camera(s)) is used to capture images and display images according to a selected privacy mode. The privacy mode may include a blur background mode and a background replacement mode and can be automatically selected based on the meeting type, participants, location, and device type. A region of interest and/or an object(s) of interest (e.g. one or more persons in a foreground) is determined and the privacy camera is configured to clearly show the region/object of interest and obscure or replace the background according to the selected privacy mode. The displayed image includes the region/object(s) of interest clearly shown (e.g. in focus) and any objects in a background of the combined image shown having a limited depth of field (e.g. blurry/not in focus) and/or the background replaced with another image and/or fill.
Type: Application
Filed: June 30, 2017
Publication date: October 19, 2017
Applicant: Microsoft Technology Licensing, LLC
Inventors: Ross G. Cutler, Ramin Mehran
-
Patent number: 9516417
Abstract: A boundary binaural microphone array includes a pair of microphones spaced from one another by a distance between approximately 5 cm and 30 cm. The boundary binaural microphone array has a structural support that locates the microphones no more than approximately 4 cm off of a surface upon which the array is placed. The microphones are separated by a sound barrier that provides an interaural level difference in the amplitudes of the sound signals sensed by the two microphones.
Type: Grant
Filed: January 2, 2013
Date of Patent: December 6, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventor: Ross G. Cutler
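Illustrative note: the interaural level difference mentioned in the abstract above is commonly expressed in decibels as 20·log10 of the ratio of the two channels' RMS amplitudes. The sketch below computes that quantity for two sample buffers. It is a generic calculation, not taken from the patent, and the function names rms and interaural_level_difference_db are hypothetical.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of one channel."""
    if not samples:
        raise ValueError("channel is empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def interaural_level_difference_db(left: Sequence[float],
                                   right: Sequence[float]) -> float:
    """Level difference in dB; positive when the left channel is louder."""
    left_rms, right_rms = rms(left), rms(right)
    if left_rms == 0.0 or right_rms == 0.0:
        raise ValueError("ILD is undefined for a silent channel")
    return 20.0 * math.log10(left_rms / right_rms)
```

For example, a left channel with twice the amplitude of the right channel gives a difference of about 6 dB.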
-
Patent number: 9071895
Abstract: Architecture for exploiting satellite microphones and employing other techniques of conference room camera/microphone systems to significantly improve the true positive rate (reduce false positives) in sound source localization (SSL). Techniques for realizing the improvement include using an LED emitter to determine the precise location of the satellite microphones on a table, using the base SSL and external sounds to determine the approximate location of the satellite microphone on the table, using the satellite microphone phase to improve the SSL performance, using the satellite microphone amplitude to improve the active speaker detector (ASD) performance, and using the satellite microphones to estimate camera zoom.
Type: Grant
Filed: November 19, 2012
Date of Patent: June 30, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventor: Ross G. Cutler
-
Patent number: 8773499
Abstract: A dynamically adjustable framed view of occupants in a room is captured through an automatic framing system. The system employs a camera system, including a pan/tilt/zoom (PTZ) camera and one or more depth cameras, to automatically locate occupants in a room and adjust the PTZ camera's pan, tilt, and zoom settings to focus in on the occupants and center them in the main video frame. The depth cameras may distinguish between occupants and inanimate objects and adaptively determine the location of the occupants in the room. The PTZ camera may be calibrated with the depth cameras in order to use the location information determined by the depth cameras to automatically center the occupants in the main video frame for a framed view. Additionally, the system may track position changes in the room and may dynamically adjust and update the framed view when changes occur.
Type: Grant
Filed: June 24, 2011
Date of Patent: July 8, 2014
Assignee: Microsoft Corporation
Inventors: Josh Watson, Simone Leorin, Ross G. Cutler
-
Publication number: 20140185814
Abstract: A boundary binaural microphone array includes a pair of microphones spaced from one another by a distance between approximately 5 cm and 30 cm. The boundary binaural microphone array has a structural support that locates the microphones no more than approximately 4 cm off of a surface upon which the array is placed. The microphones are separated by a sound barrier that provides an interaural level difference in the amplitudes of the sound signals sensed by the two microphones.
Type: Application
Filed: January 2, 2013
Publication date: July 3, 2014
Applicant: MICROSOFT CORPORATION
Inventor: Ross G. Cutler
-
Patent number: 8749650
Abstract: Embodiments of the invention compensate for the movement of a meeting capture device during a live meeting when performing speaker indexing of a recorded meeting. In one example, a first position of a capture device is determined. A second position of the capture device is determined after the capture device has been moved from the first position to the second position. The movement data associated with movement of the capture device from the first position to the second position is determined. The movement data is outputted and used in speaker indexing of the recorded meeting.
Type: Grant
Filed: December 7, 2012
Date of Patent: June 10, 2014
Assignee: Microsoft Corporation
Inventor: Ross G. Cutler
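Illustrative note: the abstract above does not say how the movement data is applied during speaker indexing. One simple possibility, assumed here purely for illustration, is that the device only rotates about its vertical axis and that speaker directions are stored as azimuth angles. Under that assumption, an azimuth measured after the move can be mapped back to the original frame by adding the rotation. The function name compensate_azimuth is hypothetical, and translation of the device is ignored.

```python
def compensate_azimuth(measured_deg: float, rotation_deg: float) -> float:
    """Map an azimuth measured after the device rotated back to the
    device's original orientation.

    Assumption: rotation_deg is how far the device turned
    (counterclockwise positive), so a stationary speaker appears
    shifted by -rotation_deg in the device's new frame.
    """
    return (measured_deg + rotation_deg) % 360.0
```

With this correction, a speaker who has not moved keeps the same index angle before and after the device is turned.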