Patents by Inventor Ananth Sankar
Ananth Sankar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230386652
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
Type: Application
Filed: August 15, 2023
Publication date: November 30, 2023
Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
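The abstract above describes a two-stage pipeline: an ASR model (with a domain-specific language model) produces a transcription, and a domain-specific predictive model then derives structured text from it. A minimal sketch of that flow, where both models are hypothetical toy stand-ins (not the patented models) and the medical field names are illustrative assumptions:

```python
# Toy two-stage pipeline: ASR -> transcription -> structured text.
# Both stages are stand-ins; a real system would use trained models.

def speech_recognition_model(acoustic_sequence):
    """Stand-in ASR: maps acoustic frames to a transcription string."""
    # A real model would decode using a domain-specific language model.
    return " ".join(acoustic_sequence)  # pretend frames are already words

def domain_specific_predictive_model(transcription):
    """Stand-in predictive model: extracts structured fields (medical domain)."""
    structured = {}
    tokens = transcription.lower().split()
    if "fever" in tokens:
        structured["symptom"] = "fever"
    if "ibuprofen" in tokens:
        structured["medication"] = "ibuprofen"
    return structured

acoustic_sequence = ["patient", "reports", "fever", "prescribed", "ibuprofen"]
transcription = speech_recognition_model(acoustic_sequence)
structured = domain_specific_predictive_model(transcription)
print(structured)  # {'symptom': 'fever', 'medication': 'ibuprofen'}
```

The point of the two-stage split is that the transcription step and the structuring step can be specialized to the domain independently.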
-
Patent number: 11763936
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
Type: Grant
Filed: December 4, 2020
Date of Patent: September 19, 2023
Assignee: Google LLC
Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
-
Publication number: 20230206010
Abstract: Described herein are systems and methods for generating an embedding—a learned representation—for an image. The embedding for the image is derived to capture visual aspects, as well as textual aspects, of the image. An encoder-decoder is trained to generate the visual representation of the image. An optical character recognition (OCR) algorithm is used to identify text/words in the image. From these words, an embedding is derived by performing an average pooling operation on pre-trained embeddings that map to the identified words. Finally, the embedding representing the visual aspects of the image is combined with the embedding representing the textual aspects of the image to generate a final embedding for the image.
Type: Application
Filed: December 23, 2021
Publication date: June 29, 2023
Inventors: Xun Luan, Aman Gupta, Sirjan Kafle, Ananth Sankar, Di Wen, Saurabh Kataria, Ying Xuan, Sakshi Verma, Bharat Kumar Jain, Xue Xia, Bhargavkumar Kanubhai Patel, Vipin Gupta, Nikita Gupta
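The fusion step in this abstract is concrete enough to sketch: average-pool pre-trained embeddings for the OCR'd words, then combine the result with the visual embedding. The embedding table, dimensions, and concatenation as the combination operator are all illustrative assumptions, not the patented architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-trained word-embedding table (GloVe-style), dim 4.
word_embeddings = {
    "sale":  rng.standard_normal(4),
    "today": rng.standard_normal(4),
    "only":  rng.standard_normal(4),
}

def text_embedding(ocr_words, table):
    """Average pooling over the embeddings of the words found by OCR."""
    vecs = [table[w] for w in ocr_words if w in table]
    return np.mean(vecs, axis=0)

def fuse(visual_emb, text_emb):
    """One simple way to combine the two views: concatenation."""
    return np.concatenate([visual_emb, text_emb])

visual_emb = rng.standard_normal(8)   # stand-in for the encoder-decoder output
text_emb = text_embedding(["sale", "today"], word_embeddings)
final_emb = fuse(visual_emb, text_emb)
print(final_emb.shape)  # (12,)
```

Average pooling makes the text side order-invariant and robust to how many words the OCR finds, which is why it is a common default for bag-of-words fusion.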
-
Patent number: 11438639
Abstract: Methods, systems, and computer programs are presented for detecting near duplicates and partial matches of videos. One method includes an operation for receiving a video containing frames. For each frame, keypoints are determined within the frame. For each keypoint, a horizontal gradient vector is calculated based on a horizontal gradient at the keypoint and a vertical gradient vector is calculated based on a vertical gradient at the keypoint. The horizontal and vertical gradients are binary vectors. Further, a keypoint description is generated for each keypoint based on the horizontal gradient vector and the vertical gradient vector. Further, the frames are matched to frames of videos in a video library based on the keypoint descriptions of the keypoints in the frame in the videos in the video library. Further, a determination is made if the video has near duplicates in the video library based on the matching.
Type: Grant
Filed: March 3, 2020
Date of Patent: September 6, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Sumit Srivastava, Suhit Sinha, Ananth Sankar
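A minimal sketch of the descriptor idea above: at each keypoint, sample a patch, compute horizontal and vertical gradients, binarize them into two binary vectors, and use their concatenation as the keypoint description; binary descriptors then match cheaply by Hamming distance. The patch size and sign-based binarization are assumptions for illustration, not the patented descriptor:

```python
import numpy as np

def keypoint_descriptor(frame, y, x, r=2):
    """Binary descriptor for the (2r+1)x(2r+1) patch around a keypoint."""
    patch = frame[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    gy, gx = np.gradient(patch)                   # vertical, horizontal gradients
    h_vec = (gx.ravel() > 0).astype(np.uint8)     # binary horizontal gradient vector
    v_vec = (gy.ravel() > 0).astype(np.uint8)     # binary vertical gradient vector
    return np.concatenate([h_vec, v_vec])

def hamming(d1, d2):
    """Hamming distance: the natural metric for binary descriptors."""
    return int(np.count_nonzero(d1 != d2))

rng = np.random.default_rng(1)
frame = rng.integers(0, 256, size=(32, 32))
near_dup = frame.copy()            # stand-in for a re-uploaded copy of the frame
d1 = keypoint_descriptor(frame, 10, 10)
d2 = keypoint_descriptor(near_dup, 10, 10)
print(hamming(d1, d2))  # 0 — identical patches give identical descriptors
```

In a real system, frames would match when enough of their keypoint descriptors fall under a Hamming threshold, and a video would count as a near duplicate when enough frames match a library video.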
-
Publication number: 20210281891
Abstract: Methods, systems, and computer programs are presented for detecting near duplicates and partial matches of videos. One method includes an operation for receiving a video containing frames. For each frame, keypoints are determined within the frame. For each keypoint, a horizontal gradient vector is calculated based on a horizontal gradient at the keypoint and a vertical gradient vector is calculated based on a vertical gradient at the keypoint. The horizontal and vertical gradients are binary vectors. Further, a keypoint description is generated for each keypoint based on the horizontal gradient vector and the vertical gradient vector. Further, the frames are matched to frames of videos in a video library based on the keypoint descriptions of the keypoints in the frame in the videos in the video library. Further, a determination is made if the video has near duplicates in the video library based on the matching.
Type: Application
Filed: March 3, 2020
Publication date: September 9, 2021
Inventors: Sumit Srivastava, Suhit Sinha, Ananth Sankar
-
Publication number: 20210090724
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
Type: Application
Filed: December 4, 2020
Publication date: March 25, 2021
Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
-
Patent number: 10860685
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
Type: Grant
Filed: November 28, 2016
Date of Patent: December 8, 2020
Assignee: Google LLC
Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
-
Publication number: 20180150605
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
Type: Application
Filed: November 28, 2016
Publication date: May 31, 2018
Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
-
Patent number: 9418660
Abstract: Speech audio that is intended for transcription into textual form is received. The received speech audio is divided into first speech segments. A plurality of speakers is identified. A speaker is configured for repeating in spoken form a first speech segment that the speaker has listened to. A subset of speakers is determined for sending each first speech segment. Each first speech segment is sent to the subset of speakers determined for the particular first speech segment. The second speech segments are received from the speakers. The second speech segment is a re-spoken version of a first speech segment that has been generated by a speaker by repeating in spoken form the first speech segment. The second speech segments are processed to generate partial transcripts. The partial transcripts are combined to generate a complete transcript that is a textual representation corresponding to the received speech audio.
Type: Grant
Filed: January 15, 2014
Date of Patent: August 16, 2016
Assignee: Cisco Technology, Inc.
Inventors: Matthias Paulik, Vivek Halder, Ananth Sankar
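The workflow in this abstract can be sketched end to end: split the audio into first segments, assign each segment to a subset of speakers who re-speak it, transcribe the re-spoken (second) segments, and stitch the partial transcripts back together in order. Everything here is a toy stand-in; the round-robin assignment rule in particular is an assumption, not the patented selection method:

```python
def assign_speakers(segments, speakers, per_segment=2):
    """Round-robin: choose a subset of speakers for each first segment."""
    assignments = {}
    for i, seg in enumerate(segments):
        assignments[seg] = [speakers[(i + k) % len(speakers)]
                            for k in range(per_segment)]
    return assignments

def combine(partial_transcripts):
    """Merge (segment_index, text) partial transcripts into one transcript."""
    return " ".join(text for _, text in sorted(partial_transcripts))

segments = ["seg0", "seg1", "seg2"]
speakers = ["alice", "bob", "carol"]
print(assign_speakers(segments, speakers))

# Partial transcripts keyed by segment index, e.g. from ASR on re-spoken audio.
print(combine([(1, "world"), (0, "hello")]))  # hello world
```

Re-speaking works because a cooperative speaker in a quiet environment is far easier to transcribe automatically than the original, possibly noisy, multi-party audio.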
-
Publication number: 20150199966
Abstract: Speech audio that is intended for transcription into textual form is received. The received speech audio is divided into first speech segments. A plurality of speakers is identified. A speaker is configured for repeating in spoken form a first speech segment that the speaker has listened to. A subset of speakers is determined for sending each first speech segment. Each first speech segment is sent to the subset of speakers determined for the particular first speech segment. The second speech segments are received from the speakers. The second speech segment is a re-spoken version of a first speech segment that has been generated by a speaker by repeating in spoken form the first speech segment. The second speech segments are processed to generate partial transcripts. The partial transcripts are combined to generate a complete transcript that is a textual representation corresponding to the received speech audio.
Type: Application
Filed: January 15, 2014
Publication date: July 16, 2015
Applicant: Cisco Technology, Inc.
Inventors: Matthias Paulik, Vivek Halder, Ananth Sankar
-
Patent number: 9058806
Abstract: A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
Type: Grant
Filed: September 10, 2012
Date of Patent: June 16, 2015
Assignee: CISCO TECHNOLOGY, INC.
Inventors: Ananth Sankar, Sachin Kajarekar, Satish K. Gannu
-
Patent number: 8886011
Abstract: An example method is provided and includes receiving a video bitstream in a network environment; detecting a question in a decoded audio portion of a video bitstream; and marking a segment of the video bitstream with a tag. The tag may correspond to a location of the question in the video bitstream, and can facilitate consumption of the video bitstream. The method can further include detecting keywords in the question, and combining the keywords to determine a content of the question. In specific embodiments, the method can also include receiving the question and a corresponding answer from a user interaction, crowdsourcing the question by a plurality of users, counting a number of questions in the video bitstream and other features.
Type: Grant
Filed: December 7, 2012
Date of Patent: November 11, 2014
Assignee: Cisco Technology, Inc.
Inventors: Jim Chen Chou, Ananth Sankar, Sachin Kajarekar
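The tagging step described above can be sketched on a transcript of the decoded audio: detect questions, extract their keywords, and emit tags recording each question's location. The question detector here is a deliberately naive stand-in (text ending in "?") and the stopword list is an assumption; the patent covers the general method, not this heuristic:

```python
# Naive question tagging over (start_second, text) transcript segments.
STOPWORDS = {"what", "is", "the", "a", "how", "do", "we"}

def detect_questions(transcript_segments):
    tags = []
    for start_sec, text in transcript_segments:
        if text.strip().endswith("?"):            # stand-in question detector
            keywords = [w.strip("?").lower() for w in text.split()
                        if w.strip("?").lower() not in STOPWORDS]
            tags.append({"location": start_sec, "keywords": keywords})
    return tags

segments = [(12, "Today we cover encoders."), (95, "What is the learning rate?")]
print(detect_questions(segments))
# [{'location': 95, 'keywords': ['learning', 'rate']}]
```

Tags like these let a viewer jump straight to the questions in a long recording, which is the "facilitate consumption" benefit the abstract mentions.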
-
Publication number: 20140161416
Abstract: An example method is provided and includes receiving a video bitstream in a network environment; detecting a question in a decoded audio portion of a video bitstream; and marking a segment of the video bitstream with a tag. The tag may correspond to a location of the question in the video bitstream, and can facilitate consumption of the video bitstream. The method can further include detecting keywords in the question, and combining the keywords to determine a content of the question. In specific embodiments, the method can also include receiving the question and a corresponding answer from a user interaction, crowdsourcing the question by a plurality of users, counting a number of questions in the video bitstream and other features.
Type: Application
Filed: December 7, 2012
Publication date: June 12, 2014
Applicant: Cisco Technology, Inc.
Inventors: Jim Chen Chou, Ananth Sankar, Sachin Kajarekar
-
Publication number: 20140074866
Abstract: A method is provided in one example embodiment and includes detecting user interaction associated with a video file; extracting interaction information that is based on the user interaction associated with the video file; and enhancing the metadata based on the interaction information. In more particular embodiments, the enhancing can include generating additional metadata associated with the video file. Additionally, the enhancing can include determining relevance values associated with the metadata.
Type: Application
Filed: September 10, 2012
Publication date: March 13, 2014
Applicant: Cisco Technology, Inc.
Inventors: Sandipkumar V. Shah, Ananth Sankar
-
Publication number: 20140074471
Abstract: A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
Type: Application
Filed: September 10, 2012
Publication date: March 13, 2014
Applicant: CISCO TECHNOLOGY, INC.
Inventors: Ananth Sankar, Sachin Kajarekar, Satish K. Gannu
-
Publication number: 20130342433
Abstract: Extended operation of battery-powered devices including a visual display such as an LCD screen in a cell phone or a personal media player depends on low power consumption of the display device. For saving display power, dynamic backlight control can be used, involving adjustment of backlight brightness combined with transformation of video data to be displayed. When displaying a video or movie, in the interest of minimizing perceived flicker, dynamic changes in backlight brightness can be limited to coincide with scene changes. Video scene changes can be determined prior to their ultimate use in a client device, and available scene-change information can be downloaded along with the video to the client device. Alternatively, scene-change information as determined on the client device or elsewhere can be stored on the client device for later use during actual video display.
Type: Application
Filed: February 4, 2011
Publication date: December 26, 2013
Inventors: Ananth Sankar, Anurag Bist
-
Publication number: 20130300939
Abstract: An example method is provided and includes receiving a media file that includes video data and audio data; determining an initial scene sequence in the media file; determining an initial speaker sequence in the media file; and updating a selected one of the initial scene sequence and the initial speaker sequence in order to generate an updated scene sequence and an updated speaker sequence respectively. The initial scene sequence is updated based on the initial speaker sequence, and the initial speaker sequence is updated based on the initial scene sequence.
Type: Application
Filed: May 11, 2012
Publication date: November 14, 2013
Inventors: Jim Chen Chou, Sachin Kajarekar, Jason J. Catchpole, Ananth Sankar
-
Publication number: 20130144414
Abstract: In one embodiment, an audio stream is partitioned into a plurality of segments such that the plurality of segments are clustered into one or more clusters, each of the one or more clusters identifying a subset of the plurality of segments in the audio stream and corresponding to one of a first set of one or more speaker models, each speaker model in the first set of speaker models representing one of a first set of hypothetical speakers. The speaker models in the first set of speaker models are compared with a second set of one or more speaker models, where each speaker model in the second set of speaker models represents one of a second set of hypothetical speakers. Labels associated with one or more speaker models in the second set of speaker models are propagated to one or more speaker models in the first set of speaker models according to a result of the comparing step.
Type: Application
Filed: December 6, 2011
Publication date: June 6, 2013
Applicant: Cisco Technology, Inc.
Inventors: Sachin Kajarekar, Ananth Sankar, Sattish Gannu, Aparna Khare
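The label-propagation step above can be sketched concretely: compare each unlabeled speaker model from the new stream's clusters against a set of labeled speaker models, and copy over the closest label when the match is strong enough. Here speaker models are reduced to mean-vector stand-ins, cosine similarity is the comparison, and the threshold value is an illustrative assumption:

```python
import numpy as np

def propagate_labels(new_models, labeled_models, threshold=0.9):
    """Assign each new speaker model the label of its most similar
    labeled model, or None if no match clears the threshold."""
    labels = {}
    for name, vec in new_models.items():
        best_label, best_sim = None, -1.0
        for label, ref in labeled_models.items():
            sim = float(np.dot(vec, ref) /
                        (np.linalg.norm(vec) * np.linalg.norm(ref)))
            if sim > best_sim:
                best_label, best_sim = label, sim
        labels[name] = best_label if best_sim >= threshold else None
    return labels

labeled = {"alice": np.array([1.0, 0.0]), "bob": np.array([0.0, 1.0])}
new = {"cluster0": np.array([0.95, 0.05]), "cluster1": np.array([0.5, 0.5])}
print(propagate_labels(new, labeled))
# {'cluster0': 'alice', 'cluster1': None}
```

The threshold matters: an ambiguous cluster (equally close to two labeled speakers, like `cluster1` here) is better left unlabeled than mislabeled.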
-
Publication number: 20120200484
Abstract: Extended operation of battery-powered devices including a visual display such as an LCD screen in a cell phone or a personal media player depends on low power consumption of the display device. For saving display power, dynamic backlight control can be used, involving adjustment of backlight brightness combined with transformation of video data to be displayed. When displaying a video or movie, in the interest of minimizing perceived flicker, dynamic changes in backlight brightness can be limited to coincide with scene changes. Video scene changes can be determined prior to their ultimate use in a client device, and available scene-change information can be downloaded along with the video to the client device. Alternatively, scene-change information as determined on the client device or elsewhere can be stored on the client device for later use during actual video display.
Type: Application
Filed: February 4, 2011
Publication date: August 9, 2012
Inventors: Ananth Sankar, Anurag Bist
-
Patent number: 7873229
Abstract: In visual display devices such as LCD devices with backlight illumination, the backlight typically consumes most of device battery power. In the interest of displaying a given pixel pattern at a minimized backlight level, the pattern can be transformed while maintaining image quality, with a transform determined from pixel luminance statistics. Aside from, or in addition to being used for such minimizing, a transform also can be used for image enhancement, for a displayed image better to meet a visual perception quality. In either case, the transform preferably is constrained for enforcing one or several display attributes. In a network setting, the technique can be implemented in distributed fashion, so that subtasks of the technique are performed by different, interconnected processors such as server, client and proxy processors.
Type: Grant
Filed: July 31, 2006
Date of Patent: January 18, 2011
Assignee: Moxair, Inc.
Inventors: Ananth Sankar, David Romacho Rosell, Anurag Bist
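The core trade described in this last abstract can be sketched in a few lines: derive a luminance statistic from the frame, dim the backlight to that level, and scale pixel values up to compensate, clipping at the display maximum. Using a high percentile as the statistic is an illustrative assumption; the patent's transform is determined and constrained more generally:

```python
import numpy as np

def backlight_transform(luma, percentile=99):
    """Return (scaled image, backlight level in [0, 1]).

    The backlight is dimmed to the chosen luminance percentile, and pixel
    values are boosted by the inverse factor so perceived brightness is
    (approximately) preserved for all but the brightest pixels.
    """
    peak = np.percentile(luma, percentile)   # pixel luminance statistic
    backlight = peak / 255.0                 # dim backlight to the peak
    scaled = np.clip(np.rint(luma / max(backlight, 1e-6)), 0, 255)
    return scaled.astype(np.uint8), backlight

frame = np.full((4, 4), 128, dtype=np.uint8)   # uniformly mid-gray frame
scaled, backlight = backlight_transform(frame)
print(round(backlight, 3))   # ~0.502: backlight roughly halved
print(scaled[0, 0])          # 255: pixels boosted to compensate
```

Since backlight power scales with brightness, a mostly dark frame lets the backlight drop a long way at little visible cost, which is where the battery savings come from.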