Patents by Inventor Ananth Sankar
Ananth Sankar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230386652
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
Type: Application
Filed: August 15, 2023
Publication date: November 30, 2023
Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
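The abstract above describes a two-stage pipeline: an ASR model (with a domain-specific language model) produces a transcription, and a domain-specific predictive model then derives structured text from it. A minimal sketch of that flow, where both models are hypothetical toy stand-ins (not the patented models) and the medical field names are illustrative assumptions:

```python
# Toy two-stage pipeline: ASR -> transcription -> structured text.
# Both stages are stand-ins; a real system would use trained models.

def speech_recognition_model(acoustic_sequence):
    """Stand-in ASR: maps acoustic frames to a transcription string."""
    # A real model would decode using a domain-specific language model.
    return " ".join(acoustic_sequence)  # pretend frames are already words

def domain_specific_predictive_model(transcription):
    """Stand-in predictive model: extracts structured fields (medical domain)."""
    structured = {}
    tokens = transcription.lower().split()
    if "fever" in tokens:
        structured["symptom"] = "fever"
    if "ibuprofen" in tokens:
        structured["medication"] = "ibuprofen"
    return structured

acoustic_sequence = ["patient", "reports", "fever", "prescribed", "ibuprofen"]
transcription = speech_recognition_model(acoustic_sequence)
structured = domain_specific_predictive_model(transcription)
print(structured)  # {'symptom': 'fever', 'medication': 'ibuprofen'}
```

The point of the two-stage split is that the transcription step and the structuring step can be specialized to the domain independently.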
-
Patent number: 11763936
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
Type: Grant
Filed: December 4, 2020
Date of Patent: September 19, 2023
Assignee: Google LLC
Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
-
Publication number: 20230206010
Abstract: Described herein are systems and methods for generating an embedding—a learned representation—for an image. The embedding for the image is derived to capture visual aspects, as well as textual aspects, of the image. An encoder-decoder is trained to generate the visual representation of the image. An optical character recognition (OCR) algorithm is used to identify text/words in the image. From these words, an embedding is derived by performing an average pooling operation on pre-trained embeddings that map to the identified words. Finally, the embedding representing the visual aspects of the image is combined with the embedding representing the textual aspects of the image to generate a final embedding for the image.
Type: Application
Filed: December 23, 2021
Publication date: June 29, 2023
Inventors: Xun Luan, Aman Gupta, Sirjan Kafle, Ananth Sankar, Di Wen, Saurabh Kataria, Ying Xuan, Sakshi Verma, Bharat Kumar Jain, Xue Xia, Bhargavkumar Kanubhai Patel, Vipin Gupta, Nikita Gupta
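The fusion step in this abstract is concrete enough to sketch: average-pool pre-trained embeddings for the OCR'd words, then combine the result with the visual embedding. The embedding table, dimensions, and concatenation as the combination operator are all illustrative assumptions, not the patented architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-trained word-embedding table (GloVe-style), dim 4.
word_embeddings = {
    "sale":  rng.standard_normal(4),
    "today": rng.standard_normal(4),
    "only":  rng.standard_normal(4),
}

def text_embedding(ocr_words, table):
    """Average pooling over the embeddings of the words found by OCR."""
    vecs = [table[w] for w in ocr_words if w in table]
    return np.mean(vecs, axis=0)

def fuse(visual_emb, text_emb):
    """One simple way to combine the two views: concatenation."""
    return np.concatenate([visual_emb, text_emb])

visual_emb = rng.standard_normal(8)   # stand-in for the encoder-decoder output
text_emb = text_embedding(["sale", "today"], word_embeddings)
final_emb = fuse(visual_emb, text_emb)
print(final_emb.shape)  # (12,)
```

Average pooling makes the text side order-invariant and robust to how many words the OCR finds, which is why it is a common default for bag-of-words fusion.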
-
Patent number: 11438639
Abstract: Methods, systems, and computer programs are presented for detecting near duplicates and partial matches of videos. One method includes an operation for receiving a video containing frames. For each frame, keypoints are determined within the frame. For each keypoint, a horizontal gradient vector is calculated based on a horizontal gradient at the keypoint and a vertical gradient vector is calculated based on a vertical gradient at the keypoint. The horizontal and vertical gradients are binary vectors. Further, a keypoint description is generated for each keypoint based on the horizontal gradient vector and the vertical gradient vector. Further, the frames are matched to frames of videos in a video library based on the keypoint descriptions of the keypoints in the frame in the videos in the video library. Further, a determination is made if the video has near duplicates in the video library based on the matching.
Type: Grant
Filed: March 3, 2020
Date of Patent: September 6, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Sumit Srivastava, Suhit Sinha, Ananth Sankar
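A minimal sketch of the descriptor idea above: at each keypoint, sample a patch, compute horizontal and vertical gradients, binarize them into two binary vectors, and use their concatenation as the keypoint description; binary descriptors then match cheaply by Hamming distance. The patch size and sign-based binarization are assumptions for illustration, not the patented descriptor:

```python
import numpy as np

def keypoint_descriptor(frame, y, x, r=2):
    """Binary descriptor for the (2r+1)x(2r+1) patch around a keypoint."""
    patch = frame[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    gy, gx = np.gradient(patch)                   # vertical, horizontal gradients
    h_vec = (gx.ravel() > 0).astype(np.uint8)     # binary horizontal gradient vector
    v_vec = (gy.ravel() > 0).astype(np.uint8)     # binary vertical gradient vector
    return np.concatenate([h_vec, v_vec])

def hamming(d1, d2):
    """Hamming distance: the natural metric for binary descriptors."""
    return int(np.count_nonzero(d1 != d2))

rng = np.random.default_rng(1)
frame = rng.integers(0, 256, size=(32, 32))
near_dup = frame.copy()            # stand-in for a re-uploaded copy of the frame
d1 = keypoint_descriptor(frame, 10, 10)
d2 = keypoint_descriptor(near_dup, 10, 10)
print(hamming(d1, d2))  # 0 — identical patches give identical descriptors
```

In a real system, frames would match when enough of their keypoint descriptors fall under a Hamming threshold, and a video would count as a near duplicate when enough frames match a library video.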
-
Publication number: 20210281891
Abstract: Methods, systems, and computer programs are presented for detecting near duplicates and partial matches of videos. One method includes an operation for receiving a video containing frames. For each frame, keypoints are determined within the frame. For each keypoint, a horizontal gradient vector is calculated based on a horizontal gradient at the keypoint and a vertical gradient vector is calculated based on a vertical gradient at the keypoint. The horizontal and vertical gradients are binary vectors. Further, a keypoint description is generated for each keypoint based on the horizontal gradient vector and the vertical gradient vector. Further, the frames are matched to frames of videos in a video library based on the keypoint descriptions of the keypoints in the frame in the videos in the video library. Further, a determination is made if the video has near duplicates in the video library based on the matching.
Type: Application
Filed: March 3, 2020
Publication date: September 9, 2021
Inventors: Sumit Srivastava, Suhit Sinha, Ananth Sankar
-
Publication number: 20210090724
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
Type: Application
Filed: December 4, 2020
Publication date: March 25, 2021
Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
-
Patent number: 10860685
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
Type: Grant
Filed: November 28, 2016
Date of Patent: December 8, 2020
Assignee: Google LLC
Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
-
Publication number: 20180150605
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
Type: Application
Filed: November 28, 2016
Publication date: May 31, 2018
Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
-
Patent number: 9418660
Abstract: Speech audio that is intended for transcription into textual form is received. The received speech audio is divided into first speech segments. A plurality of speakers is identified. A speaker is configured for repeating in spoken form a first speech segment that the speaker has listened to. A subset of speakers is determined for sending each first speech segment. Each first speech segment is sent to the subset of speakers determined for the particular first speech segment. The second speech segments are received from the speakers. The second speech segment is a re-spoken version of a first speech segment that has been generated by a speaker by repeating in spoken form the first speech segment. The second speech segments are processed to generate partial transcripts. The partial transcripts are combined to generate a complete transcript that is a textual representation corresponding to the received speech audio.
Type: Grant
Filed: January 15, 2014
Date of Patent: August 16, 2016
Assignee: Cisco Technology, Inc.
Inventors: Matthias Paulik, Vivek Halder, Ananth Sankar
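The workflow in this abstract can be sketched end to end: split the audio into first segments, assign each segment to a subset of speakers who re-speak it, transcribe the re-spoken (second) segments, and stitch the partial transcripts back together in order. Everything here is a toy stand-in; the round-robin assignment rule in particular is an assumption, not the patented selection method:

```python
def assign_speakers(segments, speakers, per_segment=2):
    """Round-robin: choose a subset of speakers for each first segment."""
    assignments = {}
    for i, seg in enumerate(segments):
        assignments[seg] = [speakers[(i + k) % len(speakers)]
                            for k in range(per_segment)]
    return assignments

def combine(partial_transcripts):
    """Merge (segment_index, text) partial transcripts into one transcript."""
    return " ".join(text for _, text in sorted(partial_transcripts))

segments = ["seg0", "seg1", "seg2"]
speakers = ["alice", "bob", "carol"]
print(assign_speakers(segments, speakers))

# Partial transcripts keyed by segment index, e.g. from ASR on re-spoken audio.
print(combine([(1, "world"), (0, "hello")]))  # hello world
```

Re-speaking works because a cooperative speaker in a quiet environment is far easier to transcribe automatically than the original, possibly noisy, multi-party audio.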
-
Publication number: 20150199966
Abstract: Speech audio that is intended for transcription into textual form is received. The received speech audio is divided into first speech segments. A plurality of speakers is identified. A speaker is configured for repeating in spoken form a first speech segment that the speaker has listened to. A subset of speakers is determined for sending each first speech segment. Each first speech segment is sent to the subset of speakers determined for the particular first speech segment. The second speech segments are received from the speakers. The second speech segment is a re-spoken version of a first speech segment that has been generated by a speaker by repeating in spoken form the first speech segment. The second speech segments are processed to generate partial transcripts. The partial transcripts are combined to generate a complete transcript that is a textual representation corresponding to the received speech audio.
Type: Application
Filed: January 15, 2014
Publication date: July 16, 2015
Applicant: Cisco Technology, Inc.
Inventors: Matthias Paulik, Vivek Halder, Ananth Sankar
-
Patent number: 9058806
Abstract: A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
Type: Grant
Filed: September 10, 2012
Date of Patent: June 16, 2015
Assignee: CISCO TECHNOLOGY, INC.
Inventors: Ananth Sankar, Sachin Kajarekar, Satish K. Gannu
-
Patent number: 8886011
Abstract: An example method is provided and includes receiving a video bitstream in a network environment; detecting a question in a decoded audio portion of a video bitstream; and marking a segment of the video bitstream with a tag. The tag may correspond to a location of the question in the video bitstream, and can facilitate consumption of the video bitstream. The method can further include detecting keywords in the question, and combining the keywords to determine a content of the question. In specific embodiments, the method can also include receiving the question and a corresponding answer from a user interaction, crowdsourcing the question by a plurality of users, counting a number of questions in the video bitstream and other features.
Type: Grant
Filed: December 7, 2012
Date of Patent: November 11, 2014
Assignee: Cisco Technology, Inc.
Inventors: Jim Chen Chou, Ananth Sankar, Sachin Kajarekar
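The tagging step described above can be sketched on a transcript of the decoded audio: detect questions, extract their keywords, and emit tags recording each question's location. The question detector here is a deliberately naive stand-in (text ending in "?") and the stopword list is an assumption; the patent covers the general method, not this heuristic:

```python
# Naive question tagging over (start_second, text) transcript segments.
STOPWORDS = {"what", "is", "the", "a", "how", "do", "we"}

def detect_questions(transcript_segments):
    tags = []
    for start_sec, text in transcript_segments:
        if text.strip().endswith("?"):            # stand-in question detector
            keywords = [w.strip("?").lower() for w in text.split()
                        if w.strip("?").lower() not in STOPWORDS]
            tags.append({"location": start_sec, "keywords": keywords})
    return tags

segments = [(12, "Today we cover encoders."), (95, "What is the learning rate?")]
print(detect_questions(segments))
# [{'location': 95, 'keywords': ['learning', 'rate']}]
```

Tags like these let a viewer jump straight to the questions in a long recording, which is the "facilitate consumption" benefit the abstract mentions.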
-
Publication number: 20140161416
Abstract: An example method is provided and includes receiving a video bitstream in a network environment; detecting a question in a decoded audio portion of a video bitstream; and marking a segment of the video bitstream with a tag. The tag may correspond to a location of the question in the video bitstream, and can facilitate consumption of the video bitstream. The method can further include detecting keywords in the question, and combining the keywords to determine a content of the question. In specific embodiments, the method can also include receiving the question and a corresponding answer from a user interaction, crowdsourcing the question by a plurality of users, counting a number of questions in the video bitstream and other features.
Type: Application
Filed: December 7, 2012
Publication date: June 12, 2014
Applicant: Cisco Technology, Inc.
Inventors: Jim Chen Chou, Ananth Sankar, Sachin Kajarekar
-
Publication number: 20140074866
Abstract: A method is provided in one example embodiment and includes detecting user interaction associated with a video file; extracting interaction information that is based on the user interaction associated with the video file; and enhancing the metadata based on the interaction information. In more particular embodiments, the enhancing can include generating additional metadata associated with the video file. Additionally, the enhancing can include determining relevance values associated with the metadata.
Type: Application
Filed: September 10, 2012
Publication date: March 13, 2014
Applicant: Cisco Technology, Inc.
Inventors: Sandipkumar V. Shah, Ananth Sankar
-
Publication number: 20140074471
Abstract: A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
Type: Application
Filed: September 10, 2012
Publication date: March 13, 2014
Applicant: CISCO TECHNOLOGY, INC.
Inventors: Ananth Sankar, Sachin Kajarekar, Satish K. Gannu
-
Publication number: 20130342433
Abstract: Extended operation of battery-powered devices including a visual display such as an LCD screen in a cell phone or a personal media player depends on low power consumption of the display device. For saving display power, dynamic backlight control can be used, involving adjustment of backlight brightness combined with transformation of video data to be displayed. When displaying a video or movie, in the interest of minimizing perceived flicker, dynamic changes in backlight brightness can be limited to coincide with scene changes. Video scene changes can be determined prior to their ultimate use in a client device, and available scene-change information can be downloaded along with the video to the client device. Alternatively, scene-change information as determined on the client device or elsewhere can be stored on the client device for later use during actual video display.
Type: Application
Filed: February 4, 2011
Publication date: December 26, 2013
Inventors: Ananth Sankar, Anurag Bist
-
Publication number: 20130300939
Abstract: An example method is provided and includes receiving a media file that includes video data and audio data; determining an initial scene sequence in the media file; determining an initial speaker sequence in the media file; and updating a selected one of the initial scene sequence and the initial speaker sequence in order to generate an updated scene sequence and an updated speaker sequence respectively. The initial scene sequence is updated based on the initial speaker sequence, and the initial speaker sequence is updated based on the initial scene sequence.
Type: Application
Filed: May 11, 2012
Publication date: November 14, 2013
Inventors: Jim Chen Chou, Sachin Kajarekar, Jason J. Catchpole, Ananth Sankar
-
Publication number: 20130144414
Abstract: In one embodiment, an audio stream is partitioned into a plurality of segments such that the plurality of segments are clustered into one or more clusters, each of the one or more clusters identifying a subset of the plurality of segments in the audio stream and corresponding to one of a first set of one or more speaker models, each speaker model in the first set of speaker models representing one of a first set of hypothetical speakers. The speaker models in the first set of speaker models are compared with a second set of one or more speaker models, where each speaker model in the second set of speaker models represents one of a second set of hypothetical speakers. Labels associated with one or more speaker models in the second set of speaker models are propagated to one or more speaker models in the first set of speaker models according to a result of the comparing step.
Type: Application
Filed: December 6, 2011
Publication date: June 6, 2013
Applicant: Cisco Technology, Inc.
Inventors: Sachin Kajarekar, Ananth Sankar, Sattish Gannu, Aparna Khare
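The label-propagation step above can be sketched concretely: compare each unlabeled speaker model from the new stream's clusters against a set of labeled speaker models, and copy over the closest label when the match is strong enough. Here speaker models are reduced to mean-vector stand-ins, cosine similarity is the comparison, and the threshold value is an illustrative assumption:

```python
import numpy as np

def propagate_labels(new_models, labeled_models, threshold=0.9):
    """Assign each new speaker model the label of its most similar
    labeled model, or None if no match clears the threshold."""
    labels = {}
    for name, vec in new_models.items():
        best_label, best_sim = None, -1.0
        for label, ref in labeled_models.items():
            sim = float(np.dot(vec, ref) /
                        (np.linalg.norm(vec) * np.linalg.norm(ref)))
            if sim > best_sim:
                best_label, best_sim = label, sim
        labels[name] = best_label if best_sim >= threshold else None
    return labels

labeled = {"alice": np.array([1.0, 0.0]), "bob": np.array([0.0, 1.0])}
new = {"cluster0": np.array([0.95, 0.05]), "cluster1": np.array([0.5, 0.5])}
print(propagate_labels(new, labeled))
# {'cluster0': 'alice', 'cluster1': None}
```

The threshold matters: an ambiguous cluster (equally close to two labeled speakers, like `cluster1` here) is better left unlabeled than mislabeled.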
-
Publication number: 20120200484
Abstract: Extended operation of battery-powered devices including a visual display such as an LCD screen in a cell phone or a personal media player depends on low power consumption of the display device. For saving display power, dynamic backlight control can be used, involving adjustment of backlight brightness combined with transformation of video data to be displayed. When displaying a video or movie, in the interest of minimizing perceived flicker, dynamic changes in backlight brightness can be limited to coincide with scene changes. Video scene changes can be determined prior to their ultimate use in a client device, and available scene-change information can be downloaded along with the video to the client device. Alternatively, scene-change information as determined on the client device or elsewhere can be stored on the client device for later use during actual video display.
Type: Application
Filed: February 4, 2011
Publication date: August 9, 2012
Inventors: Ananth Sankar, Anurag Bist
-
Patent number: 7873229
Abstract: In visual display devices such as LCD devices with backlight illumination, the backlight typically consumes most of device battery power. In the interest of displaying a given pixel pattern at a minimized backlight level, the pattern can be transformed while maintaining image quality, with a transform determined from pixel luminance statistics. Aside from, or in addition to being used for such minimizing, a transform also can be used for image enhancement, for a displayed image better to meet a visual perception quality. In either case, the transform preferably is constrained for enforcing one or several display attributes. In a network setting, the technique can be implemented in distributed fashion, so that subtasks of the technique are performed by different, interconnected processors such as server, client and proxy processors.
Type: Grant
Filed: July 31, 2006
Date of Patent: January 18, 2011
Assignee: Moxair, Inc.
Inventors: Ananth Sankar, David Romacho Rosell, Anurag Bist
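The core trade described in this last abstract can be sketched in a few lines: derive a luminance statistic from the frame, dim the backlight to that level, and scale pixel values up to compensate, clipping at the display maximum. Using a high percentile as the statistic is an illustrative assumption; the patent's transform is determined and constrained more generally:

```python
import numpy as np

def backlight_transform(luma, percentile=99):
    """Return (scaled image, backlight level in [0, 1]).

    The backlight is dimmed to the chosen luminance percentile, and pixel
    values are boosted by the inverse factor so perceived brightness is
    (approximately) preserved for all but the brightest pixels.
    """
    peak = np.percentile(luma, percentile)   # pixel luminance statistic
    backlight = peak / 255.0                 # dim backlight to the peak
    scaled = np.clip(np.rint(luma / max(backlight, 1e-6)), 0, 255)
    return scaled.astype(np.uint8), backlight

frame = np.full((4, 4), 128, dtype=np.uint8)   # uniformly mid-gray frame
scaled, backlight = backlight_transform(frame)
print(round(backlight, 3))   # ~0.502: backlight roughly halved
print(scaled[0, 0])          # 255: pixels boosted to compensate
```

Since backlight power scales with brightness, a mostly dark frame lets the backlight drop a long way at little visible cost, which is where the battery savings come from.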