Patents by Inventor Ananth Sankar

Ananth Sankar has filed for patents to protect the following inventions. The listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230386652
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
    Type: Application
    Filed: August 15, 2023
    Publication date: November 30, 2023
    Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
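
The abstract above describes a two-stage pipeline: speech recognition with a domain-specific language model, followed by a domain-specific predictive model that derives structured text from the transcription. Below is a minimal Python sketch of that flow; both models are hypothetical stand-ins (the abstract does not disclose their internals), and the clinical-note field names are illustrative assumptions.

```python
# A minimal sketch of the two-stage pipeline from the abstract above.
# Both stages are hypothetical stand-ins; field names are illustrative.
from typing import Dict, List


def recognize_speech(acoustic_sequence: List[float]) -> str:
    """Stand-in for the speech recognition model; a real implementation
    would decode the acoustic frames with a domain-specific language model."""
    return "patient reports persistent headache since monday"


def predict_structured_text(transcription: str) -> Dict[str, str]:
    """Stand-in for the domain-specific predictive model: derives structured
    text content from the free-form transcription (toy keyword rules)."""
    tokens = transcription.split()
    structured = {"symptom": "", "onset": ""}
    if "headache" in tokens:
        structured["symptom"] = "headache"
    if "since" in tokens and tokens.index("since") + 1 < len(tokens):
        structured["onset"] = tokens[tokens.index("since") + 1]
    return structured


def process(acoustic_sequence: List[float]) -> Dict[str, str]:
    """End-to-end: acoustic sequence -> transcription -> structured text."""
    transcription = recognize_speech(acoustic_sequence)
    return predict_structured_text(transcription)


print(process([0.0] * 16000))  # {'symptom': 'headache', 'onset': 'monday'}
```
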
  • Patent number: 11763936
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
    Type: Grant
    Filed: December 4, 2020
    Date of Patent: September 19, 2023
    Assignee: Google LLC
    Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
  • Publication number: 20230206010
    Abstract: Described herein are systems and methods for generating an embedding—a learned representation—for an image. The embedding for the image is derived to capture visual aspects, as well as textual aspects, of the image. An encoder-decoder is trained to generate the visual representation of the image. An optical character recognition (OCR) algorithm is used to identify text/words in the image. From these words, an embedding is derived by performing an average pooling operation on pre-trained embeddings that map to the identified words. Finally, the embedding representing the visual aspects of the image is combined with the embedding representing the textual aspects of the image to generate a final embedding for the image.
    Type: Application
    Filed: December 23, 2021
    Publication date: June 29, 2023
    Inventors: Xun Luan, Aman Gupta, Sirjan Kafle, Ananth Sankar, Di Wen, Saurabh Kataria, Ying Xuan, Sakshi Verma, Bharat Kumar Jain, Xue Xia, Bhargavkumar Kanubhai Patel, Vipin Gupta, Nikita Gupta
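
A minimal sketch of the embedding scheme in the abstract above: a visual embedding (stand-in for the trained encoder-decoder), average pooling over pretrained embeddings of the OCR'd words, and a combination step. Concatenation and the toy dimensionality are assumptions; the abstract says only that the two embeddings are combined.

```python
import numpy as np

EMB_DIM = 4  # toy dimensionality; real embeddings are much larger


def visual_embedding(image: np.ndarray) -> np.ndarray:
    """Stand-in for the trained encoder-decoder's visual representation;
    here just a deterministic function of the pixels, for illustration."""
    rng = np.random.default_rng(int(image.sum()) % (2**32))
    return rng.standard_normal(EMB_DIM)


def text_embedding(ocr_words, pretrained: dict) -> np.ndarray:
    """Average-pool the pretrained embeddings of the words OCR identified."""
    vectors = [pretrained[w] for w in ocr_words if w in pretrained]
    if not vectors:
        return np.zeros(EMB_DIM)
    return np.mean(vectors, axis=0)


def image_embedding(image, ocr_words, pretrained) -> np.ndarray:
    """Combine visual and textual embeddings (concatenation is an assumption)."""
    return np.concatenate(
        [visual_embedding(image), text_embedding(ocr_words, pretrained)])


# Toy usage: a fake 8x8 image and a tiny pretrained word-embedding table.
pretrained = {"sale": np.ones(EMB_DIM), "today": -np.ones(EMB_DIM)}
image = np.arange(64, dtype=np.uint8).reshape(8, 8)
print(image_embedding(image, ["sale", "today"], pretrained).shape)  # (8,)
```
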
  • Patent number: 11438639
    Abstract: Methods, systems, and computer programs are presented for detecting near duplicates and partial matches of videos. One method includes an operation for receiving a video containing frames. For each frame, keypoints are determined within the frame. For each keypoint, a horizontal gradient vector is calculated based on a horizontal gradient at the keypoint and a vertical gradient vector is calculated based on a vertical gradient at the keypoint. The horizontal and vertical gradient vectors are binary vectors. Further, a keypoint description is generated for each keypoint based on the horizontal gradient vector and the vertical gradient vector. Further, the frames are matched to frames of videos in a video library based on the keypoint descriptions of the keypoints in the frames and the keypoint descriptions of the videos in the video library. Further, a determination is made as to whether the video has near duplicates in the video library, based on the matching.
    Type: Grant
    Filed: March 3, 2020
    Date of Patent: September 6, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Sumit Srivastava, Suhit Sinha, Ananth Sankar
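
A sketch of the descriptor and matching steps from the abstract above. The patch size, sign-based binarization, and Hamming threshold are assumptions; the abstract specifies only binary horizontal and vertical gradient vectors per keypoint.

```python
import numpy as np


def keypoint_descriptor(frame: np.ndarray, y: int, x: int, r: int = 4) -> np.ndarray:
    """Binary descriptor for one keypoint: signs of the horizontal and
    vertical gradients in a (2r+1)x(2r+1) patch around the keypoint."""
    patch = frame[y - r:y + r + 1, x - r:x + r + 1].astype(np.float32)
    gx = np.diff(patch, axis=1)  # horizontal gradient
    gy = np.diff(patch, axis=0)  # vertical gradient
    return np.concatenate([(gx > 0).ravel(), (gy > 0).ravel()])


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Distance between two binary descriptors."""
    return int(np.count_nonzero(a != b))


def frame_matches(desc_a, desc_b, max_dist: int = 10) -> int:
    """Count keypoints in frame A whose nearest descriptor in frame B is
    within max_dist; a high count suggests near-duplicate frames."""
    return sum(1 for a in desc_a if min(hamming(a, b) for b in desc_b) <= max_dist)


rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (32, 32)).astype(np.uint8)
keypoints = [(10, 10), (20, 15)]
descs = [keypoint_descriptor(frame, y, x) for y, x in keypoints]
print(frame_matches(descs, descs))  # 2: a frame trivially matches itself
```
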
  • Publication number: 20210281891
    Abstract: Methods, systems, and computer programs are presented for detecting near duplicates and partial matches of videos. One method includes an operation for receiving a video containing frames. For each frame, keypoints are determined within the frame. For each keypoint, a horizontal gradient vector is calculated based on a horizontal gradient at the keypoint and a vertical gradient vector is calculated based on a vertical gradient at the keypoint. The horizontal and vertical gradient vectors are binary vectors. Further, a keypoint description is generated for each keypoint based on the horizontal gradient vector and the vertical gradient vector. Further, the frames are matched to frames of videos in a video library based on the keypoint descriptions of the keypoints in the frames and the keypoint descriptions of the videos in the video library. Further, a determination is made as to whether the video has near duplicates in the video library, based on the matching.
    Type: Application
    Filed: March 3, 2020
    Publication date: September 9, 2021
    Inventors: Sumit Srivastava, Suhit Sinha, Ananth Sankar
  • Publication number: 20210090724
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
    Type: Application
    Filed: December 4, 2020
    Publication date: March 25, 2021
    Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
  • Patent number: 10860685
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
    Type: Grant
    Filed: November 28, 2016
    Date of Patent: December 8, 2020
    Assignee: Google LLC
    Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
  • Publication number: 20180150605
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.
    Type: Application
    Filed: November 28, 2016
    Publication date: May 31, 2018
    Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
  • Patent number: 9418660
    Abstract: Speech audio that is intended for transcription into textual form is received. The received speech audio is divided into first speech segments. A plurality of speakers is identified. A speaker is configured for repeating in spoken form a first speech segment that the speaker has listened to. A subset of speakers is determined for sending each first speech segment. Each first speech segment is sent to the subset of speakers determined for the particular first speech segment. Second speech segments are then received from the speakers. Each second speech segment is a re-spoken version of a first speech segment, generated by a speaker repeating the first speech segment in spoken form. The second speech segments are processed to generate partial transcripts. The partial transcripts are combined to generate a complete transcript that is a textual representation corresponding to the received speech audio.
    Type: Grant
    Filed: January 15, 2014
    Date of Patent: August 16, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Matthias Paulik, Vivek Halder, Ananth Sankar
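
A sketch of the re-speaking workflow in the abstract above. The round-robin subset selection and the stand-in ASR are assumptions; the abstract does not say how the speaker subsets are chosen.

```python
from typing import List

Audio = List[float]  # placeholder type for a chunk of speech audio


def divide(audio: Audio, seg_len: int) -> List[Audio]:
    """Divide the received speech audio into first speech segments."""
    return [audio[i:i + seg_len] for i in range(0, len(audio), seg_len)]


def assign(n_segments: int, speakers: List[str], k: int) -> List[List[str]]:
    """Choose a subset of k re-speakers per segment (round-robin here)."""
    return [[speakers[(i + j) % len(speakers)] for j in range(k)]
            for i in range(n_segments)]


def transcribe(second_segment: Audio) -> str:
    """Stand-in ASR applied to a re-spoken (second) segment."""
    return f"<{len(second_segment)} samples>"


def full_transcript(audio: Audio, speakers: List[str], seg_len=4, k=2) -> str:
    segments = divide(audio, seg_len)
    subsets = assign(len(segments), speakers, k)
    partials = []
    for seg, subset in zip(segments, subsets):
        # Each speaker in `subset` listens to `seg` and re-speaks it; the
        # re-spoken audio is modeled here as the segment itself.
        second_segments = [seg for _ in subset]
        partials.append(transcribe(second_segments[0]))
    return " ".join(partials)  # combine partial transcripts in order


print(full_transcript([0.0] * 10, ["ana", "ben", "carl"]))
# <4 samples> <4 samples> <2 samples>
```
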
  • Publication number: 20150199966
    Abstract: Speech audio that is intended for transcription into textual form is received. The received speech audio is divided into first speech segments. A plurality of speakers is identified. A speaker is configured for repeating in spoken form a first speech segment that the speaker has listened to. A subset of speakers is determined for sending each first speech segment. Each first speech segment is sent to the subset of speakers determined for the particular first speech segment. Second speech segments are then received from the speakers. Each second speech segment is a re-spoken version of a first speech segment, generated by a speaker repeating the first speech segment in spoken form. The second speech segments are processed to generate partial transcripts. The partial transcripts are combined to generate a complete transcript that is a textual representation corresponding to the received speech audio.
    Type: Application
    Filed: January 15, 2014
    Publication date: July 16, 2015
    Applicant: Cisco Technology, Inc.
    Inventors: Matthias Paulik, Vivek Halder, Ananth Sankar
  • Patent number: 9058806
    Abstract: A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: June 16, 2015
    Assignee: Cisco Technology, Inc.
    Inventors: Ananth Sankar, Sachin Kajarekar, Satish K. Gannu
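
A sketch of the recognition step from the abstract above, assuming segments and candidate speakers are represented as embeddings scored by cosine similarity (the abstract does not specify the form of the speaker models).

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def recognize(segment_embeddings, candidate_models):
    """Score each segment only against the approximate list of potential
    speakers (e.g., drawn from application data such as a meeting invite),
    rather than against every enrolled speaker."""
    labels = []
    for emb in segment_embeddings:
        best = max(candidate_models,
                   key=lambda name: cosine(emb, candidate_models[name]))
        labels.append(best)
    return labels


# Toy usage: two candidate speakers, three segment embeddings.
rng = np.random.default_rng(1)
models = {"alice": rng.standard_normal(8), "bob": rng.standard_normal(8)}
segments = [models["alice"] + 0.1 * rng.standard_normal(8),
            models["bob"] + 0.1 * rng.standard_normal(8),
            models["alice"] + 0.1 * rng.standard_normal(8)]
print(recognize(segments, models))  # likely ['alice', 'bob', 'alice']
```
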
  • Patent number: 8886011
    Abstract: An example method is provided and includes receiving a video bitstream in a network environment; detecting a question in a decoded audio portion of a video bitstream; and marking a segment of the video bitstream with a tag. The tag may correspond to a location of the question in the video bitstream, and can facilitate consumption of the video bitstream. The method can further include detecting keywords in the question, and combining the keywords to determine a content of the question. In specific embodiments, the method can also include receiving the question and a corresponding answer from a user interaction, crowdsourcing the question by a plurality of users, counting a number of questions in the video bitstream, and other features.
    Type: Grant
    Filed: December 7, 2012
    Date of Patent: November 11, 2014
    Assignee: Cisco Technology, Inc.
    Inventors: Jim Chen Chou, Ananth Sankar, Sachin Kajarekar
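
A sketch of question detection and tagging as described in the abstract above, operating on ASR output from the decoded audio. The question heuristics (trailing "?", leading question word) and the keyword rule are assumptions.

```python
import re
from typing import Dict, List

WH_WORDS = {"who", "what", "when", "where", "why", "how", "can", "could",
            "do", "does", "is", "are", "will"}


def detect_questions(utterances: List[Dict]) -> List[Dict]:
    """Tag utterances that look like questions. Input items are assumed to
    be {'start': seconds, 'text': str} from an ASR pass over the audio."""
    tags = []
    for utt in utterances:
        words = re.findall(r"[a-z']+", utt["text"].lower())
        if not words:
            continue
        if utt["text"].rstrip().endswith("?") or words[0] in WH_WORDS:
            # Keywords: drop the leading question word, keep longer words.
            keywords = [w for w in words[1:] if len(w) > 3]
            tags.append({"location": utt["start"], "keywords": keywords})
    return tags


utterances = [{"start": 12.5, "text": "How does the encoder handle packet loss?"},
              {"start": 40.0, "text": "Let's move to the next slide."}]
tags = detect_questions(utterances)
print(len(tags), tags[0]["location"], tags[0]["keywords"])
# 1 12.5 ['does', 'encoder', 'handle', 'packet', 'loss']
```
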
  • Publication number: 20140161416
    Abstract: An example method is provided and includes receiving a video bitstream in a network environment; detecting a question in a decoded audio portion of a video bitstream; and marking a segment of the video bitstream with a tag. The tag may correspond to a location of the question in the video bitstream, and can facilitate consumption of the video bitstream. The method can further include detecting keywords in the question, and combining the keywords to determine a content of the question. In specific embodiments, the method can also include receiving the question and a corresponding answer from a user interaction, crowdsourcing the question by a plurality of users, counting a number of questions in the video bitstream, and other features.
    Type: Application
    Filed: December 7, 2012
    Publication date: June 12, 2014
    Applicant: Cisco Technology, Inc.
    Inventors: Jim Chen Chou, Ananth Sankar, Sachin Kajarekar
  • Publication number: 20140074866
    Abstract: A method is provided in one example embodiment and includes detecting user interaction associated with a video file; extracting interaction information that is based on the user interaction associated with the video file; and enhancing metadata associated with the video file based on the interaction information. In more particular embodiments, the enhancing can include generating additional metadata associated with the video file. Additionally, the enhancing can include determining relevance values associated with the metadata.
    Type: Application
    Filed: September 10, 2012
    Publication date: March 13, 2014
    Applicant: Cisco Technology, Inc.
    Inventors: Sandipkumar V. Shah, Ananth Sankar
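
A sketch of the enhancement step from the abstract above: interaction events are folded into per-segment relevance values and summary metadata. The event types, weights, and segment length are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical weights: how strongly each interaction type signals relevance.
WEIGHTS = {"replay": 3.0, "share": 2.0, "pause": 1.0, "skip": -2.0}


def enhance_metadata(metadata: Dict, events: List[Dict],
                     seg_len: float = 60.0) -> Dict:
    """Fold user-interaction events ({'type', 'time'}) into the video's
    metadata: per-segment relevance scores plus an interaction summary."""
    relevance = Counter()
    for ev in events:
        segment = int(ev["time"] // seg_len)
        relevance[segment] += WEIGHTS.get(ev["type"], 0.0)
    enhanced = dict(metadata)
    enhanced["interaction_counts"] = Counter(ev["type"] for ev in events)
    enhanced["segment_relevance"] = dict(relevance)
    return enhanced


events = [{"type": "replay", "time": 75.0}, {"type": "share", "time": 80.0},
          {"type": "skip", "time": 200.0}]
print(enhance_metadata({"title": "demo"}, events)["segment_relevance"])
# {1: 5.0, 3: -2.0}
```
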
  • Publication number: 20140074471
    Abstract: A method is provided and includes estimating an approximate list of potential speakers in a file from one or more applications. The file (e.g., an audio file, video file, or any suitable combination thereof) includes a recording of a plurality of speakers. The method also includes segmenting the file according to the approximate list of potential speakers such that each segment corresponds to at least one speaker; and recognizing particular speakers in the file based on the approximate list of potential speakers.
    Type: Application
    Filed: September 10, 2012
    Publication date: March 13, 2014
    Applicant: Cisco Technology, Inc.
    Inventors: Ananth Sankar, Sachin Kajarekar, Satish K. Gannu
  • Publication number: 20130342433
    Abstract: Extended operation of battery-powered devices including a visual display such as an LCD screen in a cell phone or a personal media player depends on low power consumption of the display device. For saving display power, dynamic backlight control can be used, involving adjustment of backlight brightness combined with transformation of video data to be displayed. When displaying a video or movie, in the interest of minimizing perceived flicker, dynamic changes in backlight brightness can be limited to coincide with scene changes. Video scene changes can be determined prior to their ultimate use in a client device, and available scene-change information can be downloaded along with the video to the client device. Alternatively, scene-change information as determined on the client device or elsewhere can be stored on the client device for later use during actual video display.
    Type: Application
    Filed: February 4, 2011
    Publication date: December 26, 2013
    Inventors: Ananth Sankar, Anurag Bist
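
A sketch of the flicker-limiting rule from the abstract above: backlight changes are applied only at scene boundaries, which may be precomputed and delivered with the video. The within-scene max aggregation is an assumption.

```python
from typing import List


def backlight_schedule(per_frame_target: List[float],
                       scene_changes: List[int]) -> List[float]:
    """Hold the backlight constant within each scene and adjust it only at
    scene-change frames, suppressing perceptible flicker. Within a scene the
    level is set to the scene's maximum target so no frame is starved."""
    levels = []
    boundaries = sorted(set(scene_changes) | {0, len(per_frame_target)})
    for start, end in zip(boundaries, boundaries[1:]):
        level = max(per_frame_target[start:end])
        levels.extend([level] * (end - start))
    return levels


# Per-frame "ideal" backlight (0..1) and precomputed scene-change frames,
# e.g. downloaded alongside the video as the abstract describes.
target = [0.5, 0.55, 0.52, 0.9, 0.85, 0.88, 0.3, 0.35]
print(backlight_schedule(target, scene_changes=[3, 6]))
# [0.55, 0.55, 0.55, 0.9, 0.9, 0.9, 0.35, 0.35]
```
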
  • Publication number: 20130300939
    Abstract: An example method is provided and includes receiving a media file that includes video data and audio data; determining an initial scene sequence in the media file; determining an initial speaker sequence in the media file; and updating a selected one of the initial scene sequence and the initial speaker sequence in order to generate an updated scene sequence and an updated speaker sequence respectively. The initial scene sequence is updated based on the initial speaker sequence, and wherein the initial speaker sequence is updated based on the initial scene sequence.
    Type: Application
    Filed: May 11, 2012
    Publication date: November 14, 2013
    Inventors: Jim Chen Chou, Sachin Kajarekar, Jason J. Catchpole, Ananth Sankar
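
A sketch of the mutual-refinement idea in the abstract above, with a simple boundary-snapping rule standing in for the undisclosed update procedure; the tolerance and snapping heuristic are assumptions.

```python
from typing import List


def snap(boundaries: List[float], anchors: List[float], tol: float) -> List[float]:
    """Move each boundary to the nearest anchor if one lies within tol
    seconds; otherwise leave it alone."""
    out = []
    for b in boundaries:
        nearest = min(anchors, key=lambda a: abs(a - b))
        out.append(nearest if abs(nearest - b) <= tol else b)
    return sorted(set(out))


# Initial, independently estimated boundary times (seconds).
scene_cuts = [10.2, 31.0, 58.7]
speaker_turns = [9.8, 30.0, 45.0, 59.1]

# One round of mutual refinement: the scene sequence is updated from the
# speaker sequence, then the speaker sequence from the updated scenes.
updated_scenes = snap(scene_cuts, speaker_turns, tol=1.0)
updated_speakers = snap(speaker_turns, updated_scenes, tol=1.0)
print(updated_scenes)    # [9.8, 30.0, 59.1]
print(updated_speakers)  # [9.8, 30.0, 45.0, 59.1]
```
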
  • Publication number: 20130144414
    Abstract: In one embodiment, an audio stream is partitioned into a plurality of segments such that the plurality of segments are clustered into one or more clusters, each of the one or more clusters identifying a subset of the plurality of segments in the audio stream and corresponding to one of a first set of one or more speaker models, each speaker model in the first set of speaker models representing one of a first set of hypothetical speakers. The speaker models in the first set of speaker models are compared with a second set of one or more speaker models, where each speaker model in the second set of speaker models represents one of a second set of hypothetical speakers. Labels associated with one or more speaker models in the second set of speaker models are propagated to one or more speaker models in the first set of speaker models according to a result of the comparing step.
    Type: Application
    Filed: December 6, 2011
    Publication date: June 6, 2013
    Applicant: Cisco Technology, Inc.
    Inventors: Sachin Kajarekar, Ananth Sankar, Satish Gannu, Aparna Khare
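
A sketch of the comparison-and-propagation step from the abstract above, assuming speaker models are mean embeddings compared by cosine similarity with a propagation threshold (none of which the abstract specifies).

```python
import numpy as np


def propagate_labels(unlabeled: dict, labeled: dict,
                     threshold: float = 0.8) -> dict:
    """Compare each first-set speaker model with the labeled second-set
    models and copy the best label over when the match is strong enough;
    otherwise the cluster keeps its anonymous id."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    result = {}
    for cluster_id, model in unlabeled.items():
        name, score = max(((n, cos(model, m)) for n, m in labeled.items()),
                          key=lambda t: t[1])
        result[cluster_id] = name if score >= threshold else cluster_id
    return result


rng = np.random.default_rng(2)
labeled = {"alice": rng.standard_normal(8), "bob": rng.standard_normal(8)}
unlabeled = {"spk0": labeled["alice"] + 0.05 * rng.standard_normal(8),
             "spk1": rng.standard_normal(8)}  # likely no strong match
print(propagate_labels(unlabeled, labeled))  # spk0 likely becomes 'alice'
```
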
  • Publication number: 20120200484
    Abstract: Extended operation of battery-powered devices including a visual display such as an LCD screen in a cell phone or a personal media player depends on low power consumption of the display device. For saving display power, dynamic backlight control can be used, involving adjustment of backlight brightness combined with transformation of video data to be displayed. When displaying a video or movie, in the interest of minimizing perceived flicker, dynamic changes in backlight brightness can be limited to coincide with scene changes. Video scene changes can be determined prior to their ultimate use in a client device, and available scene-change information can be downloaded along with the video to the client device. Alternatively, scene-change information as determined on the client device or elsewhere can be stored on the client device for later use during actual video display.
    Type: Application
    Filed: February 4, 2011
    Publication date: August 9, 2012
    Inventors: Ananth Sankar, Anurag Bist
  • Patent number: 7873229
    Abstract: In visual display devices such as LCD devices with backlight illumination, the backlight typically consumes most of a device's battery power. In the interest of displaying a given pixel pattern at a minimized backlight level, the pattern can be transformed while maintaining image quality, with a transform determined from pixel luminance statistics. Aside from, or in addition to, being used for such minimizing, a transform can also be used for image enhancement, so that a displayed image better meets a visual perception quality. In either case, the transform preferably is constrained for enforcing one or several display attributes. In a network setting, the technique can be implemented in distributed fashion, so that subtasks of the technique are performed by different, interconnected processors such as server, client and proxy processors.
    Type: Grant
    Filed: July 31, 2006
    Date of Patent: January 18, 2011
    Assignee: Moxair, Inc.
    Inventors: Ananth Sankar, David Romacho Rosell, Anurag Bist
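
A sketch of the luminance-statistics transform from the abstract above: the backlight is dimmed to a percentile of the frame's luminance and the pixels are scaled up to compensate, with clipping acting as the constraint on the brightest tail. The percentile choice and linear scaling are assumptions.

```python
import numpy as np


def transform_for_min_backlight(frame: np.ndarray, pct: float = 95.0):
    """Dim the backlight to the pct-th percentile of luminance and scale the
    pixels to compensate, clipping the brightest tail. Returns the
    transformed frame and the backlight level (0..1)."""
    luminance = frame.astype(np.float32)
    backlight = max(np.percentile(luminance, pct) / 255.0, 1e-3)
    transformed = np.clip(luminance / backlight, 0, 255).astype(np.uint8)
    return transformed, backlight


rng = np.random.default_rng(0)
frame = rng.integers(0, 180, size=(4, 4)).astype(np.uint8)  # dark-ish frame
out, level = transform_for_min_backlight(frame)
print(f"backlight level: {level:.2f}")  # < 1.0 saves power on a dark frame
```
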