Patents by Inventor Gaurav Mittal

Gaurav Mittal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12284405
    Abstract: A content delivery network (100) for streaming digital video content across a data network. The content delivery network (100) is configured to receive digital video content. The content delivery network is configured to store the digital video content in a storage format comprising a base layer (B) and an enhancement layer (E), wherein the base layer (B) is decodable to present the digital video content at a base level of video reproduction quality, and the enhancement layer (E) is decodable with the base layer to present the digital video content at an enhanced level of video reproduction quality which is higher than the base level of reproduction quality. The content delivery network (100) is configured to determine, based on a target quality which is to be provided to a client device, which layers to use in order to achieve the target quality; and to use the determined layers (B, E) to provide the client device with the digital content at the target level of quality.
    Type: Grant
    Filed: March 10, 2021
    Date of Patent: April 22, 2025
    Assignee: V-NOVA INTERNATIONAL LIMITED
    Inventors: Gaurav Mittal, Simone Ferrara, Guido Meardi
  • Patent number: 12273543
    Abstract: A set of reconstruction elements useable to reconstruct a representation of a signal at a relatively high level of quality using data based on a representation of the signal at a relatively low level of quality is obtained. The representation at the relatively high level of quality is arranged as an array comprising at least first and second rows of signal elements. A reconstruction element is associated with a respective signal element in the set. A set of data elements is derived based on the set of reconstruction elements. At least one of the data elements is derived from at least two reconstruction elements associated with signal elements from the first row and a different number of reconstruction elements associated with signal elements from the second row.
    Type: Grant
    Filed: December 19, 2023
    Date of Patent: April 8, 2025
    Assignee: V-NOVA INTERNATIONAL LIMITED
    Inventors: Ivan Damnjanovic, Gaurav Mittal
  • Patent number: 12219160
    Abstract: A medical telepresence system comprising: an interface to receive a plurality of data feeds from a live medical procedure, at least one data feed comprising a video signal capturing the live medical procedure; a hierarchical encoder to encode the plurality of data feeds using a first tier-based hierarchical data coding scheme, wherein encoded data from the hierarchical encoder is decodable by a first set of computing devices for viewing, the first set of computing devices being communicatively coupled to the hierarchical encoder using a first network connection; a transcoder to convert from the first tier-based hierarchical data coding scheme to a second tier-based hierarchical data coding scheme, wherein encoded data from the transcoder is receivable by a second set of computing devices for viewing, the second set of computing devices being communicatively coupled to the transcoder using a second network connection, the second network connection being of a lower quality than the first network connection; and
    Type: Grant
    Filed: January 30, 2023
    Date of Patent: February 4, 2025
    Assignee: V-NOVA INTERNATIONAL LIMITED
    Inventors: Guido Meardi, Simone Ferrara, Gaurav Mittal
  • Patent number: 12219159
    Abstract: A method for encoding a first stream of video data comprising a plurality of frames of video, the method, for one or more of the plurality of frames of video, comprising the steps of: encoding in a hierarchical arrangement a frame of the video data, the hierarchical arrangement comprising a base layer of video data and a first enhancement layer of video data, said first enhancement layer of video data comprising a plurality of sub-layers of enhancement data, such that when encoded: the base layer of video data comprises data which when decoded renders the frame at a first, base, level of quality; and each sub-layer of enhancement data comprises data which, when decoded with the base layer, render the frame at a higher level of quality than the base level of quality; and wherein the steps of encoding the sub-layers of enhancement data comprises: quantizing the enhancement data at a determined initial level of quantization thereby creating a set of quantized enhancement data; associating to each of the pluralit
    Type: Grant
    Filed: January 20, 2023
    Date of Patent: February 4, 2025
    Assignee: V-NOVA INTERNATIONAL LIMITED
    Inventor: Gaurav Mittal
  • Patent number: 12192543
    Abstract: Example solutions for video frame action detection use a gated history and include: receiving a video stream comprising a plurality of video frames; grouping the plurality of video frames into a set of present video frames and a set of historical video frames, the set of present video frames comprising a current video frame; determining a set of attention weights for the set of historical video frames, the set of attention weights indicating how informative a video frame is for predicting action in the current video frame; weighting the set of historical video frames with the set of attention weights to produce a set of weighted historical video frames; and based on at least the set of weighted historical video frames and the set of present video frames, generating an action prediction for the current video frame.
    Type: Grant
    Filed: December 21, 2023
    Date of Patent: January 7, 2025
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Gaurav Mittal, Ye Yu, Mei Chen, Junwen Chen
  • Publication number: 20240404279
    Abstract: A classifier model is trained for temporal action localization of video clips. A training video clip that includes actions of interest for identification is ingested into the classifier model. Action characteristics within frames of the video clip are identified. The actions correspond to known action classes. An actionness score is determined for each of the frames based upon the action characteristics identified within each of the frames. Class activation sequence (CAS) scores are determined for sequences of the frames based upon a presence or an absence of the action characteristics identified within each of the frames. Base confidence predictions of temporal locations of actions of interest within the video clip are produced by correlating each of the actionness scores with corresponding class activation scores for each of the frames in the sequences of frames.
    Type: Application
    Filed: May 30, 2023
    Publication date: December 5, 2024
    Inventors: Gaurav MITTAL, Ye YU, Matthew Brigham HALL, Sandra SAJEEV, Mei CHEN, Mamshad Nayeem RIZVE
  • Publication number: 20240397068
    Abstract: A set of reconstruction elements useable to reconstruct a representation of a signal at a relatively high level of quality using data based on a representation of the signal at a relatively low level of quality is obtained. The representation at the relatively high level of quality is arranged as an array comprising at least first and second rows of signal elements. A reconstruction element is associated with a respective signal element in the set. A set of data elements is derived based on the set of reconstruction elements. At least one of the data elements is derived from at least two reconstruction elements associated with signal elements from the first row and a different number of reconstruction elements associated with signal elements from the second row.
    Type: Application
    Filed: December 19, 2023
    Publication date: November 28, 2024
    Inventors: Ivan DAMNJANOVIC, Gaurav MITTAL
  • Patent number: 12087043
    Abstract: The disclosure herein describes preparing and using a cross-attention model for action recognition using pre-trained encoders and novel class fine-tuning. Training video data is transformed into augmented training video segments, which are used to train an appearance encoder and an action encoder. The appearance encoder is trained to encode video segments based on spatial semantics and the action encoder is trained to encode video segments based on spatio-temporal semantics. A set of hard-mined training episodes are generated using the trained encoders. The cross-attention module is then trained for action-appearance aligned classification using the hard-mined training episodes. Then, support video segments are obtained, wherein each support video segment is associated with video classes. The cross-attention module is fine-tuned using the obtained support video segments and the associated video classes.
    Type: Grant
    Filed: November 24, 2021
    Date of Patent: September 10, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Gaurav Mittal, Ye Yu, Mei Chen, Jay Sanjay Patravali
  • Publication number: 20240290081
    Abstract: A computerized method trains and uses a multimodal fusion transformer (MFT) model for content moderation. Language modality data and vision modality data associated with a multimodal media source is received. Language embeddings are generated from the language modality data and vision embeddings are generated from the vision modality data. Both kinds of embeddings are generated using operations and/or processes that are specific to the associated modalities. The language embeddings and vision embeddings are combined into combined embeddings and the MFT model is used with those combined embeddings to generate a language semantic output token, a vision semantic output token, and a combined semantic output token. Contrastive loss data is generated using the three semantic output tokens and the MFT model is adjusted using that contrastive loss data. After the MFT model is trained sufficiently, it is configured to perform content moderation operations using semantic output tokens.
    Type: Application
    Filed: February 28, 2023
    Publication date: August 29, 2024
    Inventors: Ye YU, Gaurav MITTAL, Matthew Brigham HALL, Sandra SAJEEV, Mei CHEN, Jialin YUAN
  • Publication number: 20240244279
    Abstract: Example solutions for video frame action detection use a gated history and include: receiving a video stream comprising a plurality of video frames; grouping the plurality of video frames into a set of present video frames and a set of historical video frames, the set of present video frames comprising a current video frame; determining a set of attention weights for the set of historical video frames, the set of attention weights indicating how informative a video frame is for predicting action in the current video frame; weighting the set of historical video frames with the set of attention weights to produce a set of weighted historical video frames; and based on at least the set of weighted historical video frames and the set of present video frames, generating an action prediction for the current video frame.
    Type: Application
    Filed: December 21, 2023
    Publication date: July 18, 2024
    Inventors: Gaurav MITTAL, Ye YU, Mei CHEN, Junwen CHEN
  • Publication number: 20240211547
    Abstract: A method of balancing a dataset for a machine learning model includes identifying confusing classes of few-shot classes for a machine learning model during validation. One of the confusing classes and an image from one of the few-shot classes are selected. An image perturbation is computed such that the selected image is classified as the selected confusing class. The selected image is modified with the computed perturbation. The modified selected image is added to a batch for training the machine learning model.
    Type: Application
    Filed: March 7, 2024
    Publication date: June 27, 2024
    Inventors: Gaurav MITTAL, Nikolaos KARIANAKIS, Victor Manuel FRAGOSO ROJAS, Mei CHEN, Jedrzej Jakub KOZERAWSKI
  • Patent number: 11960574
    Abstract: A method of balancing a dataset for a machine learning model includes identifying confusing classes of few-shot classes for a machine learning model during validation. One of the confusing classes and an image from one of the few-shot classes are selected. An image perturbation is computed such that the selected image is classified as the selected confusing class. The selected image is modified with the computed perturbation. The modified selected image is added to a batch for training the machine learning model.
    Type: Grant
    Filed: June 28, 2021
    Date of Patent: April 16, 2024
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Gaurav Mittal, Nikolaos Karianakis, Victor Manuel Fragoso Rojas, Mei Chen, Jedrzej Jakub Kozerawski
  • Patent number: 11895343
    Abstract: Example solutions for video frame action detection use a gated history and include: receiving a video stream comprising a plurality of video frames; grouping the plurality of video frames into a set of present video frames and a set of historical video frames, the set of present video frames comprising a current video frame; determining a set of attention weights for the set of historical video frames, the set of attention weights indicating how informative a video frame is for predicting action in the current video frame; weighting the set of historical video frames with the set of attention weights to produce a set of weighted historical video frames; and based on at least the set of weighted historical video frames and the set of present video frames, generating an action prediction for the current video frame.
    Type: Grant
    Filed: June 28, 2022
    Date of Patent: February 6, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Gaurav Mittal, Ye Yu, Mei Chen, Junwen Chen
  • Publication number: 20240020854
    Abstract: Example solutions for video object segmentation (VOS) use a bilateral attention transformer in motion-appearance neighboring space, and perform a process that includes: receiving a video stream comprising a plurality of video frames in a sequence; receiving a first object mask for an initial video frame of the plurality of video frames; selecting a video frame of the plurality of video frames as a current query frame, the current query frame following, in the sequence, a reference frame of a reference frame set, wherein each reference frame has a corresponding object mask; using the current query frame and a video frame in the reference frame set, determining a bilateral attention; and using the bilateral attention, generating an object mask for the current query frame.
    Type: Application
    Filed: September 14, 2022
    Publication date: January 18, 2024
    Inventors: Ye YU, Gaurav MITTAL, Mei CHEN, Jialin YUAN
  • Patent number: 11877019
    Abstract: A video streaming client is configured to check whether a target version of a desired video content is available for streaming from a video streaming server, the target version being encoded to a target value of an encoding attribute. The video streaming client obtains a data communication speed to the video streaming server, and determines that the data communication speed is sufficient to stream and display the target version of the desired video content. The target value is less than a maximum value of the encoding attribute which is decodable by the video streaming client. The video streaming client is configured to select to stream the target version of the desired video content even though the data communication speed is sufficient to stream a version of the desired video content without playback interruption when encoded using a value of the encoding attribute which is higher than the target value.
    Type: Grant
    Filed: November 16, 2020
    Date of Patent: January 16, 2024
    Inventor: Gaurav Mittal
  • Patent number: 11856210
    Abstract: A set of reconstruction elements useable to reconstruct a representation of a signal at a relatively high level of quality using data based on a representation of the signal at a relatively low level of quality is obtained. The representation at the relatively high level of quality is arranged as an array comprising at least first and second rows of signal elements. A reconstruction element is associated with a respective signal element in the set. A set of data elements is derived based on the set of reconstruction elements. At least one of the data elements is derived from at least two reconstruction elements associated with signal elements from the first row and a different number of reconstruction elements associated with signal elements from the second row.
    Type: Grant
    Filed: June 9, 2021
    Date of Patent: December 26, 2023
    Inventors: Ivan Damnjanovic, Gaurav Mittal
  • Publication number: 20230396817
    Abstract: Example solutions for video frame action detection use a gated history and include: receiving a video stream comprising a plurality of video frames; grouping the plurality of video frames into a set of present video frames and a set of historical video frames, the set of present video frames comprising a current video frame; determining a set of attention weights for the set of historical video frames, the set of attention weights indicating how informative a video frame is for predicting action in the current video frame; weighting the set of historical video frames with the set of attention weights to produce a set of weighted historical video frames; and based on at least the set of weighted historical video frames and the set of present video frames, generating an action prediction for the current video frame.
    Type: Application
    Filed: June 28, 2022
    Publication date: December 7, 2023
    Inventors: Gaurav MITTAL, Ye YU, Mei CHEN, Junwen CHEN
  • Publication number: 20230336755
    Abstract: A medical telepresence system comprising: an interface to receive a plurality of data feeds from a live medical procedure, at least one data feed comprising a video signal capturing the live medical procedure; a hierarchical encoder to encode the plurality of data feeds using a first tier-based hierarchical data coding scheme, wherein encoded data from the hierarchical encoder is decodable by a first set of computing devices for viewing, the first set of computing devices being communicatively coupled to the hierarchical encoder using a first network connection; a transcoder to convert from the first tier-based hierarchical data coding scheme to a second tier-based hierarchical data coding scheme, wherein encoded data from the transcoder is receivable by a second set of computing devices for viewing, the second set of computing devices being communicatively coupled to the transcoder using a second network connection, the second network connection being of a lower quality than the first network connection; and
    Type: Application
    Filed: January 30, 2023
    Publication date: October 19, 2023
    Inventors: Guido MEARDI, Simone FERRARA, Gaurav MITTAL
  • Publication number: 20230156204
    Abstract: A method for encoding a first stream of video data comprising a plurality of frames of video, the method, for one or more of the plurality of frames of video, comprising the steps of: encoding in a hierarchical arrangement a frame of the video data, the hierarchical arrangement comprising a base layer of video data and a first enhancement layer of video data, said first enhancement layer of video data comprising a plurality of sub-layers of enhancement data, such that when encoded: the base layer of video data comprises data which when decoded renders the frame at a first, base, level of quality; and each sub-layer of enhancement data comprises data which, when decoded with the base layer, render the frame at a higher level of quality than the base level of quality; and wherein the steps of encoding the sub-layers of enhancement data comprises: quantizing the enhancement data at a determined initial level of quantization thereby creating a set of quantized enhancement data; associating to each of the pluralit
    Type: Application
    Filed: January 20, 2023
    Publication date: May 18, 2023
    Inventor: Gaurav MITTAL
  • Publication number: 20230118073
    Abstract: Disclosed solutions for improved machine learning (ML) employ knowledge balancing self-distillation with adaptive mutual information (AMI). Examples include: for a neural network (NN) having a plurality of modules, determining a task objective for at least a final module of the plurality of modules; for the NN, determining a balancing objective using at least an output of the final module and an output of a first intermediate module of the plurality of modules; determining an overall objective, wherein determining the overall objective comprises combining the task objective with the balancing objective; and adjusting weights of the NN to minimize the overall objective. Balancing information may combine mutual information (between an intermediate module output and the output of the final module) with self-information (for the intermediate module output) to produce AMI. Adjusting weights of the NN during training, using the AMI, results in knowledge balancing self-distillation.
    Type: Application
    Filed: October 15, 2021
    Publication date: April 20, 2023
    Inventors: Ye YU, Gaurav MITTAL, Mei CHEN, Yu GONG
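
Illustrative sketch (not taken from any of the filings): several entries above (patent 11895343 and its related applications) describe video frame action detection with a gated history, in which historical frames are weighted by attention weights and combined with the present frames to predict the action in the current frame. The Python below is a minimal sketch of that flow under loud assumptions: frames are represented as small feature vectors, informativeness is scored by dot-product similarity to the current frame, and the prediction is a nearest-prototype lookup. The names `gate_history`, `predict_action`, and `class_prototypes` are hypothetical and do not come from the patents, which do not disclose these details in their abstracts.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gate_history(history, current):
    """Weight each historical frame by how informative it is for the current frame.

    Assumption: informativeness is approximated by dot-product similarity
    between a historical frame's features and the current frame's features.
    """
    weights = softmax([dot(h, current) for h in history])
    weighted = [[w * x for x in h] for w, h in zip(weights, history)]
    return weights, weighted


def predict_action(history, present, class_prototypes):
    """Predict an action label for the current (last present) frame.

    Assumption: the weighted history is summed into one context vector,
    added to the mean of the present frames, and scored against one
    prototype vector per action class.
    """
    current = present[-1]
    weights, weighted = gate_history(history, current)
    dim = len(current)
    context = [sum(h[i] for h in weighted) for i in range(dim)]
    present_mean = [sum(p[i] for p in present) / len(present) for i in range(dim)]
    fused = [c + p for c, p in zip(context, present_mean)]
    scores = {name: dot(fused, proto) for name, proto in class_prototypes.items()}
    return max(scores, key=scores.get), weights


# Toy usage: two historical frames, one present frame, two action classes.
history = [[1.0, 0.0], [0.0, 1.0]]
present = [[1.0, 0.0]]
class_prototypes = {"wave": [1.0, 0.0], "idle": [0.0, 1.0]}
label, weights = predict_action(history, present, class_prototypes)
```

In this toy input the first historical frame resembles the current frame, so it receives the larger attention weight and dominates the pooled context. A real system would presumably learn both the scoring function and the classifier rather than use fixed dot products.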