Patents by Inventor Wonmin Byeon

Wonmin Byeon has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20260148014
    Abstract: The hybrid-head architecture model can be used to train a language model (LM). It uses a combination of attention heads and state space models (SSMs) to improve the speed and efficiency of inference on a received input sequence. This disclosure combines the high-resolution recall capabilities of attention heads with the efficient context summarization of SSM heads. The model can be separated into a set of layers, and the input sequence can be processed layer by layer. Each layer can have its own number of attention heads and SSM heads. Fine-tuning and optimization can be applied to each layer, as well as normalization and scaling. To further optimize the performance of the hybrid-head architecture model, learnable meta tokens can be used, which act as a learned cache for attention and SSM heads, enhancing the model's focus on salient information. The attention heads and the SSMs can be processed in parallel.
    Type: Application
    Filed: July 25, 2025
    Publication date: May 28, 2026
    Inventors: Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, Ameya Sunil Mahabaleshwarkar, Shih-Yang Liu, Matthijs Van Keirsbilck, Min-Hung Chen, Yoshi Suhara, Yingyan Lin, Jan Kautz, Pavlo Molchanov
  • Patent number: 12614329
    Abstract: A deep neural network can be trained to output motion or deformation information for a character that is representative of the character uttering speech contained in audio input, which is accurate for an emotional state of the character. The character can have different facial components or regions (e.g., head, skin, eyes, tongue) modeled separately, such that the network can output motion or deformation information for each of these different facial components. During training, the network can be provided with emotion and/or style vectors that indicate information to be used in generating realistic animation for input speech, as may relate to one or more emotions to be exhibited by the character, a relative weighting of those emotions, and any style or adjustments to be made to how the character expresses that emotional state. The network output can be provided to a renderer to generate audio-driven facial animation that is emotion-accurate.
    Type: Grant
    Filed: July 7, 2022
    Date of Patent: April 28, 2026
    Assignee: NVIDIA Corporation
    Inventors: Yeongho Seol, Simon Yuen, Dmitry Aleksandrovich Korobchenko, Mingquan Zhou, Ronan Browne, Wonmin Byeon
  • Patent number: 12586199
    Abstract: An open-vocabulary diffusion-based panoptic segmentation system is not limited to performing segmentation using only object categories seen during training, and instead can also successfully perform segmentation of object categories not seen during training and only seen during testing and inferencing. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, including computing internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of the object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object.
    Type: Grant
    Filed: May 1, 2023
    Date of Patent: March 24, 2026
    Assignee: NVIDIA Corporation
    Inventors: Jiarui Xu, Shalini De Mello, Sifei Liu, Arash Vahdat, Wonmin Byeon
  • Publication number: 20250272959
    Abstract: Visual language processors that include an image encoder configured to convert an image into a low-resolution feature map, a feature refinement network configured to upsample the low-resolution feature map into a high-resolution feature map, and a visual-language connector configured to map an image-level feature map and a region-level feature map both derived from the high-resolution feature map into an embedding space of a language encoder.
    Type: Application
    Filed: February 27, 2025
    Publication date: August 28, 2025
    Applicant: NVIDIA Corp.
    Inventors: Qiushan Guo, Shalini De Mello, Hongxu Yin, Wonmin Byeon, Ka Chun Cheung, Simon Chong-Wee See, Jan Kautz, Sifei Liu
  • Publication number: 20250103906
    Abstract: One embodiment of the present invention sets forth a technique for performing meta-learning. The technique includes performing a first set of training iterations to convert a prediction learning network into a first trained prediction learning network based on a first support set of training data and executing a representation learning network and the first trained prediction learning network to generate a first set of supervised training output and a first set of self-supervised training output based on a first query set of training data corresponding to the first support set of training data. The technique also includes performing a first training iteration to convert the representation learning network into a first trained representation learning network based on a first loss associated with the first set of supervised training output and a second loss associated with the first set of self-supervised training output.
    Type: Application
    Filed: September 20, 2023
    Publication date: March 27, 2025
    Inventors: Wonmin Byeon, Sudarshan Babu, Shalini De Mello, Jan Kautz
  • Publication number: 20250095350
    Abstract: One embodiment of the present invention sets forth a technique for executing a machine learning model. The technique includes performing a first set of training iterations to convert a prediction learning network into a first trained prediction learning network based on a first support set associated with a first set of classes. The technique also includes executing a first trained representation learning network to convert a first data sample into a first latent representation, where the first trained representation learning network is generated by training a representation learning network using a first query set, a first set of self-supervised losses, and a first set of supervised losses. The technique further includes executing the first trained prediction learning network to convert the first latent representation into a first prediction of a first class that is not included in the second set of classes.
    Type: Application
    Filed: September 20, 2023
    Publication date: March 20, 2025
    Inventors: Wonmin Byeon, Sudarshan Babu, Shalini De Mello, Jan Kautz
  • Publication number: 20250094813
    Abstract: One embodiment of the present invention sets forth a technique for training a transformer neural network. The technique includes inputting a first task token and a first set of samples into the transformer neural network and training the transformer neural network using a first set of losses between predictions generated by the transformer neural network from the first task token and first set of samples as well as a first set of labels. The technique also includes converting the first task token into a second task token that is larger than the first task token, inputting the second task token and a second set of samples into the transformer neural network, and training the transformer neural network using a second set of losses between predictions generated by the transformer neural network from the second task token and the second set of samples as well as a second set of labels.
    Type: Application
    Filed: September 20, 2023
    Publication date: March 20, 2025
    Inventors: Wonmin Byeon, Sudarshan Babu, Shalini De Mello, Jan Kautz
  • Publication number: 20250094819
    Abstract: One embodiment of the present invention sets forth a technique for executing a transformer neural network. The technique includes executing a first attention unit included in the transformer neural network to convert a first input token into a first query, a first key, and a first plurality of values, where each value included in the first plurality of values represents a sub-task associated with the transformer neural network. The technique also includes computing a first plurality of outputs associated with the first input token based on the first query, the first key, and the first plurality of values. The technique further includes performing a task associated with an input corresponding to the first input token based on the first input token and the first plurality of outputs.
    Type: Application
    Filed: September 20, 2023
    Publication date: March 20, 2025
    Inventors: Wonmin Byeon, Sudarshan Babu, Shalini De Mello, Jan Kautz
  • Patent number: 11989642
    Abstract: In various examples, historical trajectory information of objects in an environment may be tracked by an ego-vehicle and encoded into a state feature. The encoded state features for each of the objects observed by the ego-vehicle may be used—e.g., by a bi-directional long short-term memory (LSTM) network—to encode a spatial feature. The encoded spatial feature and the encoded state feature for an object may be used to predict lateral and/or longitudinal maneuvers for the object, and the combination of this information may be used to determine future locations of the object. The future locations may be used by the ego-vehicle to determine a path through the environment, or may be used by a simulation system to control virtual objects—according to trajectories determined from the future locations—through a simulation environment.
    Type: Grant
    Filed: September 26, 2022
    Date of Patent: May 21, 2024
    Assignee: NVIDIA Corporation
    Inventors: Ruben Villegas, Alejandro Troccoli, Iuri Frosio, Stephen Tyree, Wonmin Byeon, Jan Kautz
  • Publication number: 20240153093
    Abstract: An open-vocabulary diffusion-based panoptic segmentation system is not limited to performing segmentation using only object categories seen during training, and instead can also successfully perform segmentation of object categories not seen during training and only seen during testing and inferencing. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, including computing internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of the object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object.
    Type: Application
    Filed: May 1, 2023
    Publication date: May 9, 2024
    Inventors: Jiarui Xu, Shalini De Mello, Sifei Liu, Arash Vahdat, Wonmin Byeon
  • Publication number: 20240127041
    Abstract: Systems and methods are disclosed related to a convolutional structured state space model (ConvSSM), which has a tensor-structured state but a continuous-time parameterization and linear state updates. The linearity may be exploited to use parallel scans for subquadratic parallelization across the spatiotemporal sequence. The ConvSSM effectively models long-range dependencies and, when followed by a nonlinear operation, forms a spatiotemporal layer (ConvS5) that does not require compressing frames into tokens, can be efficiently parallelized across the sequence, provides an unbounded context, and enables fast autoregressive generation.
    Type: Application
    Filed: August 21, 2023
    Publication date: April 18, 2024
    Inventors: Jimmy Smith, Wonmin Byeon, Shalini De Mello
  • Publication number: 20240127075
    Abstract: Machine learning is a process that learns a model from a given dataset, where the model can then be used to make a prediction about new data. In order to reduce the costs associated with collecting and labeling real world datasets for use in training the model, computer processes can synthetically generate datasets which simulate real world data. The present disclosure improves the effectiveness of such synthetic datasets for training machine learning models used in real world applications, in particular by generating a synthetic dataset that is specifically targeted to a specified downstream task (e.g., a particular computer vision task, a particular natural language processing task, etc.).
    Type: Application
    Filed: June 21, 2023
    Publication date: April 18, 2024
    Applicant: NVIDIA Corporation
    Inventors: Shalini De Mello, Christian Jacobsen, Xunlei Wu, Stephen Tyree, Alice Li, Wonmin Byeon, Shangru Li
  • Publication number: 20240119361
    Abstract: One embodiment of a method for training a first machine learning model having a different architecture than a second machine learning model includes receiving a first data set, performing one or more operations to generate a second data set based on the first data set and the second machine learning model, wherein the second data set includes at least one feature associated with one or more tasks that the second machine learning model was previously trained to perform, and performing one or more operations to train the first machine learning model based on the second data set and the second machine learning model.
    Type: Application
    Filed: July 6, 2023
    Publication date: April 11, 2024
    Inventors: Hongxu Yin, Wonmin Byeon, Jan Kautz, Divyam Madaan, Pavlo Molchanov
  • Publication number: 20240013462
    Abstract: A deep neural network can be trained to output motion or deformation information for a character that is representative of the character uttering speech contained in audio input, which is accurate for an emotional state of the character. The character can have different facial components or regions (e.g., head, skin, eyes, tongue) modeled separately, such that the network can output motion or deformation information for each of these different facial components. During training, the network can be provided with emotion and/or style vectors that indicate information to be used in generating realistic animation for input speech, as may relate to one or more emotions to be exhibited by the character, a relative weighting of those emotions, and any style or adjustments to be made to how the character expresses that emotional state. The network output can be provided to a renderer to generate audio-driven facial animation that is emotion-accurate.
    Type: Application
    Filed: July 7, 2022
    Publication date: January 11, 2024
    Inventors: Yeongho Seol, Simon Yuen, Dmitry Aleksandrovich Korobchenko, Mingquan Zhou, Ronan Browne, Wonmin Byeon
  • Patent number: 11790633
    Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
    Type: Grant
    Filed: July 1, 2021
    Date of Patent: October 17, 2023
    Assignee: NVIDIA Corporation
    Inventors: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
  • Publication number: 20230177810
    Abstract: Semantic segmentation includes the task of providing pixel-wise annotations for a provided image. To train a machine learning environment to perform semantic segmentation, image/caption pairs are retrieved from one or more databases. These image/caption pairs each include an image and associated textual caption. The image portion of each image/caption pair is passed to an image encoder of the machine learning environment that outputs potential pixel groupings (e.g., potential segments of pixels) within each image, while nouns are extracted from the caption portion and are converted to text prompts which are then passed to a text encoder that outputs a corresponding text representation. Contrastive loss operations are then performed on features extracted from these pixel groupings and text representations to determine an extracted feature for each noun of each caption that most closely matches the extracted features for the associated image.
    Type: Application
    Filed: June 29, 2022
    Publication date: June 8, 2023
    Inventors: Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz
  • Publication number: 20230153604
    Abstract: To assist a machine learning environment in modelling a complex physical simulation (such as a numerical simulation or physics simulation), a correlation between input coordinates is determined. For example, a discrete solution (e.g., the correlation between the plurality of input coordinates) may be obtained from a non-discrete (e.g., continuous) physics space by performing a conversion from the physics space to a grid space. This correlation is input along with the coordinates into a machine learning environment to obtain results from the simulation. As a result, instead of implementing resource and power-intensive simulations to solve these computation problems, a machine learning environment implemented using less power and computing resources may solve these computation problems in a faster and more efficient manner.
    Type: Application
    Filed: July 26, 2022
    Publication date: May 18, 2023
    Inventors: Wonmin Byeon, Benjamin Wu, Oliver Hennigh
  • Publication number: 20230146647
    Abstract: Apparatuses, systems, and techniques to perform and facilitate preservation of neural coding network weights over time. In at least one embodiment, a convolutional neural coding network is trained using a set of tasks such that said convolutional neural coding network retains an ability to perform inferencing based on tasks from previous training.
    Type: Application
    Filed: November 5, 2021
    Publication date: May 11, 2023
    Inventors: Wonmin Byeon, Shalini De Mello, Ankur Arjun Mali
  • Publication number: 20230088912
    Abstract: In various examples, historical trajectory information of objects in an environment may be tracked by an ego-vehicle and encoded into a state feature. The encoded state features for each of the objects observed by the ego-vehicle may be used—e.g., by a bi-directional long short-term memory (LSTM) network—to encode a spatial feature. The encoded spatial feature and the encoded state feature for an object may be used to predict lateral and/or longitudinal maneuvers for the object, and the combination of this information may be used to determine future locations of the object. The future locations may be used by the ego-vehicle to determine a path through the environment, or may be used by a simulation system to control virtual objects—according to trajectories determined from the future locations—through a simulation environment.
    Type: Application
    Filed: September 26, 2022
    Publication date: March 23, 2023
    Inventors: Ruben Villegas, Alejandro Troccoli, Iuri Frosio, Stephen Tyree, Wonmin Byeon, Jan Kautz
  • Publication number: 20230015989
    Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
    Type: Application
    Filed: July 1, 2021
    Publication date: January 19, 2023
    Inventors: Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
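
Illustrative note on the last abstract above (coupled segmentation and edge learning): step (3) describes smoothing a semantic feature map by spatial propagation, with the smoothing controlled by both affinity values and edge values. The Python sketch below is one possible toy reading of that idea, not the method claimed in the patent. The function name propagate_step, the 4-neighbor scheme, and the weighting rule affinity * (1 - edge) are all assumptions made for illustration only.

```python
def propagate_step(features, affinity, edges):
    """Run one edge-gated smoothing pass over a 2D grid of scalar features.

    Hypothetical weighting (an assumption, not taken from the patent):
    the message from neighbor q to pixel p is weighted by
    affinity[p] * (1 - max(edges[p], edges[q])), so a strong edge at
    either pixel blocks smoothing across it.
    """
    height, width = len(features), len(features[0])
    smoothed = [row[:] for row in features]
    for y in range(height):
        for x in range(width):
            total = features[y][x]  # the pixel's own value, with weight 1
            norm = 1.0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width:
                    gate = 1.0 - max(edges[y][x], edges[ny][nx])
                    weight = affinity[y][x] * gate
                    total += weight * features[ny][nx]
                    norm += weight
            # Weighted average of the pixel and its admitted neighbors.
            smoothed[y][x] = total / norm
    return smoothed


# Toy 1x4 "image": an edge at column 2 separates a low region from a high one.
features = [[0.0, 2.0, 10.0, 10.0]]
affinity = [[1.0, 1.0, 1.0, 1.0]]
edges = [[0.0, 0.0, 1.0, 0.0]]
print(propagate_step(features, affinity, edges))
```

In this toy input the two left pixels average to 1.0 while the values at and beyond the edge stay at 10.0, so the call prints [[1.0, 1.0, 10.0, 10.0]]. With an all-zero edge map the high values would instead bleed into the low region, which is the behavior the edge gate is meant to prevent.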