Patents by Inventor Kevin Shih

Kevin Shih has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11902705
    Abstract: Apparatuses, systems, and techniques to enhance video are disclosed. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having one or more additional video frames.
    Type: Grant
    Filed: September 3, 2019
    Date of Patent: February 13, 2024
    Assignee: Nvidia Corporation
    Inventors: Kevin Shih, Aysegul Dundar, Animesh Garg, Robert Pottorff, Andrew Tao, Bryan Catanzaro
  • Publication number: 20240038212
    Abstract: Disclosed are apparatuses, systems, and techniques that may use machine learning for implementing generative text-to-speech models. The techniques include identifying a mapping of speech characteristics (SC) on a target distribution of a latent variable using a non-linear transformation for at least a subset of the SC. Parameters of the non-linear transformation are determined using a neural network that approximates statistics of the SC with statistics predicted for the SC based on the identified mapping and the target distribution of the latent variable.
    Type: Application
    Filed: January 20, 2023
    Publication date: February 1, 2024
    Inventors: Kevin Shih, José Rafael Valle Gomes da Costa, Rohan Badlani, João Felipe Santos, Bryan Catanzaro
  • Patent number: 11869483
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Grant
    Filed: October 7, 2021
    Date of Patent: January 9, 2024
    Assignee: Nvidia Corporation
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Publication number: 20230419947
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Application
    Filed: August 15, 2023
    Publication date: December 28, 2023
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Publication number: 20230402028
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Application
    Filed: August 28, 2023
    Publication date: December 14, 2023
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Patent number: 11769481
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Grant
    Filed: October 7, 2021
    Date of Patent: September 26, 2023
    Assignee: Nvidia Corporation
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Publication number: 20230110905
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Application
    Filed: October 7, 2021
    Publication date: April 13, 2023
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Publication number: 20230113950
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Application
    Filed: October 7, 2021
    Publication date: April 13, 2023
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Publication number: 20220195509
    Abstract: Systems and methods for isothermal amplification of nucleic acids in a ratiometric manner are disclosed. Methods of isothermal amplification of nucleic acids are convenient for areas that lack reliable electricity and/or funding for precision lab equipment. However, these methods typically result in non-ratiometric amplification of target sequences, thus preventing quantitative analysis or application of the results, such as for diagnostic testing. Systems and methods herein provide a method of isothermal amplification that ratiometrically amplifies target sequences, thus expanding the reach of diagnostics into remote and/or economically disadvantaged areas.
    Type: Application
    Filed: May 1, 2020
    Publication date: June 23, 2022
    Applicant: The Board of Trustees of the Leland Stanford Junior University
    Inventors: Rhiju Das, Kevin Shih, Matthew Adrianowycz
  • Publication number: 20220114700
    Abstract: Apparatuses, systems, and techniques are presented to generate images. In at least one embodiment, one or more neural networks are used to generate one or more images using one or more pixel weights determined based, at least in part, on one or more sub-pixel offset values.
    Type: Application
    Filed: October 8, 2020
    Publication date: April 14, 2022
    Inventors: Shiqiu Liu, Robert Pottorff, Guilin Liu, Karan Sapra, Jon Barker, David Tarjan, Pekka Janis, Edvard Fagerholm, Lei Yang, Kevin Shih, Marco Salvi, Timo Roman, Andrew Tao, Bryan Catanzaro
  • Publication number: 20220114701
    Abstract: Apparatuses, systems, and techniques are presented to generate images. In at least one embodiment, one or more neural networks are used to generate one or more images using one or more pixel weights determined based, at least in part, on one or more sub-pixel offset values.
    Type: Application
    Filed: February 10, 2021
    Publication date: April 14, 2022
    Inventors: Shiqiu Liu, Robert Pottorff, Guilin Liu, Karan Sapra, Jon Barker, David Tarjan, Pekka Janis, Edvard Fagerholm, Lei Yang, Kevin Shih, Marco Salvi, Timo Roman, Andrew Tao, Bryan Catanzaro
  • Publication number: 20210067735
    Abstract: Apparatuses, systems, and techniques to enhance video are disclosed. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having a higher frame rate, higher resolution, or reduced number of missing or corrupt video frames.
    Type: Application
    Filed: September 3, 2019
    Publication date: March 4, 2021
    Inventors: Fitsum Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
  • Publication number: 20210064925
    Abstract: Apparatuses, systems, and techniques to enhance video are disclosed. In at least one embodiment, one or more neural networks are used to create, from a first video, a second video having one or more additional video frames.
    Type: Application
    Filed: September 3, 2019
    Publication date: March 4, 2021
    Inventors: Kevin Shih, Aysegul Dundar, Animesh Garg, Robert Pottorff, Andrew Tao, Bryan Catanzaro
  • Publication number: 20190295228
    Abstract: A neural network architecture is disclosed for performing image in-painting using partial convolution operations. The neural network processes an image and a corresponding mask that identifies holes in the image utilizing partial convolution operations, where the mask is used by the partial convolution operation to zero out coefficients of the convolution kernel corresponding to invalid pixel data for the holes. The mask is updated after each partial convolution operation is performed in an encoder section of the neural network. In one embodiment, the neural network is implemented using an encoder-decoder framework with skip links to forward representations of the features at different sections of the encoder to corresponding sections of the decoder.
    Type: Application
    Filed: March 21, 2019
    Publication date: September 26, 2019
    Inventors: Guilin Liu, Fitsum A. Reda, Kevin Shih, Ting-Chun Wang, Andrew Tao, Bryan Catanzaro
  • Publication number: 20190297326
    Abstract: A neural network architecture is disclosed for performing video frame prediction using a sequence of video frames and corresponding pairwise optical flows. The neural network processes the sequence of video frames and optical flows utilizing three-dimensional convolution operations, where time (or multiple video frames in the sequence of video frames) provides the third dimension in addition to the two-dimensional pixel space of the video frames. The neural network generates a set of parameters used to predict a next video frame in the sequence of video frames by sampling a previous video frame utilizing spatially-displaced convolution operations. In one embodiment, the set of parameters includes a displacement vector and at least one convolution kernel per pixel. Generating a pixel value in the next video frame includes applying the convolution kernel to a corresponding patch of pixels in the previous video frame based on the displacement vector.
    Type: Application
    Filed: March 21, 2019
    Publication date: September 26, 2019
    Inventors: Fitsum A. Reda, Guilin Liu, Kevin Shih, Robert Kirby, Jonathan Barker, David Tarjan, Andrew Tao, Bryan Catanzaro
  • Patent number: 10315860
    Abstract: An article handling device is provided. In one embodiment, a system for article handling is provided. The system has first and second article handling devices that each have one or more gripping assemblies arranged around their periphery. The gripping assemblies are selectably movable between an open position and a closed position based on the relative position of the first and second article handling devices.
    Type: Grant
    Filed: May 18, 2017
    Date of Patent: June 11, 2019
    Assignee: The Procter and Gamble Company
    Inventors: Jeffrey Kyle Werner, Robert Scott Bollinger, Cedric Dsouza, Kevin Shih Shaw
  • Patent number: 10142495
    Abstract: A system and method for managed device data collection includes a data collector controller for control of monitoring activity of networked multifunction peripherals. A user interface includes a display rendering a plurality of processor rendered interactive user configuration screens. Displayed configuration screens solicit and receive corresponding user input. The configuring screens facilitate setting device user interaction including setting a network address, network connectivity testing, modification of device certificates, changing network settings, and a testing discovery, registration or data transfer mechanism for multifunction peripheral device data collection. A data storage stores user selection data received via rendered configuration screens and the processor outputs stored user selection data as configuration data for data collection from the multifunction peripherals.
    Type: Grant
    Filed: March 10, 2017
    Date of Patent: November 27, 2018
    Assignees: Kabushiki Kaisha Toshiba, Toshiba TEC Kabushiki Kaisha
    Inventors: Kevin Shih, Dehua Zhao
  • Publication number: 20180262629
    Abstract: A system and method for managed device data collection includes a data collector controller for control of monitoring activity of networked multifunction peripherals. A user interface includes a display rendering a plurality of processor rendered interactive user configuration screens. Displayed configuration screens solicit and receive corresponding user input. The configuring screens facilitate setting device user interaction including setting a network address, network connectivity testing, modification of device certificates, changing network settings, and a testing discovery, registration or data transfer mechanism for multifunction peripheral device data collection. A data storage stores user selection data received via rendered configuration screens and the processor outputs stored user selection data as configuration data for data collection from the multifunction peripherals.
    Type: Application
    Filed: March 10, 2017
    Publication date: September 13, 2018
    Inventors: Kevin Shih, Dehua Zhao
  • Publication number: 20170341878
    Abstract: An article handling device is provided. In one embodiment, a system for article handling is provided. The system has first and second article handling devices that each have one or more gripping assemblies arranged around their periphery. The gripping assemblies are selectably movable between an open position and a closed position based on the relative position of the first and second article handling devices.
    Type: Application
    Filed: May 18, 2017
    Publication date: November 30, 2017
    Inventors: Jeffrey Kyle Werner, Robert Scott Bollinger, Cedric Dsouza, Kevin Shih Shaw
  • Publication number: 20160217157
    Abstract: Products (e.g., books) often include a significant amount of informative textual information that can be used in identifying the item. An input query image is a photo (e.g., a picture taken using a mobile phone) of a product. The photo is taken from an arbitrary angle and orientation, and includes an arbitrary background (e.g., a background with significant clutter). From the query image, the identification server retrieves the corresponding clean catalog image from a database. For example, the database may be a product database having a name of the product, image of the product, price of the product, sales history for the product, or any suitable combination thereof. The retrieval is performed by both matching the image with the images in the database and matching text retrieved from the image with the text in the database.
    Type: Application
    Filed: December 17, 2015
    Publication date: July 28, 2016
    Inventors: Kevin Shih, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu
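
Illustrative sketch: the abstract of publication 20190295228 (listed above) describes a partial convolution in which a mask zeroes out kernel coefficients over hole pixels and is updated after each operation. The NumPy code below is only a minimal, single-channel illustration of that idea, not the claimed implementation. The function name `partial_conv2d`, the rescaling by the fraction of valid pixels, and the specific mask-update rule (an output location becomes valid if its window contained at least one valid pixel) are assumptions introduced here; the abstract states only that the mask is updated.

```python
import numpy as np

def partial_conv2d(image, mask, kernel):
    """Hypothetical single-channel, stride-1, no-padding partial convolution.

    image, mask: 2-D arrays of equal shape; mask is 1 for valid pixels, 0 for holes.
    kernel: 2-D array of convolution coefficients.
    Returns (output, updated_mask).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    new_mask = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            m = mask[i:i + kh, j:j + kw]
            valid = m.sum()
            if valid > 0:
                patch = image[i:i + kh, j:j + kw]
                # Multiplying by the mask zeroes the kernel coefficients
                # that line up with hole pixels.
                # Assumption: rescale by (window size / valid count) so the
                # response does not shrink when part of the window is missing.
                out[i, j] = np.sum(kernel * m * patch) * (kh * kw / valid)
                # Assumed update rule: mark this location as valid.
                new_mask[i, j] = 1.0
    return out, new_mask

# Example: a constant image with one corrupted (hole) pixel.
image = np.ones((4, 4))
image[0, 0] = 999.0          # garbage value inside the hole
mask = np.ones((4, 4))
mask[0, 0] = 0.0             # mark the hole
kernel = np.ones((3, 3)) / 9.0
output, updated_mask = partial_conv2d(image, mask, kernel)
```

With an averaging kernel, the hole pixel's value does not leak into the output, and every output location is marked valid because each window contains at least one valid pixel.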