Patents by Inventor Yinhao ZHU

Yinhao ZHU has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DEPTH COMPLETION USING ATTENTION-BASED REFINEMENT OF FEATURES

Publication number: 20250148628

Abstract: Systems and techniques are provided for generating depth information from one or more images. For example, a process can include obtaining a first depth map corresponding to an input comprising an image of the one or more images and a sparse depth measurement. A three-dimensional (3D) point cloud can be generated based on the first depth map and multi-scale visual features of the input, wherein the 3D point cloud includes a plurality of 3D point features uplifted from the multi-scale visual features. At least a portion of the plurality of 3D point features can be processed using one or more self-attention layers to generate refined 3D point features. A two-dimensional (2D) projection of the refined 3D point features can be generated and a second depth map can be generated based on the 2D projection of the refined 3D point features.

Type: Application

Filed: April 11, 2024

Publication date: May 8, 2025

Inventors: Yunxiao SHI, Hong CAI, Manish Kumar SINGH, Shizhong Steve HAN, Yinhao ZHU, Fatih Murat PORIKLI

DEPTH ESTIMATION BASED ON FEATURE RECONSTRUCTION WITH ADAPTIVE MASKING AND MOTION PREDICTION

Publication number: 20250148633

Abstract: Systems and techniques are provided for generating depth information. For example, a process can include obtaining a first feature volume including visual features corresponding to each respective frame included in a first set of frames. A first query generator network can generate reconstruction features associated with a reconstructed feature volume corresponding to the first feature volume. Based on the first feature volume, a second query generator network can generate motion features associated with predicted future motion corresponding to the first feature volume. An initial depth prediction can be generated for each respective frame based on cross-attention between features of a depth prediction decoder, the reconstruction features, and the motion features. A refined depth prediction can be generated for each respective frame based on cross-attention between the initial depth prediction, the reconstruction features, and the motion features.

Type: Application

Filed: May 16, 2024

Publication date: May 8, 2025

Inventors: Rajeev YASARLA, Hong CAI, Risheek GARREPALLI, Yinhao ZHU, Jisoo JEONG, Yunxiao SHI, Manish Kumar SINGH, Fatih Murat PORIKLI

Three-dimensional object part segmentation using a machine learning model

Patent number: 12229883

Abstract: Systems and techniques are provided for part segmentation. For example, a process for performing part segmentation can include obtaining a three-dimensional capture of an object. The method can include generating one or more two-dimensional images of the object from the three-dimensional capture of the object. The method can further include processing the one or more two-dimensional images of the object to generate at least one two-dimensional bounding box associated with a part of the object. The method can include performing three-dimensional part segmentation of the part of the object based on a three-dimensional point cloud generated from the one or more two-dimensional images of the object and the at least one two-dimensional bounding box and based on semantically labeled super points which are merged into subgroups associated with the part of the object.

Type: Grant

Filed: March 1, 2023

Date of Patent: February 18, 2025

Assignee: QUALCOMM Incorporated

Inventors: Minghua Liu, Yinhao Zhu, Hong Cai, Fatih Murat Porikli, Hao Su

TEST-TIME SELF-SUPERVISED GUIDANCE FOR DIFFUSION MODELS

Publication number: 20240412493

Abstract: Systems and techniques are provided for processing image data. According to some aspects, a computing device can generate a gradient (e.g., a classifier gradient using a trained classifier) associated with a current sample. The computing device can combine the gradient with an iterative model estimated score function or data associated with the current sample to generate a score function estimate. The computing device can predict, using the diffusion machine learning model and based on the score function estimate, a new sample.

Type: Application

Filed: December 12, 2023

Publication date: December 12, 2024

Inventors: Risheek GARREPALLI, Yunxiao SHI, Hong CAI, Yinhao ZHU, Shubhankar Mangesh BORSE, Jisoo JEONG, Debasmit DAS, Manish Kumar SINGH, Rajeev YASARLA, Shizhong Steve HAN, Fatih Murat PORIKLI

PLANAR MESH RECONSTRUCTION USING IMAGES FROM MULTIPLE CAMERA POSES

Publication number: 20240386650

Abstract: Systems and techniques are provided for processing image data corresponding to a scene. A process can include generating a planar distance map including a planar distance value for each pixel of at least one image corresponding to the scene. Planar segmentation is performed based on the planar distance map, a normal map corresponding to the at least one image, and positional encoding information of the planar distance map. A triangular mesh fragment is initialized based on sampling points from each planar segment of a plurality of planar segments from the planar segmentation. Ray-triangle intersections are determined based on performing ray casting for a reconstructed planar mesh including a plurality of triangular mesh fragments each corresponding to a different image. A planar reconstruction and segmentation machine learning network is optimized for the scene, based on training the planar reconstruction and segmentation machine learning network using one or more loss functions.

Type: Application

Filed: November 14, 2023

Publication date: November 21, 2024

Inventors: Farhad GHAZVINIAN ZANJANI, Leyla MIRVAKHABOVA, Yinhao ZHU, Hong CAI, Fatih Murat PORIKLI

Neural image compression with controllable spatial bit allocation

Patent number: 12132919

Abstract: A processor-implemented method for image compression using an artificial neural network (ANN) includes receiving, at an encoder of the ANN, an image and a spatial segmentation map corresponding to the image. The spatial segmentation map indicates one or more regions of interest. The encoder compresses the image according to a controllable spatial bit allocation. The controllable spatial bit allocation is based on a learned quantization bin size.

Type: Grant

Filed: November 15, 2022

Date of Patent: October 29, 2024

Assignee: QUALCOMM Incorporated

Inventors: Yang Yang, Hoang Cong Minh Le, Yinhao Zhu, Reza Pourreza, Amir Said, Yizhe Zhang, Taco Sebastiaan Cohen

Transformer-based architecture for transform coding of media

Patent number: 12120348

Abstract: Systems and techniques are described herein for processing media data using a neural network system. For instance, a process can include obtaining a latent representation of a frame of encoded image data and generating, by a plurality of decoder transformer layers of a decoder sub-network using the latent representation of the frame of encoded image data as input, a frame of decoded image data. At least one decoder transformer layer of the plurality of decoder transformer layers includes: one or more transformer blocks for generating one or more patches of features and determining self-attention locally within one or more window partitions and shifted window partitions applied over the one or more patches; and a patch un-merging engine for decreasing a respective size of each patch of the one or more patches.

Type: Grant

Filed: September 27, 2021

Date of Patent: October 15, 2024

Assignee: QUALCOMM INCORPORATED

Inventors: Yinhao Zhu, Yang Yang, Taco Sebastiaan Cohen

PHYSICALLY-BASED EMITTER ESTIMATION FOR INDOOR SCENES

Publication number: 20240303913

Abstract: Systems and techniques are provided for physically-based light estimation for inverse rendering of indoor scenes. For example, a computing device can obtain an estimated scene geometry based on a multi-view observation of a scene. The computing device can further obtain a light emission mask based on the multi-view observation of the scene. The computing device can also obtain an emitted radiance field based on the multi-view observation of the scene. The computing device can then determine, based on the light emission mask and the emitted radiance field, a geometry of at least one light source of the estimated scene geometry.

Type: Application

Filed: March 8, 2023

Publication date: September 12, 2024

Inventors: Yinhao ZHU, Rui ZHU, Hong CAI, Fatih Murat PORIKLI

Progressive data compression using artificial neural networks

Patent number: 12008731

Abstract: Certain aspects of the present disclosure provide techniques for compressing content using a neural network. An example method generally includes receiving content for compression. The content is encoded into a first latent code space through an encoder implemented by an artificial neural network trained to generate a latent space representation of the content. A first compressed version of the encoded content is generated using a first quantization bin size of a series of quantization bin sizes. A refined compressed version of the encoded content is generated by scaling the first compressed version of the encoded content into one or more second quantization bin sizes smaller than the first quantization bin size, conditioned at least on a value of the first compressed version of the encoded content. The refined compressed version of the encoded content is output for transmission.

Type: Grant

Filed: January 24, 2022

Date of Patent: June 11, 2024

Assignee: QUALCOMM Incorporated

Inventors: Yadong Lu, Yang Yang, Yinhao Zhu, Amir Said, Taco Sebastiaan Cohen

SCALING FOR DEPTH ESTIMATION

Publication number: 20240177329

Abstract: Systems and techniques are provided for processing sensor data. For example, a process can include determining, using a trained machine learning system, a predicted depth map for an image, the predicted depth map including a respective predicted depth value for each pixel of the image. The process can further include obtaining depth values for the image, the depth values including depth values for less than all pixels of the image from a tracker configured to determine the depth values based on one or more feature points between frames. The process can further include scaling the predicted depth map for the image using the depth values. The output of the process can be scale-correct depth prediction values.

Type: Application

Filed: October 4, 2023

Publication date: May 30, 2024

Inventors: Hong CAI, Yinhao ZHU, Jisoo JEONG, Yunxiao SHI, Fatih Murat PORIKLI

THREE-DIMENSIONAL OBJECT PART SEGMENTATION USING A MACHINE LEARNING MODEL

Publication number: 20240144589

Abstract: Systems and techniques are provided for part segmentation. For example, a process for performing part segmentation can include obtaining a three-dimensional capture of an object. The method can include generating one or more two-dimensional images of the object from the three-dimensional capture of the object. The method can further include processing the one or more two-dimensional images of the object to generate at least one two-dimensional bounding box associated with a part of the object. The method can include performing three-dimensional part segmentation of the part of the object based on a three-dimensional point cloud generated from the one or more two-dimensional images of the object and the at least one two-dimensional bounding box and based on semantically labeled super points which are merged into subgroups associated with the part of the object.

Type: Application

Filed: March 1, 2023

Publication date: May 2, 2024

Inventors: Minghua LIU, Yinhao ZHU, Hong CAI, Fatih Murat PORIKLI, Hao SU

Variable bit rate compression using neural network models

Patent number: 11943460

Abstract: A computer-implemented method for operating an artificial neural network (ANN) includes receiving an input by the ANN. The ANN generates a latent representation of the input. The latent representation is communicated according to a bit rate based on a learned latent scaling parameter. The latent scaling parameter is learned based on a channel index and a tradeoff parameter value that corresponds to a value that balances the bit rate and a distortion.

Type: Grant

Filed: January 11, 2022

Date of Patent: March 26, 2024

Assignee: QUALCOMM INCORPORATED

Inventors: Yadong Lu, Yang Yang, Yinhao Zhu, Amir Said, Reza Pourreza, Taco Sebastiaan Cohen

Data compression with a multi-scale autoencoder

Patent number: 11798197

Abstract: A method of image compression includes receiving an image. Multiple quantized latent representations are generated to represent features of the image. Each of the quantized latent representations has a different resolution and is generated at staggered timings. Each of the later generated quantized latent representations is conditioned on each of the prior generated quantized latent representations. The multiple quantized latent representations are decoded to reconstruct the image.

Type: Grant

Filed: March 12, 2021

Date of Patent: October 24, 2023

Assignee: QUALCOMM Incorporated

Inventors: Hoang Cong Minh Le, Reza Pourreza, Yang Yang, Yinhao Zhu, Amir Said, Yizhe Zhang, Taco Sebastiaan Cohen

ENTROPY CODING FOR NEURAL-BASED MEDIA COMPRESSION

Publication number: 20230262267

Abstract: This disclosure describes entropy coding techniques for media data coded using neural-based techniques. A media coder is configured to determine a probability distribution function parameter for a data element of a data stream coded by a neural-based media compression technique, wherein the probability distribution function parameter is a logarithmic function of a standard deviation of a probability distribution function of the data stream, determine a code vector based on the probability distribution function parameter, and entropy code the data element using the code vector.

Type: Application

Filed: February 11, 2022

Publication date: August 17, 2023

Inventors: Amir Said, Yinhao Zhu

FLOW-AGNOSTIC NEURAL VIDEO COMPRESSION

Publication number: 20230169694

Abstract: A processor-implemented method for video compression using an artificial neural network (ANN) includes receiving a video via the ANN. The ANN extracts a first set of features of a current frame of the video and a second set of features of a reference frame of the video. The ANN determines an estimate of correlation features between the first set of features of the current frame and the second set of features of the reference frame. The estimate of the correlation features is encoded and transmitted to a receiver.

Type: Application

Filed: October 27, 2022

Publication date: June 1, 2023

Inventors: Hoang Cong Minh LE, Reza POURREZA, Yang YANG, Yinhao ZHU, Amir SAID, Taco Sebastiaan COHEN

NEURAL IMAGE COMPRESSION WITH CONTROLLABLE SPATIAL BIT ALLOCATION

Publication number: 20230156207

Abstract: A processor-implemented method for image compression using an artificial neural network (ANN) includes receiving, at an encoder of the ANN, an image and a spatial segmentation map corresponding to the image. The spatial segmentation map indicates one or more regions of interest. The encoder compresses the image according to a controllable spatial bit allocation. The controllable spatial bit allocation is based on a learned quantization bin size.

Type: Application

Filed: November 15, 2022

Publication date: May 18, 2023

Inventors: Yang YANG, Hoang Cong Minh LE, Yinhao ZHU, Reza POURREZA, Amir SAID, Yizhe ZHANG, Taco Sebastiaan COHEN

Multi-scale optical flow for learned video compression

Patent number: 11638025

Abstract: Systems and techniques are described for encoding and/or decoding data based on motion estimation that applies variable-scale warping. An encoding device can receive an input frame and a reference frame that depict a scene at different times. The encoding device can generate an optical flow identifying movements in the scene between the two frames. The encoding device can generate a weight map identifying how finely or coarsely the reference frame can be warped for input frame prediction. The encoding device can generate encoded video data based on the optical flow and the weight map. A decoding device can generate a reconstructed optical flow and a reconstructed weight map from the encoded data. A decoding device can generate a prediction frame by warping the reference frame based on the reconstructed optical flow and the reconstructed weight map. The decoding device can generate a reconstructed input frame based on the prediction frame.

Type: Grant

Filed: March 19, 2021

Date of Patent: April 25, 2023

Assignee: QUALCOMM Incorporated

Inventors: Reza Pourreza, Amir Said, Yang Yang, Yinhao Zhu, Taco Sebastiaan Cohen

TRANSFORMER-BASED ARCHITECTURE FOR TRANSFORM CODING OF MEDIA

Publication number: 20230100413

Abstract: Systems and techniques are described herein for processing media data using a neural network system. For instance, a process can include obtaining a latent representation of a frame of encoded image data and generating, by a plurality of decoder transformer layers of a decoder sub-network using the latent representation of the frame of encoded image data as input, a frame of decoded image data. At least one decoder transformer layer of the plurality of decoder transformer layers includes: one or more transformer blocks for generating one or more patches of features and determining self-attention locally within one or more window partitions and shifted window partitions applied over the one or more patches; and a patch un-merging engine for decreasing a respective size of each patch of the one or more patches.

Type: Application

Filed: September 27, 2021

Publication date: March 30, 2023

Inventors: Yinhao ZHU, Yang YANG, Taco Sebastiaan COHEN

MULTI-SCALE OPTICAL FLOW FOR LEARNED VIDEO COMPRESSION

Publication number: 20220303568

Abstract: Systems and techniques are described for encoding and/or decoding data based on motion estimation that applies variable-scale warping. An encoding device can receive an input frame and a reference frame that depict a scene at different times. The encoding device can generate an optical flow identifying movements in the scene between the two frames. The encoding device can generate a weight map identifying how finely or coarsely the reference frame can be warped for input frame prediction. The encoding device can generate encoded video data based on the optical flow and the weight map. A decoding device can generate a reconstructed optical flow and a reconstructed weight map from the encoded data. A decoding device can generate a prediction frame by warping the reference frame based on the reconstructed optical flow and the reconstructed weight map. The decoding device can generate a reconstructed input frame based on the prediction frame.

Type: Application

Filed: March 19, 2021

Publication date: September 22, 2022

Inventors: Reza POURREZA, Amir SAID, Yang YANG, Yinhao ZHU, Taco Sebastiaan COHEN

DATA COMPRESSION WITH A MULTI-SCALE AUTOENCODER

Publication number: 20220292725

Abstract: A method of image compression includes receiving an image. Multiple quantized latent representations are generated to represent features of the image. Each of the quantized latent representations has a different resolution and is generated at staggered timings. Each of the later generated quantized latent representations is conditioned on each of the prior generated quantized latent representations. The multiple quantized latent representations are decoded to reconstruct the image.

Type: Application

Filed: March 12, 2021

Publication date: September 15, 2022

Inventors: Hoang Cong Minh LE, Reza POURREZA, Yang YANG, Yinhao ZHU, Amir SAID, Yizhe ZHANG, Taco Sebastiaan COHEN
