Patents by Inventor Yida Wang

Yida Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Efficient utilization of processing element array

Patent number: 12198041

Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.

Type: Grant

Filed: July 14, 2023

Date of Patent: January 14, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
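
The under-utilization check described in the abstract above can be illustrated with a minimal sketch. It assumes one row per input matrix, as the abstract states. The function names, the 128-row array size, the threshold of 64 and the even-spreading heuristic are all hypothetical and are not taken from the patent.

```python
def rows_used(num_input_matrices: int, array_rows: int) -> int:
    """Rows busy at once when each input matrix occupies one row."""
    return min(num_input_matrices, array_rows)


def is_underutilized(num_input_matrices: int, array_rows: int,
                     threshold_rows: int) -> bool:
    """True when fewer than threshold_rows rows would be used concurrently."""
    return rows_used(num_input_matrices, array_rows) < threshold_rows


def rows_per_matrix(num_input_matrices: int, array_rows: int) -> int:
    """Hypothetical heuristic: spread each input matrix over as many
    rows as fit evenly into the array (at least one row each)."""
    return max(1, array_rows // num_input_matrices)


# Example: 3 input matrices (e.g. RGB channels) on an assumed 128-row array.
spread = rows_per_matrix(3, 128)
print(is_underutilized(3, 128, threshold_rows=64))  # True
print(spread, spread * 3)                           # 42 126
```

In this example only 3 of 128 rows would be busy, so the convolution is flagged; spreading each input matrix over 42 rows would raise the concurrent row count to 126.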

Hierarchical partitioning of operators

Patent number: 12182688

Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators. The neural network operators are compiled into executable code for the acceleration engine. Development of new framework-level operators can exceed the capability to map the newly developed framework-level operators onto the acceleration engine. To enable neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning can identify operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.

Type: Grant

Filed: November 27, 2019

Date of Patent: December 31, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Animesh Jain, Yizhi Liu, Hongbin Zheng, Jeffrey T. Huynh, Haichen Li, Drazen Borkovic, Jindrich Zejda, Richard John Heaton, Randy Renfu Huang, Zhi Chen, Yida Wang
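
The three-way split described in the abstract above can be sketched as a simple classification pass. This is an illustration only: the `Target` enum, the `partition_operators` function and the operator names are hypothetical, and the sketch assumes the sets of accelerator-supported and host-compilable operators are known in advance.

```python
from enum import Enum


class Target(Enum):
    ACCELERATOR = "accelerator"   # compiled for the acceleration engine
    HOST = "host"                 # compiled for the host processor
    FRAMEWORK = "framework"       # left to the machine learning framework


def partition_operators(operators, accelerator_ops, host_ops):
    """Assign each operator to the first tier that supports it."""
    plan = []
    for op in operators:
        if op in accelerator_ops:
            plan.append((op, Target.ACCELERATOR))
        elif op in host_ops:
            plan.append((op, Target.HOST))
        else:
            plan.append((op, Target.FRAMEWORK))
    return plan


# Hypothetical operator names and support sets.
plan = partition_operators(
    ["conv2d", "relu", "topk", "custom_nms"],
    accelerator_ops={"conv2d", "relu"},
    host_ops={"topk"},
)
for op, target in plan:
    print(op, target.value)
```

The fall-through order (accelerator, then host, then framework) mirrors the hierarchy named in the abstract; how a real compiler groups adjacent operators into subgraphs is not covered here.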

EFFICIENT RECOVERY FROM FAILURES DURING DISTRIBUTED TRAINING OF MACHINE LEARNING MODELS

Publication number: 20240428082

Abstract: A placement plan for training state checkpoints of a machine learning model is generated based at least in part on a number of training servers of a distributed training environment. The plan indicates, with respect to an individual server, one or more other servers at which replicas of training state checkpoints of the individual server are to be stored. During selected periods of one or more training iterations of the model, respective portions of a replica of a training state checkpoint of a first server are transmitted to a second server selected based on the placement plan. After an event causes disruption of the training iterations, one of the checkpoints generated at the first server is retrieved from the second server and used to resume the training iterations.

Type: Application

Filed: October 20, 2023

Publication date: December 26, 2024

Applicant: Amazon Technologies, Inc.

Inventors: Zhuang Wang, Zhen Jia, Shuai Zheng, Zhen Zhang, Xinwei Fu, Yida Wang
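
A placement plan of the kind described in the abstract above could, under one simple assumption, be a ring: each server stores its checkpoint replicas on the next servers in index order. The abstract does not specify the placement rule, so the ring layout, the function names and the `replicas` parameter below are hypothetical.

```python
def placement_plan(num_servers: int, replicas: int = 1) -> dict:
    """Map each server index to the servers that hold its checkpoint replicas.

    Assumed ring layout: server s replicates to s+1, ..., s+replicas
    (modulo num_servers), so no server stores its own replica.
    """
    if num_servers < 2:
        raise ValueError("need at least two servers to replicate")
    if not 1 <= replicas < num_servers:
        raise ValueError("replicas must be in [1, num_servers - 1]")
    return {
        s: [(s + k) % num_servers for k in range(1, replicas + 1)]
        for s in range(num_servers)
    }


def recovery_source(plan: dict, failed: int, alive: set) -> int:
    """Pick the first surviving server that holds a replica for `failed`."""
    for holder in plan[failed]:
        if holder in alive:
            return holder
    raise RuntimeError("no surviving replica for server %d" % failed)


plan = placement_plan(4, replicas=2)
print(plan[3])                                      # [0, 1]
print(recovery_source(plan, failed=3, alive={1, 2}))  # 1
```

After a disruption, the checkpoint of the failed server is fetched from whichever replica holder is still alive, which corresponds to the retrieval step in the abstract.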

EFFICIENT UTILIZATION OF PROCESSING ELEMENT ARRAY

Publication number: 20230359876

Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.

Type: Application

Filed: July 14, 2023

Publication date: November 9, 2023

Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic

Performing hardware operator fusion

Patent number: 11809981

Abstract: A method of generating executable instructions for a computing system is provided. The method comprises: receiving a first set of instructions including a kernel of a first operator and a kernel of a second operator, the kernel of the first operator including instructions of the first operator and write instructions to a virtual data node, the kernel of the second operator including instructions of the second operator and read instructions to the virtual data node; determining, based on a mapping between the write instructions and read instructions, instructions of data transfer operations between the first operator and the second operator; and generating a second set of instructions representing a fused operator of the first operator and the second operator, the second set of instructions including the instructions of the first operator, the instructions of the second operator, and the instructions of the data transfer operations.

Type: Grant

Filed: November 27, 2019

Date of Patent: November 7, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Animesh Jain, Tobias Joseph Kastulus Edler von Koch, Yizhi Liu, Taemin Kim, Jindrich Zejda, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang

Unified optimization for convolutional neural network model inference on integrated graphics processing units

Patent number: 11797876

Abstract: Techniques for optimizing and deploying convolutional neural network (CNN) machine learning models for inference using integrated graphics processing units are described. A model compilation system optimizes CNN models using optimized vision-specific operators as well as both graph-level tuning and tensor-level tuning to explore the optimization space for achieving heightened performance. The model compilation system may also implement a heuristic-based two-stage technique for falling back certain operators of CNN models to use CPUs when needed or otherwise beneficial.

Type: Grant

Filed: June 26, 2019

Date of Patent: October 24, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Leyuan Wang, Yida Wang, Mu Li, Zhi Chen, Yizhi Liu, Yao Wang

Efficient utilization of processing element array

Patent number: 11741350

Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.

Type: Grant

Filed: November 27, 2019

Date of Patent: August 29, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic

HIERARCHICAL PARTITIONING OF OPERATORS

Publication number: 20210158131

Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators. The neural network operators are compiled into executable code for the acceleration engine. Development of new framework-level operators can exceed the capability to map the newly developed framework-level operators onto the acceleration engine. To enable neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning can identify operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.

Type: Application

Filed: November 27, 2019

Publication date: May 27, 2021

Inventors: Animesh Jain, Yizhi Liu, Hongbin Zheng, Jeffrey T. Huynh, Haichen Li, Drazen Borkovic, Jindrich Zejda, Richard John Heaton, Randy Renfu Huang, Zhi Chen, Yida Wang

EFFICIENT UTILIZATION OF PROCESSING ELEMENT ARRAY

Publication number: 20210158132

Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.

Type: Application

Filed: November 27, 2019

Publication date: May 27, 2021

Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic

Systems and methods for reducing memory bandwidth using low quality tiles

Patent number: 10410398

Abstract: Systems and methods are disclosed for displaying data on a display device. An example method of displaying data on a display device includes computing a texture based on a difference between a high quality (HQ) tile and a corresponding low quality (LQ) tile. The method also includes storing the texture into an alpha channel of the LQ tile. The method further includes compositing the LQ tile onto the display device when an attribute of the alpha channel satisfies a threshold.

Type: Grant

Filed: February 20, 2015

Date of Patent: September 10, 2019

Assignee: QUALCOMM Incorporated

Inventors: Shiu Wai Hui, Veluppillai Arulesan, Yida Wang

Partial rasterization of web page tiles

Patent number: 9904665

Abstract: A method and device for rasterizing content includes partitioning a webpage into webpage tiles that are associated with a front buffer and a back buffer. A rasterized version of each of the tiles may be stored in the associated front buffer, and each may include new content. If a previous copy of the at least one tile is found in memory, the new content is rasterized onto the previous copy. If a previous copy is not found, and if the proportion of the new content of the at least one tile is less than a threshold, the new content is rasterized onto the front buffer. If the proportion of the new content is above the threshold, and if unchanged content in the at least one tile is complex, then the unchanged content is copied to the back buffer and the new content is rasterized onto the associated back buffer.

Type: Grant

Filed: October 15, 2015

Date of Patent: February 27, 2018

Assignee: QUALCOMM Innovation Center, Inc.

Inventors: Shiu Wai Hui, Yida Wang, Veluppillai Arulesan
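
The per-tile decision described in the abstract above can be written as a small branch structure. This sketch encodes only the cases the abstract names; the `Action` enum, the `choose_action` function and its parameters are hypothetical, and any combination the abstract does not mention is reported as not covered rather than guessed.

```python
from enum import Enum


class Action(Enum):
    RASTERIZE_ONTO_PREVIOUS_COPY = 1
    RASTERIZE_ONTO_FRONT_BUFFER = 2
    COPY_UNCHANGED_THEN_RASTERIZE_ONTO_BACK_BUFFER = 3
    NOT_COVERED_BY_ABSTRACT = 4


def choose_action(has_previous_copy: bool, new_content_ratio: float,
                  threshold: float, unchanged_is_complex: bool) -> Action:
    """Pick a rasterization path for one tile, following the abstract's cases."""
    if has_previous_copy:
        return Action.RASTERIZE_ONTO_PREVIOUS_COPY
    if new_content_ratio < threshold:
        return Action.RASTERIZE_ONTO_FRONT_BUFFER
    if new_content_ratio > threshold and unchanged_is_complex:
        return Action.COPY_UNCHANGED_THEN_RASTERIZE_ONTO_BACK_BUFFER
    # e.g. large change with simple unchanged content: not stated in the abstract
    return Action.NOT_COVERED_BY_ABSTRACT


# Assumed threshold of 0.5 for illustration.
print(choose_action(False, 0.2, 0.5, unchanged_is_complex=True).name)
print(choose_action(False, 0.8, 0.5, unchanged_is_complex=True).name)
```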

Bandwidth reduction using vertex shader

Patent number: 9811940

Abstract: In graphics rendering, a texture tile is divided into a plurality of partitions, each partition having a plurality of vertices. A map indicates, for each partition, whether each partition comprises a constant color. Then the plurality of vertices are transferred to a vertex shader, which determines that at least one of the partitions comprises a constant color partition. A vertex shader applies a vertex transformation that associates a set of texel coordinates from the texture tile to each of the vertices of the constant color partition to generate a set of associated texel coordinates. A first coordinate of the set of associated texel coordinates is set to zero. A pixel shader interpolates the associated texel coordinates to generate an interpolated value and accesses a single texel in the constant color partition that corresponds to the interpolated value.

Type: Grant

Filed: September 15, 2015

Date of Patent: November 7, 2017

Assignee: QUALCOMM Innovation Center, Inc.

Inventors: Shiu Wai Hui, Yida Wang, Stewart Chao

Bandwidth reduction using texture lookup by adaptive shading

Patent number: 9569862

Abstract: An example method of providing a solid texture map to a graphics processing unit (GPU) includes dividing a tile of renderable content into a plurality of partitions. The method also includes determining that a set of partitions of the plurality of partitions is a solid color. The method further includes generating a solid texture map indicating that the set of partitions of the plurality of partitions is a solid color. The method also includes providing access to the solid texture map to a GPU.

Type: Grant

Filed: February 23, 2015

Date of Patent: February 14, 2017

Assignee: QUALCOMM Incorporated

Inventors: Yida Wang, Shiu Wai Hui, Stewart Chao
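
The solid texture map described in the abstract above can be sketched on the CPU side as follows. The sketch assumes a tile is a 2D list of pixel values and that partitions are square blocks of a fixed size; the function name, the tile representation and the partition size are hypothetical, and how the map is handed to the GPU is not shown.

```python
def solid_texture_map(tile, part_size):
    """Return a 2D list of booleans, one per partition of the tile.

    An entry is True when every pixel in that partition has the same
    value, i.e. the partition is a solid color.
    """
    rows, cols = len(tile), len(tile[0])
    result = []
    for top in range(0, rows, part_size):
        flags = []
        for left in range(0, cols, part_size):
            colors = {
                tile[y][x]
                for y in range(top, min(top + part_size, rows))
                for x in range(left, min(left + part_size, cols))
            }
            flags.append(len(colors) == 1)
        result.append(flags)
    return result


# 4x4 tile split into 2x2 partitions; values stand in for pixel colors.
tile = [
    [1, 1, 2, 3],
    [1, 1, 4, 5],
    [7, 7, 9, 9],
    [7, 8, 9, 9],
]
print(solid_texture_map(tile, 2))  # [[True, False], [False, True]]
```

A renderer could then fetch a single texel for each partition marked True instead of sampling the whole partition, which is the bandwidth saving the title refers to.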

SYSTEMS AND METHODS FOR REDUCING MEMORY BANDWIDTH USING LOW QUALITY TILES

Publication number: 20160247310

Abstract: Systems and methods are disclosed for displaying data on a display device. An example method of displaying data on a display device includes computing a texture based on a difference between a high quality (HQ) tile and a corresponding low quality (LQ) tile. The method also includes storing the texture into an alpha channel of the LQ tile. The method further includes compositing the LQ tile onto the display device when an attribute of the alpha channel satisfies a threshold.

Type: Application

Filed: February 20, 2015

Publication date: August 25, 2016

Inventors: Shiu Wai Hui, Veluppillai Arulesan, Yida Wang

BANDWIDTH REDUCTION USING VERTEX SHADER

Publication number: 20160140737

Abstract: In graphics rendering, a texture tile is divided into a plurality of partitions, each partition having a plurality of vertices. A map indicates, for each partition, whether each partition comprises a constant color. Then the plurality of vertices are transferred to a vertex shader, which determines that at least one of the partitions comprises a constant color partition. A vertex shader applies a vertex transformation that associates a set of texel coordinates from the texture tile to each of the vertices of the constant color partition to generate a set of associated texel coordinates. A first coordinate of the set of associated texel coordinates is set to zero. A pixel shader interpolates the associated texel coordinates to generate an interpolated value and accesses a single texel in the constant color partition that corresponds to the interpolated value.

Type: Application

Filed: September 15, 2015

Publication date: May 19, 2016

Inventors: Shiu Wai Hui, Yida Wang, Stewart Chao

PARTIAL RASTERIZATION OF WEB PAGE TILES

Publication number: 20160110323

Abstract: A method and device for rasterizing content includes partitioning a webpage into webpage tiles that are associated with a front buffer and a back buffer. A rasterized version of each of the tiles may be stored in the associated front buffer, and each may include new content. If a previous copy of the at least one tile is found in memory, the new content is rasterized onto the previous copy. If a previous copy is not found, and if the proportion of the new content of the at least one tile is less than a threshold, the new content is rasterized onto the front buffer. If the proportion of the new content is above the threshold, and if unchanged content in the at least one tile is complex, then the unchanged content is copied to the back buffer and the new content is rasterized onto the associated back buffer.

Type: Application

Filed: October 15, 2015

Publication date: April 21, 2016

Inventors: Shiu Wai Hui, Yida Wang, Veluppillai Arulesan

BANDWIDTH REDUCTION USING TEXTURE LOOKUP BY ADAPTIVE SHADING

Publication number: 20160048980

Abstract: An example method of providing a solid texture map to a graphics processing unit (GPU) includes dividing a tile of renderable content into a plurality of partitions. The method also includes determining that a set of partitions of the plurality of partitions is a solid color. The method further includes generating a solid texture map indicating that the set of partitions of the plurality of partitions is a solid color. The method also includes providing access to the solid texture map to a GPU.

Type: Application

Filed: February 23, 2015

Publication date: February 18, 2016

Inventors: Yida Wang, Shiu Wai Hui, Stewart Chao