Patents by Inventor Shang-Tse Chuang
Shang-Tse Chuang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250068914
Abstract: A system and method of performing tensor operations with a multi-step operation processing system in a memory-efficient manner. The method includes dividing an N-dimensional tensor into a set of tensor slices, each consisting of one or more consecutive rows; the slices may be further segmented. The tensor slice segments, together with dependency data formed from the tensor dependencies, are used in a tensor operation computation to generate a first result. Each processed slice segment is fused into a result slice by removing the extra data used in the computation. This process is repeated for each slice to be processed, and the result slices are combined into a final processed tensor result.
Type: Application
Filed: October 30, 2024
Publication date: February 27, 2025
Inventors: Suhail Ibrahim Alnahari, Kai-Er Chuang, Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
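For readers who want to see the slice-and-fuse pattern in miniature, the sketch below divides a tensor into row slices, extends each slice with the rows it depends on, computes, and then trims that extra data before fusing. The halo width, slice size, and the use of a 1-D convolution as the stand-in operation are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def process_in_slices(tensor, kernel, slice_rows=4):
    """Sketch: apply a column-wise convolution slice by slice.

    Each slice is extended by a halo of dependent rows (the dependency
    data); after computation the halo is trimmed so the processed
    segments fuse into a clean result slice.
    """
    halo = len(kernel) // 2                      # rows each output row depends on
    out_slices = []
    for start in range(0, tensor.shape[0], slice_rows):
        stop = min(start + slice_rows, tensor.shape[0])
        lo, hi = max(start - halo, 0), min(stop + halo, tensor.shape[0])
        segment = tensor[lo:hi]                  # slice plus its dependencies
        processed = np.apply_along_axis(
            lambda col: np.convolve(col, kernel, mode="same"), 0, segment)
        out_slices.append(processed[start - lo:stop - lo])  # trim the halo
    return np.concatenate(out_slices, axis=0)

x = np.random.rand(16, 8)
k = np.array([0.25, 0.5, 0.25])
ref = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, x)
assert np.allclose(process_in_slices(x, k), ref)  # matches whole-tensor result
```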
-
Publication number: 20250061169
Abstract: A system and method for performing tensor transforms on customized digital hardware. The system comprises a plurality of computational blocks, a control unit, and memory. The control unit is configured to generate indexes for the source tensor slices, read the corresponding source tensor data and weight data from memory, and interpolate the tensor slices based on the weights according to a transformation guide. The source tensor data and tensor weights are sent to multiple computational blocks for parallel generation of interpolated output tensor data. The memory can be configured so that only one address is needed to read multiple tensor dimensions or a tensor slice; additionally, it can accept multiple memory addresses in parallel. The computational block output provides a grid-sampled output tensor.
Type: Application
Filed: August 18, 2023
Publication date: February 20, 2025
Inventors: Jeff Xue, Siyad Ma, Shang-Tse Chuang, Sharad Chole
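Since the abstract describes the output as a grid-sampled tensor, a plain-software analogue may help: bilinear grid sampling reads four source values per output point and blends them with interpolation weights. The coordinate convention and weighting below are ordinary bilinear interpolation assumed for illustration, not the hardware's transformation guide.

```python
import numpy as np

def grid_sample_2d(src, grid):
    """Sketch: bilinear grid sampling of a 2-D source tensor.

    grid[i, j] holds a (row, col) sample coordinate in source-pixel
    units; each output value blends the four neighboring source values.
    """
    h, w = src.shape
    out = np.empty(grid.shape[:2])
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            r, c = grid[i, j]
            r0, c0 = int(np.floor(r)), int(np.floor(c))
            r1, c1 = min(r0 + 1, h - 1), min(c0 + 1, w - 1)
            wr, wc = r - r0, c - c0              # interpolation weights
            out[i, j] = ((1 - wr) * (1 - wc) * src[r0, c0]
                         + (1 - wr) * wc * src[r0, c1]
                         + wr * (1 - wc) * src[r1, c0]
                         + wr * wc * src[r1, c1])
    return out

src = np.arange(16.0).reshape(4, 4)
grid = np.array([[[0.5, 0.5], [1.0, 2.5]]])      # two sample points
print(grid_sample_2d(src, grid))                  # [[2.5 6.5]]
```

Each output point depends only on its own four source reads, which is what lets hardware like this farm the points out to parallel computational blocks.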
-
Patent number: 12229589
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so performing artificial intelligence calculations can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence applications can be performed in parallel, and specialized linear algebra matrix processors can greatly increase computational performance. But even with linear algebra matrix processors, performance can be limited by complex data dependencies. Without proper coordination, linear algebra matrix processors may end up idle or spend large amounts of time moving data around. Thus, this document discloses methods for efficiently scheduling linear algebra matrix processors.
Type: Grant
Filed: May 7, 2020
Date of Patent: February 18, 2025
Assignee: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
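The listing gives only the abstract, not the scheduling method itself; as a point of reference, the classic baseline for the problem it describes is greedy list scheduling over the operation dependency graph. The sketch below implements that textbook baseline, not the patented method, and the op names and cycle counts are hypothetical.

```python
import heapq

def list_schedule(ops, deps, duration, num_processors=2):
    """Sketch: greedy list scheduling of matrix operations.

    ops      : list of op names in any order
    deps     : dict mapping an op to the set of ops it must wait for
    duration : dict mapping an op to its cycle count
    Returns {op: (processor, start_cycle)}.
    """
    ready = [op for op in ops if not deps.get(op)]
    finish, schedule = {}, {}
    procs = [(0, p) for p in range(num_processors)]   # (free_at, processor id)
    heapq.heapify(procs)
    while ready:
        op = ready.pop(0)
        free_at, p = heapq.heappop(procs)             # least-loaded processor
        start = max([free_at] + [finish[d] for d in deps.get(op, ())])
        schedule[op] = (p, start)
        finish[op] = start + duration[op]
        heapq.heappush(procs, (finish[op], p))
        for o in ops:                                  # release newly ready ops
            if o not in finish and o not in ready and \
               all(d in finish for d in deps.get(o, ())):
                ready.append(o)
    return schedule

# C must wait for both A and B; two processors run A and B in parallel.
print(list_schedule(["A", "B", "C"], {"C": {"A", "B"}},
                    {"A": 4, "B": 2, "C": 3}))
```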
-
Publication number: 20250053614
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized Matrix Processor circuits can improve performance. But a neural network is more than a collection of matrix operations; it is a set of specifically coordinated matrix operations with complex data dependencies. Without proper coordination, Matrix Processor circuits may end up idle or spend large amounts of time loading different weight matrix data.
Type: Application
Filed: October 29, 2024
Publication date: February 13, 2025
Inventors: Ramteja Tadishetti, Vaibhav Vivek Kamat, Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
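One way to see why coordination matters is to count weight loads. The toy below batches requests that share a weight matrix so each matrix is loaded once; this grouping heuristic is only our illustration of the problem the abstract raises, not the patented coordination scheme, and the request and weight structures are hypothetical.

```python
from collections import defaultdict
import numpy as np

def run_batched_by_weight(requests, weights):
    """Sketch: serve (weight_id, input_vector) requests with each weight
    matrix loaded once, rather than reloaded per request."""
    by_weight = defaultdict(list)
    for wid, x in requests:
        by_weight[wid].append(x)
    results, loads = {}, 0
    for wid, xs in by_weight.items():
        w = weights[wid]                    # one simulated weight load
        loads += 1
        results[wid] = [w @ x for x in xs]
    return results, loads

weights = {"layer1": np.eye(2), "layer2": 2 * np.eye(2)}
reqs = [("layer1", np.ones(2)), ("layer2", np.ones(2)), ("layer1", np.ones(2))]
_, loads = run_batched_by_weight(reqs, weights)
print(loads)   # 2 loads instead of 3 for the naive request order
```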
-
Publication number: 20250045122
Abstract: Machine learning model scalability with distributed multi-layer processing is disclosed herein: a method for processing and deploying machine learning models that enhances scalability and efficiency by executing a subset of a neural network on each of a plurality of interconnected processing units. The method involves partitioning compute tasks across these processing units to reduce latency, including broadcast and reduction processes for inputs and outputs. It also includes managing the allocation of samples in a batch to specific master processing units within the distributed arrangement and synchronizing computation between fully connected layers within each processing unit. Additionally, the method implements data reduction during the transfer of data across the processing units, wherein data is accumulated with a current processing unit's partial sum as it is transferred to the destination processing unit.
Type: Application
Filed: July 30, 2024
Publication date: February 6, 2025
Inventors: Shang-Tse Chuang, Siyad Chih-Hua Ma, Sharad Vasantrao Chole, Costas Calamvokis
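The last feature, accumulating a partial sum into the data as it moves between units, is essentially a reduction folded into the transfer itself. A minimal model of that idea, with a hypothetical chain of units:

```python
import numpy as np

def transfer_with_reduction(partials):
    """Sketch: the message picks up each unit's partial sum as it is
    forwarded along the chain, so the destination unit receives the
    fully reduced output without a separate gather step."""
    message = np.zeros_like(partials[0])
    for unit_partial in partials:          # one hop per processing unit
        message = message + unit_partial   # accumulate during transfer
    return message

# Four units each hold a partial result of the same fully connected layer.
parts = [np.full(3, float(i)) for i in range(4)]
print(transfer_with_reduction(parts))      # [6. 6. 6.] == 0+1+2+3
```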
-
Publication number: 20250045569
Abstract: Disclosed are systems and methods for processing a multilayer neural network incorporating skip connections while reducing the memory footprint and processing time. The method comprises loading a memory partition with a portion of an input tensor and a portion of the layer weights associated with computing a portion of the one or more intermediate layer tensors associated with a first portion of the skip connection tensor. Next, a neural processing unit recomputes portions of the skip connection tensor using the portion of the input tensor and the associated weights. Upon completion, the memory utilized for the recomputation is freed for further computations.
Type: Application
Filed: July 10, 2024
Publication date: February 6, 2025
Inventors: Ramteja Tadishetti, Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
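Recomputation trades a little extra arithmetic for a much smaller live footprint: rather than keeping the whole skip tensor resident across the intermediate layers, each needed portion is rebuilt from the retained input. A minimal sketch, assuming simple dense layers and a ReLU that the abstract does not specify:

```python
import numpy as np

def residual_block_recomputed(x, w1, w2, rows=4):
    """Sketch: evaluate a residual block portion by portion, recomputing
    each slice of the skip tensor from the retained input instead of
    holding the full skip tensor in memory."""
    out = []
    for r in range(0, x.shape[0], rows):
        xs = x[r:r + rows]                    # retained input portion
        skip = xs @ w1                        # recomputed skip-connection slice
        hidden = np.maximum(skip, 0.0)        # intermediate layer
        out.append(hidden @ w2 + skip)        # apply the skip connection
        # skip and hidden fall out of scope here, freeing that memory
    return np.concatenate(out, axis=0)

x = np.random.rand(16, 8)
w1, w2 = np.random.rand(8, 8), np.random.rand(8, 8)
print(residual_block_recomputed(x, w1, w2).shape)   # (16, 8)
```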
-
Publication number: 20250013259
Abstract: Disclosed are semiconductor devices that implement relaxed clock forwarding between logic blocks. In one embodiment, the system includes a set of logic blocks forming a first processing path and additional sets of logic blocks forming additional processing paths. A clock is configured so that data is forwarded asynchronously between the logic blocks in the first processing path; this forwarding is asynchronous with respect to the data and clocks of the additional processing paths. The ends, or last logic blocks, of the paths can be synchronized using a synchronizer component, which can be a plurality of asynchronous FIFOs. In one embodiment, the logic blocks form a matrix and the processing paths run along the rows or columns of the matrix.
Type: Application
Filed: June 28, 2024
Publication date: January 9, 2025
Inventors: Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
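A software analogy for the synchronizer: let each path run at its own pace and meet only at a queue. The threads below stand in for independently clocked paths and the queues for the asynchronous FIFOs; this is an analogy for intuition, not a model of the circuit.

```python
import queue
import threading

def run_path(blocks, data, fifo):
    """One processing path: its blocks execute on their own schedule
    (their own clock) and the result lands in an asynchronous FIFO."""
    for block in blocks:
        data = block(data)
    fifo.put(data)

fifos = [queue.Queue() for _ in range(3)]                 # synchronizer FIFOs
paths = [[lambda v, i=i: v + i, lambda v: v * 2] for i in range(3)]
for blocks, fifo, start in zip(paths, fifos, [10, 20, 30]):
    threading.Thread(target=run_path, args=(blocks, start, fifo)).start()

# Synchronization point: one result per path, consumed in lockstep.
print([f.get() for f in fifos])                           # [20, 42, 64]
```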
-
Patent number: 12182717
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of its most important applications is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for this task. However, convolutional neural networks are extremely computationally intensive, requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, "cone of dependency" and "cone of influence" processing techniques are disclosed. These two techniques arrange the required computations in a manner that minimizes memory accesses, such that computations may be performed in local cache memory. They significantly reduce both the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Grant
Filed: October 18, 2021
Date of Patent: December 31, 2024
Assignee: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
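To make the "cone of dependency" concrete: one output of a stacked convolution depends only on a small pyramid of inputs, so that pyramid can be staged and processed entirely in local memory. The sketch below computes a single output of a two-layer 1-D convolution from just its cone; the kernels and the "valid" convention are illustrative assumptions, not the patented technique itself.

```python
import numpy as np

def output_via_cone(signal, k1, k2, i):
    """Sketch: compute element i of conv(conv(signal, k1), k2) from only
    the input window (the dependency cone) that element depends on, so
    the whole computation fits in local memory."""
    r1, r2 = len(k1) - 1, len(k2) - 1
    cone = signal[i : i + r1 + r2 + 1]           # inputs inside the cone
    mid = np.convolve(cone, k1, mode="valid")    # intermediate layer, local
    return np.convolve(mid, k2, mode="valid")[0]

x = np.arange(20.0)
k1, k2 = np.array([1.0, -1.0]), np.array([0.5, 0.5])
full = np.convolve(np.convolve(x, k1, mode="valid"), k2, mode="valid")
assert np.isclose(output_via_cone(x, k1, k2, 7), full[7])
```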
-
Publication number: 20240427839
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized matrix processor circuits can improve performance. Thus, this document discloses apparatus and methods for efficiently processing matrix operations.
Type: Application
Filed: June 21, 2023
Publication date: December 26, 2024
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Publication number: 20240428046
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized matrix processor circuits can improve performance. To perform all these matrix operations, the neural processing circuits must be quickly and efficiently supplied with data to process, or else the matrix processor circuits end up idle or spend large amounts of time loading weight matrix data.
Type: Application
Filed: June 21, 2023
Publication date: December 26, 2024
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Publication number: 20240377853
Abstract: A power-efficient, clock-shaping clock structure for a digital semiconductor device. The device can include an array of logic blocks. A root-column clock trace is coupled to column-clock traces extending along each column of the array. The clock traces feed the logic blocks at evenly spaced points to control the delay time for the execution of the logic blocks. The root-column clock trace is fed a clock from a single endpoint, which results in a propagation wave of logic-block execution. The clock structure can also include row-clock traces placed across the array rows and coupled to a root-row clock trace, with each logic block receiving a clock from the intersection of a column-clock trace and a row-clock trace. A clock is input at the single point where the root-column clock trace and the root-row clock trace meet.
Type: Application
Filed: May 10, 2023
Publication date: November 14, 2024
Inventors: Sharad Chole, Shang-Tse Chuang, Siyad Ma, Philippe Sarrazin
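One way to picture the single-endpoint feed: a block's clock edge arrives after the delay down the root trace plus the delay along its row or column trace, so execution sweeps across the array as a diagonal wave. The uniform per-segment delay below is an assumed model for illustration, not a figure from the patent.

```python
import numpy as np

def clock_arrival(rows, cols, seg_delay=1.0):
    """Sketch: clock-edge arrival time at block (r, c) when the clock
    enters at the single corner where the root-row and root-column
    traces meet; equal arrival times form diagonal wavefronts."""
    r = np.arange(rows)[:, None]
    c = np.arange(cols)[None, :]
    return seg_delay * (r + c)

print(clock_arrival(3, 4))
# [[0. 1. 2. 3.]
#  [1. 2. 3. 4.]
#  [2. 3. 4. 5.]]
```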
-
Patent number: 12141226
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized Matrix Processor circuits can improve performance. But a neural network is more than a collection of matrix operations; it is a set of specifically coordinated matrix operations with complex data dependencies. Without proper coordination, Matrix Processor circuits may end up idle or spend large amounts of time loading different weight matrix data.
Type: Grant
Filed: April 5, 2019
Date of Patent: November 12, 2024
Assignee: Expedera, Inc.
Inventors: Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
-
Publication number: 20240265234
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is a very computationally intensive field. Fortunately, many of the required calculations can be performed in parallel, and specialized processors can greatly increase computation performance. In particular, Graphics Processor Units (GPUs) are often used for artificial intelligence. Although GPUs have helped, they are not ideal for the task: GPUs compute matrix operations in one direction with a pipelined architecture, whereas artificial intelligence uses both forward propagation computations and back propagation calculations. To perform artificial intelligence calculations efficiently, a symmetric matrix processing element is introduced, which can perform forward propagation and backward propagation calculations equally easily.
Type: Application
Filed: April 17, 2024
Publication date: August 8, 2024
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
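The symmetry can be pictured as one resident weight array serving both directions: forward propagation multiplies by W, back propagation by its transpose, with no weight movement in between. A minimal sketch, with shapes and the class interface chosen purely for illustration:

```python
import numpy as np

class SymmetricMatrixPE:
    """Sketch: one resident weight array used for both passes, mimicking
    a processing element whose cells serve forward and backward data
    flows equally. Not the patented circuit, just the idea."""
    def __init__(self, weights):
        self.w = weights                  # stays resident in the element

    def forward(self, x):                 # activations flow forward
        return self.w @ x

    def backward(self, grad_out):         # gradients flow backward
        return self.w.T @ grad_out

pe = SymmetricMatrixPE(np.array([[1., 2.], [3., 4.], [5., 6.]]))
g = pe.forward(np.array([1., 1.]))        # shape (3,)
print(pe.backward(g))                     # shape (2,), same weights reused
```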
-
Publication number: 20240242078
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of its most important applications is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for this task. However, convolutional neural networks are extremely computationally intensive, requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, "cone of dependency" and "cone of influence" processing techniques are disclosed. These two techniques arrange the required computations in a manner that minimizes memory accesses, such that computations may be performed in local cache memory. They significantly reduce both the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Application
Filed: October 18, 2021
Publication date: July 18, 2024
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
-
Patent number: 12008463
Abstract: Artificial intelligence is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized matrix processor circuits can improve performance. To perform all these matrix operations, the matrix processor circuits must be quickly and efficiently supplied with data to process, or else they end up idle or spend large amounts of time loading different weight matrix data.
Type: Grant
Filed: June 23, 2022
Date of Patent: June 11, 2024
Assignee: EXPEDERA, INC.
Inventors: Siyad Ma, Shang-Tse Chuang, Sharad Chole
-
Patent number: 11983616
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is a very computationally intensive field. Fortunately, many of the required calculations can be performed in parallel, and specialized processors can greatly increase computation performance. In particular, Graphics Processor Units (GPUs) are often used for artificial intelligence. Although GPUs have helped, they are not ideal for the task: GPUs compute matrix operations in one direction with a pipelined architecture, whereas artificial intelligence uses both forward propagation computations and back propagation calculations. To perform artificial intelligence calculations efficiently, a symmetric matrix processing element is introduced, which can perform forward propagation and backward propagation calculations equally easily.
Type: Grant
Filed: October 1, 2018
Date of Patent: May 14, 2024
Assignee: Expedera, Inc.
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Publication number: 20240152761
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance for AI applications. Specifically, artificial intelligence generally requires large numbers of matrix operations, so specialized matrix processor circuits can greatly improve performance. To execute all these matrix operations efficiently, the matrix processor circuits must be quickly and efficiently supplied with a stream of data and instructions to process, or else they end up idle. Thus, this document discloses a packet architecture for efficiently creating and supplying neural network processors with work packets to process.
Type: Application
Filed: October 20, 2022
Publication date: May 9, 2024
Applicant: Expedera, Inc.
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
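The abstract names a packet architecture but the listing does not define the packet format, so the dataclass below is a hypothetical layout used only to show the idea: each packet is a self-describing unit of work, so a processor never waits to be told what to do next.

```python
from dataclasses import dataclass
from queue import Queue
import numpy as np

@dataclass
class WorkPacket:
    """Sketch of a self-describing work unit; these fields are a
    hypothetical layout, not the patented packet format."""
    op: str                     # operation, e.g. "matmul"
    weights: np.ndarray         # weight tile to apply
    activations: np.ndarray     # input tile to process
    dest: int                   # where the result belongs

def drain(inbox: Queue, results: dict):
    """Process packets back to back; each one carries its own data and
    instructions, so the processor is never left idle waiting."""
    while not inbox.empty():
        pkt = inbox.get()
        if pkt.op == "matmul":
            results[pkt.dest] = pkt.weights @ pkt.activations

inbox, results = Queue(), {}
inbox.put(WorkPacket("matmul", np.eye(2), np.array([3.0, 4.0]), dest=0))
drain(inbox, results)
print(results[0])               # [3. 4.]
```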
-
Publication number: 20230023859
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized matrix processor circuits can improve performance. To perform all these matrix operations, the matrix processor circuits must be quickly and efficiently supplied with data to process, or else they end up idle or spend large amounts of time loading different weight matrix data.
Type: Application
Filed: June 23, 2022
Publication date: January 26, 2023
Inventors: Siyad Ma, Shang-Tse Chuang, Sharad Chole
-
Patent number: 11151416
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of its most important applications is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for this task. However, convolutional neural networks are extremely computationally intensive, requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, "cone of dependency" and "cone of influence" processing techniques are disclosed. These two techniques arrange the required computations in a manner that minimizes memory accesses, such that computations may be performed in local cache memory. They significantly reduce both the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Grant
Filed: September 11, 2019
Date of Patent: October 19, 2021
Assignee: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
-
Publication number: 20210073585
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of its most important applications is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for this task. However, convolutional neural networks are extremely computationally intensive, requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, "cone of dependency" and "cone of influence" processing techniques are disclosed. These two techniques arrange the required computations in a manner that minimizes memory accesses, such that computations may be performed in local cache memory. They significantly reduce both the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Application
Filed: September 11, 2019
Publication date: March 11, 2021
Applicant: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma