Patents by Inventor Shang-Tse Chuang
Shang-Tse Chuang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250068914
Abstract: A system and method of performing tensor operations with a multi-step operation processing system in a memory-efficient manner. The method includes dividing an N-dimensional tensor into a set of tensor slices, each consisting of one or more consecutive rows; the slices may be further segmented. The tensor slice segments, together with dependency data formed from the tensor dependencies, are used in a tensor operation computation to generate a first result. Each processed slice segment is fused into a result slice by removing the extra data used in the computation. This process is repeated for each slice to be processed, and the result slices are combined into a final processed tensor result.
Type: Application
Filed: October 30, 2024
Publication date: February 27, 2025
Inventors: Suhail Ibrahim Alnahari, Kai-Er Chuang, Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
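For readers who want to see the slice-and-fuse pattern in miniature, the sketch below divides a tensor into row slices, extends each slice with the rows it depends on, computes, and then trims that extra data before fusing. The halo width, slice size, and the use of a 1-D convolution as the stand-in operation are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def process_in_slices(tensor, kernel, slice_rows=4):
    """Sketch: apply a column-wise convolution slice by slice.

    Each slice is extended by a halo of dependent rows (the dependency
    data); after computation the halo is trimmed so the processed
    segments fuse into a clean result slice.
    """
    halo = len(kernel) // 2                      # rows each output row depends on
    out_slices = []
    for start in range(0, tensor.shape[0], slice_rows):
        stop = min(start + slice_rows, tensor.shape[0])
        lo, hi = max(start - halo, 0), min(stop + halo, tensor.shape[0])
        segment = tensor[lo:hi]                  # slice plus its dependencies
        processed = np.apply_along_axis(
            lambda col: np.convolve(col, kernel, mode="same"), 0, segment)
        out_slices.append(processed[start - lo:stop - lo])  # trim the halo
    return np.concatenate(out_slices, axis=0)

x = np.random.rand(16, 8)
k = np.array([0.25, 0.5, 0.25])
ref = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, x)
assert np.allclose(process_in_slices(x, k), ref)  # matches whole-tensor result
```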
-
Publication number: 20250061169
Abstract: A system and method for performing tensor transforms on customized digital hardware. The system comprises a plurality of computational blocks, a control unit, and memory. The control unit is configured to generate indexes for the source tensor slices, read the corresponding source tensor data and weight data from memory, and interpolate the tensor slices based on the weights according to a transformation guide. The source tensor data and tensor weights are sent to multiple computational blocks for parallel generation of interpolated output tensor data. The memory can be configured so that only one address is needed to read multiple tensor dimensions or a tensor slice; additionally, it can accept multiple memory addresses in parallel. The computational block output provides a grid-sampled output tensor.
Type: Application
Filed: August 18, 2023
Publication date: February 20, 2025
Inventors: Jeff Xue, Siyad Ma, Shang-Tse Chuang, Sharad Chole
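Since the abstract describes the output as a grid-sampled tensor, a plain-software analogue may help: bilinear grid sampling reads four source values per output point and blends them with interpolation weights. The coordinate convention and weighting below are ordinary bilinear interpolation assumed for illustration, not the hardware's transformation guide.

```python
import numpy as np

def grid_sample_2d(src, grid):
    """Sketch: bilinear grid sampling of a 2-D source tensor.

    grid[i, j] holds a (row, col) sample coordinate in source-pixel
    units; each output value blends the four neighboring source values.
    """
    h, w = src.shape
    out = np.empty(grid.shape[:2])
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            r, c = grid[i, j]
            r0, c0 = int(np.floor(r)), int(np.floor(c))
            r1, c1 = min(r0 + 1, h - 1), min(c0 + 1, w - 1)
            wr, wc = r - r0, c - c0              # interpolation weights
            out[i, j] = ((1 - wr) * (1 - wc) * src[r0, c0]
                         + (1 - wr) * wc * src[r0, c1]
                         + wr * (1 - wc) * src[r1, c0]
                         + wr * wc * src[r1, c1])
    return out

src = np.arange(16.0).reshape(4, 4)
grid = np.array([[[0.5, 0.5], [1.0, 2.5]]])      # two sample points
print(grid_sample_2d(src, grid))                  # [[2.5 6.5]]
```

Each output point depends only on its own four source reads, which is what lets hardware like this farm the points out to parallel computational blocks.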
-
Patent number: 12229589
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so performing artificial intelligence calculations can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence applications can be performed in parallel, and specialized linear algebra matrix processors can greatly increase computational performance. But even with linear algebra matrix processors, performance can be limited by complex data dependencies. Without proper coordination, linear algebra matrix processors may end up idle or spend large amounts of time moving data around. Thus, this document discloses methods for efficiently scheduling linear algebra matrix processors.
Type: Grant
Filed: May 7, 2020
Date of Patent: February 18, 2025
Assignee: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
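The listing gives only the abstract, not the scheduling method itself; as a point of reference, the classic baseline for the problem it describes is greedy list scheduling over the operation dependency graph. The sketch below implements that textbook baseline, not the patented method, and the op names and cycle counts are hypothetical.

```python
import heapq

def list_schedule(ops, deps, duration, num_processors=2):
    """Sketch: greedy list scheduling of matrix operations.

    ops      : list of op names in any order
    deps     : dict mapping an op to the set of ops it must wait for
    duration : dict mapping an op to its cycle count
    Returns {op: (processor, start_cycle)}.
    """
    ready = [op for op in ops if not deps.get(op)]
    finish, schedule = {}, {}
    procs = [(0, p) for p in range(num_processors)]   # (free_at, processor id)
    heapq.heapify(procs)
    while ready:
        op = ready.pop(0)
        free_at, p = heapq.heappop(procs)             # least-loaded processor
        start = max([free_at] + [finish[d] for d in deps.get(op, ())])
        schedule[op] = (p, start)
        finish[op] = start + duration[op]
        heapq.heappush(procs, (finish[op], p))
        for o in ops:                                  # release newly ready ops
            if o not in finish and o not in ready and \
               all(d in finish for d in deps.get(o, ())):
                ready.append(o)
    return schedule

# C must wait for both A and B; two processors run A and B in parallel.
print(list_schedule(["A", "B", "C"], {"C": {"A", "B"}},
                    {"A": 4, "B": 2, "C": 3}))
```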
-
Publication number: 20250053614
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized Matrix Processor circuits can improve performance. But a neural network is more than a collection of matrix operations; it is a set of specifically coordinated matrix operations with complex data dependencies. Without proper coordination, Matrix Processor circuits may end up idle or spend large amounts of time loading different weight matrix data.
Type: Application
Filed: October 29, 2024
Publication date: February 13, 2025
Inventors: Ramteja Tadishetti, Vaibhav Vivek Kamat, Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
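One way to see why coordination matters is to count weight loads. The toy below batches requests that share a weight matrix so each matrix is loaded once; this grouping heuristic is only our illustration of the problem the abstract raises, not the patented coordination scheme, and the request and weight structures are hypothetical.

```python
from collections import defaultdict
import numpy as np

def run_batched_by_weight(requests, weights):
    """Sketch: serve (weight_id, input_vector) requests with each weight
    matrix loaded once, rather than reloaded per request."""
    by_weight = defaultdict(list)
    for wid, x in requests:
        by_weight[wid].append(x)
    results, loads = {}, 0
    for wid, xs in by_weight.items():
        w = weights[wid]                    # one simulated weight load
        loads += 1
        results[wid] = [w @ x for x in xs]
    return results, loads

weights = {"layer1": np.eye(2), "layer2": 2 * np.eye(2)}
reqs = [("layer1", np.ones(2)), ("layer2", np.ones(2)), ("layer1", np.ones(2))]
_, loads = run_batched_by_weight(reqs, weights)
print(loads)   # 2 loads instead of 3 for the naive request order
```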
-
Publication number: 20250045122
Abstract: Machine learning model scalability with distributed multi-layer processing is disclosed herein: a method for processing and deploying machine learning models that enhances scalability and efficiency by executing a subset of a neural network on each of a plurality of interconnected processing units. The method involves partitioning compute tasks across these processing units to reduce latency, including broadcast and reduction processes for inputs and outputs. It also includes managing the allocation of samples in a batch to specific master processing units within the distributed arrangement and synchronizing computation between fully connected layers within each processing unit. Additionally, the method implements data reduction during the transfer of data across the processing units, wherein data is accumulated with a current processing unit's partial sum as it is transferred to the destination processing unit.
Type: Application
Filed: July 30, 2024
Publication date: February 6, 2025
Inventors: Shang-Tse Chuang, Siyad Chih-Hua Ma, Sharad Vasantrao Chole, Costas Calamvokis
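The last feature, accumulating a partial sum into the data as it moves between units, is essentially a reduction folded into the transfer itself. A minimal model of that idea, with a hypothetical chain of units:

```python
import numpy as np

def transfer_with_reduction(partials):
    """Sketch: the message picks up each unit's partial sum as it is
    forwarded along the chain, so the destination unit receives the
    fully reduced output without a separate gather step."""
    message = np.zeros_like(partials[0])
    for unit_partial in partials:          # one hop per processing unit
        message = message + unit_partial   # accumulate during transfer
    return message

# Four units each hold a partial result of the same fully connected layer.
parts = [np.full(3, float(i)) for i in range(4)]
print(transfer_with_reduction(parts))      # [6. 6. 6.] == 0+1+2+3
```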
-
Publication number: 20250045569
Abstract: Disclosed are systems and methods for processing a multilayer neural network incorporating skip connections while reducing the memory footprint and processing time. The method comprises loading a memory partition with a portion of an input tensor and a portion of the layer weights associated with computing a portion of the one or more intermediate layer tensors associated with a first portion of the skip connection tensor. Next, a neural processing unit recomputes portions of the skip connection tensor using the portion of the input tensor and the associated weights. Upon completion, the memory utilized for the recomputation is freed for further computations.
Type: Application
Filed: July 10, 2024
Publication date: February 6, 2025
Inventors: Ramteja Tadishetti, Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
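Recomputation trades a little extra arithmetic for a much smaller live footprint: rather than keeping the whole skip tensor resident across the intermediate layers, each needed portion is rebuilt from the retained input. A minimal sketch, assuming simple dense layers and a ReLU that the abstract does not specify:

```python
import numpy as np

def residual_block_recomputed(x, w1, w2, rows=4):
    """Sketch: evaluate a residual block portion by portion, recomputing
    each slice of the skip tensor from the retained input instead of
    holding the full skip tensor in memory."""
    out = []
    for r in range(0, x.shape[0], rows):
        xs = x[r:r + rows]                    # retained input portion
        skip = xs @ w1                        # recomputed skip-connection slice
        hidden = np.maximum(skip, 0.0)        # intermediate layer
        out.append(hidden @ w2 + skip)        # apply the skip connection
        # skip and hidden fall out of scope here, freeing that memory
    return np.concatenate(out, axis=0)

x = np.random.rand(16, 8)
w1, w2 = np.random.rand(8, 8), np.random.rand(8, 8)
print(residual_block_recomputed(x, w1, w2).shape)   # (16, 8)
```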
-
Publication number: 20250013259
Abstract: Disclosed are semiconductor devices that implement relaxed clock forwarding between logic blocks. In one embodiment, the system includes a set of logic blocks forming a first processing path and additional sets of logic blocks forming additional processing paths. A clock is configured so that data is forwarded asynchronously between the logic blocks in the first processing path; this forwarding is asynchronous with respect to the data and clocks of the additional processing paths. The ends, or last logic blocks, of the paths can be synchronized using a synchronizer component, which can be a plurality of asynchronous FIFOs. In one embodiment, the logic blocks form a matrix and the processing paths run along the rows or columns of the matrix.
Type: Application
Filed: June 28, 2024
Publication date: January 9, 2025
Inventors: Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
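A software analogy for the synchronizer: let each path run at its own pace and meet only at a queue. The threads below stand in for independently clocked paths and the queues for the asynchronous FIFOs; this is an analogy for intuition, not a model of the circuit.

```python
import queue
import threading

def run_path(blocks, data, fifo):
    """One processing path: its blocks execute on their own schedule
    (their own clock) and the result lands in an asynchronous FIFO."""
    for block in blocks:
        data = block(data)
    fifo.put(data)

fifos = [queue.Queue() for _ in range(3)]                 # synchronizer FIFOs
paths = [[lambda v, i=i: v + i, lambda v: v * 2] for i in range(3)]
for blocks, fifo, start in zip(paths, fifos, [10, 20, 30]):
    threading.Thread(target=run_path, args=(blocks, start, fifo)).start()

# Synchronization point: one result per path, consumed in lockstep.
print([f.get() for f in fifos])                           # [20, 42, 64]
```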
-
Patent number: 12182717
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of its most important applications is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for this task. However, convolutional neural networks are extremely computationally intensive, requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, "cone of dependency" and "cone of influence" processing techniques are disclosed. These two techniques arrange the required computations in a manner that minimizes memory accesses, such that computations may be performed in local cache memory. They significantly reduce both the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Grant
Filed: October 18, 2021
Date of Patent: December 31, 2024
Assignee: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
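To make the "cone of dependency" concrete: one output of a stacked convolution depends only on a small pyramid of inputs, so that pyramid can be staged and processed entirely in local memory. The sketch below computes a single output of a two-layer 1-D convolution from just its cone; the kernels and the "valid" convention are illustrative assumptions, not the patented technique itself.

```python
import numpy as np

def output_via_cone(signal, k1, k2, i):
    """Sketch: compute element i of conv(conv(signal, k1), k2) from only
    the input window (the dependency cone) that element depends on, so
    the whole computation fits in local memory."""
    r1, r2 = len(k1) - 1, len(k2) - 1
    cone = signal[i : i + r1 + r2 + 1]           # inputs inside the cone
    mid = np.convolve(cone, k1, mode="valid")    # intermediate layer, local
    return np.convolve(mid, k2, mode="valid")[0]

x = np.arange(20.0)
k1, k2 = np.array([1.0, -1.0]), np.array([0.5, 0.5])
full = np.convolve(np.convolve(x, k1, mode="valid"), k2, mode="valid")
assert np.isclose(output_via_cone(x, k1, k2, 7), full[7])
```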
-
Publication number: 20240427839
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized matrix processor circuits can improve performance. Thus, this document discloses apparatus and methods for efficiently processing matrix operations.
Type: Application
Filed: June 21, 2023
Publication date: December 26, 2024
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Publication number: 20240428046
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized matrix processor circuits can improve performance. To perform all these matrix operations, the neural processing circuits must be quickly and efficiently supplied with data to process, or else the matrix processor circuits end up idle or spend large amounts of time loading weight matrix data.
Type: Application
Filed: June 21, 2023
Publication date: December 26, 2024
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Publication number: 20240377853
Abstract: A power-efficient, clock-shaping clock structure for a digital semiconductor device. The device can include an array of logic blocks. A root-column clock trace is coupled to column-clock traces extending along each column of the array. The clock traces feed the logic blocks at evenly spaced points to control the delay time for the execution of the logic blocks. The root-column clock trace is fed a clock from a single endpoint, which results in a propagation wave of logic-block execution. The clock structure can also include row-clock traces placed across the array rows and coupled to a root-row clock trace, with each logic block receiving a clock from the intersection of a column-clock trace and a row-clock trace. A clock is input at the single point where the root-column clock trace and the root-row clock trace meet.
Type: Application
Filed: May 10, 2023
Publication date: November 14, 2024
Inventors: Sharad Chole, Shang-Tse Chuang, Siyad Ma, Philippe Sarrazin
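One way to picture the single-endpoint feed: a block's clock edge arrives after the delay down the root trace plus the delay along its row or column trace, so execution sweeps across the array as a diagonal wave. The uniform per-segment delay below is an assumed model for illustration, not a figure from the patent.

```python
import numpy as np

def clock_arrival(rows, cols, seg_delay=1.0):
    """Sketch: clock-edge arrival time at block (r, c) when the clock
    enters at the single corner where the root-row and root-column
    traces meet; equal arrival times form diagonal wavefronts."""
    r = np.arange(rows)[:, None]
    c = np.arange(cols)[None, :]
    return seg_delay * (r + c)

print(clock_arrival(3, 4))
# [[0. 1. 2. 3.]
#  [1. 2. 3. 4.]
#  [2. 3. 4. 5.]]
```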
-
Patent number: 12141226
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized Matrix Processor circuits can improve performance. But a neural network is more than a collection of matrix operations; it is a set of specifically coordinated matrix operations with complex data dependencies. Without proper coordination, Matrix Processor circuits may end up idle or spend large amounts of time loading different weight matrix data.
Type: Grant
Filed: April 5, 2019
Date of Patent: November 12, 2024
Assignee: Expedera, Inc.
Inventors: Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
-
Publication number: 20240265234
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is a very computationally intensive field. Fortunately, many of the required calculations can be performed in parallel, and specialized processors can greatly increase computation performance. In particular, Graphics Processor Units (GPUs) are often used for artificial intelligence. Although GPUs have helped, they are not ideal for the task: GPUs compute matrix operations in one direction with a pipelined architecture, whereas artificial intelligence uses both forward propagation computations and back propagation calculations. To perform artificial intelligence calculations efficiently, a symmetric matrix processing element is introduced, which can perform forward propagation and backward propagation calculations equally easily.
Type: Application
Filed: April 17, 2024
Publication date: August 8, 2024
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
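The symmetry can be pictured as one resident weight array serving both directions: forward propagation multiplies by W, back propagation by its transpose, with no weight movement in between. A minimal sketch, with shapes and the class interface chosen purely for illustration:

```python
import numpy as np

class SymmetricMatrixPE:
    """Sketch: one resident weight array used for both passes, mimicking
    a processing element whose cells serve forward and backward data
    flows equally. Not the patented circuit, just the idea."""
    def __init__(self, weights):
        self.w = weights                  # stays resident in the element

    def forward(self, x):                 # activations flow forward
        return self.w @ x

    def backward(self, grad_out):         # gradients flow backward
        return self.w.T @ grad_out

pe = SymmetricMatrixPE(np.array([[1., 2.], [3., 4.], [5., 6.]]))
g = pe.forward(np.array([1., 1.]))        # shape (3,)
print(pe.backward(g))                     # shape (2,), same weights reused
```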
-
Publication number: 20240242078
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of its most important applications is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for this task. However, convolutional neural networks are extremely computationally intensive, requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, "cone of dependency" and "cone of influence" processing techniques are disclosed. These two techniques arrange the required computations in a manner that minimizes memory accesses, such that computations may be performed in local cache memory. They significantly reduce both the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Application
Filed: October 18, 2021
Publication date: July 18, 2024
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
-
Patent number: 12008463
Abstract: Artificial intelligence is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized matrix processor circuits can improve performance. To perform all these matrix operations, the matrix processor circuits must be quickly and efficiently supplied with data to process, or else they end up idle or spend large amounts of time loading different weight matrix data.
Type: Grant
Filed: June 23, 2022
Date of Patent: June 11, 2024
Assignee: EXPEDERA, INC.
Inventors: Siyad Ma, Shang-Tse Chuang, Sharad Chole
-
Patent number: 11983616
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is a very computationally intensive field. Fortunately, many of the required calculations can be performed in parallel, and specialized processors can greatly increase computation performance. In particular, Graphics Processor Units (GPUs) are often used for artificial intelligence. Although GPUs have helped, they are not ideal for the task: GPUs compute matrix operations in one direction with a pipelined architecture, whereas artificial intelligence uses both forward propagation computations and back propagation calculations. To perform artificial intelligence calculations efficiently, a symmetric matrix processing element is introduced, which can perform forward propagation and backward propagation calculations equally easily.
Type: Grant
Filed: October 1, 2018
Date of Patent: May 14, 2024
Assignee: Expedera, Inc.
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Publication number: 20240152761
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance for AI applications. Specifically, artificial intelligence generally requires large numbers of matrix operations, so specialized matrix processor circuits can greatly improve performance. To execute all these matrix operations efficiently, the matrix processor circuits must be quickly and efficiently supplied with a stream of data and instructions to process, or else they end up idle. Thus, this document discloses a packet architecture for efficiently creating and supplying neural network processors with work packets to process.
Type: Application
Filed: October 20, 2022
Publication date: May 9, 2024
Applicant: Expedera, Inc.
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
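The abstract names a packet architecture but the listing does not define the packet format, so the dataclass below is a hypothetical layout used only to show the idea: each packet is a self-describing unit of work, so a processor never waits to be told what to do next.

```python
from dataclasses import dataclass
from queue import Queue
import numpy as np

@dataclass
class WorkPacket:
    """Sketch of a self-describing work unit; these fields are a
    hypothetical layout, not the patented packet format."""
    op: str                     # operation, e.g. "matmul"
    weights: np.ndarray         # weight tile to apply
    activations: np.ndarray     # input tile to process
    dest: int                   # where the result belongs

def drain(inbox: Queue, results: dict):
    """Process packets back to back; each one carries its own data and
    instructions, so the processor is never left idle waiting."""
    while not inbox.empty():
        pkt = inbox.get()
        if pkt.op == "matmul":
            results[pkt.dest] = pkt.weights @ pkt.activations

inbox, results = Queue(), {}
inbox.put(WorkPacket("matmul", np.eye(2), np.array([3.0, 4.0]), dest=0))
drain(inbox, results)
print(results[0])               # [3. 4.]
```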
-
Publication number: 20230023859
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, it is an extremely computationally intensive field, so it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, and specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized matrix processor circuits can improve performance. To perform all these matrix operations, the matrix processor circuits must be quickly and efficiently supplied with data to process, or else they end up idle or spend large amounts of time loading different weight matrix data.
Type: Application
Filed: June 23, 2022
Publication date: January 26, 2023
Inventors: Siyad Ma, Shang-Tse Chuang, Sharad Chole
-
Patent number: 11151416
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of its most important applications is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for this task. However, convolutional neural networks are extremely computationally intensive, requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, "cone of dependency" and "cone of influence" processing techniques are disclosed. These two techniques arrange the required computations in a manner that minimizes memory accesses, such that computations may be performed in local cache memory. They significantly reduce both the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Grant
Filed: September 11, 2019
Date of Patent: October 19, 2021
Assignee: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
-
Publication number: 20210073585
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of its most important applications is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for this task. However, convolutional neural networks are extremely computationally intensive, requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, "cone of dependency" and "cone of influence" processing techniques are disclosed. These two techniques arrange the required computations in a manner that minimizes memory accesses, such that computations may be performed in local cache memory. They significantly reduce both the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Application
Filed: September 11, 2019
Publication date: March 11, 2021
Applicant: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma