Patents by Inventor Sharad Vasantrao Chole
Sharad Vasantrao Chole has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250068914
Abstract: A system and method of performing tensor operations with a multi-step operation processing system in a memory-efficient manner. The method includes the stages of dividing an N-dimensional tensor into a set of tensor slices. The tensor slices consist of one or more consecutive rows. The tensor slices may further be segmented. The tensor slice segments, along with the dependency data formed based on the tensor dependencies, are used for a tensor operation computation to generate a first result. Each processed slice segment is fused into a result slice by removing extra data used in the computation. This process is repeated for each slice to be processed and combined into a final processed tensor result.
Type: Application
Filed: October 30, 2024
Publication date: February 27, 2025
Inventors: Suhail Ibrahim Alnahari, Kai-Er Chuang, Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
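The slicing idea in the abstract above can be illustrated with a small sketch. This is not the patented method: it assumes a 2-D tensor stored as a list of rows, uses a hypothetical 3-tap vertical average (`smooth_rows`) as a stand-in for "a tensor operation", treats one neighbouring row on each side as the "dependency data", and omits the further segmentation of slices.

```python
def smooth_rows(rows):
    """Stand-in tensor operation (an assumption): 3-tap vertical average.

    Rows beyond the top or bottom edge are clamped to the nearest row.
    """
    n = len(rows)
    out = []
    for i in range(n):
        above = rows[max(i - 1, 0)]
        below = rows[min(i + 1, n - 1)]
        out.append([(a + b + c) / 3 for a, b, c in zip(above, rows[i], below)])
    return out


def smooth_in_slices(rows, slice_height):
    """Apply smooth_rows slice by slice, keeping only a few rows in flight."""
    n = len(rows)
    result = []
    for start in range(0, n, slice_height):
        stop = min(start + slice_height, n)
        # Dependency data: one extra row on each side, where it exists.
        lo = max(start - 1, 0)
        hi = min(stop + 1, n)
        partial = smooth_rows(rows[lo:hi])
        # Remove the extra rows so only the slice's own rows are kept.
        keep_from = start - lo
        result.extend(partial[keep_from:keep_from + (stop - start)])
    return result


tensor = [[float(r * 4 + c) for c in range(4)] for r in range(7)]
sliced = smooth_in_slices(tensor, slice_height=3)
```

In this sketch the sliced result matches the whole-tensor result, because every kept row saw its true neighbours; the rows computed from clamped slice borders are exactly the ones that get trimmed.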
-
Patent number: 12229589
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field such that performing artificial intelligence calculations can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence applications can be performed in parallel such that specialized linear algebra matrix processors can greatly increase computational performance. But even with linear algebra matrix processors, performance can be limited due to complex data dependencies. Without proper coordination, linear algebra matrix processors may end up idle or spending large amounts of time moving data around. Thus, this document discloses methods for efficiently scheduling linear algebra matrix processors.
Type: Grant
Filed: May 7, 2020
Date of Patent: February 18, 2025
Assignee: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
-
Publication number: 20250053614
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field such that it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel such that specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks such that specialized Matrix Processor circuits can improve performance. But a neural network is more than a collection of matrix operations; it is a set of specifically coordinated matrix operations with complex data dependencies. Without proper coordination, Matrix Processor circuits may end up idle or spending large amounts of time loading in different weight matrix data.
Type: Application
Filed: October 29, 2024
Publication date: February 13, 2025
Inventors: Ramteja Tadishetti, Vaibhav Vivek Kamat, Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Publication number: 20250045569
Abstract: Disclosed are systems and methods for processing a multilayer neural network incorporating skip connections while reducing the memory footprint and processing time of processing a neural network. The method comprises loading a memory partition with a portion of an input tensor and a portion of layer weights associated with computing a portion of the one or more intermediate layer tensors associated with a first portion of the skip connection tensor. Next, a neural processing unit is used to recompute portions of the skip connection tensor using the portion of the input tensor and associated weights. Upon completion, the memory utilized for the recomputing is free for further computations.
Type: Application
Filed: July 10, 2024
Publication date: February 6, 2025
Inventors: Ramteja Tadishetti, Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Publication number: 20250045122
Abstract: Machine learning model scalability with distributed multi-layer processing is disclosed herein. A method for processing and deploying machine learning models that enhances scalability and efficiency by executing a subset of a neural network on each of a plurality of interconnected processing units. The method involves partitioning compute tasks across these processing units to reduce latency, including broadcast and reduction processes for inputs and outputs. It also includes managing the allocation of samples in a batch to specific master processing units within the distributed arrangement and synchronizing computation between fully connected layers within each processing unit. Additionally, the method implements data reduction during the transfer of data across the processing units, wherein data is accumulated with a current processing unit's partial sum as it is transferred to the destination processing unit.
Type: Application
Filed: July 30, 2024
Publication date: February 6, 2025
Inventors: Shang-Tse Chuang, Siyad Chih-Hua Ma, Sharad Vasantrao Chole, Costas Calamvokis
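The last sentence of the abstract above, reduction during transfer, can be pictured with a short sketch. The ring ordering, the function name `reduce_in_transit`, and the list-of-vectors representation are all assumptions made for illustration; the abstract does not specify the interconnect topology.

```python
def reduce_in_transit(partial_sums, destination):
    """Accumulate partial sums hop by hop on the way to one destination unit.

    partial_sums[i] is the vector held by processing unit i (hypothetical
    layout). The message starts at the unit after the destination on an
    assumed ring and visits every unit once, ending at the destination.
    """
    n = len(partial_sums)
    order = [(destination + 1 + k) % n for k in range(n)]  # last entry is destination
    message = list(partial_sums[order[0]])
    for unit in order[1:]:
        # Each unit adds its own partial sum to the data passing through it.
        message = [m + p for m, p in zip(message, partial_sums[unit])]
    return message


partials = [[1, 2], [10, 20], [100, 200], [1000, 2000]]
total_at_unit_2 = reduce_in_transit(partials, destination=2)
```

The point of the sketch is that the destination receives one already-reduced vector rather than one message per unit, so no separate reduction pass is needed after the transfer.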
-
Publication number: 20250013259
Abstract: Disclosed are semiconductor devices that implement relaxed clock forwarding between logic blocks. In one embodiment the system includes a set of logic blocks forming a first processing path. Another set of logic blocks form additional processing paths. A clock is configured to forward the data between the logic blocks asynchronously in the first processing path in which the data is asynchronously forwarded between logic blocks. This forwarding is asynchronous with the additional processing paths' data and clock. The ends or last logic block in each path can be synchronized using a synchronizer component. The synchronizer can be a plurality of asynchronous FIFOs. In one embodiment, logic blocks form a matrix and the processing paths are along the rows or columns of the matrix.
Type: Application
Filed: June 28, 2024
Publication date: January 9, 2025
Inventors: Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
-
Patent number: 12182717
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of the most important applications for artificial intelligence is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for object recognition and classification from digital images. However, convolutional neural networks are extremely computationally intensive, thus requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, “cone of dependency” and “cone of influence” processing techniques are disclosed. These two techniques arrange the computations required in a manner that minimizes memory accesses such that computations may be performed in local cache memory. These techniques significantly reduce the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Grant
Filed: October 18, 2021
Date of Patent: December 31, 2024
Assignee: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
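The abstract does not define "cone of dependency" precisely. One common reading, assumed here, is the set of earlier-layer values that a given output value depends on. The sketch below traces that index range backwards through a stack of 1-D convolutions without padding; the function name and the `(kernel, stride)` layer description are hypothetical.

```python
def cone_of_dependency(layers, out_lo, out_hi):
    """Trace which indices each layer must supply for outputs out_lo..out_hi.

    layers is a list of (kernel, stride) pairs, first layer first, for
    unpadded 1-D convolutions (an assumption). Returns inclusive index
    ranges, from the network input up to the final output.
    """
    lo, hi = out_lo, out_hi
    ranges = [(lo, hi)]
    for kernel, stride in reversed(layers):
        # Output j of a layer reads inputs j*stride .. j*stride + kernel - 1.
        lo = lo * stride
        hi = hi * stride + kernel - 1
        ranges.append((lo, hi))
    ranges.reverse()
    return ranges


# Two stacked 3-wide convolutions: one output value depends on 5 inputs.
cone = cone_of_dependency([(3, 1), (3, 1)], out_lo=0, out_hi=0)
```

Under this reading, a scheduler that knows the cone for a small block of outputs can load just those input values into local memory and run all layers on them before touching main memory again.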
-
Publication number: 20240428046
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field such that it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel such that specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks such that specialized matrix processor circuits can improve performance. To perform all these matrix operations, the neural processing circuits must be quickly and efficiently supplied with data to process or else the matrix processor circuits end up idle or spending large amounts of time loading weight matrix data.
Type: Application
Filed: June 21, 2023
Publication date: December 26, 2024
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Publication number: 20240427839
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field such that it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel such that specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks such that specialized matrix processor circuits can improve performance. Thus, this document discloses apparatus and methods for efficiently processing matrix operations.
Type: Application
Filed: June 21, 2023
Publication date: December 26, 2024
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Patent number: 12141226
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field such that it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel such that specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks such that specialized Matrix Processor circuits can improve performance. But a neural network is more than a collection of matrix operations; it is a set of specifically coordinated matrix operations with complex data dependencies. Without proper coordination, Matrix Processor circuits may end up idle or spending large amounts of time loading in different weight matrix data.
Type: Grant
Filed: April 5, 2019
Date of Patent: November 12, 2024
Assignee: Expedera, Inc.
Inventors: Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
-
Publication number: 20240320496
Abstract: Artificial intelligence is an extremely computationally intensive field such that it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel such that specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires a large flow of data from different types of memory. To maximize the process of a multilayer neural network, the reordering of data onto and out of a neural network processor, the computations by the matrix of processing elements within the neural network processor, and the synchronization of these activities are reordered.
Type: Application
Filed: June 7, 2024
Publication date: September 26, 2024
Inventors: Ramteja Tadishetti, Steven Twu, Arthur Chang, Sharad Vasantrao Chole
-
Publication number: 20240265234
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is a very computationally intensive field. Fortunately, many of the required calculations can be performed in parallel such that specialized processors can greatly increase computation performance. In particular, Graphics Processor Units (GPUs) are often used in artificial intelligence. Although GPUs have helped, they are not ideal for artificial intelligence. Specifically, GPUs are used to compute matrix operations in one direction with a pipelined architecture. However, artificial intelligence is a field that uses both forward propagation computations and back propagation calculations. To efficiently perform artificial intelligence calculations, a symmetric matrix processing element is introduced. The symmetric matrix processing element can perform forward propagation and backward propagation calculations just as easily.
Type: Application
Filed: April 17, 2024
Publication date: August 8, 2024
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
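The forward and backward calculations mentioned in the abstract above use the same weight matrix in two directions: forward propagation multiplies by the weights, and back propagation multiplies by their transpose. The sketch below shows only that arithmetic, reading one stored matrix along rows or along columns; the class name and layout are assumptions and say nothing about the hardware design in the application.

```python
class SymmetricMatrixSketch:
    """Holds one weight matrix and uses it in both directions (illustrative)."""

    def __init__(self, weights):
        self.weights = weights  # list of rows: outputs x inputs

    def forward(self, x):
        # y[i] = sum_j W[i][j] * x[j]  (read the matrix along rows)
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]

    def backward(self, dy):
        # dx[j] = sum_i W[i][j] * dy[i]  (read the same matrix along columns)
        n_inputs = len(self.weights[0])
        return [
            sum(row[j] * g for row, g in zip(self.weights, dy))
            for j in range(n_inputs)
        ]


layer = SymmetricMatrixSketch([[1, 2, 3], [4, 5, 6]])
activations = layer.forward([1, 1, 1])
input_grads = layer.backward([1, 1])
```

No transposed copy of the weights is built in this sketch; both directions index the same stored values.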
-
Publication number: 20240242078
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of the most important applications for artificial intelligence is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for object recognition and classification from digital images. However, convolutional neural networks are extremely computationally intensive, thus requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, “cone of dependency” and “cone of influence” processing techniques are disclosed. These two techniques arrange the computations required in a manner that minimizes memory accesses such that computations may be performed in local cache memory. These techniques significantly reduce the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Application
Filed: October 18, 2021
Publication date: July 18, 2024
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
-
Patent number: 11983616
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is a very computationally intensive field. Fortunately, many of the required calculations can be performed in parallel such that specialized processors can greatly increase computation performance. In particular, Graphics Processor Units (GPUs) are often used in artificial intelligence. Although GPUs have helped, they are not ideal for artificial intelligence. Specifically, GPUs are used to compute matrix operations in one direction with a pipelined architecture. However, artificial intelligence is a field that uses both forward propagation computations and back propagation calculations. To efficiently perform artificial intelligence calculations, a symmetric matrix processing element is introduced. The symmetric matrix processing element can perform forward propagation and backward propagation calculations just as easily.
Type: Grant
Filed: October 1, 2018
Date of Patent: May 14, 2024
Assignee: Expedera, Inc.
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Publication number: 20240152761
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field such that it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel such that specialized processors can greatly increase computational performance for AI applications. Specifically, artificial intelligence generally requires large numbers of matrix operations such that specialized matrix processor circuits can greatly improve performance. To efficiently execute all these matrix operations, the matrix processor circuits must be quickly and efficiently supplied with a stream of data and instructions to process or else the matrix processor circuits end up idle. Thus, this document discloses packet architecture for efficiently creating and supplying neural network processors with work packets to process.
Type: Application
Filed: October 20, 2022
Publication date: May 9, 2024
Applicant: Expedera, Inc.
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Patent number: 11151416
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of the most important applications for artificial intelligence is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for object recognition and classification from digital images. However, convolutional neural networks are extremely computationally intensive, thus requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, “cone of dependency” and “cone of influence” processing techniques are disclosed. These two techniques arrange the computations required in a manner that minimizes memory accesses such that computations may be performed in local cache memory. These techniques significantly reduce the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Grant
Filed: September 11, 2019
Date of Patent: October 19, 2021
Assignee: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
-
Publication number: 20210073585
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of the most important applications for artificial intelligence is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for object recognition and classification from digital images. However, convolutional neural networks are extremely computationally intensive, thus requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, “cone of dependency” and “cone of influence” processing techniques are disclosed. These two techniques arrange the computations required in a manner that minimizes memory accesses such that computations may be performed in local cache memory. These techniques significantly reduce the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Application
Filed: September 11, 2019
Publication date: March 11, 2021
Applicant: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
-
Publication number: 20200371835
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field such that performing artificial intelligence calculations can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence applications can be performed in parallel such that specialized linear algebra matrix processors can greatly increase computational performance. But even with linear algebra matrix processors, performance can be limited due to complex data dependencies. Without proper coordination, linear algebra matrix processors may end up idle or spending large amounts of time moving data around. Thus, this document discloses methods for efficiently scheduling linear algebra matrix processors.
Type: Application
Filed: May 7, 2020
Publication date: November 26, 2020
Applicant: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
-
Publication number: 20200226201
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field such that it can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel such that specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks such that specialized Matrix Processor circuits can improve performance. But a neural network is more than a collection of matrix operations; it is a set of specifically coordinated matrix operations with complex data dependencies. Without proper coordination, Matrix Processor circuits may end up idle or spending large amounts of time loading in different weight matrix data.
Type: Application
Filed: April 5, 2019
Publication date: July 16, 2020
Applicant: Expedera, Inc.
Inventors: Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
-
Publication number: 20200104669
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is a very computationally intensive field. Fortunately, many of the required calculations can be performed in parallel such that specialized processors can greatly increase computation performance. In particular, Graphics Processor Units (GPUs) are often used in artificial intelligence. Although GPUs have helped, they are not ideal for artificial intelligence. Specifically, GPUs are used to compute matrix operations in one direction with a pipelined architecture. However, artificial intelligence is a field that uses both forward propagation computations and back propagation calculations. To efficiently perform artificial intelligence calculations, a symmetric matrix processing element is introduced. The symmetric matrix processing element can perform forward propagation and backward propagation calculations just as easily.
Type: Application
Filed: October 1, 2018
Publication date: April 2, 2020
Applicant: Expedera, Inc.
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma