Patents by Inventor Shang-Tse Chuang
Shang-Tse Chuang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240152761
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field, such that it can be expensive, time-consuming, and energy-consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, such that specialized processors can greatly increase computational performance for AI applications. Specifically, artificial intelligence generally requires large numbers of matrix operations, such that specialized matrix processor circuits can greatly improve performance. To efficiently execute all these matrix operations, the matrix processor circuits must be quickly and efficiently supplied with a stream of data and instructions to process, or else the matrix processor circuits end up idle. Thus, this document discloses a packet architecture for efficiently creating and supplying neural network processors with work packets to process.
Type: Application
Filed: October 20, 2022
Publication date: May 9, 2024
Applicant: Expedera, Inc.
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
-
Publication number: 20230023859
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field, such that it can be expensive, time-consuming, and energy-consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, such that specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, such that specialized matrix processor circuits can improve performance. To perform all these matrix operations, the matrix processor circuits must be quickly and efficiently supplied with data to process, or else the matrix processor circuits end up idle or spending large amounts of time loading in different weight matrix data.
Type: Application
Filed: June 23, 2022
Publication date: January 26, 2023
Inventors: Siyad Ma, Shang-Tse Chuang, Sharad Chole
-
Patent number: 11151416
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of the most important applications for artificial intelligence is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for object recognition and classification from digital images. However, convolutional neural networks are extremely computationally intensive, thus requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, “cone of dependency” and “cone of influence” processing techniques are disclosed. These two techniques arrange the required computations in a manner that minimizes memory accesses, such that computations may be performed in local cache memory. These techniques significantly reduce the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Grant
Filed: September 11, 2019
Date of Patent: October 19, 2021
Assignee: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
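As a rough illustration of the idea behind a “cone of dependency” and a “cone of influence” (not the patented implementation), the Python sketch below works on a stack of 1-D convolution layers described by hypothetical (kernel_size, stride) pairs with no padding. It computes which input positions a block of outputs depends on, and which outputs a block of inputs can affect. Knowing these ranges is what allows a tile of work to be sized to fit in local memory. The influence calculation assumes an unbounded signal and does not clip to each layer's output length.

```python
from typing import List, Tuple

# Hypothetical layer description: (kernel_size, stride), no padding, 1-D.
Layer = Tuple[int, int]


def cone_of_dependency(layers: List[Layer], out_start: int, out_end: int) -> Tuple[int, int]:
    """Inclusive input range that outputs [out_start, out_end] of the last layer depend on."""
    start, end = out_start, out_end
    for kernel, stride in reversed(layers):  # walk from the last layer back to the input
        start = start * stride
        end = end * stride + kernel - 1
    return start, end


def cone_of_influence(layers: List[Layer], in_start: int, in_end: int) -> Tuple[int, int]:
    """Inclusive output range of the last layer that inputs [in_start, in_end] can affect.

    Assumes an unbounded signal: the upper end is not clipped to each layer's length.
    """
    start, end = in_start, in_end
    for kernel, stride in layers:  # walk forward from the input to the last layer
        # Output o reads inputs o*stride .. o*stride + kernel - 1.
        start = max(0, -(-(start - kernel + 1) // stride))  # ceiling division
        end = end // stride
    return start, end


# Two stacked 3-wide, stride-1 convolutions: one output needs 5 inputs.
print(cone_of_dependency([(3, 1), (3, 1)], 0, 0))  # (0, 4)
print(cone_of_influence([(3, 2)], 4, 4))           # (1, 2)
```

A real accelerator would apply the same bookkeeping in two dimensions and across channels; the 1-D form only shows how the ranges grow layer by layer.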
-
Publication number: 20210073585
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of the most important applications for artificial intelligence is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for object recognition and classification from digital images. However, convolutional neural networks are extremely computationally intensive, thus requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, “cone of dependency” and “cone of influence” processing techniques are disclosed. These two techniques arrange the required computations in a manner that minimizes memory accesses, such that computations may be performed in local cache memory. These techniques significantly reduce the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
Type: Application
Filed: September 11, 2019
Publication date: March 11, 2021
Applicant: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
-
Publication number: 20200371835
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field, such that performing artificial intelligence calculations can be expensive, time-consuming, and energy-consuming. Fortunately, many of the calculations required for artificial intelligence applications can be performed in parallel, such that specialized linear algebra matrix processors can greatly increase computational performance. But even with linear algebra matrix processors, performance can be limited due to complex data dependencies. Without proper coordination, linear algebra matrix processors may end up idle or spending large amounts of time moving data around. Thus, this document discloses methods for efficiently scheduling linear algebra matrix processors.
Type: Application
Filed: May 7, 2020
Publication date: November 26, 2020
Applicant: Expedera, Inc.
Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
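The abstract does not describe its scheduling method. Purely as a generic point of comparison, the following Python sketch shows plain dependency-aware list scheduling: operations (hypothetical names) are issued to at most `num_processors` processors per step, and an operation becomes ready only after every operation it depends on has completed in an earlier step.

```python
from collections import deque
from typing import Dict, List


def schedule(deps: Dict[str, List[str]], num_processors: int) -> List[List[str]]:
    """Group operations into steps of at most num_processors operations.

    deps maps each operation to the operations whose results it needs; every
    operation must appear as a key. An operation is issued only after all of
    its dependencies were issued in earlier steps.
    """
    remaining = {op: len(parents) for op, parents in deps.items()}
    children: Dict[str, List[str]] = {op: [] for op in deps}
    for op, parents in deps.items():
        for parent in parents:
            children[parent].append(op)

    ready = deque(sorted(op for op, n in remaining.items() if n == 0))
    steps: List[List[str]] = []
    while ready:
        step = [ready.popleft() for _ in range(min(num_processors, len(ready)))]
        steps.append(step)
        for op in step:  # results are available from the next step onward
            for child in children[op]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    ready.append(child)

    if sum(len(s) for s in steps) != len(deps):
        raise ValueError("dependency cycle: some operations can never run")
    return steps


# Hypothetical graph: C needs A and B, D needs only A; two processors.
print(schedule({"A": [], "B": [], "C": ["A", "B"], "D": ["A"]}, 2))
# [['A', 'B'], ['D', 'C']]
```

This greedy policy keeps processors busy whenever ready work exists, but it ignores data-movement cost, which the abstract identifies as the other source of idle time.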
-
Publication number: 20200226201
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field, such that it can be expensive, time-consuming, and energy-consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, such that specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, such that specialized Matrix Processor circuits can improve performance. But a neural network is more than a collection of matrix operations; it is a set of specifically coordinated matrix operations with complex data dependencies. Without proper coordination, Matrix Processor circuits may end up idle or spending large amounts of time loading in different weight matrix data.
Type: Application
Filed: April 5, 2019
Publication date: July 16, 2020
Applicant: Expedera, Inc.
Inventors: Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
-
Publication number: 20200104669
Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is a very computationally intensive field. Fortunately, many of the required calculations can be performed in parallel, such that specialized processors can greatly increase computational performance. In particular, Graphics Processing Units (GPUs) are often used in artificial intelligence. Although GPUs have helped, they are not ideal for artificial intelligence. Specifically, GPUs are used to compute matrix operations in one direction with a pipelined architecture. However, artificial intelligence is a field that uses both forward propagation and back propagation calculations. To efficiently perform artificial intelligence calculations, a symmetric matrix processing element is introduced. The symmetric matrix processing element can perform forward propagation and backward propagation calculations with equal ease.
Type: Application
Filed: October 1, 2018
Publication date: April 2, 2020
Applicant: Expedera, Inc.
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
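To make the forward/backward symmetry concrete, here is a minimal Python sketch, assuming a single stored weight matrix W (the class and method names are hypothetical, not from the application). Forward propagation computes y = W x; back propagation computes the input gradient W^T dy by reading the same stored weights column-wise rather than keeping a transposed copy.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


class SymmetricMatrixElement:
    """Hypothetical model: one stored weight matrix used in both directions."""

    def __init__(self, weights: Matrix):
        self.w = weights  # w[i][j]: weight from input j to output i

    def forward(self, x: Vector) -> Vector:
        # y[i] = sum_j w[i][j] * x[j]   (row-wise pass)
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in self.w]

    def backward(self, grad_y: Vector) -> Vector:
        # grad_x[j] = sum_i w[i][j] * grad_y[i]   (column-wise pass, i.e. W^T,
        # reading the same stored weights instead of a transposed copy)
        n_in = len(self.w[0])
        return [sum(row[j] * g for row, g in zip(self.w, grad_y)) for j in range(n_in)]


pe = SymmetricMatrixElement([[1, 2], [3, 4], [5, 6]])
print(pe.forward([1, 1]))      # [3, 7, 11]
print(pe.backward([1, 0, 1]))  # [6, 8]
```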
-
Patent number: 10042573
Abstract: A system and method for designing and constructing hierarchical memory systems is disclosed. A plurality of different algorithmic memory blocks are disclosed. Each algorithmic memory block includes a memory controller that implements a specific storage algorithm and a set of lower-level memory components. Each of those lower-level memory components may be constructed with another algorithmic memory block or with a fundamental memory block. By organizing algorithmic memory blocks in various hierarchical organizations, many different complex memory systems that provide new features may be created.
Type: Grant
Filed: July 19, 2016
Date of Patent: August 7, 2018
Assignee: Cisco Technology, Inc.
Inventors: Sundar Iyer, Shang-Tse Chuang
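A minimal Python sketch of the recursive structure described above, assuming simple address interleaving as the "specific storage algorithm" (the abstract does not name one, and all class names here are hypothetical). A controller routes each address to one of its lower-level blocks, and each lower-level block may be either a fundamental memory or another algorithmic block.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class MemoryBlock(ABC):
    @abstractmethod
    def read(self, addr: int) -> int: ...

    @abstractmethod
    def write(self, addr: int, value: int) -> None: ...

    @abstractmethod
    def depth(self) -> int: ...


class FundamentalMemory(MemoryBlock):
    """Leaf block: a plain array of words."""

    def __init__(self, depth: int):
        self._cells = [0] * depth

    def read(self, addr: int) -> int:
        return self._cells[addr]

    def write(self, addr: int, value: int) -> None:
        self._cells[addr] = value

    def depth(self) -> int:
        return len(self._cells)


class InterleavedMemory(MemoryBlock):
    """Algorithmic block: its controller interleaves addresses across lower-level
    blocks, each of which may itself be an algorithmic block."""

    def __init__(self, children: List[MemoryBlock]):
        self.children = children

    def _route(self, addr: int) -> Tuple[MemoryBlock, int]:
        n = len(self.children)
        return self.children[addr % n], addr // n  # (lower-level block, local address)

    def read(self, addr: int) -> int:
        child, local = self._route(addr)
        return child.read(local)

    def write(self, addr: int, value: int) -> None:
        child, local = self._route(addr)
        child.write(local, value)

    def depth(self) -> int:
        return len(self.children) * min(c.depth() for c in self.children)


# Two-level hierarchy: an interleaved block built from another interleaved
# block and a fundamental memory.
mem = InterleavedMemory([
    InterleavedMemory([FundamentalMemory(4), FundamentalMemory(4)]),
    FundamentalMemory(8),
])
print(mem.depth())  # 16
```

Because every block exposes the same read/write interface, other controllers (for example, ones adding extra ports or caching) could be slotted into the same tree without changing their neighbours.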
-
Patent number: 9965211
Abstract: Provided are a method, a non-transitory computer-readable storage device and an apparatus for managing use of a shared memory buffer that is partitioned into multiple banks and that stores incoming data received at multiple inputs in accordance with a multi-slice architecture. A particular bank is allocated to a corresponding slice. Received data packets are associated with corresponding slices based on the inputs at which they are received. Based on a state of the shared memory buffer, a determination is made to transfer the contents of all occupied cells of the particular bank. Writes to the bank are stopped, contents of occupied cells are transferred to cells of one or more other banks associated with the particular bank's slice, information is stored indicating where the contents have been transferred, and the particular bank is returned to a shared pool after transferring is completed.
Type: Grant
Filed: September 8, 2016
Date of Patent: May 8, 2018
Assignee: Cisco Technology, Inc.
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Georges Akis, Felice Bonardi, Rong Pan
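The bank-reclaim sequence in this abstract can be sketched in Python as follows. This is a simplified model under stated assumptions, not the patented implementation: cells hold opaque byte strings, a bank whose owner is `None` is in the shared pool, and a forwarding table records where moved cells went (all names are hypothetical). A real design would also have to retire forwarding entries once the moved packets are dequeued.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (bank index, cell index within the bank)


class SharedBuffer:
    def __init__(self, num_banks: int, cells_per_bank: int):
        self.cells: List[List[Optional[bytes]]] = [
            [None] * cells_per_bank for _ in range(num_banks)
        ]
        self.owner: Dict[int, Optional[int]] = {b: None for b in range(num_banks)}  # None = pool
        self.write_blocked: Set[int] = set()
        self.forward: Dict[Cell, Cell] = {}  # old location -> new location

    def allocate_bank(self, slice_id: int) -> int:
        for bank, owner in self.owner.items():
            if owner is None:
                self.owner[bank] = slice_id
                return bank
        raise RuntimeError("shared pool is empty")

    def write(self, bank: int, data: bytes) -> Cell:
        if bank in self.write_blocked:
            raise RuntimeError("bank is being drained")
        idx = self.cells[bank].index(None)  # ValueError if the bank is full
        self.cells[bank][idx] = data
        self.forward.pop((bank, idx), None)  # a refilled cell no longer forwards
        return (bank, idx)

    def read(self, loc: Cell) -> Optional[bytes]:
        while loc in self.forward:  # follow moves recorded by reclaim()
            loc = self.forward[loc]
        bank, idx = loc
        return self.cells[bank][idx]

    def reclaim(self, bank: int) -> None:
        slice_id = self.owner[bank]
        self.write_blocked.add(bank)  # 1. stop writes to the bank
        siblings = [b for b, o in self.owner.items() if o == slice_id and b != bank]
        for idx, data in enumerate(self.cells[bank]):
            if data is None:
                continue
            dest = next(((b, j) for b in siblings
                         for j, c in enumerate(self.cells[b]) if c is None), None)
            if dest is None:
                raise RuntimeError("slice has no free cells to absorb the bank")
            self.cells[dest[0]][dest[1]] = data  # 2. move the occupied cell
            self.cells[bank][idx] = None
            self.forward.pop(dest, None)
            self.forward[(bank, idx)] = dest     # 3. record where it went
        self.owner[bank] = None                  # 4. return the bank to the pool
        self.write_blocked.discard(bank)


buf = SharedBuffer(num_banks=3, cells_per_bank=2)
b0 = buf.allocate_bank(slice_id=0)
b1 = buf.allocate_bank(slice_id=0)
loc = buf.write(b0, b"pkt")
buf.reclaim(b0)
print(buf.read(loc))  # b'pkt'
```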
-
Publication number: 20180067683
Abstract: Provided are a method, a non-transitory computer-readable storage device and an apparatus for managing use of a shared memory buffer that is partitioned into multiple banks and that stores incoming data received at multiple inputs in accordance with a multi-slice architecture. A particular bank is allocated to a corresponding slice. Received data packets are associated with corresponding slices based on the inputs at which they are received. Based on a state of the shared memory buffer, a determination is made to transfer the contents of all occupied cells of the particular bank. Writes to the bank are stopped, contents of occupied cells are transferred to cells of one or more other banks associated with the particular bank's slice, information is stored indicating where the contents have been transferred, and the particular bank is returned to a shared pool after transferring is completed.
Type: Application
Filed: September 8, 2016
Publication date: March 8, 2018
Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Georges Akis, Felice Bonardi, Rong Pan
-
Patent number: 9678669
Abstract: Designing memory subsystems for integrated circuits can be a time-consuming and costly task. To reduce development time and costs, an automated system and method for designing and constructing high-speed memory operations is disclosed. The automated system accepts a set of desired memory characteristics and then methodically selects different potential memory system design types and different implementations of each memory system design type. The potential memory system design types may include traditional memory systems, optimized traditional memory systems, intelligent memory systems, and hierarchical memory systems. A selected set of proposed memory systems that meet the specified set of desired memory characteristics is output to a circuit designer. When a circuit designer selects a proposed memory system, the automated system generates a complete memory system design, a model for the memory system, and a test suite for the memory system.
Type: Grant
Filed: November 18, 2013
Date of Patent: June 13, 2017
Assignee: Cisco Technology, Inc.
Inventors: Sundar Iyer, Sanjeev Joshi, Shang-Tse Chuang
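The selection step described above amounts to filtering candidate designs against the requested characteristics and ranking the survivors. The Python sketch below assumes a handful of hypothetical characteristics (depth, width, reads and writes per cycle) and a hypothetical relative-area figure as the ranking key; the patent's actual characteristics and cost model are not given in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Requirements:
    depth: int             # words
    width: int             # bits per word
    reads_per_cycle: int
    writes_per_cycle: int


@dataclass
class CandidateDesign:
    design_type: str       # e.g. "traditional", "hierarchical"
    depth: int
    width: int
    reads_per_cycle: int
    writes_per_cycle: int
    relative_area: float   # hypothetical cost metric; lower is better


def propose_designs(candidates: List[CandidateDesign], req: Requirements,
                    limit: int = 3) -> List[CandidateDesign]:
    """Keep candidates that meet every requirement, cheapest first."""
    meets = [
        c for c in candidates
        if c.depth >= req.depth
        and c.width >= req.width
        and c.reads_per_cycle >= req.reads_per_cycle
        and c.writes_per_cycle >= req.writes_per_cycle
    ]
    return sorted(meets, key=lambda c: c.relative_area)[:limit]
```

In the patented flow the designer's choice then drives generation of the full design, model, and test suite; this sketch covers only the filtering and ranking.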
-
Patent number: 9520178
Abstract: Static random access memory (SRAM) circuits are used in most digital integrated circuits to store representations of data bits. To handle multiple concurrent memory requests, an efficient dual-port six-transistor (6T) SRAM bit cell is proposed. The dual-port 6T SRAM cell uses independent word lines and bit lines such that the true/data side and the false/data-complement side of the SRAM bit cell may be accessed independently. Single-ended reads allow the two independent word lines and bit lines to handle two independent read operations in a single cycle using spatial domain multiplexing. Single-ended writes are enabled by adjusting the VDD power voltage supplied to a memory cell when writes are performed, such that a single word line and bit line pair can be used to write either a logical “0” or logical “1” into either side of the SRAM bit cell.
Type: Grant
Filed: August 20, 2015
Date of Patent: December 13, 2016
Assignee: Cisco Technology, Inc.
Inventors: Sundar Iyer, Shang-Tse Chuang, Thu Nguyen
-
Publication number: 20160328170
Abstract: A system and method for designing and constructing hierarchical memory systems is disclosed. A plurality of different algorithmic memory blocks are disclosed. Each algorithmic memory block includes a memory controller that implements a specific storage algorithm and a set of lower-level memory components. Each of those lower-level memory components may be constructed with another algorithmic memory block or with a fundamental memory block. By organizing algorithmic memory blocks in various hierarchical organizations, many different complex memory systems that provide new features may be created.
Type: Application
Filed: July 19, 2016
Publication date: November 10, 2016
Inventors: Sundar Iyer, Shang-Tse Chuang
-
Patent number: 9442846
Abstract: A system and method for designing and constructing hierarchical memory systems is disclosed. A plurality of different algorithmic memory blocks are disclosed. Each algorithmic memory block includes a memory controller that implements a specific storage algorithm and a set of lower-level memory components. Each of those lower-level memory components may be constructed with another algorithmic memory block or with a fundamental memory block. By organizing algorithmic memory blocks in various hierarchical organizations, many different complex memory systems that provide new features may be created.
Type: Grant
Filed: August 17, 2010
Date of Patent: September 13, 2016
Assignee: Cisco Technology, Inc.
Inventors: Sundar Iyer, Shang-Tse Chuang
-
Patent number: 9390212
Abstract: Multi-port memory circuits are often required within modern digital integrated circuits to store data. Multi-port memory circuits allow multiple memory users to access the same memory cell simultaneously. Multi-port memory circuits are generally custom-designed in order to obtain the best performance or synthesized with logic synthesis tools for quick design. However, these two options for creating multi-port memory give integrated circuit designers a stark choice: invest a large amount of time and money to custom design an efficient multi-port memory system or allow logic synthesis tools to inefficiently create multi-port memory. An intermediate solution is disclosed that allows an efficient multi-port memory array to be created largely using standard circuit cell components and register transfer level hardware design language code.
Type: Grant
Filed: May 4, 2015
Date of Patent: July 12, 2016
Assignee: Cisco Technology, Inc.
Inventors: Sundar Iyer, Shang-Tse Chuang, Thu Nguyen, Sanjeev Joshi, Adam Kablanian
-
Publication number: 20160179394
Abstract: Designing memory subsystems for integrated circuits can be a time-consuming and costly task. To reduce development time and costs, an automated system and method for designing and constructing high-speed memory operations is disclosed. The automated system accepts a set of desired memory characteristics and then methodically selects different potential memory system design types and different implementations of each memory system design type. The potential memory system design types may include traditional memory systems, optimized traditional memory systems, intelligent memory systems, and hierarchical memory systems. A selected set of proposed memory systems that meet the specified set of desired memory characteristics is output to a circuit designer. When a circuit designer selects a proposed memory system, the automated system generates a complete memory system design, a model for the memory system, and a test suite for the memory system.
Type: Application
Filed: November 18, 2013
Publication date: June 23, 2016
Inventors: Sundar Iyer, Sanjeev Joshi, Shang-Tse Chuang
-
Patent number: 9293187
Abstract: Dynamic memory systems require each memory cell to be continually refreshed. During a memory refresh operation, the refreshed memory cells cannot be accessed by a memory read or write operation. In multi-bank dynamic memory systems, concurrent refresh systems allow memory refresh circuitry to refresh memory banks that are not currently involved in memory access operations. To refresh memory banks efficiently, an advanced round-robin refresh system refreshes memory banks in a nominal round-robin manner but skips memory banks blocked by memory access operations. Skipped memory banks are prioritized and then refreshed when they are no longer blocked.
Type: Grant
Filed: September 26, 2011
Date of Patent: March 22, 2016
Assignee: Cisco Technology, Inc.
Inventors: Sundar Iyer, Shang-Tse Chuang
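A minimal Python sketch of the refresh policy described above, assuming one refresh slot per call to `tick` and a caller-supplied set of banks blocked by memory accesses in that cycle (the class and method names are hypothetical). Skipped banks are queued oldest-first and given priority as soon as they are no longer blocked; otherwise the round-robin pointer advances past blocked banks.

```python
from typing import List, Optional, Set


class RoundRobinRefresher:
    def __init__(self, num_banks: int):
        self.num_banks = num_banks
        self.next_bank = 0            # nominal round-robin pointer
        self.skipped: List[int] = []  # banks owed a refresh, oldest first

    def tick(self, blocked: Set[int]) -> Optional[int]:
        """Return the bank to refresh this cycle, or None if every bank is blocked."""
        # 1. A previously skipped bank that is no longer blocked goes first.
        for bank in self.skipped:
            if bank not in blocked:
                self.skipped.remove(bank)
                return bank
        # 2. Otherwise advance the pointer, skipping (and remembering) blocked banks.
        for _ in range(self.num_banks):
            bank = self.next_bank
            self.next_bank = (self.next_bank + 1) % self.num_banks
            if bank not in blocked:
                return bank
            if bank not in self.skipped:
                self.skipped.append(bank)
        return None


r = RoundRobinRefresher(4)
print([r.tick(set()), r.tick({1}), r.tick({1}), r.tick(set()), r.tick(set())])
# [0, 2, 3, 1, 0]
```

Here bank 1 is skipped while blocked, then refreshed ahead of the pointer once it frees up.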
-
Patent number: 9280464
Abstract: A system and method for providing high-speed memory operations is disclosed. The technique uses virtualization of memory space to map a virtual address space to a larger physical address space wherein no memory bank conflicts will occur. The larger physical address space is used to prevent memory bank conflicts from occurring by moving the virtualized memory addresses of data being written to memory to a different location in physical memory that will eliminate a memory bank conflict. This allows the memory system to both store and read data in the same cycle with no conflicts.
Type: Grant
Filed: June 4, 2015
Date of Patent: March 8, 2016
Assignee: Cisco Technology, Inc.
Inventors: Sundar Iyer, Shang-Tse Chuang
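The remapping idea can be modelled in a few lines of Python. This sketch assumes single-ported banks (one access per bank per cycle), one read and one write per cycle, and a physical array exactly one bank larger than the virtual address space; the names are hypothetical and this is not the patented design. With one spare bank's worth of rows, at most `rows_per_bank - 1` free rows can sit in the bank being read (the row being read is occupied), so whenever a write collides with the read, a free row in some other bank is guaranteed to exist.

```python
from typing import Dict, List, Optional, Set, Tuple

Loc = Tuple[int, int]  # (bank, row)


class VirtualizedMemory:
    def __init__(self, num_banks: int, rows_per_bank: int):
        self.data: List[List[int]] = [[0] * rows_per_bank for _ in range(num_banks)]
        # Virtual space is one bank smaller than the physical array.
        self.capacity = (num_banks - 1) * rows_per_bank
        self.map: Dict[int, Loc] = {
            v: (v // rows_per_bank, v % rows_per_bank) for v in range(self.capacity)
        }
        self.free: Set[Loc] = {(num_banks - 1, r) for r in range(rows_per_bank)}

    def cycle(self, read_addr: Optional[int] = None,
              write_addr: Optional[int] = None, value: int = 0) -> Optional[int]:
        """One read and one write in the same cycle; each bank serves one access."""
        result, busy_bank = None, None
        if read_addr is not None:
            busy_bank, row = self.map[read_addr]
            result = self.data[busy_bank][row]
        if write_addr is not None:
            loc = self.map[write_addr]
            if loc[0] == busy_bank:
                # Bank conflict: move this virtual address to a free row in another bank.
                new_loc = min(cand for cand in self.free if cand[0] != busy_bank)
                self.free.remove(new_loc)
                self.free.add(loc)  # old row becomes free; its stale contents are never read
                self.map[write_addr] = new_loc
                loc = new_loc
            self.data[loc[0]][loc[1]] = value
        return result


m = VirtualizedMemory(num_banks=3, rows_per_bank=2)
m.cycle(write_addr=0, value=5)  # virtual 0 and 1 both start in bank 0
m.cycle(write_addr=1, value=7)
print(m.cycle(read_addr=0, write_addr=1, value=9))  # 5; the write to 1 is remapped
print(m.cycle(read_addr=1))                         # 9
```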
-
Publication number: 20150357030
Abstract: Static random access memory (SRAM) circuits are used in most digital integrated circuits to store representations of data bits. To handle multiple concurrent memory requests, an efficient dual-port six-transistor (6T) SRAM bit cell is proposed. The dual-port 6T SRAM cell uses independent word lines and bit lines such that the true/data side and the false/data-complement side of the SRAM bit cell may be accessed independently. Single-ended reads allow the two independent word lines and bit lines to handle two independent read operations in a single cycle using spatial domain multiplexing. Single-ended writes are enabled by adjusting the VDD power voltage supplied to a memory cell when writes are performed, such that a single word line and bit line pair can be used to write either a logical “0” or logical “1” into either side of the SRAM bit cell.
Type: Application
Filed: August 20, 2015
Publication date: December 10, 2015
Inventors: Sundar Iyer, Shang-Tse Chuang, Thu Nguyen
-
Publication number: 20150339227
Abstract: A system and method for providing high-speed memory operations is disclosed. The technique uses virtualization of memory space to map a virtual address space to a larger physical address space wherein no memory bank conflicts will occur. The larger physical address space is used to prevent memory bank conflicts from occurring by moving the virtualized memory addresses of data being written to memory to a different location in physical memory that will eliminate a memory bank conflict. This allows the memory system to both store and read data in the same cycle with no conflicts.
Type: Application
Filed: June 4, 2015
Publication date: November 26, 2015
Inventors: Sundar Iyer, Shang-Tse Chuang