Patents by Inventor Shang-Tse Chuang

Shang-Tse Chuang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240152761
    Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field, making it expensive, time-consuming, and energy-consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, so specialized processors can greatly increase computational performance for AI applications. Specifically, artificial intelligence generally requires large numbers of matrix operations, so specialized matrix processor circuits can greatly improve performance. To execute all these matrix operations efficiently, the matrix processor circuits must be quickly and efficiently supplied with a stream of data and instructions to process, or else they end up idle. Thus, this document discloses a packet architecture for efficiently creating and supplying neural network processors with work packets to process. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: October 20, 2022
    Publication date: May 9, 2024
    Applicant: Expedera, Inc.
    Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
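
    A minimal sketch of the work-packet idea, assuming invented names (WorkPacket, build_packets, matrix_engine) and a software queue standing in for the hardware dispatch path; this illustrates the concept, not Expedera's disclosed design:

      # Hypothetical illustration: a layer's matmul is split into
      # self-describing work packets that a matrix engine drains from a
      # queue, so the engine always has data and instructions on hand.
      from collections import deque
      from dataclasses import dataclass
      import numpy as np

      @dataclass
      class WorkPacket:
          opcode: str              # e.g. "matmul"
          weights: np.ndarray      # weight tile for this packet
          activations: np.ndarray  # activation tile for this packet

      def build_packets(weights, activations, tile=4):
          """Slice one layer's matmul into independent column tiles."""
          return deque(
              WorkPacket("matmul", weights[:, c:c + tile], activations)
              for c in range(0, weights.shape[1], tile))

      def matrix_engine(queue):
          """Drain the queue; a fed engine never sits idle."""
          outputs = []
          while queue:
              pkt = queue.popleft()
              if pkt.opcode == "matmul":
                  outputs.append(pkt.activations @ pkt.weights)
          return np.hstack(outputs)

      rng = np.random.default_rng(0)
      W, x = rng.standard_normal((8, 8)), rng.standard_normal((1, 8))
      assert np.allclose(matrix_engine(build_packets(W, x)), x @ W)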
  • Publication number: 20230023859
    Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field, making it expensive, time-consuming, and energy-consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, so specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized matrix processor circuits can improve performance. To perform all these matrix operations, the matrix processor circuits must be quickly and efficiently supplied with data to process, or else they end up idle or spend large amounts of time loading different weight matrix data. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: June 23, 2022
    Publication date: January 26, 2023
    Inventors: Siyad Ma, Shang-Tse Chuang, Sharad Chole
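
    A hedged sketch of one way to keep a matrix unit supplied, assuming a double-buffering scheme (the function and buffer names are invented; the patented mechanism may differ): while the unit computes with one weight buffer, the next weight tile is loaded into the other, hiding the load latency.

      import numpy as np

      def run_layers(x, weight_tiles):
          """Chain matmuls while 'prefetching' the next weight tile."""
          buf = [weight_tiles[0], None]      # initial load into buffer 0
          active = 0
          for i in range(len(weight_tiles)):
              if i + 1 < len(weight_tiles):
                  buf[1 - active] = weight_tiles[i + 1]  # prefetch next
              x = x @ buf[active]    # in hardware this overlaps the load
              active = 1 - active
          return x

      rng = np.random.default_rng(1)
      tiles = [rng.standard_normal((4, 4)) for _ in range(3)]
      x = rng.standard_normal((1, 4))
      assert np.allclose(run_layers(x, tiles),
                         x @ tiles[0] @ tiles[1] @ tiles[2])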
  • Patent number: 11151416
    Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of the most important applications for artificial intelligence is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for object recognition and classification from digital images. However, convolutional neural networks are extremely computationally intensive, thus requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, “cone of dependency” and “cone of influence” processing techniques are disclosed. These two techniques arrange the required computations in a manner that minimizes memory accesses, so that the computations may be performed in local cache memory. These techniques significantly reduce the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: September 11, 2019
    Date of Patent: October 19, 2021
    Assignee: Expedera, Inc.
    Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
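
    A sketch of the dependency-cone idea as this abstract reads (a conceptual illustration, not the patented method): for stacked 3x3 convolutions, one output value after L layers depends only on a (2L+1)x(2L+1) input window, so that window can be processed entirely in local memory.

      import numpy as np

      def conv3x3(img, k):
          """Plain 'valid' 3x3 convolution (correlation) for illustration."""
          h, w = img.shape
          out = np.zeros((h - 2, w - 2))
          for i in range(h - 2):
              for j in range(w - 2):
                  out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
          return out

      def output_pixel_via_cone(img, kernels, i, j):
          L = len(kernels)
          cone = img[i:i + 2 * L + 1, j:j + 2 * L + 1]  # dependency cone
          for k in kernels:     # every step stays inside the small window
              cone = conv3x3(cone, k)
          return cone[0, 0]

      rng = np.random.default_rng(2)
      img = rng.standard_normal((16, 16))
      ks = [rng.standard_normal((3, 3)) for _ in range(2)]
      full = conv3x3(conv3x3(img, ks[0]), ks[1])
      assert np.isclose(output_pixel_via_cone(img, ks, 3, 5), full[3, 5])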
  • Publication number: 20210073585
    Abstract: Artificial intelligence is an increasingly important sector of the computer industry. One of the most important applications for artificial intelligence is object recognition and classification from digital images. Convolutional neural networks have proven to be a very effective tool for object recognition and classification from digital images. However, convolutional neural networks are extremely computationally intensive, thus requiring high-performance processors, significant computation time, and significant energy consumption. To reduce the computation time and energy consumption, “cone of dependency” and “cone of influence” processing techniques are disclosed. These two techniques arrange the required computations in a manner that minimizes memory accesses, so that the computations may be performed in local cache memory. These techniques significantly reduce the time to perform the computations and the energy consumed by the hardware implementing a convolutional neural network.
    Type: Application
    Filed: September 11, 2019
    Publication date: March 11, 2021
    Applicant: Expedera, Inc.
    Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
  • Publication number: 20200371835
    Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field, so performing artificial intelligence calculations can be expensive, time consuming, and energy consuming. Fortunately, many of the calculations required for artificial intelligence applications can be performed in parallel, so specialized linear algebra matrix processors can greatly increase computational performance. But even with linear algebra matrix processors, performance can be limited due to complex data dependencies. Without proper coordination, linear algebra matrix processors may end up idle or spending large amounts of time moving data around. Thus, this document discloses methods for efficiently scheduling linear algebra matrix processors. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: May 7, 2020
    Publication date: November 26, 2020
    Applicant: Expedera, Inc.
    Inventors: Shang-Tse Chuang, Sharad Vasantrao Chole, Siyad Chih-Hua Ma
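
    A minimal scheduling sketch, assuming generic list scheduling (the abstract does not disclose the algorithm, so this only illustrates dependency-aware ordering): an operation becomes ready once all of its producers have finished, which keeps matrix units from stalling on unmet dependencies.

      from collections import deque

      def schedule(ops, deps):
          """ops: op names; deps: {op: set of ops it depends on}."""
          remaining = {op: set(deps.get(op, ())) for op in ops}
          ready = deque(op for op, d in remaining.items() if not d)
          order = []
          while ready:
              op = ready.popleft()
              order.append(op)
              for other, d in remaining.items():
                  if op in d:
                      d.remove(op)
                      if not d:
                          ready.append(other)
          if len(order) != len(ops):
              raise ValueError("cycle in dependency graph")
          return order

      # matmul2 needs matmul0 and matmul1; add needs matmul2.
      print(schedule(["matmul0", "matmul1", "matmul2", "add"],
                     {"matmul2": {"matmul0", "matmul1"},
                      "add": {"matmul2"}}))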
  • Publication number: 20200226201
    Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is an extremely computationally intensive field, making it expensive, time-consuming, and energy-consuming. Fortunately, many of the calculations required for artificial intelligence can be performed in parallel, so specialized processors can greatly increase computational performance. Specifically, artificial intelligence generally requires large numbers of matrix operations to implement neural networks, so specialized Matrix Processor circuits can improve performance. But a neural network is more than a collection of matrix operations; it is a set of specifically coordinated matrix operations with complex data dependencies. Without proper coordination, Matrix Processor circuits may end up idle or spending large amounts of time loading in different weight matrix data. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: April 5, 2019
    Publication date: July 16, 2020
    Applicant: Expedera, Inc.
    Inventors: Siyad Chih-Hua Ma, Shang-Tse Chuang, Sharad Vasantrao Chole
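
    A hedged sketch of one coordination tactic consistent with this abstract (my own illustration; the claimed circuits are hardware, not software): batch together all matrix operations that share a weight matrix, so each weight set is loaded into a Matrix Processor only once.

      from collections import defaultdict

      def order_by_weights(ops):
          """ops: (weight_id, activation_id) pairs. Group shared-weight
          ops so each weight matrix is loaded once."""
          groups = defaultdict(list)
          for w, a in ops:
              groups[w].append(a)
          order, loads = [], 0
          for w, acts in groups.items():
              loads += 1                     # one weight load per group
              order.extend((w, a) for a in acts)
          return order, loads

      ops = [("W0", "x0"), ("W1", "x0"), ("W0", "x1"), ("W1", "x1")]
      order, loads = order_by_weights(ops)
      print(order, "weight loads:", loads)   # 2 loads instead of 4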
  • Publication number: 20200104669
    Abstract: Artificial intelligence is an increasingly important sector of the computer industry. However, artificial intelligence is a very computationally intensive field. Fortunately, many of the required calculations can be performed in parallel, so specialized processors can greatly increase computational performance. In particular, Graphics Processor Units (GPUs) are often used in artificial intelligence. Although GPUs have helped, they are not ideal for artificial intelligence. Specifically, GPUs are used to compute matrix operations in one direction with a pipelined architecture. However, artificial intelligence is a field that uses both forward propagation computations and back propagation calculations. To perform artificial intelligence calculations efficiently, a symmetric matrix processing element is introduced. The symmetric matrix processing element can perform forward propagation and backward propagation calculations equally easily. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: October 1, 2018
    Publication date: April 2, 2020
    Applicant: Expedera, Inc.
    Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Siyad Chih-Hua Ma
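
    A behavioral sketch of the symmetry this abstract describes (a software model with invented names, not the actual processing-element circuit): the same resident weight array serves forward passes (y = Wx) and backward passes (g = Wᵀd), simply by swapping which side the data enters.

      import numpy as np

      class SymmetricPE:
          def __init__(self, weights):
              self.W = np.asarray(weights)  # weights stay resident in the PE

          def forward(self, x):             # activations flow left-to-right
              return self.W @ x

          def backward(self, delta):        # gradients flow right-to-left
              return self.W.T @ delta       # no separate transposed copy

      rng = np.random.default_rng(3)
      pe = SymmetricPE(rng.standard_normal((3, 4)))
      print(pe.forward(rng.standard_normal(4)).shape,   # (3,)
            pe.backward(rng.standard_normal(3)).shape)  # (4,)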
  • Patent number: 10042573
    Abstract: A system and method for designing and constructing hierarchical memory systems is disclosed. A plurality of different algorithmic memory blocks are disclosed. Each algorithmic memory block includes a memory controller that implements a specific storage algorithm and a set of lower level memory components. Each of those lower level memory components may be constructed with another algorithmic memory block or with a fundamental memory block. By organizing algorithmic memory blocks in various different hierarchical organizations, many different complex memory systems that provide new features may be created. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: July 19, 2016
    Date of Patent: August 7, 2018
    Assignee: Cisco Technology, Inc.
    Inventors: Sundar Iyer, Shang-Tse Chuang
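
    A sketch of the hierarchical-composition idea, assuming invented class names (FundamentalMemory, ReplicatedBlock): an algorithmic block is a controller implementing some storage algorithm over lower-level blocks, and those blocks may themselves be algorithmic or fundamental.

      class FundamentalMemory:
          def __init__(self, size):
              self.cells = [0] * size
          def read(self, addr):
              return self.cells[addr]
          def write(self, addr, value):
              self.cells[addr] = value

      class ReplicatedBlock:
          """Algorithmic block: duplicate writes so reads can be served
          from either copy (one simple storage algorithm of many)."""
          def __init__(self, make_sub, size):
              self.subs = [make_sub(size), make_sub(size)]
          def read(self, addr, port=0):
              return self.subs[port].read(addr)
          def write(self, addr, value):
              for s in self.subs:            # keep the copies coherent
                  s.write(addr, value)

      # Blocks nest: a replicated block built from replicated blocks
      # built from fundamental memories -- a small hierarchy.
      mem = ReplicatedBlock(lambda n: ReplicatedBlock(FundamentalMemory, n), 16)
      mem.write(5, 42)
      print(mem.read(5), mem.subs[1].read(5))  # 42 42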
  • Patent number: 9965211
    Abstract: Provided are a method, a non-transitory computer-readable storage device and an apparatus for managing use of a shared memory buffer that is partitioned into multiple banks and that stores incoming data received at multiple inputs in accordance with a multi-slice architecture. A particular bank is allocated to a corresponding slice. Received data packets are associated with corresponding slices based on the inputs at which they are received. A determination is made, based on the state of the shared memory buffer, to transfer the contents of all occupied cells of the particular bank. Writes to the bank are stopped, contents of occupied cells are transferred to cells of one or more other banks associated with the particular bank's slice, information is stored indicating where the contents have been transferred, and the particular bank is returned to a shared pool after the transfer is completed. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: September 8, 2016
    Date of Patent: May 8, 2018
    Assignee: Cisco Technology, Inc.
    Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Georges Akis, Felice Bonardi, Rong Pan
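
    A hedged sketch of the bank-reclaim flow in this abstract (the data structures are my own): writes to the victim bank stop, its occupied cells move to other banks of the same slice, a relocation table records where each cell went, and the emptied bank returns to the shared pool.

      def drain_bank(banks, victim, targets, forward, bank_size=4):
          """banks: {bank: {cell: data}}; forward: relocation table."""
          for cell, data in list(banks[victim].items()):
              for t in targets:                    # same-slice banks
                  free = next((c for c in range(bank_size)
                               if c not in banks[t]), None)
                  if free is not None:
                      banks[t][free] = data
                      forward[(victim, cell)] = (t, free)  # new home
                      del banks[victim][cell]
                      break
          return not banks[victim]   # True: bank can rejoin the pool

      banks = {"A": {0: "pkt0", 2: "pkt1"}, "B": {0: "x"}, "C": {}}
      fwd = {}
      print(drain_bank(banks, "A", ["B", "C"], fwd), fwd)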
  • Publication number: 20180067683
    Abstract: Provided are a method, a non-transitory computer-readable storage device and an apparatus for managing use of a shared memory buffer that is partitioned into multiple banks and that stores incoming data received at multiple inputs in accordance with a multi-slice architecture. A particular bank is allocated to a corresponding slice. Received data packets are associated with corresponding slices based on the inputs at which they are received. A determination is made, based on the state of the shared memory buffer, to transfer the contents of all occupied cells of the particular bank. Writes to the bank are stopped, contents of occupied cells are transferred to cells of one or more other banks associated with the particular bank's slice, information is stored indicating where the contents have been transferred, and the particular bank is returned to a shared pool after the transfer is completed.
    Type: Application
    Filed: September 8, 2016
    Publication date: March 8, 2018
    Inventors: Sharad Vasantrao Chole, Shang-Tse Chuang, Georges Akis, Felice Bonardi, Rong Pan
  • Patent number: 9678669
    Abstract: Designing memory subsystems for integrated circuits can be a time-consuming and costly task. To reduce development time and costs, an automated system and method for designing and constructing high-speed memory systems is disclosed. The automated system accepts a set of desired memory characteristics and then methodically selects different potential memory system design types and different implementations of each design type. The potential memory system design types may include traditional memory systems, optimized traditional memory systems, intelligent memory systems, and hierarchical memory systems. A selected set of proposed memory systems that meet the specified set of desired memory characteristics is output to a circuit designer. When a circuit designer selects a proposed memory system, the automated system generates a complete memory system design, a model for the memory system, and a test suite for the memory system. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: November 18, 2013
    Date of Patent: June 13, 2017
    Assignee: Cisco Technology, Inc.
    Inventors: Sundar Iyer, Sanjeev Joshi, Shang-Tse Chuang
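
    A sketch of the selection loop this abstract outlines, with an invented candidate set and criteria (ports, frequency, area) purely for illustration: each candidate design type is checked against the desired characteristics and the survivors are proposed to the designer.

      CANDIDATES = [
          {"type": "traditional",  "ports": 1, "mhz": 500, "area": 1.0},
          {"type": "intelligent",  "ports": 2, "mhz": 800, "area": 1.6},
          {"type": "hierarchical", "ports": 4, "mhz": 700, "area": 2.3},
      ]

      def propose(required_ports, min_mhz, max_area):
          """Return candidates meeting the desired memory characteristics."""
          return [c for c in CANDIDATES
                  if c["ports"] >= required_ports
                  and c["mhz"] >= min_mhz
                  and c["area"] <= max_area]

      # The designer would pick one of these, after which the tool emits
      # the full design, a model, and a test suite (per the abstract).
      print(propose(required_ports=2, min_mhz=600, max_area=2.0))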
  • Patent number: 9520178
    Abstract: Static random access memory (SRAM) circuits are used in most digital integrated circuits to store representations of data bits. To handle multiple concurrent memory requests, an efficient dual-port six transistor (6T) SRAM bit cell is proposed. The dual-port 6T SRAM cell uses independent word lines and bit lines so that the true/data side and the false/data-complement side of the SRAM bit cell may be accessed independently. Single-ended reads allow the two independent word lines and bit lines to handle two independent read operations in a single cycle using spatial domain multiplexing. Single-ended writes are enabled by adjusting the VDD power voltage supplied to a memory cell when writes are performed, so that a single word line and bit line pair can be used to write either a logical “0” or logical “1” into either side of the SRAM bit cell. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: December 13, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Sundar Iyer, Shang-Tse Chuang, Thu Nguyen
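
    A behavioral sketch of the access pattern described (a software model only; the patent concerns transistor-level circuits, and this models just the externally visible behavior): the cell holds a bit and its complement, each side readable independently in the same cycle through its own word/bit lines, and writable single-ended from either side.

      class DualPort6TCell:
          def __init__(self):
              self.true_side = 0            # Q
              self.comp_side = 1            # Q-bar

          def read(self, side):
              """Single-ended read; the two sides act as independent ports."""
              return self.true_side if side == "true" else self.comp_side

          def write(self, side, bit):
              """Single-ended write; in silicon, lowering the cell's VDD
              lets one word/bit line pair flip the cell. Here we model
              only the resulting state."""
              value = bit if side == "true" else 1 - bit
              self.true_side, self.comp_side = value, 1 - value

      cell = DualPort6TCell()
      cell.write("comp", 0)                        # write via complement side
      print(cell.read("true"), cell.read("comp"))  # 1 0: stores a logical 1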
  • Publication number: 20160328170
    Abstract: A system and method for designing and constructing hierarchical memory systems is disclosed. A plurality of different algorithmic memory blocks are disclosed. Each algorithmic memory block includes a memory controller that implements a specific storage algorithm and a set of lower level memory components. Each of those lower level memory components may be constructed with another algorithmic memory block or with a fundamental memory block. By organizing algorithmic memory blocks in various different hierarchical organizations, many different complex memory systems that provide new features may be created.
    Type: Application
    Filed: July 19, 2016
    Publication date: November 10, 2016
    Inventors: Sundar Iyer, Shang-Tse Chuang
  • Patent number: 9442846
    Abstract: A system and method for designing and constructing hierarchical memory systems is disclosed. A plurality of different algorithmic memory blocks are disclosed. Each algorithmic memory block includes a memory controller that implements a specific storage algorithm and a set of lower level memory components. Each of those lower level memory components may be constructed with another algorithmic memory block or with a fundamental memory block. By organizing algorithmic memory blocks in various different hierarchical organizations, many different complex memory systems that provide new features may be created.
    Type: Grant
    Filed: August 17, 2010
    Date of Patent: September 13, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Sundar Iyer, Shang-Tse Chuang
  • Patent number: 9390212
    Abstract: Multi-port memory circuits are often required within modern digital integrated circuits to store data. Multi-port memory circuits allow multiple memory users to access the same memory cell simultaneously. Multi-port memory circuits are generally custom-designed in order to obtain the best performance or synthesized with logic synthesis tools for quick design. However, these two options for creating multi-port memory give integrated circuit designers a stark choice: invest a large amount of time and money to custom design an efficient multi-port memory system or allow logic synthesis tools to inefficiently create multi-port memory. An intermediate solution is disclosed that allows an efficient multi-port memory array to be created largely using standard circuit cell components and register transfer level hardware design language code. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: May 4, 2015
    Date of Patent: July 12, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Sundar Iyer, Shang-Tse Chuang, Thu Nguyen, Sanjeev Joshi, Adam Kablanian
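
    A hedged sketch of one well-known standard-component technique in this spirit (storage replication plus write broadcast to gain a read port; the patent's exact structure may differ, and timing details such as read/write phasing are elided):

      class OnePortCopy:
          """Stands in for a standard single-port memory macro."""
          def __init__(self, depth):
              self.mem = [0] * depth
          def read(self, addr):
              return self.mem[addr]
          def write(self, addr, data):
              self.mem[addr] = data

      class TwoReadOneWriteRAM:
          """Two read ports from two copies; every write goes to both."""
          def __init__(self, depth):
              self.copies = [OnePortCopy(depth), OnePortCopy(depth)]
          def cycle(self, raddr0, raddr1, waddr=None, wdata=None):
              r0 = self.copies[0].read(raddr0)   # read port 0, copy 0
              r1 = self.copies[1].read(raddr1)   # read port 1, copy 1
              if waddr is not None:
                  for c in self.copies:          # broadcast the write
                      c.write(waddr, wdata)
              return r0, r1

      ram = TwoReadOneWriteRAM(8)
      ram.cycle(0, 1, waddr=3, wdata=7)
      print(ram.cycle(3, 3))                     # (7, 7): copies agree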
  • Publication number: 20160179394
    Abstract: Designing memory subsystems for integrated circuits can be a time-consuming and costly task. To reduce development time and costs, an automated system and method for designing and constructing high-speed memory systems is disclosed. The automated system accepts a set of desired memory characteristics and then methodically selects different potential memory system design types and different implementations of each design type. The potential memory system design types may include traditional memory systems, optimized traditional memory systems, intelligent memory systems, and hierarchical memory systems. A selected set of proposed memory systems that meet the specified set of desired memory characteristics is output to a circuit designer. When a circuit designer selects a proposed memory system, the automated system generates a complete memory system design, a model for the memory system, and a test suite for the memory system.
    Type: Application
    Filed: November 18, 2013
    Publication date: June 23, 2016
    Inventors: Sundar Iyer, Sanjeev Joshi, Shang-Tse Chuang
  • Patent number: 9293187
    Abstract: Dynamic memory systems require each memory cell to be continually refreshed. During a memory refresh operation, the refreshed memory cells cannot be accessed by a memory read or write operation. In multi-bank dynamic memory systems, concurrent refresh systems allow memory refresh circuitry to refresh memory banks that are not currently involved in memory access operations. To refresh memory banks efficiently, an advanced round-robin refresh system refreshes memory banks in a nominal round-robin manner but skips memory banks blocked by memory access operations. Skipped memory banks are prioritized and then refreshed when they are no longer blocked. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: September 26, 2011
    Date of Patent: March 22, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Sundar Iyer, Shang-Tse Chuang
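
    A sketch of the skipping round-robin policy as this abstract reads (the bookkeeping is my assumption): walk the banks in order, skip any bank currently blocked by an access, remember the skipped banks, and refresh them with priority once they unblock.

      from collections import deque

      def refresh_step(n_banks, state):
          """state: {'next': int, 'skipped': deque, 'blocked': set}.
          Returns the bank refreshed this cycle, or None."""
          # 1. Previously skipped banks go first once they are free.
          for _ in range(len(state["skipped"])):
              b = state["skipped"].popleft()
              if b not in state["blocked"]:
                  return b
              state["skipped"].append(b)      # still blocked, keep waiting
          # 2. Otherwise take the nominal round-robin bank.
          b = state["next"]
          state["next"] = (b + 1) % n_banks
          if b in state["blocked"]:
              state["skipped"].append(b)      # skip, remember the debt
              return None
          return b

      st = {"next": 0, "skipped": deque(), "blocked": {1}}
      print([refresh_step(4, st) for _ in range(4)])  # [0, None, 2, 3]
      st["blocked"].clear()
      print(refresh_step(4, st))              # 1: the skipped bank first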
  • Patent number: 9280464
    Abstract: A system and method for providing high-speed memory operations is disclosed. The technique uses virtualization of memory space to map a virtual address space to a larger physical address space wherein no memory bank conflicts will occur. The larger physical address space is used to prevent memory bank conflicts from occurring by moving the virtualized memory addresses of data being written to memory to a different location in physical memory that will eliminate a memory bank conflict. This allows the memory system to both store and read data in the same cycle with no conflicts. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: June 4, 2015
    Date of Patent: March 8, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Sundar Iyer, Shang-Tse Chuang
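
    A sketch of the virtualization idea in this abstract, assuming one spare physical bank per row and an invented table layout: a write that would collide with a concurrent read is steered into the row's free cell and the virtual-to-physical map is updated, so the read and the write never touch the same bank.

      class VirtualizedMemory:
          N = 4                                # virtual banks per row

          def __init__(self, rows=4):
              # N + 1 physical banks: each row always has one free cell.
              self.banks = [[None] * rows for _ in range(self.N + 1)]
              # Per row: virtual column -> physical bank.
              self.map = [{c: c for c in range(self.N)}
                          for _ in range(rows)]

          def _free_bank(self, row):
              used = set(self.map[row].values())
              return next(b for b in range(self.N + 1) if b not in used)

          def read_and_write(self, raddr, waddr, wdata):
              """Same-cycle read + write with no bank conflict."""
              rrow, rcol = divmod(raddr, self.N)
              wrow, wcol = divmod(waddr, self.N)
              rbank = self.map[rrow][rcol]
              wbank = self.map[wrow][wcol]
              if wbank == rbank and raddr != waddr:
                  # The row's free bank cannot be rbank, because rbank is
                  # still mapped in wrow; move the write there instead.
                  wbank = self._free_bank(wrow)
                  self.map[wrow][wcol] = wbank
              self.banks[wbank][wrow] = wdata
              return self.banks[rbank][rrow]

      m = VirtualizedMemory()
      m.read_and_write(0, 4, "hello")      # both map to bank 0: remapped
      print(m.read_and_write(4, 0, "x"))   # "hello", no conflict this time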
  • Publication number: 20150357030
    Abstract: Static random access memory (SRAM) circuits are used in most digital integrated circuits to store representations of data bits. To handle multiple concurrent memory requests, an efficient dual-port six transistor (6T) SRAM bit cell is proposed. The dual-port 6T SRAM cell uses independent word lines and bit lines so that the true/data side and the false/data-complement side of the SRAM bit cell may be accessed independently. Single-ended reads allow the two independent word lines and bit lines to handle two independent read operations in a single cycle using spatial domain multiplexing. Single-ended writes are enabled by adjusting the VDD power voltage supplied to a memory cell when writes are performed, so that a single word line and bit line pair can be used to write either a logical “0” or logical “1” into either side of the SRAM bit cell.
    Type: Application
    Filed: August 20, 2015
    Publication date: December 10, 2015
    Inventors: Sundar Iyer, Shang-Tse Chuang, Thu Nguyen
  • Publication number: 20150339227
    Abstract: A system and method for providing high-speed memory operations is disclosed. The technique uses virtualization of memory space to map a virtual address space to a larger physical address space wherein no memory bank conflicts will occur. The larger physical address space is used to prevent memory bank conflicts from occurring by moving the virtualized memory addresses of data being written to memory to a different location in physical memory that will eliminate a memory bank conflict. This allows the memory system to both store and read data in the same cycle with no conflicts.
    Type: Application
    Filed: June 4, 2015
    Publication date: November 26, 2015
    Inventors: Sundar Iyer, Shang-Tse Chuang