Patents by Inventor Sundeep Amirineni

Sundeep Amirineni has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11868875
    Abstract: Provided are systems and methods for operating a neural network processor, wherein the processor includes an input selector circuit that can be configured to select the data that will be input into the processor's computational array. In various implementations, the selector circuit can determine, for a row of the array, whether the row input will be the output from a buffer memory or data that the input selector circuit has selected for a different row. The row can receive an input feature map from a set of input data or an input feature map that was selected for inputting into a different row, such that the input feature map is input into more than one row at a time. The selector circuit can also include a delay circuit, so that the duplicated input feature map can be input into the computational array later than the original input feature map.
    Type: Grant
    Filed: September 10, 2018
    Date of Patent: January 9, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Randy Renfu Huang, Jeffrey T. Huynh, Sundeep Amirineni
  • Publication number: 20230385233
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Application
    Filed: August 8, 2023
    Publication date: November 30, 2023
    Inventors: Thomas A. Volpe, Sundeep Amirineni, Thomas Elmer
  • Publication number: 20230359876
    Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined through detecting that less than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
    Type: Application
    Filed: July 14, 2023
    Publication date: November 9, 2023
    Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
  • Patent number: 11775430
    Abstract: Disclosed herein are techniques for performing memory access. In one embodiment, an integrated circuit includes a port and an access engine. The integrated circuit is coupled with a memory device. The access engine is configured to: receive, from an access requester device, a request to access data stored at a memory device; and based on receiving the request: provide, via the port, a sequential access of a plurality of portions of the data to the access requester device; and access the plurality of portions of the data in a parallel form at the memory device for the access requester device. The sequential access can include a sequential write access or a sequential read access of the plurality of portions of the data.
    Type: Grant
    Filed: August 24, 2020
    Date of Patent: October 3, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Sundeep Amirineni, Akshay Balasubramanian, Eyal Freund
  • Patent number: 11762803
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: April 18, 2022
    Date of Patent: September 19, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
  • Patent number: 11741350
    Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: August 29, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
  • Patent number: 11507378
    Abstract: In one example, an integrated circuit comprises: a memory configured to store a first mapping between a first opcode and first control information and a second mapping between the first opcode and second control information; a processing engine configured to perform processing operations based on the control information; and a controller configured to: at a first time, provide the first opcode to the memory to, based on the first mapping stored in the memory, fetch the first control information for the processing engine, to enable the processing engine to perform a first processing operation based on the first control information; and at a second time, provide the first opcode to the memory to, based on the second mapping stored in the memory, fetch the second control information for the processing engine, to enable the processing engine to perform a second processing operation based on the second control information.
    Type: Grant
    Filed: March 1, 2021
    Date of Patent: November 22, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Sundeep Amirineni, Mohammad El-Shabani, Sagar Sonar, Kenneth Wayne Patton
  • Publication number: 20220350775
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Application
    Filed: April 18, 2022
    Publication date: November 3, 2022
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
  • Patent number: 11467973
    Abstract: Systems and methods are provided to perform fine-grained memory accesses using a memory controller. The memory controller can access elements stored in memory across multiple dimensions of a matrix. The memory controller can perform accesses to non-contiguous memory locations by skipping zero or more elements across any dimension of the matrix.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Sundeep Amirineni
  • Patent number: 11423313
    Abstract: Methods and systems for performing hardware approximation of function are provided. In one example, a system comprises a controller, configurable arithmetic circuits, and a mapping table. The mapping table stores a first set of function parameters in a first mode of operation and stores a second set of function parameters in a second mode of operation. Depending on the mode of operation, the controller may configure the arithmetic circuits to compute a first approximation result of a function at an input value based on the first set of function parameters, or to compute a second approximation result of the function at the input value based on the second set of function parameters and to perform post-processing, such as quantization, of the second approximation result.
    Type: Grant
    Filed: December 12, 2018
    Date of Patent: August 23, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Sundeep Amirineni, Mohammad El-Shabani, Kenneth Wayne Patton, Thomas Elmer
  • Publication number: 20220188073
    Abstract: To reduce power consumption, data bits or a portion of a data register that is not expected to toggle frequently can be grouped together, and be clock-gated independently from the rest of the data register. The grouping of the data bits can be determined based on the data types of the workload being operated on. For a data register configured to store a numeric value that supports multiple data types, the portion of the data register being clock-gated may store a group of data bits that are unused for one or more data types of the multiple data types supported by the data register. The portion of the data register being clock-gated can also be a group of data bits that remain unchanged or have a constant value for numeric values within a certain numeric range that is frequently operated on.
    Type: Application
    Filed: December 11, 2020
    Publication date: June 16, 2022
    Inventors: Joshua Wayne Bowman, Thomas A. Volpe, Sundeep Amirineni, Nishith Desai, Ron Diamant
  • Patent number: 11314842
    Abstract: Methods and systems for performing hardware computations of mathematical functions are provided. In one example, a system comprises a mapping table that maps each base value of a plurality of base values to parameters related to a mathematical function; a selection module configured to select, based on an input value, a first base value and first parameters mapped to the first base value in the mapping table; and arithmetic circuits configured to: receive, from the mapping table, the first base value and the first plurality of parameters; and compute, based on a relationship between the input value and the first base value, and based on the first parameters, an estimated output value of the mathematical function for the input value.
    Type: Grant
    Filed: August 7, 2020
    Date of Patent: April 26, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Randy Renfu Huang, Mohammad El-Shabani, Sundeep Amirineni, Kenneth Wayne Patton, Willis Wang
  • Patent number: 11308027
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
  • Patent number: 11294599
    Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank and can read from and write to the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
    Type: Grant
    Filed: June 3, 2020
    Date of Patent: April 5, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Randy Renfu Huang, Sundeep Amirineni, Jeffrey T. Huynh
  • Patent number: 11275997
    Abstract: Disclosed herein are techniques for obtain weights for neural network computations. In one embodiment, an integrated circuit may include memory configured to store a first weight and a second weight; a row of processing elements comprising a first processing element and a second processing element, the first processing element comprising a first weight register, the second processing element comprising a second weight register, both of the first weight register and the second weight register being controllable by a weight load signal; and a controller configured to: provide the first weight from the memory to the row of processing elements; set the weight load signal to enable the first weight to propagate through the row to reach the first processing element; and set the weight load signal to store the first weight at the first weight register and the flush value at the second weight register.
    Type: Grant
    Filed: April 30, 2018
    Date of Patent: March 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Dana Michelle Vantrease, Ron Diamant, Sundeep Amirineni
  • Patent number: 11232062
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element can include a plurality of interconnects to receive a plurality of inputs corresponding to the multiple busses. Each processing element of a given columnar bus can receive an input from a prior element of the given columnar bus at an active bus position and perform arithmetic operations on the input. Each processing element can further receive a plurality of inputs at passive bus positions and provide the plurality of inputs to subsequent processing elements without the plurality of inputs being processed by the processing element. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: January 25, 2022
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
  • Patent number: 11232016
    Abstract: Techniques disclosed herein relate generally to debugging complex computing systems, such as those executing neural networks. A neural network processor includes a processing engine configured to execute instructions to implement multiple layers of a neural network. The neural network processor includes a debugging circuit configured to generate error detection codes for input data to the processing engine or error detection codes for output data generated by the processing engine. The neural network processor also includes an interface to a memory device, where the interface is configured to save the error detection codes generated by the debugging circuit into the memory device. The error detection codes generated by the debugging circuit are compared with expected error detection codes generated using a function model of the neural network to identify defects of the neural network.
    Type: Grant
    Filed: September 21, 2018
    Date of Patent: January 25, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Ron Diamant, Sundeep Amirineni, Randy Renfu Huang
  • Patent number: 11048569
    Abstract: Disclosed herein are techniques for preventing or minimizing completion timeout errors on a computer device. An apparatus include a processing logic circuit and a timeout logic. The timeout logic is configured to: generate a timeout event based on a transaction not completed by the processing logic circuit within a timeout period; determine a number of the timeout events generated during a monitoring period; and responsive to determining that the number equals to or exceeds a threshold, reduce the timeout period.
    Type: Grant
    Filed: February 19, 2020
    Date of Patent: June 29, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Kiran Kalkunte Seshadri, Sundeep Amirineni, Nafea Bshara, Asif Khan
  • Publication number: 20210158132
    Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
    Type: Application
    Filed: November 27, 2019
    Publication date: May 27, 2021
    Inventors: Jeffrey T. Huynh, Ron Diamant, Hongbin Zheng, Yizhi Liu, Animesh Jain, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang, Sundeep Amirineni, Drazen Borkovic
  • Patent number: 10990408
    Abstract: Methods for place-and-route aware data pipelining for an integrated circuit device are provided. In large integrated circuits, the physical distance a data signal must travel between a signal source in a master circuit block partition and a signal destination in a servant circuit block partition can exceed the distance the signal can travel in a single clock cycle. To maintain timing requirements of the integrated circuit, a longest physical distance and signal delay for a datapath between master and servant circuit block partitions can be determined and pipelining registers added. Datapaths of master circuit block partitions further away from the servant circuit block can have more pipelining registers added within the master circuit block than datapaths of master circuit block partitions that are closer to the servant circuit block.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: April 27, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Akshay Balasubramanian, Sundeep Amirineni