Patents by Inventor Thomas Elmer

Thomas Elmer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11880682
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: January 23, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Paul Gilbert Meyer, Thomas A Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
  • Patent number: 11842169
    Abstract: Systems and methods are provided to perform multiplication-delayed-addition operations in a systolic array to increase clock speeds, reduce circuit area, and/or reduce dynamic power consumption. Each processing element in the systolic array can have a pipeline configured to perform a multiplication during a first systolic interval and to perform an accumulation during a second systolic interval. The multiplication result from the first systolic interval can be stored in a delay register for use by the accumulator during the second systolic interval. A skip detection circuit can be used to skip one or more of the multiplication, storing in the delay register, and the addition during skip conditions for improved energy efficiency.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: December 12, 2023
    Assignee: Amazon Technologies, Inc.
    Inventor: Thomas Elmer
  • Publication number: 20230385233
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Application
    Filed: August 8, 2023
    Publication date: November 30, 2023
    Inventors: Thomas A. Volpe, Sundeep Amirineni, Thomas Elmer
  • Patent number: 11816446
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of multiple data types in a systolic array. One or more processing elements in the systolic array can include a shared multiplier and one or more adders. The shared multiplier can include a separate and/or a shared circuitry where the shared circuitry can perform at least a part of integer multiplication and at least a part of non-integer multiplication. The one or more adders can include one or more shared adders or one or more separate adders. The shared adder can include a separate and/or a shared circuitry where the shared circuitry can perform at least a part of integer addition and at least a part of non-integer addition.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: November 14, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas Elmer, Thomas A. Volpe
  • Patent number: 11762803
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: April 18, 2022
    Date of Patent: September 19, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
  • Publication number: 20230266942
    Abstract: A data processing apparatus is provided, which includes addition circuitry that performs a calculation of a sum of a first operand and a second operand. The addition circuitry produces an intermediate data prior to the calculation completing. Determination circuitry uses the intermediate data to produce the sum of the first operand and the second operand plus 1. Further determination circuitry configured to use the intermediate data to produce the sum of the first operand and the second operand plus 2.
    Type: Application
    Filed: February 24, 2022
    Publication date: August 24, 2023
    Inventors: Javier Diaz BRUGUERA, David Raymond LUTZ, Thomas ELMER, Nicholas Andrew PFISTER
  • Publication number: 20230010054
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of at least one normalized number in a systolic array. The systolic array can obtain a first input and detect that the first input is denormal. Based on determining the first input is denormal, the systolic array can generate a first normalized number by normalizing the first input. Processing elements of the systolic array can include a multiplier and an adder. The multiplier can multiply the first normalized number by a second normal or normalized number to generate a multiplier product and the adder can add an input partial sum to the multiplier product to generate an addition result.
    Type: Application
    Filed: September 15, 2022
    Publication date: January 12, 2023
    Inventor: Thomas Elmer
  • Publication number: 20230004384
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
    Type: Application
    Filed: June 30, 2021
    Publication date: January 5, 2023
    Inventors: Paul Gilbert Meyer, Thomas A Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
  • Publication number: 20230004523
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reducer can receive a particular input and generate multiple reduced inputs from the input. The reduced inputs can include reduced input data elements and/or a reduced weights. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide multiple reduced inputs with second shorter bit-length to the array. The systolic array may perform multiply-accumulate operations on each unique combination of the multiple reduced input data elements and the reduced weights to generate multiple partial outputs. The systolic array may sum the partial outputs to generate the output.
    Type: Application
    Filed: June 30, 2021
    Publication date: January 5, 2023
    Inventors: Paul Gilbert Meyer, Thomas A. Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
  • Publication number: 20220350775
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Application
    Filed: April 18, 2022
    Publication date: November 3, 2022
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
  • Patent number: 11467806
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of normalized numbers in a systolic array to enable greater computational density, reduce the size of systolic arrays required to perform multiply-accumulate operations of normalized numbers, and/or enable higher throughput operation. The systolic array can be provided normalized numbers by a column of normalizers and can lack support for denormal numbers. Each normalizer can normalize the inputs to each processing element in the systolic array. The systolic array can include a multiplier and an adder. The multiplier can have multiple data paths that correspond to the data type of the input. The multiplier and adder can employ expanded exponent range to operate on normalized floating-point numbers and can lack support for denormal numbers.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Thomas Elmer
  • Patent number: 11422773
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each row of the systolic array can include multiple busses enabling independent transmission of inputs along the respective bus. Each processing element can include a plurality of interconnects to receive a plurality of inputs corresponding to the multiple busses. Each processing element of a given row-oriented bus can receive an input from a prior element of the given row-oriented bus at an active bus position and perform arithmetic operations on the input. Each processing element can further receive a plurality of inputs at passive bus positions and provide the plurality of inputs to subsequent processing elements without the plurality of inputs being processed by the processing element. Use of row-oriented busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: August 23, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Thomas Elmer, Kiran K Seshadri
  • Patent number: 11423313
    Abstract: Methods and systems for performing hardware approximation of function are provided. In one example, a system comprises a controller, configurable arithmetic circuits, and a mapping table. The mapping table stores a first set of function parameters in a first mode of operation and stores a second set of function parameters in a second mode of operation. Depending on the mode of operation, the controller may configure the arithmetic circuits to compute a first approximation result of a function at an input value based on the first set of function parameters, or to compute a second approximation result of the function at the input value based on the second set of function parameters and to perform post-processing, such as quantization, of the second approximation result.
    Type: Grant
    Filed: December 12, 2018
    Date of Patent: August 23, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Sundeep Amirineni, Mohammad El-Shabani, Kenneth Wayne Patton, Thomas Elmer
  • Patent number: 11308027
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
  • Patent number: 11308026
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each row of the systolic array can include multiple busses enabling independent transmission of inputs along the respective bus. Each processing element of a given row-oriented bus can receive an input from a prior element of the given row-oriented bus, and perform arithmetic operations on the input. Each processing element can generate an output partial sum based on the arithmetic operations, provide the input to a next processing element of the given row-oriented bus, without the input being processed by a processing element of the row located between the two processing elements that uses a different row-oriented bus. Use of row-oriented busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Vasanta Kumar Palisetti, Thomas Elmer, Kiran K Seshadri, FNU Arun Kumar
  • Patent number: 11232062
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element can include a plurality of interconnects to receive a plurality of inputs corresponding to the multiple busses. Each processing element of a given columnar bus can receive an input from a prior element of the given columnar bus at an active bus position and perform arithmetic operations on the input. Each processing element can further receive a plurality of inputs at passive bus positions and provide the plurality of inputs to subsequent processing elements without the plurality of inputs being processed by the processing element. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: January 25, 2022
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
  • Patent number: 11061672
    Abstract: A microprocessor is configured for unchained and chained modes of split execution of a fused compound arithmetic operation. In both modes of split execution, a first execution unit executes only a first part of the fused compound arithmetic operation and produces an intermediate result thereof, and a second instruction execution unit receives the intermediate result and executes a second part of the fused compound arithmetic operation to produce a final result. In the unchained mode, execution is accomplished by dispatching separate split-execution microinstructions to the first and second instruction execution units. In the chained mode, execution is accomplished by dispatching a single split-execution microinstruction to the first instruction execution unit and sending a chaining control signal or signal group to the second execution unit, causing it to execute its part of the fused arithmetic operation without needing an instruction.
    Type: Grant
    Filed: July 5, 2016
    Date of Patent: July 13, 2021
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Thomas Elmer, Nikhil A. Patil
  • Publication number: 20210157549
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of multiple data types in a systolic array to increase clock speeds and/or reduce the size and quantity of systolic arrays required to perform multiply-accumulate operations of multiple data types. Each processing element in the systolic array can have a shared multiplier and one or more adders. The shared multiplier can have a separate and/or a shared circuitry where the shared circuitry is capable of performing at least a part of integer multiplication and at least a part of non-integer multiplication. The one or more adders can be a shared adder or separate adders. The shared adder can have a separate and a shared circuitry wherein the shared circuitry is capable of performing at least a part of integer addition and at least a part of non-integer addition.
    Type: Application
    Filed: November 27, 2019
    Publication date: May 27, 2021
    Inventors: Thomas Elmer, Thomas A. Volpe
  • Publication number: 20210157548
    Abstract: Systems and methods are provided to perform multiply-accumulate operations of normalized numbers in a systolic array to enable greater computational density, reduce the size of systolic arrays required to perform multiply-accumulate operations of normalized numbers, and/or enable higher throughput operation. The systolic array can be provided normalized numbers by a column of normalizers and can lack support for denormal numbers. Each normalizer can normalize the inputs to each processing element in the systolic array. The systolic array can include a multiplier and an adder. The multiplier can have multiple data paths that correspond to the data type of the input. The multiplier and adder can employ expanded exponent range to operate on normalized floating-point numbers and can lack support for denormal numbers.
    Type: Application
    Filed: November 27, 2019
    Publication date: May 27, 2021
    Inventor: Thomas Elmer
  • Patent number: 10983754
    Abstract: Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural network. In one example, an apparatus comprises a first circuit, a second circuit, and a third circuit. The first circuit is configured to: receive first values in a first format, the first values being generated from one or more asymmetric quantization operations of second values in a second format, and generate difference values based on subtracting a third value from each of the first values, the third value representing a zero value in the first format. The second circuit is configured to generate a sum of products in the first format using the difference values. The third circuit is configured to convert the sum of products from the first format to the second format based on scaling the sum of products with a scaling factor.
    Type: Grant
    Filed: June 2, 2020
    Date of Patent: April 20, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Dana Michelle Vantrease, Randy Huang, Ron Diamant, Thomas Elmer, Sundeep Amirineni