Patents by Inventor Thomas Elmer
Thomas Elmer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10817260
Abstract: Systems and methods are provided to skip multiplication operations with zeros in processing elements of the systolic array to reduce dynamic power consumption. A value of zero can be detected on an input data element entering each row of the array and respective zero indicators may be generated. These respective zero indicators may be passed to all the processing elements in the respective rows. The multiplication operation with the zero value can be skipped in each processing element based on the zero indicators, thus reducing dynamic power consumption.
Type: Grant
Filed: June 13, 2018
Date of Patent: October 27, 2020
Assignee: Amazon Technologies, Inc.
Inventors: Randy Huang, Ron Diamant, Thomas Elmer, Sundeep Amirineni, Thomas A. Volpe
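Illustration (not part of the patent record): the zero-skipping idea in this abstract can be sketched in software as follows. This is a minimal model under stated assumptions, not the patented circuit: the function name `row_multiply_accumulate`, the list-based "row", and the multiply counter are all hypothetical, and real hardware saves power by gating the multiplier rather than by branching.

```python
def row_multiply_accumulate(input_elements, weights):
    """Model one array row: every input element visits each processing
    element (PE) in the row, and each PE holds one weight (assumed layout)."""
    partial_sums = [0] * len(weights)
    multiplies_performed = 0
    for x in input_elements:
        # The zero indicator is computed once, where the element enters the row,
        # and is then shared by every PE in that row.
        is_zero = (x == 0)
        for pe_index, w in enumerate(weights):
            if is_zero:
                # The PE skips the multiplication; its partial sum is unchanged.
                continue
            partial_sums[pe_index] += x * w
            multiplies_performed += 1
    return partial_sums, multiplies_performed


sums, count = row_multiply_accumulate([0, 2, 0, 3], [1, -1, 4])
print(sums, count)
```

In this sketch the two zero inputs cost no multiplications at all, which is the software analogue of the dynamic-power saving the abstract describes.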
-
Publication number: 20200293284
Abstract: Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural network. In one example, an apparatus comprises a first circuit, a second circuit, and a third circuit. The first circuit is configured to: receive first values in a first format, the first values being generated from one or more asymmetric quantization operations of second values in a second format, and generate difference values based on subtracting a third value from each of the first values, the third value representing a zero value in the first format. The second circuit is configured to generate a sum of products in the first format using the difference values. The third circuit is configured to convert the sum of products from the first format to the second format based on scaling the sum of products with a scaling factor.
Type: Application
Filed: June 2, 2020
Publication date: September 17, 2020
Inventors: Dana Michelle Vantrease, Randy Huang, Ron Diamant, Thomas Elmer, Sundeep Amirineni
-
Patent number: 10674882
Abstract: An adapter device for arranging a container (4, 6) on the upper face (5) of a cleaning device (9), said adapter device consisting of an adapter plate (1, 2) with a support surface (13, 15) and at least one locking element for locking the container in place on the (4, 6) adapter plate. The adapter plate (1, 2) is divided into at least two parts and consists of a front adapter plate (1) and a rear adapter plate (2) which can be separately mounted on the upper face (5) of the cleaning device (9).
Type: Grant
Filed: September 15, 2015
Date of Patent: June 9, 2020
Assignee: NILFISK A/S
Inventors: Henrik Mathiassen, Steen Klimt Johannesen, Thomas Elmer, Trine Baek Nielsen
-
Patent number: 10678508
Abstract: Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural network. A computer-implemented method includes receiving low-precision inputs for a convolution operation from a storage device, and subtracting a low-precision value representing a high-precision zero value from the low-precision inputs to generate difference values, where the low-precision inputs are asymmetrically quantized from high-precision inputs. The method also includes performing multiplication and summation operations on the difference values to generate a sum of products, and generating a high-precision output by scaling the sum of products with a scaling factor.
Type: Grant
Filed: March 23, 2018
Date of Patent: June 9, 2020
Assignee: Amazon Technologies, Inc.
Inventors: Dana Michelle Vantrease, Randy Huang, Ron Diamant, Thomas Elmer, Sundeep Amirineni
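Illustration (not part of the patent record): the steps named in this abstract (subtract the zero-point code, multiply and sum the differences, scale the result) can be sketched as below. Everything here is an assumption made for the example: the helper names `quantize` and `quantized_dot`, the 8-bit unsigned code range, the use of a separate zero point and scale for the weights, and the choice of the product of the two scales as the scaling factor. It is a sketch of asymmetric quantization in general, not the claimed implementation.

```python
def quantize(values, scale, zero_point):
    """Asymmetrically quantize real values to 8-bit codes (assumed format).
    zero_point is the code that represents the real value 0.0."""
    return [max(0, min(255, round(v / scale) + zero_point)) for v in values]


def quantized_dot(q_inputs, q_weights, in_zero_point, w_zero_point,
                  in_scale, w_scale):
    """Dot product computed on quantized codes, returned as a real value."""
    # Step 1: subtract the low-precision code that represents zero,
    # giving signed integer difference values.
    in_diff = [q - in_zero_point for q in q_inputs]
    w_diff = [q - w_zero_point for q in q_weights]
    # Step 2: integer multiplications and summation give the sum of products.
    sum_of_products = sum(a * b for a, b in zip(in_diff, w_diff))
    # Step 3: one scaling converts the integer sum back to high precision.
    return sum_of_products * (in_scale * w_scale)


inputs = quantize([0.0, 0.5, -0.25], scale=0.25, zero_point=128)
weights = quantize([1.0, -0.5, 0.0], scale=0.5, zero_point=10)
print(quantized_dot(inputs, weights, 128, 10, 0.25, 0.5))
```

Because the zero point is removed before multiplying, the inner loop needs only integer multiply-accumulate operations, and the floating-point scaling happens once per output rather than once per product.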
-
Publication number: 20190294413
Abstract: Disclosed herein are techniques for accelerating convolution operations or other matrix multiplications in applications such as neural network. A computer-implemented method includes receiving low-precision inputs for a convolution operation from a storage device, and subtracting a low-precision value representing a high-precision zero value from the low-precision inputs to generate difference values, where the low-precision inputs are asymmetrically quantized from high-precision inputs. The method also includes performing multiplication and summation operations on the difference values to generate a sum of products, and generating a high-precision output by scaling the sum of products with a scaling factor.
Type: Application
Filed: March 23, 2018
Publication date: September 26, 2019
Inventors: Dana Michelle Vantrease, Randy Huang, Ron Diamant, Thomas Elmer, Sundeep Amirineni
-
Patent number: 10078512
Abstract: A microprocessor includes FMA execution logic that determines whether to accumulate an accumulator operand C to the partial products of multiplier and multiplicand operands A and B in the partial product adder or in a second accumulation stage. The logic calculates an exponent delta of Aexp+Bexp-Cexp and determines the number of leading zeroes in C, if C is denormal. The microprocessor accumulates C with the partial products of A and B when the accumulation of C to the product of A and B could result in mass cancellation, when ExpDelta is greater than or equal to -K (where K is related to a width of a datapath in the partial product adder), and when a C is denormal and its number of leading zeroes plus K exceeds -ExpDelta. The strategic use of resources in the partial product adder and second accumulation stage reduces latency.
Type: Grant
Filed: October 3, 2016
Date of Patent: September 18, 2018
Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
Inventor: Thomas Elmer
-
Patent number: 10023814
Abstract: Methods are provided for modifying hydrogenation catalysts having silica supports (or other non-alumina supports) with additional alumina, and using such catalysts to achieve unexpectedly superior hydrogenation of feedstocks. The modified hydrogenation catalysts can have a relatively low cracking activity while providing an increased activity for hydrogenation.
Type: Grant
Filed: May 18, 2015
Date of Patent: July 17, 2018
Assignee: EXXONMOBIL RESEARCH AND ENGINEERING COMPANY
Inventors: Michael P. Lanci, Stuart L. Soled, Javier Guzman, Sabato Miseo, Thomas Elmer Green, Joseph Ernest Baumgartner
-
Patent number: 10019230
Abstract: An arithmetic operation is performed using a first instruction execution unit to generate an intermediate result vector and a plurality of calculation control indicators that indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed. The intermediate result vector and the plurality of calculation control indicators are stored in memory external to the instruction execution unit, and later read by a second instruction execution unit to complete the arithmetic operation.
Type: Grant
Filed: June 24, 2015
Date of Patent: July 10, 2018
Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
Inventor: Thomas Elmer
-
Patent number: 10019229
Abstract: A microprocessor comprises an instruction execution unit operable to generate an intermediate result vector and a plurality of calculation control indicators and storage external to the instruction execution unit which stores the intermediate result vector and the plurality of calculation control indicators. The intermediate result vector is generated from an application of at least a first arithmetic operation of a compound arithmetic operation. The calculation control indicators indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed. The subsequent calculations may involve one or more remaining arithmetic operations of the compound arithmetic operation.
Type: Grant
Filed: June 24, 2015
Date of Patent: July 10, 2018
Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
Inventor: Thomas Elmer
-
Publication number: 20180095749
Abstract: A microprocessor includes FMA execution logic that determines whether to accumulate an accumulator operand C to the partial products of multiplier and multiplicand operands A and B in the partial product adder or in a second accumulation stage. The logic calculates an exponent delta of Aexp+Bexp-Cexp and determines the number of leading zeroes in C, if C is denormal. The microprocessor accumulates C with the partial products of A and B when the accumulation of C to the product of A and B could result in mass cancellation, when ExpDelta is greater than or equal to -K (where K is related to a width of a datapath in the partial product adder), and when a C is denormal and its number of leading zeroes plus K exceeds -ExpDelta. The strategic use of resources in the partial product adder and second accumulation stage reduces latency.
Type: Application
Filed: October 3, 2016
Publication date: April 5, 2018
Inventor: THOMAS ELMER
-
Patent number: 9891887
Abstract: A microprocessor prepares a fused multiply-accumulate operation of a form ±A*B±C for execution by issuing first and second multiply-accumulate microinstructions to one or more instruction execution units to complete the fused multiply-accumulate operation. The first multiply-accumulate microinstruction causes an unrounded nonredundant result vector to be generated from a first accumulation of a selected one of (a) the partial products of A and B or (b) C with the partial products of A and B. The second multiply-accumulate microinstruction causes performance of a second accumulation of C with the unrounded nonredundant result vector, if the first accumulation did not include C. The second multiply-accumulate microinstruction also causes a final rounded result to be generated from the unrounded nonredundant result vector, wherein the final rounded result is a complete result of the fused multiply-accumulate operation.
Type: Grant
Filed: June 24, 2015
Date of Patent: February 13, 2018
Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
Inventor: Thomas Elmer
-
Patent number: 9891886
Abstract: A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C. An evaluation is made to detect whether values of A, B, and/or C meet a sufficient condition for performing a joint accumulation of C with partial products of A and B. If so, a joint accumulation of C is done with partial products of A and B and result of the joint accumulation is rounded. If not, then a primary accumulation is done of the partial products of A and B. This generates an unrounded non-redundant result of the primary accumulation. The unrounded result is then truncated to generate an unrounded non-redundant intermediate result vector that excludes one or more least significant bits of the unrounded non-redundant result. A secondary accumulation is then performed, adding or subtracting C to the unrounded non-redundant intermediate result vector. Finally, the result of the secondary accumulation is rounded.
Type: Grant
Filed: June 24, 2015
Date of Patent: February 13, 2018
Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
Inventor: Thomas Elmer
-
Patent number: 9798519
Abstract: A microprocessor comprises an instruction pipeline, a shared memory, and first and second arithmetic processing units in the instruction pipeline, each capable of reading or receiving operands from and writing or providing results to the shared memory. The first arithmetic processing unit performs a first portion of a mathematical operation to produce an intermediate result vector that is not a complete, final result of the mathematical operation. The first arithmetic processing unit generates a plurality of non-architectural calculation control indicators that indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed. The second arithmetic processing unit performs a second portion of the mathematical operation, in accordance with the calculation control indicators, to produce a complete, final result of the mathematical operation.
Type: Grant
Filed: June 24, 2015
Date of Patent: October 24, 2017
Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
Inventor: Thomas Elmer
-
Patent number: 9778908
Abstract: A microprocessor splits a fused multiply-accumulate operation of the form A*B+C into first and second multiply-accumulate sub-operations to be performed by a multiplier and an adder. The first sub-operation at least multiplies A and B, and conditionally also accumulates C to the partial products of A and B to generate an unrounded nonredundant sum. The unrounded nonredundant sum is stored in memory shared by the multiplier and adder for an indefinite time period, enabling the multiplier and adder to perform other operations unrelated to the multiply-accumulate operation. The second sub-operation conditionally accumulates C to the unrounded nonredundant sum if C is not already incorporated into the value, and then generates a final rounded result.
Type: Grant
Filed: June 24, 2015
Date of Patent: October 3, 2017
Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
Inventor: Thomas Elmer
-
Patent number: 9778907
Abstract: A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C using first and second execution units. An input operand analyzer circuit determines whether values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with partial products of A and B. The first instruction execution unit multiplies A and B and jointly accumulates C to partial products of A and B when the values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with the partial products of A and B. The second instruction execution unit separately accumulates C to the products of A and B when the values of A, B and/or C do not meet a sufficient condition to perform a joint accumulation of C with the partial products of A and B.
Type: Grant
Filed: June 24, 2015
Date of Patent: October 3, 2017
Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
Inventor: Thomas Elmer
-
Publication number: 20170273519
Abstract: An adapter device for arranging a container (4, 6) on the upper face (5) of a cleaning device (9), said adapter device consisting of an adapter plate (1, 2) with a support surface (13, 15) and at least one locking element for locking the container in place on the (4, 6) adapter plate. The adapter plate (1, 2) is divided into at least two parts and consists of a front adapter plate (1) and a rear adapter plate (2) which can be separately mounted on the upper face (5) of the cleaning device (9).
Type: Application
Filed: September 15, 2015
Publication date: September 28, 2017
Inventors: Henrik MATHIASSEN, Steen Klimt JOHANNESEN, Thomas ELMER, Trine BAEK NIELSEN
-
Publication number: 20170097824
Abstract: A microprocessor is configured for unchained and chained modes of split execution of a fused compound arithmetic operation. In both modes of split execution, a first execution unit executes only a first part of the fused compound arithmetic operation and produces an intermediate result thereof, and a second instruction execution unit receives the intermediate result and executes a second part of the fused compound arithmetic operation to produce a final result. In the unchained mode, execution is accomplished by dispatching separate split-execution microinstructions to the first and second instruction execution units. In the chained mode, execution is accomplished by dispatching a single split-execution microinstruction to the first instruction execution unit and sending a chaining control signal or signal group to the second execution unit, causing it to execute its part of the fused arithmetic operation without needing an instruction.
Type: Application
Filed: July 5, 2016
Publication date: April 6, 2017
Inventors: THOMAS ELMER, NIKHIL A. PATIL
-
Publication number: 20160004507
Abstract: A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C. An evaluation is made to detect whether values of A, B, and/or C meet a sufficient condition for performing a joint accumulation of C with partial products of A and B. If so, a joint accumulation of C is done with partial products of A and B and result of the joint accumulation is rounded. If not, then a primary accumulation is done of the partial products of A and B. This generates an unrounded non-redundant result of the primary accumulation. The unrounded result is then truncated to generate an unrounded non-redundant intermediate result vector that excludes one or more least significant bits of the unrounded non-redundant result. A secondary accumulation is then performed, adding or subtracting C to the unrounded non-redundant intermediate result vector. Finally, the result of the secondary accumulation is rounded.
Type: Application
Filed: June 24, 2015
Publication date: January 7, 2016
Inventor: THOMAS ELMER
-
Publication number: 20160004509
Abstract: An arithmetic operation is performed using a first instruction execution unit to generate an intermediate result vector and a plurality of calculation control indicators that indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed. The intermediate result vector and the plurality of calculation control indicators are stored in memory external to the instruction execution unit, and later read by a second instruction execution unit to complete the arithmetic operation.
Type: Application
Filed: June 24, 2015
Publication date: January 7, 2016
Inventor: THOMAS ELMER
-
Patent number: D782759
Type: Grant
Filed: December 23, 2015
Date of Patent: March 28, 2017
Assignee: Nilfisk A/S
Inventors: Henrik Mathiassen, Steen Klimt Johannesen, Thomas Elmer