Patents by Inventor Gary S. Goldman
Gary S. Goldman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240143988
Abstract: Dynamic data quantization may be applied to minimize the power consumption of a system that implements a convolutional neural network (CNN). Under such a quantization scheme, a quantized representation of a 3×3 array of m-bit activation values may include 9 n-bit mantissa values and one exponent shared between the n-bit mantissa values (n<m); and a quantized representation of a 3×3 kernel with p-bit parameter values may include 9 q-bit mantissa values and one exponent shared between the q-bit mantissa values (q<p). Convolution of the kernel with the activation data may include computing a dot product of the 9 n-bit mantissa values with the 9 q-bit mantissa values, and summing the shared exponents. In a CNN with multiple kernels, multiple computing units (each corresponding to one of the kernels) may receive the quantized representation of the 3×3 array of m-bit activation values from the same quantization-alignment module.
Type: Application
Filed: January 11, 2024
Publication date: May 2, 2024
Inventors: Jian hui Huang, James Michael Bodwin, Pradeep R. Joginipally, Shabarivas Abhiram, Gary S. Goldman, Martin Stefan Patz, Eugene M. Feinberg, Berend Ozceri
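The shared-exponent scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the patented hardware design: the function names, the choice of the smallest exponent that makes the largest magnitude fit, and the use of an arithmetic right shift for rounding are all assumptions made for illustration.

```python
def quantize_block(values, mantissa_bits):
    """Quantize integers to signed mantissas that share one exponent.

    Hypothetical helper: each value is approximated as mantissa << exponent.
    The exponent is the smallest shift at which the largest magnitude fits
    in a signed mantissa of `mantissa_bits` bits (an assumed policy).
    """
    limit = (1 << (mantissa_bits - 1)) - 1  # largest positive mantissa
    peak = max(abs(v) for v in values)
    exponent = 0
    while (peak >> exponent) > limit:
        exponent += 1
    # Arithmetic shift: rounds toward negative infinity for negative values.
    mantissas = [v >> exponent for v in values]
    return mantissas, exponent


def quantized_dot(activations, weights, n_bits, q_bits):
    """Dot product of the mantissas, plus the sum of the two shared exponents."""
    act_m, act_e = quantize_block(activations, n_bits)
    ker_m, ker_e = quantize_block(weights, q_bits)
    dot = sum(a * w for a, w in zip(act_m, ker_m))
    return dot, act_e + ker_e
```

The approximate convolution result is `dot << exponent_sum`. Only the narrow mantissas enter the multipliers, which is where the abstract locates the power saving.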
-
Patent number: 11915126
Abstract: Dynamic data quantization may be applied to minimize the power consumption of a system that implements a convolutional neural network (CNN). Under such a quantization scheme, a quantized representation of a 3×3 array of m-bit activation values may include 9 n-bit mantissa values and one exponent shared between the n-bit mantissa values (n<m); and a quantized representation of a 3×3 kernel with p-bit parameter values may include 9 q-bit mantissa values and one exponent shared between the q-bit mantissa values (q<p). Convolution of the kernel with the activation data may include computing a dot product of the 9 n-bit mantissa values with the 9 q-bit mantissa values, and summing the shared exponents. In a CNN with multiple kernels, multiple computing units (each corresponding to one of the kernels) may receive the quantized representation of the 3×3 array of m-bit activation values from the same quantization-alignment module.
Type: Grant
Filed: September 4, 2020
Date of Patent: February 27, 2024
Assignee: Recogni Inc.
Inventors: Jian hui Huang, James Michael Bodwin, Pradeep R. Joginipally, Shabarivas Abhiram, Gary S. Goldman, Martin Stefan Patz, Eugene M. Feinberg, Berend Ozceri
-
Publication number: 20240053919
Abstract: A memory system comprises a plurality of memory sub-systems, each with a memory bank and other circuit components. For each of the memory sub-systems, a first buffer receives and stores a read-modify-write request (with a read address, a write address and a first operand), a second operand is read from the memory bank at the location specified by the read address, a combiner circuit combines the first operand with the second operand, an activation circuit transforms the output of the combiner circuit, and the output of the activation circuit is stored in the memory bank at the location specified by the write address. The first operand and the write address may be stored in a second buffer while the second operand is read from the memory bank. Further, the output of the activation circuit may be first stored in the first buffer before being stored in the memory bank.
Type: Application
Filed: March 13, 2023
Publication date: February 15, 2024
Inventors: Gary S. Goldman, Ashwin Radhakrishnan
-
Publication number: 20230401433
Abstract: In a low power hardware architecture for handling accumulation overflows in a convolver unit, an accumulator of the convolver unit computes a running total by successively summing dot products from a dot product computation module during an accumulation cycle. In response to the running total overflowing the maximum or minimum value of a data storage element, the accumulator transmits an overflow indicator to a controller and sets its output equal to a positive or negative overflow value. In turn, the controller disables the dot product computation module by clock gating, clamping one of its inputs to zero and/or holding its inputs to constant values. At the end of the accumulation cycle, the output of the accumulator is sampled. In response to a clear signal being asserted, the dot product computation module is enabled, and the running total is set to zero for the start of the next accumulation cycle.
Type: Application
Filed: June 9, 2022
Publication date: December 14, 2023
Inventors: Shabarivas Abhiram, Gary S. Goldman, Jian hui Huang, Eugene M. Feinberg
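A rough software analogy of the overflow handling in this abstract is sketched below in Python. It is an assumption-laden model, not the hardware: the function name, the default 16-bit width, and the use of an early return to stand in for disabling the dot product computation module are all hypothetical.

```python
def accumulate_cycle(blocks, bits=16):
    """Sum dot products over one accumulation cycle, saturating on overflow.

    `blocks` is a sequence of (data, weights) pairs. Returns the sampled
    output and the number of dot products actually computed. Once the
    running total leaves the signed `bits`-wide range, the remaining dot
    products are skipped, loosely modelling the disabled computation module.
    """
    max_value = (1 << (bits - 1)) - 1
    min_value = -(1 << (bits - 1))
    total = 0       # running total starts at zero (the "clear" state)
    computed = 0
    for data, weights in blocks:
        total += sum(d * w for d, w in zip(data, weights))
        computed += 1
        if total > max_value:
            return max_value, computed   # positive overflow value
        if total < min_value:
            return min_value, computed   # negative overflow value
    return total, computed
```

Calling the function again for the next cycle plays the role of the clear signal, since the running total is reinitialized to zero.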
-
Patent number: 11762946
Abstract: Convolution with a 5×5 kernel involves computing the dot product of a 5×5 data block with a 5×5 kernel. Instead of computing this dot product as a single sum of 25 products, the dot product is computed as a sum of four partial sums, where each partial sum is computed as a dot product of a 3×3 data block with a 3×3 kernel. The four partial sums may be computed by a single 3×3 convolver unit over four time periods. During each time period, at least some of the weights received by the 3×3 convolver unit may correspond to a quadrant of weights from the 5×5 kernel. A shifter circuit provides shifted columns (left or right shifted) of the input data to the 3×3 convolver unit, allowing the 3×3 convolver unit access to the 3×3 data block that spatially corresponds to a particular quadrant of weights from the 5×5 kernel.
Type: Grant
Filed: September 23, 2022
Date of Patent: September 19, 2023
Assignee: Recogni Inc.
Inventors: Gary S. Goldman, Shabarivas Abhiram
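The decomposition in this abstract can be checked arithmetically with a short Python sketch. The quadrant layout used here (3×3 tiles anchored at rows and columns 0 and 3, with zeros filling positions past the 5×5 edge) is an assumption for illustration; the abstract only says that at least some weights in each period come from a quadrant of the 5×5 kernel. All function names are hypothetical.

```python
def dot3x3(data, kernel):
    """Dot product of two 3x3 blocks: the only operation the convolver performs."""
    return sum(data[r][c] * kernel[r][c] for r in range(3) for c in range(3))


def tile(matrix, row0, col0):
    """3x3 tile of a 5x5 matrix starting at (row0, col0), zero-padded past the edge."""
    return [[matrix[r][c] if r < 5 and c < 5 else 0
             for c in range(col0, col0 + 3)]
            for r in range(row0, row0 + 3)]


def conv5x5_via_3x3(data, kernel):
    """5x5 dot product as the sum of four 3x3 partial sums, one per quadrant."""
    return sum(dot3x3(tile(data, r0, c0), tile(kernel, r0, c0))
               for r0 in (0, 3) for c0 in (0, 3))
```

Each of the four `dot3x3` calls corresponds to one time period on the single 3×3 convolver unit, and selecting the data tile stands in for the column shifting the abstract attributes to the shifter circuit.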
-
Patent number: 11645355
Abstract: A system for evaluating a piecewise linear function includes a first look-up table with N entries, and a second look-up table with M entries, with M being less than N. Each of the N entries contains parameters that define a corresponding linear segment of the piecewise linear function. The system further includes a controller configured to store a subset of the N entries from the first look-up table in the second look-up table. The system further includes a classifier for receiving an input value and classifying the input value in one of a plurality of segments of a number line. A total number of the segments is equal to M, and the segments are non-overlapping and contiguous. The system further includes a multiplexor for selecting one of the M entries of the second look-up table based on the classification of the input value into one of the plurality of segments.
Type: Grant
Filed: December 30, 2022
Date of Patent: May 9, 2023
Assignee: Recogni Inc.
Inventors: Gilles J. C. A. Backhus, Gary S. Goldman
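A minimal Python sketch of the two-table arrangement in this abstract follows. It assumes each table entry is a (slope, intercept) pair and that the M segments are delimited by M-1 sorted breakpoints; the abstract does not specify either, and the names `load_subset` and `evaluate` are hypothetical.

```python
from bisect import bisect_right


def load_subset(full_table, indices):
    """Controller step: copy a subset of the N entries into the small table."""
    return [full_table[i] for i in indices]


def evaluate(x, boundaries, small_table):
    """Evaluate the piecewise linear function at x.

    `boundaries` holds M-1 sorted breakpoints that split the number line
    into M contiguous, non-overlapping segments (assumed representation).
    """
    segment = bisect_right(boundaries, x)   # classifier: which segment holds x
    slope, intercept = small_table[segment]  # multiplexor: pick one of M entries
    return slope * x + intercept
```

For example, with a hypothetical four-entry table `[(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (0.0, 6.0)]`, loading entries 0 and 2 with a single breakpoint at 0.0 yields a ReLU-shaped function from only two active entries.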
-
Patent number: 11630605
Abstract: A memory system comprises a plurality of memory sub-systems, each with a memory bank and other circuit components. For each of the memory sub-systems, a first buffer receives and stores a read-modify-write request (with a read address, a write address and a first operand), a second operand is read from the memory bank at the location specified by the read address, a combiner circuit combines the first operand with the second operand, an activation circuit transforms the output of the combiner circuit, and the output of the activation circuit is stored in the memory bank at the location specified by the write address. The first operand and the write address may be stored in a second buffer while the second operand is read from the memory bank. Further, the output of the activation circuit may be first stored in the first buffer before being stored in the memory bank.
Type: Grant
Filed: August 10, 2022
Date of Patent: April 18, 2023
Assignee: Recogni Inc.
Inventors: Gary S. Goldman, Ashwin Radhakrishnan
-
Publication number: 20220076104
Abstract: Dynamic data quantization may be applied to minimize the power consumption of a system that implements a convolutional neural network (CNN). Under such a quantization scheme, a quantized representation of a 3×3 array of m-bit activation values may include 9 n-bit mantissa values and one exponent shared between the n-bit mantissa values (n<m); and a quantized representation of a 3×3 kernel with p-bit parameter values may include 9 q-bit mantissa values and one exponent shared between the q-bit mantissa values (q<p). Convolution of the kernel with the activation data may include computing a dot product of the 9 n-bit mantissa values with the 9 q-bit mantissa values, and summing the shared exponents. In a CNN with multiple kernels, multiple computing units (each corresponding to one of the kernels) may receive the quantized representation of the 3×3 array of m-bit activation values from the same quantization-alignment module.
Type: Application
Filed: September 4, 2020
Publication date: March 10, 2022
Inventors: Jian hui Huang, James Michael Bodwin, Pradeep R. Joginipally, Shabarivas Abhiram, Gary S. Goldman, Martin Stefan Patz, Eugene M. Feinberg, Berend Ozceri
-
Patent number: 6829224
Abstract: A method and apparatus for smoothing the rate of packet discards for random early detection (“RED”) in a communication device such as an ATM switch is described. The ATM switch includes a plurality of class of service queues. An accumulated discard probability is stored independently for each class of service queue. With the arrival of each packet (frame), an instantaneous discard probability is calculated. The sum of the instantaneous discard probability and the accumulated discard probability becomes the effective probability for discard. If the effective discard probability is greater than (or equal to) a random number, the cell is discarded, and the accumulated discard probability is cleared. Otherwise, the sum is stored back as the new value for the accumulated discard probability. The accumulated discard probability may optionally be cleared if a class of service queue's current cell count is zero.
Type: Grant
Filed: February 4, 1999
Date of Patent: December 7, 2004
Assignee: Cisco Technology, Inc.
Inventors: Gary S. Goldman, Mohammed Nikuie
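The discard decision described in this abstract maps directly onto a few lines of Python. The sketch below is illustrative only: the class name, the injectable random source, and the assumption that the random number is drawn uniformly from [0, 1) are not taken from the patent, and the calculation of the instantaneous probability is left to the caller.

```python
import random


class SmoothedRedQueue:
    """Per-queue state for smoothed random early detection (hypothetical model).

    One instance would be kept for each class of service queue, so the
    accumulated discard probability is stored independently per queue.
    """

    def __init__(self, rng=random.random):
        self.accumulated = 0.0
        self.rng = rng  # assumed to return a uniform value in [0, 1)

    def should_discard(self, instantaneous):
        """Decide the fate of one arriving packet."""
        effective = self.accumulated + instantaneous
        if effective >= self.rng():
            self.accumulated = 0.0      # discard and clear the accumulator
            return True
        self.accumulated = effective    # keep the sum for the next arrival
        return False

    def queue_emptied(self):
        """Optional clearing when the queue's current cell count is zero."""
        self.accumulated = 0.0
```

Because the accumulated value grows with every packet that is kept, long runs without a discard become progressively less likely, which is the smoothing effect the abstract refers to.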
-
Patent number: 6385710
Abstract: In accordance with the present invention, a cache memory subsystem includes a processor, a cache control unit and a SRAM serving as the cache memory. The SRAM is a synchronous SRAM. The cache control unit provides appropriately timed control signals to the SRAM when the processor is accessing the cache memory. The SRAM can be either a pipelined architecture SRAM (register output SRAM) or a flow-through access architecture SRAM (latch output SRAM). The cache control unit is selectably configured to operate in a pipelined mode (1-1-1) or a flow-through (2-2) mode. The cache control unit is configured in the 1-1-1 mode when the SRAM is a pipelined architecture SRAM having a clock rate equal to the processor. When the SRAM is a flow-through architecture SRAM that cannot be clocked at the same rate as the processor, the cache control unit is configured in the 2-2 mode and the SRAM is clocked at a clock rate half of the processor clock rate.
Type: Grant
Filed: February 23, 1996
Date of Patent: May 7, 2002
Assignee: Sun Microsystems, Inc.
Inventors: Gary S. Goldman, Christopher Chen, Douglas W. Forehand
-
Patent number: 6212181
Abstract: A system and method for assigning departure timeslots to arrival data in an ATM switch is described. The departure timeslots are assigned to arrival data when no departure data is pending or when arrival data has a higher priority than pending departure data.
Type: Grant
Filed: March 26, 1999
Date of Patent: April 3, 2001
Assignee: Cisco Technology, Inc.
Inventors: Robert J. Divivier, Christopher B. Bergen, Gary S. Goldman
-
Patent number: 5715425
Abstract: A central processing unit is connected to an external memory including system memory and an external cache. The central processing unit includes a First-In-First-Out (FIFO) load buffer configured to generate an access to the external memory in response to a data prefetch command. The access to external memory has an associated data load latency period as data is moved from the system memory into the external cache. Instead of requiring the access to external memory to be completed before another FIFO load buffer address is processed, as is typically required in a FIFO load buffer configuration, the FIFO load buffer responds to the data prefetch command by processing additional stored addresses during the data load latency period.
Type: Grant
Filed: February 22, 1996
Date of Patent: February 3, 1998
Assignee: Sun Microsystems, Inc.
Inventors: Gary S. Goldman, Bruce E. Petrick, Marc Tremblay, Dale R. Greenley
-
Patent number: 5490250
Abstract: The invention provides a method and apparatus for tagging a control error indication onto a data signal passing through a data router in a computer system.
Type: Grant
Filed: December 31, 1991
Date of Patent: February 6, 1996
Assignee: Amdahl Corporation
Inventors: Klaus P. Reschke, Gary S. Goldman
-
Patent number: 5423025
Abstract: An error handling and reporting mechanism is capable of taking advantage of sophisticated error analysis performed after clocks have been stopped in response to an error detected in a controller. The controller provides services in a data processing system in response to requests for controller services from a plurality of requestors. The controller includes a plurality of ports for storing requests for controller services. A plurality of servers is coupled to the plurality of ports, and perform separate services associated with the requests for controller services stored in the plurality of ports. An error reporting mechanism is included which is responsive to a detected error in a particular server associated with a request in a particular port, for posting error status in the particular port and causing clock stoppage within a clock stop latency period. An error analysis mechanism analyzes the detected errors during the clock stoppage.
Type: Grant
Filed: September 29, 1992
Date of Patent: June 6, 1995
Assignee: Amdahl Corporation
Inventors: Gary S. Goldman, Kent W. Wendorf
-
Patent number: 5339407
Abstract: Recovery of data from a store-to cache in a malfunctioning CPU, is accomplished without exercising the hardware of the malfunctioning CPU. A data path which is independent of the normal operating paths of the computer, such as a scan facility, is used to move data out of the cache into the mainstore while the malfunctioning CPU's clocks are off. A system controller controls normal transfer of data between the cache memory of the processing unit and the mainstore. A service processor is coupled to the processing unit, the mainstore, and the system controller, and is responsive to the detection of errors in the processing unit, for stopping the processing unit and moving data out of the cache memory to the mainstore through the scan facility separate from the system controller. Logic in the system controller flushes the move out queue or other storage locations in the system controller.
Type: Grant
Filed: September 29, 1992
Date of Patent: August 16, 1994
Assignee: Amdahl Corporation
Inventors: Gary S. Goldman, Silas P. Elash, Jeffrey L. Baker
-
Patent number: 4745605
Abstract: In a data processing machine that generates a control word and that includes a plurality of registers connected to receive respective copies of the control word for execution in sections of the data processing machine, the present invention provides an apparatus for detecting an error condition in the execution of the control word. The apparatus detects an error in any of the respective copies of the control word. Further, a second means, responsive to the one copy of the control word in one register, is included for analyzing the one copy to identify a class of possible errors. Finally, responsive to the detection of an error in any of the respective copies and to the class of possible errors, a signal is generated indicating an error condition.
Type: Grant
Filed: August 19, 1986
Date of Patent: May 17, 1988
Assignee: Amdahl Corporation
Inventors: Gary S. Goldman, Mark W. Semmelmeyer
-
Patent number: 4223255
Abstract: An electric motor with microprogrammed controller and dual-functioning brushless commutation/rectification circuitry contained entirely within a wheel. The principal use of this electric motor is intended to be, but not limited to, powering a four-wheel drive electric vehicle through normal driving modes and serving as a power-recovery generator during braking. The magnetic and electronic configuration is optimized within the wheel to provide high torque and efficiency without the use of gear reductions, chain or belt drives, transmission, rotating axles, differentials, universal joints, or brushes. Power losses from mechanical drive system couplings are thus eliminated. Except for the wheel and bearings, there are no moving parts. Also, the wheel is virtually free of devices that are subject to mechanical failure.
Type: Grant
Filed: October 28, 1977
Date of Patent: September 16, 1980
Inventors: Gary S. Goldman, Allen W. Beishline