Pipeline Patents (Class 708/508)

Instructions for vector multiplication of unsigned words with rounding

Patent number: 11221849

Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate an unsigned fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.

Type: Grant

Filed: September 27, 2017

Date of Patent: January 11, 2022

Assignee: Intel Corporation

Inventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
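
The element-wise operation in this abstract can be sketched in a few lines. This is only an illustration, not the patented implementation: the 16-bit element width, the round-half-up rule (adding half of one result LSB before the shift), and the function name `mul_high_round_unsigned` are all assumptions, since the abstract does not specify them.

```python
def mul_high_round_unsigned(src1, src2, bits=16):
    """Multiply element pairs and keep the rounded upper `bits` of each product."""
    mask = (1 << bits) - 1
    half = 1 << (bits - 1)  # assumed rounding constant: half of one result LSB
    result = []
    for a, b in zip(src1, src2):
        product = (a & mask) * (b & mask)   # double-sized product, up to 2*bits wide
        rounded = (product + half) >> bits  # round, then keep the most significant half
        result.append(rounded & mask)
    return result


print([hex(v) for v in mul_high_round_unsigned([0xFFFF, 0x8000, 1], [0xFFFF, 0x8000, 0x8000])])
# prints ['0xfffe', '0x4000', '0x1']
```

Under these assumptions the rounded result always fits in the element width, because the largest possible product (0xFFFF * 0xFFFF) rounds to 0xFFFE.
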
Processor core to coprocessor interface with FIFO semantics

Patent number: 10614023

Abstract: Techniques are provided for exchanging dedicated hardware signals to manage a first-in first-out (FIFO). In an embodiment, a first processor initiates content transfer into the FIFO. The first processor activates a first hardware signal that is reserved for indicating that content resides within the FIFO. A second processor activates a second hardware signal that is reserved for indicating that content is accepted. The second hardware signal causes the first hardware signal to be deactivated. This exchange of hardware signals demarcates a FIFO transaction, which is mediated by interface circuitry of the FIFO.

Type: Grant

Filed: June 28, 2019

Date of Patent: April 7, 2020

Assignee: Oracle International Corporation

Inventors: David A. Brown, Daniel Fowler, Rishabh Jain, Erik Schlanger, Michael Duller
Architecture and execution for efficient mixed precision computations in single instruction multiple data/thread (SIMD/T) devices

Patent number: 10061592

Abstract: A method for improving power, performance, area (PPA) for mixed precision computations in a processing environment. The method includes determining a braiding factor as a number of units of work encoded into a physical thread. A value of the braiding factor is determined based on a mix of precision requirements presented for individual units of work. Units of work are classified as instructions for applied code transformation based on associated precision requirements for the processing environment. Instruction inputs from specified registers are packed together into a destination register according to the determined value of the braiding factor. The packed instructions presented in vector form are executed with an instruction set architecture configured for executing packed instructions of different precisions.

Type: Grant

Filed: March 30, 2015

Date of Patent: August 28, 2018

Assignee: Samsung Electronics Co., Ltd.

Inventors: Maxim Lukyanov, Alexander Grosul, Mitchell Alsup, Boris Beylin
High performance implementation of the FFT butterfly computation

Patent number: 9672192

Abstract: This invention is a FFT butterfly circuit. This circuit includes four temporary data registers connected to three memories. The three memories include read/write X and Y memories and a read only twiddle coefficient memory. A multiplier-accumulator forms a product and accumulates the product with one of two accumulator registers. A register file with plural registers is loaded from one of the accumulator registers or the fourth temporary data register. An adder/subtracter forms a selected one of a sum of registers or a difference of registers. A write buffer with two buffers temporarily stores data from the adder/subtracter before storage in the first or second memory. The X and Y memories must be read/write but the twiddle memory may be read only.

Type: Grant

Filed: November 11, 2015

Date of Patent: June 6, 2017

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventors: Darrell E. Tinker, Keerthinarayan Heragu
Floating point round-off amount determination processors, methods, systems, and instructions

Patent number: 9513871

Abstract: A method of an aspect includes receiving a floating point round-off amount determination instruction. The instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point, and indicates a destination storage location. A result including one or more result floating point data elements is stored in the destination storage location in response to the floating point round-off amount determination instruction. Each of the one or more result floating point data elements includes a difference between a corresponding floating point data element of the source in a corresponding position, and a rounded version of the corresponding floating point data element of the source that has been rounded to the indicated number of the fraction bits. Other methods, apparatus, systems, and instructions are disclosed.

Type: Grant

Filed: December 30, 2011

Date of Patent: December 6, 2016

Assignee: Intel Corporation

Inventors: Cristina S. Anderson, Bret L. Toll, Robert Valentine, Simon Rubanovich, Amit Gradstein
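
As a rough illustration of the result this abstract describes, the sketch below computes the difference between a value and its rounded version for one finite scalar. The function name `round_off_amount` is hypothetical, and the rounding mode (Python's built-in round-half-to-even) is an assumption; the abstract does not say which rounding mode the instruction uses.

```python
def round_off_amount(x, fraction_bits):
    """Return x minus x rounded to `fraction_bits` bits after the radix point."""
    scale = 2.0 ** fraction_bits
    # Scaling by a power of two is exact (barring overflow), so the only
    # rounding happens in round(), which uses round-half-to-even (assumed mode).
    rounded = round(x * scale) / scale
    return x - rounded


print(round_off_amount(1.3125, 2))  # 1.3125 rounds to 1.25, so this prints 0.0625
```
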
Fast normalization in a mixed precision floating-point unit

Patent number: 9286031

Abstract: A hardware circuit for returning single precision denormal results to double precision. A hardware circuit component configured to count leading zeros of an unrounded single precision denormal result. A hardware circuit component configured to pre-compute a first exponent and a second exponent for the unrounded single precision denormal result. A hardware circuit component configured to perform a second normalization of the rounded single precision denormal result back to architected format.

Type: Grant

Filed: November 26, 2013

Date of Patent: March 15, 2016

Assignee: International Business Machines Corporation

Inventors: Maarten J. Boersma, Thomas Fuchs, Markus Kaltenbach, David Lang
Fast normalization in a mixed precision floating-point unit

Patent number: 9280316

Abstract: A hardware circuit for returning single precision denormal results to double precision. A hardware circuit component configured to count leading zeros of an unrounded single precision denormal result. A hardware circuit component configured to pre-compute a first exponent and a second exponent for the unrounded single precision denormal result. A hardware circuit component configured to perform a second normalization of the rounded single precision denormal result back to architected format.

Type: Grant

Filed: January 9, 2014

Date of Patent: March 8, 2016

Assignee: International Business Machines Corporation

Inventors: Maarten J. Boersma, Thomas Fuchs, Markus Kaltenbach, David Lang
System and method for signal processing in digital signal processors

Patent number: 9274750

Abstract: An embodiment of a method and a related apparatus for digital computation of a floating point complex multiply-add is provided. The method includes receiving an input addend, a first product, and a second product. The input addend, the first product and the second product each respectively has a mantissa and an exponent. The method includes shifting the mantissas of the two with smaller exponents of the input addend, the first product, and the second product to align together with the mantissa of the one with largest exponent of the input addend, the first product and the second product, and adding the aligned input addend, the aligned first product and the aligned second product.

Type: Grant

Filed: April 20, 2012

Date of Patent: March 1, 2016

Assignee: Futurewei Technologies, Inc.

Inventors: Tong Sun, Weizhong Chen, Zhikun Cheng, Yuanbin Guo
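
The alignment step in this abstract can be sketched for a single real component. This is a simplified model under stated assumptions: each term is an integer mantissa with an integer exponent (value = mantissa * 2**exponent), shifted-out bits are simply truncated, and the name `align_and_add` is hypothetical. The abstract itself covers a complex multiply-add and does not describe how discarded bits are handled.

```python
def align_and_add(terms):
    """Align every mantissa to the largest exponent, then add them.

    `terms` is a list of (mantissa, exponent) pairs.
    Returns (sum_mantissa, exponent).
    """
    max_exp = max(exponent for _, exponent in terms)
    total = 0
    for mantissa, exponent in terms:
        # Shift the smaller-exponent mantissas right so all share max_exp.
        # Bits shifted out are dropped (truncation is an assumption here).
        total += mantissa >> (max_exp - exponent)
    return total, max_exp


# Addend 8*2**0, first product 3*2**2, second product 1*2**3
print(align_and_add([(8, 0), (3, 2), (1, 3)]))  # prints (3, 3)
```

The example returns 3 * 2**3 = 24 rather than the exact 28, which shows why real hardware keeps extra guard bits during alignment.
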
Device with capability of processing FFT radix 2 butterfly operation and operation method thereof

Patent number: 9213679

Abstract: The disclosure provides a device with a capability of processing a Fast Fourier Transform Algorithm (FFT) radix 2 butterfly operation and an operation method thereof. The device includes at least a latch, a complex multiplier, a complex adder-subtractor, a switch, and a complex conjugate Arithmetic Logical Unit (ALU). The complex operation unit of the disclosure has a simple structure. The parallel processing array composed of the complex operation unit has the capability of efficiently processing vectors and the FFT operation.

Type: Grant

Filed: April 20, 2012

Date of Patent: December 15, 2015

Assignee: ZTE CORPORATION

Inventor: Chengke Shen
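
For readers unfamiliar with the operation, a radix-2 butterfly needs one complex multiply followed by one complex add and one complex subtract, which matches the multiplier and adder-subtractor named in the abstract. The sketch below shows only that arithmetic; the function name `radix2_butterfly` is hypothetical and nothing here models the latch, switch, or conjugate ALU.

```python
def radix2_butterfly(a, b, twiddle):
    """Return the two outputs of a radix-2 decimation-in-time butterfly."""
    t = b * twiddle      # complex multiplier
    return a + t, a - t  # complex adder-subtractor


# A 2-point DFT is a single butterfly with twiddle factor 1.
print(radix2_butterfly(1 + 0j, 1 + 0j, 1 + 0j))  # prints ((2+0j), 0j)
```
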
Operand special case handling for multi-lane processing

Patent number: 9128531

Abstract: A single instruction multiple data processing pipeline 12 for processing floating point operands includes shared special case handling circuitry 34 for performing any operand dependent special case processing operations. The operand dependent special case processing operations result from special case conditions such as operands that are denormal, an infinity, a not-a-number and a floating point number requiring format conversion. The pipeline 12 may in some embodiments be stalled while the operands requiring special case processing are serially shifted to and from the shared special case handling circuitry 34. In other embodiments the instruction in which the special case condition for an operand arose may be recirculated through the pipeline with permutation circuitry 86, 94 being used to swap the operands between lanes in order to place the operand(s) requiring special case processing operations into the lane containing the shared special case handling circuitry 98.

Type: Grant

Filed: February 22, 2012

Date of Patent: September 8, 2015

Assignee: ARM Limited

Inventors: Sean Tristram Ellis, Simon Alex Charles, Andrew Burdass
Combined matrix-vector and matrix transpose vector multiply for a block-sparse matrix

Patent number: 9058302

Abstract: Systems and methods for combined matrix-vector and matrix-transpose vector multiply for block sparse matrices. Exemplary embodiments include a method of updating a simulation of physical objects in an interactive computer, including generating a set of representations of objects in the interactive computer environment, partitioning the set of representations into a plurality of subsets such that objects in any given set interact only with other objects in that set, generating a vector b describing an expected position of each object at the end of a time interval h, applying a biconjugate gradient algorithm to solve A*Δv=b for the vector Δv of position and velocity changes to be applied to each object wherein the q=Ap and qt=A^T(pt) calculations are combined so that A only has to be read once, integrating the updated motion vectors to determine a next state of the simulated objects, and converting the simulated objects to a visual.

Type: Grant

Filed: March 26, 2012

Date of Patent: June 16, 2015

Assignee: International Business Machines Corporation

Inventor: Karen A. Magerlein
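
The idea of reading A only once can be sketched as follows: each matrix entry is loaded a single time and contributes to both the matrix-vector product and the transpose-vector product. This is a minimal dense-matrix illustration using plain lists; the patent targets block-sparse matrices, and the function name `combined_matvec` is an assumption.

```python
def combined_matvec(A, p, pt):
    """Compute q = A p and qt = A^T pt in one pass over A."""
    n_rows, n_cols = len(A), len(A[0])
    q = [0.0] * n_rows
    qt = [0.0] * n_cols
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):  # each entry of A is read exactly once
            q[i] += a_ij * p[j]         # contribution to A p
            qt[j] += a_ij * pt[i]       # contribution to A^T pt
    return q, qt


print(combined_matvec([[1, 2], [3, 4]], [1, 1], [1, 0]))
# prints ([3.0, 7.0], [1.0, 2.0])
```
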
Compact filter design

Patent number: 8295412

Abstract: An apparatus and method for signal detection in which a digital sample stream is fed round robin into a plurality of buffers, which are sequentially compared with a reference signal to determine a match. A processor determines the chronological order of the samples in each bit of each buffer, and directs a bitwise comparison between the signal in each buffer with the reference to determine a match, e.g., by correlation. The apparatus and method are preferably implemented with a Field-Programmable Gate Array (FPGA). This scheme permits real time correlation of a data stream with a reference without use of shift registers, or a significant number of dedicated logic blocks.

Type: Grant

Filed: September 30, 2010

Date of Patent: October 23, 2012

Assignee: The United States of America as represented by the Secretary of the Navy

Inventor: Jeremy R. O'Neal
Multipurpose functional unit with double-precision and filtering operations

Patent number: 8051123

Abstract: A multipurpose arithmetic functional unit selectively performs planar attribute interpolation, unary function approximation, double-precision arithmetic, and/or arbitrary filtering functions such as texture filtering, bilinear filtering, or anisotropic filtering by iterating through a multi-step multiplication operation with partial products (partial results) accumulated in an accumulation register. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for unary function approximation and planar interpolation; the same multipliers and adders are also leveraged to implement double-precision multiplication and addition.

Type: Grant

Filed: December 15, 2006

Date of Patent: November 1, 2011

Assignee: NVIDIA Corporation

Inventors: Stuart Oberman, Ming Y. Siu
Reshuffled communications processes in pipelined asynchronous circuits

Patent number: 7934031

Abstract: An asynchronous logic family of circuits which communicate on delay-insensitive flow-controlled channels with 4-phase handshakes and 1 of N encoding, compute output data directly from input data using domino logic, and use the state-holding ability of the domino logic to implement pipelining without additional latches.

Type: Grant

Filed: May 11, 2006

Date of Patent: April 26, 2011

Assignee: California Institute of Technology

Inventors: Andrew M. Lines, Alain J. Martin, Uri Cummings
Apparatus and method for integer to floating-point format conversion

Patent number: 7774393

Abstract: An apparatus and method for integer to floating-point format conversion. A processor may include an adder configured to perform addition of respective mantissas of two floating-point operands to produce a sum, where a smaller-exponent one of the floating-point operands has a respective exponent less than or equal to a respective exponent of a larger-exponent one of the floating-point operands. The processor may further include an alignment shifter coupled to the adder and configured, in a first mode of operation, to align the floating-point operands prior to the addition by shifting the respective mantissa of the smaller-exponent operand towards a least-significant bit position. The alignment shifter may be further configured, in a second mode of operation, to normalize an integer operand by shifting the integer operand towards a most-significant bit position. The second mode of operation may be active during execution of an instruction to convert the integer operand to floating-point format.

Type: Grant

Filed: June 30, 2004

Date of Patent: August 10, 2010

Assignee: Oracle America, Inc.

Inventors: Jeffrey S. Brooks, Sadar U. Ahmed
High-speed precoders for communication systems

Patent number: 7769099

Abstract: The invention relates to techniques for implementing high-speed precoders, such as Tomlinson-Harashima (TH) precoders. In one aspect of the invention, look-ahead techniques are utilized to pipeline a TH precoder, resulting in a high-speed TH precoder. These techniques may be applied to pipeline various types of TH precoders, such as Finite Impulse Response (FIR) precoders and Infinite Impulse Response (IIR) precoders. In another aspect of the invention, parallel processing multiple non-pipelined TH precoders results in a high-speed parallel TH precoder design. Utilization of high-speed TH precoders may enable network providers to, for example, operate 10 Gigabit Ethernet with copper cable rather than fiber optic cable.

Type: Grant

Filed: September 13, 2005

Date of Patent: August 3, 2010

Assignee: Leanics Corporation

Inventors: Keshab K. Parhi, Yongru Gu
System and method for a floating point unit with feedback prior to normalization and rounding

Patent number: 7730117

Abstract: A system for performing floating point arithmetic operations including an input register adapted for receiving an operand. The system also includes a mechanism for performing a shift or masking operation in response to determining that the operand is in an un-normalized format. The system also includes instructions for performing single precision incrementing of the operand in response to determining that the operand is single precision, that the operand requires the incrementing based on the results of a previous operation and that the previous operation did not perform the incrementing. The operand was created in the previous operation. The system further includes instructions for performing double precision incrementing of the operand in response to determining that the operand is double precision, that the operand requires the incrementing based on the results of the previous operation and that the previous operation did not perform the incrementing.

Type: Grant

Filed: February 9, 2005

Date of Patent: June 1, 2010

Assignee: International Business Machines Corporation

Inventors: Bruce M. Fleischer, Juergen Haess, Michael Kroener, Martin S. Schmookler, Eric M. Schwarz, Son Dao-Trong
Apparatus and method for cryptographic key expansion

Patent number: 7711955

Abstract: An apparatus and method for cryptographic key expansion. According to a first embodiment, a cryptographic unit may include key storage configured to store an expanded set of cipher keys for a cipher algorithm, and a key expansion pipeline comprising a plurality of pipeline stages. During a key expansion mode of operation, each pipeline stage may be configured to perform a corresponding step of generating a member of the expanded set of cipher keys according to a key expansion algorithm. During a cipher mode of operation, a portion of the key expansion pipeline may be configured to perform a step of the cipher algorithm.

Type: Grant

Filed: September 13, 2004

Date of Patent: May 4, 2010

Assignee: Oracle America, Inc.

Inventors: Christopher H. Olson, Leonard D. Rarick, Gregory F. Grohoski
Synchronization between pipelines in a data processing apparatus utilizing a synchronization queue

Patent number: 7490221

Abstract: The technology described provides a technique for synchronization between pipelines in a data processing apparatus. The data processing apparatus comprises a main processor operable to execute a sequence of instructions, the main processor comprising a first pipeline having a first plurality of pipeline stages, and a coprocessor operable to execute coprocessor instructions in said sequence of instructions. The coprocessor comprises a second pipeline having a second plurality of pipeline stages, and each coprocessor instruction is arranged to be routed through both the first pipeline and the second pipeline.

Type: Grant

Filed: June 24, 2003

Date of Patent: February 10, 2009

Assignee: ARM Limited

Inventors: Martin Robert Evans, Ian Victor Devereux
Processor having efficient function estimate instructions

Patent number: 7406589

Abstract: High-precision floating-point function estimates are split in two instructions each: a low precision table lookup instruction and a linear interpolation instruction. Estimates of different functions can be implemented using this scheme: A separate table-lookup instruction is provided for each different function, while only a single interpolation instruction is needed, since the single interpolation instruction can perform the interpolation step for any of the functions to be estimated. Thus, significantly less overhead is incurred than would be incurred with specialized hardware, while still maintaining a uniform FPU latency, which allows for much simpler control logic.

Type: Grant

Filed: May 12, 2005

Date of Patent: July 29, 2008

Assignee: International Business Machines Corporation

Inventors: Sang Hoo Dhong, Gordon Clyde Fossum, Harm Peter Hofstee, Brad William Michael, Silvia Melitta Mueller, Hwa-Joon Oh
Reduction of add-pipe logic by operand offset shift

Patent number: 7043516

Abstract: The shifters (30, 32) that a floating-point processor (10)'s addition pipeline (14) uses to align or normalize floating-point operands' mantissas before addition or subtraction shift a given mantissa pair one more bit to the left for subtraction than for addition. As a result, the addition pipeline's rounding circuitry (160, 166) does not need to be capable of adding round bits in as many positions as it would without the shift difference, so it can be simpler and faster. Similarly, circuitry (164a–g and 188) employed for normalization after addition and subtraction can be simpler because it does not have to implement as many shift options.

Type: Grant

Filed: March 13, 1998

Date of Patent: May 9, 2006

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Gilbert M. Wolrich, Mark D. Matson, John D. Clouser
Multi-function floating point arithmetic pipeline

Publication number: 20040167953

Abstract: A scalable engine having multiple datapaths, each of which is a unique multi-function floating point pipeline capable of performing a four component dot product on data in a single pass through the datapath, which allows matrix transformations to be computed in an efficient manner, with a high data throughput and without substantially increasing the cost and amount of hardware required to implement the pipeline.

Type: Application

Filed: February 26, 2004

Publication date: August 26, 2004

Inventor: Steven Shaw
Microprocessor with instruction for saturating and packing data

Patent number: 6748521

Abstract: A data processing system is provided with a digital signal processor which has an instruction for saturating multiple fields of a selected set of source operands and storing the separate saturated results in a selected destination register. A first 32-bit operand (600) and a second 32-bit operand (602) are treated as four 16-bit fields and the sixteen bits in each field are saturated separately. Multi-field saturation circuitry is operable to treat a source operand as a number of fields, such that a multi-field saturated (610) result is produced that includes a number of saturated results each corresponding to each field. One instruction is provided which treats an operand pair as having two packed fields, and another instruction is provided that treats the operand pair as having four packed fields. Saturation circuitry is operable to selectively treat a field as either a signed value or an unsigned value.

Type: Grant

Filed: October 31, 2000

Date of Patent: June 8, 2004

Assignee: Texas Instruments Incorporated

Inventor: David Hoyle
Method and apparatus for instruction set architecture to perform primary and shadow digital signal processing sub-instructions simultaneously

Patent number: 6748516

Abstract: Disclosed is a method, apparatus, and an instruction set architecture (ISA) for an application specific signal processor (ASSP) tailored to digital signal processing (DSP) applications. A single DSP instruction includes a pair of sub-instructions: a primary DSP sub-instruction and a shadow DSP sub-instruction. Both the primary and the shadow DSP sub-instructions are dyadic DSP instructions performing two operations in one instruction cycle. Each signal processing unit of the ASSP includes a primary stage to execute a primary DSP sub-instruction based upon current data and a shadow stage to simultaneously execute a shadow DSP sub-instruction based upon delayed data stored locally within registers of the signal processing units. The present invention efficiently executes DSP instructions by simultaneously executing primary DSP sub-instructions (based upon current data) and shadow DSP sub-instructions (based upon delayed locally stored data) with a single DSP instruction.

Type: Grant

Filed: January 29, 2002

Date of Patent: June 8, 2004

Assignee: Intel Corporation

Inventors: Kumar Ganapathy, Ruban Kanapathipillai
Method and apparatus for performing addressing operations in a superscalar, superpipelined processor

Patent number: 6718458

Abstract: A method and apparatus for improving the performance of a superscalar, superpipelined processor by identifying and processing instructions for performing addressing operations is provided. The invention heuristically determines instructions likely to perform addressing operations and assigns those instructions to specialized pipes in a pipeline structure. The invention can assign such instructions to both an execute pipe and a load/store pipe to avoid the occurrence of “bubbles” in the event execution of the instruction requires the calculation capability of the execute pipe. The invention can also examine a sequence of instructions to identify an instruction for performing a calculation where the result of the calculation is used by a succeeding load or store instruction. In this case, the invention controls the pipeline to assure the result of the calculation is available for the succeeding load or store instruction even if both instructions are being processed concurrently.

Type: Grant

Filed: March 27, 2003

Date of Patent: April 6, 2004

Assignee: Broadcom Corporation

Inventors: Dan Dobberpuhl, Robert Stepanian
Multi-function floating point arithmetic pipeline

Publication number: 20030018676

Abstract: A scalable engine having multiple datapaths, each of which is a unique multi-function floating point pipeline capable of performing a four component dot product on data in a single pass through the datapath, which allows matrix transformations to be computed in an efficient manner, with a high data throughput and without substantially increasing the cost and amount of hardware required to implement the pipeline.

Type: Application

Filed: March 15, 2001

Publication date: January 23, 2003

Inventor: Steven Shaw
Method and apparatus for denormal load handling

Patent number: 6487653

Abstract: A microprocessor configured to dynamically switch its floating point load pipeline length from one stage in length to more than one stage in length is disclosed. The microprocessor may perform normal loads and detect denormal loads in a single clock cycle. The microprocessor temporarily stores each scheduled floating point instruction in a reissue buffer for at least one clock cycle. When a denormal load instruction is detected, the microprocessor is configured to add one or more stages to the floating point load pipeline to allow the denormal value to complete the conversion to an internal format. The longer pipeline is then used for all loads that follow the denormal load until there is an idle clock cycle or an abort occurs. At that point, the pipeline reverts back to its original shorter state. In addition, the microprocessor may be configured to cancel instructions scheduled assuming the denormal load would take only one clock cycle to complete.

Type: Grant

Filed: August 25, 1999

Date of Patent: November 26, 2002

Assignee: Advanced Micro Devices, Inc.

Inventors: Stuart F. Oberman, Stephan G. Meier, Jeffrey E. Trull
Digital signal processor

Patent number: 6430681

Abstract: In a digital signal processor having an improved arithmetic processing efficiency, there is provided in parallel a first ROM for storing branch commands and a second ROM for storing arithmetic commands. The ROMs are connected to a branch command decoder and an arithmetic command decoder, respectively. Operations of a first memory control circuit and a second memory control circuit are controlled in response to instructions from the branch command decoder, while operations of an arithmetic circuit are controlled in response to instructions from the arithmetic command decoder. By processing the branch commands and the arithmetic commands in parallel, the operation efficiency of the arithmetic circuit is enhanced.

Type: Grant

Filed: June 18, 1999

Date of Patent: August 6, 2002

Assignee: Sanyo Electric Co., Ltd.

Inventor: Fumiaki Nagao
Microprocessor comprising bit concatenation means

Patent number: 6317825

Abstract: The invention relates to a microprocessor (MP) comprising means to decode (DEC1) a compact instruction (BMV) for the concatenation of at least one bit (bi) of a first binary word (W1) with at least one bit of a second binary word (W2), and means (REGBANK, MUX, BSHIFT) to process this instruction in one clock cycle. Advantages: fast processing of a concatenation operation. Application especially to chip cards.

Type: Grant

Filed: May 3, 2000

Date of Patent: November 13, 2001

Assignee: Inside Technologies

Inventor: Sean Commercial
Close path selection unit for performing effective subtraction within a floating point arithmetic unit

Patent number: 6088715

Abstract: An optimized multimedia execution unit configured to perform vectored floating point and integer instructions. In one embodiment, the execution unit includes an add/subtract pipeline having far and close data paths. The far data path is configured to handle effective addition operations, as well as effective subtraction operations for operands having an absolute exponent difference greater than one. The close data path, conversely, is configured to handle effective subtraction operations for operands having an absolute exponent difference less than or equal to one. The close data path includes an adder unit configured to generate a first and second output value. The first output value is equal to the first input operand plus an inverted version of the second input operand, while the second output value is equal to the first output value plus one.

Type: Grant

Filed: March 27, 1998

Date of Patent: July 11, 2000

Assignee: Advanced Micro Devices, Inc.

Inventor: Stuart F. Oberman
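
The two adder outputs described for the close data path can be modeled on plain unsigned integers. This sketch assumes a fixed-width two's-complement adder and uses the hypothetical name `close_path_adder`; it ignores exponents, signs, and how the hardware selects between the outputs.

```python
def close_path_adder(a, b, width):
    """Return (a + ~b, a + ~b + 1), both truncated to `width` bits."""
    mask = (1 << width) - 1
    first = (a + (~b & mask)) & mask  # first operand plus inverted second operand
    second = (first + 1) & mask       # first output plus one, which equals a - b
    return first, second


print(close_path_adder(5, 3, 8))  # prints (1, 2)
```

Because a + ~b + 1 equals a - b in two's complement, the second output is the true difference and the first is that difference minus one, so both candidates are available without a second carry-propagate addition.
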
Reduced-latency floating-point pipeline using normalization shifts of both operands

Patent number: 6018756

Abstract: If the exponents of a floating-point-processor addition pipeline's input operands are equal, a signal (INVERT) that determines whether the pipeline's sole full-width carry-propagate mantissa adder (34) will invert one of its inputs results from an inversion-determination circuit (FIG. 11) that determines whether the sole set bit in a decoded normalization-shift signal (NORM_SHIFT) occupies the same position as a set bit in a signal (FRAC_A_GT_B) representing what the possible normalization amounts will be if a first of the mantissas is greater than the other, second mantissa. Consequently, a bit-comparison operation (56) that employs no full-width carry-propagate addition can determine the amount of normalization shifting to be performed by bit shifters (30 and 32) disposed in respective processing trains that generate mantissa inputs to the mantissa adder (34).

Type: Grant

Filed: March 13, 1998

Date of Patent: January 25, 2000

Assignee: Digital Equipment Corporation

Inventors: Gilbert M. Wolrich, Mark D. Matson, John D. Clouser