Pipeline Patents (Class 708/508)
  • Patent number: 11221849
    Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate an unsigned fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
    Type: Grant
    Filed: September 27, 2017
    Date of Patent: January 11, 2022
    Assignee: Intel Corporation
    Inventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
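A minimal Python sketch of the arithmetic described for patent 11221849. It assumes 16-bit unsigned elements and round-half-up rounding of the kept upper half. The patent leaves the element size and rounding details to its embodiments, and the name vmul_high_round is hypothetical.

```python
def vmul_high_round(src1, src2, bits=16):
    """Multiply paired unsigned elements and return the rounded upper half.

    bits is the assumed element width; adding half an LSB of the kept half
    before shifting (round-half-up) is an assumed rounding mode.
    """
    mask = (1 << bits) - 1
    half = 1 << (bits - 1)
    result = []
    for a, b in zip(src1, src2):
        product = (a & mask) * (b & mask)    # double-sized (2 * bits) product
        high = (product + half) >> bits      # most significant half, rounded
        # For unsigned inputs (bits >= 2) the rounded high half is at most
        # 2**bits - 2, so it always fits in the fixed-size destination element.
        result.append(high)
    return result

print(vmul_high_round([0x8000, 0xFFFF, 3], [0x8000, 0xFFFF, 0x5555]))
# → [16384, 65534, 1]
```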
  • Patent number: 10614023
    Abstract: Techniques are provided for exchanging dedicated hardware signals to manage a first-in first-out (FIFO). In an embodiment, a first processor initiates content transfer into the FIFO. The first processor activates a first hardware signal that is reserved for indicating that content resides within the FIFO. A second processor activates a second hardware signal that is reserved for indicating that content is accepted. The second hardware signal causes the first hardware signal to be deactivated. This exchange of hardware signals demarcates a FIFO transaction, which is mediated by interface circuitry of the FIFO.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: April 7, 2020
    Assignee: Oracle International Corporation
    Inventors: David A. Brown, Daniel Fowler, Rishabh Jain, Erik Schlanger, Michael Duller
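The signal exchange in patent 10614023 can be modeled in software as a toy state machine. This sketch assumes one item per transaction. The signal names content_ready and content_accepted are hypothetical labels for the abstract's first and second hardware signals, and the _mediate method stands in for the FIFO's interface circuitry.

```python
from collections import deque

class FifoInterface:
    """Toy model of a FIFO whose transactions are demarcated by two signals."""

    def __init__(self):
        self.queue = deque()
        self.content_ready = False     # first signal, driven by the producer
        self.content_accepted = False  # second signal, driven by the consumer

    def producer_push(self, item):
        self.queue.append(item)
        self.content_ready = True      # announce that content resides in the FIFO

    def consumer_accept(self):
        if not self.content_ready:
            return None                # no transaction in progress
        item = self.queue.popleft()
        self.content_accepted = True   # acknowledge the content
        self._mediate()
        return item

    def _mediate(self):
        # Interface circuitry: acceptance deactivates the "content ready" signal,
        # which closes the transaction; the acknowledgment is then released.
        if self.content_accepted:
            self.content_ready = False
            self.content_accepted = False

fifo = FifoInterface()
fifo.producer_push("packet-0")
print(fifo.content_ready)        # True
print(fifo.consumer_accept())    # packet-0
print(fifo.content_ready)        # False
```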
  • Patent number: 10061592
    Abstract: A method for improving power, performance, area (PPA) for mixed precision computations in a processing environment. The method includes determining a braiding factor as a number of units of work encoded into a physical thread. A value of the braiding factor is determined based on a mix of precision requirements presented for individual units of work. Units of work are classified as instructions for applied code transformation based on associated precision requirements for the processing environment. Instruction inputs from specified registers are packed together into a destination register according to the determined value of the braiding factor. The packed instructions presented in vector form are executed with an instruction set architecture configured for executing packed instructions of different precisions.
    Type: Grant
    Filed: March 30, 2015
    Date of Patent: August 28, 2018
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Maxim Lukyanov, Alexander Grosul, Mitchell Alsup, Boris Beylin
  • Patent number: 9672192
    Abstract: This invention is an FFT butterfly circuit. This circuit includes four temporary data registers connected to three memories. The three memories include read/write X and Y memories and a read only twiddle coefficient memory. A multiplier-accumulator forms a product and accumulates the product with one of two accumulator registers. A register file with plural registers is loaded from one of the accumulator registers or the fourth temporary data register. An adder/subtracter forms a selected one of a sum of registers or a difference of registers. A write buffer with two buffers temporarily stores data from the adder/subtracter before storage in the first or second memory. The X and Y memories must be read/write but the twiddle memory may be read only.
    Type: Grant
    Filed: November 11, 2015
    Date of Patent: June 6, 2017
    Assignee: Texas Instruments Incorporated
    Inventors: Darrell E. Tinker, Keerthinarayan Heragu
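The butterfly arithmetic that the circuit in patent 9672192 implements can be shown as a small recursive radix-2 FFT. The read-only twiddle table plays the role of the twiddle coefficient memory. This sketch shows only the math, not the patent's register and memory scheduling, and all names are hypothetical.

```python
import cmath

def twiddle_rom(n):
    """Read-only twiddle table W_n^k = exp(-2*pi*i*k/n) for k < n/2."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

def butterfly(x, y, w):
    """Radix-2 DIT butterfly: one twiddle multiply, then a sum and a difference."""
    t = w * y             # multiplier stage
    return x + t, x - t   # adder/subtracter stage

def fft(data):
    """Recursive radix-2 FFT built from butterflies; len(data) must be a power of two."""
    n = len(data)
    if n == 1:
        return list(data)
    even = fft(data[0::2])
    odd = fft(data[1::2])
    w = twiddle_rom(n)
    out = [0j] * n
    for k in range(n // 2):
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w[k])
    return out

print(fft([1, 1, 1, 1]))  # approximately [4, 0, 0, 0]
```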
  • Patent number: 9513871
    Abstract: A method of an aspect includes receiving a floating point round-off amount determination instruction. The instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point, and indicates a destination storage location. A result including one or more result floating point data elements is stored in the destination storage location in response to the floating point round-off amount determination instruction. Each of the one or more result floating point data elements includes a difference between a corresponding floating point data element of the source in a corresponding position, and a rounded version of the corresponding floating point data element of the source that has been rounded to the indicated number of the fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 30, 2011
    Date of Patent: December 6, 2016
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Bret L. Toll, Robert Valentine, Simon Rubanovich, Amit Gradstein
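The round-off amount defined in patent 9513871 is the difference between each element and the element rounded to a given number of fraction bits. The following is a minimal sketch of that definition. It assumes round-to-nearest-even, because Python's round() uses that mode; the instruction itself may support other rounding modes.

```python
def round_off_amount(values, fraction_bits):
    """Return x - round(x) for each x, rounding to `fraction_bits` bits after the radix point."""
    scale = 2.0 ** fraction_bits          # power of two, so scaling is exact
    result = []
    for x in values:
        rounded = round(x * scale) / scale  # round() is round-half-to-even
        result.append(x - rounded)
    return result

print(round_off_amount([2.75, -0.625], 2))  # [0.0, -0.125]
```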
  • Patent number: 9286031
    Abstract: A hardware circuit for returning single precision denormal results to double precision. The circuit includes a component configured to count the leading zeros of an unrounded single precision denormal result, a component configured to pre-compute a first exponent and a second exponent for the unrounded single precision denormal result, and a component configured to perform a second normalization of the rounded single precision denormal result back to architected format.
    Type: Grant
    Filed: November 26, 2013
    Date of Patent: March 15, 2016
    Assignee: International Business Machines Corporation
    Inventors: Maarten J. Boersma, Thomas Fuchs, Markus Kaltenbach, David Lang
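The leading-zero count and re-normalization steps named in patents 9286031 and 9280316 can be sketched at the bit level. This example converts an FP32 denormal into a normalized IEEE-754 double. It omits the pipeline's rounding and exponent pre-computation, which an exact single-to-double conversion does not need, and the function names are hypothetical.

```python
import struct

def single_denormal_to_double_bits(f32_bits):
    """Re-normalize a nonzero FP32 denormal into an IEEE-754 double bit pattern."""
    sign = f32_bits >> 31
    exp = (f32_bits >> 23) & 0xFF
    frac = f32_bits & 0x7FFFFF
    if exp != 0 or frac == 0:
        raise ValueError("expects a nonzero single-precision denormal")
    lz = 23 - frac.bit_length()      # leading zeros within the 23-bit fraction
    msb = 22 - lz                    # position of the leading one
    # An FP32 denormal equals frac * 2**-149, so the leading one weighs 2**(msb - 149).
    dexp = (msb - 149) + 1023        # biased double exponent
    dfrac = (frac & ((1 << msb) - 1)) << (52 - msb)  # bits below the leading one
    return (sign << 63) | (dexp << 52) | dfrac

def bits_to_double(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def bits_to_single(bits):
    return struct.unpack("<f", struct.pack("<I", bits))[0]

x = 0x00012345  # an FP32 denormal
print(bits_to_double(single_denormal_to_double_bits(x)) == bits_to_single(x))  # True
```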
  • Patent number: 9280316
    Abstract: A hardware circuit for returning single precision denormal results to double precision. The circuit includes a component configured to count the leading zeros of an unrounded single precision denormal result, a component configured to pre-compute a first exponent and a second exponent for the unrounded single precision denormal result, and a component configured to perform a second normalization of the rounded single precision denormal result back to architected format.
    Type: Grant
    Filed: January 9, 2014
    Date of Patent: March 8, 2016
    Assignee: International Business Machines Corporation
    Inventors: Maarten J. Boersma, Thomas Fuchs, Markus Kaltenbach, David Lang
  • Patent number: 9274750
    Abstract: An embodiment of a method and a related apparatus for digital computation of a floating point complex multiply-add is provided. The method includes receiving an input addend, a first product, and a second product. The input addend, the first product and the second product each has a mantissa and an exponent. The method includes shifting the mantissas of the two values with the smaller exponents (among the input addend, the first product, and the second product) to align them with the mantissa of the value with the largest exponent, and then adding the aligned input addend, the aligned first product and the aligned second product.
    Type: Grant
    Filed: April 20, 2012
    Date of Patent: March 1, 2016
    Assignee: Futurewei Technologies, Inc.
    Inventors: Tong Sun, Weizhong Chen, Zhikun Cheng, Yuanbin Guo
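The three-way alignment in patent 9274750 can be sketched with integer (mantissa, exponent) pairs, where each value is mantissa * 2**exponent. For a complex multiply-add, the real part has this shape: an addend plus two products (a_re + b_re*c_re - b_im*c_im). The guard_bits parameter is an assumed detail for preserving shifted-out bits. The patent's actual datapath widths are not given.

```python
def three_way_add(terms, guard_bits=8):
    """Sum three (mantissa, exponent) terms in one pass after aligning to the largest exponent."""
    e_max = max(e for _, e in terms)
    total = 0
    for m, e in terms:
        # Extend with guard bits, then shift right (toward the LSB) by the exponent gap.
        total += (m << guard_bits) >> (e_max - e)
    return total, e_max - guard_bits     # result value = total * 2**exponent

# addend 3*2**4 = 48, first product 5*2**2 = 20, second product 7*2**0 = 7
m, e = three_way_add([(3, 4), (5, 2), (7, 0)])
print(m * 2 ** e)  # 75.0
```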
  • Patent number: 9213679
    Abstract: The disclosure provides a device capable of processing a Fast Fourier Transform (FFT) radix-2 butterfly operation, and an operation method thereof. The device includes at least a latch, a complex multiplier, a complex adder-subtractor, a switch, and a complex conjugate Arithmetic Logical Unit (ALU). The complex operation unit of the disclosure has a simple structure, and a parallel processing array composed of such units can efficiently process vector and FFT operations.
    Type: Grant
    Filed: April 20, 2012
    Date of Patent: December 15, 2015
    Assignee: ZTE Corporation
    Inventor: Chengke Shen
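The abstract of patent 9213679 lists a complex conjugate ALU but does not say what it is used for. One well-known use, assumed here, is computing an inverse transform with the forward-transform datapath through the identity IDFT(X) = conj(DFT(conj(X))) / N. This sketch uses a naive DFT for brevity.

```python
import cmath

def dft(x):
    """Naive forward DFT (stands in for the butterfly array)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_via_conjugate(X):
    """Inverse transform using only the forward transform plus conjugation."""
    n = len(X)
    conj_in = [v.conjugate() for v in X]            # conjugate the inputs
    return [v.conjugate() / n for v in dft(conj_in)]  # conjugate and scale the outputs

print(idft_via_conjugate(dft([1, 2, 3, 4])))  # approximately [1, 2, 3, 4]
```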
  • Patent number: 9128531
    Abstract: A single instruction multiple data processing pipeline 12 for processing floating point operands includes shared special case handling circuitry 34 for performing any operand dependent special case processing operations. The operand dependent special case processing operations result from special case conditions such as operands that are denormal, an infinity, a not-a-number and a floating point number requiring format conversion. The pipeline 12 may in some embodiments be stalled while the operands requiring special case processing are serially shifted to and from the shared special case handling circuitry 34. In other embodiments the instruction in which the special case condition for an operand arose may be recirculated through the pipeline with permutation circuitry 86, 94 being used to swap the operands between lanes in order to place the operand(s) requiring special case processing operations into the lane containing the shared special case handling circuitry 98.
    Type: Grant
    Filed: February 22, 2012
    Date of Patent: September 8, 2015
    Assignee: ARM Limited
    Inventors: Sean Tristram Ellis, Simon Alex Charles, Andrew Burdass
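The special-case detection that routes operands to the shared handler in patent 9128531 can be sketched with FP32 bit patterns. This example flags the lanes holding a denormal, an infinity, or a NaN; format-conversion cases are left out. The function names, and the idea that flagged lanes are then handled one at a time, are illustrative assumptions.

```python
import struct

def classify_fp32(bits):
    """Return the special-case kind of an FP32 bit pattern, or None for the fast path."""
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:
        return "nan" if frac else "infinity"
    if exp == 0 and frac:
        return "denormal"
    return None  # normal numbers and zeros stay in their own lane

def lanes_needing_special_handling(operands):
    """List (lane, kind) pairs that would be sent to the shared handling circuitry."""
    flagged = []
    for lane, value in enumerate(operands):
        bits = struct.unpack("<I", struct.pack("<f", value))[0]
        kind = classify_fp32(bits)
        if kind is not None:
            flagged.append((lane, kind))
    return flagged

print(lanes_needing_special_handling([1.0, float("inf"), 1e-40, float("nan")]))
# → [(1, 'infinity'), (2, 'denormal'), (3, 'nan')]
```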
  • Patent number: 9058302
    Abstract: Systems and methods for combined matrix-vector and matrix-transpose vector multiply for block sparse matrices. Exemplary embodiments include a method of updating a simulation of physical objects in an interactive computer, including generating a set of representations of objects in the interactive computer environment, partitioning the set of representations into a plurality of subsets such that objects in any given set interact only with other objects in that set, generating a vector b describing an expected position of each object at the end of a time interval h, applying a biconjugate gradient algorithm to solve A*Δv=b for the vector Δv of position and velocity changes to be applied to each object wherein the q=Ap and qt=Aᵀ(pt) calculations are combined so that A only has to be read once, integrating the updated motion vectors to determine a next state of the simulated objects, and converting the simulated objects to a visual.
    Type: Grant
    Filed: March 26, 2012
    Date of Patent: June 16, 2015
    Assignee: International Business Machines Corporation
    Inventor: Karen A. Magerlein
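The core of patent 9058302 is fusing the two products that biconjugate gradient needs, q = A p and qt = Aᵀ pt, so that each stored value of A is read once. This sketch uses scalar COO triples (row, col, value) for simplicity. That storage layout is an assumption, since the patent targets block-sparse matrices.

```python
def combined_matvec(entries, n_rows, n_cols, p, pt):
    """Compute q = A p and qt = A^T pt in a single pass over the nonzeros of A."""
    q = [0.0] * n_rows
    qt = [0.0] * n_cols
    for i, j, a in entries:     # each stored value of A is read exactly once
        q[i] += a * p[j]        # contribution to A p
        qt[j] += a * pt[i]      # contribution to A^T pt
    return q, qt

# A = [[2, 0, 1], [0, 3, 0], [4, 0, 0]] in COO form
A = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 4.0)]
q, qt = combined_matvec(A, 3, 3, [1.0, 1.0, 1.0], [1.0, 0.0, 1.0])
print(q, qt)  # [3.0, 3.0, 4.0] [6.0, 0.0, 1.0]
```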
  • Patent number: 8295412
    Abstract: An apparatus and method for signal detection in which a digital sample stream is fed round robin into a plurality of buffers, which are sequentially compared with a reference signal to determine a match. A processor determines the chronological order of the samples in each bit of each buffer, and directs a bitwise comparison between the signal in each buffer and the reference to determine a match, e.g., by correlation. The apparatus and method are preferably implemented with a Field-Programmable Gate Array (FPGA). This scheme permits real time correlation of a data stream with a reference without use of shift registers, or a significant number of dedicated logic blocks.
    Type: Grant
    Filed: September 30, 2010
    Date of Patent: October 23, 2012
    Assignee: The United States of America as represented by the Secretary of the Navy
    Inventor: Jeremy R. O'Neal
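A sequential software sketch of the matching idea in patent 8295412, assuming that each buffer holds a window of the stream and that successive window start offsets are assigned to buffers round robin. The bitwise comparison counts the agreeing bit positions against the reference. The FPGA version compares buffers in parallel, and the threshold parameter is hypothetical.

```python
def match_score(window_bits, reference_bits):
    """Number of bit positions where the window agrees with the reference."""
    return sum(1 for w, r in zip(window_bits, reference_bits) if w == r)

def detect(stream, reference, n_buffers, threshold):
    """Report (start, buffer_id) for every window whose score reaches threshold."""
    m = len(reference)
    hits = []
    for start in range(len(stream) - m + 1):
        buffer_id = start % n_buffers        # round-robin buffer assignment
        window = stream[start:start + m]     # contents of that buffer
        if match_score(window, reference) >= threshold:
            hits.append((start, buffer_id))
    return hits

print(detect([0, 1, 1, 0, 1, 0, 1, 1, 0], [1, 0, 1, 1], n_buffers=4, threshold=4))
# → [(4, 0)]
```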
  • Patent number: 8051123
    Abstract: A multipurpose arithmetic functional unit selectively performs planar attribute interpolation, unary function approximation, double-precision arithmetic, and/or arbitrary filtering functions such as texture filtering, bilinear filtering, or anisotropic filtering by iterating through a multi-step multiplication operation with partial products (partial results) accumulated in an accumulation register. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for unary function approximation and planar interpolation; the same multipliers and adders are also leveraged to implement double-precision multiplication and addition.
    Type: Grant
    Filed: December 15, 2006
    Date of Patent: November 1, 2011
    Assignee: NVIDIA Corporation
    Inventors: Stuart Oberman, Ming Y. Siu
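The multi-step multiplication in patent 8051123 lets a narrow multiplier produce a wide product by accumulating partial products. This sketch multiplies two 53-bit double-precision significands with an assumed 24-bit multiplier. The chunk width and the pass ordering are illustrative assumptions, not values from the patent.

```python
def iterative_multiply(a, b, chunk_bits=24):
    """Multiply wide unsigned integers using only chunk_bits x chunk_bits partial products."""
    mask = (1 << chunk_bits) - 1

    def chunks(x):
        parts = [x & mask]
        x >>= chunk_bits
        while x:
            parts.append(x & mask)
            x >>= chunk_bits
        return parts

    accumulator = 0   # plays the role of the accumulation register
    passes = 0
    for i, a_part in enumerate(chunks(a)):
        for j, b_part in enumerate(chunks(b)):
            partial = a_part * b_part                         # one narrow multiply
            accumulator += partial << (chunk_bits * (i + j))  # shift into place and add
            passes += 1
    return accumulator, passes

a = (1 << 52) | 12345   # 53-bit significands, as in double precision
b = (1 << 52) | 67890
product, passes = iterative_multiply(a, b)
print(product == a * b, passes)  # True 9
```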
  • Patent number: 7934031
    Abstract: An asynchronous logic family of circuits which communicate on delay-insensitive flow-controlled channels with 4-phase handshakes and 1 of N encoding, compute output data directly from input data using domino logic, and use the state-holding ability of the domino logic to implement pipelining without additional latches.
    Type: Grant
    Filed: May 11, 2006
    Date of Patent: April 26, 2011
    Assignee: California Institute of Technology
    Inventors: Andrew M. Lines, Alain J. Martin, Uri Cummings
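The channel protocol named in patent 7934031 combines 1-of-N data encoding with a 4-phase handshake. This sketch uses a 1-of-4 code carrying two bits and traces one handshake. It models only the protocol, not the domino-logic circuits, and the function names are hypothetical.

```python
def encode_1of4(value):
    """Encode a 2-bit value on four rails; exactly one rail is high when data is valid."""
    if not 0 <= value < 4:
        raise ValueError("a 1-of-4 code carries values 0..3")
    return tuple(1 if i == value else 0 for i in range(4))

def decode_1of4(rails):
    """Return the value, or None for the neutral (spacer) state between tokens."""
    high = sum(rails)
    if high == 0:
        return None
    if high != 1:
        raise ValueError("illegal code: more than one rail high")
    return rails.index(1)

def four_phase_transfer(value):
    """Trace (rails, ack) through one 4-phase handshake."""
    trace = []
    rails, ack = encode_1of4(value), 0
    trace.append((rails, ack))   # 1. sender raises one data rail
    ack = 1
    trace.append((rails, ack))   # 2. receiver detects a valid code and raises ack
    rails = (0, 0, 0, 0)
    trace.append((rails, ack))   # 3. sender returns the rails to neutral
    ack = 0
    trace.append((rails, ack))   # 4. receiver lowers ack; the channel is idle again
    return trace

for step in four_phase_transfer(2):
    print(step)
```

Because validity is signaled by the code itself (exactly one rail high), the receiver needs no clock or timing assumption to know when data has arrived.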
  • Patent number: 7774393
    Abstract: An apparatus and method for integer to floating-point format conversion. A processor may include an adder configured to perform addition of respective mantissas of two floating-point operands to produce a sum, where a smaller-exponent one of the floating-point operands has a respective exponent less than or equal to a respective exponent of a larger-exponent one of the floating-point operands. The processor may further include an alignment shifter coupled to the adder and configured, in a first mode of operation, to align the floating-point operands prior to the addition by shifting the respective mantissa of the smaller-exponent operand towards a least-significant bit position. The alignment shifter may be further configured, in a second mode of operation, to normalize an integer operand by shifting the integer operand towards a most-significant bit position. The second mode of operation may be active during execution of an instruction to convert the integer operand to floating-point format.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: August 10, 2010
    Assignee: Oracle America, Inc.
    Inventors: Jeffrey S. Brooks, Sadar U. Ahmed
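The second mode of the alignment shifter in patent 7774393 normalizes an integer by shifting it toward the most-significant bit position. This sketch converts an unsigned 32-bit integer to FP32 bits in that way, then rounds to nearest even. The rounding mode and the function name are assumptions made for the example.

```python
import struct

def uint32_to_float32_bits(n):
    """Convert an unsigned 32-bit integer to FP32 bits by left normalization."""
    if n == 0:
        return 0
    lz = 32 - n.bit_length()          # leading zeros of the 32-bit operand
    norm = (n << lz) & 0xFFFFFFFF     # shift toward the MSB; leading one lands at bit 31
    exponent = 31 - lz                # weight of the leading one
    mantissa = norm >> 8              # keep 24 bits: hidden one plus 23 fraction bits
    remainder = norm & 0xFF           # the 8 bits that are shifted out
    # Round to nearest, ties to even.
    if remainder > 0x80 or (remainder == 0x80 and (mantissa & 1)):
        mantissa += 1
        if mantissa == 1 << 24:       # rounding carried into a new bit
            mantissa >>= 1
            exponent += 1
    return ((exponent + 127) << 23) | (mantissa & 0x7FFFFF)

n = 0x1000003
print(uint32_to_float32_bits(n) == struct.unpack("<I", struct.pack("<f", float(n)))[0])  # True
```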
  • Patent number: 7769099
    Abstract: The invention relates to techniques for implementing high-speed precoders, such as Tomlinson-Harashima (TH) precoders. In one aspect of the invention, look-ahead techniques are utilized to pipeline a TH precoder, resulting in a high-speed TH precoder. These techniques may be applied to pipeline various types of TH precoders, such as Finite Impulse Response (FIR) precoders and Infinite Impulse Response (IIR) precoders. In another aspect of the invention, parallel processing multiple non-pipelined TH precoders results in a high-speed parallel TH precoder design. Utilization of high-speed TH precoders may enable network providers to, for example, operate 10 Gigabit Ethernet with copper cable rather than fiber optic cable.
    Type: Grant
    Filed: September 13, 2005
    Date of Patent: August 3, 2010
    Assignee: Leanics Corporation
    Inventors: Keshab K. Parhi, Yongru Gu
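For reference, here is a basic non-pipelined Tomlinson-Harashima precoder of the kind that patent 7769099 speeds up. The look-ahead pipelining itself is not shown. It assumes an FIR channel y[n] = x[n] + sum(taps[k] * x[n-1-k]) whose postcursor taps are known at the transmitter, and PAM symbols in [-m, m).

```python
def th_mod(v, m):
    """Fold v into the interval [-m, m)."""
    return (v + m) % (2 * m) - m

def th_precode(symbols, taps, m):
    """Subtract the expected postcursor interference, then fold the result with th_mod."""
    history = [0.0] * len(taps)      # x[n-1], x[n-2], ...
    out = []
    for d in symbols:
        feedback = sum(t * prev for t, prev in zip(taps, history))
        x = th_mod(d - feedback, m)  # modulo keeps the transmit signal bounded
        out.append(x)
        history = ([x] + history)[:len(taps)]
    return out

def channel(x, taps):
    """Assumed FIR channel with unit main cursor."""
    return [x[n] + sum(t * x[n - 1 - k] for k, t in enumerate(taps) if n - 1 - k >= 0)
            for n in range(len(x))]

symbols = [-3, 1, 3, -1, 1]          # 8-level PAM symbols in [-4, 4)
taps = [0.5, -0.25]
received = channel(th_precode(symbols, taps, 4), taps)
print([th_mod(y, 4) for y in received])  # the receiver's modulo recovers the symbols
```

The channel adds back exactly the interference the precoder subtracted, so the received value equals the symbol plus a multiple of 2m, which the receiver's modulo removes.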
  • Patent number: 7730117
    Abstract: A system for performing floating point arithmetic operations including an input register adapted for receiving an operand. The system also includes a mechanism for performing a shift or masking operation in response to determining that the operand is in an un-normalized format. The system also includes instructions for performing single precision incrementing of the operand in response to determining that the operand is single precision, that the operand requires the incrementing based on the results of a previous operation and that the previous operation did not perform the incrementing. The operand was created in the previous operation. The system further includes instructions for performing double precision incrementing of the operand in response to determining that the operand is double precision, that the operand requires the incrementing based on the results of the previous operation and that the previous operation did not perform the incrementing.
    Type: Grant
    Filed: February 9, 2005
    Date of Patent: June 1, 2010
    Assignee: International Business Machines Corporation
    Inventors: Bruce M. Fleischer, Juergen Haess, Michael Kroener, Martin S. Schmookler, Eric M. Schwarz, Son Dao-Trong
  • Patent number: 7711955
    Abstract: An apparatus and method for cryptographic key expansion. According to a first embodiment, a cryptographic unit may include key storage configured to store an expanded set of cipher keys for a cipher algorithm, and a key expansion pipeline comprising a plurality of pipeline stages. During a key expansion mode of operation, each pipeline stage may be configured to perform a corresponding step of generating a member of the expanded set of cipher keys according to a key expansion algorithm. During a cipher mode of operation, a portion of the key expansion pipeline may be configured to perform a step of the cipher algorithm.
    Type: Grant
    Filed: September 13, 2004
    Date of Patent: May 4, 2010
    Assignee: Oracle America, Inc.
    Inventors: Christopher H. Olson, Leonard D. Rarick, Gregory F. Grohoski
  • Patent number: 7490221
    Abstract: The technology described provides a technique for synchronization between pipelines in a data processing apparatus. The data processing apparatus comprises a main processor operable to execute a sequence of instructions, the main processor comprising a first pipeline having a first plurality of pipeline stages, and a coprocessor operable to execute coprocessor instructions in said sequence of instructions. The coprocessor comprises a second pipeline having a second plurality of pipeline stages, and each coprocessor instruction is arranged to be routed through both the first pipeline and the second pipeline.
    Type: Grant
    Filed: June 24, 2003
    Date of Patent: February 10, 2009
    Assignee: ARM Limited
    Inventors: Martin Robert Evans, Ian Victor Devereux
  • Patent number: 7406589
    Abstract: High-precision floating-point function estimates are each split into two instructions: a low-precision table lookup instruction and a linear interpolation instruction. Estimates of different functions can be implemented using this scheme: a separate table-lookup instruction is provided for each function, while only a single interpolation instruction is needed, since it can perform the interpolation step for any of the functions to be estimated. Thus, significantly less overhead is incurred than would be with specialized hardware, while a uniform FPU latency is maintained, which allows for much simpler control logic.
    Type: Grant
    Filed: May 12, 2005
    Date of Patent: July 29, 2008
    Assignee: International Business Machines Corporation
    Inventors: Sang Hoo Dhong, Gordon Clyde Fossum, Harm Peter Hofstee, Brad William Michael, Silvia Melitta Mueller, Hwa-Joon Oh
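The two-instruction split in patent 7406589 can be sketched as one lookup table per function plus a single shared interpolation step. The table layout (segment start, value, slope), the 32-segment size, and the [1, 2) input range are assumptions. The abstract does not give the table format or precision.

```python
import math

SEGMENTS = 32  # assumed table size

def make_table(func, lo=1.0, hi=2.0, segments=SEGMENTS):
    """Per-segment (x0, f(x0), slope) entries; one table per function."""
    step = (hi - lo) / segments
    table = []
    for i in range(segments):
        x0 = lo + i * step
        table.append((x0, func(x0), (func(x0 + step) - func(x0)) / step))
    return table

RECIP_TABLE = make_table(lambda x: 1.0 / x)
RSQRT_TABLE = make_table(lambda x: 1.0 / math.sqrt(x))

def table_lookup(table, x, lo=1.0, hi=2.0):
    """Function-specific step: select the segment containing x."""
    i = min(int((x - lo) / (hi - lo) * len(table)), len(table) - 1)
    return table[i]

def interpolate(entry, x):
    """Shared step, identical for every function."""
    x0, y0, slope = entry
    return y0 + slope * (x - x0)

x = 1.37
print(interpolate(table_lookup(RECIP_TABLE, x), x), 1 / x)
```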
  • Patent number: 7043516
    Abstract: The shifters (30, 32) that a floating-point processor (10)'s addition pipeline (14) uses to align or normalize floating-point operands' mantissas before addition or subtraction shift a given mantissa pair one more bit to the left for subtraction than for addition. As a result, the addition pipeline's rounding circuitry (160, 166) does not need to be capable of adding round bits in as many positions as it would without the shift difference, so it can be simpler and faster. Similarly, circuitry (164a–g and 188) employed for normalization after addition and subtraction can be simpler because it does not have to implement as many shift options.
    Type: Grant
    Filed: March 13, 1998
    Date of Patent: May 9, 2006
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Gilbert M. Wolrich, Mark D. Matson, John D. Clouser
  • Publication number: 20040167953
    Abstract: A scalable engine having multiple datapaths, each of which is a unique multi-function floating point pipeline capable of performing a four component dot product on data in a single pass through the datapath, which allows matrix transformations to be computed in an efficient manner, with a high data throughput and without substantially increasing the cost and amount of hardware required to implement the pipeline.
    Type: Application
    Filed: February 26, 2004
    Publication date: August 26, 2004
    Inventor: Steven Shaw
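The benefit claimed for the four-component dot product in publication 20040167953 is that a 4x4 matrix transform reduces to four such operations, one per matrix row. This sketch shows only that decomposition. Distributing the rows across multiple datapaths is left implicit, and the names are hypothetical.

```python
def dot4(a, b):
    """Four-component dot product: one pass through the datapath."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def transform(matrix, vertex):
    """4x4 matrix times a homogeneous vertex as four dot4 operations."""
    return [dot4(row, vertex) for row in matrix]

translate = [[1, 0, 0, 5],
             [0, 1, 0, -2],
             [0, 0, 1, 0],
             [0, 0, 0, 1]]
print(transform(translate, [1, 2, 3, 1]))  # [6, 0, 3, 1]
```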
  • Patent number: 6748521
    Abstract: A data processing system is provided with a digital signal processor which has an instruction for saturating multiple fields of a selected set of source operands and storing the separate saturated results in a selected destination register. A first 32-bit operand (600) and a second 32-bit operand (602) are treated as four 16-bit fields and the sixteen bits in each field are saturated separately. Multi-field saturation circuitry is operable to treat a source operand as a number of fields, such that a multi-field saturated (610) result is produced that includes a number of saturated results each corresponding to each field. One instruction is provided which treats an operand pair as having two packed fields, and another instruction is provided that treats the operand pair as having four packed fields. Saturation circuitry is operable to selectively treat a field as either a signed value or an unsigned value.
    Type: Grant
    Filed: October 31, 2000
    Date of Patent: June 8, 2004
    Assignee: Texas Instruments Incorporated
    Inventor: David Hoyle
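A sketch of the multi-field saturation in patent 6748521. A pair of 32-bit operands is treated as two or four packed fields, and each field is saturated separately, as signed or unsigned, with the results packed into one 32-bit word. The abstract does not state the saturated width. Saturating each field to half its width, so the results fill exactly 32 bits, is an assumption.

```python
def to_signed(v, bits):
    return v - (1 << bits) if v & (1 << (bits - 1)) else v

def saturate(v, bits, signed):
    lo, hi = ((-(1 << (bits - 1)), (1 << (bits - 1)) - 1) if signed
              else (0, (1 << bits) - 1))
    return max(lo, min(hi, v))

def multi_field_saturate(op_hi, op_lo, n_fields, signed=True):
    """Saturate each packed field of a 32-bit operand pair and pack the results."""
    pair = (op_hi << 32) | op_lo
    field_bits = 64 // n_fields       # four 16-bit fields or two 32-bit fields
    out_bits = field_bits // 2        # assumed saturated width
    result = 0
    for i in range(n_fields):
        raw = (pair >> (i * field_bits)) & ((1 << field_bits) - 1)
        value = to_signed(raw, field_bits) if signed else raw
        sat = saturate(value, out_bits, signed)
        result |= (sat & ((1 << out_bits) - 1)) << (i * out_bits)
    return result

# Fields, low to high: 5, 32767, -128, 256
print(hex(multi_field_saturate(0x0100FF80, 0x7FFF0005, 4)))  # 0x7f807f05
```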
  • Patent number: 6748516
    Abstract: Disclosed is a method, apparatus, and an instruction set architecture (ISA) for an application specific signal processor (ASSP) tailored to digital signal processing (DSP) applications. A single DSP instruction includes a pair of sub-instructions: a primary DSP sub-instruction and a shadow DSP sub-instruction. Both the primary and the shadow DSP sub-instructions are dyadic DSP instructions performing two operations in one instruction cycle. Each signal processing unit of the ASSP includes a primary stage to execute a primary DSP sub-instruction based upon current data and a shadow stage to simultaneously execute a shadow DSP sub-instruction based upon delayed data stored locally within registers of the signal processing units. The present invention efficiently executes DSP instructions by simultaneously executing primary DSP sub-instructions (based upon current data) and shadow DSP sub-instructions (based upon delayed locally stored data) with a single DSP instruction.
    Type: Grant
    Filed: January 29, 2002
    Date of Patent: June 8, 2004
    Assignee: Intel Corporation
    Inventors: Kumar Ganapathy, Ruban Kanapathipillai
  • Patent number: 6718458
    Abstract: A method and apparatus for improving the performance of a superscalar, superpipelined processor by identifying and processing instructions for performing addressing operations is provided. The invention heuristically determines instructions likely to perform addressing operations and assigns those instructions to specialized pipes in a pipeline structure. The invention can assign such instructions to both an execute pipe and a load/store pipe to avoid the occurrence of “bubbles” in the event execution of the instruction requires the calculation capability of the execute pipe. The invention can also examine a sequence of instructions to identify an instruction for performing a calculation where the result of the calculation is used by a succeeding load or store instruction. In this case, the invention controls the pipeline to assure the result of the calculation is available for the succeeding load or store instruction even if both instructions are being processed concurrently.
    Type: Grant
    Filed: March 27, 2003
    Date of Patent: April 6, 2004
    Assignee: Broadcom Corporation
    Inventors: Dan Dobberpuhl, Robert Stepanian
  • Publication number: 20030018676
    Abstract: A scalable engine having multiple datapaths, each of which is a unique multi-function floating point pipeline capable of performing a four component dot product on data in a single pass through the datapath, which allows matrix transformations to be computed in an efficient manner, with a high data throughput and without substantially increasing the cost and amount of hardware required to implement the pipeline.
    Type: Application
    Filed: March 15, 2001
    Publication date: January 23, 2003
    Inventor: Steven Shaw
  • Patent number: 6487653
    Abstract: A microprocessor configured to dynamically switch its floating point load pipeline length from one stage in length to more than one stage in length is disclosed. The microprocessor may perform normal loads and detect denormal loads in a single clock cycle. The microprocessor temporarily stores each scheduled floating point instruction in a reissue buffer for at least one clock cycle. When a denormal load instruction is detected, the microprocessor is configured to add one or more stages to the floating point load pipeline to allow the denormal value to complete the conversion to an internal format. The longer pipeline is then used for all loads that follow the denormal load until there is an idle clock cycle or an abort occurs. At that point, the pipeline reverts back to its original shorter state. In addition, the microprocessor may be configured to cancel instructions scheduled assuming the denormal load would take only one clock cycle to complete.
    Type: Grant
    Filed: August 25, 1999
    Date of Patent: November 26, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Stuart F. Oberman, Stephan G. Meier, Jeffrey E. Trull
  • Patent number: 6430681
    Abstract: In a digital signal processor having an improved arithmetic processing efficiency, there is provided in parallel a first ROM for storing branch commands and a second ROM for storing arithmetic commands. The ROMs are connected to a branch command decoder and an arithmetic command decoder, respectively. Operations of a first memory control circuit and a second memory control circuit are controlled in response to instructions from the branch command decoder, while operations of an arithmetic circuit are controlled in response to instructions from the arithmetic command decoder. By processing the branch commands and the arithmetic commands in parallel, the operation efficiency of the arithmetic circuit is enhanced.
    Type: Grant
    Filed: June 18, 1999
    Date of Patent: August 6, 2002
    Assignee: Sanyo Electric Co., Ltd.
    Inventor: Fumiaki Nagao
  • Patent number: 6317825
    Abstract: The invention relates to a microprocessor (MP) comprising means to decode (DEC1) a compact instruction (BMV) for the concatenation of at least one bit (bi) of a first binary word (W1) with at least one bit of a second binary word (W2), and means (REGBANK, MUX, BSHIFT) to process this instruction in one clock cycle. Advantages: fast processing of a concatenation operation. Application especially to chip cards.
    Type: Grant
    Filed: May 3, 2000
    Date of Patent: November 13, 2001
    Assignee: Inside Technologies
    Inventor: Sean Commercial
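The concatenation performed by the compact instruction in patent 6317825 can be sketched as one extract-and-merge step. It assumes the instruction takes a bit field from W1 (a starting position and a length) and appends it to the low end of W2 within an 8-bit word. That operand layout and the name bit_concat are hypothetical, since the abstract does not specify them.

```python
def bit_concat(w1, bit_pos, n_bits, w2, width=8):
    """Append n_bits of w1 (starting at bit_pos) to the low end of w2."""
    field = (w1 >> bit_pos) & ((1 << n_bits) - 1)      # shift and mask the source bits
    return ((w2 << n_bits) | field) & ((1 << width) - 1)  # merge, truncated to word width

print(bin(bit_concat(0b1010_0000, 5, 3, 0b0001)))  # 0b1101
```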
  • Patent number: 6088715
    Abstract: An optimized multimedia execution unit configured to perform vectored floating point and integer instructions. In one embodiment, the execution unit includes an add/subtract pipeline having far and close data paths. The far data path is configured to handle effective addition operations, as well as effective subtraction operations for operands having an absolute exponent difference greater than one. The close data path, conversely, is configured to handle effective subtraction operations for operands having an absolute exponent difference less than or equal to one. The close data path includes an adder unit configured to generate a first and second output value. The first output value is equal to the first input operand plus an inverted version of the second input operand, while the second output value is equal to the first output value plus one.
    Type: Grant
    Filed: March 27, 1998
    Date of Patent: July 11, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Stuart F. Oberman
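The path selection and the close-path adder outputs in patent 6088715 can be sketched on small integers. The abstract says the close path produces both a + ~b and a + ~b + 1. One standard reason, assumed here, is that a + ~b + 1 equals a - b when a >= b, while inverting a + ~b gives b - a when b > a. Either way, the magnitude is available without a second carry-propagate add.

```python
def choose_path(exp_a, exp_b, effective_subtract):
    """Route an operation to the close or far data path, per the abstract's rule."""
    if effective_subtract and abs(exp_a - exp_b) <= 1:
        return "close"
    return "far"

def close_path_adder(a, b, width):
    """Return (a + ~b, a + ~b + 1) modulo 2**width."""
    mask = (1 << width) - 1
    first = (a + (~b & mask)) & mask
    second = (first + 1) & mask
    return first, second

def magnitude_difference(a, b, width):
    """|a - b| picked from the two adder outputs (comparison simplified to a >= b)."""
    mask = (1 << width) - 1
    first, second = close_path_adder(a, b, width)
    return second if a >= b else (~first & mask)

print(choose_path(5, 6, True), choose_path(5, 8, True))  # close far
print(close_path_adder(10, 3, 4))                        # (6, 7)
```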
  • Patent number: 6018756
    Abstract: If the exponents of a floating-point-processor addition pipeline's input operands are equal, a signal (INVERT) that determines whether the pipeline's sole full-width carry-propagate mantissa adder (34) will invert one of its inputs results from an inversion-determination circuit (FIG. 11) that determines whether the sole set bit in a decoded normalization-shift signal (NORM_SHIFT) occupies the same position as a set bit in a signal (FRAC_A_GT_B) representing what the possible normalization amounts will be if a first of the mantissas is greater than the other, second mantissa. Consequently, a bit-comparison operation (56) that employs no full-width carry-propagate addition can determine the amount of normalization shifting to be performed by bit shifters (30 and 32) disposed in respective processing trains that generate mantissa inputs to the mantissa adder (34).
    Type: Grant
    Filed: March 13, 1998
    Date of Patent: January 25, 2000
    Assignee: Digital Equipment Corporation
    Inventors: Gilbert M. Wolrich, Mark D. Matson, John D. Clouser