Pipeline Patents (Class 708/508)
-
Patent number: 11221849Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate an unsigned fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.Type: GrantFiled: September 27, 2017Date of Patent: January 11, 2022Assignee: Intel CorporationInventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
-
Patent number: 10614023Abstract: Techniques are provided for exchanging dedicated hardware signals to manage a first-in first-out (FIFO). In an embodiment, a first processor initiates content transfer into the FIFO. The first processor activates a first hardware signal that is reserved for indicating that content resides within the FIFO. A second processor activates a second hardware signal that is reserved for indicating that content is accepted. The second hardware signal causes the first hardware signal to be deactivated. This exchange of hardware signals demarcates a FIFO transaction, which is mediated by interface circuitry of the FIFO.Type: GrantFiled: June 28, 2019Date of Patent: April 7, 2020Assignee: Oracle International CorporationInventors: David A. Brown, Daniel Fowler, Rishabh Jain, Erik Schlanger, Michael Duller
-
Patent number: 10061592Abstract: A method for improving power, performance, area (PPA) for mixed precision computations in a processing environment. The method includes determining a braiding factor as a number of units of work encoded into a physical thread. A value of the braiding factor is determined based on a mix of precision requirements presented for individual units of work. Units of work are classified as instructions for applied code transformation based on associated precision requirements for the processing environment. Instruction inputs from specified registers are packed together into a destination register according to the determined value of the braiding factor. The packed instructions presented in vector form are executed with an instruction set architecture configured for executing packed instructions of different precisions.Type: GrantFiled: March 30, 2015Date of Patent: August 28, 2018Assignee: Samsung Electronics Co., Ltd.Inventors: Maxim Lukyanov, Alexander Grosul, Mitchell Alsup, Boris Beylin
-
Patent number: 9672192Abstract: This invention is a FFT butterfly circuit. This circuit includes four temporary data registers connected to three memories. The three memories include read/write X and Y memories and a read only twiddle coefficient memory. A multiplier-accumulator forms a product and accumulates the product with one of two accumulator registers. A register file with plural registers is loaded from one of the accumulator registers or the fourth temporary data register. An adder/subtracter forms a selected one of a sum of registers or a difference of registers. A write buffer with two buffers temporarily stores data from the adder/subtracter before storage in the first or second memory. The X and Y memories must be read/write but the twiddle memory may be read only.Type: GrantFiled: November 11, 2015Date of Patent: June 6, 2017Assignee: TEXAS INSTRUMENTS INCORPORATEDInventors: Darrell E. Tinker, Keerthinarayan Heragu
-
Patent number: 9513871Abstract: A method of an aspect includes receiving a floating point round-off amount determination instruction. The instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point, and indicates a destination storage location. A result including one or more result floating point data elements is stored in the destination storage location in response to the floating point round-off amount determination instruction. Each of the one or more result floating point data elements includes a difference between a corresponding floating point data element of the source in a corresponding position, and a rounded version of the corresponding floating point data element of the source that has been rounded to the indicated number of the fraction bits. Other methods, apparatus, systems, and instructions are disclosed.Type: GrantFiled: December 30, 2011Date of Patent: December 6, 2016Assignee: Intel CorporationInventors: Cristina S. Anderson, Bret L. Toll, Robert Valentine, Simon Rubanovich, Amit Gradstein
-
Patent number: 9286031Abstract: A hardware circuit for returning single precision denormal results to double precision. A hardware circuit component configured to count leading zeros of an unrounded single precision denormal result. A hardware circuit component configured to pre-compute a first exponent and a second exponent for the unrounded single precision denormal result. A hardware circuit component configured to perform a second normalization of the rounded single precision denormal result back to architected format.Type: GrantFiled: November 26, 2013Date of Patent: March 15, 2016Assignee: International Business Machines CorporationInventors: Maarten J. Boersma, Thomas Fuchs, Markus Kaltenbach, David Lang
-
Patent number: 9280316Abstract: A hardware circuit for returning single precision denormal results to double precision. A hardware circuit component configured to count leading zeros of an unrounded single precision denormal result. A hardware circuit component configured to pre-compute a first exponent and a second exponent for the unrounded single precision denormal result. A hardware circuit component configured to perform a second normalization of the rounded single precision denormal result back to architected format.Type: GrantFiled: January 9, 2014Date of Patent: March 8, 2016Assignee: International Business Machines CorporationInventors: Maarten J. Boersma, Thomas Fuchs, Markus Kaltenbach, David Lang
-
Patent number: 9274750Abstract: An embodiment of a method and a related apparatus for digital computation of a floating point complex multiply-add is provided. The method includes receiving an input addend, a first product, and a second product. The input addend, the first product and the second product each respectively has a mantissa and an exponent. The method includes shifting the mantissas of the two with smaller exponents of the input addend, the first product, and the second product to align together with the mantissa of the one with largest exponent of the input addend, the first product and the second product, and adding the aligned input addend, the aligned first product and the aligned second product.Type: GrantFiled: April 20, 2012Date of Patent: March 1, 2016Assignee: Futurewei Technologies, Inc.Inventors: Tong Sun, Weizhong Chen, Zhikun Cheng, Yuanbin Guo
-
Patent number: 9213679Abstract: The disclosure provides a device with a capability of processing a Fast Fourier Transform Algorithm (FFT) radix 2 butterfly operation and an operation method thereof, the device at least includes a latch, a complex multiplier, a complex adder-subtractor, a switch and a complex conjugate Arithmetic Logical Unit (ALU). The complex operation unit of the disclosure has a simple structure. The parallel processing array composed of the complex operation unit has the capability of efficiently processing vectors and the FFT operation.Type: GrantFiled: April 20, 2012Date of Patent: December 15, 2015Assignee: ZTE CORPORATIONInventor: Chengke Shen
-
Patent number: 9128531Abstract: A single instruction multiple data processing pipeline 12 for processing floating point operands includes shared special case handling circuitry 34 for performing any operand dependent special case processing operations. The operand dependent special case processing operations result from special case conditions such as operands that are denormal, an infinity, a not-a-number and a floating point number requiring format conversion. The pipeline 12 may in some embodiments be stalled while the operands requiring special case processing are serially shifted to and from the shared special case handling circuitry 34. In other embodiments the instruction in which the special case condition for an operand arose may be recirculated through the pipeline with permutation circuitry 86, 94 being used to swap the operands between lanes in order to place the operand(s) requiring special case processing operations into the lane containing the shared special case handling circuitry 98.Type: GrantFiled: February 22, 2012Date of Patent: September 8, 2015Assignee: ARM LimitedInventors: Sean Tristram Ellis, Simon Alex Charles, Andrew Burdass
-
Patent number: 9058302Abstract: Systems and methods for combined matrix-vector and matrix-transpose vector multiply for block sparse matrices. Exemplary embodiments include a method of updating a simulation of physical objects in an interactive computer, including generating a set of representations of objects in the interactive computer environment, partitioning the set of representations into a plurality of subsets such that objects in any given set interact only with other objects in that set, generating a vector b describing an expected position of each object at the end of a time interval h, applying a biconjugate gradient algorithm to solve A*?v=b for the vector ?v of position and velocity changes to be applied to each object wherein the q=Ap and qt=AT(pt) calculations are combined so that A only has to be read once, integrating the updated motion vectors to determine a next state of the simulated objects, and converting the simulated objects to a visual.Type: GrantFiled: March 26, 2012Date of Patent: June 16, 2015Assignee: International Business Machines CorporationInventor: Karen A. Magerlein
-
Patent number: 8295412Abstract: An apparatus and method for signal detection in which a digital sample stream is fed round robin into a plurality of buffers, which are sequentially compared with a reference signal to determine a match. A processor determines the chronological order of the samples in each bit of each buffer, and directs a bitwise comparison between the signal in each buffer with the reference to determine a match, e.g., by correlation. The apparatus and method are preferably implemented with a Field-Programmable Gate Array (FPGA). This scheme permits real time correlation of a data stream with a reference without use of shift registers, or a significant number of dedicated logic blocks.Type: GrantFiled: September 30, 2010Date of Patent: October 23, 2012Assignee: The United States of America as represented by the Secretary of the NavyInventor: Jeremy R. O'Neal
-
Patent number: 8051123Abstract: A multipurpose arithmetic functional unit selectively performs planar attribute interpolation, unary function approximation, double-precision arithmetic, and/or arbitrary filtering functions such as texture filtering, bilinear filtering, or anisotropic filtering by iterating through a multi-step multiplication operation with partial products (partial results) accumulated in an accumulation register. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for unary function approximation and planar interpolation; the same multipliers and adders are also leveraged to implement double-precision multiplication and addition.Type: GrantFiled: December 15, 2006Date of Patent: November 1, 2011Assignee: NVIDIA CorporationInventors: Stuart Oberman, Ming Y. Siu
-
Patent number: 7934031Abstract: An asynchronous logic family of circuits which communicate on delay-insensitive flow-controlled channels with 4-phase handshakes and 1 of N encoding, compute output data directly from input data using domino logic, and use the state-holding ability of the domino logic to implement pipelining without additional latches.Type: GrantFiled: May 11, 2006Date of Patent: April 26, 2011Assignee: California Institute of TechnologyInventors: Andrew M. Lines, Alain J. Martin, Uri Cummings
-
Patent number: 7774393Abstract: An apparatus and method for integer to floating-point format conversion. A processor may include an adder configured to perform addition of respective mantissas of two floating-point operands to produce a sum, where a smaller-exponent one of the floating-point operands has a respective exponent less than or equal to a respective exponent of a larger-exponent one of the floating-point operands. The processor may further include an alignment shifter coupled to the adder and configured, in a first mode of operation, to align the floating-point operands prior to the addition by shifting the respective mantissa of the smaller-exponent operand towards a least-significant bit position. The alignment shifter may be further configured, in a second mode of operation, to normalize an integer operand by shifting the integer operand towards a most-significant bit position. The second mode of operation may be active during execution of an instruction to convert the integer operand to floating-point format.Type: GrantFiled: June 30, 2004Date of Patent: August 10, 2010Assignee: Oracle America, Inc.Inventors: Jeffrey S. Brooks, Sadar U. Ahmed
-
Patent number: 7769099Abstract: The invention relates to techniques for implementing high-speed precoders, such as Tomlinson-Harashima (TH) precoders. In one aspect of the invention, look-ahead techniques are utilized to pipeline a TH precoder, resulting in a high-speed TH precoder. These techniques may be applied to pipeline various types of TH precoders, such as Finite Impulse Response (FIR) precoders and Infinite Impulse Response (IIR) precoders. In another aspect of the invention, parallel processing multiple non-pipelined TH precoders results in a high-speed parallel TH precoder design. Utilization of high-speed TH precoders may enable network providers to for example, operate 10 Gigabit Ethernet with copper cable rather than fiber optic cable.Type: GrantFiled: September 13, 2005Date of Patent: August 3, 2010Assignee: Leanics CorporationInventors: Keshab K. Parhi, Yongru Gu
-
Patent number: 7730117Abstract: A system for performing floating point arithmetic operations including an input register adapted for receiving an operand. The system also includes a mechanism for performing a shift or masking operation in response to determining that the operand is in an un-normalized format. The system also includes instructions for performing single precision incrementing of the operand in response to determining that the operand is single precision, that the operand requires the incrementing based on the results of a previous operation and that the previous operation did not perform the incrementing. The operand was created in the previous operation. The system further includes instructions for performing double precision incrementing of the operand in response to determining that the operand is double precision, that the operand requires the incrementing based on the results of the previous operation and that the previous operation did not perform the incrementing.Type: GrantFiled: February 9, 2005Date of Patent: June 1, 2010Assignee: International Business Machines CorporationInventors: Bruce M. Fleischer, Juergen Haess, Michael Kroener, Martin S. Schmookler, Eric M. Schwarz, Son Dao-Trong
-
Patent number: 7711955Abstract: An apparatus and method for cryptographic key expansion. According to a first embodiment, a cryptographic unit may include key storage configured to store an expanded set of cipher keys for a cipher algorithm, and a key expansion pipeline comprising a plurality of pipeline stages. During a key expansion mode of operation, each pipeline stage may be configured to perform a corresponding step of generating a member of the expanded set of cipher keys according to a key expansion algorithm. During a cipher mode of operation, a portion of the key expansion pipeline may be configured to perform a step of the cipher algorithm.Type: GrantFiled: September 13, 2004Date of Patent: May 4, 2010Assignee: Oracle America, Inc.Inventors: Christopher H. Olson, Leonard D. Rarick, Gregory F. Grohoski
-
Patent number: 7490221Abstract: The technology described provides a technique for synchronization between pipelines in a data processing apparatus. The data processing apparatus comprises a main processor operable to execute a sequence of instructions, the main processor comprising a first pipeline having a first plurality of pipeline stages, and a coprocessor operable to execute coprocessor instructions in said sequence of instructions. The coprocessor comprises a second pipeline having a second plurality of pipeline stages, and each coprocessor instruction is arranged to be routed through both the first pipeline and the second pipeline.Type: GrantFiled: June 24, 2003Date of Patent: February 10, 2009Assignee: ARM LimitedInventors: Martin Robert Evans, Ian Victor Devereux
-
Patent number: 7406589Abstract: High-precision floating-point function estimates are split in two instructions each: a low precision table lookup instruction and a linear interpolation instruction. Estimates of different functions can be implemented using this scheme: A separate table-lookup instruction is provided for each different function, while only a single interpolation instruction is needed, since the single interpolation instruction can perform the interpolation step for any of the functions to be estimated. Thus, significantly less overhead is incurred than would be incurred with specialized hardware, while still maintaining a uniform FPU latency, which allows for much simpler control logic.Type: GrantFiled: May 12, 2005Date of Patent: July 29, 2008Assignee: International Business Machines CorporationInventors: Sang Hoo Dhong, Gordon Clyde Fossum, Harm Peter Hofstee, Brad William Michael, Silvia Melitta Mueller, Hwa-Joon Oh
-
Patent number: 7043516Abstract: The shifters (30, 32) that a floating-point processor (10)'s addition pipeline (14) uses to align or normalize floating-point operands' mantissas before addition or subtraction shift a given mantissa pair one more bit to the left for subtraction than for addition. As a result, the addition pipeline's rounding circuitry (160, 166) does not need to be capable of adding round bits in as many positions as it would without the shift difference, so it can be simpler and faster. Similarly, circuitry (164a–g and 188) employed for normalization after addition and subtraction can be simpler because it does not have to implement as shift options.Type: GrantFiled: March 13, 1998Date of Patent: May 9, 2006Assignee: Hewlett-Packard Development Company, L.P.Inventors: Gilbert M. Wolrich, Mark D. Matson, John D. Clouser
-
Publication number: 20040167953Abstract: A scalable engine having multiple datapaths, each of which is a unique multi-function floating point pipeline capable of performing a four component dot product on data in a single pass through the datapath, which allows matrix transformations to be computed in an efficient manner, with a high data throughput and without substantially increasing the cost and amount of hardware required to implement the pipeline.Type: ApplicationFiled: February 26, 2004Publication date: August 26, 2004Inventor: Steven Shaw
-
Patent number: 6748521Abstract: A data processing system is provided with a digital signal processor which has an instruction for saturating multiple fields of a selected set of source operands and storing the separate saturated results in a selected destination register. A first 32-bit operand (600) and a second 32-bit operand (602) are treated as four 16-bit fields and the sixteen bits in each field are saturated separately. Multi-field saturation circuitry is operable to treat a source operand as a number of fields, such that a multi-field saturated (610) result is produced that includes a number of saturated results each corresponding to each field. One instruction is provided which treats an operand pair as having two packed fields, and another instruction is provided that treats the operand pair has having four packed fields. Saturation circuitry is operable to selectively treat a field as either a signed value or an unsigned value.Type: GrantFiled: October 31, 2000Date of Patent: June 8, 2004Assignee: Texas Instruments IncorporatedInventor: David Hoyle
-
Patent number: 6748516Abstract: Disclosed is a method, apparatus, and an instruction set architecture (ISA) for an application specific signal processor (ASSP) tailored to digital signal processing (DSP) applications. A single DSP instruction includes a pair of sub-instructions: a primary DSP sub-instruction and a shadow DSP sub-instruction. Both the primary and the shadow DSP sub-instructions are dyadic DSP instructions performing two operations in one instruction cycle. Each signal processing unit of the ASSP includes a primary stage to execute a primary DSP sub-instruction based upon current data and a shadow stage to simultaneously execute a shadow DSP sub-instruction based upon delayed data stored locally within registers of the signal processing units. The present invention efficiently executes DSP instructions by simultaneously executing primary DSP sub-instructions (based upon current data) and shadow DSP sub-instructions (based upon delayed locally stored data) with a single DSP instruction.Type: GrantFiled: January 29, 2002Date of Patent: June 8, 2004Assignee: Intel CorporationInventors: Kumar Ganapathy, Ruban Kanapathipillai
-
Method and apparatus for performing addressing operations in a superscalar, superpipelined processor
Patent number: 6718458Abstract: A method and apparatus for improving the performance of a superscalar, superpipelined processor by identifying and processing instructions for performing addressing operations is provided. The invention heuristically determines instructions likely to perform addressing operations and assigns those instructions to specialized pipes in a pipeline structure. The invention can assign such instructions to both an execute pipe and a load/store pipe to avoid the occurrence of “bubbles” in the event execution of the instruction requires the calculation capability of the execute pipe. The invention can also examine a sequence of instructions to identify an instruction for performing a calculation where the result of the calculation is used by a succeeding load or store instruction. In this case, the invention controls the pipeline to assure the result of the calculation is available for the succeeding load or store instruction even if both instructions are being processed concurrently.Type: GrantFiled: March 27, 2003Date of Patent: April 6, 2004Assignee: Broadcom CorporationInventors: Dan Dobberpuhl, Robert Stepanian -
Publication number: 20030018676Abstract: A scalable engine having multiple datapaths, each of which is a unique multi-function floating point pipeline capable of performing a four component dot product on data in a single pass through the datapath, which allows matrix transformations to be computed in an efficient manner, with a high data throughput and without substantially increasing the cost and amount of hardware required to implement the pipeline.Type: ApplicationFiled: March 15, 2001Publication date: January 23, 2003Inventor: Steven Shaw
-
Patent number: 6487653Abstract: A microprocessor configured to dynamically switch its floating point load pipeline length from one stage in length to more than one stage in length is disclosed. The microprocessor may perform normal loads and detect denormal loads in a single clock cycle. The microprocessor temporarily stores each scheduled floating point instruction in a reissue buffer for at least one clock cycle. When a denormal load instruction is detected, the microprocessor is configured to add one or more stages to the floating point load pipeline to allow the denormal value to complete the conversion to an internal format. The longer pipeline is then used for all loads that follow the denormal load until there is an idle clock cycle or an abort occurs. At that point, the pipeline reverts back to its original shorter state. In addition, the microprocessor may be configured to cancel instructions scheduled assuming the denormal load would take only one clock cycle to complete.Type: GrantFiled: August 25, 1999Date of Patent: November 26, 2002Assignee: Advanced Micro Devices, Inc.Inventors: Stuart F. Oberman, Stephan G. Meier, Jeffrey E. Trull
-
Patent number: 6430681Abstract: In a digital signal processor having an improved arithmetic processing efficiency, there is provided in parallel a first ROM for storing branch commands and a second ROM for storing arithmetic commands. The ROMs are connected to a branch command decoder and an arithmetic command decoder, respectively. Operations of a first memory control circuit and a second memory control circuit are controlled in response to instructions from the branch command decoder, while operations of an arithmetic circuit are controlled in response to instructions from the arithmetic command decoder. By processing the branch commands and the arithmetic commands in parallel, the operation efficiency of the arithmetic circuit is enhanced.Type: GrantFiled: June 18, 1999Date of Patent: August 6, 2002Assignee: Sanyo Electric Co., Ltd.Inventor: Fumiaki Nagao
-
Patent number: 6317825Abstract: The invention relates to a microprocessor (MP) comprising means to decode (DEC1) a compact instruction (BMV) for the concatenation of at least one bit (bi) of a first binary word (W1) with at least one bit of a second binary word (W2), and means (REGBANK, MUX, BSHIFT) to process this instruction in one clock cycle. Advantages: fast processing of a concatenation operation. Application especially to chip cards.Type: GrantFiled: May 3, 2000Date of Patent: November 13, 2001Assignee: Inside TechnologiesInventor: Sean Commercial
-
Patent number: 6088715Abstract: An optimized multimedia execution unit configured to perform vectored floating point and integer instructions. In one embodiment, the execution unit includes an add/subtract pipeline having far and close data paths. The far data path is configured to handle effective addition operations, as well as effective subtraction operations for operands having an absolute exponent difference greater than one. The close data path, conversely, is configured to handle effective subtraction operations for operands having an absolute exponent difference less than or equal to one. The close data path includes an adder unit configured to generate a first and second output value. The first output value is equal to the first input operand plus an inverted version of the second input operand, while the second output value is equal to the first output value plus one.Type: GrantFiled: March 27, 1998Date of Patent: July 11, 2000Assignee: Advanced Micro Devices, Inc.Inventor: Stuart F. Oberman
-
Patent number: 6018756Abstract: If the exponents of a floating-point-processor addition pipeline's input operands are equal, a signal (INVERT) that determines whether the pipeline's sole full-width carry-propagate mantissa adder (34) will invert one of its inputs results from an inversion-determination circuit (FIG. 11) that determines whether the sole set bit in a decoded normalization-shift signal (NORM.sub.-- SHIFT) occupies the same position as a set bit in a signal (FRAC.sub.-- A.sub.-- GT.sub.-- B) representing what the possible normalization amounts will be if a first of the mantissas is greater than the other, second mantissa. Consequently, a bit-comparison operation (56) that employs no full-width carry-propagate addition can determine the amount of normalization shifting to be performed by bit shifters (30 and 32) disposed in respective processing trains that generate mantissa inputs to the mantissa adder (34).Type: GrantFiled: March 13, 1998Date of Patent: January 25, 2000Assignee: Digital Equipment CorporationInventors: Gilbert M. Wolrich, Mark D. Matson, John D. Clouser