Floating Point Or Vector Patents (Class 712/222)
  • Publication number: 20150089205
    Abstract: Machine instructions, referred to herein as a long Convert from Zoned instruction (CDZT) and extended Convert from Zoned instruction (CXZT), are provided that read EBCDIC or ASCII data from memory, convert it to the appropriate decimal floating point format, and write it to a target floating point register or floating point register pair. Further, machine instructions, referred to herein as a long Convert to Zoned instruction (CZDT) and extended Convert to Zoned instruction (CZXT), are provided that convert a decimal floating point (DFP) operand in a source floating point register or floating point register pair to EBCDIC or ASCII data and store it to a target memory location.
    Type: Application
    Filed: December 4, 2014
    Publication date: March 26, 2015
    Inventors: Steven R. Carlough, Reid T. Copeland, Charles W. Gainey, JR., Marcel Mitran, Eric M. Schwarz, Timothy J. Slegel
  • Patent number: 8984260
    Abstract: A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions.
    Type: Grant
    Filed: December 20, 2011
    Date of Patent: March 17, 2015
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Publication number: 20150074383
    Abstract: A Vector Galois Field Multiply Sum and Accumulate instruction. Each element of a second operand of the instruction is multiplied in a Galois field with the corresponding element of the third operand to provide one or more products. The one or more products are exclusively ORed with each other and exclusively ORed with a corresponding element of a fourth operand of the instruction. The results are placed in a selected operand.
    Type: Application
    Filed: November 16, 2014
    Publication date: March 12, 2015
    Inventor: Jonathan D. Bradbury
  • Publication number: 20150052336
    Abstract: Execution of instructions in a transactional environment is selectively controlled. A TRANSACTION BEGIN instruction initiates a transaction and includes controls that selectively indicate whether certain types of instructions are permitted to execute within the transaction. The controls include one or more of an allow access register modification control and an allow floating point operation control.
    Type: Application
    Filed: October 29, 2014
    Publication date: February 19, 2015
    Inventors: Dan F. Greiner, Christian Jacobi, Robert R. Rogers, Timothy J. Slegel
  • Publication number: 20150052335
    Abstract: Techniques are disclosed relating to floating-point operations in computer processors. In one embodiment, an apparatus includes a floating-point unit and circuitry configured to receive an initial value X for a floating-point operation. In this embodiment, X is between 0 and 1.0 inclusive. In this embodiment, the circuitry is configured to generate first and second floating-point values based on an exponent of X that sum to 1. In this embodiment, the floating-point unit is configured to perform an operation using the first and second floating-point values. The apparatus may reduce drift, in this embodiment, when a floating-point representation of X does not guarantee that the sum of X and (1?X) is 1. The apparatus may be configured to perform blending and/or interpolation operations using the first and second floating-point values.
    Type: Application
    Filed: August 19, 2013
    Publication date: February 19, 2015
    Applicant: Apple Inc.
    Inventor: James S. Blomgren
  • Publication number: 20150039866
    Abstract: A family of computers is disclosed and claimed that supports simultaneous processes from the single core up to multi-chip Program Execution Systems (PES). The instruction processing of the instructed resources is local, dispensing with the need for large VLIW memories. The cores through the PES have maximum performance for Amdahl-compliant algorithms like matrix inversion, because the multiplications do not stall and the other circuitry keeps up. Cores with log based multiplication generators improve this performance by a factor of two for sine and cosine calculations in single precision floating point and have even greater performance for loge and ex calculations. Apparatus specifying, simulating, and/or layouts of the computer (components) are disclosed. Apparatus the computer and/or its components are disclosed.
    Type: Application
    Filed: October 16, 2014
    Publication date: February 5, 2015
    Applicant: QSigma, Inc.
    Inventors: Earle Jennings, George Landers
  • Publication number: 20150006859
    Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle.
    Type: Application
    Filed: September 15, 2014
    Publication date: January 1, 2015
    Inventors: Eric M. SCHWARZ, Ronald M. SMITH, SR.
  • Patent number: 8924693
    Abstract: The described embodiments include a processor that executes vector instructions. While dispatching instructions at runtime, the processor encounters a predicate-generating instruction. Upon determining that a result of the predicate-generating instruction is predictable, the processor dispatches a prediction micro-operation associated with the predicate-generating instruction, wherein the prediction micro-operation generates a predicted result vector for the predicate-generating instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. When executing the prediction micro-operation to generate the predicted result vector, if the predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, generating the predicted result vector comprises setting the element of the predicted result vector to true.
    Type: Grant
    Filed: May 12, 2011
    Date of Patent: December 30, 2014
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8909905
    Abstract: A method and a device having a plurality of bit operations capability, the device includes: a first and a second registers and an instruction fetch circuit, and an arithmetic logic unit adapted to: calculate, during a first clock cycle, a position value representative of a position, within a first information vector, of a first bit of information that has a first value; and to multiply the position value by a multiplication factor to provide a first result and to alter the value of the first bit to a second value to provide an updated information vector, during the first clock cycle.
    Type: Grant
    Filed: August 18, 2006
    Date of Patent: December 9, 2014
    Assignee: Freescale Semiconductor, Inc.
    Inventors: Eran Glickman, Evgeni Ginzburg, Noam Sheffer
  • Publication number: 20140351566
    Abstract: A processor, which executes m number of arithmetic operations in parallel, executes a partial sum instruction which takes an i-th to (i+m?1)-th elements of an input data series as input elements, so as to obtain first vector data, executes the partial sum instruction which takes a (i+x)-th to (i+x+m?1)-th elements of the input data series as the input elements, so as to obtain second vector data, and performs operations to subtract the p-th element of the first vector data and add the p-th element of the second vector data from and to a sum of the i-th to (i+x?1)-th elements of the input data series in parallel for each of the 0-th to (m?1)-th elements, so as to calculate sums of elements for m sections different from each other in parallel, and moving average processing to calculate a moving average from the sums of elements of the sections.
    Type: Application
    Filed: April 7, 2014
    Publication date: November 27, 2014
    Applicant: FUJITSU LIMITED
    Inventors: MAKIKO ITO, Manabu KUBOTA, Kazuhiro Nomoto
  • Publication number: 20140351565
    Abstract: Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.
    Type: Application
    Filed: March 24, 2014
    Publication date: November 27, 2014
    Applicant: MICROUNITY SYSTEMS ENGINEERING, INC.
    Inventors: Craig HANSEN, John MOUSSOURIS, Alexia MASSALIN
  • Publication number: 20140351564
    Abstract: Embodiments relate to simplifying large and complex networks and graphs using global connectivity information based on calculated node centralities. An aspect includes calculating node centralities of a graph until a designated number of central nodes are detected. A percentage of the central nodes are then selected as pivot nodes. The neighboring nodes to each of the pivot nodes are then collapsed until the graph shrinks to a predefined threshold of total nodes. Responsive to the number of total nodes reaching the predefined threshold, the simplified graph is outputted.
    Type: Application
    Filed: September 12, 2013
    Publication date: November 27, 2014
    Applicant: International Business Machines Corporation
    Inventors: Konstantinos Bekas, Alessandro Curioni
  • Publication number: 20140344555
    Abstract: A system, method and computer program product to compute latencies of a plurality of expression trees in a basic block and to select a first and a second expression tree from the plurality of expression trees based on the computed latencies. The first expression tree is isomorphic to the second expression tree and the first and second expression trees are selected in order of largest to smallest latency. This selection ensures that the largest isomorphic expression trees are vectorized first. By vectorizing the largest isomorphic expression trees first, a basic block containing hundreds of statements can be vectorized without significant compile time. Moreover, vectorization of the largest isomorphic expression trees results in a significant improvement in system performance on SIMD processors.
    Type: Application
    Filed: May 20, 2013
    Publication date: November 20, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Ramshankar Ramanarayanan, Meghana Gupta, Soham S. Chakraborty, Dibyendu Das
  • Publication number: 20140331031
    Abstract: A reconfigurable processor configured to include a constant storage register to store a constant is provided, thereby improving efficiency in the use of a memory space. Specifically, a reconfigurable processor includes a plurality of Functional Units (FUs), a configuration memory configured to store configuration information, and a constant storage register configured to store a constant that is used as an operand for an operation in the plurality of FUs.
    Type: Application
    Filed: May 5, 2014
    Publication date: November 6, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Dong-Kwan SUH, Seok-Jin KIM
  • Patent number: 8880855
    Abstract: A processor includes a first and second execution unit each of which is arranged to execute multiply instructions of a first type upon fixed point operands and to execute multiply instructions of a second type upon floating point operands. A register file of the processor stores operands in registers that are each addressable by instructions for performing the first and second types of operations. An instruction decode unit is responsive to the at least one multiply instruction of the first type and the at least one multiply instruction of the second type to at the same time enable a first data path between the first set of registers and the first execution unit and to enable a second data path between a second set of registers and the second execution unit.
    Type: Grant
    Filed: September 15, 2011
    Date of Patent: November 4, 2014
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy D Anderson, Duc Quang Bui, Eric Biscondi, Shriram D Moharil, Mujibur Rahman, Soujanya Narnur, Peter Richard Dent, Ashish Rai Shrivastava
  • Publication number: 20140317387
    Abstract: A method for executing dual dispatch of blocks and half blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein each of the instruction blocks comprise two half blocks; scheduling the instructions of the instruction block to execute in accordance with a scheduler; and performing a dual dispatch of the two half blocks for execution on an execution unit.
    Type: Application
    Filed: March 14, 2014
    Publication date: October 23, 2014
    Applicant: Soft Machines, Inc.
    Inventor: Mohammad ABDALLAH
  • Publication number: 20140289502
    Abstract: Systems, apparatuses and methods for utilizing enhanced vector true/false instructions. The enhanced vector true/false instructions generate enhanced predicates to correspond to the request element width and/or vector size. A vector true instruction generates an enhanced predicate where all elements supported by the processing unit are active. A vector false instruction generates an enhanced predicate where all elements supported by the processing unit are inactive. The enhanced predicate specifies the requested element width in addition to designating the element selectors.
    Type: Application
    Filed: March 18, 2014
    Publication date: September 25, 2014
    Applicant: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20140281421
    Abstract: An example method of updating an output data vector includes identifying a data value vector including element data values. The method also includes identifying an address value vector including a set of elements. The method further includes applying a conditional operator to each element of the set of elements in the address value vector. The method also includes for each element data value in the data value vector, determining whether to update an output data vector based on applying the conditional operator.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventors: Marc M. Hoffman, Ajay Anant Ingle, Jose Fridman
  • Publication number: 20140281420
    Abstract: An apparatus includes memory storing an instruction that identifies a first register, a second register, and a third register. Upon execution of the instruction by a processor, a vector addition operation is performed by the processor to add first values from the first register to second values from the second register. A vector subtraction operation is also performed upon execution of the instruction to subtract the second value from third values from the third register. A vector compare operation is also performed upon execution of the instruction to compare results of the vector addition operation to results of the vector subtraction operation.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventor: Nico De Laurentiis
  • Publication number: 20140281419
    Abstract: An error handling method includes identifying a code region eligible for cumulative multiply add (CMA) optimization and translating code region instructions into interpreter code instructions, which may include translating sequences of multiply add instructions in the code region instructions into fusion code including CMA instructions. Floating point (FP) exceptions generated by the fusion code may be monitored and at least a portion of the code region instructions may be re-translated to eliminate some or all fusion code if CMA intermediate rounding exceptions exceed a threshold.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: Intel Corporation
    Inventors: Marc Lupon, Grigorios Magklis, Sridhar Samudrala, Raul Martinez, Kyriakos A. Stavrou, Enric Gibert Codina
  • Patent number: 8838946
    Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: September 16, 2014
    Assignee: Intel Corporation
    Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
  • Patent number: 8838942
    Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle.
    Type: Grant
    Filed: January 23, 2013
    Date of Patent: September 16, 2014
    Assignee: International Business Machines Corporation
    Inventors: Eric M. Schwarz, Ronald M. Smith, Sr.
  • Publication number: 20140237218
    Abstract: A multiply-and-accumulate (MAC) instruction allows efficient execution of unsigned integer multiplications. The MAC instruction indicates a first vector register as a first operand, a second vector register as a second operand, and a third vector register as a destination. The first vector register stores a first factor, and the second vector register stores a partial sum. The MAC instruction is executed to multiply the first factor with an implicit second factor to generate a product, and to add the partial sum to the product to generate a result. The first factor, the implicit second factor and the partial sum have a same data width and the product has twice the data width. The most significant half of the result is stored in the third vector register, and the least significant half of the result is stored in the second vector register.
    Type: Application
    Filed: December 19, 2011
    Publication date: August 21, 2014
    Inventors: Vinodh Gopal, Gilbert M. Wolrich, Erdinc Ozturk, James D. Guilford, Kirk S. Yap, Sean M. Gulley, wajdi K. Feghali, Martin G. Dixon
  • Publication number: 20140237217
    Abstract: An optimizing compiler includes a vectorization mechanism that optimizes a computer program by substituting code that includes one or more vector instructions (vectorized code) for one or more scalar instructions. The cost of the vectorized code is compared to the cost of the code with only scalar instructions. When the cost of the vectorized code is less than the cost of the code with only scalar instructions, the vectorization mechanism determines whether the vectorized code will likely result in processor stalls. If not, the vectorization mechanism substitutes the vectorized code for the code with only scalar instructions. When the vectorized code will likely result in processor stalls, the vectorization mechanism does not substitute the vectorized code, and the code with only scalar instructions remains in the computer program.
    Type: Application
    Filed: February 21, 2013
    Publication date: August 21, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: William J. Schmidt
  • Publication number: 20140229716
    Abstract: An embodiment includes a method for computing operations, such as modular exponentiation, using a mix of vector and scalar instructions to accelerate various applications such as encryption protocols that rely heavily on large number arithmetic operations. The embodiment requires far fewer instructions to execute the operations than more conventional practices. Other embodiments are described herein.
    Type: Application
    Filed: May 30, 2012
    Publication date: August 14, 2014
    Inventors: Shay Gueron, Vlad Krasnov
  • Patent number: 8793472
    Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a start value and an increment value, and optionally receiving a predicate vector with N elements as inputs. The processor then executes the vector instruction. Executing the vector instruction causes the processor to generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element in the result vector, the processor sets the element in the result vector equal to the start value plus a product of the increment value multiplied by a specified number of elements to the left of the element in the result vector.
    Type: Grant
    Filed: November 8, 2011
    Date of Patent: July 29, 2014
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20140208078
    Abstract: A Vector Checksum instruction. Elements from a second operand are added together one-by-one to obtain a first result. The adding includes performing one or more end around carry add operations. The first result is placed in an element of a first operand of the instruction. After each addition of an element, a carry out of a chosen position of the sum, if any, is added to a selected position in an element of the first operand.
    Type: Application
    Filed: January 23, 2013
    Publication date: July 24, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Eric M. Schwarz
  • Publication number: 20140208079
    Abstract: A Vector Galois Field Multiply Sum and Accumulate instruction. Each element of a second operand of the instruction is multiplied in a Galois field with the corresponding element of the third operand to provide one or more products. The one or more products are exclusively ORed with each other and exclusively ORed with a corresponding element of a fourth operand of the instruction. The results are placed in a selected operand.
    Type: Application
    Filed: January 23, 2013
    Publication date: July 24, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Jonathan D. Bradbury
  • Publication number: 20140208077
    Abstract: A Vector Floating Point Test Data Class Immediate instruction is provided that determines whether one or more elements of a vector specified in the instruction are of one or more selected classes and signs. If a vector element is of a selected class and sign, an element in an operand of the instruction corresponding to the vector element is set to a first defined value, and if the vector element is not of the selected class and sign, the operand element corresponding to the vector element is set to a second defined value.
    Type: Application
    Filed: January 23, 2013
    Publication date: July 24, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Eric M. Schwarz
  • Patent number: 8788796
    Abstract: A Reduced Instruction Set Computing (RISC) processor is capable of emulating operation of a floating-point register stack. The RISC processor may include a floating-point register file containing a plurality of floating-point registers, a decoding section for decoding operation instructions, and a floating-point operation section. The RISC processor may also include a control register for controlling status of floating-point registers, and for controlling the decoding section and the floating-point operation section, to thereby emulate a floating-point register stack using the floating-point register file. The decoding section may include a pointer register for maintaining a stack operation pointer, and for storing a value of the stack operation pointer.
    Type: Grant
    Filed: December 12, 2008
    Date of Patent: July 22, 2014
    Assignee: Loongson Technology Corporation Limited
    Inventors: Wei Duan, Xiaoyu Li
  • Patent number: 8782376
    Abstract: A processor including: a first and at least a second data processing channel with enable logic for selectively enabling the second channel; logic for generating first and second storage addresses having a variable offset therebetween based on the same one or more address operands of the same storage access instruction; and circuitry for transferring data between the first address and a register of the first data processing channel and between the second address and a corresponding register of the second channel based on a same one or more register specifier operands of the access instruction. The first data processing channel performs an operation using one or more registers of the first data processing channel, and on condition of being enabled the second channel performs the same operation using a corresponding one or more of its own registers based on the same one or more operands of the data processing instruction.
    Type: Grant
    Filed: August 26, 2011
    Date of Patent: July 15, 2014
    Assignee: Icera Inc.
    Inventors: Simon Knowles, Edward Andrews, Stephen Felix, Simon Huckett, Colman Hegarty
  • Publication number: 20140195783
    Abstract: A method of an aspect includes receiving a dot product instruction. The dot product instruction indicates a first source packed data including at least four data elements, indicates a second source packed data including at least eight data elements, and indicates a destination storage location. A result packed data is stored in the destination storage location in response to the dot product instruction. The result includes a plurality of data elements that each includes a dot product result. Each of the dot product results includes a sum of products of the at least four data elements of the first source packed data with corresponding data elements in a different subset of at least four data elements of the second source packed data. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Application
    Filed: December 29, 2011
    Publication date: July 10, 2014
    Inventors: Krishnan Karthikeyan, Elmoustapha Ould-Ahmed-Vall, Victor Cherepanov
  • Publication number: 20140189321
    Abstract: Instructions and logic provide vectorization of conditional loops. A vector expand instruction has a parameter to specify a source vector, a parameter to specify a conditions mask register, and a destination parameter to specify a destination vector to hold n consecutive vector elements, each of the plurality of n consecutive vector elements having a same variable partition size of m bytes. In response to the processor instruction, data is copied from consecutive vector elements in the source vector, and expanded into unmasked vector elements of the specified destination vector, without copying data into masked vector elements of the destination vector, wherein n varies responsive to the processor instruction executed. The source vector may be a register and the destination vector may be in memory. Some embodiments store counts of the condition decisions. Alternative embodiments may store other data, for example such as target addresses, or table offsets, or indicators of processing directives, etc.
    Type: Application
    Filed: December 31, 2012
    Publication date: July 3, 2014
    Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Brett L. Toll
  • Publication number: 20140189320
    Abstract: A processor is described having a functional unit of an instruction execution pipeline. The functional unit has comparison bank circuitry and adder circuitry. The comparison bank circuitry is to compare one or more elements of a first input vector against an element of a second input vector. The adder circuitry is coupled to the comparison bank circuitry to add the number of elements of the second input vector that match a value of the first input vector on an element by element basis of the first input vector.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventor: Shih Shigjong KUO
  • Patent number: 8769249
    Abstract: Methods and apparatus relating to instructions with floating point control override are described. In an embodiment, a processor includes a first logic to receive an instruction having one or more bits corresponding to override control data. The override control data is to indicate one or more floating point operation settings that are to override one or more default settings. The processor also has a second logic to perform a floating point operation in response to the instruction and at least one of the one or more floating point operation settings.
    Type: Grant
    Filed: November 6, 2012
    Date of Patent: July 1, 2014
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Simon Rubanovich, Benny Eitan
  • Patent number: 8769248
    Abstract: Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.
    Type: Grant
    Filed: June 11, 2012
    Date of Patent: July 1, 2014
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Patent number: 8766991
    Abstract: A graphics processing unit 2 includes a texture pipeline 6 which performs filter operations upon texture values. If the texture values are integer texture values, then they may be processed by the texture pipeline in a variable order corresponding to the order in which they are retrieved from a memory 4. If the texture values are floating point texture values, then they are processed in a fixed order in order to ensure result invariants as the filter operation is non-associative for floating point values. The filter operation is not commenced until all of the floating point texture values have been retrieved from the memory 4 and other available for processing.
    Type: Grant
    Filed: May 25, 2011
    Date of Patent: July 1, 2014
    Assignee: ARM Limited
    Inventors: Andreas Due Engh-Halstvedt, Jørn Nystad
  • Publication number: 20140181481
    Abstract: Detection of whether a result of a floating point operation is safe. Characteristics of the result are examined to determine whether the result is safe or potentially unsafe, as defined by the user. An instruction is provided to facilitate detection of safe or potentially unsafe results.
    Type: Application
    Filed: February 28, 2014
    Publication date: June 26, 2014
    Applicant: International Business Machines Corporation
    Inventors: Michael F. Cowlishaw, Shawn D. Lundvall, Ronald M. Smith, Phil C. Yeh
  • Publication number: 20140149720
    Abstract: A method and circuit arrangement provide support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.
    Type: Application
    Filed: November 29, 2012
    Publication date: May 29, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Publication number: 20140136820
    Abstract: A mechanism for recycling error bits in a floating point unit is disclosed. A system of the disclosure includes a memory and a processing device communicably coupled to the memory. In one embodiment, the processing device comprising a floating point unit (FPU) to generate a result value from applying an operation on floating point number inputs to the FPU and generate an error value using the result value. The FPU also writes the result value to a first register of the processing device dedicated to storing results from the operation of the FPU and writes the error value to a second register of the processing device dedicated to storing errors from the operation of the FPU.
    Type: Application
    Filed: November 14, 2012
    Publication date: May 15, 2014
    Inventors: Helia Naeimi, Ralph Nathan, Daniel Sorin, Shih-Lien L. Lu
  • Publication number: 20140095843
    Abstract: Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Inventors: Christopher J. Hughes, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Elmoustapha Ould-Ahmed_Vall, Bret L. Toll, Robert Valentine
  • Publication number: 20140095842
    Abstract: A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the array of data elements stored in a source operand into a result in a destination operand using a reduction operator. Each of the source operand and the destination operand is one of the vector registers. Responsive to the vector reduction instruction, the execution circuitry applies the reduction operator to two of the data elements in each lane, and shifts one or more remaining data elements when there is at least one of the data elements remaining in each lane.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Inventors: Paul Caprioli, Abhay S. Kanhere, Jeffrey J. Cook, Muawya M. Al-Otoom
  • Publication number: 20140089644
    Abstract: A floating-point unit and a method of identifying exception cases in a floating-point unit. In one embodiment, the floating-point unit includes: (1) a floating-point computation circuit having a normal path and an exception path and operable to execute an operation on an operand and (2) a decision circuit associated with the normal path and the exception path and configured to employ a flush-to-zero mode of the floating-point unit to determine which one of the normal path and the exception path is appropriate for carrying out the operation on the operand.
    Type: Application
    Filed: September 25, 2012
    Publication date: March 27, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Marcin Andrychowicz, Alex Fit-Florea
  • Patent number: 8683183
    Abstract: In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 4, 2013
    Date of Patent: March 25, 2014
    Assignee: Intel Corporation
    Inventor: Eric Sprangle
  • Patent number: 8683182
    Abstract: Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.
    Type: Grant
    Filed: June 11, 2012
    Date of Patent: March 25, 2014
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Publication number: 20140082333
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor absolute difference calculation in response to a single vector packed absolute difference instruction that includes a first and second source vector register operand, a destination vector register operand, and an opcode are described.
    Type: Application
    Filed: December 22, 2011
    Publication date: March 20, 2014
    Inventors: Elmoustapha Ould-Ahmed-Vall, Mostafa Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
  • Patent number: 8667250
    Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 26, 2007
    Date of Patent: March 4, 2014
    Assignee: Intel Corporation
    Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
  • Publication number: 20140059328
    Abstract: A mechanism for executing speculative predicated instructions may include execution of initiating execution of a vector instruction when one or more operands upon which the vector instruction depends are available for use, even if a predicate vector that the vector instruction also depends is not available. If the predicate vector was not available, the results of the execution of the vector instruction may be temporarily held until the predicate vector becomes available, at which time, a destination vector may be updated with the results.
    Type: Application
    Filed: August 21, 2012
    Publication date: February 27, 2014
    Inventor: Jeffry E. Gonion
  • Publication number: 20140052970
    Abstract: Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.
    Type: Application
    Filed: October 25, 2013
    Publication date: February 20, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alan Gara, David L. Satterfield, Robert E. Walkup
  • Publication number: 20140052969
    Abstract: A processing core is described having execution unit logic circuitry having a first register to store a first vector input operand, a second register to a store a second vector input operand and a third register to store a packed data structure containing scalar input operands a, b, c. The execution unit logic circuitry further include a multiplier to perform the operation (a*(first vector input operand))+(b*(second vector operand))+c.
    Type: Application
    Filed: December 23, 2011
    Publication date: February 20, 2014
    Applicant: Intel Corporation
    Inventors: Jesus Corbal, Andrew T. Forsyth, Thomas D. Fletcher, Lisa K. Wu, Eric Sprangle