Floating Point Or Vector Patents (Class 712/222)

CONVERT FROM ZONED FORMAT TO DECIMAL FLOATING POINT FORMAT

Publication number: 20150089205

Abstract: Machine instructions, referred to herein as a long Convert from Zoned instruction (CDZT) and extended Convert from Zoned instruction (CXZT), are provided that read EBCDIC or ASCII data from memory, convert it to the appropriate decimal floating point format, and write it to a target floating point register or floating point register pair. Further, machine instructions, referred to herein as a long Convert to Zoned instruction (CZDT) and extended Convert to Zoned instruction (CZXT), are provided that convert a decimal floating point (DFP) operand in a source floating point register or floating point register pair to EBCDIC or ASCII data and store it to a target memory location.

Type: Application

Filed: December 4, 2014

Publication date: March 26, 2015

Inventors: Steven R. Carlough, Reid T. Copeland, Charles W. Gainey, JR., Marcel Mitran, Eric M. Schwarz, Timothy J. Slegel
Predecode logic autovectorizing a group of scalar instructions including result summing add instruction to a vector instruction for execution in vector unit with dot product adder

Patent number: 8984260

Abstract: A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions.

Type: Grant

Filed: December 20, 2011

Date of Patent: March 17, 2015

Assignee: International Business Machines Corporation

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
VECTOR GALOIS FIELD MULTIPLY SUM AND ACCUMULATE INSTRUCTION

Publication number: 20150074383

Abstract: A Vector Galois Field Multiply Sum and Accumulate instruction. Each element of a second operand of the instruction is multiplied in a Galois field with the corresponding element of the third operand to provide one or more products. The one or more products are exclusively ORed with each other and exclusively ORed with a corresponding element of a fourth operand of the instruction. The results are placed in a selected operand.

Type: Application

Filed: November 16, 2014

Publication date: March 12, 2015

Inventor: Jonathan D. Bradbury
SELECTIVELY CONTROLLING INSTRUCTION EXECUTION IN TRANSACTIONAL PROCESSING

Publication number: 20150052336

Abstract: Execution of instructions in a transactional environment is selectively controlled. A TRANSACTION BEGIN instruction initiates a transaction and includes controls that selectively indicate whether certain types of instructions are permitted to execute within the transaction. The controls include one or more of an allow access register modification control and an allow floating point operation control.

Type: Application

Filed: October 29, 2014

Publication date: February 19, 2015

Inventors: Dan F. Greiner, Christian Jacobi, Robert R. Rogers, Timothy J. Slegel
INTERPOLATION IMPLEMENTATION

Publication number: 20150052335

Abstract: Techniques are disclosed relating to floating-point operations in computer processors. In one embodiment, an apparatus includes a floating-point unit and circuitry configured to receive an initial value X for a floating-point operation. In this embodiment, X is between 0 and 1.0 inclusive. In this embodiment, the circuitry is configured to generate first and second floating-point values based on an exponent of X that sum to 1. In this embodiment, the floating-point unit is configured to perform an operation using the first and second floating-point values. The apparatus may reduce drift, in this embodiment, when a floating-point representation of X does not guarantee that the sum of X and (1?X) is 1. The apparatus may be configured to perform blending and/or interpolation operations using the first and second floating-point values.

Type: Application

Filed: August 19, 2013

Publication date: February 19, 2015

Applicant: Apple Inc.

Inventor: James S. Blomgren
COMPUTER FOR AMDAHL-COMPLIANT ALGORITHMS LIKE MATRIX INVERSION

Publication number: 20150039866

Abstract: A family of computers is disclosed and claimed that supports simultaneous processes from the single core up to multi-chip Program Execution Systems (PES). The instruction processing of the instructed resources is local, dispensing with the need for large VLIW memories. The cores through the PES have maximum performance for Amdahl-compliant algorithms like matrix inversion, because the multiplications do not stall and the other circuitry keeps up. Cores with log based multiplication generators improve this performance by a factor of two for sine and cosine calculations in single precision floating point and have even greater performance for loge and ex calculations. Apparatus specifying, simulating, and/or layouts of the computer (components) are disclosed. Apparatus the computer and/or its components are disclosed.

Type: Application

Filed: October 16, 2014

Publication date: February 5, 2015

Applicant: QSigma, Inc.

Inventors: Earle Jennings, George Landers
MULTIFUNCTIONAL HEXADECIMAL INSTRUCTION FORM SYSTEM AND PROGRAM PRODUCT

Publication number: 20150006859

Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle.

Type: Application

Filed: September 15, 2014

Publication date: January 1, 2015

Inventors: Eric M. SCHWARZ, Ronald M. SMITH, SR.
Predicting a result for a predicate-generating instruction when processing vector instructions

Patent number: 8924693

Abstract: The described embodiments include a processor that executes vector instructions. While dispatching instructions at runtime, the processor encounters a predicate-generating instruction. Upon determining that a result of the predicate-generating instruction is predictable, the processor dispatches a prediction micro-operation associated with the predicate-generating instruction, wherein the prediction micro-operation generates a predicted result vector for the predicate-generating instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. When executing the prediction micro-operation to generate the predicted result vector, if the predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, generating the predicted result vector comprises setting the element of the predicted result vector to true.

Type: Grant

Filed: May 12, 2011

Date of Patent: December 30, 2014

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
Method for performing plurality of bit operations and a device having plurality of bit operations capabilities

Patent number: 8909905

Abstract: A method and a device having a plurality of bit operations capability, the device includes: a first and a second registers and an instruction fetch circuit, and an arithmetic logic unit adapted to: calculate, during a first clock cycle, a position value representative of a position, within a first information vector, of a first bit of information that has a first value; and to multiply the position value by a multiplication factor to provide a first result and to alter the value of the first bit to a second value to provide an updated information vector, during the first clock cycle.

Type: Grant

Filed: August 18, 2006

Date of Patent: December 9, 2014

Assignee: Freescale Semiconductor, Inc.

Inventors: Eran Glickman, Evgeni Ginzburg, Noam Sheffer
MOVING AVERAGE PROCESSING IN PROCESSOR AND PROCESSOR

Publication number: 20140351566

Abstract: A processor, which executes m number of arithmetic operations in parallel, executes a partial sum instruction which takes an i-th to (i+m?1)-th elements of an input data series as input elements, so as to obtain first vector data, executes the partial sum instruction which takes a (i+x)-th to (i+x+m?1)-th elements of the input data series as the input elements, so as to obtain second vector data, and performs operations to subtract the p-th element of the first vector data and add the p-th element of the second vector data from and to a sum of the i-th to (i+x?1)-th elements of the input data series in parallel for each of the 0-th to (m?1)-th elements, so as to calculate sums of elements for m sections different from each other in parallel, and moving average processing to calculate a moving average from the sums of elements of the sections.

Type: Application

Filed: April 7, 2014

Publication date: November 27, 2014

Applicant: FUJITSU LIMITED

Inventors: MAKIKO ITO, Manabu KUBOTA, Kazuhiro Nomoto
SYSTEM AND APPARATUS FOR GROUP FLOATING-POINT INFLATE AND DEFLATE OPERATIONS

Publication number: 20140351565

Abstract: Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.

Type: Application

Filed: March 24, 2014

Publication date: November 27, 2014

Applicant: MICROUNITY SYSTEMS ENGINEERING, INC.

Inventors: Craig HANSEN, John MOUSSOURIS, Alexia MASSALIN
SIMPLIFICATION OF LARGE NETWORKS AND GRAPHS

Publication number: 20140351564

Abstract: Embodiments relate to simplifying large and complex networks and graphs using global connectivity information based on calculated node centralities. An aspect includes calculating node centralities of a graph until a designated number of central nodes are detected. A percentage of the central nodes are then selected as pivot nodes. The neighboring nodes to each of the pivot nodes are then collapsed until the graph shrinks to a predefined threshold of total nodes. Responsive to the number of total nodes reaching the predefined threshold, the simplified graph is outputted.

Type: Application

Filed: September 12, 2013

Publication date: November 27, 2014

Applicant: International Business Machines Corporation

Inventors: Konstantinos Bekas, Alessandro Curioni
Scalable Partial Vectorization

Publication number: 20140344555

Abstract: A system, method and computer program product to compute latencies of a plurality of expression trees in a basic block and to select a first and a second expression tree from the plurality of expression trees based on the computed latencies. The first expression tree is isomorphic to the second expression tree and the first and second expression trees are selected in order of largest to smallest latency. This selection ensures that the largest isomorphic expression trees are vectorized first. By vectorizing the largest isomorphic expression trees first, a basic block containing hundreds of statements can be vectorized without significant compile time. Moreover, vectorization of the largest isomorphic expression trees results in a significant improvement in system performance on SIMD processors.

Type: Application

Filed: May 20, 2013

Publication date: November 20, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Ramshankar Ramanarayanan, Meghana Gupta, Soham S. Chakraborty, Dibyendu Das
RECONFIGURABLE PROCESSOR HAVING CONSTANT STORAGE REGISTER

Publication number: 20140331031

Abstract: A reconfigurable processor configured to include a constant storage register to store a constant is provided, thereby improving efficiency in the use of a memory space. Specifically, a reconfigurable processor includes a plurality of Functional Units (FUs), a configuration memory configured to store configuration information, and a constant storage register configured to store a constant that is used as an operand for an operation in the plurality of FUs.

Type: Application

Filed: May 5, 2014

Publication date: November 6, 2014

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Dong-Kwan SUH, Seok-Jin KIM
Dual register data path architecture with registers in a data file divided into groups and sub-groups

Patent number: 8880855

Abstract: A processor includes a first and second execution unit each of which is arranged to execute multiply instructions of a first type upon fixed point operands and to execute multiply instructions of a second type upon floating point operands. A register file of the processor stores operands in registers that are each addressable by instructions for performing the first and second types of operations. An instruction decode unit is responsive to the at least one multiply instruction of the first type and the at least one multiply instruction of the second type to at the same time enable a first data path between the first set of registers and the first execution unit and to enable a second data path between a second set of registers and the second execution unit.

Type: Grant

Filed: September 15, 2011

Date of Patent: November 4, 2014

Assignee: Texas Instruments Incorporated

Inventors: Timothy D Anderson, Duc Quang Bui, Eric Biscondi, Shriram D Moharil, Mujibur Rahman, Soujanya Narnur, Peter Richard Dent, Ashish Rai Shrivastava
METHOD FOR PERFORMING DUAL DISPATCH OF BLOCKS AND HALF BLOCKS

Publication number: 20140317387

Abstract: A method for executing dual dispatch of blocks and half blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein each of the instruction blocks comprise two half blocks; scheduling the instructions of the instruction block to execute in accordance with a scheduler; and performing a dual dispatch of the two half blocks for execution on an execution unit.

Type: Application

Filed: March 14, 2014

Publication date: October 23, 2014

Applicant: Soft Machines, Inc.

Inventor: Mohammad ABDALLAH
ENHANCED VECTOR TRUE/FALSE PREDICATE-GENERATING INSTRUCTIONS

Publication number: 20140289502

Abstract: Systems, apparatuses and methods for utilizing enhanced vector true/false instructions. The enhanced vector true/false instructions generate enhanced predicates to correspond to the request element width and/or vector size. A vector true instruction generates an enhanced predicate where all elements supported by the processing unit are active. A vector false instruction generates an enhanced predicate where all elements supported by the processing unit are inactive. The enhanced predicate specifies the requested element width in addition to designating the element selectors.

Type: Application

Filed: March 18, 2014

Publication date: September 25, 2014

Applicant: Apple Inc.

Inventor: Jeffry E. Gonion
ARBITRARY SIZE TABLE LOOKUP AND PERMUTES WITH CROSSBAR

Publication number: 20140281421

Abstract: An example method of updating an output data vector includes identifying a data value vector including element data values. The method also includes identifying an address value vector including a set of elements. The method further includes applying a conditional operator to each element of the set of elements in the address value vector. The method also includes for each element data value in the data value vector, determining whether to update an output data vector based on applying the conditional operator.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: QUALCOMM INCORPORATED

Inventors: Marc M. Hoffman, Ajay Anant Ingle, Jose Fridman
ADD-COMPARE-SELECT INSTRUCTION

Publication number: 20140281420

Abstract: An apparatus includes memory storing an instruction that identifies a first register, a second register, and a third register. Upon execution of the instruction by a processor, a vector addition operation is performed by the processor to add first values from the first register to second values from the second register. A vector subtraction operation is also performed upon execution of the instruction to subtract the second value from third values from the third register. A vector compare operation is also performed upon execution of the instruction to compare results of the vector addition operation to results of the vector subtraction operation.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: QUALCOMM INCORPORATED

Inventor: Nico De Laurentiis
COMBINED FLOATING POINT MULTIPLIER ADDER WITH INTERMEDIATE ROUNDING LOGIC

Publication number: 20140281419

Abstract: An error handling method includes identifying a code region eligible for cumulative multiply add (CMA) optimization and translating code region instructions into interpreter code instructions, which may include translating sequences of multiply add instructions in the code region instructions into fusion code including CMA instructions. Floating point (FP) exceptions generated by the fusion code may be monitored and at least a portion of the code region instructions may be re-translated to eliminate some or all fusion code if CMA intermediate rounding exceptions exceed a threshold.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Applicant: Intel Corporation

Inventors: Marc Lupon, Grigorios Magklis, Sridhar Samudrala, Raul Martinez, Kyriakos A. Stavrou, Enric Gibert Codina
Packing lower half bits of signed data elements in two source registers in a destination register with saturation

Patent number: 8838946

Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.

Type: Grant

Filed: December 29, 2012

Date of Patent: September 16, 2014

Assignee: Intel Corporation

Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
Multifunction hexadecimal instruction form system and program product

Patent number: 8838942

Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle.

Type: Grant

Filed: January 23, 2013

Date of Patent: September 16, 2014

Assignee: International Business Machines Corporation

Inventors: Eric M. Schwarz, Ronald M. Smith, Sr.
SIMD INTEGER MULTIPLY-ACCUMULATE INSTRUCTION FOR MULTI-PRECISION ARITHMETIC

Publication number: 20140237218

Abstract: A multiply-and-accumulate (MAC) instruction allows efficient execution of unsigned integer multiplications. The MAC instruction indicates a first vector register as a first operand, a second vector register as a second operand, and a third vector register as a destination. The first vector register stores a first factor, and the second vector register stores a partial sum. The MAC instruction is executed to multiply the first factor with an implicit second factor to generate a product, and to add the partial sum to the product to generate a result. The first factor, the implicit second factor and the partial sum have a same data width and the product has twice the data width. The most significant half of the result is stored in the third vector register, and the least significant half of the result is stored in the second vector register.

Type: Application

Filed: December 19, 2011

Publication date: August 21, 2014

Inventors: Vinodh Gopal, Gilbert M. Wolrich, Erdinc Ozturk, James D. Guilford, Kirk S. Yap, Sean M. Gulley, wajdi K. Feghali, Martin G. Dixon
VECTORIZATION IN AN OPTIMIZING COMPILER

Publication number: 20140237217

Abstract: An optimizing compiler includes a vectorization mechanism that optimizes a computer program by substituting code that includes one or more vector instructions (vectorized code) for one or more scalar instructions. The cost of the vectorized code is compared to the cost of the code with only scalar instructions. When the cost of the vectorized code is less than the cost of the code with only scalar instructions, the vectorization mechanism determines whether the vectorized code will likely result in processor stalls. If not, the vectorization mechanism substitutes the vectorized code for the code with only scalar instructions. When the vectorized code will likely result in processor stalls, the vectorization mechanism does not substitute the vectorized code, and the code with only scalar instructions remains in the computer program.

Type: Application

Filed: February 21, 2013

Publication date: August 21, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: William J. Schmidt
VECTOR AND SCALAR BASED MODULAR EXPONENTIATION

Publication number: 20140229716

Abstract: An embodiment includes a method for computing operations, such as modular exponentiation, using a mix of vector and scalar instructions to accelerate various applications such as encryption protocols that rely heavily on large number arithmetic operations. The embodiment requires far fewer instructions to execute the operations than more conventional practices. Other embodiments are described herein.

Type: Application

Filed: May 30, 2012

Publication date: August 14, 2014

Inventors: Shay Gueron, Vlad Krasnov
Vector index instruction for generating a result vector with incremental values based on a start value and an increment value

Patent number: 8793472

Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a start value and an increment value, and optionally receiving a predicate vector with N elements as inputs. The processor then executes the vector instruction. Executing the vector instruction causes the processor to generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element in the result vector, the processor sets the element in the result vector equal to the start value plus a product of the increment value multiplied by a specified number of elements to the left of the element in the result vector.

Type: Grant

Filed: November 8, 2011

Date of Patent: July 29, 2014

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
VECTOR CHECKSUM INSTRUCTION

Publication number: 20140208078

Abstract: A Vector Checksum instruction. Elements from a second operand are added together one-by-one to obtain a first result. The adding includes performing one or more end around carry add operations. The first result is placed in an element of a first operand of the instruction. After each addition of an element, a carry out of a chosen position of the sum, if any, is added to a selected position in an element of the first operand.

Type: Application

Filed: January 23, 2013

Publication date: July 24, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Eric M. Schwarz
VECTOR GALOIS FIELD MULTIPLY SUM AND ACCUMULATE INSTRUCTION

Publication number: 20140208079

Abstract: A Vector Galois Field Multiply Sum and Accumulate instruction. Each element of a second operand of the instruction is multiplied in a Galois field with the corresponding element of the third operand to provide one or more products. The one or more products are exclusively ORed with each other and exclusively ORed with a corresponding element of a fourth operand of the instruction. The results are placed in a selected operand.

Type: Application

Filed: January 23, 2013

Publication date: July 24, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Jonathan D. Bradbury
VECTOR FLOATING POINT TEST DATA CLASS IMMEDIATE INSTRUCTION

Publication number: 20140208077

Abstract: A Vector Floating Point Test Data Class Immediate instruction is provided that determines whether one or more elements of a vector specified in the instruction are of one or more selected classes and signs. If a vector element is of a selected class and sign, an element in an operand of the instruction corresponding to the vector element is set to a first defined value, and if the vector element is not of the selected class and sign, the operand element corresponding to the vector element is set to a second defined value.

Type: Application

Filed: January 23, 2013

Publication date: July 24, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Eric M. Schwarz
Technique for simulating floating-point stack operation involving conversion of certain floating-point register numbers based on a top-of-stack pointer and modulo function

Patent number: 8788796

Abstract: A Reduced Instruction Set Computing (RISC) processor is capable of emulating operation of a floating-point register stack. The RISC processor may include a floating-point register file containing a plurality of floating-point registers, a decoding section for decoding operation instructions, and a floating-point operation section. The RISC processor may also include a control register for controlling status of floating-point registers, and for controlling the decoding section and the floating-point operation section, to thereby emulate a floating-point register stack using the floating-point register file. The decoding section may include a pointer register for maintaining a stack operation pointer, and for storing a value of the stack operation pointer.

Type: Grant

Filed: December 12, 2008

Date of Patent: July 22, 2014

Assignee: Loongson Technology Corporation Limited

Inventors: Wei Duan, Xiaoyu Li
Vector instruction execution to load vector data in registers of plural vector units using offset addressing logic

Patent number: 8782376

Abstract: A processor including: a first and at least a second data processing channel with enable logic for selectively enabling the second channel; logic for generating first and second storage addresses having a variable offset therebetween based on the same one or more address operands of the same storage access instruction; and circuitry for transferring data between the first address and a register of the first data processing channel and between the second address and a corresponding register of the second channel based on a same one or more register specifier operands of the access instruction. The first data processing channel performs an operation using one or more registers of the first data processing channel, and on condition of being enabled the second channel performs the same operation using a corresponding one or more of its own registers based on the same one or more operands of the data processing instruction.

Type: Grant

Filed: August 26, 2011

Date of Patent: July 15, 2014

Assignee: Icera Inc.

Inventors: Simon Knowles, Edward Andrews, Stephen Felix, Simon Huckett, Colman Hegarty
DOT PRODUCT PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS

Publication number: 20140195783

Abstract: A method of an aspect includes receiving a dot product instruction. The dot product instruction indicates a first source packed data including at least four data elements, indicates a second source packed data including at least eight data elements, and indicates a destination storage location. A result packed data is stored in the destination storage location in response to the dot product instruction. The result includes a plurality of data elements that each includes a dot product result. Each of the dot product results includes a sum of products of the at least four data elements of the first source packed data with corresponding data elements in a different subset of at least four data elements of the second source packed data. Other methods, apparatus, systems, and instructions are disclosed.

Type: Application

Filed: December 29, 2011

Publication date: July 10, 2014

Inventors: Krishnan Karthikeyan, Elmoustapha Ould-Ahmed-Vall, Victor Cherepanov
INSTRUCTIONS AND LOGIC TO VECTORIZE CONDITIONAL LOOPS

Publication number: 20140189321

Abstract: Instructions and logic provide vectorization of conditional loops. A vector expand instruction has a parameter to specify a source vector, a parameter to specify a conditions mask register, and a destination parameter to specify a destination vector to hold n consecutive vector elements, each of the plurality of n consecutive vector elements having a same variable partition size of m bytes. In response to the processor instruction, data is copied from consecutive vector elements in the source vector, and expanded into unmasked vector elements of the specified destination vector, without copying data into masked vector elements of the destination vector, wherein n varies responsive to the processor instruction executed. The source vector may be a register and the destination vector may be in memory. Some embodiments store counts of the condition decisions. Alternative embodiments may store other data, for example such as target addresses, or table offsets, or indicators of processing directives, etc.

Type: Application

Filed: December 31, 2012

Publication date: July 3, 2014

Inventors: Tal Uliel, Elmoustapha Ould-Ahmed-Vall, Brett L. Toll
Instruction for Determining Histograms

Publication number: 20140189320

Abstract: A processor is described having a functional unit of an instruction execution pipeline. The functional unit has comparison bank circuitry and adder circuitry. The comparison bank circuitry is to compare one or more elements of a first input vector against an element of a second input vector. The adder circuitry is coupled to the comparison bank circuitry to add the number of elements of the second input vector that match a value of the first input vector on an element by element basis of the first input vector.

Type: Application

Filed: December 28, 2012

Publication date: July 3, 2014

Inventor: Shih Shigjong KUO
Instructions with floating point control override

Patent number: 8769249

Abstract: Methods and apparatus relating to instructions with floating point control override are described. In an embodiment, a processor includes a first logic to receive an instruction having one or more bits corresponding to override control data. The override control data is to indicate one or more floating point operation settings that are to override one or more default settings. The processor also has a second logic to perform a floating point operation in response to the instruction and at least one of the one or more floating point operation settings.

Type: Grant

Filed: November 6, 2012

Date of Patent: July 1, 2014

Assignee: Intel Corporation

Inventors: Cristina S. Anderson, Simon Rubanovich, Benny Eitan
System and apparatus for group floating-point inflate and deflate operations

Patent number: 8769248

Abstract: Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.

Type: Grant

Filed: June 11, 2012

Date of Patent: July 1, 2014

Assignee: Microunity Systems Engineering, Inc.

Inventors: Craig Hansen, John Moussouris, Alexia Massalin
Processing order with integer inputs and floating point inputs

Patent number: 8766991

Abstract: A graphics processing unit 2 includes a texture pipeline 6 which performs filter operations upon texture values. If the texture values are integer texture values, then they may be processed by the texture pipeline in a variable order corresponding to the order in which they are retrieved from a memory 4. If the texture values are floating point texture values, then they are processed in a fixed order in order to ensure result invariants as the filter operation is non-associative for floating point values. The filter operation is not commenced until all of the floating point texture values have been retrieved from the memory 4 and other available for processing.

Type: Grant

Filed: May 25, 2011

Date of Patent: July 1, 2014

Assignee: ARM Limited

Inventors: Andreas Due Engh-Halstvedt, Jørn Nystad
DETECTION OF POTENTIAL NEED TO USE A LARGER DATA FORMAT IN PERFORMING FLOATING POINT OPERATIONS

Publication number: 20140181481

Abstract: Detection of whether a result of a floating point operation is safe. Characteristics of the result are examined to determine whether the result is safe or potentially unsafe, as defined by the user. An instruction is provided to facilitate detection of safe or potentially unsafe results.

Type: Application

Filed: February 28, 2014

Publication date: June 26, 2014

Applicant: International Business Machines Corporation

Inventors: Michael F. Cowlishaw, Shawn D. Lundvall, Ronald M. Smith, Phil C. Yeh
FLOATING POINT EXECUTION UNIT FOR CALCULATING PACKED SUM OF ABSOLUTE DIFFERENCES

Publication number: 20140149720

Abstract: A method and circuit arrangement provide support for packed sum of absolute difference operations in a floating point execution unit, e.g., a scalar or vector floating point execution unit. Existing adders in a floating point execution unit may be utilized along with minimal additional logic in the floating point execution unit to support efficient execution of a fixed point packed sum of absolute differences instruction within the floating point execution unit, often eliminating the need for a separate vector fixed point execution unit in a processor architecture, and thereby leading to less logic and circuit area, lower power consumption and lower cost.

Type: Application

Filed: November 29, 2012

Publication date: May 29, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
Recycling Error Bits in Floating Point Units

Publication number: 20140136820

Abstract: A mechanism for recycling error bits in a floating point unit is disclosed. A system of the disclosure includes a memory and a processing device communicably coupled to the memory. In one embodiment, the processing device comprising a floating point unit (FPU) to generate a result value from applying an operation on floating point number inputs to the FPU and generate an error value using the result value. The FPU also writes the result value to a first register of the processing device dedicated to storing results from the operation of the FPU and writes the error value to a second register of the processing device dedicated to storing errors from the operation of the FPU.

Type: Application

Filed: November 14, 2012

Publication date: May 15, 2014

Inventors: Helia Naeimi, Ralph Nathan, Daniel Sorin, Shih-Lien L. Lu
Systems, Apparatuses, and Methods for Performing Conflict Detection and Broadcasting Contents of a Register to Data Element Positions of Another Register

Publication number: 20140095843

Abstract: Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting.

Type: Application

Filed: September 28, 2012

Publication date: April 3, 2014

Inventors: Christopher J. Hughes, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Elmoustapha Ould-Ahmed_Vall, Bret L. Toll, Robert Valentine
ACCELERATED INTERLANE VECTOR REDUCTION INSTRUCTIONS

Publication number: 20140095842

Abstract: A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the array of data elements stored in a source operand into a result in a destination operand using a reduction operator. Each of the source operand and the destination operand is one of the vector registers. Responsive to the vector reduction instruction, the execution circuitry applies the reduction operator to two of the data elements in each lane, and shifts one or more remaining data elements when there is at least one of the data elements remaining in each lane.

Type: Application

Filed: September 28, 2012

Publication date: April 3, 2014

Inventors: Paul Caprioli, Abhay S. Kanhere, Jeffrey J. Cook, Muawya M. Al-Otoom
CIRCUIT AND METHOD FOR IDENTIFYING EXCEPTION CASES IN A FLOATING-POINT UNIT AND GRAPHICS PROCESSING UNIT EMPLOYING THE SAME

Publication number: 20140089644

Abstract: A floating-point unit and a method of identifying exception cases in a floating-point unit. In one embodiment, the floating-point unit includes: (1) a floating-point computation circuit having a normal path and an exception path and operable to execute an operation on an operand and (2) a decision circuit associated with the normal path and the exception path and configured to employ a flush-to-zero mode of the floating-point unit to determine which one of the normal path and the exception path is appropriate for carrying out the operation on the operand.

Type: Application

Filed: September 25, 2012

Publication date: March 27, 2014

Applicant: NVIDIA CORPORATION

Inventors: Marcin Andrychowicz, Alex Fit-Florea
Performing a multiply-multiply-accumulate instruction

Patent number: 8683183

Abstract: In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed.

Type: Grant

Filed: March 4, 2013

Date of Patent: March 25, 2014

Assignee: Intel Corporation

Inventor: Eric Sprangle
System and apparatus for group floating-point inflate and deflate operations

Patent number: 8683182

Abstract: Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.

Type: Grant

Filed: June 11, 2012

Date of Patent: March 25, 2014

Assignee: Microunity Systems Engineering, Inc.

Inventors: Craig Hansen, John Moussouris, Alexia Massalin
SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING AN ABSOLUTE DIFFERENCE CALCULATION BETWEEN CORRESPONDING PACKED DATA ELEMENTS OF TWO VECTOR REGISTERS

Publication number: 20140082333

Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor absolute difference calculation in response to a single vector packed absolute difference instruction that includes a first and second source vector register operand, a destination vector register operand, and an opcode are described.

Type: Application

Filed: December 22, 2011

Publication date: March 20, 2014

Inventors: Elmoustapha Ould-Ahmed-Vall, Mostafa Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
Methods, apparatus, and instructions for converting vector data

Patent number: 8667250

Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.

Type: Grant

Filed: December 26, 2007

Date of Patent: March 4, 2014

Assignee: Intel Corporation

Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
MECHANISM FOR PERFORMING SPECULATIVE PREDICATED INSTRUCTIONS

Publication number: 20140059328

Abstract: A mechanism for executing speculative predicated instructions may include execution of initiating execution of a vector instruction when one or more operands upon which the vector instruction depends are available for use, even if a predicate vector that the vector instruction also depends is not available. If the predicate vector was not available, the results of the execution of the vector instruction may be temporarily held until the predicate vector becomes available, at which time, a destination vector may be updated with the results.

Type: Application

Filed: August 21, 2012

Publication date: February 27, 2014

Inventor: Jeffry E. Gonion
OPCODE COUNTING FOR PERFORMANCE MEASUREMENT

Publication number: 20140052970

Abstract: Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.

Type: Application

Filed: October 25, 2013

Publication date: February 20, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alan Gara, David L. Satterfield, Robert E. Walkup
SUPER MULTIPLY ADD (SUPER MADD) INSTRUCTIONS WITH THREE SCALAR TERMS

Publication number: 20140052969

Abstract: A processing core is described having execution unit logic circuitry having a first register to store a first vector input operand, a second register to a store a second vector input operand and a third register to store a packed data structure containing scalar input operands a, b, c. The execution unit logic circuitry further include a multiplier to perform the operation (a*(first vector input operand))+(b*(second vector operand))+c.

Type: Application

Filed: December 23, 2011

Publication date: February 20, 2014

Applicant: Intel Corporation

Inventors: Jesus Corbal, Andrew T. Forsyth, Thomas D. Fletcher, Lisa K. Wu, Eric Sprangle

prev 1 2 3 4 5 6 7 … next