Floating Point Or Vector Patents (Class 712/222)

PROCESSING DEVICE AND A SWIZZLE PATTERN GENERATOR

Publication number: 20130067203

Abstract: A swizzle pattern generator is provided to reduce an overhead due to execution of a swizzle instruction in vector processing. The swizzle pattern generator is configured to provide swizzle patterns with respect to data sets of at least one vector register or vector processing unit. The swizzle pattern generator may be reconfigurable to generate various swizzle patterns for different vector operations.
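
As a rough illustration of what a swizzle pattern does, the sketch below applies an index pattern to the elements of a vector register. The function name `apply_swizzle` and the list-of-indices encoding of a pattern are assumptions made for illustration; the abstract does not say how the generator represents its patterns.

```python
from typing import List, Sequence


def apply_swizzle(register: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Return a new vector whose i-th element is register[pattern[i]]."""
    if any(p < 0 or p >= len(register) for p in pattern):
        raise ValueError("pattern index out of range")
    return [register[p] for p in pattern]


# Two hypothetical patterns for a 4-element register.
REVERSE = [3, 2, 1, 0]      # reverse the element order
BROADCAST_0 = [0, 0, 0, 0]  # copy element 0 into every lane

if __name__ == "__main__":
    v = [10, 20, 30, 40]
    print(apply_swizzle(v, REVERSE))
    print(apply_swizzle(v, BROADCAST_0))
```

Supplying a different pattern list changes the permutation without changing the function, which is loosely analogous to a reconfigurable pattern generator.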

Type: Application

Filed: September 14, 2012

Publication date: March 14, 2013

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Moo-Kyoung Chung, Woong Seo, Ho-Young Kim, Soo-Jung Ryu, Dong-Hoon Yoo, Jin-Seok Lee, Yeon-Gon Cho, Chang-Moo Kim, Seung-Hun Jin
Determining index values for bits of binary vector by processing masked sub-vector index values

Patent number: 8392692

Abstract: In one embodiment, the present invention determines index values corresponding to bits of a binary vector that have a value of 1. During each clock cycle, a masking technique is applied to M sub-vector index values, where each sub-vector index value corresponds to a different bit of a sub-vector of the binary vector. The masking technique is applied such that (i) the sub-vector index values that correspond to bits having a value of 0 are zeroed out and (ii) the sub-vector index values that correspond to the bits having a value of 1 are left unchanged. The masked sub-vector index values are sorted, and index values are calculated based on the masked sub-vector index values. The index values generated are then distributed uniformly to a number M of index memories such that the M index memories store substantially the same number of index values.
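
The masking, sorting, and distribution steps described above can be sketched in software as follows. This is a minimal model, not the patented circuit: the name `set_bit_indices`, the use of 1-based sub-vector index values (so a zeroed entry is distinguishable from a valid index), and the round-robin distribution policy are all assumptions.

```python
from typing import List


def set_bit_indices(bits: List[int], m: int) -> List[List[int]]:
    """Return m index memories holding the positions of the 1 bits in `bits`.

    Each loop iteration models one clock cycle that handles an m-bit sub-vector.
    """
    memories: List[List[int]] = [[] for _ in range(m)]
    next_memory = 0  # round-robin write pointer (assumed policy)
    for cycle, start in enumerate(range(0, len(bits), m)):
        sub_vector = bits[start:start + m]
        # Masking: keep the (1-based) index where the bit is 1, zero it otherwise.
        masked = [(i + 1) * b for i, b in enumerate(sub_vector)]
        # Sorting groups the zeroed entries ahead of the valid ones.
        masked.sort()
        for sub_index in masked:
            if sub_index == 0:
                continue  # bit was 0, nothing to store
            index_value = cycle * m + (sub_index - 1)
            memories[next_memory].append(index_value)
            next_memory = (next_memory + 1) % m
    return memories


if __name__ == "__main__":
    print(set_bit_indices([1, 0, 1, 1, 0, 0, 1, 0], m=4))
```

Round-robin writing keeps the memories within one entry of each other, which is one way to satisfy the "substantially the same number of index values" condition.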

Type: Grant

Filed: December 12, 2008

Date of Patent: March 5, 2013

Assignee: LSI Corporation

Inventor: Kiran Gunnam
Emulating hexadecimal floating-point operations in non-native systems

Patent number: 8386756

Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and MULTIPLY and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle.

Type: Grant

Filed: April 11, 2011

Date of Patent: February 26, 2013

Assignee: International Business Machines Corporation

Inventors: Eric M. Schwarz, Ronald M. Smith, Sr.
Non-atomic scheduling of micro-operations to perform round instruction

Patent number: 8386755

Abstract: A microprocessor executes an instruction specifying a floating-point input operand having a predetermined size and that instructs the microprocessor to round the floating-point input operand to an integer value using a rounding mode and to return a floating-point result having the same predetermined size. An instruction translator translates the instruction into first and second microinstructions. An execution unit executes the first and second microinstructions. The first microinstruction receives as an input operand the instruction floating-point input operand and generates an intermediate result from the input operand. The second microinstruction receives as an input operand the intermediate result of the first microinstruction and generates the floating-point result of the instruction from the intermediate result. The intermediate result is the same predetermined size as the instruction floating-point input operand.

Type: Grant

Filed: May 20, 2010

Date of Patent: February 26, 2013

Assignee: VIA Technologies, Inc.

Inventors: Tom Elmer, Terry Parks
MERGE OPERATIONS OF DATA ARRAYS BASED ON SIMD INSTRUCTIONS

Publication number: 20130042092

Abstract: A method and apparatus are provided to perform efficient merging operations of two or more streams of data by using SIMD instructions. Streams of data are merged together in parallel and with mitigated or removed conditional branching. The merge operations of the streams of data include Merge AND and Merge OR operations.

Type: Application

Filed: August 31, 2012

Publication date: February 14, 2013

Applicant: SAP Global IP Group

Inventors: Hiroshi INOUE, Moriyoshi Ohara, Hideaki Komatsu
FLOATING POINT EXECUTION UNIT WITH FIXED POINT FUNCTIONALITY

Publication number: 20130036296

Abstract: A floating point execution unit is capable of selectively repurposing one or more adders in an exponent path of the floating point execution unit to perform fixed point addition operations, thereby providing fixed point functionality in the floating point execution unit.

Type: Application

Filed: August 4, 2011

Publication date: February 7, 2013

Applicant: International Business Machines Corporation

Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
Running-AND, running-OR, running-XOR, and running-multiply instructions for processing vectors using a base value from a key element of an input vector

Patent number: 8364938

Abstract: In the described embodiments, a processor captures a value from an element at a key element position in a second input vector into a base value. The processor then generates a result vector by, if the predicate vector is received, for each element in the result vector to the right of the key element position for which a corresponding element in the predicate vector is active, otherwise, for each element in the result vector to the right of the key element position, setting the element in the result vector equal to a result from an associative Boolean operation or a multiplication operation for which the inputs are the base value and a value in each relevant element of a first input vector from an element at the key element position to and including a predetermined element in the first input vector.

Type: Grant

Filed: August 14, 2009

Date of Patent: January 29, 2013

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
PROCESSING VECTORS USING WRAPPING PROPAGATE INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE

Publication number: 20130024672

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive a basis vector, an operand vector, a selection vector, and a control vector are disclosed. The executed instructions may also cause the processor to perform a wrapping propagate operation dependent upon the input vectors.

Type: Application

Filed: September 28, 2012

Publication date: January 24, 2013

Applicant: APPLE INC.

Inventor: APPLE INC.
PROCESSING VECTORS USING WRAPPING NEGATION INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE

Publication number: 20130024671

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a negation operation dependent upon the input vector and the control vector.

Type: Application

Filed: September 27, 2012

Publication date: January 24, 2013

Applicant: Apple Inc.

Inventor: Apple Inc.
PROCESSING VECTORS USING WRAPPING MULTIPLY AND DIVIDE INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE

Publication number: 20130024670

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a product or quotient operation on another input vector dependent upon the input vector and the control vector.

Type: Application

Filed: September 24, 2012

Publication date: January 24, 2013

Applicant: APPLE INC.

Inventor: Jeffry E. Gonion
PROCESSING VECTORS USING WRAPPING SHIFT INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE

Publication number: 20130024669

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a shift operation on another input vector dependent upon the input vector and the control vector.

Type: Application

Filed: September 24, 2012

Publication date: January 24, 2013

Applicant: APPLE INC.

Inventor: APPLE INC.
Running-sum instructions for processing vectors using a base value from a key element of an input vector

Patent number: 8359460

Abstract: The described embodiments provide a processor for generating a result vector with summed values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element in the second input vector. The processor then writes the sum of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.
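
A scalar model of a predicated running sum may help make the roles of the three vectors concrete. The sketch rests on several assumptions that the abstract leaves open: the key element is taken to be the first active element of the predicate, the sum at each active element includes that element's own value from the first input vector, and inactive result elements keep the corresponding value from the second input vector. The name `running_sum` is hypothetical.

```python
from typing import List, Optional


def running_sum(first: List[int], second: List[int],
                predicate: Optional[List[bool]] = None) -> List[int]:
    """Accumulate `first` onto a base value captured from `second`."""
    n = len(first)
    if len(second) != n:
        raise ValueError("input vectors must have the same length")
    active = predicate if predicate is not None else [True] * n
    result = list(second)  # assumption: inactive elements are left as in `second`
    if True not in active:
        return result
    key = active.index(True)  # assumption: key element = first active element
    total = second[key]       # base value captured from the key element
    for i in range(key, n):
        if active[i]:
            total += first[i]
            result[i] = total
    return result


if __name__ == "__main__":
    print(running_sum([1, 2, 3, 4], [10, 0, 0, 0]))
    print(running_sum([1, 2, 3, 4], [5, 6, 7, 8], [False, True, False, True]))
```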

Type: Grant

Filed: August 14, 2009

Date of Patent: January 22, 2013

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
Running-shift instructions for processing vectors using a base value from a key element of an input vector

Patent number: 8359461

Abstract: The described embodiments provide a processor for generating a result vector with shifted values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then determines a number of bit positions to shift the base value using selected relevant elements in the first input vector. The processor then shifts the copy of the base value by the number of bit positions and writes the value into a corresponding element in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.

Type: Grant

Filed: August 14, 2009

Date of Patent: January 22, 2013

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
Processor

Publication number: 20130019084

Abstract: Apparatus (100) is provided which is arranged to accept an input data stream. In some embodiments, the apparatus (100) comprises a sampler arranged to sample the input data stream to provide k samples thereof, wherein each of the samples is n bits long, and a string selector arranged to select m binary strings n bits long from at least a chosen subset of all random binary strings of a predetermined length. The apparatus (100) may further comprise a logical operator arranged to perform a logical function for each of the k samples with each of the selected binary strings to provide a vector, a memory arranged to store a matrix of the vectors generated from k samples, and an address generator arranged to generate RAM address segments from the matrix. In embodiments, the apparatus (100) may comprise a processor for, for example, pattern matching, feature detection, or image recognition.

Type: Application

Filed: September 28, 2010

Publication date: January 17, 2013

Applicant: QINETIQ LIMITED

Inventors: David Arthur Orchard, Rebecca Anne Wilson, Jonathan Alexander Skoyles Pritchard, Martin James Cooper, Terence John Shepherd, Andrew Charles Lewin, Paul Richard Tapster, Charlotte Rachel Helen Bennett
SYSTEM AND APPARATUS FOR GROUP FLOATING-POINT INFLATE AND DEFLATE OPERATIONS

Publication number: 20130013901

Abstract: Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.

Type: Application

Filed: June 11, 2012

Publication date: January 10, 2013

Applicant: MICROUNITY SYSTEMS ENGINEERING, INC.

Inventors: Craig Hansen, John Moussouris, Alexia Massalin
Apparatus for efficient DCT calculations in a SIMD programmable processor

Patent number: 8352528

Abstract: The present invention relates to an efficient implementation of integer and fractional 8-length or 4-length, or 8×8 or 4×4 DCT in a SIMD processor as part of MPEG and other video compression standards.

Type: Grant

Filed: September 20, 2009

Date of Patent: January 8, 2013

Inventor: Tibet Mimar
PROCESSING VECTORS USING WRAPPING ADD AND SUBTRACT INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE

Publication number: 20130007422

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a sum or difference operation on another input vector dependent upon the input vector and the control vector.

Type: Application

Filed: September 11, 2012

Publication date: January 3, 2013

Inventor: Jeffry E. Gonion
Vector SIMD processor

Patent number: 8341204

Abstract: A data processor whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD) and thereby enhancing the operation processing capability is made possible. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and compose the inner product execution units to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and SIMD-compatible composition enhances the level of operation parallelism dramatically.

Type: Grant

Filed: July 2, 2009

Date of Patent: December 25, 2012

Assignee: Renesas Electronics Corporation

Inventors: Fumio Arakawa, Tetsuya Yamada
Logical map table for detecting dependency conditions between instructions having varying width operand values

Patent number: 8335912

Abstract: Techniques and structures are described which allow the detection of certain dependency conditions, including evil twin conditions, during the execution of computer instructions. Information used to detect dependencies may be stored in a logical map table, which may include a content-addressable memory. The logical map table may maintain a logical register to physical register mapping, including entries dedicated to physical registers available as rename registers. In one embodiment, each entry in the logical map table includes a first value usable to indicate whether only a portion of the physical register is valid and whether the physical register includes the most recent update to the logical register being renamed. Use of this first value may allow precise detection of dependency conditions, including evil twin conditions, upon an instruction reading from at least two portions of a logical register having an entry in the logical map table whose first value is set.

Type: Grant

Filed: April 22, 2009

Date of Patent: December 18, 2012

Assignee: Oracle America, Inc.

Inventors: Robert T. Golla, Jama I. Barreh, Jeffrey S. Brooks, Howard L. Levy
Load/Move Duplicate Instructions for a Processor

Publication number: 20120317401

Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.
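
The effect of such an instruction can be modeled on plain integers. In the sketch below, the function name `move_duplicate` and the default widths (a 64-bit portion in a 128-bit destination) are assumptions chosen for illustration; the abstract does not fix either width. The model fills every higher portion of the destination, which for the default widths means exactly one subsequent portion.

```python
def move_duplicate(source: int, portion_bits: int = 64, dest_bits: int = 128) -> int:
    """Copy the low `portion_bits` of `source` into the destination and repeat them."""
    if portion_bits <= 0 or dest_bits % portion_bits != 0:
        raise ValueError("dest_bits must be a positive multiple of portion_bits")
    portion = source & ((1 << portion_bits) - 1)  # first portion of the source
    dest = 0
    for lane in range(dest_bits // portion_bits):
        dest |= portion << (lane * portion_bits)   # duplicate into each portion
    return dest


if __name__ == "__main__":
    print(hex(move_duplicate(0xAB, portion_bits=8, dest_bits=16)))
```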

Type: Application

Filed: June 12, 2012

Publication date: December 13, 2012

Inventor: Patrice Roussel
Instructions with floating point control override

Patent number: 8327120

Abstract: Methods and apparatus relating to instructions with floating point control override are described. In an embodiment, floating point operation settings indicated by a floating point control register may be overridden on a per instruction basis. In an embodiment, at least one of the one or more floating point operation settings is to cause a modification to one of the one or more default settings during execution of the instruction, wherein the second logic is to perform the floating point operation, at least in part, based on the modified default setting. Other embodiments are also described.

Type: Grant

Filed: December 29, 2007

Date of Patent: December 4, 2012

Assignee: Intel Corporation

Inventors: Cristina S. Anderson, Simon Rubanovich, Benny Eitan
DSP BLOCK WITH EMBEDDED FLOATING POINT STRUCTURES

Publication number: 20120290819

Abstract: A specialized processing block includes a first floating-point arithmetic operator stage, a second floating-point arithmetic operator stage, and configurable interconnect within the specialized processing block for routing signals into and out of each of the first and second floating-point arithmetic operator stages. In some embodiments, the configurable interconnect may be configurable to route a plurality of block inputs to inputs of the first floating-point arithmetic operator stage, at least one of the block inputs to an input of the second floating-point arithmetic operator stage, output of the first floating-point arithmetic operator stage to an input of the second floating-point arithmetic operator stage, at least one of the block inputs to a direct-connect output to another such block, output of the first floating-point arithmetic operator stage to the direct-connect output, and a direct-connect input from another such block to an input of the second floating-point arithmetic operator stage.

Type: Application

Filed: July 21, 2011

Publication date: November 15, 2012

Applicant: ALTERA CORPORATION

Inventor: Martin Langhammer
Vector Completion Mask Handling

Publication number: 20120272046

Abstract: Techniques for vector completion mask (VCM) handling are provided. A data structure includes a mask field for each operand of a particular operation. A processor attempts to execute the operation with multiple operands, which are identified in the data structure by the mask fields. If operands are successfully retrieved for execution with the operation, then the corresponding mask field within the data structure is cleared. The processor can reset if any field remains set within the data structure and can re-process the operation with operands that were not previously handled with the operation.
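
The retry behavior can be sketched as a loop over a per-operand mask. Everything named here is hypothetical: `run_with_completion_mask`, the `fetch` callback that returns `None` when an operand cannot be retrieved, and the `max_attempts` bound are assumptions used to model the idea, not details from the application.

```python
from typing import Callable, List, Optional


def run_with_completion_mask(addresses: List[int],
                             fetch: Callable[[int], Optional[int]],
                             max_attempts: int = 8) -> List[Optional[int]]:
    """Fetch one operand per lane, retrying only the lanes still marked pending."""
    mask = [True] * len(addresses)  # a set field means the operand is still pending
    results: List[Optional[int]] = [None] * len(addresses)
    for _ in range(max_attempts):
        for lane, address in enumerate(addresses):
            if not mask[lane]:
                continue            # handled on an earlier pass, do not repeat
            value = fetch(address)
            if value is not None:
                results[lane] = value
                mask[lane] = False  # clear the field once the operand is retrieved
        if not any(mask):
            break                   # every field is clear, the operation is complete
    return results
```

Because cleared fields are skipped, a re-processed operation only touches the operands that were not handled on an earlier pass.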

Type: Application

Filed: June 28, 2012

Publication date: October 25, 2012

Inventors: Stephan Jourdan, Michael Fetterman, Michael Cornaby, Per Hammarlund, Ronak Singhal, Glenn Hinton
METHOD AND APPARATUS FOR FAST BRANCH-FREE VECTOR DIVISION COMPUTATION

Publication number: 20120254585

Abstract: Methods and apparatus for double precision division/inversion vector computations on Single Instruction Multiple Data (SIMD) computing platforms are described. In one embodiment, an input argument is represented by an exponent portion and a fraction portion. These portions are scaled, inverted, and multiplied to generate an inverse version of the input argument. In an embodiment, the inversion of the exponent portion may be done by changing the sign of the exponent. Other embodiments are also described.
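
The exponent/fraction split can be illustrated with a scalar sketch. It is not the SIMD method of the application: the function name `reciprocal` is hypothetical, the plain `1.0 / fraction` stands in for whatever fraction-inversion step the hardware uses, and the guard clause means this version is not branch-free.

```python
import math


def reciprocal(x: float) -> float:
    """Compute 1/x by inverting the fraction and negating the exponent."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        raise ValueError("reciprocal sketch handles finite, nonzero inputs only")
    # x == fraction * 2**exponent, with 0.5 <= |fraction| < 1
    fraction, exponent = math.frexp(x)
    inv_fraction = 1.0 / fraction  # placeholder for the fraction-inversion step
    # Inverting 2**exponent only requires changing the sign of the exponent.
    return math.ldexp(inv_fraction, -exponent)


if __name__ == "__main__":
    print(reciprocal(8.0))
    print(reciprocal(-0.25))
```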

Type: Application

Filed: December 25, 2009

Publication date: October 4, 2012

Inventors: Andrey Kolesov, Valery Kuriakin, Maria Guseva
VALUE CHECK INSTRUCTION FOR PROCESSING VECTORS

Publication number: 20120239911

Abstract: The described embodiments include a processor that executes a ValueCheck instruction. In the described embodiments, the processor receives an input vector and a predicate vector, each including N elements. The processor then executes a ValueCheck instruction, which causes the processor to generate a result vector. When generating the result vector, for each element in a set of elements in the input vector for which a corresponding element of the predicate vector is active, the processor determines if at least one of the elements in the set of elements precedes the element in the input vector and contains a different value than the element in the input vector. If so, the processor writes an identifier for a closest preceding active element that contains the different value into a corresponding element of a result vector. Otherwise, the processor writes a zero in the corresponding element of the result vector.
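
The ValueCheck behavior described above can be modeled with a short scalar loop. Two details are assumptions: the identifier written to the result is taken to be the 1-based position of the element (so that zero can mean "no such element"), and inactive result elements are set to zero. The name `value_check` is illustrative.

```python
from typing import List


def value_check(values: List[int], predicate: List[bool]) -> List[int]:
    """For each active element, find the closest preceding active element that differs."""
    n = len(values)
    result = [0] * n  # assumption: inactive elements produce zero
    for i in range(n):
        if not predicate[i]:
            continue
        # Scan backwards so the first hit is the closest preceding element.
        for j in range(i - 1, -1, -1):
            if predicate[j] and values[j] != values[i]:
                result[i] = j + 1  # assumption: identifier = 1-based position
                break
    return result


if __name__ == "__main__":
    print(value_check([5, 5, 7, 7, 5], [True] * 5))
```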

Type: Application

Filed: May 31, 2012

Publication date: September 20, 2012

Applicant: APPLE INC.

Inventor: Jeffry E. Gonion
CONDITIONAL EXTRACT INSTRUCTION FOR PROCESSING VECTORS

Publication number: 20120239910

Abstract: The described embodiments include a vector processor that executes a ConditionalExtract instruction. In the described embodiments, the processor receives an input scalar variable, an input vector, and a predicate vector, wherein each of the vectors has N elements. The processor then executes the ConditionalExtract instruction, which causes the processor to determine if at least one element in the predicate vector is active. If so, the processor copies a value from a last element in the input vector for which a corresponding element in the predicate vector is active into a scalar result variable. Otherwise, if no elements of the predicate vector are active, the processor copies a value from the input scalar variable into the scalar result variable.
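
A scalar model of this selection rule is short enough to show in full. The function name `conditional_extract` and the Python list representation of the vectors are assumptions for illustration only.

```python
from typing import List


def conditional_extract(scalar: int, values: List[int], predicate: List[bool]) -> int:
    """Return the last active element of `values`, or `scalar` if none is active."""
    for i in range(len(values) - 1, -1, -1):  # scan from the last element backwards
        if predicate[i]:
            return values[i]
    return scalar  # no active element: fall back to the input scalar


if __name__ == "__main__":
    print(conditional_extract(99, [1, 2, 3, 4], [True, False, True, False]))
    print(conditional_extract(99, [1, 2, 3, 4], [False] * 4))
```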

Type: Application

Filed: May 31, 2012

Publication date: September 20, 2012

Applicant: APPLE INC.

Inventor: Jeffry E. Gonion
RUNNING MULTIPLY-ACCUMULATE INSTRUCTIONS FOR PROCESSING VECTORS

Publication number: 20120221837

Abstract: The described embodiments include RunningMAC1P and RunningMAC2P instructions. In the described embodiments, a processor receives a first input vector, a second input vector, a third input vector, and a control vector. Upon executing a RunningMAC1P or a RunningMAC2P instruction, the processor sets a base value equal to a value from an element at a key element position in the first input vector. Next, the processor generates the result vector by, for each element of the result vector to the right of the key element position, setting the element in the result vector equal to a sum of the base value and a result of multiplying a value in each relevant element of the second input vector by a value in a corresponding element of the third input vector, from an element at the key element position to and including a predetermined element in the second input vector.

Type: Application

Filed: May 3, 2012

Publication date: August 30, 2012

Applicant: APPLE INC.

Inventor: Jeffry E. Gonion
SYSTEM AND APPARATUS FOR GROUP FLOATING-POINT ARITHMETIC OPERATIONS

Publication number: 20120204013

Abstract: Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.

Type: Application

Filed: December 2, 2011

Publication date: August 9, 2012

Applicant: MICROUNITY SYSTEMS ENGINEERING, INC.

Inventors: Craig HANSEN, John MOUSSOURIS, Alexia MASSALIN
Method and apparatus implementing a minimal area consumption multiple addend floating point summation function in a vector microprocessor

Patent number: 8239439

Abstract: Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises transferring more than two operands to a vector unit, each operand being transferred to a respective one of a plurality of processing lanes of the vector unit. The operands may be transferred from the vector unit to a dot product unit wherein an arithmetic operation using the more than two operands may be performed.

Type: Grant

Filed: December 13, 2007

Date of Patent: August 7, 2012

Assignee: International Business Machines Corporation

Inventors: Adam J. Muff, Matthew R. Tubbs
PROCESSOR HAVING INCREASED PERFORMANCE AND ENERGY SAVING VIA OPERAND REMAPPING

Publication number: 20120191956

Abstract: Methods and apparatuses are provided for achieving increased processor performance and energy saving via reordering operand mapping as opposed to the actual operand data. The apparatus comprises a plurality of physical registers available for storing operands and includes a unit capable of mapping logical registers to the plurality of physical registers. A multiplexer then reorders the operands by reordering the mapping of logical registers to the plurality of physical registers, which increases processor performance and energy saving by reordering narrow registers instead of wide registers. The method comprises mapping logical registers to physical registers storing operands in a processor and then reordering the mapping to achieve the equivalent of reordering the operands without reordering the operands from the physical registers in the processor.

Type: Application

Filed: January 26, 2011

Publication date: July 26, 2012

Applicant: ADVANCED MICRO DEVICES, INC.

Inventor: Jay FLEISCHMAN
METHOD AND SYSTEM FOR FLOATING POINT ACCELERATION ON FIXED POINT DIGITAL SIGNAL PROCESSORS

Publication number: 20120191955

Abstract: A system for performing floating point operations comprising a floating point multiply function that utilizes one or more fixed point functional blocks of a processor and one or more dedicated floating point functional blocks of the processor. A floating point add function that utilizes one or more fixed point functional blocks of a processor and one or more dedicated floating point functional blocks of the processor. A floating point normalize function that utilizes one or more fixed point functional blocks of a processor and one or more dedicated floating point functional blocks of the processor.

Type: Application

Filed: January 20, 2011

Publication date: July 26, 2012

Inventors: Ragnar H. Jonsson, Sveinn V. Grimsson, Vilhjalmur Thorvaldsson, Trausti Thormundsson
PREDICTING A RESULT FOR AN ACTUAL INSTRUCTION WHEN PROCESSING VECTOR INSTRUCTIONS

Publication number: 20120191957

Abstract: The described embodiments provide a processor that executes vector instructions. In the described embodiments, while dispatching instructions at runtime, the processor encounters an Actual instruction. Upon determining that a result of the Actual instruction is predictable, the processor dispatches a prediction micro-operation associated with the Actual instruction, wherein the prediction micro-operation generates a predicted result vector for the Actual instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation to generate the predicted result vector, if the predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, generating the predicted result vector comprises setting the element of the predicted result vector to true.

Type: Application

Filed: April 20, 2011

Publication date: July 26, 2012

Applicant: APPLE INC.

Inventor: Jeffry E. Gonion
PROCESSOR HAVING INCREASED EFFECTIVE PHYSICAL FILE SIZE VIA REGISTER MAPPING

Publication number: 20120173854

Abstract: Methods and apparatuses are provided for an efficient technique for processing registers having a known value while improving processor performance. The apparatus comprises a processor having a plurality of physical registers available for use in computations and a decoder for determining that a logical register contains a known value. A renaming unit maps the logical register containing the known value to an address outside an address range for the plurality of physical registers once the known value is determined. Thereafter, scheduling and execution units perform computations using the known value without storing the known value in one of the plurality of physical registers. The method comprises determining that a logical register of a processor has a known value and then mapping that logical register to a physical register address outside an expected range of physical register addresses, which indicates that the logical register represents the known value.
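
A toy rename table shows how out-of-range addresses can stand for known values. The class name `RenameTable`, the register count, and the choice of 0 and 1 as the known values are assumptions. The sketch also detects a known value by comparing the written value, which is a simplification: the abstract describes a decoder making that determination.

```python
NUM_PHYSICAL = 8
# Hypothetical encoding: addresses past the physical range denote known values.
KNOWN_VALUES = {NUM_PHYSICAL: 0, NUM_PHYSICAL + 1: 1}


class RenameTable:
    def __init__(self) -> None:
        self.physical = [0] * NUM_PHYSICAL
        self.free = list(range(NUM_PHYSICAL))
        self.mapping = {}  # logical register name -> address

    def write(self, logical: str, value: int) -> None:
        self._release(logical)
        for address, known in KNOWN_VALUES.items():
            if value == known:
                self.mapping[logical] = address  # no physical register consumed
                return
        address = self.free.pop(0)
        self.physical[address] = value
        self.mapping[logical] = address

    def read(self, logical: str) -> int:
        address = self.mapping[logical]
        if address >= NUM_PHYSICAL:  # out-of-range address encodes the value itself
            return KNOWN_VALUES[address]
        return self.physical[address]

    def _release(self, logical: str) -> None:
        old = self.mapping.pop(logical, None)
        if old is not None and old < NUM_PHYSICAL:
            self.free.append(old)    # return the physical register to the free list
```

In this model, writing a known value leaves the free list untouched, which is the sense in which the effective size of the physical register file grows.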

Type: Application

Filed: December 29, 2010

Publication date: July 5, 2012

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Jay FLEISCHMAN, Debjit Das Sarma, Michael SEDMAK
Graphics processing method and apparatus implementing window system

Patent number: 8203567

Abstract: A graphics processing method and apparatus described herein is capable of converting graphics processing of a window system into a vector-based application program interface (API) format usable in the GPU and performing the converted graphics processing in the GPU. For example, the vector-based API may be based on an OpenVG standard or an EGL standard.

Type: Grant

Filed: July 3, 2009

Date of Patent: June 19, 2012

Assignee: Samsung Electronics Co., Ltd.

Inventors: Dong-kyun Jeong, Soo-chan Lim, Na-min Kim
REDUCING POWER CONSUMPTION IN MULTI-PRECISION FLOATING POINT MULTIPLIERS

Publication number: 20120151191

Abstract: Methods and apparatus relating to reducing power consumption in multi-precision floating point multipliers are described. In an embodiment, certain portions of a multiplier are disabled in response to two or more multiplication operations with the same data size and data type occurring back-to-back. Other embodiments are also claimed and described.

Type: Application

Filed: December 14, 2010

Publication date: June 14, 2012

Inventors: Brent R. Boswell, Thierry Pons, Tom Aviram
Apparatus and method for performing re-arrangement operations on data

Patent number: 8200948

Abstract: An apparatus and method are provided for performing re-arrangement operations on data. The data processing apparatus has a register data store with a plurality of registers for storing data, and processing logic for performing a sequence of operations on data including at least one re-arrangement operation. The processing logic has scalar processing logic for performing scalar operations and SIMD processing logic for performing SIMD operations. The SIMD processing logic is responsive to a re-arrangement instruction specifying a family of re-arrangement operations to perform a selected re-arrangement operation from that family on a plurality of data elements constituted by data in one or more registers identified by the re-arrangement instruction. The selected re-arrangement operation is dependent on at least one parameter provided by the scalar processing logic, that parameter identifying a data element width for the data elements on which the selected re-arrangement operation is performed.

Type: Grant

Filed: December 4, 2007

Date of Patent: June 12, 2012

Assignee: ARM Limited

Inventors: Daniel Kershaw, Dominic Hugo Symes, Alastair Reid
Processor Architecture for Executing Wide Transform Slice Instructions

Publication number: 20120117441

Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.

Type: Application

Filed: January 19, 2012

Publication date: May 10, 2012

Applicant: MicroUnity Systems Engineering, Inc.

Inventors: Craig Hansen, John Moussouris, Alexia Massalin
METHOD, SYSTEM, AND PRODUCT FOR PERFORMING UNIFORMLY FINE-GRAIN DATA PARALLEL COMPUTING

Publication number: 20120096244

Abstract: A method is disclosed that includes computing, using at least one uniformly fine-grain data parallel computing unit, a mean-square error regression within a regression clustering algorithm. The mean-square error regression is represented in the form of at least one summation of a vector-vector multiplication. A computer program product and a computer system are also disclosed.

Type: Application

Filed: October 14, 2010

Publication date: April 19, 2012

Inventors: Bin Zhang, Ren Wu, Meichun Hsu
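
As a rough illustration of the idea in publication 20120096244 above, the sketch below builds a least-squares fit entirely from summations of vector-vector multiplications. The abstract does not state the regression model, so the straight-line model and the names `dot`, `fit_line`, and `mean_square_error` are assumptions made for this example. It covers only the regression step, not the clustering.

```python
def dot(u, v):
    """Summation of an element-wise vector-vector multiplication."""
    return sum(a * b for a, b in zip(u, v))


def fit_line(x, y):
    """Least-squares fit y ~ slope * x + intercept, using only dot products."""
    n = len(x)
    ones = [1.0] * n
    sxx, sxy = dot(x, x), dot(x, y)      # sum of x*x and x*y
    sx, sy = dot(x, ones), dot(y, ones)  # plain sums, written as dot products
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


def mean_square_error(x, y, slope, intercept):
    """Mean of the squared residuals, again as a dot product."""
    residual = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    return dot(residual, residual) / len(x)
```

Each `dot` call is an independent multiply-and-sum over all samples, which is the kind of work a fine-grain data parallel unit could spread across many lanes.
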
Replicating opcode to other lanes and modifying argument register to others in vector portion for parallel operation

Patent number: 8161266

Abstract: An improved superscalar processor. The processor includes multiple lanes, allowing multiple instructions in a bundle to be executed in parallel. In vector mode, the parallel lanes may be used to execute multiple instances of a bundle, representing multiple iterations of the bundle in a vector run. Scheduling logic determines whether, for each bundle, multiple instances can be executed in parallel. If multiple instances can be executed in parallel, coupling circuitry couples an instance of the bundle from one lane into one or more other lanes. In each lane, register addresses are renamed to ensure proper execution of the bundles in the vector run. Additionally, the processor may include a register bank separate from the architectural register file. Renaming logic can generate addresses to this separate register bank that are longer than used to address architectural registers, allowing longer vectors and more efficient processor operation.

Type: Grant

Filed: December 22, 2008

Date of Patent: April 17, 2012

Assignee: STMicroelectronics Inc.

Inventor: Osvaldo M. Colavin
PERFORMING A MULTIPLY-MULTIPLY-ACCUMULATE INSTRUCTION

Publication number: 20120079252

Abstract: In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed.

Type: Application

Filed: September 24, 2010

Publication date: March 29, 2012

Inventor: Eric S. Sprangle
FUNCTIONAL UNIT FOR VECTOR LEADING ZEROES, VECTOR TRAILING ZEROES, VECTOR OPERAND 1s COUNT AND VECTOR PARITY CALCULATION

Publication number: 20120079253

Abstract: A method of performing vector operations on a semiconductor chip is described. The method includes performing a first vector instruction with a vector functional unit implemented on the semiconductor chip and performing a second vector instruction with the vector functional unit. The first vector instruction is a vector multiply add instruction. The second vector instruction is a vector leading zeros count instruction.

Type: Application

Filed: September 24, 2010

Publication date: March 29, 2012

Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver, Eric W. Mahurin
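
The per-element operations named in the title of publication 20120079253 above can be sketched in software as follows. This is only a behavioural model under assumptions: the 8-bit default element width and all function names are hypothetical, and nothing here reflects how the patented functional unit shares hardware with the multiply-add path.

```python
def vector_leading_zeros(vec, width=8):
    """Count of zero bits above the highest set bit, per element."""
    return [width - x.bit_length() for x in vec]


def vector_trailing_zeros(vec, width=8):
    """Count of zero bits below the lowest set bit, per element."""
    # (x & -x) isolates the lowest set bit; an all-zero element yields `width`.
    return [width if x == 0 else (x & -x).bit_length() - 1 for x in vec]


def vector_ones_count(vec):
    """Number of set bits per element."""
    return [bin(x).count("1") for x in vec]


def vector_parity(vec):
    """1 if an element has an odd number of set bits, else 0."""
    return [bin(x).count("1") & 1 for x in vec]
```

Elements are assumed to be non-negative integers that fit in `width` bits.
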
VECTOR INDEX INSTRUCTION FOR PROCESSING VECTORS

Publication number: 20120060020

Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a start value and an increment value, and optionally receiving a predicate vector with N elements as inputs. The processor then executes the vector instruction. Executing the vector instruction causes the processor to generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element in the result vector, the processor sets the element in the result vector equal to the start value plus a product of the increment value multiplied by a specified number of elements to the left of the element in the result vector.

Type: Application

Filed: November 8, 2011

Publication date: March 8, 2012

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
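
A minimal sketch of the vector index behaviour described in publication 20120060020 above: each result element is the start value plus the increment times the number of elements to its left. Two points are assumptions not settled by the abstract: "elements to the left" is taken to mean the element's position, and inactive elements are filled with a caller-supplied `inactive` value. The function name and parameters are hypothetical.

```python
def vector_index(start, increment, n, predicate=None, inactive=0):
    """Return an n-element vector [start, start + increment, ...].

    predicate: optional sequence of n truthy/falsy values; only active
               positions receive a computed index (assumed behaviour).
    inactive:  value placed in positions whose predicate is not active.
    """
    result = []
    for i in range(n):  # i = number of elements to the left of element i
        if predicate is None or predicate[i]:
            result.append(start + increment * i)
        else:
            result.append(inactive)
    return result
```

For example, `vector_index(10, 3, 4)` gives `[10, 13, 16, 19]`, and adding `predicate=[1, 0, 1, 0]` leaves positions 1 and 3 at the `inactive` value.
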
SIMD processor performing fractional multiply operation with saturation history data processing to generate condition code flags

Patent number: 8131981

Abstract: A data processing system, apparatus and method for performing fractional multiply operations is disclosed. The system includes a memory that stores instructions for SIMD operations and a processing core. The processing core includes registers that store operands for the fractional multiply operations. A coprocessor included in the processing core performs the fractional multiply operations on the operands and stores the result in a destination register that is also included in the processing core.

Type: Grant

Filed: August 12, 2009

Date of Patent: March 6, 2012

Assignee: Marvell International Ltd.

Inventors: Nigel C. Paver, Bradley C. Aldrich
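
To make the title of patent 8131981 above more concrete, here is a hedged sketch of a SIMD fractional multiply with saturation tracking. The Q15 fixed-point format, the per-lane saturation list, and the "any lane saturated" flag are assumptions chosen for illustration. The abstract does not specify the operand format or how the condition code flags are derived.

```python
Q15_MIN, Q15_MAX = -32768, 32767  # range of a signed 16-bit Q15 fraction


def q15_fractional_multiply(a, b):
    """Multiply two vectors of Q15 fractions lane by lane.

    Returns (results, saturated), where saturated[i] records whether
    lane i had to be clamped to the Q15 range.
    """
    results, saturated = [], []
    for x, y in zip(a, b):
        product = (x * y) >> 15  # drop the extra 15 fraction bits
        clamped = max(Q15_MIN, min(Q15_MAX, product))
        results.append(clamped)
        saturated.append(clamped != product)
    return results, saturated


def condition_flag(saturated):
    """Hypothetical flag: set if any lane saturated."""
    return any(saturated)
```

The only Q15 product that overflows is -1.0 times -1.0 (`-32768 * -32768`), which clamps to `32767` and sets the saturation record for that lane.
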
System and apparatus for group floating-point arithmetic operations

Patent number: 8117426

Abstract: Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.

Type: Grant

Filed: July 27, 2007

Date of Patent: February 14, 2012

Assignee: MicroUnity Systems Engineering, Inc.

Inventors: Craig Hansen, John Moussouris, Alexia Massalin
Handling floating point operations

Patent number: 8108657

Abstract: A computing system capable of handling floating point operations during program code conversion is described, comprising a processor including a floating point unit and an integer unit. The computing system further comprises a translator unit arranged to receive subject code instructions including at least one instruction relating to a floating point operation and in response to generate corresponding target code for execution on said processor. To handle floating point operations a floating point status unit and a floating point control unit are provided within the translator. These units cause the translator unit to generate either: target code for performing the floating point operations directly on the floating point unit; or target code for performing the floating point operations indirectly, for example using a combination of the integer unit and the floating point unit. In this way the efficiency of the computing system is improved.

Type: Grant

Filed: February 28, 2008

Date of Patent: January 31, 2012

Assignee: International Business Machines Corporation

Inventors: Gavin Barraclough, James Richard Mulcahy, David James Rigby
Efficient parallel floating point exception handling in a processor

Patent number: 8103858

Abstract: Methods and apparatus are disclosed for handling floating point exceptions in a processor that executes single-instruction multiple-data (SIMD) instructions. In one embodiment a numerical exception is identified for a SIMD floating point operation and SIMD micro-operations are initiated to generate two packed partial results of a packed result for the SIMD floating point operation. A SIMD denormalization micro-operation is initiated to combine the two packed partial results and to denormalize one or more elements of the combined packed partial results to generate a packed result for the SIMD floating point operation having one or more denormal elements. Flags are set and stored with packed partial results to identify denormal elements. In one embodiment a SIMD normalization micro-operation is initiated to generate a normalized pseudo internal floating point representation prior to the SIMD floating point operation when it uses multiplication.

Type: Grant

Filed: June 30, 2008

Date of Patent: January 24, 2012

Assignee: Intel Corporation

Inventors: Zeev Sperber, Shachar Finkelstein, Gregory Pribush, Arnit Gradstein, Guy Bale, Thierry Pons
Matrix Multiplication Operations Using Pair-Wise Load and Splat Operations

Publication number: 20120011348

Abstract: Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.

Type: Application

Filed: July 12, 2010

Publication date: January 12, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels, Valentina Salapura
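
The steps in the abstract of publication 20120011348 above (vector load, pair-wise load and splat, multiply, accumulate) can be modelled loosely as below. The 4-lane register, the way rows of the first operand are packed into it, and both function names are assumptions made for this sketch. The abstract does not fix the data layout.

```python
def load_and_splat_pair(column, k, lanes=4):
    """Load column[k], column[k+1] and replicate the pair across the lanes."""
    pair = [column[k], column[k + 1]]
    return [pair[i % 2] for i in range(lanes)]  # [b0, b1, b0, b1]


def matmul_pairwise_splat(A, B):
    """C = A * B for a 2 x K matrix A (K even) and a K x N matrix B."""
    K, N = len(B), len(B[0])
    C = [[0] * N for _ in range(2)]
    for j in range(N):
        column = [B[k][j] for k in range(K)]
        for k in range(0, K, 2):
            # Vector load: two elements from each of the two rows of A.
            va = [A[0][k], A[0][k + 1], A[1][k], A[1][k + 1]]
            vb = load_and_splat_pair(column, k)
            prod = [x * y for x, y in zip(va, vb)]  # lane-wise multiply
            # Accumulate the partial products into the two results.
            C[0][j] += prod[0] + prod[1]
            C[1][j] += prod[2] + prod[3]
    return C
```

Splatting the pair lets one lane-wise multiply serve two rows of the first operand at once, and the inner loop repeats for each further pair of scalars.
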
MEMORY SAFETY OF FLOATING-POINT COMPUTATIONS

Publication number: 20110314251

Abstract: Concepts and technologies are described herein for determining memory safety of floating-point computations. The concepts and technologies described herein analyze code to determine if any floating-point computations exist in the code, and if so, if the floating-point computations are memory safe. The analysis can include identifying floating-point instructions and conditional statements in the code. The code can be symbolically executed, and behavior of the floating-point instructions and the conditional statements can be monitored to determine if a floating point calculation is ever involved in computation of any memory address during the execution of the code.

Type: Application

Filed: June 17, 2010

Publication date: December 22, 2011

Applicant: Microsoft Corporation

Inventors: Patrice Godefroid, Johannes Kinder
SYSTEM AND METHOD FOR PROCESSING REGULAR EXPRESSIONS USING SIMD AND PARALLEL STREAMS

Publication number: 20110302394

Abstract: A system and method for performing regular expression computations includes loading a plurality of input values corresponding to one or more input streams as elements of a vector register implemented on programmable storage media. New state indexes are computed from the input values and the current state values corresponding to different automata by using single instruction, multiple data (SIMD) vector operations. New state values associated with the different automata are determined using the new state indexes to look up new state values such that state transitions for a plurality of regular expressions are processed concurrently.

Type: Application

Filed: June 8, 2010

Publication date: December 8, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Gregory F. Russell, Valentina Salapura, Daniele P. Scarpazza
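
The index-then-lookup scheme in publication 20110302394 above can be sketched with plain lists standing in for vector registers. The flattened shared transition table, the `state * num_symbols + symbol` index formula, and the names below are assumptions for illustration, not the layout claimed in the application.

```python
def simd_dfa_step(states, inputs, table, num_symbols):
    """Advance every lane's automaton by one input symbol."""
    # One vector multiply-add: new state index per lane.
    indexes = [s * num_symbols + c for s, c in zip(states, inputs)]
    # One vector gather: look up the new state for each lane.
    return [table[i] for i in indexes]


def run_streams(start_states, streams, table, num_symbols):
    """Feed equal-length input streams through their automata in lockstep."""
    states = list(start_states)
    for inputs in zip(*streams):
        states = simd_dfa_step(states, inputs, table, num_symbols)
    return states


# Two automata over symbols a=0, b=1 sharing one flattened table:
# states 0-2 accept strings ending in "ab" (accepting state 2),
# states 3-5 accept strings containing "bb" (accepting state 5).
TABLE = [1, 0,  1, 2,  1, 0,  3, 4,  3, 5,  5, 5]
```

Here lane 0 could scan "aab" while lane 1 scans "abb", with both transitions handled by the same two vector operations per step.
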
Providing extended precision in SIMD vector arithmetic operations

Patent number: 8074058

Abstract: The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element.

Type: Grant

Filed: June 8, 2009

Date of Patent: December 6, 2011

Assignee: MIPS Technologies, Inc.

Inventors: Timothy J. Van Hook, Peter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian
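
The flow in the abstract of patent 8074058 above (operate on N-bit elements, keep the wide result in an accumulator, then transform back to N bits) might look like the following. Multiplication as the arithmetic instruction, signed elements, round-half-up shifting, and the function name are all assumptions for this sketch.

```python
def simd_multiply_extended(a, b, n_bits=16, shift=0):
    """Multiply N-bit elements, then narrow the wide results back to N bits.

    Returns (accumulator, narrowed): the full-precision products and the
    N-bit values after optional rounding shift and clamping.
    """
    accumulator = [x * y for x, y in zip(a, b)]  # wider than N bits
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    narrowed = []
    for value in accumulator:
        if shift > 0:
            # Add half of the discarded range before shifting (round half up).
            value = (value + (1 << (shift - 1))) >> shift
        narrowed.append(max(lo, min(hi, value)))  # clamp to signed N bits
    return accumulator, narrowed
```

Keeping the accumulator separate means intermediate results are not truncated until the final transform, which is where the extended precision comes from.
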
