Floating Point Or Vector Patents (Class 712/222)
-
Publication number: 20130067203Abstract: A swizzle pattern generator is provided to reduce an overhead due to execution of a swizzle instruction in vector processing. The swizzle pattern generator is configured to provide swizzle patterns with respect to data sets of at least one vector register or vector processing unit. The swizzle pattern generator may be reconfigurable to generate various swizzle patterns for different vector operations.Type: ApplicationFiled: September 14, 2012Publication date: March 14, 2013Applicant: SAMSUNG ELECTRONICS CO., LTD.Inventors: Moo-Kyoung Chung, Woong Seo, Ho-Young Kim, Soo-Jung Ryu, Dong-Hoon Yoo, Jin-Seok Lee, Yeon-Gon Cho, Chang-Moo Kim, Seung-Hun Jin
-
Patent number: 8392692Abstract: In one embodiment, the present invention determines index values corresponding to bits of a binary vector that have a value of 1. During each clock cycle, a masking technique is applied to M sub-vector index values, where each sub-vector index value corresponds to a different bit of a sub-vector of the binary vector. The masking technique is applied such that (i) the sub-vector index values that correspond to bits having a value of 0 are zeroed out and (ii) the sub-vector index values that correspond to the bits having a value of 1 are left unchanged. The masked sub-vector index values are sorted, and index values are calculated based on the masked sub-vector index values. The index values generated are then distributed uniformly to a number M of index memories such that the M index memories store substantially the same number of index values.Type: GrantFiled: December 12, 2008Date of Patent: March 5, 2013Assignee: LSI CorporationInventor: Kiran Gunnam
-
Patent number: 8386756Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle.Type: GrantFiled: April 11, 2011Date of Patent: February 26, 2013Assignee: International Business Machines CorporationInventors: Eric M. Schwarz, Ronald M. Smith, Sr.
-
Patent number: 8386755Abstract: A microprocessor executes an instruction specifying a floating-point input operand having a predetermined size and that instructs the microprocessor to round the floating-point input operand to an integer value using a rounding mode and to return a floating-point result having the same predetermined size. An instruction translator translates the instruction into first and second microinstructions. An execution unit executes the first and second microinstructions. The first microinstruction receives as an input operand the instruction floating-point input operand and generates an intermediate result from the input operand. The second microinstruction receives as an input operand the intermediate result of the first microinstruction and generates the floating-point result of the instruction from the intermediate result. The intermediate result is the same predetermined size as the instruction floating-point input operand.Type: GrantFiled: May 20, 2010Date of Patent: February 26, 2013Assignee: VIA Technologies, Inc.Inventors: Tom Elmer, Terry Parks
-
Publication number: 20130042092Abstract: A method and apparatus are provided to perform efficient merging operations of two or more streams of data by using SIMD instruction. Streams of data are merged together in parallel and with mitigated or removed conditional branching. The merge operations of the streams of data include Merge AND and Merge OR operations.Type: ApplicationFiled: August 31, 2012Publication date: February 14, 2013Applicant: SAP Global IP GroupInventors: Hiroshi INOUE, Moriyoshi Ohara, Hideaki Komatsu
-
Publication number: 20130036296Abstract: A floating point execution unit is capable of selectively repurposing one or more adders in an exponent path of the floating point execution unit to perform fixed point addition operations, thereby providing fixed point functionality in the floating point execution unit.Type: ApplicationFiled: August 4, 2011Publication date: February 7, 2013Applicant: International Business Machines CorporationInventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
-
Patent number: 8364938Abstract: In the described embodiments, a processor captures a value from an element at a key element position in a second input vector into a base value. The processor then generates a result vector by, if the predicate vector is received, for each element in the result vector to the right of the key element position for which a corresponding element in the predicate vector is active, otherwise, for each element in the result vector to the right of the key element position, setting the element in the result vector equal to a result from an associative Boolean operation or a multiplication operation for which the inputs are the base value and a value in each relevant element of a first input vector from an element at the key element position to and including a predetermined element in the first input vector.Type: GrantFiled: August 14, 2009Date of Patent: January 29, 2013Assignee: Apple Inc.Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
-
Publication number: 20130024672Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive a basis vector, an operand vector, a selection vector, and a control vector are disclosed. The executed instructions may also cause the processor to perform a wrapping propagate operation dependent upon the input vectors.Type: ApplicationFiled: September 28, 2012Publication date: January 24, 2013Applicant: APPLE INC.Inventor: APPLE INC.
-
Publication number: 20130024671Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a negation operation dependent upon the input vector and the control vector.Type: ApplicationFiled: September 27, 2012Publication date: January 24, 2013Applicant: Apple Inc.Inventor: Apple Inc.
-
Publication number: 20130024670Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a product or quotient operation on another input vector dependent upon the input vector and the control vector.Type: ApplicationFiled: September 24, 2012Publication date: January 24, 2013Applicant: APPLE INC.Inventor: Jeffry E. Gonion
-
Publication number: 20130024669Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a shift operation on another input vector dependent upon the input vector and the control vector.Type: ApplicationFiled: September 24, 2012Publication date: January 24, 2013Applicant: APPLE INC.Inventor: APPLE INC.
-
Patent number: 8359460Abstract: The described embodiments provide a processor for generating a result vector with summed values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element in the second input vector. The processor then writes the sum of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.Type: GrantFiled: August 14, 2009Date of Patent: January 22, 2013Assignee: Apple Inc.Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
-
Patent number: 8359461Abstract: The described embodiments provide a processor for generating a result vector with shifted values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then determines a number of bit positions to shift the base value using selected relevant elements in the first input vector. The processor then shifts the copy of the base value by the number of bit positions and writes the value into a corresponding element in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.Type: GrantFiled: August 14, 2009Date of Patent: January 22, 2013Assignee: Apple Inc.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Publication number: 20130019084Abstract: Apparatus (100) is provided which is arranged to accept an input data stream. In some embodiments, the apparatus (100) comprises a sampler arranged to sample the input data stream to provide k samples thereof, wherein each of the samples is n bits long and a string selector arranged to select m binary strings n bits long from at least a chosen subset of all random binary strings of a predetermined length. The apparatus (100) may further comprise a logical operator arranged to perform a logical function for each of the k samples with each of the selected binary strings to provide a vector, a memory arranged to store a matrix of the vectors generated from k samples, and an address generator arranged to generate RAM address segments from the matrix. In embodiments, the apparatus (100) may comprise a processor for, for example, pattern matching; feature detection, image recognition.Type: ApplicationFiled: September 28, 2010Publication date: January 17, 2013Applicant: QINETIQ LIMITEDInventors: David Arthur Orchard, Rebecca Anne Wilson, Jonathan Alexander Skoyles Pritchard, Martin James Cooper, Terence John Shepherd, Andrew Charles Lewin, Paul Richard Tapster, Charlotte Rachel Helen Bennett
-
Publication number: 20130013901Abstract: Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.Type: ApplicationFiled: June 11, 2012Publication date: January 10, 2013Applicant: MICROUNITY SYSTEMS ENGINEERING, INC.Inventors: Craig Hansen, John Moussouris, Alexia Massalin
-
Patent number: 8352528Abstract: The present invention relates to a efficient implementation of integer and fractional 8-length or 4-length, or 8×8 or 4×4 DCT in a SIMD processor as part of MPEG and other video compression standards.Type: GrantFiled: September 20, 2009Date of Patent: January 8, 2013Inventor: Tibet Mimar
-
Publication number: 20130007422Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a sum or difference operation on another input vector dependent upon the input vector and the control vector.Type: ApplicationFiled: September 11, 2012Publication date: January 3, 2013Inventor: Jeffry E. Gonion
-
Patent number: 8341204Abstract: A data processor whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD) and thereby enhancing the operation processing capability is made possible. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and compose the inner product execution units to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and SIMD-compatible composition enhances the level of operation parallelism dramatically.Type: GrantFiled: July 2, 2009Date of Patent: December 25, 2012Assignee: Renesas Electronics CorporationInventors: Fumio Arakawa, Tetsuya Yamada
-
Patent number: 8335912Abstract: Techniques and structures are described which allow the detection of certain dependency conditions, including evil twin conditions, during the execution of computer instructions. Information used to detect dependencies may be stored in a logical map table, which may include a content-addressable memory. The logical map table may maintain a logical register to physical register mapping, including entries dedicated to physical registers available as rename registers. In one embodiment, each entry in the logical map table includes a first value usable to indicate whether only a portion of the physical register is valid and whether the physical register includes the most recent update to the logical register being renamed. Use of this first value may allow precise detection of dependency conditions, including evil twin conditions, upon an instruction reading from at least two portions of a logical register having an entry in the logical map table whose first value is set.Type: GrantFiled: April 22, 2009Date of Patent: December 18, 2012Assignee: Oracle America, Inc.Inventors: Robert T. Golla, Jama I. Barreh, Jeffrey S. Brooks, Howard L. Levy
-
Publication number: 20120317401Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicate that first portion of bits in a subsequent portion of the destination register.Type: ApplicationFiled: June 12, 2012Publication date: December 13, 2012Inventor: Patrice Roussel
-
Patent number: 8327120Abstract: Methods and apparatus relating to instructions with floating point control override are described. In an embodiment, floating point operation settings indicated by a floating point control register may be overridden on a per instruction basis. In an embodiment, at least one of the one or more floating point operation settings is to cause a modification to one of the one or more default settings during execution of the instruction, wherein the second logic is to perform the floating point operation, at least in part, based on the modified default setting. Other embodiments are also described.Type: GrantFiled: December 29, 2007Date of Patent: December 4, 2012Assignee: Intel CorporationInventors: Cristina S. Anderson, Simon Rubanovich, Benny Eitan
-
Publication number: 20120290819Abstract: A specialized processing block includes a first floating-point arithmetic operator stage, a second floating-point arithmetic operator stage, and configurable interconnect within the specialized processing block for routing signals into and out of each of the first and second floating-point arithmetic operator stages. In some embodiments, the configurable interconnect may be configurable to route a plurality of block inputs to inputs of the first floating-point arithmetic operator stage, at least one of the block inputs to an input of the second floating-point arithmetic operator stage, output of the first floating-point arithmetic operator stage to an input of the second floating-point arithmetic operator stage, at least one of the block inputs to a direct-connect output to another such block, output of the first floating-point arithmetic operator stage to the direct-connect output, and a direct-connect input from another such block to an input of the second floating-point arithmetic operator stage.Type: ApplicationFiled: July 21, 2011Publication date: November 15, 2012Applicant: ALTERA CORPORATIONInventor: Martin Langhammer
-
Publication number: 20120272046Abstract: Techniques for vector completion mask (VCM) handling are provided. A data structure includes a mask field for each operand of a particular operation. A processor attempts to execute the operation with multiple operands, which are identified in the data structure by the mask fields. If operands are successfully retrieved for execution with the operation, then the corresponding mask field within the data structure is cleared. The processor can reset if any field remains set within the data structure and can re-process the operation with operands that were not previously handled with the operation.Type: ApplicationFiled: June 28, 2012Publication date: October 25, 2012Inventors: Stephan Jourdan, Michael Fetterman, Michael Cornaby, Per Hammarlund, Ronak Signhal, Glenn Hinton
-
Publication number: 20120254585Abstract: Methods and apparatus for double precision division/inversion vector computations on Single Instruction Multiple Data (SIMD) computing platforms are described. In one embodiment, an input argument is represented by an exponent portion and a fraction portion. These portions are scaled, inverted, and multiplied to generate an inverse version of the input argument. In an embodiment, the inversion of the exponent portion may be done by changing the sign of the exponent. Other embodiments are also described.Type: ApplicationFiled: December 25, 2009Publication date: October 4, 2012Inventors: Andrey Kolesov, Valery Kuriakin, Maria Guseva
-
Publication number: 20120239911Abstract: The described embodiments include a processor that executes a ValueCheck instruction. In the described embodiments, the processor receives an input vector and a predicate vector, each including N elements. The processor then executes a ValueCheck instruction, which causes the processor to generate a result vector. When generating the result vector, for each element in a set of elements in the input vector for which a corresponding element of the predicate vector is active, the processor determines if at least one of the elements in the set of elements precedes the element in the input vector and contains a different value than the element in the input vector. If so, the processor writes an identifier for a closest preceding active element that contains the different value into a corresponding element of a result vector. Otherwise, the processor writes a zero in the corresponding element of the result vector.Type: ApplicationFiled: May 31, 2012Publication date: September 20, 2012Applicant: APPLE INC.Inventor: Jeffry E. Gonion
-
Publication number: 20120239910Abstract: The described embodiments include a vector processor that executes a ConditionalExtract instruction. In the described embodiments, the processor receives an input scalar variable, an input vector, and a predicate vector, wherein each of the vectors has N elements. The processor then executes the ConditionalExtract instruction, which causes the processor to determine if at least one element in the predicate vector is active. If so, the processor copies a value from a last element in the input vector for which a corresponding element in the predicate vector is active into a scalar result variable. Otherwise, of no elements of the predicate vector are active, the processor copies a value from the input scalar variable into the scalar result variable.Type: ApplicationFiled: May 31, 2012Publication date: September 20, 2012Applicant: APPLE INC.Inventor: Jeffry E. Gonion
-
Publication number: 20120221837Abstract: The described embodiments include RunningMAC1P and RunningMAC2P instructions. In the described embodiments, a processor receives a first input vector, a second input vector, a third input vector, and a control vector. Upon executing a RunningMAC1P or a RunningMAC2P instruction, the processor sets a base value equal to a value from an element at a key element position in the first input vector. Next, the processor generates the result vector by, for each element of the result vector to the right of the key element position, setting the element in the result vector equal to a sum of the base value and a result of multiplying a value in each relevant element of the second input vector by a value in a corresponding element of the third input vector, from an element at the key element position to and including a predetermined element in the second input vector.Type: ApplicationFiled: May 3, 2012Publication date: August 30, 2012Applicant: APPLE INC.Inventor: Jeffry E. Gonion
-
Publication number: 20120204013Abstract: Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.Type: ApplicationFiled: December 2, 2011Publication date: August 9, 2012Applicant: MICROUNITY SYSTEMS ENGINEERING, INC.Inventors: Craig HANSEN, John MOUSSOURIS, Alexia MASSALIN
-
Patent number: 8239439Abstract: Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises transferring more than two operands to a vector unit, each operand being transferred to a respective one of a plurality of processing lanes of the vector unit. The operands may be transferred from the vector unit to a dot product unit wherein an arithmetic operation using the more than two operands may be performed.Type: GrantFiled: December 13, 2007Date of Patent: August 7, 2012Assignee: International Business Machines CorporationInventors: Adam J. Muff, Matthew R. Tubbs
-
Publication number: 20120191956Abstract: Methods and apparatuses are provided for achieving increased processor performance and energy saving via reordering operand mapping as opposed to the actual operand data. The apparatus comprises a plurality of physical registers available for use storing operands and includes a unit capable of mapping logical registers to the plurality of physical registers. A multiplexer then reorders the operands by reordering the mapping of logical registers to the plurality of physical registers, which increases processor performance and energy saving by reordering narrow registers instead of wide registers. The method comprises mapping logical registers storing to physical registers storing operands in a processor and then reordering the mapping to achieve the equivalent of reordering the operands without reordering the operands from the physical registers in the processor.Type: ApplicationFiled: January 26, 2011Publication date: July 26, 2012Applicant: ADVANCED MICRO DEVICES, INC.Inventor: Jay FLEISCHMAN
-
Publication number: 20120191955Abstract: A system for performing floating point operations comprising a floating point multiply function that utilizes one or more fixed point functional blocks of a processor and one or more dedicated floating point functional blocks of the processor. A floating point add function that utilizes one or more fixed point functional blocks of a processor and one or more dedicated floating point functional blocks of the processor. A floating point normalize function that utilizes one or more fixed point functional blocks of a processor and one or more dedicated floating point functional blocks of the processor.Type: ApplicationFiled: January 20, 2011Publication date: July 26, 2012Inventors: Ragnar H. Jonsson, Sveinn V. Grimsson, Vilhjalmur Thorvaldsson, Trausti Thormundsson
-
Publication number: 20120191957Abstract: The described embodiments provide a processor that executes vector instructions. In the described embodiments, while dispatching instructions at runtime, the processor encounters an Actual instruction. Upon determining that a result of the Actual instruction is predictable, the processor dispatches a prediction micro-operation associated with the Actual instruction, wherein the prediction micro-operation generates a predicted result vector for the Actual instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation to generate the predicted result vector, if the predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, generating the predicted result vector comprises setting the element of the predicted result vector to true.Type: ApplicationFiled: April 20, 2011Publication date: July 26, 2012Applicant: APPLE INC.Inventor: Jeffry E. Gonion
-
Publication number: 20120173854Abstract: Methods and apparatuses are provided for an efficient technique for processing registers having a known value while improving processor performance. The apparatus comprises a processor having a plurality of physical registers available for use in computations and a decoder for determining that a logical register contains a known value. A renaming unit maps the logical register containing the known value to an address outside an address range for the plurality of physical registers once the known value is determined. Thereafter, scheduling and execution units perform computations using the known value without storing the known value in one of the plurality of physical registers. The method comprises determining that a logical register of a processor has a known value and then mapping that logical register to a physical register address outside an expected range of physical register addresses; which indicates that the logical register represents the known value.Type: ApplicationFiled: December 29, 2010Publication date: July 5, 2012Applicant: ADVANCED MICRO DEVICES, INC.Inventors: Jay FLEISCHMAN, Debjit Das Sarma, Michael SEDMAK
-
Patent number: 8203567Abstract: A graphics processing method and apparatus described herein is capable of converting graphics processing of a window system into a vector-based application program interface (API) format usable in the GPU and performing the converted graphics processing in the GPU. For example, the vector-based API may be based on an OpenVG standard or an EGL standard.Type: GrantFiled: July 3, 2009Date of Patent: June 19, 2012Assignee: Samsung Electronics Co., Ltd.Inventors: Dong-kyun Jeong, Soo-chan Lim, Na-min Kim
-
Publication number: 20120151191Abstract: Methods and apparatus relating to reducing power consumption in multi-precision floating point multipliers are described. In an embodiment, certain portions of a multiplier are disabled in response to two or more multiplication operations with the same data size and data type occurring back-to-back. Other embodiments are also claimed and described.Type: ApplicationFiled: December 14, 2010Publication date: June 14, 2012Inventors: Brent R. Boswell, Thierry Pons, Tom Aviram
-
Patent number: 8200948Abstract: An apparatus and method are provided for performing re-arrangement operations on data. The data processing apparatus has a register data store with a plurality of registers for storing data, and processing logic for performing a sequence of operations on data including at least one re-arrangement operation. The processing logic has scalar processing logic for performing scalar operations and SIMD processing logic for performing SIMD operations. The SIMD processing logic is responsive to a re-arrangement instruction specifying a family of re-arrangement operations to perform a selected re-arrangement operation from that family on a plurality of data elements constituted by data in one or more registers identified by the re-arrangement instruction. The selected re-arrangement operation is dependent on at least one parameter provided by the scalar processing logic, that parameter identifying a data element width for the data elements on which the selected re-arrangement operation is performed.Type: GrantFiled: December 4, 2007Date of Patent: June 12, 2012Assignee: ARM LimitedInventors: Daniel Kershaw, Dominic Hugo Symes, Alastair Reid
-
Publication number: 20120117441Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.Type: ApplicationFiled: January 19, 2012Publication date: May 10, 2012Applicant: MicroUnity Systems Engineering, Inc.Inventors: Craig Hansen, John Moussouris, Alexia Massalin
-
Publication number: 20120096244Abstract: A method is disclosed that includes computing, using at least one uniformly fine-grain data parallel computing unit, a mean-square error regression within a regression clustering algorithm. The mean-square error regression is represented in the form of at least one summation of a vector-vector multiplication. A computer program product and a computer system are also disclosed.Type: ApplicationFiled: October 14, 2010Publication date: April 19, 2012Inventors: Bin ZHANG, Ren Wu, Meichun Hsu
-
Patent number: 8161266Abstract: An improved superscalar processor. The processor includes multiple lanes, allowing multiple instructions in a bundle to be executed in parallel. In vector mode, the parallel lanes may be used to execute multiple instances of a bundle, representing multiple iterations of the bundle in a vector run. Scheduling logic determines whether, for each bundle, multiple instances can be executed in parallel. If multiple instances can be executed in parallel, coupling circuitry couples an instance of the bundle from one lane into one or more other lanes. In each lane, register addresses are renamed to ensure proper execution of the bundles in the vector run. Additionally, the processor may include a register bank separate from the architectural register file. Renaming logic can generate addresses to this separate register bank that are longer than used to address architectural registers, allowing longer vectors and more efficient processor operation.Type: GrantFiled: December 22, 2008Date of Patent: April 17, 2012Assignee: STMicroelectronics Inc.Inventor: Osvaldo M. Colavin
-
Publication number: 20120079252Abstract: In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed.Type: ApplicationFiled: September 24, 2010Publication date: March 29, 2012Inventor: ERIC S. SPRANGLE
-
Publication number: 20120079253Abstract: A method of performing vector operations on a semiconductor chip is described. The method includes performing a first vector instruction with a vector functional unit implemented on the semiconductor chip and performing a second vector instruction with the vector functional unit. The first vector instruction is a vector multiply add instruction. The second vector instruction is a vector leading zeros count instruction.Type: ApplicationFiled: September 24, 2010Publication date: March 29, 2012Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver, Eric W. Mahurin
-
Publication number: 20120060020Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a start value and an increment value, and optionally receiving a predicate vector with N elements as inputs. The processor then executes the vector instruction. Executing the vector instruction causes the processor to generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element in the result vector, the processor sets the element in the result vector equal to the start value plus a product of the increment value multiplied by a specified number of elements to the left of the element in the result vector.Type: ApplicationFiled: November 8, 2011Publication date: March 8, 2012Applicant: APPLE INC.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Patent number: 8131981Abstract: A data processing system, apparatus and method for performing fractional multiply operations is disclosed. The system includes a memory that stores instructions for SIMD operations and a processing core. The processing core includes registers that store operands for the fractional multiply operations. A coprocessor included in the processing core performs the fractional multiply operations on the operands and stores the result in a destination register that is also included in the processing core.Type: GrantFiled: August 12, 2009Date of Patent: March 6, 2012Assignee: Marvell International Ltd.Inventors: Nigel C. Paver, Bradley C. Aldrich
-
Patent number: 8117426Abstract: Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.Type: GrantFiled: July 27, 2007Date of Patent: February 14, 2012Assignee: Microunity Systems Engineering, IncInventors: Craig Hansen, John Moussouris, Alexia Massalin
-
Patent number: 8108657Abstract: A computing system capable of handling floating point operations during program code conversion is described, comprising a processor including a floating point unit and an integer unit. The computing system further comprises a translator unit arranged to receive subject code instructions including at least one instruction relating to a floating point operation and in response to generate corresponding target code for execution on said processor. To handle floating point operations a floating point status unit and a floating point control unit are provided within the translator. These units are cause the translator unit to generate either: target code for performing the floating point operations directly on the floating point unit; or target code for performing the floating point operations indirectly, for example using a combination of the integer unit and the floating point unit. In this way the efficiency of the computing system is improved.Type: GrantFiled: February 28, 2008Date of Patent: January 31, 2012Assignee: International Business Machines CorporationInventors: Gavin Barraclough, James Richard Mulcahy, David James Rigby
-
Patent number: 8103858Abstract: Methods and apparatus are disclosed for handling floating point exceptions in a processor that executes single-instruction multiple-data (SIMD) instructions. In one embodiment a numerical exception is identified for a SIMD floating point operation and SIMD micro-operations are initiated to generate two packed partial results of a packed result for the SIMD floating point operation. A SIMD denormalization micro-operation is initiated to combine the two packed partial results and to denormalize one or more elements of the combined packed partial results to generate a packed result for the SIMD floating point operation having one or more denormal elements. Flags are set and stored with packed partial results to identify denormal elements. In one embodiment a SIMD normalization micro-operation is initiated to generate a normalized pseudo internal floating point representation prior to the SIMD floating point operation when it uses multiplication.Type: GrantFiled: June 30, 2008Date of Patent: January 24, 2012Assignee: Intel CorporationInventors: Zeev Sperber, Shachar Finkelstein, Gregory Pribush, Arnit Gradstein, Guy Bale, Thierry Pons
-
Publication number: 20120011348Abstract: Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.Type: ApplicationFiled: July 12, 2010Publication date: January 12, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels, Valentina Salapura
-
Publication number: 20110314251Abstract: Concepts and technologies are described herein for determining memory safety of floating-point computations. The concepts and technologies described herein analyze code to determine if any floating-point computations exist in the code, and if so, if the floating-point computations are memory safe. The analysis can include identifying floating-point instructions and conditional statements in the code. The code can be symbolically executed, and behavior of the floating-point instructions and the conditional statements can be monitored to determine if a floating point calculation is ever involved in computation of any memory address during the execution of the code.Type: ApplicationFiled: June 17, 2010Publication date: December 22, 2011Applicant: Microsoft CorporationInventors: Patrice Godefroid, Johannes Kinder
-
Publication number: 20110302394Abstract: A system and method for performing regular expression computations includes loading a plurality of input values corresponding to one or more input streams as elements of a vector register implemented on programmable storage media. New state indexes are computed using the input values, and current state values corresponding to different automata by using single instruction, multiple data (SIMD) vector operations. New state values associated with the different automata are determined using the new state indexes to look up new state values such that state transitions for a plurality of regular expressions are processed concurrently.Type: ApplicationFiled: June 8, 2010Publication date: December 8, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: GREGORY F. RUSSELL, Valentina Salapura, Daniele P. Scarpazza
-
Patent number: 8074058Abstract: The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element.Type: GrantFiled: June 8, 2009Date of Patent: December 6, 2011Assignee: MIPS Technologies, Inc.Inventors: Timothy J. Van Hook, Peter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian