Floating Point Or Vector Patents (Class 712/222)
  • Publication number: 20130067203
    Abstract: A swizzle pattern generator is provided to reduce an overhead due to execution of a swizzle instruction in vector processing. The swizzle pattern generator is configured to provide swizzle patterns with respect to data sets of at least one vector register or vector processing unit. The swizzle pattern generator may be reconfigurable to generate various swizzle patterns for different vector operations.
    Type: Application
    Filed: September 14, 2012
    Publication date: March 14, 2013
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Moo-Kyoung Chung, Woong Seo, Ho-Young Kim, Soo-Jung Ryu, Dong-Hoon Yoo, Jin-Seok Lee, Yeon-Gon Cho, Chang-Moo Kim, Seung-Hun Jin
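    As a rough illustration of the abstract above, a swizzle pattern can be modeled as a list of source-lane indices that a generator produces and a vector unit then applies. The function names `rotate_pattern`, `reverse_pattern`, and `apply_swizzle` are hypothetical and do not come from the application; this is a software sketch, not the claimed hardware.

```python
def rotate_pattern(lanes, shift):
    """Hypothetical pattern: lane i reads from lane (i + shift) mod lanes."""
    return [(i + shift) % lanes for i in range(lanes)]

def reverse_pattern(lanes):
    """Hypothetical pattern: lane i reads from the mirrored lane."""
    return [lanes - 1 - i for i in range(lanes)]

def apply_swizzle(vector, pattern):
    """Build the output vector by reading each lane from its source index."""
    return [vector[src] for src in pattern]

v = [10, 20, 30, 40]
rotated = apply_swizzle(v, rotate_pattern(4, 1))
mirrored = apply_swizzle(v, reverse_pattern(4))
```

    Generating the pattern separately from applying it is what lets one generator be reconfigured for different vector operations.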
  • Patent number: 8392692
    Abstract: In one embodiment, the present invention determines index values corresponding to bits of a binary vector that have a value of 1. During each clock cycle, a masking technique is applied to M sub-vector index values, where each sub-vector index value corresponds to a different bit of a sub-vector of the binary vector. The masking technique is applied such that (i) the sub-vector index values that correspond to bits having a value of 0 are zeroed out and (ii) the sub-vector index values that correspond to the bits having a value of 1 are left unchanged. The masked sub-vector index values are sorted, and index values are calculated based on the masked sub-vector index values. The index values generated are then distributed uniformly to a number M of index memories such that the M index memories store substantially the same number of index values.
    Type: Grant
    Filed: December 12, 2008
    Date of Patent: March 5, 2013
    Assignee: LSI Corporation
    Inventor: Kiran Gunnam
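    The mask, sort, and distribute steps in this abstract can be sketched in software as follows. Two assumptions are made here that the abstract does not state: indices are 1-based so that 0 can stand for "masked out", and "distributed uniformly" is modeled as round-robin placement. The name `set_bit_indices` is hypothetical.

```python
def set_bit_indices(bits, m):
    """Return m index memories holding the 1-based positions of the 1 bits."""
    memories = [[] for _ in range(m)]
    count = 0  # total indices stored so far; drives round-robin placement
    for start in range(0, len(bits), m):  # one sub-vector of up to m bits per step
        sub = bits[start:start + m]
        # Masking: keep the index where the bit is 1, zero it where the bit is 0.
        masked = [(start + k + 1) * b for k, b in enumerate(sub)]
        # Sorting moves the zeroed entries to the front, where they are skipped.
        for idx in sorted(masked):
            if idx:
                memories[count % m].append(idx)
                count += 1
    return memories

example = set_bit_indices([1, 0, 1, 1, 0, 0, 1, 0], 4)
```

    With round-robin placement the memory sizes never differ by more than one entry, which is one way to read "substantially the same number of index values".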
  • Patent number: 8386756
    Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle.
    Type: Grant
    Filed: April 11, 2011
    Date of Patent: February 26, 2013
    Assignee: International Business Machines Corporation
    Inventors: Eric M. Schwarz, Ronald M. Smith, Sr.
  • Patent number: 8386755
    Abstract: A microprocessor executes an instruction specifying a floating-point input operand having a predetermined size and that instructs the microprocessor to round the floating-point input operand to an integer value using a rounding mode and to return a floating-point result having the same predetermined size. An instruction translator translates the instruction into first and second microinstructions. An execution unit executes the first and second microinstructions. The first microinstruction receives as an input operand the instruction floating-point input operand and generates an intermediate result from the input operand. The second microinstruction receives as an input operand the intermediate result of the first microinstruction and generates the floating-point result of the instruction from the intermediate result. The intermediate result is the same predetermined size as the instruction floating-point input operand.
    Type: Grant
    Filed: May 20, 2010
    Date of Patent: February 26, 2013
    Assignee: VIA Technologies, Inc.
    Inventors: Tom Elmer, Terry Parks
  • Publication number: 20130042092
    Abstract: A method and apparatus are provided to perform efficient merging operations of two or more streams of data by using SIMD instruction. Streams of data are merged together in parallel and with mitigated or removed conditional branching. The merge operations of the streams of data include Merge AND and Merge OR operations.
    Type: Application
    Filed: August 31, 2012
    Publication date: February 14, 2013
    Applicant: SAP Global IP Group
    Inventors: Hiroshi INOUE, Moriyoshi Ohara, Hideaki Komatsu
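    A scalar sketch of the two merge operations, assuming "Merge AND" means the intersection and "Merge OR" the union of two sorted, duplicate-free streams. It is not SIMD code; it only shows how pointer advancement can be driven by comparison results instead of if/else branches. The names `merge_and` and `merge_or` are hypothetical.

```python
def merge_and(a, b):
    """Intersection of two sorted, duplicate-free lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        x, y = a[i], b[j]
        out.extend([x] * (x == y))  # appends x only on a match, without if/else
        i += x <= y                 # comparison results used as 0/1 increments
        j += y <= x
    return out

def merge_or(a, b):
    """Union of two sorted, duplicate-free lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        x, y = a[i], b[j]
        out.append(min(x, y))
        i += x <= y
        j += y <= x
    return out + a[i:] + b[j:]      # append whichever tail is left over
```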
  • Publication number: 20130036296
    Abstract: A floating point execution unit is capable of selectively repurposing one or more adders in an exponent path of the floating point execution unit to perform fixed point addition operations, thereby providing fixed point functionality in the floating point execution unit.
    Type: Application
    Filed: August 4, 2011
    Publication date: February 7, 2013
    Applicant: International Business Machines Corporation
    Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
  • Patent number: 8364938
    Abstract: In the described embodiments, a processor captures a value from an element at a key element position in a second input vector into a base value. The processor then generates a result vector by, if the predicate vector is received, for each element in the result vector to the right of the key element position for which a corresponding element in the predicate vector is active, otherwise, for each element in the result vector to the right of the key element position, setting the element in the result vector equal to a result from an associative Boolean operation or a multiplication operation for which the inputs are the base value and a value in each relevant element of a first input vector from an element at the key element position to and including a predetermined element in the first input vector.
    Type: Grant
    Filed: August 14, 2009
    Date of Patent: January 29, 2013
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
  • Publication number: 20130024672
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive a basis vector, an operand vector, a selection vector, and a control vector are disclosed. The executed instructions may also cause the processor to perform a wrapping propagate operation dependent upon the input vectors.
    Type: Application
    Filed: September 28, 2012
    Publication date: January 24, 2013
    Applicant: APPLE INC.
    Inventor: APPLE INC.
  • Publication number: 20130024671
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a negation operation dependent upon the input vector and the control vector.
    Type: Application
    Filed: September 27, 2012
    Publication date: January 24, 2013
    Applicant: Apple Inc.
    Inventor: Apple Inc.
  • Publication number: 20130024670
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a product or quotient operation on another input vector dependent upon the input vector and the control vector.
    Type: Application
    Filed: September 24, 2012
    Publication date: January 24, 2013
    Applicant: APPLE INC.
    Inventor: Jeffry E. Gonion
  • Publication number: 20130024669
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a shift operation on another input vector dependent upon the input vector and the control vector.
    Type: Application
    Filed: September 24, 2012
    Publication date: January 24, 2013
    Applicant: APPLE INC.
    Inventor: APPLE INC.
  • Patent number: 8359460
    Abstract: The described embodiments provide a processor for generating a result vector with summed values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element in the second input vector. The processor then writes the sum of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Grant
    Filed: August 14, 2009
    Date of Patent: January 22, 2013
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
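    A simplified software model of the running-sum behavior described above. The abstract does not define how the key element or the "relevant" elements are chosen, so this sketch assumes the key position is passed in directly, that an element is relevant when its predicate is active, and that unwritten result elements keep the first input's values. The name `running_sum` is hypothetical.

```python
def running_sum(a, b, key, pred=None):
    """Result[i] = base + sum of active a[key..i], for active i >= key."""
    if pred is None:
        pred = [True] * len(a)   # no predicate: every element is active
    base = b[key]                # base value captured from the key element of b
    result = list(a)             # assumption: untouched elements keep a's values
    total = base
    for i in range(key, len(a)):
        if pred[i]:
            total += a[i]        # only active elements contribute to the sum
            result[i] = total
    return result
```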
  • Patent number: 8359461
    Abstract: The described embodiments provide a processor for generating a result vector with shifted values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then determines a number of bit positions to shift the base value using selected relevant elements in the first input vector. The processor then shifts the copy of the base value by the number of bit positions and writes the value into a corresponding element in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Grant
    Filed: August 14, 2009
    Date of Patent: January 22, 2013
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20130019084
    Abstract: Apparatus (100) is provided which is arranged to accept an input data stream. In some embodiments, the apparatus (100) comprises a sampler arranged to sample the input data stream to provide k samples thereof, wherein each of the samples is n bits long, and a string selector arranged to select m binary strings n bits long from at least a chosen subset of all random binary strings of a predetermined length. The apparatus (100) may further comprise a logical operator arranged to perform a logical function for each of the k samples with each of the selected binary strings to provide a vector, a memory arranged to store a matrix of the vectors generated from k samples, and an address generator arranged to generate RAM address segments from the matrix. In embodiments, the apparatus (100) may comprise a processor for, for example, pattern matching, feature detection, or image recognition.
    Type: Application
    Filed: September 28, 2010
    Publication date: January 17, 2013
    Applicant: QINETIQ LIMITED
    Inventors: David Arthur Orchard, Rebecca Anne Wilson, Jonathan Alexander Skoyles Pritchard, Martin James Cooper, Terence John Shepherd, Andrew Charles Lewin, Paul Richard Tapster, Charlotte Rachel Helen Bennett
  • Publication number: 20130013901
    Abstract: Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.
    Type: Application
    Filed: June 11, 2012
    Publication date: January 10, 2013
    Applicant: MICROUNITY SYSTEMS ENGINEERING, INC.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Patent number: 8352528
    Abstract: The present invention relates to an efficient implementation of integer and fractional 8-length or 4-length, or 8×8 or 4×4 DCT in a SIMD processor as part of MPEG and other video compression standards.
    Type: Grant
    Filed: September 20, 2009
    Date of Patent: January 8, 2013
    Inventor: Tibet Mimar
  • Publication number: 20130007422
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a sum or difference operation on another input vector dependent upon the input vector and the control vector.
    Type: Application
    Filed: September 11, 2012
    Publication date: January 3, 2013
    Inventor: Jeffry E. Gonion
  • Patent number: 8341204
    Abstract: A data processor whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD) and thereby enhancing the operation processing capability is made possible. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and compose the inner product execution units to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and SIMD-compatible composition enhance the level of operation parallelism dramatically.
    Type: Grant
    Filed: July 2, 2009
    Date of Patent: December 25, 2012
    Assignee: Renesas Electronics Corporation
    Inventors: Fumio Arakawa, Tetsuya Yamada
  • Patent number: 8335912
    Abstract: Techniques and structures are described which allow the detection of certain dependency conditions, including evil twin conditions, during the execution of computer instructions. Information used to detect dependencies may be stored in a logical map table, which may include a content-addressable memory. The logical map table may maintain a logical register to physical register mapping, including entries dedicated to physical registers available as rename registers. In one embodiment, each entry in the logical map table includes a first value usable to indicate whether only a portion of the physical register is valid and whether the physical register includes the most recent update to the logical register being renamed. Use of this first value may allow precise detection of dependency conditions, including evil twin conditions, upon an instruction reading from at least two portions of a logical register having an entry in the logical map table whose first value is set.
    Type: Grant
    Filed: April 22, 2009
    Date of Patent: December 18, 2012
    Assignee: Oracle America, Inc.
    Inventors: Robert T. Golla, Jama I. Barreh, Jeffrey S. Brooks, Howard L. Levy
  • Publication number: 20120317401
    Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.
    Type: Application
    Filed: June 12, 2012
    Publication date: December 13, 2012
    Inventor: Patrice Roussel
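    The load-and-duplicate operation can be modeled with integer bit operations. This is a minimal sketch assuming the destination is exactly twice the width of the copied portion; the name `load_and_duplicate` and the `width` parameter are hypothetical.

```python
def load_and_duplicate(source, width=64):
    """Copy the low `width` bits of source into both halves of a 2*width-bit value."""
    low = source & ((1 << width) - 1)   # first portion of bits of the source
    return (low << width) | low         # same bits placed in the next portion too
```

    For example, with `width=16` the source `0xABCD1234` yields `0x12341234`: only the low 16 bits are taken, and they appear in both halves.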
  • Patent number: 8327120
    Abstract: Methods and apparatus relating to instructions with floating point control override are described. In an embodiment, floating point operation settings indicated by a floating point control register may be overridden on a per instruction basis. In an embodiment, at least one of the one or more floating point operation settings is to cause a modification to one of the one or more default settings during execution of the instruction, wherein the second logic is to perform the floating point operation, at least in part, based on the modified default setting. Other embodiments are also described.
    Type: Grant
    Filed: December 29, 2007
    Date of Patent: December 4, 2012
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Simon Rubanovich, Benny Eitan
  • Publication number: 20120290819
    Abstract: A specialized processing block includes a first floating-point arithmetic operator stage, a second floating-point arithmetic operator stage, and configurable interconnect within the specialized processing block for routing signals into and out of each of the first and second floating-point arithmetic operator stages. In some embodiments, the configurable interconnect may be configurable to route a plurality of block inputs to inputs of the first floating-point arithmetic operator stage, at least one of the block inputs to an input of the second floating-point arithmetic operator stage, output of the first floating-point arithmetic operator stage to an input of the second floating-point arithmetic operator stage, at least one of the block inputs to a direct-connect output to another such block, output of the first floating-point arithmetic operator stage to the direct-connect output, and a direct-connect input from another such block to an input of the second floating-point arithmetic operator stage.
    Type: Application
    Filed: July 21, 2011
    Publication date: November 15, 2012
    Applicant: ALTERA CORPORATION
    Inventor: Martin Langhammer
  • Publication number: 20120272046
    Abstract: Techniques for vector completion mask (VCM) handling are provided. A data structure includes a mask field for each operand of a particular operation. A processor attempts to execute the operation with multiple operands, which are identified in the data structure by the mask fields. If operands are successfully retrieved for execution with the operation, then the corresponding mask field within the data structure is cleared. The processor can reset if any field remains set within the data structure and can re-process the operation with operands that were not previously handled with the operation.
    Type: Application
    Filed: June 28, 2012
    Publication date: October 25, 2012
    Inventors: Stephan Jourdan, Michael Fetterman, Michael Cornaby, Per Hammarlund, Ronak Signhal, Glenn Hinton
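    The completion-mask idea can be sketched as a retry loop that only revisits operands whose mask field is still set. The names `execute_with_mask` and `flaky_fetch`, the retry limit, and the fault model (one address missing on its first access) are all assumptions made for illustration.

```python
def execute_with_mask(operands, fetch, max_attempts=3):
    """Fetch every operand, re-processing only those whose mask field is still set."""
    mask = [True] * len(operands)      # one mask field per operand, initially set
    values = [None] * len(operands)
    for _ in range(max_attempts):
        for i, operand in enumerate(operands):
            if mask[i]:
                ok, value = fetch(operand)
                if ok:
                    values[i] = value
                    mask[i] = False    # clear the field once the operand is handled
        if not any(mask):
            return values
    raise RuntimeError("operands still pending after retries")

attempts = {}

def flaky_fetch(addr):
    """Hypothetical fetch: address 2 fails on its first access only."""
    attempts[addr] = attempts.get(addr, 0) + 1
    if addr == 2 and attempts[addr] == 1:
        return False, None
    return True, addr * 10

loaded = execute_with_mask([1, 2, 3], flaky_fetch)
```

    Only address 2 is fetched a second time; the operands that already completed are not repeated.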
  • Publication number: 20120254585
    Abstract: Methods and apparatus for double precision division/inversion vector computations on Single Instruction Multiple Data (SIMD) computing platforms are described. In one embodiment, an input argument is represented by an exponent portion and a fraction portion. These portions are scaled, inverted, and multiplied to generate an inverse version of the input argument. In an embodiment, the inversion of the exponent portion may be done by changing the sign of the exponent. Other embodiments are also described.
    Type: Application
    Filed: December 25, 2009
    Publication date: October 4, 2012
    Inventors: Andrey Kolesov, Valery Kuriakin, Maria Guseva
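    The split into exponent and fraction can be illustrated with the standard library's `math.frexp` and `math.ldexp`. This is a scalar sketch for finite, nonzero inputs, not the SIMD method of the application; the plain division on the fraction stands in for whatever scaled inversion the hardware performs, and the name `reciprocal` is hypothetical.

```python
import math

def reciprocal(x):
    """Compute 1/x by inverting the fraction and negating the exponent."""
    fraction, exponent = math.frexp(x)          # x == fraction * 2**exponent
    inv_fraction = 1.0 / fraction               # stand-in for the fraction inversion
    return math.ldexp(inv_fraction, -exponent)  # exponent inverted by a sign change
```

    For `x = 8.0`, `frexp` gives fraction 0.5 and exponent 4, so the result is 2.0 scaled by 2 to the power -4, which is 0.125.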
  • Publication number: 20120239911
    Abstract: The described embodiments include a processor that executes a ValueCheck instruction. In the described embodiments, the processor receives an input vector and a predicate vector, each including N elements. The processor then executes a ValueCheck instruction, which causes the processor to generate a result vector. When generating the result vector, for each element in a set of elements in the input vector for which a corresponding element of the predicate vector is active, the processor determines if at least one of the elements in the set of elements precedes the element in the input vector and contains a different value than the element in the input vector. If so, the processor writes an identifier for a closest preceding active element that contains the different value into a corresponding element of a result vector. Otherwise, the processor writes a zero in the corresponding element of the result vector.
    Type: Application
    Filed: May 31, 2012
    Publication date: September 20, 2012
    Applicant: APPLE INC.
    Inventor: Jeffry E. Gonion
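    A software model of the ValueCheck semantics described above. The abstract does not say what form the "identifier" takes; this sketch assumes it is the 1-based element position, so that 0 can remain the "no such element" result. The name `value_check` is hypothetical.

```python
def value_check(vec, pred):
    """For each active element, identify the closest preceding active element
    holding a different value, or 0 if there is none."""
    result = [0] * len(vec)
    for i, (value, active) in enumerate(zip(vec, pred)):
        if not active:
            continue
        # Scan backwards so the first hit is the closest preceding element.
        for j in range(i - 1, -1, -1):
            if pred[j] and vec[j] != value:
                result[i] = j + 1   # assumption: identifier is the 1-based position
                break
    return result
```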
  • Publication number: 20120239910
    Abstract: The described embodiments include a vector processor that executes a ConditionalExtract instruction. In the described embodiments, the processor receives an input scalar variable, an input vector, and a predicate vector, wherein each of the vectors has N elements. The processor then executes the ConditionalExtract instruction, which causes the processor to determine if at least one element in the predicate vector is active. If so, the processor copies a value from a last element in the input vector for which a corresponding element in the predicate vector is active into a scalar result variable. Otherwise, if no elements of the predicate vector are active, the processor copies a value from the input scalar variable into the scalar result variable.
    Type: Application
    Filed: May 31, 2012
    Publication date: September 20, 2012
    Applicant: APPLE INC.
    Inventor: Jeffry E. Gonion
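    The ConditionalExtract behavior maps directly onto a short loop. This is a scalar sketch of the described semantics, with the hypothetical name `conditional_extract`.

```python
def conditional_extract(scalar, vec, pred):
    """Return the last active element of vec, or scalar if none is active."""
    result = scalar                  # fallback when no predicate element is active
    for value, active in zip(vec, pred):
        if active:
            result = value           # later active elements overwrite earlier ones
    return result
```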
  • Publication number: 20120221837
    Abstract: The described embodiments include RunningMAC1P and RunningMAC2P instructions. In the described embodiments, a processor receives a first input vector, a second input vector, a third input vector, and a control vector. Upon executing a RunningMAC1P or a RunningMAC2P instruction, the processor sets a base value equal to a value from an element at a key element position in the first input vector. Next, the processor generates the result vector by, for each element of the result vector to the right of the key element position, setting the element in the result vector equal to a sum of the base value and a result of multiplying a value in each relevant element of the second input vector by a value in a corresponding element of the third input vector, from an element at the key element position to and including a predetermined element in the second input vector.
    Type: Application
    Filed: May 3, 2012
    Publication date: August 30, 2012
    Applicant: APPLE INC.
    Inventor: Jeffry E. Gonion
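    A simplified model of the running multiply-accumulate described above. The abstract does not say which element is the "predetermined element" or how the two instruction variants differ, so the sketch exposes that choice as a hypothetical `inclusive` flag. It also assumes that relevant elements are those with an active predicate and that unwritten result elements keep the first input's values.

```python
def running_mac(a, b, c, key, pred=None, inclusive=True):
    """Result[i] = a[key] + sum of b[j]*c[j] over active j from key to the end point."""
    n = len(a)
    if pred is None:
        pred = [True] * n
    base = a[key]                    # base value from the key element of a
    result = list(a)                 # assumption: untouched elements keep a's values
    for i in range(key + 1, n):      # only elements to the right of the key
        if not pred[i]:
            continue
        last = i if inclusive else i - 1   # hypothetical choice of end element
        acc = base
        for j in range(key, last + 1):
            if pred[j]:
                acc += b[j] * c[j]
        result[i] = acc
    return result
```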
  • Publication number: 20120204013
    Abstract: Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.
    Type: Application
    Filed: December 2, 2011
    Publication date: August 9, 2012
    Applicant: MICROUNITY SYSTEMS ENGINEERING, INC.
    Inventors: Craig HANSEN, John MOUSSOURIS, Alexia MASSALIN
  • Patent number: 8239439
    Abstract: Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises transferring more than two operands to a vector unit, each operand being transferred to a respective one of a plurality of processing lanes of the vector unit. The operands may be transferred from the vector unit to a dot product unit wherein an arithmetic operation using the more than two operands may be performed.
    Type: Grant
    Filed: December 13, 2007
    Date of Patent: August 7, 2012
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Matthew R. Tubbs
  • Publication number: 20120191956
    Abstract: Methods and apparatuses are provided for achieving increased processor performance and energy saving via reordering operand mapping as opposed to the actual operand data. The apparatus comprises a plurality of physical registers available for use in storing operands and includes a unit capable of mapping logical registers to the plurality of physical registers. A multiplexer then reorders the operands by reordering the mapping of logical registers to the plurality of physical registers, which increases processor performance and energy saving by reordering narrow registers instead of wide registers. The method comprises mapping logical registers to physical registers storing operands in a processor and then reordering the mapping to achieve the equivalent of reordering the operands without reordering the operands from the physical registers in the processor.
    Type: Application
    Filed: January 26, 2011
    Publication date: July 26, 2012
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventor: Jay FLEISCHMAN
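    The idea of reordering the mapping rather than the data can be shown with a small rename table. The register names, the dictionary layout, and the helper names `swap_operands` and `read` are hypothetical.

```python
physical = {"p0": 1.5, "p1": 2.5}        # wide operand data, never moved
rename_map = {"r0": "p0", "r1": "p1"}    # narrow logical-to-physical mapping

def swap_operands(mapping, x, y):
    """Exchange two logical registers by swapping their map entries only."""
    mapping[x], mapping[y] = mapping[y], mapping[x]

def read(mapping, regfile, logical):
    """Read a logical register through the current mapping."""
    return regfile[mapping[logical]]

swap_operands(rename_map, "r0", "r1")
```

    After the swap, reading `r0` returns the value that `r1` held before, while the contents of `physical` are unchanged.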
  • Publication number: 20120191955
    Abstract: A system for performing floating point operations comprising a floating point multiply function that utilizes one or more fixed point functional blocks of a processor and one or more dedicated floating point functional blocks of the processor. A floating point add function that utilizes one or more fixed point functional blocks of a processor and one or more dedicated floating point functional blocks of the processor. A floating point normalize function that utilizes one or more fixed point functional blocks of a processor and one or more dedicated floating point functional blocks of the processor.
    Type: Application
    Filed: January 20, 2011
    Publication date: July 26, 2012
    Inventors: Ragnar H. Jonsson, Sveinn V. Grimsson, Vilhjalmur Thorvaldsson, Trausti Thormundsson
  • Publication number: 20120191957
    Abstract: The described embodiments provide a processor that executes vector instructions. In the described embodiments, while dispatching instructions at runtime, the processor encounters an Actual instruction. Upon determining that a result of the Actual instruction is predictable, the processor dispatches a prediction micro-operation associated with the Actual instruction, wherein the prediction micro-operation generates a predicted result vector for the Actual instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation to generate the predicted result vector, if the predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, generating the predicted result vector comprises setting the element of the predicted result vector to true.
    Type: Application
    Filed: April 20, 2011
    Publication date: July 26, 2012
    Applicant: APPLE INC.
    Inventor: Jeffry E. Gonion
  • Publication number: 20120173854
    Abstract: Methods and apparatuses are provided for an efficient technique for processing registers having a known value while improving processor performance. The apparatus comprises a processor having a plurality of physical registers available for use in computations and a decoder for determining that a logical register contains a known value. A renaming unit maps the logical register containing the known value to an address outside an address range for the plurality of physical registers once the known value is determined. Thereafter, scheduling and execution units perform computations using the known value without storing the known value in one of the plurality of physical registers. The method comprises determining that a logical register of a processor has a known value and then mapping that logical register to a physical register address outside an expected range of physical register addresses, which indicates that the logical register represents the known value.
    Type: Application
    Filed: December 29, 2010
    Publication date: July 5, 2012
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Jay FLEISCHMAN, Debjit Das Sarma, Michael SEDMAK
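    One way to picture the out-of-range mapping is a rename table whose tags above the physical register count denote constants. The register count, the choice of zero as the known value, and the names `ZERO_TAG`, `rename_known`, and `read` are assumptions for illustration.

```python
NUM_PHYSICAL = 8
ZERO_TAG = NUM_PHYSICAL            # hypothetical tag outside the valid range 0..7
KNOWN_VALUES = {ZERO_TAG: 0.0}     # out-of-range tags stand for known constants

regfile = [0.0] * NUM_PHYSICAL
rename_map = {}

def rename_known(logical, tag):
    """Map a logical register to a known-value tag; no physical register is used."""
    rename_map[logical] = tag

def read(logical):
    """Resolve a logical register to either a known constant or stored data."""
    tag = rename_map[logical]
    if tag >= NUM_PHYSICAL:        # out of range: the tag itself encodes the value
        return KNOWN_VALUES[tag]
    return regfile[tag]

regfile[3] = 4.25
rename_map["f1"] = 3               # ordinary mapping to a physical register
rename_known("f0", ZERO_TAG)       # known zero, held only in the mapping
```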
  • Patent number: 8203567
    Abstract: A graphics processing method and apparatus described herein is capable of converting graphics processing of a window system into a vector-based application program interface (API) format usable in the GPU and performing the converted graphics processing in the GPU. For example, the vector-based API may be based on an OpenVG standard or an EGL standard.
    Type: Grant
    Filed: July 3, 2009
    Date of Patent: June 19, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dong-kyun Jeong, Soo-chan Lim, Na-min Kim
  • Publication number: 20120151191
    Abstract: Methods and apparatus relating to reducing power consumption in multi-precision floating point multipliers are described. In an embodiment, certain portions of a multiplier are disabled in response to two or more multiplication operations with the same data size and data type occurring back-to-back. Other embodiments are also claimed and described.
    Type: Application
    Filed: December 14, 2010
    Publication date: June 14, 2012
    Inventors: Brent R. Boswell, Thierry Pons, Tom Aviram
  • Patent number: 8200948
    Abstract: An apparatus and method are provided for performing re-arrangement operations on data. The data processing apparatus has a register data store with a plurality of registers for storing data, and processing logic for performing a sequence of operations on data including at least one re-arrangement operation. The processing logic has scalar processing logic for performing scalar operations and SIMD processing logic for performing SIMD operations. The SIMD processing logic is responsive to a re-arrangement instruction specifying a family of re-arrangement operations to perform a selected re-arrangement operation from that family on a plurality of data elements constituted by data in one or more registers identified by the re-arrangement instruction. The selected re-arrangement operation is dependent on at least one parameter provided by the scalar processing logic, that parameter identifying a data element width for the data elements on which the selected re-arrangement operation is performed.
    Type: Grant
    Filed: December 4, 2007
    Date of Patent: June 12, 2012
    Assignee: ARM Limited
    Inventors: Daniel Kershaw, Dominic Hugo Symes, Alastair Reid
  • Publication number: 20120117441
    Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
    Type: Application
    Filed: January 19, 2012
    Publication date: May 10, 2012
    Applicant: MicroUnity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Publication number: 20120096244
    Abstract: A method is disclosed that includes computing, using at least one uniformly fine-grain data parallel computing unit, a mean-square error regression within a regression clustering algorithm. The mean-square error regression is represented in the form of at least one summation of a vector-vector multiplication. A computer program product and a computer system are also disclosed.
    Type: Application
    Filed: October 14, 2010
    Publication date: April 19, 2012
    Inventors: Bin Zhang, Ren Wu, Meichun Hsu
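    Illustration (not from the application): a minimal sketch of a mean-square error line fit in which every sum is written as a vector-vector multiplication (dot product), the form that maps well onto data-parallel hardware. The helper names `dot` and `fit_line` are hypothetical, and the surrounding regression clustering loop is not shown.

```python
def dot(u, v):
    """Vector-vector multiplication followed by a summation."""
    return sum(p * q for p, q in zip(u, v))


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept using only dot products."""
    n = len(x)
    ones = [1.0] * n
    sx, sy = dot(x, ones), dot(y, ones)  # sum of x, sum of y
    sxx, sxy = dot(x, x), dot(x, y)      # sum of x*x, sum of x*y
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```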
  • Patent number: 8161266
    Abstract: An improved superscalar processor. The processor includes multiple lanes, allowing multiple instructions in a bundle to be executed in parallel. In vector mode, the parallel lanes may be used to execute multiple instances of a bundle, representing multiple iterations of the bundle in a vector run. Scheduling logic determines whether, for each bundle, multiple instances can be executed in parallel. If multiple instances can be executed in parallel, coupling circuitry couples an instance of the bundle from one lane into one or more other lanes. In each lane, register addresses are renamed to ensure proper execution of the bundles in the vector run. Additionally, the processor may include a register bank separate from the architectural register file. Renaming logic can generate addresses to this separate register bank that are longer than used to address architectural registers, allowing longer vectors and more efficient processor operation.
    Type: Grant
    Filed: December 22, 2008
    Date of Patent: April 17, 2012
    Assignee: STMicroelectronics Inc.
    Inventor: Osvaldo M. Colavin
  • Publication number: 20120079252
    Abstract: In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 29, 2012
    Inventor: Eric S. Sprangle
  • Publication number: 20120079253
    Abstract: A method of performing vector operations on a semiconductor chip is described. The method includes performing a first vector instruction with a vector functional unit implemented on the semiconductor chip and performing a second vector instruction with the vector functional unit. The first vector instruction is a vector multiply add instruction. The second vector instruction is a vector leading zeros count instruction.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 29, 2012
    Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver, Eric W. Mahurin
  • Publication number: 20120060020
    Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a start value and an increment value, and optionally receiving a predicate vector with N elements as inputs. The processor then executes the vector instruction. Executing the vector instruction causes the processor to generate a result vector. When generating the result vector, the processor sets each element of the result vector (or, if the predicate vector is received, each element for which the corresponding element of the predicate vector is active) equal to the start value plus the product of the increment value and a specified number of elements to the left of the element in the result vector.
    Type: Application
    Filed: November 8, 2011
    Publication date: March 8, 2012
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
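    Illustration (not from the application): a minimal sketch of the index-generating behavior described in the abstract. It assumes that the "specified number of elements to the left" is simply the element's position, and that inactive elements are left at zero; both are assumptions, and the name `vector_index` is hypothetical.

```python
def vector_index(start, increment, n, predicate=None):
    """Build an n-element result vector of start + increment * position.

    If a predicate vector is supplied, only elements whose predicate is
    active (truthy) are written. Inactive elements stay 0 here, which is
    an assumption and not something the abstract specifies.
    """
    result = [0] * n
    for i in range(n):
        if predicate is None or predicate[i]:
            result[i] = start + increment * i  # i elements lie to the left
    return result


unpredicated = vector_index(10, 3, 4)
predicated = vector_index(10, 3, 4, predicate=[1, 0, 1, 0])
```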
  • Patent number: 8131981
    Abstract: A data processing system, apparatus and method for performing fractional multiply operations is disclosed. The system includes a memory that stores instructions for SIMD operations and a processing core. The processing core includes registers that store operands for the fractional multiply operations. A coprocessor included in the processing core performs the fractional multiply operations on the operands and stores the result in a destination register that is also included in the processing core.
    Type: Grant
    Filed: August 12, 2009
    Date of Patent: March 6, 2012
    Assignee: Marvell International Ltd.
    Inventors: Nigel C. Paver, Bradley C. Aldrich
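    Illustration (not from the patent): a minimal sketch of a fractional multiply applied across SIMD lanes. The abstract does not state a number format, so signed 16-bit Q15 values, truncation, and saturation are all assumptions here, and the function names are hypothetical.

```python
Q15_MIN, Q15_MAX = -32768, 32767


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 fractions (value = integer / 2**15).

    The full-width product is in Q30; shifting right by 15 returns it
    to Q15. The result is saturated because -1.0 * -1.0 = +1.0 cannot
    be represented in Q15.
    """
    product = (a * b) >> 15
    return max(Q15_MIN, min(Q15_MAX, product))


def simd_q15_mul(xs, ys):
    """Apply the fractional multiply to each pair of lanes."""
    return [q15_mul(x, y) for x, y in zip(xs, ys)]


# 0.5 * 0.5, -1.0 * -1.0, 0.25 * -0.5
lanes = simd_q15_mul([16384, -32768, 8192], [16384, -32768, -16384])
```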
  • Patent number: 8117426
    Abstract: Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.
    Type: Grant
    Filed: July 27, 2007
    Date of Patent: February 14, 2012
    Assignee: MicroUnity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Patent number: 8108657
    Abstract: A computing system capable of handling floating point operations during program code conversion is described, comprising a processor including a floating point unit and an integer unit. The computing system further comprises a translator unit arranged to receive subject code instructions including at least one instruction relating to a floating point operation and in response to generate corresponding target code for execution on said processor. To handle floating point operations a floating point status unit and a floating point control unit are provided within the translator. These units cause the translator unit to generate either: target code for performing the floating point operations directly on the floating point unit; or target code for performing the floating point operations indirectly, for example using a combination of the integer unit and the floating point unit. In this way the efficiency of the computing system is improved.
    Type: Grant
    Filed: February 28, 2008
    Date of Patent: January 31, 2012
    Assignee: International Business Machines Corporation
    Inventors: Gavin Barraclough, James Richard Mulcahy, David James Rigby
  • Patent number: 8103858
    Abstract: Methods and apparatus are disclosed for handling floating point exceptions in a processor that executes single-instruction multiple-data (SIMD) instructions. In one embodiment a numerical exception is identified for a SIMD floating point operation and SIMD micro-operations are initiated to generate two packed partial results of a packed result for the SIMD floating point operation. A SIMD denormalization micro-operation is initiated to combine the two packed partial results and to denormalize one or more elements of the combined packed partial results to generate a packed result for the SIMD floating point operation having one or more denormal elements. Flags are set and stored with packed partial results to identify denormal elements. In one embodiment a SIMD normalization micro-operation is initiated to generate a normalized pseudo internal floating point representation prior to the SIMD floating point operation when it uses multiplication.
    Type: Grant
    Filed: June 30, 2008
    Date of Patent: January 24, 2012
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Shachar Finkelstein, Gregory Pribush, Arnit Gradstein, Guy Bale, Thierry Pons
  • Publication number: 20120011348
    Abstract: Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.
    Type: Application
    Filed: July 12, 2010
    Publication date: January 12, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels, Valentina Salapura
  • Publication number: 20110314251
    Abstract: Concepts and technologies are described herein for determining memory safety of floating-point computations. The concepts and technologies described herein analyze code to determine if any floating-point computations exist in the code, and if so, if the floating-point computations are memory safe. The analysis can include identifying floating-point instructions and conditional statements in the code. The code can be symbolically executed, and behavior of the floating-point instructions and the conditional statements can be monitored to determine if a floating point calculation is ever involved in computation of any memory address during the execution of the code.
    Type: Application
    Filed: June 17, 2010
    Publication date: December 22, 2011
    Applicant: Microsoft Corporation
    Inventors: Patrice Godefroid, Johannes Kinder
  • Publication number: 20110302394
    Abstract: A system and method for performing regular expression computations includes loading a plurality of input values corresponding to one or more input streams as elements of a vector register implemented on programmable storage media. New state indexes are computed from the input values and the current state values corresponding to different automata by using single instruction, multiple data (SIMD) vector operations. New state values associated with the different automata are determined using the new state indexes to look up new state values such that state transitions for a plurality of regular expressions are processed concurrently.
    Type: Application
    Filed: June 8, 2010
    Publication date: December 8, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gregory F. Russell, Valentina Salapura, Daniele P. Scarpazza
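    Illustration (not from the application): a minimal sketch of stepping several automata in lockstep, one per lane. Each new state index is computed as `state * alphabet_size + input` and then used for a table lookup. The flat table layout, the two example automata, and all names are assumptions; Python lists stand in for SIMD registers.

```python
def step_all(tables, states, inputs, alphabet_size):
    """Advance every automaton by one input symbol."""
    # Lane-wise index computation (a SIMD multiply-add in hardware).
    indexes = [s * alphabet_size + c for s, c in zip(states, inputs)]
    # Lane-wise table lookup (a gather in hardware).
    return [table[i] for table, i in zip(tables, indexes)]


def run(tables, streams, alphabet_size):
    """Run one automaton per stream; all streams must have equal length."""
    states = [0] * len(tables)
    for inputs in zip(*streams):
        states = step_all(tables, states, inputs, alphabet_size)
    return states


# Alphabet: a = 0, b = 1. Tables are flattened as [state][symbol].
CONTAINS_AB = [1, 0,  1, 2,  2, 2]  # state 2 means "ab" has been seen
EVEN_B = [0, 1,  1, 0]              # state 0 means an even number of b's

final_states = run([CONTAINS_AB, EVEN_B], [[0, 0, 1], [1, 0, 1]], 2)
```

    Here the first lane processes "aab" and ends in state 2, and the second lane processes "bab" and ends in state 0.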
  • Patent number: 8074058
    Abstract: The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element.
    Type: Grant
    Filed: June 8, 2009
    Date of Patent: December 6, 2011
    Assignee: MIPS Technologies, Inc.
    Inventors: Timothy J. Van Hook, Peter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian
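    Illustration (not from the patent): a minimal sketch of the accumulate-then-narrow flow in the abstract. Products of N-bit elements are summed in a wider accumulator, and the result is then shifted, rounded, and clamped back to N bits. The round-half-up rule, the signed range, and the function names are assumptions.

```python
def multiply_accumulate(acc, a, b):
    """Add lane-wise products to the accumulator at full precision.

    Python integers are unbounded, so they model an accumulator that is
    wide enough to never overflow.
    """
    return [s + x * y for s, x, y in zip(acc, a, b)]


def narrow(acc, n_bits, shift):
    """Shift with rounding, then clamp each lane to a signed n_bits value."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    out = []
    for v in acc:
        if shift > 0:
            v = (v + (1 << (shift - 1))) >> shift  # round half up
        out.append(max(lo, min(hi, v)))            # clamp (saturate)
    return out


acc = multiply_accumulate([0, 0], [30000, 100], [30000, -3])
narrowed = narrow(acc, n_bits=16, shift=8)
```

    The first lane overflows 16 bits even after the shift and is clamped to 32767; the second lane, -300 / 256, rounds to -1.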