Floating Point Or Vector Patents (Class 712/222)
  • Publication number: 20100268920
    Abstract: A mechanism for handling unfused multiply-add accrued exception bits includes a processor including a floating point unit, a storage, and exception logic. The floating-point unit may be configured to execute an unfused multiply-accumulate instruction defined with the instruction set architecture (ISA). The unfused multiply-accumulate instruction may include a multiply sub-operation and an accumulate sub-operation. The storage may be configured to maintain floating-point exception state information. The exception logic may be configured to capture the floating-point exception state after completion of the multiply sub-operation and prior to completion of the accumulate sub-operation, for example, and to update the storage to reflect the floating-point exception state.
    Type: Application
    Filed: April 16, 2009
    Publication date: October 21, 2010
    Inventors: Jeffrey S. Brooks, Paul J. Jordan, Christopher H. Olson
  • Patent number: 7818539
    Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by e.g., steering each to one of the two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.
    Type: Grant
    Filed: August 28, 2006
    Date of Patent: October 19, 2010
    Assignees: The Board of Trustees of the Leland Stanford Junior University, The Massachusetts Institute of Technology
    Inventors: Scott Rixner, John D. Owens, Ujval J. Kapasi, William J. Dally
  • Patent number: 7818548
    Abstract: Methods and software are presented for processing data in a programmable processor, involving (a) decoding instructions for execution using an execution unit operable to execute instructions by partitioning data stored in registers in a register file into multiple data elements, the instructions selected from an instruction set that includes group arithmetic instructions and group data handling instructions, (b) in response to decoding different group arithmetic instructions, executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results, and (c) in response to decoding different group data handling instructions, executing group data handling operations that re-arrange data elements in different ways.
    Type: Grant
    Filed: July 27, 2007
    Date of Patent: October 19, 2010
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Patent number: 7818540
    Abstract: A vector processing system for executing vector instructions, each instruction defining multiple value pairs, an operation to be executed and a modifier, the vector processing system comprising a plurality of parallel processing units, each arranged to receive one of said pairs of values and, when selected, to implement an operation on said value pair to generate a result, each processing unit comprising at least one flag and being selectable in dependence on a condition defined by said at least one flag, wherein the modifier defines the condition under which the parallel processing unit is individually selected.
    Type: Grant
    Filed: May 19, 2006
    Date of Patent: October 19, 2010
    Assignee: Broadcom Corporation
    Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
  • Patent number: 7814303
    Abstract: Operand vector multiplexer sequence control is used in a vector-based execution unit to control the shuffling of data elements in operand vectors used by a sequence of vector instructions processed by the vector-based execution unit. A swizzle sequence instruction is defined in an instruction set for the vector-based execution unit and is used to selectively apply a sequence of vector data element shuffle orders to one or more operand vectors to be used by the associated sequence of vector instructions. As a result, when a common sequence of data element shuffle orders is used frequently for a sequence of vector instructions, a single swizzle sequence instruction may be used to select the desired sequence of custom data element ordering for each of the vector instructions in the sequence.
    Type: Grant
    Filed: October 23, 2008
    Date of Patent: October 12, 2010
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Robert A. Shearer, Matthew R. Tubbs
  • Publication number: 20100257342
    Abstract: Provided is a parallel processor for supporting a floating-point operation. The parallel processor has a flexible structure for easy development of a parallel algorithm involving multimedia computing, requires low hardware cost, and consumes low power. To support floating-point operations, the parallel processor uses floating-point accumulators and a flag for floating-point multiplication. Using the parallel processor, it is possible to process a geometric transformation operation in a 3-dimensional (3D) graphics process at low cost. Also, the cost of a bus width for instructions can be minimized by a partitioned Single-Instruction Multiple-Data (SIMD) method and a method of conditionally executing instructions.
    Type: Application
    Filed: June 17, 2010
    Publication date: October 7, 2010
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Chun Gi LYUH, Yil Suk YANG, Se Wan HEO, Soon Il YEO, Tae Moon ROH, Jong Dae KIM, Ki Chul KIM, Se Hoon YOO
  • Patent number: 7788469
    Abstract: A hardware accelerator is used to execute a floating-point byte-code in an information processing device. For a floating-point byte-code, a byte-code accelerator BCA feeds an instruction stream for using a FPU to a CPU. When the FPU is used, first the data is transferred to the FPU register from a general-purpose register, and then an FPU operation is performed. For data, such as a denormalized number, that cannot be processed by the FPU, in order to call a floating-point math library of software, the processing of the BCA is completed and the processing moves to processing by software. In order to realize this, data on a data transfer bus from the CPU to the FPU is snooped by the hardware accelerator, and a cancel request is signaled to the CPU to inhibit execution of the FPU operation when corresponding data is detected in a data checking part.
    Type: Grant
    Filed: July 6, 2004
    Date of Patent: August 31, 2010
    Assignee: Renesas Technology Corp.
    Inventors: Tetsuya Yamada, Naohiko Irie, Takahiro Irita, Masayuki Kabasawa
  • Patent number: 7788471
    Abstract: A system and method for performing vector arithmetic is disclosed. The method includes loading two operand vectors, each composed of a number of vector elements, into two storage locations. A selected arithmetic operation is performed on the operand vectors to produce a result vector having the number of vector elements. Each vector element of the result vector is associated with an arithmetic logic cell that has a first input that can receive any vector element from the first vector and a second input that can receive any vector element from the second vector. Accordingly each vector element of the result vector is a function of any two individual vector elements of the operand vectors. By applying the operand vector elements to the appropriate arithmetic logic cells, and by selecting the appropriate arithmetic operation, complex vector operations can be performed efficiently.
    Type: Grant
    Filed: September 18, 2006
    Date of Patent: August 31, 2010
    Assignee: Freescale Semiconductor, Inc.
    Inventor: Chengke Sheng
  • Publication number: 20100218194
    Abstract: A method for controlling a data processing system, a data processing system executing a similar method, and a computer readable medium with instructions for a similar method. The method includes receiving, by an operating system executing on a data processing system, an execution request from an application, the execution request including at least one resource-defining attribute corresponding to an execution thread of the application. The method also includes allocating processor resources to the execution thread by the operating system according to the at least one resource-defining attribute, and allowing execution of the execution thread on the data processing system according to the allocated processor resources.
    Type: Application
    Filed: August 27, 2009
    Publication date: August 26, 2010
    Applicant: Siemens Product Lifecycle Management Software Inc.
    Inventors: John Gordon Dallman, Peter Philip Lonsdale Nanson
  • Publication number: 20100211761
    Abstract: In accordance with at least some embodiments, a digital signal processor (DSP) includes an instruction fetch unit and an instruction decode unit in communication with the instruction fetch unit. The DSP also includes a register set and a plurality of work units in communication with the instruction decode unit. A vector math instruction decoded by the instruction decode unit causes input vectors and output vectors to be aligned with a maximum boundary of the register set and causes parallel operations by the work units.
    Type: Application
    Filed: February 18, 2010
    Publication date: August 19, 2010
    Applicant: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Udayan DASGUPTA
  • Patent number: 7779237
    Abstract: A method, system and processor for adaptively and selectively controlling the instruction execution frequency of a data processor. Processing logic or a software compiler determines when a number of first-type instructions, requiring longer execution latency, are scheduled to be executed. The logic/compiler then triggers the CPM unit to automatically switch the execution frequency of the instruction processor from a first frequency that is optimal for processing regular-type instructions to a second, pre-established lower frequency that is optimal for processing the first-type instructions, to enable more efficient execution and higher execution throughput of the number of first-type operations within the processor. When the first-type instructions have completed execution, the processor's instruction execution frequency is returned to the first optimal frequency.
    Type: Grant
    Filed: July 11, 2007
    Date of Patent: August 17, 2010
    Assignee: International Business Machines Corporation
    Inventors: Anthony Correale, Jr., Kenichi Tsuchiya
  • Publication number: 20100205411
    Abstract: A result processor access a result table for an entry associated with a predetermined sub-expression of a regular expression in response to a finite state machine finding the predetermined sub-expression in the input stream. The result processor executes an instruction associated with the entry, the instruction including one or more operations to be performed on one or more bits in a bit vector register, and determines as a function of the one or more bits in the bit vector register whether the complex regular expression has been found in the input stream.
    Type: Application
    Filed: February 11, 2009
    Publication date: August 12, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Jan van Lunteren
  • Patent number: 7769981
    Abstract: Provided is a parallel processor for supporting a floating-point operation. The parallel processor has a flexible structure for easy development of a parallel algorithm involving multimedia computing, requires low hardware cost, and consumes low power. To support floating-point operations, the parallel processor uses floating-point accumulators and a flag for floating-point multiplication. Using the parallel processor, it is possible to process a geometric transformation operation in a 3-dimensional (3D) graphics process at low cost. Also, the cost of a bus width for instructions can be minimized by a partitioned Single-Instruction Multiple-Data (SIMD) method and a method of conditionally executing instructions.
    Type: Grant
    Filed: March 11, 2008
    Date of Patent: August 3, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Chun Gi Lyuh, Yil Suk Yang, Se Wan Heo, Soon Il Yeo, Tae Moon Roh, Jong Dae Kim, Ki Chul Kim, Se Hoon Yoo
  • Publication number: 20100191939
    Abstract: A unique instruction and exponent adjustment adder selectively shift outputs from multiple execution units, including a plurality of multipliers, in a processor core in order to scale mantissas for related trigonometric functions used in a vector dot product.
    Type: Application
    Filed: January 27, 2009
    Publication date: July 29, 2010
    Applicant: International Business Machines Corporation
    Inventors: Adam J. Muff, Matthew R. Tubbs
  • Patent number: 7765389
    Abstract: Exception handling is simulated. An exception simulator is employed to simulate exceptions generated from routines simulating operations. The exception simulator provides an indication of the exception and invokes an interruption, when appropriate. The exception simulator includes an instruction invoked to handle the exception and any interruption.
    Type: Grant
    Filed: April 25, 2007
    Date of Patent: July 27, 2010
    Assignee: International Business Machines Corporation
    Inventors: Shawn D. Lundvall, Ronald M. Smith, Sr., Phil C. Yeh
  • Patent number: 7765386
    Abstract: An embodiment of the present invention is a technique to perform floating-point operations for vector processing. An input queue captures a plurality of vector inputs. A scheduler dispatches the vector inputs. A plurality of floating-point (FP) pipelines generates FP results from operating on scalar components of the vector inputs dispatched from the scheduler. An arbiter and assembly unit arbitrates use of output section and assembles the FP results to write to the output section.
    Type: Grant
    Filed: September 28, 2005
    Date of Patent: July 27, 2010
    Assignee: Intel Corporation
    Inventors: David D. Donofrio, Michael Dwyer
  • Publication number: 20100174891
    Abstract: In a reconfigurable SIMD processor, a unit of operation for executing an instruction corresponds to one group, and the one group that includes a plurality of PEs implements at least a part of an operation unit that executes at least one of an integer divide instruction: a floating decimal point add/subtract instruction; a floating decimal point multiply instruction; and a floating decimal point divide instruction, using operation units and general purpose registers provided in a plurality of the PEs. The number of the PEs that compose the one group is varied in accordance with the instruction.
    Type: Application
    Filed: March 27, 2008
    Publication date: July 8, 2010
    Inventor: Shohei Nomoto
  • Patent number: 7747842
    Abstract: An output buffer in a multi-threaded processor is managed to store a variable amount of output data. Parallel threads produce a variable amount of output data. A controller is configured to determine how much output buffer space is needed per thread and how many threads can execute in parallel, given the available space in the output buffer. The controller also determines where each thread writes to in the output buffer.
    Type: Grant
    Filed: December 19, 2005
    Date of Patent: June 29, 2010
    Assignee: NVIDIA Corporation
    Inventors: Mark R. Goudy, Andrew J. Tao, Dominic Acocella
  • Publication number: 20100153692
    Abstract: Exemplary apparatus, method, and system embodiments provide for accelerated hardware processing of an action script for a graphical image for visual display. An exemplary apparatus comprises: a first memory; and a plurality of processors to separate the action script from other data, to convert a plurality of descriptive elements of the action script into a plurality of hardware-level operational or control codes, and to perform one or more operations corresponding to an operational code of the plurality of operational codes using corresponding data to generate pixel data for the graphical image. In an exemplary embodiment, at least one processor further is to parse the action script into the plurality of descriptive elements and the corresponding data, and to extract data from the action script and to store the extracted data in the first memory as a plurality of control words having the corresponding data in predetermined fields.
    Type: Application
    Filed: February 14, 2009
    Publication date: June 17, 2010
    Applicant: PERSONAL WEB SYSTEMS, INC.
    Inventors: Bhaskar Kota, Lakshmikanth Surya Naga Satyavolu, Ganapathi Venkata Puppala, Praveen Kumar Bollam, Sairam Sambaraju, Paul L. Master
  • Patent number: 7730287
    Abstract: Methods and software are presented for processing data in a programmable processor, involving (a) decoding instructions for execution using an execution unit operable to execute instructions by partitioning data stored in registers in a register file into multiple data elements, the instructions selected from an instruction set that includes group arithmetic instructions and group data handling instructions, (b) in response to decoding different group data handling instructions, executing group data handling operations that re-arrange data elements in different ways, and (c) in response to decoding different group arithmetic instructions, executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.
    Type: Grant
    Filed: July 27, 2007
    Date of Patent: June 1, 2010
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Publication number: 20100122070
    Abstract: Subvector slices x(i,r,s) of a first vector x(i) are stored (e.g., in a CAM array) in a bit-parallel word-serial manner. For each of the stored subvector slices and in parallel on bits of said each subvector slice, an operation is executed that outputs a pre-calculated inner product result of the said bits and a second vector a. If the subvector slices x(i,r,s) of the first vector x(i) are initially stored in a bit-serial word-serial manner, there is a transform to store them in the bit-parallel word serial manner by copying relevant bits of each of the subvector slices from a 0th column of a content-addressable memory array to elements of a tags register and, for each kth iteration, shifting bits in the elements of the tags register by m positions and copying the shifted bits to a column of the CAM array. An associative processor outputs the pre-calculated inner product result in a distributed arithmetic manner.
    Type: Application
    Filed: November 7, 2008
    Publication date: May 13, 2010
    Inventors: David Guevorkian, Timo Yli-Pietila, Petri Liuha
  • Publication number: 20100115232
    Abstract: A vector processor or vector processing computer has a first vector register operable to store two or more vector elements that together comprise a single first large integer and a second vector register operable to store two or more vector elements that together comprise a single second large integer. An adder having a carry-in bit is operable to add the large integer in the first vector register to the large integer in the second vector register by using the carry-in bit to add sequential elements of the vector registers.
    Type: Application
    Filed: October 31, 2008
    Publication date: May 6, 2010
    Inventors: Timothy J. Johnson, Eric P. Lundberg, Michael Parker, Gregory J. Faanes
  • Publication number: 20100100713
    Abstract: A floating point processor unit executes a floating point compare instruction with two operands of the same or different precision by comparing the two operands in integer format, which speeds up the execution of the floating point compare instruction significantly. The floating point processor now executes the floating point compare instruction at least twice as fast or faster (e.g., two clock cycles instead of five clock cycles in the prior art) for nearly most operand cases (e.g., 99% of all cases). Only the rare corner cases require additional operations on one of the operands and thus require additional cycles of execution time because the integer compare operation will not work for these corner cases. This is due to the fact that one operand is a single precision subnormal number in an unnormalized representation (i.e., has two representations) and the other operand is in the SP subnormal range such that the integer compare operation will fail.
    Type: Application
    Filed: October 22, 2008
    Publication date: April 22, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Maarten J. Boersma, Michael Kroener, Silvia M. Meuller, Jochen Preiss
  • Publication number: 20100095099
    Abstract: A system and a method for storing numbers in a register file are provided. The system and the method store single precision numbers in double precision format in a register file that is shared between floating point computational units and computational units not supporting floating point numbers.
    Type: Application
    Filed: October 14, 2008
    Publication date: April 15, 2010
    Applicant: International Business Machines Corporation
    Inventors: Maarten Boersma, Michael Kroener, Petra Leber, Silvia M. Mueller, Jochen Preiss, Kerstin Schelm
  • Publication number: 20100095097
    Abstract: Mechanisms for implementing a floating point only single instruction multiple data instruction set architecture are provided. A processor is provided that comprises an issue unit, an execution unit coupled to the issue unit, and a vector register file coupled to the execution unit. The execution unit has logic that implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA). The floating point vector registers of the vector register file store both scalar and floating point values as vectors having a plurality of vector elements. The processor may be part of a data processing system.
    Type: Application
    Filed: October 14, 2008
    Publication date: April 15, 2010
    Applicant: International Business Machines Corporation
    Inventor: Michael K. Gschwind
  • Publication number: 20100095098
    Abstract: Mechanisms for generating and executing programs for a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA) are provided. A computer program product comprising a computer recordable medium having a computer readable program recorded thereon is provided. The computer readable program, when executed on a computing device, causes the computing device to receive one or more instructions and execute the one or more instructions using logic in an execution unit of the computing device. The logic implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA), based on data stored in a vector register file of the computing device. The vector register file is configured to store both scalar and floating point values as vectors having a plurality of vector elements.
    Type: Application
    Filed: October 14, 2008
    Publication date: April 15, 2010
    Applicant: International Business Machines Corporation
    Inventor: Michael K. Gschwind
  • Publication number: 20100058037
    Abstract: The described embodiments provide a processor for generating a result vector with shifted values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then determines a number of bit positions to shift the base value using selected relevant elements in the first input vector. The processor then shifts the copy of the base value by the number of bit positions and writes the value into a corresponding element in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Application
    Filed: August 14, 2009
    Publication date: March 4, 2010
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20100049950
    Abstract: The described embodiments provide a processor for generating a result vector with summed values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element in the second input vector. The processor then writes the sum of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Application
    Filed: August 14, 2009
    Publication date: February 25, 2010
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20100049951
    Abstract: The described embodiments provide a processor for generating a result vector with shifted values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then writes the product of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Application
    Filed: August 14, 2009
    Publication date: February 25, 2010
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20100042812
    Abstract: A circuit arrangement and method support data dependent instruction decoding, whereby instructions are decoded, in part, using decode data that is stored in operand registers identified by such instructions. An instruction may include an opcode and at least one operand that identifies a register. During execution of the instruction, the instruction is first decoded using the opcode, and then decode data stored in the operand register is retrieved and used to further decode the instruction, e.g., to select from among a plurality of operations or instruction types associated with the same opcode.
    Type: Application
    Filed: August 14, 2008
    Publication date: February 18, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
  • Publication number: 20100042815
    Abstract: The described embodiments provide a system that executes program code. While executing program code, the processor encounters at least one vector instruction and at least one vector-control instruction. The vector instruction includes a set of elements, wherein each element is used to perform an operation for a corresponding iteration of a loop in the program code. The vector-control instruction identifies elements in the vector instruction that may be operated on in parallel without causing an error due to a runtime data dependency between the iterations of the loop. The processor then executes the loop by repeatedly executing the vector-control instruction to identify a next group of elements that can be operated on in the vector instruction and selectively executing the vector instruction to perform the operation for the next group of elements in the vector instruction, until the operation has been performed for all elements of the vector instruction.
    Type: Application
    Filed: April 7, 2009
    Publication date: February 18, 2010
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20100042817
    Abstract: The described embodiments provide a processor for generating a result vector with shifted values from an input vector. During operation, the processor receives an input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain shifted values or propagated values from the input vector, depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Application
    Filed: June 30, 2009
    Publication date: February 18, 2010
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20100042818
    Abstract: The described embodiments provide a processor for generating a result vector with copied or propagated values from an input vector. During operation, the processor receives at least one input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain copied propagated values from the input vector(s), depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Application
    Filed: June 30, 2009
    Publication date: February 18, 2010
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20100042807
    Abstract: The described embodiments provide a processor for generating a result vector with incremented or decremented values from an input vector. During operation, the processor receives an input vector and a control vector. The processor then copies a value contained in a selected element of the input vector. The processor next generates the result vector, which involves writing an incremented or decremented value to the result vector, depending on the value of the control vector and the embodiment. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Application
    Filed: June 30, 2009
    Publication date: February 18, 2010
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff, JR.
  • Publication number: 20100042816
    Abstract: The described embodiments provide a system that sets elements in a result vector based on an input vector. During operation, the system determines a location of a key element within the input vector. Next, the system generates a result vector. When generating the result vector, the system sets one or more elements of the result vector based on the location of the key element in the input vector.
    Type: Application
    Filed: April 7, 2009
    Publication date: February 18, 2010
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Patent number: 7664930
    Abstract: Methods and apparatus for calculating Single-Instruction-Multiple-Data (SIMD) complex arithmetic. A coprocessor instruction has a format identifying a multiply and subtract instruction to generate real components for complex multiplication of first operand complex data and corresponding second operand complex data, a cross multiply and add instruction to generate imaginary components for complex multiplication of the first operand complex data and the corresponding second operand complex data, an add-subtract instruction to add real components of the first operand to imaginary components of the second operand and to subtract real components of the second operand from imaginary components of the first operand, and a subtract-add instruction to subtract the imaginary components of the second operand from the real components of the first operand and to add the real components of the second operand to the imaginary components of the first operand.
    Type: Grant
    Filed: May 30, 2008
    Date of Patent: February 16, 2010
    Assignee: Marvell International Ltd
    Inventors: Nigel C. Paver, Moinul H. Khan, Bradley C. Aldrich
  • Patent number: 7660973
    Abstract: Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on multiple data elements stored registers in a register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results, wherein the execution unit is capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions.
    Type: Grant
    Filed: July 27, 2007
    Date of Patent: February 9, 2010
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Patent number: 7660972
    Abstract: A method and software for improving the performance of processors by incorporating an execution unit operable to decode and execute single instructions specifying three registers each containing a plurality of data elements, the execution unit operable to multiply the first and second registers and add the third register to produce a catenated result containing a plurality of data elements. Additional instructions provide group floating-point subtract, add, multiply, set less, and set greater equal operations. The set less and set greater equal operations produce alternatively zero or an identity element for each element of a catenated result, the result facilitating alternative selection of individual data elements using bitwise Boolean operations and without requiring conditional branch operations.
    Type: Grant
    Filed: January 16, 2004
    Date of Patent: February 9, 2010
    Assignee: Microunity Systems Engineering, Inc
    Inventors: Craig Hansen, John Moussouris
  • Patent number: 7659911
    Abstract: A method and apparatus for perfectly lossless and minimal-loss interconversion of digital color data between spectral color spaces (RGB) and perceptually based luma-chroma color spaces (Y?CBCR) is disclosed. In particular, the present invention provides a process for converting digital pixels from R?G?B? space to Y?CBCR space and back, or from Y?CBCR space to R?G?B? space and back, with zero error, or, in constant-precision implementations, with guaranteed minimal error. This invention permits digital video editing and image editing systems to repeatedly interconvert between color spaces without accumulating errors. In image codecs, this invention can improve the quality of lossy image compressors independently of their core algorithms, and enables lossless image compressors to operate in a different color space than the source data without thereby becoming lossy.
    Type: Grant
    Filed: April 21, 2005
    Date of Patent: February 9, 2010
    Inventor: Andreas Wittenstein
  • Patent number: 7660967
    Abstract: A computer processor is responsive to successive processing instructions in an issue order to process regular vectors to generate a result vector without use of a cache. At least two architectural registers having input-vector capability are selectively coupled to memory to receive corresponding vector-elements of two vectors and transfer the vector-elements to a selected functional unit. At least one architectural register having output capability is selectively coupled to an output, which in turn is coupled to transfer result vector-elements to the memory. The functional unit performs a function on the vector-elements to generate a respective result-element. The result-elements are transferred to a selected architectural register for processing as operands in performance of further functions by a functional unit, or are transferred to the output for transfer to memory. In either case, the order of the result vector-elements is restored to the issue order of the successive processing instructions.
    Type: Grant
    Filed: January 30, 2008
    Date of Patent: February 9, 2010
    Assignee: Efficient Memory Technology
    Inventor: Maurice L. Hutson
  • Publication number: 20100031009
    Abstract: A floating point execution unit calculates a one minus dot product value in a single pass. As such, the dependency that otherwise would be required to perform the calculations is eliminated, resulting in a substantially faster performance of such calculations. The floating point execution unit may be used, for example, to accelerate pixel shading algorithms such as Fresnel and electron microscope effects.
    Type: Application
    Filed: August 1, 2008
    Publication date: February 4, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Adam James Muff, Matthew Ray Tubbs
  • Publication number: 20090327665
    Abstract: Methods and apparatus are disclosed for handling floating point exceptions in a processor that executes single-instruction multiple-data (SIMD) instructions. In one embodiment a numerical exception is identified for a SIMD floating point operation and SIMD micro-operations are initiated to generate two packed partial results of a packed result for the SIMD floating point operation. A SIMD denormalization micro-operation is initiated to combine the two packed partial results and to denormalize one or more elements of the combined packed partial results to generate a packed result for the SIMD floating point operation having one or more denormal elements. Flags are set and stored with packed partial results to identify denormal elements. In one embodiment a SIMD normalization micro-operation is initiated to generate a normalized pseudo internal floating point representation prior to the SIMD floating point operation when it uses multiplication.
    Type: Application
    Filed: June 30, 2008
    Publication date: December 31, 2009
    Inventors: Zeev Sperber, Shachar Finkelstein, Gregory Pribush, Amit Gradstein, Guy Bale, Thierry Pons
  • Publication number: 20090271591
    Abstract: A data processor whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD) and thereby enhancing the operation processing capability is made possible. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and compose the inner product execution units to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and SIMD-compatible composition enhances the level of operation parallelism dramatically.
    Type: Application
    Filed: July 2, 2009
    Publication date: October 29, 2009
    Inventors: Fumio ARAKAWA, Tetsuya Yamada
  • Publication number: 20090265409
    Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.
    Type: Application
    Filed: March 23, 2009
    Publication date: October 22, 2009
    Inventors: Alexander D. Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf Witt
  • Publication number: 20090265529
    Abstract: A processor (and method) of processing multiple data by a single instruction includes first and second register sets each of which includes a plurality of registers, and an arithmetic unit to rearrange data being registered in the first and second register sets according to a relative size of an absolute value of the data between the first and second register sets so that the relative size is defined before executing an instruction considering the relative size.
    Type: Application
    Filed: April 7, 2009
    Publication date: October 22, 2009
    Applicant: NEC CORPORATION
    Inventor: Yusuke Kobayashi
  • Patent number: 7600104
    Abstract: System and method are provided for parallel vector data processing having a data processor capable of vector data having a defined first bit-length. In one embodiment, at least one of first and second operand registers is used for storing operands, and an additional data storage element is used to have a size to store a number of bits corresponding to the first bit-length, and the storage element is segmented into a defined number of segments. An instruction set storage unit is used for storing a set of instructions for the data processor to process a set of data in parallel by use of the additional storage element.
    Type: Grant
    Filed: August 15, 2006
    Date of Patent: October 6, 2009
    Inventor: Peter Neumann
  • Publication number: 20090249040
    Abstract: An embedded control system capable of ensuring precision in arithmetic with data in the floating-point format and also avoiding a shortage of the storage area of a memory is provided. According to an embedded control system in the present invention, when discrete data in the floating-point format is stored in a read-only memory, the discrete data in the floating-point format is converted into data in a significand-reduced floating-point format before being stored. Here, a significand-reduced floating-point number is a number obtained by deleting low-order bits of the significand of a floating-point number. Further, an interpolation search is performed using discrete data, the discrete data in the significand-reduced floating-point format stored in the read-only memory is brought back to the discrete data in the floating-point format before an interpolation search being performed.
    Type: Application
    Filed: February 19, 2009
    Publication date: October 1, 2009
    Applicant: Hitachi, Ltd.
    Inventors: Shinya FUJIMOTO, Keiichiro Ohkawa
  • Publication number: 20090240917
    Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.
    Type: Application
    Filed: October 10, 2006
    Publication date: September 24, 2009
    Applicant: Altera Corporation
    Inventor: Michael Fitton
  • Publication number: 20090240927
    Abstract: A processor capable of executing conditional store instructions without being limited by the number of condition codes is provided. Condition data is stored in floating-point registers, and an operation unit executes a conditional floating-point store instruction of determining whether to store, in cache, store data.
    Type: Application
    Filed: November 25, 2008
    Publication date: September 24, 2009
    Applicant: FUJITSU LIMITED
    Inventor: Toshio YOSHIDA
  • Publication number: 20090210678
    Abstract: A data processing apparatus operate to process floating point operands is disclosed. The data processing apparatus comprises: an instruction decoder operable to decode an instruction for processing floating point operands; and a data processor operable to perform data processing operations controlled by the instruction decoder wherein: in response to the decoded instruction indicating operation according to a flush-to-zero semantic, the data processor is operable to process the floating point operands in accordance with the decoded instruction such that floating point operands having a denormal value are treated as zero operands; and in response to the decoded instruction indicating operation according to a denormal semantic, the data processor is operable to process the floating point operands in accordance with the decoded instruction such that floating point operands having a denormal value are treated as denormal operands.
    Type: Application
    Filed: August 1, 2005
    Publication date: August 20, 2009
    Inventor: Simon Ford