Floating Point Or Vector Patents (Class 712/222)
-
Publication number: 20100268920Abstract: A mechanism for handling unfused multiply-add accrued exception bits includes a processor including a floating point unit, a storage, and exception logic. The floating-point unit may be configured to execute an unfused multiply-accumulate instruction defined with the instruction set architecture (ISA). The unfused multiply-accumulate instruction may include a multiply sub-operation and an accumulate sub-operation. The storage may be configured to maintain floating-point exception state information. The exception logic may be configured to capture the floating-point exception state after completion of the multiply sub-operation and prior to completion of the accumulate sub-operation, for example, and to update the storage to reflect the floating-point exception state.Type: ApplicationFiled: April 16, 2009Publication date: October 21, 2010Inventors: Jeffrey S. Brooks, Paul J. Jordan, Christopher H. Olson
-
Patent number: 7818539Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by e.g., steering each to one of the two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.Type: GrantFiled: August 28, 2006Date of Patent: October 19, 2010Assignees: The Board of Trustees of the Leland Stanford Junior University, The Massachusetts Institute of TechnologyInventors: Scott Rixner, John D. Owens, Ujval J. Kapasi, William J. Dally
-
Patent number: 7818548Abstract: Methods and software are presented for processing data in a programmable processor, involving (a) decoding instructions for execution using an execution unit operable to execute instructions by partitioning data stored in registers in a register file into multiple data elements, the instructions selected from an instruction set that includes group arithmetic instructions and group data handling instructions, (b) in response to decoding different group arithmetic instructions, executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results, and (c) in response to decoding different group data handling instructions, executing group data handling operations that re-arrange data elements in different ways.Type: GrantFiled: July 27, 2007Date of Patent: October 19, 2010Assignee: Microunity Systems Engineering, Inc.Inventors: Craig Hansen, John Moussouris, Alexia Massalin
-
Patent number: 7818540Abstract: A vector processing system for executing vector instructions, each instruction defining multiple value pairs, an operation to be executed and a modifier, the vector processing system comprising a plurality of parallel processing units, each arranged to receive one of said pairs of values and, when selected, to implement an operation on said value pair to generate a result, each processing unit comprising at least one flag and being selectable in dependence on a condition defined by said at least one flag, wherein the modifier defines the condition under which the parallel processing unit is individually selected.Type: GrantFiled: May 19, 2006Date of Patent: October 19, 2010Assignee: Broadcom CorporationInventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
-
Patent number: 7814303Abstract: Operand vector multiplexer sequence control is used in a vector-based execution unit to control the shuffling of data elements in operand vectors used by a sequence of vector instructions processed by the vector-based execution unit. A swizzle sequence instruction is defined in an instruction set for the vector-based execution unit and is used to selectively apply a sequence of vector data element shuffle orders to one or more operand vectors to be used by the associated sequence of vector instructions. As a result, when a common sequence of data element shuffle orders is used frequently for a sequence of vector instructions, a single swizzle sequence instruction may be used to select the desired sequence of custom data element ordering for each of the vector instructions in the sequence.Type: GrantFiled: October 23, 2008Date of Patent: October 12, 2010Assignee: International Business Machines CorporationInventors: Adam J. Muff, Robert A. Shearer, Matthew R. Tubbs
-
Publication number: 20100257342Abstract: Provided is a parallel processor for supporting a floating-point operation. The parallel processor has a flexible structure for easy development of a parallel algorithm involving multimedia computing, requires low hardware cost, and consumes low power. To support floating-point operations, the parallel processor uses floating-point accumulators and a flag for floating-point multiplication. Using the parallel processor, it is possible to process a geometric transformation operation in a 3-dimensional (3D) graphics process at low cost. Also, the cost of a bus width for instructions can be minimized by a partitioned Single-Instruction Multiple-Data (SIMD) method and a method of conditionally executing instructions.Type: ApplicationFiled: June 17, 2010Publication date: October 7, 2010Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTEInventors: Chun Gi LYUH, Yil Suk YANG, Se Wan HEO, Soon Il YEO, Tae Moon ROH, Jong Dae KIM, Ki Chul KIM, Se Hoon YOO
-
Patent number: 7788469Abstract: A hardware accelerator is used to execute a floating-point byte-code in an information processing device. For a floating-point byte-code, a byte-code accelerator BCA feeds an instruction stream for using a FPU to a CPU. When the FPU is used, first the data is transferred to the FPU register from a general-purpose register, and then an FPU operation is performed. For data, such as a denormalized number, that cannot be processed by the FPU, in order to call a floating-point math library of software, the processing of the BCA is completed and the processing moves to processing by software. In order to realize this, data on a data transfer bus from the CPU to the FPU is snooped by the hardware accelerator, and a cancel request is signaled to the CPU to inhibit execution of the FPU operation when corresponding data is detected in a data checking part.Type: GrantFiled: July 6, 2004Date of Patent: August 31, 2010Assignee: Renesas Technology Corp.Inventors: Tetsuya Yamada, Naohiko Irie, Takahiro Irita, Masayuki Kabasawa
-
Patent number: 7788471Abstract: A system and method for performing vector arithmetic is disclosed. The method includes loading two operand vectors, each composed of a number of vector elements, into two storage locations. A selected arithmetic operation is performed on the operand vectors to produce a result vector having the number of vector elements. Each vector element of the result vector is associated with an arithmetic logic cell that has a first input that can receive any vector element from the first vector and a second input that can receive any vector element from the second vector. Accordingly each vector element of the result vector is a function of any two individual vector elements of the operand vectors. By applying the operand vector elements to the appropriate arithmetic logic cells, and by selecting the appropriate arithmetic operation, complex vector operations can be performed efficiently.Type: GrantFiled: September 18, 2006Date of Patent: August 31, 2010Assignee: Freescale Semiconductor, Inc.Inventor: Chengke Sheng
-
Publication number: 20100218194Abstract: A method for controlling a data processing system, a data processing system executing a similar method, and a computer readable medium with instructions for a similar method. The method includes receiving, by an operating system executing on a data processing system, an execution request from an application, the execution request including at least one resource-defining attribute corresponding to an execution thread of the application. The method also includes allocating processor resources to the execution thread by the operating system according to the at least one resource-defining attribute, and allowing execution of the execution thread on the data processing system according to the allocated processor resources.Type: ApplicationFiled: August 27, 2009Publication date: August 26, 2010Applicant: Siemens Product Lifecycle Management Software Inc.Inventors: John Gordon Dallman, Peter Philip Lonsdale Nanson
-
Publication number: 20100211761Abstract: In accordance with at least some embodiments, a digital signal processor (DSP) includes an instruction fetch unit and an instruction decode unit in communication with the instruction fetch unit. The DSP also includes a register set and a plurality of work units in communication with the instruction decode unit. A vector math instruction decoded by the instruction decode unit causes input vectors and output vectors to be aligned with a maximum boundary of the register set and causes parallel operations by the work units.Type: ApplicationFiled: February 18, 2010Publication date: August 19, 2010Applicant: TEXAS INSTRUMENTS INCORPORATEDInventor: Udayan DASGUPTA
-
Patent number: 7779237Abstract: A method, system and processor for adaptively and selectively controlling the instruction execution frequency of a data processor. Processing logic or a software compiler determines when a number of first-type instructions, requiring longer execution latency, are scheduled to be executed. The logic/compiler then triggers the CPM unit to automatically switch the execution frequency of the instruction processor from a first frequency that is optimal for processing regular-type instructions to a second, pre-established lower frequency that is optimal for processing the first-type instructions, to enable more efficient execution and higher execution throughput of the number of first-type operations within the processor. When the first-type instructions have completed execution, the processor's instruction execution frequency is returned to the first optimal frequency.Type: GrantFiled: July 11, 2007Date of Patent: August 17, 2010Assignee: International Business Machines CorporationInventors: Anthony Correale, Jr., Kenichi Tsuchiya
-
Publication number: 20100205411Abstract: A result processor access a result table for an entry associated with a predetermined sub-expression of a regular expression in response to a finite state machine finding the predetermined sub-expression in the input stream. The result processor executes an instruction associated with the entry, the instruction including one or more operations to be performed on one or more bits in a bit vector register, and determines as a function of the one or more bits in the bit vector register whether the complex regular expression has been found in the input stream.Type: ApplicationFiled: February 11, 2009Publication date: August 12, 2010Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Jan van Lunteren
-
Patent number: 7769981Abstract: Provided is a parallel processor for supporting a floating-point operation. The parallel processor has a flexible structure for easy development of a parallel algorithm involving multimedia computing, requires low hardware cost, and consumes low power. To support floating-point operations, the parallel processor uses floating-point accumulators and a flag for floating-point multiplication. Using the parallel processor, it is possible to process a geometric transformation operation in a 3-dimensional (3D) graphics process at low cost. Also, the cost of a bus width for instructions can be minimized by a partitioned Single-Instruction Multiple-Data (SIMD) method and a method of conditionally executing instructions.Type: GrantFiled: March 11, 2008Date of Patent: August 3, 2010Assignee: Electronics and Telecommunications Research InstituteInventors: Chun Gi Lyuh, Yil Suk Yang, Se Wan Heo, Soon Il Yeo, Tae Moon Roh, Jong Dae Kim, Ki Chul Kim, Se Hoon Yoo
-
Publication number: 20100191939Abstract: A unique instruction and exponent adjustment adder selectively shift outputs from multiple execution units, including a plurality of multipliers, in a processor core in order to scale mantissas for related trigonometric functions used in a vector dot product.Type: ApplicationFiled: January 27, 2009Publication date: July 29, 2010Applicant: International Business Machines CorporationInventors: Adam J. Muff, Matthew R. Tubbs
-
Patent number: 7765389Abstract: Exception handling is simulated. An exception simulator is employed to simulate exceptions generated from routines simulating operations. The exception simulator provides an indication of the exception and invokes an interruption, when appropriate. The exception simulator includes an instruction invoked to handle the exception and any interruption.Type: GrantFiled: April 25, 2007Date of Patent: July 27, 2010Assignee: International Business Machines CorporationInventors: Shawn D. Lundvall, Ronald M. Smith, Sr., Phil C. Yeh
-
Patent number: 7765386Abstract: An embodiment of the present invention is a technique to perform floating-point operations for vector processing. An input queue captures a plurality of vector inputs. A scheduler dispatches the vector inputs. A plurality of floating-point (FP) pipelines generates FP results from operating on scalar components of the vector inputs dispatched from the scheduler. An arbiter and assembly unit arbitrates use of output section and assembles the FP results to write to the output section.Type: GrantFiled: September 28, 2005Date of Patent: July 27, 2010Assignee: Intel CorporationInventors: David D. Donofrio, Michael Dwyer
-
Publication number: 20100174891Abstract: In a reconfigurable SIMD processor, a unit of operation for executing an instruction corresponds to one group, and the one group that includes a plurality of PEs implements at least a part of an operation unit that executes at least one of an integer divide instruction: a floating decimal point add/subtract instruction; a floating decimal point multiply instruction; and a floating decimal point divide instruction, using operation units and general purpose registers provided in a plurality of the PEs. The number of the PEs that compose the one group is varied in accordance with the instruction.Type: ApplicationFiled: March 27, 2008Publication date: July 8, 2010Inventor: Shohei Nomoto
-
Patent number: 7747842Abstract: An output buffer in a multi-threaded processor is managed to store a variable amount of output data. Parallel threads produce a variable amount of output data. A controller is configured to determine how much output buffer space is needed per thread and how many threads can execute in parallel, given the available space in the output buffer. The controller also determines where each thread writes to in the output buffer.Type: GrantFiled: December 19, 2005Date of Patent: June 29, 2010Assignee: NVIDIA CorporationInventors: Mark R. Goudy, Andrew J. Tao, Dominic Acocella
-
Publication number: 20100153692Abstract: Exemplary apparatus, method, and system embodiments provide for accelerated hardware processing of an action script for a graphical image for visual display. An exemplary apparatus comprises: a first memory; and a plurality of processors to separate the action script from other data, to convert a plurality of descriptive elements of the action script into a plurality of hardware-level operational or control codes, and to perform one or more operations corresponding to an operational code of the plurality of operational codes using corresponding data to generate pixel data for the graphical image. In an exemplary embodiment, at least one processor further is to parse the action script into the plurality of descriptive elements and the corresponding data, and to extract data from the action script and to store the extracted data in the first memory as a plurality of control words having the corresponding data in predetermined fields.Type: ApplicationFiled: February 14, 2009Publication date: June 17, 2010Applicant: PERSONAL WEB SYSTEMS, INC.Inventors: Bhaskar Kota, Lakshmikanth Surya Naga Satyavolu, Ganapathi Venkata Puppala, Praveen Kumar Bollam, Sairam Sambaraju, Paul L. Master
-
Patent number: 7730287Abstract: Methods and software are presented for processing data in a programmable processor, involving (a) decoding instructions for execution using an execution unit operable to execute instructions by partitioning data stored in registers in a register file into multiple data elements, the instructions selected from an instruction set that includes group arithmetic instructions and group data handling instructions, (b) in response to decoding different group data handling instructions, executing group data handling operations that re-arrange data elements in different ways, and (c) in response to decoding different group arithmetic instructions, executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.Type: GrantFiled: July 27, 2007Date of Patent: June 1, 2010Assignee: Microunity Systems Engineering, Inc.Inventors: Craig Hansen, John Moussouris, Alexia Massalin
-
Publication number: 20100122070Abstract: Subvector slices x(i,r,s) of a first vector x(i) are stored (e.g., in a CAM array) in a bit-parallel word-serial manner. For each of the stored subvector slices and in parallel on bits of said each subvector slice, an operation is executed that outputs a pre-calculated inner product result of the said bits and a second vector a. If the subvector slices x(i,r,s) of the first vector x(i) are initially stored in a bit-serial word-serial manner, there is a transform to store them in the bit-parallel word serial manner by copying relevant bits of each of the subvector slices from a 0th column of a content-addressable memory array to elements of a tags register and, for each kth iteration, shifting bits in the elements of the tags register by m positions and copying the shifted bits to a column of the CAM array. An associative processor outputs the pre-calculated inner product result in a distributed arithmetic manner.Type: ApplicationFiled: November 7, 2008Publication date: May 13, 2010Inventors: David Guevorkian, Timo Yli-Pietila, Petri Liuha
-
Publication number: 20100115232Abstract: A vector processor or vector processing computer has a first vector register operable to store two or more vector elements that together comprise a single first large integer and a second vector register operable to store two or more vector elements that together comprise a single second large integer. An adder having a carry-in bit is operable to add the large integer in the first vector register to the large integer in the second vector register by using the carry-in bit to add sequential elements of the vector registers.Type: ApplicationFiled: October 31, 2008Publication date: May 6, 2010Inventors: Timothy J. Johnson, Eric P. Lundberg, Michael Parker, Gregory J. Faanes
-
Publication number: 20100100713Abstract: A floating point processor unit executes a floating point compare instruction with two operands of the same or different precision by comparing the two operands in integer format, which speeds up the execution of the floating point compare instruction significantly. The floating point processor now executes the floating point compare instruction at least twice as fast or faster (e.g., two clock cycles instead of five clock cycles in the prior art) for nearly most operand cases (e.g., 99% of all cases). Only the rare corner cases require additional operations on one of the operands and thus require additional cycles of execution time because the integer compare operation will not work for these corner cases. This is due to the fact that one operand is a single precision subnormal number in an unnormalized representation (i.e., has two representations) and the other operand is in the SP subnormal range such that the integer compare operation will fail.Type: ApplicationFiled: October 22, 2008Publication date: April 22, 2010Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Maarten J. Boersma, Michael Kroener, Silvia M. Meuller, Jochen Preiss
-
Publication number: 20100095099Abstract: A system and a method for storing numbers in a register file are provided. The system and the method store single precision numbers in double precision format in a register file that is shared between floating point computational units and computational units not supporting floating point numbers.Type: ApplicationFiled: October 14, 2008Publication date: April 15, 2010Applicant: International Business Machines CorporationInventors: Maarten Boersma, Michael Kroener, Petra Leber, Silvia M. Mueller, Jochen Preiss, Kerstin Schelm
-
Publication number: 20100095097Abstract: Mechanisms for implementing a floating point only single instruction multiple data instruction set architecture are provided. A processor is provided that comprises an issue unit, an execution unit coupled to the issue unit, and a vector register file coupled to the execution unit. The execution unit has logic that implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA). The floating point vector registers of the vector register file store both scalar and floating point values as vectors having a plurality of vector elements. The processor may be part of a data processing system.Type: ApplicationFiled: October 14, 2008Publication date: April 15, 2010Applicant: International Business Machines CorporationInventor: Michael K. Gschwind
-
Publication number: 20100095098Abstract: Mechanisms for generating and executing programs for a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA) are provided. A computer program product comprising a computer recordable medium having a computer readable program recorded thereon is provided. The computer readable program, when executed on a computing device, causes the computing device to receive one or more instructions and execute the one or more instructions using logic in an execution unit of the computing device. The logic implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA), based on data stored in a vector register file of the computing device. The vector register file is configured to store both scalar and floating point values as vectors having a plurality of vector elements.Type: ApplicationFiled: October 14, 2008Publication date: April 15, 2010Applicant: International Business Machines CorporationInventor: Michael K. Gschwind
-
Publication number: 20100058037Abstract: The described embodiments provide a processor for generating a result vector with shifted values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then determines a number of bit positions to shift the base value using selected relevant elements in the first input vector. The processor then shifts the copy of the base value by the number of bit positions and writes the value into a corresponding element in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.Type: ApplicationFiled: August 14, 2009Publication date: March 4, 2010Applicant: APPLE INC.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Publication number: 20100049950Abstract: The described embodiments provide a processor for generating a result vector with summed values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element in the second input vector. The processor then writes the sum of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.Type: ApplicationFiled: August 14, 2009Publication date: February 25, 2010Applicant: APPLE INC.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Publication number: 20100049951Abstract: The described embodiments provide a processor for generating a result vector with shifted values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then writes the product of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.Type: ApplicationFiled: August 14, 2009Publication date: February 25, 2010Applicant: APPLE INC.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Publication number: 20100042812Abstract: A circuit arrangement and method support data dependent instruction decoding, whereby instructions are decoded, in part, using decode data that is stored in operand registers identified by such instructions. An instruction may include an opcode and at least one operand that identifies a register. During execution of the instruction, the instruction is first decoded using the opcode, and then decode data stored in the operand register is retrieved and used to further decode the instruction, e.g., to select from among a plurality of operations or instruction types associated with the same opcode.Type: ApplicationFiled: August 14, 2008Publication date: February 18, 2010Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
-
Publication number: 20100042815Abstract: The described embodiments provide a system that executes program code. While executing program code, the processor encounters at least one vector instruction and at least one vector-control instruction. The vector instruction includes a set of elements, wherein each element is used to perform an operation for a corresponding iteration of a loop in the program code. The vector-control instruction identifies elements in the vector instruction that may be operated on in parallel without causing an error due to a runtime data dependency between the iterations of the loop. The processor then executes the loop by repeatedly executing the vector-control instruction to identify a next group of elements that can be operated on in the vector instruction and selectively executing the vector instruction to perform the operation for the next group of elements in the vector instruction, until the operation has been performed for all elements of the vector instruction.Type: ApplicationFiled: April 7, 2009Publication date: February 18, 2010Applicant: APPLE INC.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Publication number: 20100042817Abstract: The described embodiments provide a processor for generating a result vector with shifted values from an input vector. During operation, the processor receives an input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain shifted values or propagated values from the input vector, depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector.Type: ApplicationFiled: June 30, 2009Publication date: February 18, 2010Applicant: APPLE INC.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Publication number: 20100042818Abstract: The described embodiments provide a processor for generating a result vector with copied or propagated values from an input vector. During operation, the processor receives at least one input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain copied propagated values from the input vector(s), depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector.Type: ApplicationFiled: June 30, 2009Publication date: February 18, 2010Applicant: APPLE INC.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Publication number: 20100042807Abstract: The described embodiments provide a processor for generating a result vector with incremented or decremented values from an input vector. During operation, the processor receives an input vector and a control vector. The processor then copies a value contained in a selected element of the input vector. The processor next generates the result vector, which involves writing an incremented or decremented value to the result vector, depending on the value of the control vector and the embodiment. In addition, a predicate vector can be used to control the values that are written to the result vector.Type: ApplicationFiled: June 30, 2009Publication date: February 18, 2010Applicant: APPLE INC.Inventors: Jeffry E. Gonion, Keith E. Diefendorff, JR.
-
Publication number: 20100042816Abstract: The described embodiments provide a system that sets elements in a result vector based on an input vector. During operation, the system determines a location of a key element within the input vector. Next, the system generates a result vector. When generating the result vector, the system sets one or more elements of the result vector based on the location of the key element in the input vector.Type: ApplicationFiled: April 7, 2009Publication date: February 18, 2010Applicant: APPLE INC.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Patent number: 7664930Abstract: Methods and apparatus for calculating Single-Instruction-Multiple-Data (SIMD) complex arithmetic. A coprocessor instruction has a format identifying a multiply and subtract instruction to generate real components for complex multiplication of first operand complex data and corresponding second operand complex data, a cross multiply and add instruction to generate imaginary components for complex multiplication of the first operand complex data and the corresponding second operand complex data, an add-subtract instruction to add real components of the first operand to imaginary components of the second operand and to subtract real components of the second operand from imaginary components of the first operand, and a subtract-add instruction to subtract the imaginary components of the second operand from the real components of the first operand and to add the real components of the second operand to the imaginary components of the first operand.Type: GrantFiled: May 30, 2008Date of Patent: February 16, 2010Assignee: Marvell International LtdInventors: Nigel C. Paver, Moinul H. Khan, Bradley C. Aldrich
-
Patent number: 7660973Abstract: Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on multiple data elements stored registers in a register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results, wherein the execution unit is capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions.Type: GrantFiled: July 27, 2007Date of Patent: February 9, 2010Assignee: Microunity Systems Engineering, Inc.Inventors: Craig Hansen, John Moussouris, Alexia Massalin
-
Patent number: 7660972Abstract: A method and software for improving the performance of processors by incorporating an execution unit operable to decode and execute single instructions specifying three registers each containing a plurality of data elements, the execution unit operable to multiply the first and second registers and add the third register to produce a catenated result containing a plurality of data elements. Additional instructions provide group floating-point subtract, add, multiply, set less, and set greater equal operations. The set less and set greater equal operations produce alternatively zero or an identity element for each element of a catenated result, the result facilitating alternative selection of individual data elements using bitwise Boolean operations and without requiring conditional branch operations.Type: GrantFiled: January 16, 2004Date of Patent: February 9, 2010Assignee: Microunity Systems Engineering, IncInventors: Craig Hansen, John Moussouris
-
Patent number: 7659911Abstract: A method and apparatus for perfectly lossless and minimal-loss interconversion of digital color data between spectral color spaces (RGB) and perceptually based luma-chroma color spaces (Y?CBCR) is disclosed. In particular, the present invention provides a process for converting digital pixels from R?G?B? space to Y?CBCR space and back, or from Y?CBCR space to R?G?B? space and back, with zero error, or, in constant-precision implementations, with guaranteed minimal error. This invention permits digital video editing and image editing systems to repeatedly interconvert between color spaces without accumulating errors. In image codecs, this invention can improve the quality of lossy image compressors independently of their core algorithms, and enables lossless image compressors to operate in a different color space than the source data without thereby becoming lossy.Type: GrantFiled: April 21, 2005Date of Patent: February 9, 2010Inventor: Andreas Wittenstein
-
Patent number: 7660967Abstract: A computer processor is responsive to successive processing instructions in an issue order to process regular vectors to generate a result vector without use of a cache. At least two architectural registers having input-vector capability are selectively coupled to memory to receive corresponding vector-elements of two vectors and transfer the vector-elements to a selected functional unit. At least one architectural register having output capability is selectively coupled to an output, which in turn is coupled to transfer result vector-elements to the memory. The functional unit performs a function on the vector-elements to generate a respective result-element. The result-elements are transferred to a selected architectural register for processing as operands in performance of further functions by a functional unit, or are transferred to the output for transfer to memory. In either case, the order of the result vector-elements is restored to the issue order of the successive processing instructions.Type: GrantFiled: January 30, 2008Date of Patent: February 9, 2010Assignee: Efficient Memory TechnologyInventor: Maurice L. Hutson
-
Publication number: 20100031009Abstract: A floating point execution unit calculates a one minus dot product value in a single pass. As such, the dependency that otherwise would be required to perform the calculations is eliminated, resulting in a substantially faster performance of such calculations. The floating point execution unit may be used, for example, to accelerate pixel shading algorithms such as Fresnel and electron microscope effects.Type: ApplicationFiled: August 1, 2008Publication date: February 4, 2010Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Adam James Muff, Matthew Ray Tubbs
-
Publication number: 20090327665Abstract: Methods and apparatus are disclosed for handling floating point exceptions in a processor that executes single-instruction multiple-data (SIMD) instructions. In one embodiment a numerical exception is identified for a SIMD floating point operation and SIMD micro-operations are initiated to generate two packed partial results of a packed result for the SIMD floating point operation. A SIMD denormalization micro-operation is initiated to combine the two packed partial results and to denormalize one or more elements of the combined packed partial results to generate a packed result for the SIMD floating point operation having one or more denormal elements. Flags are set and stored with packed partial results to identify denormal elements. In one embodiment a SIMD normalization micro-operation is initiated to generate a normalized pseudo internal floating point representation prior to the SIMD floating point operation when it uses multiplication.Type: ApplicationFiled: June 30, 2008Publication date: December 31, 2009Inventors: Zeev Sperber, Shachar Finkelstein, Gregory Pribush, Amit Gradstein, Guy Bale, Thierry Pons
-
Publication number: 20090271591Abstract: A data processor whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD) and thereby enhancing the operation processing capability is made possible. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and compose the inner product execution units to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and SIMD-compatible composition enhances the level of operation parallelism dramatically.Type: ApplicationFiled: July 2, 2009Publication date: October 29, 2009Inventors: Fumio ARAKAWA, Tetsuya Yamada
-
Publication number: 20090265409Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.Type: ApplicationFiled: March 23, 2009Publication date: October 22, 2009Inventors: Alexander D. Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf Witt
-
Publication number: 20090265529Abstract: A processor (and method) of processing multiple data by a single instruction includes first and second register sets each of which includes a plurality of registers, and an arithmetic unit to rearrange data being registered in the first and second register sets according to a relative size of an absolute value of the data between the first and second register sets so that the relative size is defined before executing an instruction considering the relative size.Type: ApplicationFiled: April 7, 2009Publication date: October 22, 2009Applicant: NEC CORPORATIONInventor: Yusuke Kobayashi
-
Patent number: 7600104Abstract: System and method are provided for parallel vector data processing having a data processor capable of vector data having a defined first bit-length. In one embodiment, at least one of first and second operand registers is used for storing operands, and an additional data storage element is used to have a size to store a number of bits corresponding to the first bit-length, and the storage element is segmented into a defined number of segments. An instruction set storage unit is used for storing a set of instructions for the data processor to process a set of data in parallel by use of the additional storage element.Type: GrantFiled: August 15, 2006Date of Patent: October 6, 2009Inventor: Peter Neumann
-
Publication number: 20090249040Abstract: An embedded control system capable of ensuring precision in arithmetic with data in the floating-point format and also avoiding a shortage of the storage area of a memory is provided. According to an embedded control system in the present invention, when discrete data in the floating-point format is stored in a read-only memory, the discrete data in the floating-point format is converted into data in a significand-reduced floating-point format before being stored. Here, a significand-reduced floating-point number is a number obtained by deleting low-order bits of the significand of a floating-point number. Further, an interpolation search is performed using discrete data, the discrete data in the significand-reduced floating-point format stored in the read-only memory is brought back to the discrete data in the floating-point format before an interpolation search being performed.Type: ApplicationFiled: February 19, 2009Publication date: October 1, 2009Applicant: Hitachi, Ltd.Inventors: Shinya FUJIMOTO, Keiichiro Ohkawa
-
Publication number: 20090240917Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.Type: ApplicationFiled: October 10, 2006Publication date: September 24, 2009Applicant: Altera CorporationInventor: Michael Fitton
-
Publication number: 20090240927Abstract: A processor capable of executing conditional store instructions without being limited by the number of condition codes is provided. Condition data is stored in floating-point registers, and an operation unit executes a conditional floating-point store instruction of determining whether to store, in cache, store data.Type: ApplicationFiled: November 25, 2008Publication date: September 24, 2009Applicant: FUJITSU LIMITEDInventor: Toshio YOSHIDA
-
Publication number: 20090210678Abstract: A data processing apparatus operate to process floating point operands is disclosed. The data processing apparatus comprises: an instruction decoder operable to decode an instruction for processing floating point operands; and a data processor operable to perform data processing operations controlled by the instruction decoder wherein: in response to the decoded instruction indicating operation according to a flush-to-zero semantic, the data processor is operable to process the floating point operands in accordance with the decoded instruction such that floating point operands having a denormal value are treated as zero operands; and in response to the decoded instruction indicating operation according to a denormal semantic, the data processor is operable to process the floating point operands in accordance with the decoded instruction such that floating point operands having a denormal value are treated as denormal operands.Type: ApplicationFiled: August 1, 2005Publication date: August 20, 2009Inventor: Simon Ford