Floating Point Or Vector Patents (Class 712/222)

MECHANISM FOR HANDLING UNFUSED MULTIPLY-ACCUMULATE ACCRUED EXCEPTION BITS IN A PROCESSOR

Publication number: 20100268920

Abstract: A mechanism for handling unfused multiply-add accrued exception bits includes a processor including a floating point unit, a storage, and exception logic. The floating-point unit may be configured to execute an unfused multiply-accumulate instruction defined within the instruction set architecture (ISA). The unfused multiply-accumulate instruction may include a multiply sub-operation and an accumulate sub-operation. The storage may be configured to maintain floating-point exception state information. The exception logic may be configured, for example, to capture the floating-point exception state after completion of the multiply sub-operation and prior to completion of the accumulate sub-operation, and to update the storage to reflect the floating-point exception state.

Type: Application

Filed: April 16, 2009

Publication date: October 21, 2010

Inventors: Jeffrey S. Brooks, Paul J. Jordan, Christopher H. Olson
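
The exception-capture step in the abstract above can be made concrete with a minimal Python sketch. It is a model under stated assumptions, not the patented logic: Python's `decimal` module stands in for a binary floating-point unit because it exposes IEEE-style condition flags, and the helper names `raised` and `unfused_mac` are hypothetical. The multiply and the accumulate are rounded separately, and the flags are captured after the multiply, before the accumulate runs.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Underflow)

# IEEE-style conditions tracked as "accrued" bits in this sketch.
WATCHED = (InvalidOperation, DivisionByZero, Overflow, Underflow, Inexact)


def raised(ctx):
    """Return the set of watched conditions currently flagged in ctx."""
    return {sig for sig in WATCHED if ctx.flags[sig]}


def unfused_mac(a, b, c, prec=4):
    """Compute a*b + c with two separate roundings, capturing flags per step."""
    ctx = Context(prec=prec, traps=[])  # no traps: conditions only set flags
    product = ctx.multiply(a, b)        # multiply sub-operation (rounded)
    mul_flags = raised(ctx)             # capture state before the accumulate
    ctx.clear_flags()
    result = ctx.add(product, c)        # accumulate sub-operation (rounded again)
    acc_flags = raised(ctx)
    accrued = mul_flags | acc_flags     # accrued bits are sticky: OR both steps
    return result, mul_flags, acc_flags, accrued


result, mul_flags, acc_flags, accrued = unfused_mac(
    Decimal("1.2345"), Decimal("1"), Decimal("0"))
print(result, sorted(s.__name__ for s in accrued))  # 1.234 ['Inexact']
```

Here the multiply rounds 1.2345 to 1.234 and raises Inexact, while the accumulate is exact, so only the state captured between the two sub-operations records the condition that ends up in the accrued bits.
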
System and method for performing efficient conditional vector operations for data parallel architectures involving both input and conditional vector values

Patent number: 7818539

Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by, e.g., steering each element to one of the two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, and therefore without processor cycles wasted due to branch latency, incorrect speculation, or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.

Type: Grant

Filed: August 28, 2006

Date of Patent: October 19, 2010

Assignees: The Board of Trustees of the Leland Stanford Junior University, The Massachusetts Institute of Technology

Inventors: Scott Rixner, John D. Owens, Ujval J. Kapasi, William J. Dally
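
The split-then-process idea in the abstract above can be sketched in Python as follows. The function names are hypothetical and plain lists stand in for vector registers; the point is that once elements are steered into two dense vectors by a condition vector, each vector is processed with no per-element branch, and a conditional combine restores the original order.

```python
def conditional_split(values, cond):
    """Steer each element to one of two dense output vectors by its condition bit."""
    taken, not_taken = [], []
    for v, c in zip(values, cond):
        (taken if c else not_taken).append(v)
    return taken, not_taken


def conditional_combine(taken, not_taken, cond):
    """Inverse of conditional_split: merge two dense vectors back by cond."""
    it_t, it_f = iter(taken), iter(not_taken)
    return [next(it_t) if c else next(it_f) for c in cond]


values = [5, -3, 8, -1, 0, 7]
cond = [v > 0 for v in values]
pos, rest = conditional_split(values, cond)
# Each dense vector is now processed uniformly, with no per-element branch.
pos = [v * 2 for v in pos]
rest = [0 for _ in rest]
print(conditional_combine(pos, rest, cond))  # [10, 0, 16, 0, 0, 14]
```
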
Method and software for group data operations

Patent number: 7818548

Abstract: Methods and software are presented for processing data in a programmable processor, involving (a) decoding instructions for execution using an execution unit operable to execute instructions by partitioning data stored in registers in a register file into multiple data elements, the instructions selected from an instruction set that includes group arithmetic instructions and group data handling instructions, (b) in response to decoding different group arithmetic instructions, executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results, and (c) in response to decoding different group data handling instructions, executing group data handling operations that re-arrange data elements in different ways.

Type: Grant

Filed: July 27, 2007

Date of Patent: October 19, 2010

Assignee: Microunity Systems Engineering, Inc.

Inventors: Craig Hansen, John Moussouris, Alexia Massalin
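
A minimal Python sketch of group (partitioned) operations as described in the abstract above, assuming a 64-bit register split into 8-bit elements. `group_add` and `group_reverse` are hypothetical names, not MicroUnity's instruction mnemonics: the first is a group integer arithmetic operation whose per-element sums are catenated into one result, and the second is a simple group data handling operation that rearranges elements.

```python
def group_add(a, b, width=8, reg_bits=64):
    """Add two registers element by element; carries never cross element boundaries."""
    mask = (1 << width) - 1
    result = 0
    for lane in range(reg_bits // width):
        shift = lane * width
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        result |= ((x + y) & mask) << shift  # wrap within the element
    return result  # catenation of the individual element results


def group_reverse(a, width=8, reg_bits=64):
    """Data handling: reverse the order of the elements in the register."""
    mask = (1 << width) - 1
    lanes = reg_bits // width
    result = 0
    for lane in range(lanes):
        elem = (a >> (lane * width)) & mask
        result |= elem << ((lanes - 1 - lane) * width)
    return result


print(hex(group_add(0x01FF, 0x0101, width=8, reg_bits=16)))  # 0x200
```

In the example, the low element overflows (0xFF + 0x01) and wraps to 0x00 without disturbing the high element, which is exactly what distinguishes a group add from an ordinary 16-bit add.
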
Vector processing system

Patent number: 7818540

Abstract: A vector processing system for executing vector instructions, each instruction defining multiple value pairs, an operation to be executed and a modifier, the vector processing system comprising a plurality of parallel processing units, each arranged to receive one of said pairs of values and, when selected, to implement an operation on said value pair to generate a result, each processing unit comprising at least one flag and being selectable in dependence on a condition defined by said at least one flag, wherein the modifier defines the condition under which the parallel processing unit is individually selected.

Type: Grant

Filed: May 19, 2006

Date of Patent: October 19, 2010

Assignee: Broadcom Corporation

Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
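
The per-unit selection in the abstract above can be modelled in Python as below. This is a sketch under assumptions: each parallel unit holds a single Boolean flag, the three modifier names in `MODIFIERS` are hypothetical, and a unit that is not selected simply leaves its destination value unchanged.

```python
import operator

# Hypothetical modifiers: which flag state selects a processing unit.
MODIFIERS = {
    "always": lambda flag: True,
    "if_set": lambda flag: flag,
    "if_clear": lambda flag: not flag,
}


def vector_execute(op, pairs, flags, modifier, dest):
    """Apply op to each value pair whose unit is selected; others keep dest."""
    selected = MODIFIERS[modifier]
    return [op(a, b) if selected(f) else old
            for (a, b), f, old in zip(pairs, flags, dest)]


pairs = [(1, 2), (3, 4), (5, 6), (7, 8)]
flags = [True, False, True, False]
print(vector_execute(operator.add, pairs, flags, "if_set", [0, 0, 0, 0]))  # [3, 0, 11, 0]
```
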
Execution of a sequence of vector instructions preceded by a swizzle sequence instruction specifying data element shuffle orders respectively

Patent number: 7814303

Abstract: Operand vector multiplexer sequence control is used in a vector-based execution unit to control the shuffling of data elements in operand vectors used by a sequence of vector instructions processed by the vector-based execution unit. A swizzle sequence instruction is defined in an instruction set for the vector-based execution unit and is used to selectively apply a sequence of vector data element shuffle orders to one or more operand vectors to be used by the associated sequence of vector instructions. As a result, when a common sequence of data element shuffle orders is used frequently for a sequence of vector instructions, a single swizzle sequence instruction may be used to select the desired sequence of custom data element ordering for each of the vector instructions in the sequence.

Type: Grant

Filed: October 23, 2008

Date of Patent: October 12, 2010

Assignee: International Business Machines Corporation

Inventors: Adam J. Muff, Robert A. Shearer, Matthew R. Tubbs
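
A Python sketch of the swizzle-sequence idea from the abstract above, using a 3D cross product as the workload. `SwizzleSequencer` and its methods are hypothetical: one `swizzle_sequence` call loads the shuffle orders for the following instructions, each vector instruction consumes the next pair of orders, and once the sequence is exhausted the operands pass through unshuffled (an assumption made for this sketch).

```python
class SwizzleSequencer:
    """Holds a sequence of shuffle orders; each following vector instruction
    consumes the next (order1, order2) pair for its two operands."""

    IDENTITY = (0, 1, 2, 3)

    def __init__(self):
        self.orders = []
        self.step = 0

    def swizzle_sequence(self, orders):
        """Models the single swizzle-sequence instruction."""
        self.orders = list(orders)
        self.step = 0

    def operands(self, v1, v2):
        """Shuffle both operands for the next instruction, then advance."""
        if self.step < len(self.orders):
            o1, o2 = self.orders[self.step]
            self.step += 1
        else:
            o1 = o2 = self.IDENTITY  # sequence exhausted: no shuffling
        return [v1[i] for i in o1], [v2[i] for i in o2]


def vmul(x, y):
    return [p * q for p, q in zip(x, y)]


def vsub(x, y):
    return [p - q for p, q in zip(x, y)]


YZX, ZXY = (1, 2, 0, 3), (2, 0, 1, 3)
seq = SwizzleSequencer()
seq.swizzle_sequence([(YZX, ZXY), (ZXY, YZX)])
a = [1.0, 2.0, 3.0, 0.0]
b = [4.0, 5.0, 6.0, 0.0]
p1 = vmul(*seq.operands(a, b))       # a.yzx * b.zxy
p2 = vmul(*seq.operands(a, b))       # a.zxy * b.yzx
cross = vsub(*seq.operands(p1, p2))  # identity order: sequence exhausted
print(cross)  # [-3.0, 6.0, -3.0, 0.0]
```

A single setup call thus supplies a different element ordering to each of the two multiplies, which is the situation the abstract targets: a common shuffle pattern reused across a short instruction sequence.
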
ROW OF FLOATING POINT ACCUMULATORS COUPLED TO RESPECTIVE PES IN UPPERMOST ROW OF PE ARRAY FOR PERFORMING ADDITION OPERATION

Publication number: 20100257342

Abstract: Provided is a parallel processor for supporting a floating-point operation. The parallel processor has a flexible structure for easy development of a parallel algorithm involving multimedia computing, requires low hardware cost, and consumes low power. To support floating-point operations, the parallel processor uses floating-point accumulators and a flag for floating-point multiplication. Using the parallel processor, it is possible to process a geometric transformation operation in a 3-dimensional (3D) graphics process at low cost. Also, the cost of a bus width for instructions can be minimized by a partitioned Single-Instruction Multiple-Data (SIMD) method and a method of conditionally executing instructions.

Type: Application

Filed: June 17, 2010

Publication date: October 7, 2010

Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE

Inventors: Chun Gi LYUH, Yil Suk YANG, Se Wan HEO, Soon Il YEO, Tae Moon ROH, Jong Dae KIM, Ki Chul KIM, Se Hoon YOO
Information processing device having arrangements to inhibit coprocessor upon encountering data that the coprocessor cannot handle

Patent number: 7788469

Abstract: A hardware accelerator is used to execute a floating-point byte-code in an information processing device. For a floating-point byte-code, a byte-code accelerator (BCA) feeds the CPU an instruction stream that uses an FPU. When the FPU is used, the data is first transferred from a general-purpose register to the FPU register, and then the FPU operation is performed. For data that the FPU cannot process, such as a denormalized number, the BCA's processing is terminated and control moves to software, which calls a floating-point math library. To realize this, the hardware accelerator snoops data on the data transfer bus from the CPU to the FPU, and when a data checking part detects such data, a cancel request is signaled to the CPU to inhibit execution of the FPU operation.

Type: Grant

Filed: July 6, 2004

Date of Patent: August 31, 2010

Assignee: Renesas Technology Corp.

Inventors: Tetsuya Yamada, Naohiko Irie, Takahiro Irita, Masayuki Kabasawa
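
The data check performed by the snooping hardware can be sketched in Python as follows, assuming single-precision (binary32) operands and treating denormals as the only data the FPU cannot handle. The function names are hypothetical; `transfer_to_fpu` returns which path, FPU or software library, would carry out the operation.

```python
import struct


def float32_bits(x):
    """Bit pattern of x after rounding to IEEE 754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def is_float32_denormal(bits):
    """An all-zero exponent field with a nonzero fraction means denormal."""
    return ((bits >> 23) & 0xFF) == 0 and (bits & 0x7FFFFF) != 0


def transfer_to_fpu(x):
    """Snoop a value on its way from a general-purpose register to the FPU."""
    if is_float32_denormal(float32_bits(x)):
        return "software"  # cancel request: fall back to the math library
    return "fpu"


for value in (1.0, 0.0, 1e-40):
    print(value, transfer_to_fpu(value))
```

Zero is deliberately not flagged: its fraction field is zero, so it is an ordinary value for the FPU even though its exponent field is also zero.
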
Data processor and methods thereof

Patent number: 7788471

Abstract: A system and method for performing vector arithmetic is disclosed. The method includes loading two operand vectors, each composed of a number of vector elements, into two storage locations. A selected arithmetic operation is performed on the operand vectors to produce a result vector having the number of vector elements. Each vector element of the result vector is associated with an arithmetic logic cell that has a first input that can receive any vector element from the first vector and a second input that can receive any vector element from the second vector. Accordingly, each vector element of the result vector is a function of any two individual vector elements of the operand vectors. By applying the operand vector elements to the appropriate arithmetic logic cells, and by selecting the appropriate arithmetic operation, complex vector operations can be performed efficiently.

Type: Grant

Filed: September 18, 2006

Date of Patent: August 31, 2010

Assignee: Freescale Semiconductor, Inc.

Inventor: Chengke Sheng
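
The "any element of the first vector with any element of the second" property in the abstract above amounts to per-cell operand selection. A minimal Python sketch, in which the selector lists `sel_a` and `sel_b` are a hypothetical encoding of how each arithmetic logic cell picks its inputs, used here to form the four partial products of a complex multiplication in one pass:

```python
import operator


def cross_vector_op(a, b, sel_a, sel_b, op):
    """Result element k = op(a[sel_a[k]], b[sel_b[k]]): each cell may pair
    any element of the first vector with any element of the second."""
    return [op(a[i], b[j]) for i, j in zip(sel_a, sel_b)]


# (1 + 2j) * (3 + 4j), with vectors holding (real, imaginary).
a, b = [1, 2], [3, 4]
p = cross_vector_op(a, b, sel_a=[0, 1, 0, 1], sel_b=[0, 1, 1, 0], op=operator.mul)
real, imag = p[0] - p[1], p[2] + p[3]
print(real, imag)  # -5 10
```
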
SYSTEM AND METHOD FOR THREAD SCHEDULING IN PROCESSORS

Publication number: 20100218194

Abstract: A method for controlling a data processing system, a data processing system executing a similar method, and a computer readable medium with instructions for a similar method. The method includes receiving, by an operating system executing on a data processing system, an execution request from an application, the execution request including at least one resource-defining attribute corresponding to an execution thread of the application. The method also includes allocating processor resources to the execution thread by the operating system according to the at least one resource-defining attribute, and allowing execution of the execution thread on the data processing system according to the allocated processor resources.

Type: Application

Filed: August 27, 2009

Publication date: August 26, 2010

Applicant: Siemens Product Lifecycle Management Software Inc.

Inventors: John Gordon Dallman, Peter Philip Lonsdale Nanson
DIGITAL SIGNAL PROCESSOR (DSP) WITH VECTOR MATH INSTRUCTION

Publication number: 20100211761

Abstract: In accordance with at least some embodiments, a digital signal processor (DSP) includes an instruction fetch unit and an instruction decode unit in communication with the instruction fetch unit. The DSP also includes a register set and a plurality of work units in communication with the instruction decode unit. A vector math instruction decoded by the instruction decode unit causes input vectors and output vectors to be aligned with a maximum boundary of the register set and causes parallel operations by the work units.

Type: Application

Filed: February 18, 2010

Publication date: August 19, 2010

Applicant: TEXAS INSTRUMENTS INCORPORATED

Inventor: Udayan DASGUPTA
Adaptive execution frequency control method for enhanced instruction throughput

Patent number: 7779237

Abstract: A method, system and processor for adaptively and selectively controlling the instruction execution frequency of a data processor. Processing logic or a software compiler determines when a number of first-type instructions, requiring longer execution latency, are scheduled to be executed. The logic/compiler then triggers the CPM unit to automatically switch the execution frequency of the instruction processor from a first frequency that is optimal for processing regular-type instructions to a second, pre-established lower frequency that is optimal for processing the first-type instructions, to enable more efficient execution and higher execution throughput of the number of first-type operations within the processor. When the first-type instructions have completed execution, the processor's instruction execution frequency is returned to the first optimal frequency.

Type: Grant

Filed: July 11, 2007

Date of Patent: August 17, 2010

Assignee: International Business Machines Corporation

Inventors: Anthony Correale, Jr., Kenichi Tsuchiya
HANDLING COMPLEX REGEX PATTERNS STORAGE-EFFICIENTLY USING THE LOCAL RESULT PROCESSOR

Publication number: 20100205411

Abstract: A result processor accesses a result table for an entry associated with a predetermined sub-expression of a regular expression in response to a finite state machine finding the predetermined sub-expression in the input stream. The result processor executes an instruction associated with the entry, the instruction including one or more operations to be performed on one or more bits in a bit vector register, and determines, as a function of the one or more bits in the bit vector register, whether the complex regular expression has been found in the input stream.

Type: Application

Filed: February 11, 2009

Publication date: August 12, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Jan van Lunteren
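
A small Python model of the result-processor idea in the abstract above, for the complex pattern `abc.*def`. Both the result-table encoding (`set` and `test` on a bit index) and the naive scan that stands in for the finite state machine are assumptions of this sketch, not the patented design.

```python
# Hypothetical result table for "abc.*def": each sub-expression reported by
# the state machine maps to an instruction on one bit of the bit-vector register.
RESULT_TABLE = {
    "abc": ("set", 0),   # remember that "abc" has been seen
    "def": ("test", 0),  # report a match if "abc" was seen earlier
}


def scan(stream, table=RESULT_TABLE):
    """Return True if the complex pattern is found in the input stream."""
    bits = 0
    # Stand-in for the FSM: at each position, report sub-expressions ending there.
    for end in range(1, len(stream) + 1):
        for sub, (op, bit) in table.items():
            if stream.endswith(sub, 0, end):
                if op == "set":
                    bits |= 1 << bit
                elif op == "test" and bits & (1 << bit):
                    return True
    return False


print(scan("xxabcyydefzz"), scan("defabc"))  # True False
```

The bit-vector register is what lets the simple sub-expression matches carry the ordering constraint "abc before def" without expanding `.*` into a large state machine.
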
Row of floating point accumulators coupled to respective PEs in uppermost row of PE array for performing addition operation

Patent number: 7769981

Abstract: Provided is a parallel processor for supporting a floating-point operation. The parallel processor has a flexible structure for easy development of a parallel algorithm involving multimedia computing, requires low hardware cost, and consumes low power. To support floating-point operations, the parallel processor uses floating-point accumulators and a flag for floating-point multiplication. Using the parallel processor, it is possible to process a geometric transformation operation in a 3-dimensional (3D) graphics process at low cost. Also, the cost of a bus width for instructions can be minimized by a partitioned Single-Instruction Multiple-Data (SIMD) method and a method of conditionally executing instructions.

Type: Grant

Filed: March 11, 2008

Date of Patent: August 3, 2010

Assignee: Electronics and Telecommunications Research Institute

Inventors: Chun Gi Lyuh, Yil Suk Yang, Se Wan Heo, Soon Il Yeo, Tae Moon Roh, Jong Dae Kim, Ki Chul Kim, Se Hoon Yoo
TRIGONOMETRIC SUMMATION VECTOR EXECUTION UNIT

Publication number: 20100191939

Abstract: A unique instruction and exponent adjustment adder selectively shift outputs from multiple execution units, including a plurality of multipliers, in a processor core in order to scale mantissas for related trigonometric functions used in a vector dot product.

Type: Application

Filed: January 27, 2009

Publication date: July 29, 2010

Applicant: International Business Machines Corporation

Inventors: Adam J. Muff, Matthew R. Tubbs
Management of exceptions and hardware interruptions by an exception simulator

Patent number: 7765389

Abstract: Exception handling is simulated. An exception simulator is employed to simulate exceptions generated from routines simulating operations. The exception simulator provides an indication of the exception and invokes an interruption, when appropriate. The exception simulator includes an instruction invoked to handle the exception and any interruption.

Type: Grant

Filed: April 25, 2007

Date of Patent: July 27, 2010

Assignee: International Business Machines Corporation

Inventors: Shawn D. Lundvall, Ronald M. Smith, Sr., Phil C. Yeh
Scalable parallel pipeline floating-point unit for vector processing

Patent number: 7765386

Abstract: An embodiment of the present invention is a technique to perform floating-point operations for vector processing. An input queue captures a plurality of vector inputs. A scheduler dispatches the vector inputs. A plurality of floating-point (FP) pipelines generates FP results from operating on scalar components of the vector inputs dispatched from the scheduler. An arbiter and assembly unit arbitrates use of an output section and assembles the FP results to write to the output section.

Type: Grant

Filed: September 28, 2005

Date of Patent: July 27, 2010

Assignee: Intel Corporation

Inventors: David D. Donofrio, Michael Dwyer
RECONFIGURABLE SIMD PROCESSOR AND METHOD FOR CONTROLLING ITS INSTRUCTION EXECUTION

Publication number: 20100174891

Abstract: In a reconfigurable SIMD processor, the unit of operation for executing an instruction is one group, and that group, which includes a plurality of PEs, implements at least part of an operation unit that executes at least one of an integer divide instruction, a floating-point add/subtract instruction, a floating-point multiply instruction, and a floating-point divide instruction, using the operation units and general-purpose registers provided in the PEs. The number of PEs that make up one group varies according to the instruction.

Type: Application

Filed: March 27, 2008

Publication date: July 8, 2010

Inventor: Shohei Nomoto
Configurable output buffer ganging for a parallel processor

Patent number: 7747842

Abstract: An output buffer in a multi-threaded processor is managed to store a variable amount of output data. Parallel threads produce a variable amount of output data. A controller is configured to determine how much output buffer space is needed per thread and how many threads can execute in parallel, given the available space in the output buffer. The controller also determines where each thread writes to in the output buffer.

Type: Grant

Filed: December 19, 2005

Date of Patent: June 29, 2010

Assignee: NVIDIA Corporation

Inventors: Mark R. Goudy, Andrew J. Tao, Dominic Acocella
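
The controller's sizing decision in the abstract above can be sketched in Python, assuming every thread needs the same number of output bytes and threads are packed contiguously; `plan_output_buffer` is a hypothetical name.

```python
def plan_output_buffer(buffer_size, bytes_per_thread, requested_threads):
    """Decide how many threads can run in parallel and where each one writes."""
    if bytes_per_thread <= 0:
        raise ValueError("bytes_per_thread must be positive")
    parallel = min(requested_threads, buffer_size // bytes_per_thread)
    offsets = [t * bytes_per_thread for t in range(parallel)]
    return parallel, offsets


print(plan_output_buffer(1024, 96, 16))
# (10, [0, 96, 192, 288, 384, 480, 576, 672, 768, 864])
```

When threads produce more output each, fewer of them fit and the parallelism drops; when they produce less, more threads share the same buffer.
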
Media Action Script Acceleration Apparatus

Publication number: 20100153692

Abstract: Exemplary apparatus, method, and system embodiments provide for accelerated hardware processing of an action script for a graphical image for visual display. An exemplary apparatus comprises: a first memory; and a plurality of processors to separate the action script from other data, to convert a plurality of descriptive elements of the action script into a plurality of hardware-level operational or control codes, and to perform one or more operations corresponding to an operational code of the plurality of operational codes using corresponding data to generate pixel data for the graphical image. In an exemplary embodiment, at least one processor further is to parse the action script into the plurality of descriptive elements and the corresponding data, and to extract data from the action script and to store the extracted data in the first memory as a plurality of control words having the corresponding data in predetermined fields.

Type: Application

Filed: February 14, 2009

Publication date: June 17, 2010

Applicant: PERSONAL WEB SYSTEMS, INC.

Inventors: Bhaskar Kota, Lakshmikanth Surya Naga Satyavolu, Ganapathi Venkata Puppala, Praveen Kumar Bollam, Sairam Sambaraju, Paul L. Master
Method and software for group floating-point arithmetic operations

Patent number: 7730287

Abstract: Methods and software are presented for processing data in a programmable processor, involving (a) decoding instructions for execution using an execution unit operable to execute instructions by partitioning data stored in registers in a register file into multiple data elements, the instructions selected from an instruction set that includes group arithmetic instructions and group data handling instructions, (b) in response to decoding different group data handling instructions, executing group data handling operations that re-arrange data elements in different ways, and (c) in response to decoding different group arithmetic instructions, executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.

Type: Grant

Filed: July 27, 2007

Date of Patent: June 1, 2010

Assignee: Microunity Systems Engineering, Inc.

Inventors: Craig Hansen, John Moussouris, Alexia Massalin
Combined associative and distributed arithmetics for multiple inner products

Publication number: 20100122070

Abstract: Subvector slices x(i,r,s) of a first vector x(i) are stored (e.g., in a CAM array) in a bit-parallel word-serial manner. For each of the stored subvector slices, and in parallel on the bits of that subvector slice, an operation is executed that outputs a pre-calculated inner product result of said bits and a second vector a. If the subvector slices x(i,r,s) of the first vector x(i) are initially stored in a bit-serial word-serial manner, they are transformed into the bit-parallel word-serial manner by copying relevant bits of each of the subvector slices from a 0th column of a content-addressable memory array to elements of a tags register and, for each kth iteration, shifting bits in the elements of the tags register by m positions and copying the shifted bits to a column of the CAM array. An associative processor outputs the pre-calculated inner product result in a distributed arithmetic manner.

Type: Application

Filed: November 7, 2008

Publication date: May 13, 2010

Inventors: David Guevorkian, Timo Yli-Pietila, Petri Liuha
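
The "pre-calculated inner product" in the abstract above is the core of distributed arithmetic. A minimal Python sketch, assuming unsigned integer inputs and a small coefficient vector `a` (function names hypothetical): a lookup table holds the sum of coefficients for every bit pattern, and the inner product is accumulated one bit-slice of `x` at a time. The CAM array and associative processor are not modelled.

```python
def da_lut(a):
    """Pre-calculate sum(a[i] for every bit i set in mask), for all masks."""
    n = len(a)
    return [sum(a[i] for i in range(n) if (mask >> i) & 1)
            for mask in range(1 << n)]


def da_inner_product(x, a, bits):
    """Inner product of unsigned integer vector x with coefficients a,
    processing one bit-slice of x per step (distributed arithmetic)."""
    lut = da_lut(a)
    acc = 0
    for b in range(bits):
        # Gather bit b of every x[i] into one table address (a bit-slice).
        mask = sum(((xi >> b) & 1) << i for i, xi in enumerate(x))
        acc += lut[mask] << b  # weight the looked-up partial sum by 2**b
    return acc


print(da_inner_product([3, 5, 7], [2, -1, 4], bits=3))  # 29
```

No multiplications happen at run time: each step is one table lookup, one shift, and one addition, which is why the table can be pre-calculated once for a fixed vector `a`.
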
LARGE INTEGER SUPPORT IN VECTOR OPERATIONS

Publication number: 20100115232

Abstract: A vector processor or vector processing computer has a first vector register operable to store two or more vector elements that together comprise a single first large integer and a second vector register operable to store two or more vector elements that together comprise a single second large integer. An adder having a carry-in bit is operable to add the large integer in the first vector register to the large integer in the second vector register by using the carry-in bit to add sequential elements of the vector registers.

Type: Application

Filed: October 31, 2008

Publication date: May 6, 2010

Inventors: Timothy J. Johnson, Eric P. Lundberg, Michael Parker, Gregory J. Faanes
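
A Python sketch of the carry chain described in the abstract above, assuming 64-bit vector elements stored least significant first; the helper names are hypothetical. Each element's carry-out becomes the carry-in of the next element, so the vector as a whole behaves as one large integer.

```python
ELEM_BITS = 64
MASK = (1 << ELEM_BITS) - 1


def to_elements(n, count):
    """Split a non-negative integer into `count` 64-bit elements, low first."""
    return [(n >> (ELEM_BITS * i)) & MASK for i in range(count)]


def from_elements(elems):
    return sum(e << (ELEM_BITS * i) for i, e in enumerate(elems))


def vector_large_add(v1, v2):
    """Add two large integers held as vectors of 64-bit elements."""
    carry = 0
    out = []
    for x, y in zip(v1, v2):
        s = x + y + carry       # carry-in from the previous element
        out.append(s & MASK)
        carry = s >> ELEM_BITS  # carry-out feeds the next element
    return out, carry


out, carry = vector_large_add(to_elements(2**64 - 1, 2), to_elements(1, 2))
print(out, carry)  # [0, 1] 0
```
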
FAST FLOATING POINT COMPARE WITH SLOWER BACKUP FOR CORNER CASES

Publication number: 20100100713

Abstract: A floating point processor unit executes a floating point compare instruction with two operands of the same or different precision by comparing the two operands in integer format, which significantly speeds up execution of the floating point compare instruction. The floating point processor executes the floating point compare instruction at least twice as fast (e.g., in two clock cycles instead of five in the prior art) for the vast majority of operand cases (e.g., 99% of all cases). Only rare corner cases require additional operations on one of the operands, and thus additional cycles of execution time, because the integer compare operation does not work for them. These corner cases arise when one operand is a single precision subnormal number in an unnormalized representation (i.e., a value with two representations) and the other operand is in the SP subnormal range, so that the integer compare operation would fail.

Type: Application

Filed: October 22, 2008

Publication date: April 22, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Maarten J. Boersma, Michael Kroener, Silvia M. Mueller, Jochen Preiss
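
The integer-format comparison in the abstract above relies on the fact that IEEE 754 bit patterns, read as sign-magnitude integers, order the same way as the values they encode. The Python sketch below assumes two double-precision operands rather than the mixed-precision case the abstract describes, so its only slow-path case is NaN; the subnormal corner case is not modelled, and the function names are hypothetical.

```python
import math
import struct

SIGN_BIT = 1 << 63
MAGNITUDE = SIGN_BIT - 1


def ordered_key(x):
    """Map a non-NaN float64 to an integer that sorts the same way.
    Positive values keep their bit pattern; negative values become the
    negated magnitude, so -0.0 and +0.0 both map to 0."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    if bits & SIGN_BIT:
        return -(bits & MAGNITUDE)
    return bits


def fp_compare(a, b):
    """Return -1, 0 or 1, or None if the operands are unordered."""
    if math.isnan(a) or math.isnan(b):
        return None                   # slow path: NaN is unordered
    ka, kb = ordered_key(a), ordered_key(b)
    return (ka > kb) - (ka < kb)      # fast path: pure integer compare


print(fp_compare(1.5, 2.0), fp_compare(-1.0, -2.0), fp_compare(-0.0, 0.0))  # -1 1 0
```
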
SYSTEM AND METHOD FOR STORING NUMBERS IN FIRST AND SECOND FORMATS IN A REGISTER FILE

Publication number: 20100095099

Abstract: A system and a method for storing numbers in a register file are provided. The system and the method store single precision numbers in double precision format in a register file that is shared between floating point computational units and computational units not supporting floating point numbers.

Type: Application

Filed: October 14, 2008

Publication date: April 15, 2010

Applicant: International Business Machines Corporation

Inventors: Maarten Boersma, Michael Kroener, Petra Leber, Silvia M. Mueller, Jochen Preiss, Kerstin Schelm
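
Storing a single-precision number in double-precision format is exact, because every binary32 value is also a binary64 value. A Python sketch, with hypothetical function names, that rounds a value to single precision and produces the 64-bit register image:

```python
import struct


def round_to_single(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def store_single_as_double(x):
    """Return the 64-bit register image of single-precision x in double format."""
    return struct.unpack("<Q", struct.pack("<d", round_to_single(x)))[0]


def load_double_image(bits):
    """Read a 64-bit register image back as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


image = store_single_as_double(0.1)
print(hex(image), load_double_image(image))
```

The low 29 fraction bits of the image are zero for a normal single-precision value, since binary64 has 29 more fraction bits than binary32; a unit that does not understand floating point can therefore move such a register image without altering the value.
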
Floating Point Only Single Instruction Multiple Data Instruction Set Architecture

Publication number: 20100095097

Abstract: Mechanisms for implementing a floating point only single instruction multiple data instruction set architecture are provided. A processor is provided that comprises an issue unit, an execution unit coupled to the issue unit, and a vector register file coupled to the execution unit. The execution unit has logic that implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA). The floating point vector registers of the vector register file store both scalar and floating point values as vectors having a plurality of vector elements. The processor may be part of a data processing system.

Type: Application

Filed: October 14, 2008

Publication date: April 15, 2010

Applicant: International Business Machines Corporation

Inventor: Michael K. Gschwind
Generating and Executing Programs for a Floating Point Single Instruction Multiple Data Instruction Set Architecture

Publication number: 20100095098

Abstract: Mechanisms for generating and executing programs for a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA) are provided. A computer program product comprising a computer recordable medium having a computer readable program recorded thereon is provided. The computer readable program, when executed on a computing device, causes the computing device to receive one or more instructions and execute the one or more instructions using logic in an execution unit of the computing device. The logic implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA), based on data stored in a vector register file of the computing device. The vector register file is configured to store both scalar and floating point values as vectors having a plurality of vector elements.

Type: Application

Filed: October 14, 2008

Publication date: April 15, 2010

Applicant: International Business Machines Corporation

Inventor: Michael K. Gschwind
RUNNING-SHIFT INSTRUCTIONS FOR PROCESSING VECTORS

Publication number: 20100058037

Abstract: The described embodiments provide a processor for generating a result vector with shifted values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then determines a number of bit positions to shift the base value using selected relevant elements in the first input vector. The processor then shifts a copy of the base value by that number of bit positions and writes the shifted value into a corresponding element in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.

Type: Application

Filed: August 14, 2009

Publication date: March 4, 2010

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
RUNNING-SUM INSTRUCTIONS FOR PROCESSING VECTORS

Publication number: 20100049950

Abstract: The described embodiments provide a processor for generating a result vector with summed values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element in the second input vector. The processor then writes the sum of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.

Type: Application

Filed: August 14, 2009

Publication date: February 25, 2010

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
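
The abstract above leaves the exact element semantics open, so the Python sketch below fixes one hypothetical interpretation: the running total starts at the key element of the second input vector, accumulates elements of the first input vector whose control bit is set, and is written into the result only where the predicate is set (other elements keep the destination's old value). The function name and parameters are assumptions, not Apple's definition.

```python
def running_sum(first, second, key, control, predicate, dest):
    """Hypothetical running-sum semantics (inclusive, control- and predicate-gated)."""
    total = second[key]          # base value captured from the key element
    result = list(dest)
    for i, (v, c, p) in enumerate(zip(first, control, predicate)):
        if c:                    # only relevant elements contribute
            total += v
        if p:                    # predicate gates what is written
            result[i] = total
    return result


print(running_sum([1, 2, 3, 4], [10, 20, 30, 40], 0,
                  [True, False, True, True], [True, True, False, True],
                  [0, 0, 0, 0]))  # [11, 11, 0, 18]
```
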
RUNNING-AND, RUNNING-OR, RUNNING-XOR, AND RUNNING-MULTIPLY INSTRUCTIONS FOR PROCESSING VECTORS

Publication number: 20100049951

Abstract: The described embodiments provide a processor for generating a result vector with shifted values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then writes the product of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector.

Type: Application

Filed: August 14, 2009

Publication date: February 25, 2010

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
Data Dependent Instruction Decode

Publication number: 20100042812

Abstract: A circuit arrangement and method support data dependent instruction decoding, whereby instructions are decoded, in part, using decode data that is stored in operand registers identified by such instructions. An instruction may include an opcode and at least one operand that identifies a register. During execution of the instruction, the instruction is first decoded using the opcode, and then decode data stored in the operand register is retrieved and used to further decode the instruction, e.g., to select from among a plurality of operations or instruction types associated with the same opcode.

Type: Application

Filed: August 14, 2008

Publication date: February 18, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
METHOD AND APPARATUS FOR EXECUTING PROGRAM CODE

Publication number: 20100042815

Abstract: The described embodiments provide a system that executes program code. While executing program code, the processor encounters at least one vector instruction and at least one vector-control instruction. The vector instruction includes a set of elements, wherein each element is used to perform an operation for a corresponding iteration of a loop in the program code. The vector-control instruction identifies elements in the vector instruction that may be operated on in parallel without causing an error due to a runtime data dependency between the iterations of the loop. The processor then executes the loop by repeatedly executing the vector-control instruction to identify a next group of elements that can be operated on in the vector instruction and selectively executing the vector instruction to perform the operation for the next group of elements in the vector instruction, until the operation has been performed for all elements of the vector instruction.

Type: Application

Filed: April 7, 2009

Publication date: February 18, 2010

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
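
The vector-control step in the abstract above can be sketched in Python for a loop of the form `a[writes[i]] = a[reads[i]] + 1`, where the addresses are only known at run time. Under the assumptions of this sketch (hypothetical names, and vector semantics of "load every element of the group, then store in element order"), only a read-after-write between iterations of the same group is unsafe, so each group ends just before the first iteration that reads an address written earlier in the group.

```python
def next_group(start, reads, writes):
    """Vector-control step: end (exclusive) of the longest safe run from start."""
    written = set()
    end = start
    while end < len(reads) and reads[end] not in written:
        written.add(writes[end])
        end += 1
    return end  # always > start, so the loop below makes progress


def run_vectorized(a, reads, writes):
    """Execute a[writes[i]] = a[reads[i]] + 1 one safe group at a time."""
    a = list(a)
    i = 0
    while i < len(reads):
        j = next_group(i, reads, writes)
        loaded = [a[reads[k]] for k in range(i, j)]  # vector load
        for k, v in zip(range(i, j), loaded):         # vector op and store
            a[writes[k]] = v + 1
        i = j
    return a


def run_scalar(a, reads, writes):
    """Reference: the original sequential loop."""
    a = list(a)
    for r, w in zip(reads, writes):
        a[w] = a[r] + 1
    return a


print(run_vectorized([10, 20, 30, 40], [3, 0, 1, 0], [0, 2, 3, 1]))  # [41, 42, 42, 21]
```

In the example, iteration 1 reads `a[0]`, which iteration 0 writes, so the first group holds one element and the remaining three iterations run together as a second group.
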
SHIFT-IN-RIGHT INSTRUCTIONS FOR PROCESSING VECTORS

Publication number: 20100042817

Abstract: The described embodiments provide a processor for generating a result vector with shifted values from an input vector. During operation, the processor receives an input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain shifted values or propagated values from the input vector, depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector.

Type: Application

Filed: June 30, 2009

Publication date: February 18, 2010

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
COPY-PROPAGATE, PROPAGATE-POST, AND PROPAGATE-PRIOR INSTRUCTIONS FOR PROCESSING VECTORS

Publication number: 20100042818

Abstract: The described embodiments provide a processor for generating a result vector with copied or propagated values from an input vector. During operation, the processor receives at least one input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain copied or propagated values from the input vector(s), depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector.

Type: Application

Filed: June 30, 2009

Publication date: February 18, 2010

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
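A typical use for a copy-propagate operation is the loop-carried "last value seen" pattern, `if cond[i]: last = x[i]; out[i] = last`, which a scalar loop can only evaluate one element at a time. The Python sketch below makes an assumption that the abstract does not confirm. An active element whose control bit is set copies its input value, and an active element whose control bit is clear propagates the most recently copied value. `copy_propagate` and its parameters are hypothetical names.

```python
from typing import List, Sequence


def copy_propagate(inp: Sequence[int], control: Sequence[bool],
                   predicate: Sequence[bool], prior: Sequence[int],
                   initial: int) -> List[int]:
    """Hypothetical semantics (assumption): copy where control is set,
    otherwise propagate the last copied value; inactive elements keep `prior`."""
    result = list(prior)
    last = initial  # value carried in from a previous vector, if any
    for i in range(len(inp)):
        if not predicate[i]:
            continue
        if control[i]:
            last = inp[i]
        result[i] = last
    return result


def last_seen_scalar(x: Sequence[int], cond: Sequence[bool], initial: int) -> List[int]:
    """Reference scalar loop that copy_propagate is meant to replace."""
    out, last = [], initial
    for xi, ci in zip(x, cond):
        if ci:
            last = xi
        out.append(last)
    return out
```

When every predicate bit is set, `copy_propagate(x, cond, ...)` equals `last_seen_scalar(x, cond, initial)`. A vectorizing compiler would rely on exactly this equivalence.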
INCREMENT-PROPAGATE AND DECREMENT-PROPAGATE INSTRUCTIONS FOR PROCESSING VECTORS

Publication number: 20100042807

Abstract: The described embodiments provide a processor for generating a result vector with incremented or decremented values from an input vector. During operation, the processor receives an input vector and a control vector. The processor then copies a value contained in a selected element of the input vector. The processor next generates the result vector, which involves writing an incremented or decremented value to the result vector, depending on the value of the control vector and the embodiment. In addition, a predicate vector can be used to control the values that are written to the result vector.

Type: Application

Filed: June 30, 2009

Publication date: February 18, 2010

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff, JR.
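Increment-propagate fits loops that advance a counter conditionally, such as `if cond[i]: j += 1; out[i] = j`. The Python sketch below assumes that the selected input element supplies the starting value. It also assumes that the counter is incremented at each active element whose control bit is set, before that element is written. The abstract notes that this behavior depends on the embodiment, so other orderings are possible. `increment_propagate`, `select`, and `step` are hypothetical names.

```python
from typing import List, Sequence


def increment_propagate(inp: Sequence[int], select: int, control: Sequence[bool],
                        predicate: Sequence[bool], prior: Sequence[int],
                        step: int = 1) -> List[int]:
    """Hypothetical semantics (assumption): start from inp[select], add
    `step` at each active element whose control bit is set, and write the
    running value to every active element. Use step=-1 for decrement-propagate."""
    result = list(prior)
    value = inp[select]  # copy the value held in the selected element
    for i in range(len(control)):
        if not predicate[i]:
            continue  # inactive elements keep their prior contents
        if control[i]:
            value += step
        result[i] = value
    return result


def conditional_counter_scalar(start: int, cond: Sequence[bool]) -> List[int]:
    """Reference scalar loop: if cond[i]: j += 1; out[i] = j."""
    out, j = [], start
    for c in cond:
        if c:
            j += 1
        out.append(j)
    return out
```

For `cond = [True, False, True, True, False]` and a starting value of 7, both functions produce `[8, 8, 9, 10, 10]`.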
BREAK, PRE-BREAK, AND REMAINING INSTRUCTIONS FOR PROCESSING VECTORS

Publication number: 20100042816

Abstract: The described embodiments provide a system that sets elements in a result vector based on an input vector. During operation, the system determines a location of a key element within the input vector. Next, the system generates a result vector. When generating the result vector, the system sets one or more elements of the result vector based on the location of the key element in the input vector.

Type: Application

Filed: April 7, 2009

Publication date: February 18, 2010

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
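The abstract says only that result elements are set according to the location of a key element. A common reading, assumed here and not confirmed by the abstract, takes the key element to be the first set element of a predicate vector. Under that reading, three masks are useful: the elements before the key, the elements up to and including it, and the elements after it. The Python function names below are hypothetical and do not claim to match the application's definitions of Break, Pre-Break, or Remaining.

```python
from typing import List, Optional, Sequence


def key_index(pred: Sequence[bool]) -> Optional[int]:
    """Position of the first set element, or None if no element is set."""
    for i, p in enumerate(pred):
        if p:
            return i
    return None


def before_key(pred: Sequence[bool]) -> List[bool]:
    """Elements strictly before the key element (all elements if no key)."""
    k = key_index(pred)
    limit = len(pred) if k is None else k
    return [i < limit for i in range(len(pred))]


def through_key(pred: Sequence[bool]) -> List[bool]:
    """Elements before and including the key element (all elements if no key)."""
    k = key_index(pred)
    limit = len(pred) - 1 if k is None else k
    return [i <= limit for i in range(len(pred))]


def after_key(pred: Sequence[bool]) -> List[bool]:
    """Elements strictly after the key element (none if no key)."""
    k = key_index(pred)
    return [k is not None and i > k for i in range(len(pred))]
```

In a vectorized loop with an early exit, `before_key` matches a loop that tests its exit condition before the loop body, and `through_key` matches one that runs the body and then tests.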
Add-subtract coprocessor instruction execution on complex number components with saturation and conditioned on main processor condition flags

Patent number: 7664930

Abstract: Methods and apparatus for calculating Single-Instruction-Multiple-Data (SIMD) complex arithmetic. A coprocessor instruction has a format identifying a multiply and subtract instruction to generate real components for complex multiplication of first operand complex data and corresponding second operand complex data, a cross multiply and add instruction to generate imaginary components for complex multiplication of the first operand complex data and the corresponding second operand complex data, an add-subtract instruction to add real components of the first operand to imaginary components of the second operand and to subtract real components of the second operand from imaginary components of the first operand, and a subtract-add instruction to subtract the imaginary components of the second operand from the real components of the first operand and to add the real components of the second operand to the imaginary components of the first operand.

Type: Grant

Filed: May 30, 2008

Date of Patent: February 16, 2010

Assignee: Marvell International Ltd

Inventors: Nigel C. Paver, Moinul H. Khan, Bradley C. Aldrich
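The four operations listed in the abstract can be modeled per complex element in Python. This sketch assumes signed 16-bit components that saturate to the 16-bit range. It omits the fixed-point scaling (for example Q15 shifts) that a real multiply would usually apply, and the function names are hypothetical. Read as complex arithmetic, the add-subtract operation computes a - i*b and the subtract-add operation computes a + i*b. These are rotations by 90 degrees, such as those in radix-4 FFT butterflies.

```python
from typing import List, Sequence, Tuple

Complex16 = Tuple[int, int]  # (real, imaginary), each a signed 16-bit value

INT16_MIN, INT16_MAX = -32768, 32767


def sat16(x: int) -> int:
    """Clamp to the signed 16-bit range (saturating arithmetic)."""
    return max(INT16_MIN, min(INT16_MAX, x))


def complex_mul(a: Sequence[Complex16], b: Sequence[Complex16]) -> List[Complex16]:
    """Multiply-subtract for the real part, cross multiply-add for the imaginary part."""
    return [(sat16(ar * br - ai * bi), sat16(ar * bi + ai * br))
            for (ar, ai), (br, bi) in zip(a, b)]


def add_subtract(a: Sequence[Complex16], b: Sequence[Complex16]) -> List[Complex16]:
    """(a.re + b.im, a.im - b.re), i.e. a - i*b."""
    return [(sat16(ar + bi), sat16(ai - br)) for (ar, ai), (br, bi) in zip(a, b)]


def subtract_add(a: Sequence[Complex16], b: Sequence[Complex16]) -> List[Complex16]:
    """(a.re - b.im, a.im + b.re), i.e. a + i*b."""
    return [(sat16(ar - bi), sat16(ai + br)) for (ar, ai), (br, bi) in zip(a, b)]
```

For example, `add_subtract([(1, 2)], [(3, 4)])` returns `[(5, -1)]`, which matches (1 + 2i) - i(3 + 4i) = 5 - i.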
System and apparatus for group data operations

Patent number: 7660973

Abstract: Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and to partition data stored in registers in the register file into multiple data elements. The execution unit is capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operate on multiple data elements stored in registers in a register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results. The execution unit is also capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions.

Type: Grant

Filed: July 27, 2007

Date of Patent: February 9, 2010

Assignee: Microunity Systems Engineering, Inc.

Inventors: Craig Hansen, John Moussouris, Alexia Massalin
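SWAR-style arithmetic in Python can illustrate two ideas from the abstract: partitioning a register into data elements, and returning a catenated result. The sketch treats an integer as a 64-bit register holding four 16-bit elements. The lane count, the element width, and the function names are assumptions made for the sketch. The element-wise add wraps modulo 2**16, so carries never cross element boundaries. `group_reverse` stands in for one of the data-handling rearrangements.

```python
from typing import List

LANES, WIDTH = 4, 16  # assumed partition: a 64-bit register of 16-bit elements
MASK = (1 << WIDTH) - 1


def split(reg: int) -> List[int]:
    """Partition a register value into its elements, lowest element first."""
    return [(reg >> (WIDTH * i)) & MASK for i in range(LANES)]


def catenate(elems: List[int]) -> int:
    """Join per-element results back into one register value."""
    reg = 0
    for i, e in enumerate(elems):
        reg |= (e & MASK) << (WIDTH * i)
    return reg


def group_add(x: int, y: int) -> int:
    """Element-wise add; each element wraps independently, so no carry crosses lanes."""
    return catenate([a + b for a, b in zip(split(x), split(y))])


def group_reverse(x: int) -> int:
    """A data-handling operation: reverse the order of the elements."""
    return catenate(split(x)[::-1])
```

Adding 1 to an element that holds `0xFFFF` yields 0 in that element alone. An ordinary integer add of the same register values would instead carry into the neighboring element, or out of the register for the top element.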
Method and software for partitioned floating-point multiply-add operation

Patent number: 7660972

Abstract: A method and software for improving the performance of processors by incorporating an execution unit operable to decode and execute single instructions specifying three registers each containing a plurality of data elements, the execution unit operable to multiply the first and second registers and add the third register to produce a catenated result containing a plurality of data elements. Additional instructions provide group floating-point subtract, add, multiply, set less, and set greater equal operations. The set less and set greater equal operations produce alternatively zero or an identity element for each element of a catenated result, the result facilitating alternative selection of individual data elements using bitwise Boolean operations and without requiring conditional branch operations.

Type: Grant

Filed: January 16, 2004

Date of Patent: February 9, 2010

Assignee: Microunity Systems Engineering, Inc.

Inventors: Craig Hansen, John Moussouris
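The branch-free selection described in the abstract can be sketched in Python by modeling each element as an IEEE single-precision bit pattern. In this sketch, an all-ones mask plays the role of the identity element for bitwise AND. The function names are hypothetical. Python evaluates the multiply-add in double precision, and only the select step converts values to single precision, so the sketch models the data flow rather than the hardware's rounding.

```python
import struct
from typing import List, Sequence

ALL_ONES = 0xFFFFFFFF  # identity element for bitwise AND on 32-bit elements


def f32_bits(x: float) -> int:
    """Bit pattern of x as an IEEE single-precision value."""
    return struct.unpack('<I', struct.pack('<f', x))[0]


def bits_f32(u: int) -> float:
    """Single-precision value with bit pattern u."""
    return struct.unpack('<f', struct.pack('<I', u))[0]


def group_multiply_add(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> List[float]:
    """Element-wise a*b + c over three 'registers' of equal length."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def group_set_less(a: Sequence[float], b: Sequence[float]) -> List[int]:
    """All-ones where a < b, zero elsewhere."""
    return [ALL_ONES if x < y else 0 for x, y in zip(a, b)]


def bitwise_select(mask: Sequence[int], a: Sequence[float], b: Sequence[float]) -> List[float]:
    """(a AND mask) OR (b AND NOT mask), per element, with no branch on the data."""
    return [bits_f32((f32_bits(x) & m) | (f32_bits(y) & (~m & ALL_ONES)))
            for m, x, y in zip(mask, a, b)]
```

Selecting with the set-less mask yields the element-wise minimum. For `a = [1.0, 5.0, -2.0]` and `b = [2.0, 3.0, -2.0]`, the mask is `[ALL_ONES, 0, 0]` and the selected result is `[1.0, 3.0, -2.0]`.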
Method and apparatus for lossless and minimal-loss color conversion

Patent number: 7659911

Abstract: A method and apparatus for perfectly lossless and minimal-loss interconversion of digital color data between spectral color spaces (RGB) and perceptually based luma-chroma color spaces (Y′CbCr) is disclosed. In particular, the present invention provides a process for converting digital pixels from R′G′B′ space to Y′CbCr space and back, or from Y′CbCr space to R′G′B′ space and back, with zero error, or, in constant-precision implementations, with guaranteed minimal error. This invention permits digital video editing and image editing systems to repeatedly interconvert between color spaces without accumulating errors. In image codecs, this invention can improve the quality of lossy image compressors independently of their core algorithms, and enables lossless image compressors to operate in a different color space than the source data without thereby becoming lossy.

Type: Grant

Filed: April 21, 2005

Date of Patent: February 9, 2010

Inventor: Andreas Wittenstein
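The abstract does not disclose its conversion equations. The sketch below therefore swaps in a different, publicly known technique to show how integer color conversion can be exactly reversible: the YCoCg-R lifting transform. Note that it converts R′G′B′ to a luma/chroma space other than Y′CbCr. Each lifting step adds a rounded function of values that are already known, so the inverse subtracts the same quantity and recovers the input bit for bit. This illustrates the general idea and is not the patented method.

```python
from typing import Tuple


def rgb_to_ycocg_r(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """Forward YCoCg-R lifting transform on integers (>> is a floor shift)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y: int, co: int, cg: int) -> Tuple[int, int, int]:
    """Inverse: undo each lifting step in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

For 8-bit inputs, Co and Cg range from -255 to 255, so they need one extra bit. That extra bit is the usual price of exact reversibility in integer lifting schemes.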
Result data forwarding in parallel vector data processor based on scalar operation issue order

Patent number: 7660967

Abstract: A computer processor is responsive to successive processing instructions in an issue order to process regular vectors to generate a result vector without use of a cache. At least two architectural registers having input-vector capability are selectively coupled to memory to receive corresponding vector-elements of two vectors and transfer the vector-elements to a selected functional unit. At least one architectural register having output capability is selectively coupled to an output, which in turn is coupled to transfer result vector-elements to the memory. The functional unit performs a function on the vector-elements to generate a respective result-element. The result-elements are transferred to a selected architectural register for processing as operands in performance of further functions by a functional unit, or are transferred to the output for transfer to memory. In either case, the order of the result vector-elements is restored to the issue order of the successive processing instructions.

Type: Grant

Filed: January 30, 2008

Date of Patent: February 9, 2010

Assignee: Efficient Memory Technology

Inventor: Maurice L. Hutson
Floating Point Execution Unit for Calculating a One Minus Dot Product Value in a Single Pass

Publication number: 20100031009

Abstract: A floating point execution unit calculates a one minus dot product value in a single pass. As such, the dependency that otherwise would be required to perform the calculations is eliminated, resulting in a substantially faster performance of such calculations. The floating point execution unit may be used, for example, to accelerate pixel shading algorithms such as Fresnel and electron microscope effects.

Type: Application

Filed: August 1, 2008

Publication date: February 4, 2010

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Adam James Muff, Matthew Ray Tubbs
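Computing 1 - dot(a, b) in a single pass can matter numerically, not only for latency. To see this, the Python sketch below compares a version that rounds once at the end with the usual two-step version. It models the single-pass result by accumulating exactly with `fractions.Fraction` and rounding once to a double. This is a simplified model of a fused datapath, not a description of the IBM unit. The abstract motivates that unit mainly by removing the dependency between the dot product and the subtraction.

```python
from fractions import Fraction
from typing import Sequence


def one_minus_dot_fused(a: Sequence[float], b: Sequence[float]) -> float:
    """Model of a single-pass unit: exact accumulation, one final rounding."""
    exact = Fraction(1) - sum((Fraction(x) * Fraction(y) for x, y in zip(a, b)), Fraction(0))
    return float(exact)


def one_minus_dot_two_step(a: Sequence[float], b: Sequence[float]) -> float:
    """Conventional sequence: form the dot product, then subtract it from one."""
    dot = 0.0
    for x, y in zip(a, b):
        dot += x * y  # each product and each sum is rounded
    return 1.0 - dot
```

With `a = [1 + 2**-30]` and `b = [1 - 2**-30]`, the exact answer is 2**-60. The single-rounding model returns it, while the two-step version returns 0.0 because the product first rounds to 1.0.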
Efficient parallel floating point exception handling in a processor

Publication number: 20090327665

Abstract: Methods and apparatus are disclosed for handling floating point exceptions in a processor that executes single-instruction multiple-data (SIMD) instructions. In one embodiment a numerical exception is identified for a SIMD floating point operation and SIMD micro-operations are initiated to generate two packed partial results of a packed result for the SIMD floating point operation. A SIMD denormalization micro-operation is initiated to combine the two packed partial results and to denormalize one or more elements of the combined packed partial results to generate a packed result for the SIMD floating point operation having one or more denormal elements. Flags are set and stored with packed partial results to identify denormal elements. In one embodiment a SIMD normalization micro-operation is initiated to generate a normalized pseudo internal floating point representation prior to the SIMD floating point operation when it uses multiplication.

Type: Application

Filed: June 30, 2008

Publication date: December 31, 2009

Inventors: Zeev Sperber, Shachar Finkelstein, Gregory Pribush, Amit Gradstein, Guy Bale, Thierry Pons
VECTOR SIMD PROCESSOR

Publication number: 20090271591

Abstract: A data processor is provided whose level of operation parallelism, and thereby its operation processing capability, is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD). An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and the inner product execution units are composed to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and the SIMD-compatible composition enhance the level of operation parallelism dramatically.

Type: Application

Filed: July 2, 2009

Publication date: October 29, 2009

Inventors: Fumio ARAKAWA, Tetsuya Yamada
PROCESSOR FOR PERFORMING MULTIPLY-ADD OPERATIONS ON PACKED DATA

Publication number: 20090265409

Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.

Type: Application

Filed: March 23, 2009

Publication date: October 22, 2009

Inventors: Alexander D. Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf Witt
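A concrete instance of this kind of operation is the MMX `PMADDWD` instruction. It multiplies corresponding signed 16-bit elements and adds adjacent pairs of the 32-bit products. The Python model below follows that behavior, including the single wraparound case in which both pairs are -32768 * -32768. Treating it as representative of every embodiment in this application is an assumption.

```python
from typing import List, Sequence


def wrap_int32(x: int) -> int:
    """Reduce to a signed 32-bit value (two's-complement wraparound)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def packed_multiply_add(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Multiply signed 16-bit elements pairwise and add adjacent products.

    Four 16-bit elements per operand produce two 32-bit results.
    """
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("operands need the same, even number of elements")
    return [wrap_int32(a[i] * b[i] + a[i + 1] * b[i + 1])
            for i in range(0, len(a), 2)]
```

For example, `packed_multiply_add([1, 2, 3, 4], [5, 6, 7, 8])` returns `[17, 53]`. Summing such partial results is a common building block for dot products and FIR filters.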
Processor apparatus and method of processing multiple data by single instructions

Publication number: 20090265529

Abstract: A processor (and method) for processing multiple data by a single instruction includes first and second register sets, each of which includes a plurality of registers, and an arithmetic unit that rearranges the data held in the first and second register sets according to the relative size of the absolute values of the data between the first and second register sets, so that the relative size is established before an instruction that depends on it is executed.

Type: Application

Filed: April 7, 2009

Publication date: October 22, 2009

Applicant: NEC CORPORATION

Inventor: Yusuke Kobayashi
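Fixing the relative magnitude of operands in advance pays off in the Fast2Sum error-free addition, which is exact only when |a| >= |b|. The Python sketch below first swaps elements lane by lane so that the first register set holds the larger magnitude. It then applies Fast2Sum without any further comparisons. Fast2Sum is an illustrative choice of consumer and is not taken from the application. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def order_by_magnitude(a: Sequence[float], b: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Lane-wise rearrangement: the first result holds the larger |value|."""
    hi: List[float] = []
    lo: List[float] = []
    for x, y in zip(a, b):
        if abs(x) >= abs(y):
            hi.append(x)
            lo.append(y)
        else:
            hi.append(y)
            lo.append(x)
    return hi, lo


def fast_two_sum(hi: Sequence[float], lo: Sequence[float]) -> List[Tuple[float, float]]:
    """Per lane, s = fl(hi + lo) and an error term e with s + e == hi + lo exactly.

    Valid only because order_by_magnitude guarantees |hi| >= |lo|.
    """
    out = []
    for x, y in zip(hi, lo):
        s = x + y
        e = y - (s - x)  # the rounding error of s, recovered exactly
        out.append((s, e))
    return out
```

For the lanes `(1.0, 1e-20)` and `(1e-20, -1.0)`, the rearrangement yields `hi = [1.0, -1.0]`. Fast2Sum then returns the rounded sums 1.0 and -1.0 together with the lost part, 1e-20.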
Method and system for parallel vector data processing of vector data having a number of data elements including a defined first bit-length

Patent number: 7600104

Abstract: A system and method are provided for parallel vector data processing in a data processor capable of processing vector data having a defined first bit-length. In one embodiment, at least one of first and second operand registers is used for storing operands, and an additional data storage element is sized to store a number of bits corresponding to the first bit-length and is segmented into a defined number of segments. An instruction set storage unit stores a set of instructions for the data processor to process a set of data in parallel by use of the additional storage element.

Type: Grant

Filed: August 15, 2006

Date of Patent: October 6, 2009

Inventor: Peter Neumann
Embedded Control System

Publication number: 20090249040

Abstract: An embedded control system is provided that ensures precision in arithmetic on data in the floating-point format while also avoiding a shortage of memory storage area. In the embedded control system according to the present invention, when discrete data in the floating-point format is stored in a read-only memory, it is first converted into a significand-reduced floating-point format. Here, a significand-reduced floating-point number is a number obtained by deleting low-order bits of the significand of a floating-point number. Further, when an interpolation search is performed using the discrete data, the significand-reduced data stored in the read-only memory is converted back to the floating-point format before the interpolation search is performed.

Type: Application

Filed: February 19, 2009

Publication date: October 1, 2009

Applicant: Hitachi, Ltd.

Inventors: Shinya FUJIMOTO, Keiichiro Ohkawa
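The significand-reduced format can be modeled in Python on IEEE single-precision bit patterns. Dropping the low 16 bits of a float32 keeps the sign, the full 8-bit exponent, and 7 significand bits, which is the same layout as bfloat16. Restoring a value pads the deleted bits with zeros before interpolation. The 16-bit drop, the table contents, and the function names are assumptions made for the sketch, because the abstract does not say how many bits are deleted.

```python
import bisect
import struct
from typing import Sequence

DROP_BITS = 16  # assumed number of low-order significand bits deleted (must be <= 23)


def reduce_significand(x: float) -> int:
    """Store x as a float32 bit pattern with its low significand bits deleted."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return bits >> DROP_BITS  # truncation, i.e. rounding toward zero


def restore(stored: int) -> float:
    """Bring a stored value back to float32 by zero-filling the deleted bits."""
    return struct.unpack('<f', struct.pack('<I', stored << DROP_BITS))[0]


def interpolate(xs: Sequence[float], stored_ys: Sequence[int], x: float) -> float:
    """Linear interpolation over a table whose y values are kept reduced."""
    i = bisect.bisect_right(xs, x) - 1
    i = max(0, min(len(xs) - 2, i))  # clamp to a valid segment
    y0, y1 = restore(stored_ys[i]), restore(stored_ys[i + 1])  # restore before use
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return y0 + t * (y1 - y0)
```

Here each table entry occupies 16 bits instead of 32, and 3.14159265 is stored as 3.140625. Because 7 significand bits are kept, truncation changes each normal value by less than 2**-7 of its magnitude.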
Method and apparatus for matrix decomposition in programmable logic devices

Publication number: 20090240917

Abstract: A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.

Type: Application

Filed: October 10, 2006

Publication date: September 24, 2009

Applicant: Altera Corporation

Inventor: Michael Fitton
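In CORDIC-based QR decomposition, each Givens rotation that zeroes a matrix element is usually computed in vectoring mode. The vector (x, y) is driven onto the x-axis by a sequence of shift-and-add micro-rotations, which yields its magnitude and rotation angle. The Python sketch below uses floating point and an explicit gain correction for clarity. Hardware would typically use fixed-point shifts and a precomputed gain constant instead. This is a generic CORDIC illustration, not the Altera design.

```python
import math
from typing import Tuple


def cordic_vectoring(x: float, y: float, iterations: int = 32) -> Tuple[float, float]:
    """Rotate (x, y) onto the positive x-axis; return (magnitude, angle).

    Assumes x > 0, which keeps the angle within CORDIC's convergence range.
    """
    angle = 0.0
    gain = 1.0
    for i in range(iterations):
        t = 2.0 ** -i                     # tangent of the i-th micro-rotation angle
        d = -1.0 if y > 0 else 1.0        # rotate toward y = 0
        x, y = x - d * y * t, y + d * x * t
        angle -= d * math.atan(t)         # track the original vector's angle
        gain *= math.sqrt(1.0 + t * t)    # each micro-rotation stretches the vector
    return x / gain, angle
```

In a QR decomposition, the returned angle would then be applied in rotation mode to the remaining elements of the two rows involved.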
PROCESSOR AND INFORMATION PROCESSING APPARATUS

Publication number: 20090240927

Abstract: A processor capable of executing conditional store instructions without being limited by the number of condition codes is provided. Condition data is stored in floating-point registers, and an operation unit executes a conditional floating-point store instruction that determines whether to store the store data in a cache.

Type: Application

Filed: November 25, 2008

Publication date: September 24, 2009

Applicant: FUJITSU LIMITED

Inventor: Toshio YOSHIDA
Handling of Denormals In Floating Point Number Processing

Publication number: 20090210678

Abstract: A data processing apparatus operable to process floating point operands is disclosed. The data processing apparatus comprises: an instruction decoder operable to decode an instruction for processing floating point operands; and a data processor operable to perform data processing operations controlled by the instruction decoder, wherein: in response to the decoded instruction indicating operation according to a flush-to-zero semantic, the data processor is operable to process the floating point operands in accordance with the decoded instruction such that floating point operands having a denormal value are treated as zero operands; and in response to the decoded instruction indicating operation according to a denormal semantic, the data processor is operable to process the floating point operands in accordance with the decoded instruction such that floating point operands having a denormal value are treated as denormal operands.

Type: Application

Filed: August 1, 2005

Publication date: August 20, 2009

Inventor: Simon Ford
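A small Python model that classifies float32 operands by their bit patterns can contrast the two semantics. Under the flush-to-zero semantic, a denormal operand is replaced by a zero of the same sign before the operation. Under the denormal semantic, it is used as is. Results are rounded to single precision with `struct`. Result flushing and exception flags are left out, and the function names are hypothetical.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


def is_denormal_f32(x: float) -> bool:
    """True if x, as a float32, has a zero exponent field and a nonzero fraction."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return ((bits >> 23) & 0xFF) == 0 and (bits & 0x7FFFFF) != 0


def operand(x: float, flush_to_zero: bool) -> float:
    """Apply the operand semantic selected by the decoded instruction."""
    if flush_to_zero and is_denormal_f32(x):
        return math.copysign(0.0, x)  # treat a denormal as a signed zero
    return x


def fmul_f32(a: float, b: float, flush_to_zero: bool) -> float:
    """Single-precision multiply under the chosen operand semantic."""
    return to_f32(operand(a, flush_to_zero) * operand(b, flush_to_zero))
```

Consider the operand 2**-140, which is denormal in single precision, scaled by 2**20. The denormal semantic yields the normal value 2**-120, while flush-to-zero yields 0.0.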
