Distributing Of Vector Data To Vector Registers Patents (Class 712/4)

Masking to control an access to data in vector register (Class 712/5)

GENERATING PREDICATE VALUES DURING VECTOR PROCESSING

Publication number: 20080288745

Abstract: A method for performing parallel operations in a computer system when one or more memory hazards may be present, which may be implemented by a processor, is described. During operation, the processor receives instructions for detecting conflict between memory addresses in vectors when operations are performed in parallel using at least a portion of the vectors, and generating one or more predicate values corresponding to any detected conflict between the memory addresses, where a given predicate value indicates elements in at least the portion of the vector that can be processed in parallel. Next, the processor executes the instructions for detecting the conflict between the memory addresses and generating the one or more predicate values.

Type: Application

Filed: July 11, 2008

Publication date: November 20, 2008

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
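
The abstract does not say how conflicts are detected. The sketch below assumes that a hazard means two lanes in the same pass touching the same address, and that lanes must be processed in order. Under those assumptions it splits a vector of addresses into successive predicates, where each predicate enables a run of lanes that can safely run in parallel. The function name `hazard_predicates` is hypothetical. It is not an instruction defined in the patent.

```python
from typing import List, Sequence


def hazard_predicates(addresses: Sequence[int]) -> List[List[bool]]:
    """Split vector lanes into successive predicates (hypothetical model).

    Each predicate enables a contiguous run of lanes, starting where the
    previous run stopped, in which no address repeats. A repeated address
    ends the run, so the dependent lane waits for the next pass.
    """
    n = len(addresses)
    predicates: List[List[bool]] = []
    start = 0
    while start < n:
        seen = set()
        pred = [False] * n
        lane = start
        # extend the run until an address already used in this pass appears
        while lane < n and addresses[lane] not in seen:
            seen.add(addresses[lane])
            pred[lane] = True
            lane += 1
        predicates.append(pred)
        start = lane  # every pass enables at least one lane, so this terminates
    return predicates
```

For addresses `[0, 1, 0, 2]`, lanes 0 and 1 can proceed together. Lane 2 reuses address 0, so it starts a second predicate together with lane 3.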
Multi-Magnitudinal Vectors with Resolution Based on Source Vector Features

Publication number: 20080256329

Abstract: Methods, systems and computer program products for resolving multiple magnitudes assigned to a target vector are disclosed. A target vector that includes one or more target vector dimensions is received. One of the target vector dimensions is processed to determine a total number of magnitudes assigned to the processed target vector dimension. Also, a source vector that includes one or more source vector dimensions is received. The received source vector is processed to determine a total number of features associated with the source vector. When it is detected that the total number of magnitudes assigned to the processed target vector dimension exceeds one, one of the assigned magnitudes is selected based on one of the determined features associated with the source vector.

Type: Application

Filed: April 13, 2007

Publication date: October 16, 2008

Inventors: Daniel T. Heinze, Mark L. Morsch
Filter and Method For Filtering

Publication number: 20080244220

Abstract: A filter and method of filtering modifies the computation order to accommodate horizontal symmetric filtering, and modifies the source operands while modifying the SIMD computation, so as to eliminate such heavy overhead of transposing a pixel matrix.

Type: Application

Filed: October 11, 2007

Publication date: October 2, 2008

Inventors: Guo Hui Lin, Yang Liu, Lu Wan, Min Zhu
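
Horizontal symmetric filtering lends itself to SIMD because mirrored samples share a coefficient. The sketch below shows that general idea: mirrored samples are added before multiplying, and every statement works on a whole shifted slice of the row. It does not reproduce the patent's specific reordering or its transpose elimination. The function name and the `half_taps` convention are assumptions.

```python
from typing import List, Sequence


def symmetric_filter(row: Sequence[int], half_taps: Sequence[int]) -> List[int]:
    """Apply the kernel h_R..h_1 h_0 h_1..h_R along one pixel row.

    half_taps = [h_0, h_1, ..., h_R] (hypothetical convention). Only positions
    where the full kernel fits are produced. Adding the two mirrored slices
    first means each tap pair costs one multiply instead of two.
    """
    r = len(half_taps) - 1
    n_out = max(0, len(row) - 2 * r)
    centre = row[r:r + n_out]
    acc = [half_taps[0] * x for x in centre]
    for k in range(1, r + 1):
        left = row[r - k:r - k + n_out]    # row shifted right by k
        right = row[r + k:r + k + n_out]   # row shifted left by k
        acc = [a + half_taps[k] * (lv + rv) for a, lv, rv in zip(acc, left, right)]
    return acc
```

With `half_taps = [2, 1]` (the kernel 1 2 1), the row `[1, 2, 3, 4, 5, 6]` filters to `[8, 12, 16, 20]`.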
Method and apparatus for indirectly addressed vector load-add-store across multi-processors

Patent number: 7421565

Abstract: A method and apparatus to correctly compute a vector-gather, vector-operate (e.g., vector add), and vector-scatter sequence, particularly when elements of the vector may be redundantly presented, as with indirectly addressed vector operations. For an add operation, one vector register is loaded with the “add-in” values, and another vector register is loaded with address values of “add to” elements to be gathered from memory into a third vector register. If the vector of address values has a plurality of elements that point to the same memory address, the algorithm should add all the “add in” values from elements corresponding to the elements having the duplicated addresses. An indirectly addressed load performs the “gather” operation to load the “add to” values. A vector add operation then adds corresponding elements from the “add in” vector to the “add to” vector. An indirectly addressed store then performs the “scatter” operation to store the results.

Type: Grant

Filed: August 18, 2003

Date of Patent: September 2, 2008

Assignee: Cray Inc.

Inventor: James R. Kohn
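
The problem named in the abstract appears as soon as two lanes share an index. The sketch below contrasts two versions. The first does a lane-by-lane gather, add and scatter and loses updates. The second sums the "add in" values per address before the gather. The combining step uses a dictionary for clarity. It is an illustrative assumption, not Cray's hardware algorithm, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence


def naive_gather_add_scatter(memory: Sequence[int], indices: Sequence[int],
                             add_in: Sequence[int]) -> List[int]:
    """Gather, add, scatter lane by lane; duplicated indices lose updates."""
    mem = list(memory)
    gathered = [mem[i] for i in indices]                # indirect load ("gather")
    summed = [g + a for g, a in zip(gathered, add_in)]  # vector add
    for i, s in zip(indices, summed):                   # indirect store ("scatter")
        mem[i] = s                                      # later lanes overwrite earlier ones
    return mem


def combining_gather_add_scatter(memory: Sequence[int], indices: Sequence[int],
                                 add_in: Sequence[int]) -> List[int]:
    """Sum the add-in values of lanes sharing an address, then gather/add/scatter."""
    mem = list(memory)
    combined: Dict[int, int] = {}
    for i, a in zip(indices, add_in):
        combined[i] = combined.get(i, 0) + a
    unique = list(combined)                 # each address now appears once
    gathered = [mem[i] for i in unique]
    for i, g in zip(unique, gathered):
        mem[i] = g + combined[i]
    return mem
```

With indices `[1, 1, 2]` and add-in values `[5, 7, 1]` applied to `[0, 0, 0]`, the naive sequence leaves 7 in element 1. The combining sequence leaves the correct 12.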
PROCESSOR AND PROGRAM EXECUTION METHOD CAPABLE OF EFFICIENT PROGRAM EXECUTION

Publication number: 20080209162

Abstract: A processor for sequentially executing a plurality of programs using a plurality of register value groups stored in a memory that correspond one-to-one with the programs.

Type: Application

Filed: April 28, 2008

Publication date: August 28, 2008

Inventors: Kazuya Furukawa, Tetsuya Tanaka, Nobuo Higaki, Kunihiko Hayashi, Hiroshi Kadota, Tokuzo Kiyohara, Kozo Kimura, Hideshi Nishida, Kazushi Kurata, Shigeki Fujii, Toshio Sugimura
Flow optimization and prediction for VSSE memory operations

Patent number: 7404065

Abstract: In one embodiment, a method for flow optimization and prediction for vector streaming single instruction, multiple data (SIMD) extension (VSSE) memory operations is disclosed. The method comprises generating an optimized micro-operation (µop) flow for an instruction to operate on a vector if the instruction is predicted to be unmasked and unit-stride, the instruction to access elements in memory, and accessing via the optimized µop flow two or more of the elements at the same time without determining masks of the two or more elements. Other embodiments are also described.

Type: Grant

Filed: December 21, 2005

Date of Patent: July 22, 2008

Assignee: Intel Corporation

Inventors: Stephan Jourdan, Per Hammarlund, Michael Fetterman, Michael P. Cornaby, Glenn Hinton, Avinash Sodani
Two dimensional addressing of a matrix-vector register array

Patent number: 7386703

Abstract: A processor and method for processing matrix data. The processor includes M independent vector register files which are adapted to collectively store a matrix of L data elements. Each data element has B binary bits. The matrix has N rows and M columns, and L=N*M. Each column has K subcolumns. N≥2, M≥2, K≥1, and B≥1. Each row and each subcolumn is addressable. The processor does not duplicatively store the L data elements. The matrix includes a set of arrays such that each array is a row or subcolumn of the matrix. The processor may execute an instruction that performs an operation on a first array of the set of arrays, such that the operation is performed with selectivity with respect to the data elements of the first array.

Type: Grant

Filed: November 18, 2003

Date of Patent: June 10, 2008

Assignee: International Business Machines Corporation

Inventors: Peter A. Sandon, R. Michael P. West
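
As a rough model of the storage scheme, the sketch below keeps each of the M columns in its own register file and stores every element exactly once. It reads a row by taking one element from each file, and it reads a subcolumn as a slice of one file. The class and method names are hypothetical. It also assumes that N is a multiple of K, which the abstract does not state.

```python
from typing import List


class MatrixRegisterArray:
    """N x M matrix held as M per-column register files (hypothetical model)."""

    def __init__(self, n_rows: int, m_cols: int, k_sub: int) -> None:
        # N>=2, M>=2, K>=1 as in the abstract; equal-sized subcolumns are assumed
        if n_rows < 2 or m_cols < 2 or k_sub < 1 or n_rows % k_sub:
            raise ValueError("need N>=2, M>=2, K>=1 and N divisible by K")
        self.n, self.m, self.k = n_rows, m_cols, k_sub
        # one independent register file per column; no element is duplicated
        self.files: List[List[int]] = [[0] * n_rows for _ in range(m_cols)]

    def write(self, row: int, col: int, value: int) -> None:
        self.files[col][row] = value

    def read_row(self, row: int) -> List[int]:
        # a row takes the same entry from each of the M column files
        return [f[row] for f in self.files]

    def read_subcolumn(self, col: int, sub: int) -> List[int]:
        # subcolumn `sub` is a contiguous slice of one column file
        size = self.n // self.k
        return self.files[col][sub * size:(sub + 1) * size]
```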
METHOD AND APPARATUS FOR LOADING OR STORING MULTIPLE REGISTERS IN A DATA PROCESSING SYSTEM

Publication number: 20080126744

Abstract: A method for operating a data processing system includes providing an application binary interface (ABI) which determines a set of non-contiguous volatile registers and a set of non-volatile registers. The set of non-contiguous volatile registers includes a plurality of general purpose registers (GPRs) and a plurality of special purpose registers (SPRs). The method includes providing fewer than three instructions which collectively load or store all of the set of non-contiguous volatile registers determined by the ABI. A system includes a set of volatile registers including a plurality of volatile GPRs, a plurality of volatile supervisor SPRs, and a plurality of volatile user SPRs, and execution circuitry for executing a first instruction that loads or stores the plurality of volatile supervisor SPRs, for executing a second instruction that loads or stores the plurality of volatile GPRs, and for executing a third instruction that loads or stores the plurality of volatile user SPRs.

Type: Application

Filed: August 29, 2006

Publication date: May 29, 2008

Inventor: William C. Moyer
Operand Multiplexor Control Modifier Instruction in a Fine Grain Multithreaded Vector Microprocessor

Publication number: 20080126745

Abstract: The present invention is generally related to integrated circuit devices, and more particularly, to methods, systems and design structures for the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve rearranging vector operands in one or more source registers prior to performing vector operations. Typically, rearranging of operands in source registers is done by issuing a plurality of permute instructions that require excessive usage of temporary registers. Furthermore, the permute instructions may cause dependencies between instructions executing in a pipeline, thereby adversely affecting performance. Embodiments of the invention provide a level of muxing between a register file and a vector unit that allow for rearrangement of vector operands in source registers prior to providing the operands to the vector unit, thereby obviating the need for permute instructions.

Type: Application

Filed: October 26, 2007

Publication date: May 29, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
Indirectly addressed vector load-operate-store method and apparatus

Patent number: 7366873

Abstract: A method and apparatus to correctly compute a vector-gather, vector-operate (e.g., vector add), and vector-scatter sequence, particularly when elements of the vector may be redundantly presented, as with indirectly addressed vector operations. For an add operation, one vector register is loaded with the “add-in” values, and another vector register is loaded with address values of “add to” elements to be gathered from memory into a third vector register. If the vector of address values has a plurality of elements that point to the same memory address, the algorithm should add all the “add in” values from elements corresponding to the elements having the duplicated addresses. An indirectly addressed load performs the “gather” operation to load the “add to” values. A vector add operation then adds corresponding elements from the “add in” vector to the “add to” vector. An indirectly addressed store then performs the “scatter” operation to store the results.

Type: Grant

Filed: August 18, 2003

Date of Patent: April 29, 2008

Assignee: Cray, Inc.

Inventor: James R. Kohn
Area Optimized Full Vector Width Vector Cross Product

Publication number: 20080082784

Abstract: The present invention is generally related to integrated circuit devices, and more particularly, to methods, systems and design structures for the field of image processing, and more specifically to vector units for supporting image processing. A dual vector unit implementation is described wherein two vector units are configured to receive data from a common register file. The vector units may independently and simultaneously process instructions. Furthermore, the vector units may be adapted to perform scalar operations thereby integrating the vector and scalar processing. The vector units may also be configured to share resources to perform an operation, for example, a cross product operation.

Type: Application

Filed: October 26, 2007

Publication date: April 3, 2008

Applicant: International Business Machines Corporation

Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
Circuit to extract nonadjacent bits from data packets

Patent number: 7353371

Abstract: A method and device to copy data fields from one or more source packets to one or more result packets. In a SET function, adjacent data fields in a source packet are copied to respective destination data fields in a result packet governed by a field locator packet. In an ESET function, data fields in respective source packets are copied to adjacent data fields in a result packet governed by a field locator packet. In an EXTRACT function, data fields in a source packet are copied to adjacent data fields in a result packet governed by a field locator packet. In a SCATTER function, adjacent data fields in a source packet are copied to data fields in respective result packets governed by a field locator packet.

Type: Grant

Filed: December 5, 2002

Date of Patent: April 1, 2008

Assignee: Intel Corporation

Inventors: Corey Gee, Bapi Vinnakota
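
The EXTRACT and SET functions can be illustrated with one-bit fields, with the field locator packet treated as a bit mask. EXTRACT packs the located bits into adjacent low-order positions. SET performs the inverse, spreading adjacent bits out to the located positions. The sketch below works on that simplified interpretation only: multi-bit fields, ESET and SCATTER are not modeled, and the function names are hypothetical.

```python
def extract_bits(source: int, locator: int) -> int:
    """EXTRACT-style: copy the bits of `source` selected by `locator` into adjacent low bits."""
    result, out = 0, 0
    while locator:
        lowest = locator & -locator          # lowest remaining located bit
        if source & lowest:
            result |= 1 << out
        out += 1
        locator &= locator - 1               # clear that bit and move on
    return result


def set_bits(source: int, locator: int) -> int:
    """SET-style: copy adjacent low bits of `source` to the positions selected by `locator`."""
    result, inp = 0, 0
    while locator:
        lowest = locator & -locator
        if (source >> inp) & 1:
            result |= lowest
        inp += 1
        locator &= locator - 1
    return result
```

For example, `extract_bits(0b10110010, 0b11110000)` returns `0b1011`, and `set_bits(0b1011, 0b11110000)` puts those bits back at positions 4 to 7.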
Scalar result producing method in vector/scalar system by vector unit from vector results according to modifier in vector instruction

Patent number: 7350057

Abstract: Described herein is a method and system for executing instructions. The system comprises a scalar unit for executing scalar instructions each defining a single value pair; a vector unit for executing vector instructions each defining multiple value pairs; and an instruction decoder for receiving a single stream of instructions including scalar instructions and vector instructions and operable to direct scalar instructions to the scalar unit and vector instructions to the vector unit. The vector unit can comprise a plurality of value processing units and a scalar result unit. The scalar unit can comprise a scalar register file. Communication between the vector unit and the scalar unit is enabled by allowing the vector unit to access the scalar register file and allowing the scalar unit to access output from the scalar result unit. The output of the scalar result unit may be based on the relative magnitudes of outputs from the plurality of value processing units.

Type: Grant

Filed: November 6, 2006

Date of Patent: March 25, 2008

Assignee: Broadcom Corporation

Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
Data processor and methods thereof

Publication number: 20080072010

Abstract: A system and method for performing vector arithmetic is disclosed. The method includes loading two operand vectors, each composed of a number of vector elements, into two storage locations. A selected arithmetic operation is performed on the operand vectors to produce a result vector having the number of vector elements. Each vector element of the result vector is associated with an arithmetic logic cell that has a first input that can receive any vector element from the first vector and a second input that can receive any vector element from the second vector. Accordingly, each vector element of the result vector is a function of any two individual vector elements of the operand vectors. By applying the operand vector elements to the appropriate arithmetic logic cells, and by selecting the appropriate arithmetic operation, complex vector operations can be performed efficiently.

Type: Application

Filed: September 18, 2006

Publication date: March 20, 2008

Applicant: Freescale Semiconductor, Inc.

Inventor: Chengke Sheng
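
One way to picture arithmetic logic cells that can each take any element of either operand is to give every result lane its own pair of selectors. The sketch below models this with a hypothetical `cross_lane_op`. It then uses the model to build a 3-component cross product from two permuted multiplies and a subtract. The cross-product usage is an illustrative choice, not taken from the abstract.

```python
import operator
from typing import Callable, List, Sequence


def cross_lane_op(a: Sequence[int], b: Sequence[int],
                  sel_a: Sequence[int], sel_b: Sequence[int],
                  op: Callable[[int, int], int]) -> List[int]:
    """Result lane i applies op to a[sel_a[i]] and b[sel_b[i]] (hypothetical model)."""
    return [op(a[i], b[j]) for i, j in zip(sel_a, sel_b)]


def cross_product(a: Sequence[int], b: Sequence[int]) -> List[int]:
    yzx, zxy, xyz = [1, 2, 0], [2, 0, 1], [0, 1, 2]
    t1 = cross_lane_op(a, b, yzx, zxy, operator.mul)   # a1*b2, a2*b0, a0*b1
    t2 = cross_lane_op(a, b, zxy, yzx, operator.mul)   # a2*b1, a0*b2, a1*b0
    return cross_lane_op(t1, t2, xyz, xyz, operator.sub)
```

For example, `cross_product([1, 2, 3], [4, 5, 6])` gives `[-3, 6, -3]`.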
Data processing system having instruction specifiers for SIMD register operands and method thereof

Patent number: 7315932

Abstract: Various load and store instructions may be used to transfer multiple vector elements between registers in a register file and memory. A cnt parameter may be used to indicate a total number of elements to be transferred to or from memory, and an rcnt parameter may be used to indicate a maximum number of vector elements that may be transferred to or from a single register within a register file. Also, the instructions may use a variety of different addressing modes. The memory element size may be specified independently from the register element size such that source and destination sizes may differ within an instruction. With some instructions, a vector stream may be initiated and conditionally enqueued or dequeued. Truncation or rounding fields may be provided such that source data elements may be truncated or rounded when transferred. Also, source data elements may be sign- or unsigned-extended when transferred.

Type: Grant

Filed: September 8, 2003

Date of Patent: January 1, 2008

Inventor: William C. Moyer
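
The interplay of the cnt and rcnt parameters can be sketched as a load that fetches cnt elements and fills successive registers with at most rcnt elements each. The sketch assumes a simple strided addressing mode and equal memory and register element sizes. It does not model the patent's other addressing modes, size conversion, rounding, or extension. The function name and its parameters apart from cnt and rcnt are hypothetical.

```python
from typing import List, Sequence


def load_multiple(memory: Sequence[int], base: int, stride: int,
                  cnt: int, rcnt: int) -> List[List[int]]:
    """Fetch cnt elements, placing at most rcnt in each destination register."""
    if cnt < 0 or rcnt < 1:
        raise ValueError("cnt must be >= 0 and rcnt >= 1")
    registers: List[List[int]] = []
    for first in range(0, cnt, rcnt):
        n = min(rcnt, cnt - first)            # the last register may be partly filled
        registers.append([memory[base + (first + k) * stride] for k in range(n)])
    return registers
```

With `cnt = 5` and `rcnt = 2`, the load fills three registers holding 2, 2 and 1 elements.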
Data processor and method for using a data processor with debug circuit

Patent number: 7293258

Abstract: A data processor has a debug circuit arranged to monitor whether operand data used for execution of a program meets a debug exception condition. The debug exception condition tests two or more multi-bit subfields of a vector operand independently. Debug action is taken if one or more of the multi-bit subfields meet the corresponding conditions.

Type: Grant

Filed: May 17, 2000

Date of Patent: November 6, 2007

Assignee: NXP B.V.

Inventors: Hendrikus Petrus Elisabeth Vranken, Kornelis Antonius Vissers, Fransiscus Wilhelmus Sijstermans
Functional-level instruction-set computer architecture for processing application-layer content-service requests such as file-access requests

Patent number: 7254696

Abstract: A functional-level instruction-set computing (FLIC) architecture executes higher-level functional instructions such as lookups and bit-compares of variable-length operands. Each FLIC processing-engine slice has specialized processing units including a lookup unit that searches for a matching entry in a lookup cache. Variable-length operands are stored in execution buffers. The operand length and location in the execution buffer are stored in fixed-length general-purpose registers (GPRs) that also store fixed-length operands. A copy/move unit moves data between input and output buffers and one or more FLIC processing-engine slices. Multiple contexts can each have a set of GPRs and execution buffers. An expansion buffer in a FLIC slice can be allocated to a context to expand that context's execution buffer for storing longer operands.

Type: Grant

Filed: December 12, 2002

Date of Patent: August 7, 2007

Assignee: Alacritech, Inc.

Inventors: Millind Mittal, Mehul Kharidia, Tarun Kumar Tripathy, J. Sukarno Mertoguno
Method and apparatus for image blending

Patent number: 7230633

Abstract: Methods and apparatuses for blending two images using vector table look up operations. In one aspect of the invention, a method to blend two images includes: loading a vector of keys into a vector register; converting the vector of keys into a first vector of blending factors for the first image and a second vector of blending factors for the second image using a plurality of look up tables; and computing an image attribute for the blended image using the blending factors.

Type: Grant

Filed: January 11, 2006

Date of Patent: June 12, 2007

Assignee: Apple Inc.

Inventors: Steven Todd Weybrew, David Ligon, Ronald Gerard Langhi
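
The blending steps in the abstract (a vector of keys, two table lookups, a weighted combination) can be sketched with 8-bit pixels. The table contents below are an assumption: key k gives weight k to the first image and 255 - k to the second. The scaling back to 0..255 rounds to the nearest integer. The names `TABLE_1`, `TABLE_2` and `blend` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical 8-bit lookup tables: key k -> weight k for image 1, 255 - k for image 2.
TABLE_1 = list(range(256))
TABLE_2 = [255 - k for k in range(256)]


def blend(img1: Sequence[int], img2: Sequence[int], keys: Sequence[int],
          table1: Sequence[int] = TABLE_1, table2: Sequence[int] = TABLE_2) -> List[int]:
    f1 = [table1[k] for k in keys]   # first vector of blending factors
    f2 = [table2[k] for k in keys]   # second vector of blending factors
    # weighted sum, divided by 255 with round-to-nearest
    return [(p * w1 + q * w2 + 127) // 255 for p, q, w1, w2 in zip(img1, img2, f1, f2)]
```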
Load/store operation of memory misaligned vector data using alignment register storing realigned data portion for combining with remaining portion

Patent number: 7219212

Abstract: A processor can achieve high code density while allowing higher performance than existing architectures, particularly for Digital Signal Processing (DSP) applications. In accordance with one aspect, the processor supports three possible instruction sizes while maintaining the simplicity of programming and allowing efficient physical implementation. Most of the application code can be encoded using two sets of narrow size instructions to achieve high code density. Adding a third (and larger, i.e. VLIW) instruction size allows the architecture to encode multiple operations per instruction for the performance critical section of the code. Further, each operation of the VLIW format instruction can optionally be a SIMD operation that operates upon vector data. A scheme for the optimal utilization (highest achievable performance for the given amount of hardware) of multiply-accumulate (MAC) hardware is also provided.

Type: Grant

Filed: February 25, 2005

Date of Patent: May 15, 2007

Assignee: Tensilica, Inc.

Inventors: Himanshu A. Sanghavi, Earl A. Killian, James Robert Kennedy, Darin S. Petkov, Peng Tu, William A. Huffman
Microprocessor with high speed memory integrated in load/store unit to efficiently perform scatter and gather operations

Patent number: 7216218

Abstract: The present invention relates to the field of (micro)computer design and architecture, and in particular to microarchitecture associated with moving data values between a (micro)processor and memory components. Particularly, the present invention relates to a computer system with a processor architecture in which register addresses are generated with more than one execution channel controlled by one central processing unit with at least one load/store unit for loading and storing data objects, and at least one cache memory associated with the processor holding data objects accessed by the processor, wherein said processor's load/store unit contains a high speed memory directly interfacing said load/store unit to the cache and directly accessible by the cache memory for implementing scatter and gather operations. The present invention improves the performance of architectures with dual ported microprocessor implementations comprising two execution pipelines capable of two load/store data transactions per cycle.

Type: Grant

Filed: June 2, 2004

Date of Patent: May 8, 2007

Assignee: Broadcom Corporation

Inventor: Sophie Wilson
Method and apparatus for a network processor having an architecture that supports burst writes and/or reads

Patent number: 7206857

Abstract: A method is described that involves recognizing that an input queue state has reached a buffer's worth of information. The method also involves generating a first request to read a buffer's worth of information from an input RAM that implements the input queue. The method further involves recognizing that an output queue has room to receive information and that an intermediate queue that provides information to the output queue does not have information waiting to be forwarded to the output queue. The method also involves generating a second request to read information from the input RAM so that at least a portion of the room can be filled. The method also involves granting one of the first and second requests.

Type: Grant

Filed: May 10, 2002

Date of Patent: April 17, 2007

Assignee: Altera Corporation

Inventors: Neil Mammen, Greg Maturi, Mammen Thomas
Two dimensional data access in a processor

Patent number: 7200724

Abstract: A data processor comprising: a register memory comprising an array of memory cells extending in two dimensions, the cells being located on rows in the first dimension and columns in the second dimension, each cell being addressable by means of an instruction specifying a pair of coordinates that identify the row and column of the cell in the array; and a processing unit capable of executing instructions that operate on a plurality of memory cells in the register, the instructions identifying the plurality of cells by means of a first instruction part specifying a pair of coordinates that identify a first cell in the array, and a second instruction part that identifies the configuration of the plurality of cells relative to the first cell; the data processor being arranged to interpret a first form of second instruction part as specifying a first group of cells all of which are located in the same row but in different columns, and to interpret a second form of second instruction part as specifying a first grou

Type: Grant

Filed: January 17, 2006

Date of Patent: April 3, 2007

Assignee: Broadcom Corporation

Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
Alignment and ordering of vector elements for single instruction multiple data processing

Patent number: 7197625

Abstract: The present invention provides alignment and ordering of vector elements for SIMD processing. In the alignment of vector elements for SIMD processing, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. The first vector contains a first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of an aligned vector is determined. Next, a vector is extracted from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing. In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register.

Type: Grant

Filed: September 15, 2000

Date of Patent: March 27, 2007

Assignee: MIPS Technologies, Inc.

Inventors: Timothy J. van Hook, Peter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian
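
The alignment half of the abstract amounts to two aligned loads followed by an extraction that starts at the first wanted byte. The sketch below models registers as byte strings and assumes a 16-byte vector width. It does not model the ordering half or the replication detail. The name `extract_aligned` is hypothetical.

```python
def extract_aligned(memory: bytes, addr: int, width: int = 16) -> bytes:
    """Build the `width`-byte vector starting at a possibly misaligned `addr`."""
    base = addr - addr % width                        # aligned address at or below addr
    first = memory[base:base + width]                 # first aligned load
    second = memory[base + width:base + 2 * width]    # second aligned load
    start = addr % width                              # starting byte within the first register
    return (first + second)[start:start + width]      # extract across both registers
```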
Method and apparatus for multi-thread accumulation buffering in a computation engine

Patent number: 7111156

Abstract: A method and apparatus for enhancing flexibility of instruction ordering in a multi-thread processing system that performs multiply and accumulate operations is presented. A plurality of accumulation registers is provided for storing the results of an adder, wherein each of the plurality of accumulation registers corresponds to a different thread of the plurality of threads. The contents of each of the plurality of accumulation registers can be selected as an input to the adder such that the present accumulated value can be added to a subsequently calculated product to generate a new accumulated value.

Type: Grant

Filed: April 21, 2000

Date of Patent: September 19, 2006

Assignee: ATI Technologies, Inc.

Inventors: Michael Andrew Mang, Michael Mantor
System and method for using hardware assist functions to process multiple arbitrary sized data elements in a register

Patent number: 7107435

Abstract: A system and method for processing multiple arbitrary sized data elements in a register. A method of the invention comprises the steps of: creating a mask register that defines a set of arbitrary sized segments for a register; storing a plurality of arbitrary sized data elements in a segmented data register arranged in accordance with the mask register, wherein the arbitrary sized data elements are sign extended; simultaneously operating on each of the data elements in the segmented data register to generate a set of resulting data elements in response to a machine instruction, wherein the resulting data elements depend on each other; and unpacking the resulting data elements to provide a plurality of arbitrary sized results that are independent of each other.

Type: Grant

Filed: May 27, 2003

Date of Patent: September 12, 2006

Assignee: International Business Machines Corporation

Inventors: Michael T. Brady, Jennifer Q. Trelewicz, Joan L. Mitchell
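
A small model shows why the packed results "depend on each other" yet can be unpacked independently. The sketch packs sign-extended fields of arbitrary widths into one integer, so a negative field borrows from the fields above it. A single integer addition then operates on all fields at once. Unpacking works from the low field upward, removing each field's contribution in turn. The width list stands in for the mask register. It assumes that every input and result fits its field as a two's-complement value, and the names are hypothetical.

```python
from typing import List, Sequence


def pack(values: Sequence[int], widths: Sequence[int]) -> int:
    """Place sign-extended values at successive bit offsets (fields borrow from higher ones)."""
    packed, offset = 0, 0
    for v, w in zip(values, widths):
        packed += v << offset   # a negative v subtracts from the higher fields
        offset += w
    return packed


def unpack(packed: int, widths: Sequence[int]) -> List[int]:
    """Recover independent signed fields, lowest field first."""
    values = []
    for w in widths:
        field = packed & ((1 << w) - 1)
        if field >= 1 << (w - 1):      # interpret the field as two's complement
            field -= 1 << w
        values.append(field)
        packed = (packed - field) >> w  # exact: remove this field and shift down
    return values
```

For example, with widths `[5, 7, 4]`, adding `pack([-3, 40, 2], widths)` and `pack([10, -50, -5], widths)` and unpacking the sum yields `[7, -10, -3]`.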
System and method for performing efficient conditional vector operations for data parallel architectures involving both input and conditional vector values

Patent number: 7100026

Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by, e.g., steering each to one of two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.

Type: Grant

Filed: May 30, 2001

Date of Patent: August 29, 2006

Assignees: The Massachusetts Institute of Technology, The Board of Trustees of the Leland Stanford Junior University

Inventors: William J. Dally, Scott Rixner, John D. Owens, Ujval J. Kapasi
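
The split-and-combine pattern in the abstract can be sketched in a few lines. A condition vector steers each input element into one of two dense output vectors, which can then be processed without branches. The same condition vector merges the processed results back into their original positions. The function names are hypothetical, and the sketch leaves out the index-vector mechanism and load balancing.

```python
from typing import List, Sequence, Tuple


def conditional_split(values: Sequence[int], condition: Sequence[bool]) -> Tuple[List[int], List[int]]:
    """Divide one input vector into two dense output vectors by a condition vector."""
    true_out = [v for v, c in zip(values, condition) if c]
    false_out = [v for v, c in zip(values, condition) if not c]
    return true_out, false_out


def conditional_merge(true_in: Sequence[int], false_in: Sequence[int],
                      condition: Sequence[bool]) -> List[int]:
    """Combine two vectors back into one, consuming from each in order."""
    t, f = iter(true_in), iter(false_in)
    return [next(t) if c else next(f) for c in condition]
```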
Code sequence for vector gather and scatter

Patent number: 7093102

Abstract: Gather and scatter operations are used when elements of a vector which may be operated on in parallel are not located at successive addresses in memory. Prior data processing systems required complex address calculation hardware and other hardware to perform vector gather and scatter operations. By contrast, one embodiment of the present invention implements gather and scatter operations using a plurality of deposit and extract instructions. As a result, gather and scatter operations may be efficiently performed within a general purpose processing environment and without the need for dedicated gather/scatter hardware.

Type: Grant

Filed: March 29, 2000

Date of Patent: August 15, 2006

Assignee: Intel Corporation

Inventor: Carole Dulong
Data access in a processor

Patent number: 7080216

Abstract: A data processor comprising: a register memory comprising an array of memory cells extending in two dimensions, the cells being located on rows in the first dimension and columns in the second dimension, each cell being addressable by means of an instruction specifying a pair of coordinates that identify the row and column of the cell in the array; and a processing unit capable of executing instructions that operate on a plurality of memory cells in the register, the instructions identifying the plurality of cells by means of a first instruction part specifying a pair of coordinates that identify a first cell in the array, and a second instruction part that identifies the configuration of the plurality of cells relative to the first cell; the data processor being arranged to interpret a first form of second instruction part as specifying a first group of cells all of which are located in the same row but in different columns, and to interpret a second form of second instruction part as specifying a first grou

Type: Grant

Filed: October 31, 2002

Date of Patent: July 18, 2006

Assignee: Broadcom Corporation

Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
Method for producing software for controlling mechanisms and technical systems

Patent number: 7065413

Abstract: In a method for controlling mechanisms or technical systems, the mechanisms or technical systems to be controlled are stored in a controller with their states, and with associated signal formers of sensors and actuators, whereby starting from a defined reference state at the onset of the activation of the controller, the actual states signaled by the technical system via the sensors are continuously compared with the specified state, the specified state being stored in the controller, and, based on this comparison, every deviation from the specified state is identified in the technical system, and, when initiated, a new instruction that changes the state of the mechanisms or of the technical system updates the specified state for the comparison and monitors the time till the acknowledgment of the new state, and sensor signals and comparable information exclusively serve the state identification of elementary functions and state changes exclusively ensue upon the initiation of elementary instructions.

Type: Grant

Filed: April 3, 2001

Date of Patent: June 20, 2006

Assignee: Technische Universitaet Dresden

Inventors: Volker Moebius, Knut Grossmann
Apparatus for parallel vector table look-up

Patent number: 7055018

Abstract: Methods and apparatuses for performing simultaneous table look-up using multiple look-up tables. In one aspect of the invention, an execution unit in a microprocessor includes: look-up memory and a first circuit coupled to the look-up memory. In response to the microprocessor receiving a first instruction, the first circuit partitions the look-up memory into a first plurality of look-up tables. In response to the microprocessor receiving a second instruction, the first circuit partitions the look-up memory into a second plurality of look-up tables; and the second plurality of look-up tables simultaneously look up a plurality of entries.

Type: Grant

Filed: December 31, 2001

Date of Patent: May 30, 2006

Assignee: Apple Computer, Inc.

Inventors: Joseph P. Bratt, Sushma Shrikant Trivedi
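A behavioral sketch of the partitioning idea, assuming a flat Python list stands in for the look-up memory and that each instruction's only effect is to choose how many equal-sized tables the memory is split into (`partition` and `parallel_lookup` are hypothetical names). Hardware would perform the per-lane reads in the same cycle; a list comprehension models the lanes here.

```python
def partition(memory, num_tables):
    """Split a flat look-up memory into num_tables equal-sized tables."""
    size = len(memory) // num_tables
    return [memory[i * size:(i + 1) * size] for i in range(num_tables)]


def parallel_lookup(tables, indices):
    """Lane i reads tables[i][indices[i]]; a SIMD unit would do all lanes at once."""
    if len(indices) != len(tables):
        raise ValueError("need exactly one index per table")
    return [table[idx] for table, idx in zip(tables, indices)]
```

The same memory can be repartitioned, for example into four 4-entry tables for one instruction and two 8-entry tables for another.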
Method and apparatus for image blending

Patent number: 7034849

Abstract: Methods and apparatuses for blending two images using vector table look up operations. In one aspect of the invention, a method to blend two images includes: loading a vector of keys into a vector register; converting the vector of keys into a first vector of blending factors for the first image and a second vector of blending factors for the second image using a plurality of look up tables; and computing an image attribute for the blended image using the blending factors.

Type: Grant

Filed: December 31, 2001

Date of Patent: April 25, 2006

Assignee: Apple Computer, Inc.

Inventors: Steven Todd Weybrew, David Ligon, Ronald Gerard Langhi
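The two-table conversion step can be modeled in a few lines. This sketch assumes 8-bit keys, 8-bit pixels, and blending factors scaled so that 255 means 1.0; the `blend` function and the example tables are illustrative, not taken from the patent.

```python
def blend(keys, img1, img2, lut1, lut2):
    """Blend two images pixel by pixel using per-key blending factors.

    keys, img1 and img2 are equal-length sequences of 8-bit values; lut1 and
    lut2 map each key to a factor (0..255) for the first and second image.
    """
    f1 = [lut1[k] for k in keys]  # first vector of blending factors
    f2 = [lut2[k] for k in keys]  # second vector of blending factors
    # Fixed-point blend: (a*w1 + b*w2) / 255, rounded, clamped to 8 bits.
    return [min(255, (a * w1 + b * w2 + 127) // 255)
            for a, b, w1, w2 in zip(img1, img2, f1, f2)]
```

With `lut1 = [255 - k for k in range(256)]` and `lut2 = list(range(256))`, each key acts as a per-pixel alpha value for the second image.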
Multi-mode specification-driven disassembler

Patent number: 7036112

Abstract: One embodiment of the present invention provides a system that facilitates implementing a multi-mode specification-driven disassembler. During operation, the disassembler receives a machine-code version of a computer program. In order to disassemble a specific machine-code instruction from this machine-code version, the system compares the machine-code instruction against a set of instruction templates for assembly code instructions to identify a set of matching templates. Next, the system selects a matching template from the set of matching templates based on the state of a mode variable, which indicates a specificity mode for the disassembler. The system then disassembles the machine-code instruction using the operand fields defined by the matching template to produce a corresponding assembly code instruction.

Type: Grant

Filed: August 16, 2002

Date of Patent: April 25, 2006

Assignee: SUN Microsystems, Inc.

Inventors: David M. Ungar, Mario I. Wolczko, Bernd J. W. Mathiske
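The template-matching and mode-selection steps can be sketched with mask/match pairs. The 16-bit encoding, the two templates, and the `"specific"`/`"generic"` mode names are hypothetical; here the mode variable decides whether the most or the least specific matching template wins, which is one plausible reading of a "specificity mode".

```python
from dataclasses import dataclass


@dataclass
class Template:
    fmt: str      # assembly text with named operand fields
    mask: int     # bits that must have fixed values
    match: int    # the required values of those bits
    fields: dict  # operand name -> (shift, width)


# Hypothetical 16-bit encoding: opcode[15:12] rd[11:8] rs1[7:4] rs2[3:0].
TEMPLATES = [
    Template("or r{rd}, r{rs1}, r{rs2}", 0xF000, 0x1000,
             {"rd": (8, 4), "rs1": (4, 4), "rs2": (0, 4)}),
    # More specific template: "or rd, r0, rs2" written as "mov rd, rs2".
    Template("mov r{rd}, r{rs2}", 0xF0F0, 0x1000,
             {"rd": (8, 4), "rs2": (0, 4)}),
]


def specificity(template):
    """Number of fixed bits; more fixed bits means a more specific template."""
    return bin(template.mask).count("1")


def disassemble(word, mode="specific"):
    matches = [t for t in TEMPLATES if (word & t.mask) == t.match]
    if not matches:
        return f".word 0x{word:04x}"
    pick = max if mode == "specific" else min
    chosen = pick(matches, key=specificity)
    # Extract operand fields as defined by the chosen template.
    operands = {name: (word >> shift) & ((1 << width) - 1)
                for name, (shift, width) in chosen.fields.items()}
    return chosen.fmt.format(**operands)
```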
Large table vectorized lookup by selecting entries of vectors resulting from permute operations on sub-tables

Patent number: 7000099

Abstract: A lookup operation is carried out on a data table by logically dividing the data table into a number of smaller sets of data that can be indexed with a single byte of data. Each set of data consists of two vectors, which constitute the operands for a permute instruction. Only a limited number of bits are required to index into the table during the execution of this instruction. The remaining bits of each index are used as masks into a series of select instructions. The select instruction chooses between two vector components, based on the mask, and places the selected components into a new vector. The mask is generated by shifting one of the higher order bits of the index to the most significant position, and then propagating that bit throughout a byte, for example by means of an arithmetic shift. This procedure is carried out for all of the index bytes in the vector, to generate a select mask.

Type: Grant

Filed: July 9, 2002

Date of Patent: February 14, 2006

Assignee: Apple Computer Inc.

Inventor: Ali Sazegari
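The permute-and-select scheme can be modeled in Python for a 256-entry byte table. The sketch assumes a permute that uses only the low 5 bits of each index to pick from 32 bytes (two 16-byte vectors), as AltiVec's `vperm` does, so the table splits into 8 sub-tables and the 3 high-order index bits drive a tree of selects. The `vperm` and `vsel` functions here are plain Python stand-ins, not the real intrinsics.

```python
def to_signed8(x):
    """Interpret the low 8 bits of x as a signed byte."""
    x &= 0xFF
    return x - 256 if x & 0x80 else x


def bit_mask(idx, bit):
    """Shift `bit` of an index byte to the MSB, then arithmetic-shift right
    by 7 so the bit fills the whole byte (0x00 or 0xFF)."""
    shifted = to_signed8(idx << (7 - bit))
    return (shifted >> 7) & 0xFF  # >> on a negative Python int is arithmetic


def vsel(a, b, masks):
    """Per-lane bitwise select: bits of b where the mask is 1, else bits of a."""
    return [(x & ~m & 0xFF) | (y & m) for x, y, m in zip(a, b, masks)]


def vperm(sub_table, indices):
    """Model of a 32-entry permute: only the low 5 bits of each index are used."""
    return [sub_table[i & 0x1F] for i in indices]


def large_table_lookup(table, indices):
    if len(table) != 256:
        raise ValueError("expects a 256-entry table")
    subs = [table[s * 32:(s + 1) * 32] for s in range(8)]
    vectors = [vperm(sub, indices) for sub in subs]  # 8 candidate vectors
    for bit in (5, 6, 7):  # resolve the high-order index bits one level at a time
        masks = [bit_mask(i, bit) for i in indices]
        vectors = [vsel(vectors[k], vectors[k + 1], masks)
                   for k in range(0, len(vectors), 2)]
    return vectors[0]
```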
Fast and flexible scan conversion and matrix transpose in a SIMD processor

Patent number: 6963341

Abstract: The present invention provides efficient ways to implement scan conversion and matrix transpose operations using vector multiplex operations in a SIMD processor. The method provides a very fast and flexible way to implement different scan conversions, such as zigzag conversion, and matrix transposes for the 2×2, 4×4, and 8×8 blocks commonly used in video compression and decompression algorithms.

Type: Grant

Filed: May 20, 2003

Date of Patent: November 8, 2005

Inventor: Tibet Mimar
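Both operations reduce to a vector multiplex driven by an index vector, so the main work is building the index tables. The sketch below generates zigzag and transpose index vectors for an n×n block stored row-major and applies them with a hypothetical `vmux` helper; in hardware, each index vector would be a constant loaded once.

```python
def vmux(src, index_vector):
    """Vector multiplex: lane k of the result takes src[index_vector[k]]."""
    return [src[i] for i in index_vector]


def zigzag_order(n):
    """Zigzag scan order for an n x n row-major block."""
    order = []
    for s in range(2 * n - 1):  # anti-diagonals where row + col == s
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals run bottom-left to top-right, odd ones the other way.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append(r * n + (s - r))
    return order


def transpose_order(n):
    """Index vector so that out[r*n + c] = in[c*n + r]."""
    return [c * n + r for r in range(n) for c in range(n)]
```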
Hardware supported software pipelined loop prologue optimization

Patent number: 6954927

Abstract: A method for optimizing a software-pipelineable loop in software code is provided. The loop comprises one or more pipelined stages and one or more loop operations. The method comprises evaluating an initiation interval time (IN) for a pipelined stage of the loop. A loop operation time latency (Tld) is then determined, and the number of loop operations (Np) to peel from the pipelined stages is determined based on IN and Tld. The loop operation is peeled Np times and copied before the loop in the software code. A vector of registers is allocated, and the results of the peeled loop operations and the result of the original loop operation are assigned to the vector of registers. Memory addresses for the results of the peeled loop operations and the original loop operation are also assigned.

Type: Grant

Filed: October 4, 2001

Date of Patent: October 11, 2005

Assignee: Elbrus International

Inventor: Alexander Y. Ostanevich
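The abstract does not say how Np follows from IN and Tld. A natural assumption, used in the sketch below, is Np = ceil(Tld / IN): a load issued in one iteration is ready Tld cycles later, and iterations start every IN cycles, so its consumer must run that many iterations later. The register names `v0..vNp` and the textual "instructions" are hypothetical.

```python
import math


def peel_count(initiation_interval, load_latency):
    """Assumed rule: number of iterations a load must run ahead of its use."""
    return math.ceil(load_latency / initiation_interval)


def build_prologue(initiation_interval, load_latency):
    """Return Np, the peeled loads, and the in-loop load.

    Each load writes one register of a rotating vector of Np + 1 registers,
    so iteration i consumes v[i % (Np + 1)], loaded Np iterations earlier.
    """
    np_ = peel_count(initiation_interval, load_latency)
    prologue = [f"v{k} = load(a[{k}])" for k in range(np_)]
    in_loop = f"v[(i + {np_}) % {np_ + 1}] = load(a[i + {np_}])"
    return np_, prologue, in_loop
```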
Parallel vector table look-up with replicated index element vector

Patent number: 6931511

Abstract: Methods and apparatuses for looking up vectors in parallel using vector table look up operations. In one aspect of the invention, a method to look up a plurality of data items indexed by a vector of indices includes: generating a second vector of indices in a vector register where each index of the second vector of indices is one of a first vector of indices and at least one index in the first vector of indices is replicated as a plurality of duplicated indices in the second vector of indices; and looking up simultaneously a first vector of data items from a plurality of look up tables using the second vector of indices.

Type: Grant

Filed: December 31, 2001

Date of Patent: August 16, 2005

Assignee: Apple Computer, Inc.

Inventors: Steven Todd Weybrew, David Ligon, Ronald Gerard Langhi
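Replicating indices lets one vector of lookups fetch several values per original index, for example one value per color channel. In this sketch (hypothetical helper names), each index is repeated once per table, and lane k reads table k mod T, where T is the number of tables.

```python
def replicate_indices(indices, factor):
    """Second index vector: each index of the first vector repeated `factor` times."""
    return [i for i in indices for _ in range(factor)]


def multi_table_lookup(tables, replicated):
    """Lane k reads tables[k % T][replicated[k]]; on SIMD hardware all lanes
    are looked up simultaneously."""
    n = len(tables)
    return [tables[k % n][idx] for k, idx in enumerate(replicated)]
```

For a palette with separate R, G, and B tables, replicating each pixel index three times yields interleaved RGB output in one lookup pass.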
Efficient function interpolation using SIMD vector permute functionality

Patent number: 6924802

Abstract: A system, method, and computer program product are provided for generating display data. The data processing system loads coefficient values corresponding to the behavior of a selected function in pre-defined ranges of input data. The data processing system then determines, responsive to items of input data, the range of input data in which the selected function is to be estimated. The data processing system then selects the coefficient values through the use of a vector permute function, and evaluates an index function at each of the items of input data. It then estimates the value of the selected function through parallel mathematical operations on the items of input data, the selected coefficient values, and the values of the index function, and, responsive to the one or more values of the selected function, generates display data.

Type: Grant

Filed: September 12, 2002

Date of Patent: August 2, 2005

Assignee: International Business Machines Corporation

Inventors: Gordon Clyde Fossum, Harm Peter Hofstee, Barry L. Minor, Mark Richard Nutter
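A piecewise-linear version of the scheme is sketched below. The assumptions are ours: the coefficient tables hold a slope and an intercept per equal-width range, the "index function" is taken to be the range number of each input, and a list gather stands in for the vector permute that selects coefficients. The patent may use higher-order polynomials or a different index function.

```python
def build_tables(f, lo, hi, segments):
    """Slope and intercept of a linear fit to f on each segment of [lo, hi)."""
    width = (hi - lo) / segments
    slopes, intercepts = [], []
    for s in range(segments):
        x0 = lo + s * width
        x1 = x0 + width
        m = (f(x1) - f(x0)) / width
        slopes.append(m)
        intercepts.append(f(x0) - m * x0)
    return slopes, intercepts, width


def interpolate(xs, lo, slopes, intercepts, width):
    segments = len(slopes)
    # Index function: the segment each input falls in, clamped to the table.
    idx = [min(segments - 1, max(0, int((x - lo) // width))) for x in xs]
    # Permute stand-in: gather each lane's coefficients by its segment index.
    m = [slopes[i] for i in idx]
    b = [intercepts[i] for i in idx]
    return [mi * x + bi for mi, bi, x in zip(m, b, xs)]
```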
SIMD processor with concurrent operation of vector pointer datapath and vector computation datapath

Patent number: 6915411

Abstract: A digital signal processor (DSP) includes a SIMD-based organization wherein operations are executed on a plurality of single-instruction multiple data (SIMD) datapaths or stages connected in cascade. The functionality and data values at each stage may be different, including a different width (e.g., a different number of bits per value) in each stage. The operands and destination for data in a computational datapath are selected indirectly through vector pointer registers in a vector pointers datapath. Each vector pointer register contains a plurality of pointers into a register file of a computational datapath.

Type: Grant

Filed: July 18, 2002

Date of Patent: July 5, 2005

Assignee: International Business Machines Corporation

Inventors: Jamie H. Moreno, Jeffrey Haskell Derby, Uzi Shvadron, Fredy Daniel Neeser, Victor Zyuban, Ayal Zaks, Shay Ben-David
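Indirect operand selection through vector pointer registers can be modeled as below. The function name and the flat register-file list are hypothetical; the sketch reads all operands before writing any result, so the lanes behave as if they executed simultaneously.

```python
def indirect_vadd(regfile, vp_src1, vp_src2, vp_dst):
    """Lane i computes regfile[vp_dst[i]] = regfile[vp_src1[i]] + regfile[vp_src2[i]].

    The pointer vectors, not the instruction encoding, choose the registers.
    """
    # Read every operand first so one lane's write cannot affect another lane's read.
    results = [regfile[a] + regfile[b] for a, b in zip(vp_src1, vp_src2)]
    for d, value in zip(vp_dst, results):
        regfile[d] = value
    return regfile
```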
Aggregation of sensory data for distributed decision-making

Patent number: 6865517

Abstract: A method, apparatus and computer product that enable a processor associated with a node, in a computer system having various nodes that have sensors providing data and are connected by a communications facility, to acquire local data from the sensor and remote data from other nodes via the data transfer facility. The nodes process data from a local sensor at the node and from remote sensors at other nodes, and analyze the local data, the data from other nodes, and the local decisions made at and received from other nodes to make a local decision for action at the node. A local decision made at a node is in turn communicated to other nodes.

Type: Grant

Filed: December 11, 2002

Date of Patent: March 8, 2005

Assignee: International Business Machines Corporation

Inventors: David F. Bantz, John S. Davis, II, Rafah A. Hosn, Nicholas M. Mitchell, Veronique Perret, Daby M. Sow, Jeremy B. Sussman
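The abstract leaves the decision rule open, so the fusion rule below is purely hypothetical: a node compares the mean of its local and remote readings against a threshold, then takes a majority vote between that result and the decisions received from other nodes. Ties count as "no action".

```python
def local_decision(local_data, remote_data, remote_decisions, threshold):
    """Hypothetical fusion of sensor data and neighbours' decisions."""
    # Evidence from data: does the mean of all readings exceed the threshold?
    readings = list(local_data) + list(remote_data)
    data_vote = sum(readings) / len(readings) > threshold
    # Combine with the other nodes' decisions by simple majority.
    votes = [data_vote] + list(remote_decisions)
    return sum(votes) * 2 > len(votes)
```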
Method for referring to address of vector data and vector processor

Publication number: 20040250044

Abstract: The object of the invention is to efficiently perform indirect index vector reference. An element register of a vector register or a scalar register specified in the “index” is divided into multiple areas, and a particular index vector is acquired by selecting one of the divided areas. Accordingly, multiple index vectors can effectively be stored in one vector register, so register resources are used efficiently. The procedure for providing multiple index vectors is similar to that for providing one index vector, so the code size and processing cycles of the program barely increase. That is, the present invention allows indirect index vector reference to be performed more efficiently.

Type: Application

Filed: March 17, 2004

Publication date: December 9, 2004

Applicant: SEIKO EPSON CORPORATION

Inventor: Masakazu Isomura
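Packing several index vectors into one register amounts to a bit-field extraction before the gather. The sketch assumes 32-bit elements split into two 16-bit areas, with area 0 in the low half; the function names are illustrative.

```python
def select_index_vector(index_register, area, area_bits=16):
    """Extract index vector number `area` from elements that pack several
    area_bits-wide fields (area 0 is the least significant field)."""
    mask = (1 << area_bits) - 1
    return [(element >> (area * area_bits)) & mask for element in index_register]


def indirect_load(memory, base, index_register, area):
    """Indirect index vector reference: gather memory[base + index]."""
    return [memory[base + i] for i in select_index_vector(index_register, area)]
```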
Data processing system with register store/load utilizing data packing/unpacking

Patent number: 6829696

Abstract: A data processing system (e.g., microprocessor 30) for packing register data while storing it to memory and unpacking data read from memory while loading it into registers using single processor instructions. The system comprises a memory (42) and a central processing unit core (44) with at least one register file (76). The core is responsive to a load instruction (e.g., LDW_BH[U] instruction 184) to retrieve at least one data word from memory and parse the data word over selected parts of at least two data registers in the register file. The core is responsive to a store instruction (e.g., STBH_W instruction 198) to concatenate data from selected parts of at least two data registers into at least one data word and save the data word to memory. The number of data registers is greater than the number of data words parsed into or concatenated from the data registers. Both memory storage space and central processor unit resources are utilized efficiently when working with packed data.

Type: Grant

Filed: October 13, 2000

Date of Patent: December 7, 2004

Assignee: Texas Instruments Incorporated

Inventors: Keith Balmer, Karl M. Guttag, Lewis Nardini
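One plausible byte-to-halfword layout for these load and store instructions is sketched below. The abstract does not give the exact field placement of `LDW_BH[U]` and `STBH_W`, so the layout (byte k of the word goes to halfword k of the register pair, low register first) is an assumption, as are the function names.

```python
def load_bytes_to_halfwords(word, signed=False):
    """Hypothetical LDW_BH-style unpack: the four bytes of a 32-bit word go
    into the four 16-bit halves of two 32-bit registers (low register first)."""
    halves = []
    for k in range(4):
        b = (word >> (8 * k)) & 0xFF
        if signed and b & 0x80:
            b |= 0xFF00  # sign-extend the byte to 16 bits
        halves.append(b)
    reg_lo = halves[0] | (halves[1] << 16)
    reg_hi = halves[2] | (halves[3] << 16)
    return reg_lo, reg_hi


def store_halfwords_to_bytes(reg_lo, reg_hi):
    """Hypothetical STBH_W-style pack: keep the low byte of each halfword."""
    halves = [reg_lo & 0xFFFF, (reg_lo >> 16) & 0xFFFF,
              reg_hi & 0xFFFF, (reg_hi >> 16) & 0xFFFF]
    return sum((h & 0xFF) << (8 * k) for k, h in enumerate(halves))
```

Two registers hold the contents of one memory word, which matches the abstract's note that more registers are involved than data words.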
Vector processor and register addressing method

Publication number: 20040243788

Abstract: The object of the invention is to efficiently perform a vector operation using a vector register. A vector processor is provided with a vector register forming a ring buffer, and any address of the ring buffer can be specified as the top address. Accordingly, when multiple vectors to be processed overlap, the vector data stored in one vector register can be read or written circularly instead of being stored in separate vector registers. This prevents the same data from being read redundantly and reduces the register resources required, enabling efficient vector operations using a vector register.

Type: Application

Filed: March 17, 2004

Publication date: December 2, 2004

Applicant: SEIKO EPSON CORPORATION

Inventor: Masakazu Isomura
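The wrap-around addressing can be modeled with modular indexing. `RingVectorRegister` is a hypothetical class; the point is that a read starting at any top address continues from element 0 after the last element, so overlapping windows (as in a FIR filter) can be read from one register without copying.

```python
class RingVectorRegister:
    """Vector register treated as a ring buffer with a selectable top address."""

    def __init__(self, length):
        self.data = [0] * length

    def write(self, top, values):
        # Writes wrap past the last element back to element 0.
        n = len(self.data)
        for k, v in enumerate(values):
            self.data[(top + k) % n] = v

    def read(self, top, count):
        # Reads also wrap, so any element can serve as the top address.
        n = len(self.data)
        return [self.data[(top + k) % n] for k in range(count)]
```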
Cache consistent control of subsequent overlapping memory access during specified vector scatter instruction execution

Patent number: 6816960

Abstract: A vector architecture processing unit according to the present invention comprises a vector scatter (VSC) address coincidence detection unit 3 that comprises registers in which the start address and end address of an area specified by an area-specified vector scatter instruction are stored, and a circuit that checks whether the addresses specified by the area-specified vector scatter instruction overlap with an address to be accessed by a memory access instruction following the area-specified vector scatter instruction. An instruction issue control unit 1 comprises a hold control circuit that holds the following memory access instruction in response to an address conflict signal from the VSC address conflict detector.

Type: Grant

Filed: July 10, 2001

Date of Patent: November 9, 2004

Assignee: NEC Corporation

Inventor: Hisao Koyanagi
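The coincidence check reduces to an interval-overlap test. In this sketch the area end address is inclusive and each later access covers `size` bytes starting at `addr`; both conventions, and the `issue_decisions` helper, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScatterArea:
    start: int  # area start address register
    end: int    # area end address register (inclusive)


def address_conflict(area, addr, size):
    """Conflict signal: does an access to [addr, addr + size) overlap the area?"""
    return addr <= area.end and addr + size - 1 >= area.start


def issue_decisions(area, accesses):
    """Hold accesses that overlap a pending area-specified scatter; issue the rest."""
    return ["hold" if address_conflict(area, a, s) else "issue" for a, s in accesses]
```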
Method and apparatus for transferring vector data between memory and a register file

Patent number: 6813701

Abstract: A compiler and vector data transfer instructions for use in a vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. The compiler identifies the use of vector data in an application program and implements one or more vector instructions for transferring the vector data between memory and registers used to perform calculations on the vector data. A vector is partitioned by the compiler into variable-sized streams which are transferred into and out of the processor as burst transactions. The compiler schedules transfers of vector streams required in a calculation so that calculations on a portion of the vector data are performed while a subsequent portion of the vector data is transferred. A vector buffer pool is partitioned into one or more vector buffers and each vector buffer is used at a specific time.

Type: Grant

Filed: August 17, 1999

Date of Patent: November 2, 2004

Assignee: NEC Electronics America, Inc.

Inventor: Ahmad R. Ansari
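Overlapping transfer with computation is a form of double buffering. The sketch below assumes two buffers from the pool and fixed-size streams (only the last may be shorter); it only produces a schedule, showing that at every step the buffer being computed on differs from the one being filled. The function name and tuple format are hypothetical.

```python
def double_buffered_schedule(length, stream_size):
    """Schedule steps as lists of (operation, (start, end), buffer) tuples."""
    streams = [(s, min(s + stream_size, length))
               for s in range(0, length, stream_size)]
    steps = [[("load", streams[0], 0)]] if streams else []
    for k, stream in enumerate(streams):
        # Compute on stream k while the next stream is loaded into the other buffer.
        step = [("compute", stream, k % 2)]
        if k + 1 < len(streams):
            step.append(("load", streams[k + 1], (k + 1) % 2))
        steps.append(step)
    return steps
```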
Data reordering processor and method for use in an active memory device

Publication number: 20040193839

Abstract: An active memory device includes a command engine that receives high-level tasks from a host and generates corresponding sets of either DCU commands to a DRAM control unit or ACU commands to a processing array control unit. The DCU commands include memory addresses, which are also generated by the command engine, and the ACU commands include instruction memory addresses corresponding to an address in an array control unit where processing array instructions are stored. The active memory device includes a vector processing and re-ordering system coupled to the array control unit and the memory device. The vector processing and re-ordering system re-orders data received from the memory device into a vector of contiguous data, processes the data in accordance with an instruction received from the array control unit to provide results data, and passes the results data to the memory device.

Type: Application

Filed: July 28, 2003

Publication date: September 30, 2004

Inventor: Graham Kirsch
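The re-ordering step is essentially a gather followed by a scatter. This sketch assumes strided source data and writes results back to the addresses they came from; the function name and the stride model are hypothetical, since the abstract does not specify the re-ordering pattern.

```python
def reorder_process_writeback(memory, base, stride, count, op):
    """Gather `count` elements spaced `stride` apart into a contiguous vector,
    apply op to every lane, then write the results back to the same addresses."""
    addrs = [base + k * stride for k in range(count)]
    vector = [memory[a] for a in addrs]   # re-ordered, contiguous data
    results = [op(x) for x in vector]     # processing-array stand-in
    for a, r in zip(addrs, results):
        memory[a] = r
    return vector, results
```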
Vector transfer system generating address error exception when vector to be transferred does not start and end on same memory page

Publication number: 20040186980

Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.

Type: Application

Filed: March 29, 2004

Publication date: September 23, 2004

Inventor: Ahmad R. Ansari
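The end-address computation and page check can be written directly. The sketch assumes a 4096-byte page, a stride measured in elements, and `width` in bytes; `AddressError` is a stand-in for the processor's address error exception. We read the abstract's shifting condition as "the element width is a power of two", which is an interpretation.

```python
PAGE_SIZE = 4096  # assumed virtual-memory page size


class AddressError(Exception):
    """Stand-in for the address error exception."""


def end_address(start, count, stride, width):
    """Last byte touched by `count` elements of `width` bytes, `stride` elements apart."""
    if (width & (width - 1)) == 0:
        step = stride << (width.bit_length() - 1)  # power-of-two width: shift
    else:
        step = stride * width
    return start + (count - 1) * step + width - 1


def check_transfer(start, count, stride, width, page_size=PAGE_SIZE):
    """Return the end address, or raise if the vector crosses a page boundary."""
    end = end_address(start, count, stride, width)
    if start // page_size != end // page_size:
        raise AddressError(f"vector {start:#x}..{end:#x} crosses a page boundary")
    return end
```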
Method and apparatus for instruction execution in a data processing system

Patent number: 6795908

Abstract: A method for processing scalar and vector executions, where vector executions may be “true” vector operations, CVA, or pseudo-vector operations, PVA. All three types of executions are processed using one architecture. In one embodiment, a compiler analyzes code to identify sections that are vectorizable, and applies either CVA, PVA, or a combination of the two to process these sections. Register overlay is provided for storing load address information and data in PVA mode. Within each CVA and PVA instruction, enable bits describe the data streaming function of the operation. A temporary memory, TM, accommodates variable size vectors, and is used in vector operations, similar to a vector register, to store temporary vectors.

Type: Grant

Filed: June 12, 2000

Date of Patent: September 21, 2004

Assignee: Freescale Semiconductor, Inc.

Inventors: Lea Hwang Lee, William C. Moyer
Apparatus and method for updating pointers for indirect and parallel register access

Publication number: 20040181646

Abstract: An apparatus and method are provided for updating one or more pluralities of pointers (i.e. one or more vector pointers) which are used for accessing one or more pluralities of data elements (i.e. one or more vector data elements) in a multi-ported memory. A first register file holds the vector pointers, a second register file holds stride data, and a plurality of functional units combine data from the second register file with data from the first register file. The results of combining the data are transferred to the first register file and represent updated vector pointers. Furthermore, a third register file is provided for holding modulus selector data to specify the size of a circular buffer for circular addressing.

Type: Application

Filed: March 14, 2003

Publication date: September 16, 2004

Applicant: International Business Machines Corporation

Inventors: Shay Ben-David, Jeffrey Haskell Derby, Thomas W. Fox, Fredy Daniel Neeser, Jaime H. Moreno, Uzi Shvadron, Ayal Zaks
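The pointer update can be sketched as a per-lane modular add. The assumptions: each circular buffer starts at a base address, and the modulus selector has already been decoded into a buffer size; a hardware modulus selector may instead encode, for example, a power-of-two size. The function name is hypothetical.

```python
def update_vector_pointers(pointers, strides, buffer_sizes, bases):
    """New pointer = base + ((pointer - base + stride) mod buffer_size), per lane."""
    return [base + (p - base + s) % size
            for p, s, size, base in zip(pointers, strides, buffer_sizes, bases)]
```

Python's `%` always returns a non-negative result for a positive modulus, so negative strides wrap correctly.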
Safety net paradigm for managing two computer execution modes

Patent number: 6789181

Abstract: A method and computer for executing the method. A source program is translated into an object program, in a manner in which the translated object program has a different execution behavior than the source program. The translated object program is executed under a monitor capable of detecting any deviation from fully-correct interpretation before any side-effect of the different execution behavior is irreversibly committed. When the monitor detects the deviation, or when an interrupt occurs during execution of the object program, a state of the program is established corresponding to a state that would have occurred during an execution of the source program, and from which execution can continue. Execution of the source program continues primarily in a hardware emulator designed to execute instructions of an instruction set non-native to the computer.

Type: Grant

Filed: November 3, 1999

Date of Patent: September 7, 2004

Assignee: ATI International, SRL

Inventors: John S. Yates, David L. Reese, Korbin S. Van Dyke, Paul H. Hohensee
Synchronous periodical orthogonal data converter

Publication number: 20040172517

Abstract: An orthogonal data converter for converting the components of a sequential vector component flow to a parallel vector component flow. The data converter has an input rotator configured to rotate corresponding vector components of the sequential vector component flow by a prescribed amount, and a bank of register files configured to store the rotated vector components. The converter also has an output rotator configured to rotate the position of the vector components read from the bank of register files by a prescribed amount. A controller of the converter is operative to control the addressing of the bank of register files and the rotating of the vector components. In this regard, the controller is operative to write the vector components to the bank of register files in a prescribed order and read the vector components in a prescribed order to generate the parallel vector component flow.

Type: Application

Filed: September 19, 2003

Publication date: September 2, 2004

Inventors: Boris Prokopenko, Timour Paltashev
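The converter performs a transpose without bank conflicts by storing vectors diagonally. In the sketch below (hypothetical helper names), vector i is rotated right by i before being written to row i of N banks; on read cycle j, bank b supplies row (b - j) mod N, so every bank is read exactly once per cycle, and a left rotation by j restores lane order.

```python
def rotate_right(seq, k):
    """Rotate a list right by k positions (negative k rotates left)."""
    seq = list(seq)
    k %= len(seq)
    return seq[-k:] + seq[:-k] if k else seq


def orthogonal_convert(vectors):
    """Convert N sequential N-component vectors into N component vectors."""
    n = len(vectors)
    if any(len(v) != n for v in vectors):
        raise ValueError("expects n vectors of n components")
    banks = [[None] * n for _ in range(n)]  # banks[bank][row]
    for i, v in enumerate(vectors):          # write cycle i
        rotated = rotate_right(v, i)         # input rotator
        for b in range(n):
            banks[b][i] = rotated[b]         # component j lands in bank (j + i) % n
    outputs = []
    for j in range(n):                       # read cycle j
        read = [banks[b][(b - j) % n] for b in range(n)]  # one row per bank
        outputs.append(rotate_right(read, -j))            # output rotator
    return outputs
```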
