Distributing Of Vector Data To Vector Registers Patents (Class 712/4)
  • Publication number: 20080288745
    Abstract: A method for performing parallel operations in a computer system when one or more memory hazards may be present, which may be implemented by a processor, is described. During operation, the processor receives instructions for detecting conflict between memory addresses in vectors when operations are performed in parallel using at least a portion of the vectors, and generating one or more predicate values corresponding to any detected conflict between the memory addresses, where a given predicate value indicates elements in at least the portion of the vector that can be processed in parallel. Next, the processor executes the instructions for detecting the conflict between the memory addresses and generating the one or more predicate values.
    Type: Application
    Filed: July 11, 2008
    Publication date: November 20, 2008
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20080256329
    Abstract: Methods, systems and computer program products for resolving multiple magnitudes assigned to a target vector are disclosed. A target vector that includes one or more target vector dimensions is received. One of the target vector dimensions is processed to determine a total number of magnitudes assigned to the processed target vector dimension. Also, a source vector that includes one or more source vector dimensions is received. The received source vector is processed to determine a total number of features associated with the source vector. When it is detected that the total number of magnitudes assigned to the processed target vector dimension exceeds one, one of the assigned magnitudes is selected based on one of the determined features associated with the source vector.
    Type: Application
    Filed: April 13, 2007
    Publication date: October 16, 2008
    Inventors: Daniel T. Heinze, Mark L. Morsch
  • Publication number: 20080244220
    Abstract: A filter and method of filtering modifies the computation order to accommodate horizontal symmetric filtering, and modifies the source operands while modifying the SIMD computation, so as to eliminate such heavy overhead of transposing a pixel matrix.
    Type: Application
    Filed: October 11, 2007
    Publication date: October 2, 2008
    Inventors: Guo Hui Lin, Yang Liu, Lu Wan, Min Zhu
  • Patent number: 7421565
    Abstract: A method and apparatus to correctly compute a vector-gather, vector-operate (e.g., vector add), and vector-scatter sequence, particularly when elements of the vector may be redundantly presented, as with indirectly addressed vector operations. For an add operation, one vector register is loaded with the “add-in” values, and another vector register is loaded with address values of “add to” elements to be gathered from memory into a third vector register. If the vector of address values has a plurality of elements that point to the same memory address, the algorithm should add all the “add in” values from elements corresponding to the elements having the duplicated addresses. An indirectly addressed load performs the “gather” operation to load the “add to” values. A vector add operation then adds corresponding elements from the “add in” vector to the “add to” vector. An indirectly addressed store then performs the “scatter” operation to store the results.
    Type: Grant
    Filed: August 18, 2003
    Date of Patent: September 2, 2008
    Assignee: Cray Inc.
    Inventor: James R. Kohn
  • Publication number: 20080209162
    Abstract: A processor for sequentially executing a plurality of programs using a plurality of register value groups stored in a memory that correspond one-to-one with the programs.
    Type: Application
    Filed: April 28, 2008
    Publication date: August 28, 2008
    Inventors: Kazuya Furukawa, Tetsuya Tanaka, Nobuo Higaki, Kunihiko Hayashi, Hiroshi Kadota, Tokuzo Kiyohara, Kozo Kimura, Hideshi Nishida, Kazushi Kurata, Shigeki Fujii, Toshio Sugimura
  • Patent number: 7404065
    Abstract: In one embodiment, a method for flow optimization and prediction for vector streaming single instruction, multiple data (SIMD) extension (VSSE) memory operations is disclosed. The method comprises generating an optimized micro-operation (?op) flow for an instruction to operate on a vector if the instruction is predicted to be unmasked and unit-stride, the instruction to access elements in memory, and accessing via the optimized ?op flow two or more of the elements at the same time without determining masks of the two or more elements. Other embodiments are also described.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: July 22, 2008
    Assignee: Intel Corporation
    Inventors: Stephan Jourdan, Per Hammarlund, Michael Fetterman, Michael P. Cornaby, Glenn Hinton, Avinash Sodani
  • Patent number: 7386703
    Abstract: A processor and method for processing matrix data. The processor includes M independent vector register files which are adapted to collectively store a matrix of L data elements. Each data element has B binary bits. The matrix has N rows and M columns, and L=N*M. Each column has K subcolumns. N?2, M?2, K?1, and B?1. Each row and each subcolumn is addressable. The processor does not duplicatively store the L data elements. The matrix includes a set of arrays such that each array is a row or subcolumn of the matrix. The processor may execute an instruction that performs an operation on a first array of the set of arrays, such that the operation is performed with selectivity with respect to the data elements of the first array.
    Type: Grant
    Filed: November 18, 2003
    Date of Patent: June 10, 2008
    Assignee: International Business Machines Corporation
    Inventors: Peter A. Sandon, R. Michael P. West
  • Publication number: 20080126744
    Abstract: A method for operating a data processing system includes providing an application binary interface (ABI) which determines a set of non-contiguous volatile registers and a set of non-volatile registers. The set of non-contiguous volatile registers includes a plurality of general purpose registers (GPRs) and a plurality of special purpose registers (SPRs). The method includes providing less than three instructions which collectively load or store all of the set of non-contiguous volatile registers determined by the ABI. A system includes a set of volatile registers including a plurality of volatile GPRs, a plurality of volatile supervisor SPRs, and a plurality of volatile user SPRs, and execution circuitry for executing a first instruction that loads or stores the plurality of volatile supervisor SPRs, for executing a second instruction that loads or stores the plurality of volatile GPRs, and for executing a third instruction that loads or stores the plurality of volatile user SPRs.
    Type: Application
    Filed: August 29, 2006
    Publication date: May 29, 2008
    Inventor: William C. Moyer
  • Publication number: 20080126745
    Abstract: The present invention is generally related to integrated circuit devices, and more particularly, to methods, systems and design structures for the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve rearranging vector operands in one or more source registers prior to performing vector operations. Typically, rearranging of operands in source registers is done by issuing a plurality of permute instructions that require excessive usage of temporary registers. Furthermore, the permute instructions may cause dependencies between instructions executing in a pipeline, thereby adversely affecting performance. Embodiments of the invention provide a level of muxing between a register file and a vector unit that allow for rearrangement of vector operands in source registers prior to providing the operands to the vector unit, thereby obviating the need for permute instructions.
    Type: Application
    Filed: October 26, 2007
    Publication date: May 29, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
  • Patent number: 7366873
    Abstract: A method and apparatus to correctly compute a vector-gather, vector-operate (e.g., vector add), and vector-scatter sequence, particularly when elements of the vector may be redundantly presented, as with indirectly addressed vector operations. For an add operation, one vector register is loaded with the “add-in” values, and another vector register is loaded with address values of “add to” elements to be gathered from memory into a third vector register. If the vector of address values has a plurality of elements that point to the same memory address, the algorithm should add all the “add in” values from elements corresponding to the elements having the duplicated addresses. An indirectly addressed load performs the “gather” operation to load the “add to” values. A vector add operation then adds corresponding elements from the “add in” vector to the “add to” vector. An indirectly addressed store then performs the “scatter” operation to store the results.
    Type: Grant
    Filed: August 18, 2003
    Date of Patent: April 29, 2008
    Assignee: Cray, Inc.
    Inventor: James R. Kohn
  • Publication number: 20080082784
    Abstract: The present invention is generally related to integrated circuit devices, and more particularly, to methods, systems and design structures for the field of image processing, and more specifically to vector units for supporting image processing. A dual vector unit implementation is described wherein two vector units are configured receive data from a common register file. The vector units may independently and simultaneously process instructions. Furthermore, the vector units may be adapted to perform scalar operations thereby integrating the vector and scalar processing. The vector units may also be configured to share resources to perform an operation, for example, a cross product operation.
    Type: Application
    Filed: October 26, 2007
    Publication date: April 3, 2008
    Applicant: International Business Machines Corporation
    Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
  • Patent number: 7353371
    Abstract: A method and device to copy data fields from one or more source packets to one or more result packets. In a SET function, adjacent data fields in a source packet is copied to respective destination data fields in a result packet governed by a field locator packet. In an ESET function, data fields in respective source packets are copied to adjacent data fields in a result packet governed by a field locator packet. In an EXTRACT function, data fields in a source packet are copied to adjacent data fields in a result packet governed by a field locator packet. In a SCATTER function, adjacent data fields in a source packet are copied to data fields in respective result packets governed by a field locator packet.
    Type: Grant
    Filed: December 5, 2002
    Date of Patent: April 1, 2008
    Assignee: Intel Corporation
    Inventors: Corey Gee, Bapi Vinnakota
  • Patent number: 7350057
    Abstract: Described herein is a method and system for executing instructions. The system comprises a scalar unit for executing scalar instructions each defining a single value pair; a vector unit for executing vector instructions each defining multiple value pairs; and an instruction decoder for receiving a single stream of instructions including scalar instructions and vector instructions and operable to direct scalar instructions to the scalar unit and vector instructions to the vector unit. The vector unit can comprises a plurality of value processing units and a scalar result unit. The scalar unit can comprise a scalar register file. Communication between the vector unit and the scalar unit is enabled by allowing the vector unit to access the scalar register file and allowing the scalar unit to access output from the scalar result unit. The output of the scalar result unit may be based on the relative magnitudes of outputs from the plurality of value processing units.
    Type: Grant
    Filed: November 6, 2006
    Date of Patent: March 25, 2008
    Assignee: Broadcom Corporation
    Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
  • Publication number: 20080072010
    Abstract: A system and method for performing vector arithmetic is disclosed. The method includes loading two operand vectors, each composed of a number of vector elements, into two storage locations. A selected arithmetic operation is performed on the operand vectors to produce a result vector having the number of vector elements. Each vector element of the result vector is associated with an arithmetic logic cell that has a first input that can receive any vector element from the first vector and a second input that can receive any vector element from the second vector. Accordingly each vector element of the result vector is a function of any two individual vector elements of the operand vectors. By applying the operand vector elements to the appropriate arithmetic logic cells, and by selecting the appropriate arithmetic operation, complex vector operations can be performed efficiently.
    Type: Application
    Filed: September 18, 2006
    Publication date: March 20, 2008
    Applicant: Freescale Semiconductor, Inc.
    Inventor: Chengke Sheng
  • Patent number: 7315932
    Abstract: Various load and store instructions may be used to transfer multiple vector elements between registers in a register file and memory. A cnt parameter may be used to indicate a total number of elements to be transferred to or from memory, and an rcnt parameter may be used to indicate a maximum number of vector elements that may be transferred to or from a single register within a register file. Also, the instructions may use a variety of different addressing modes. The memory element size may be specified independently from the register element size such that source and destination sizes may differ within an instruction. With some instructions, a vector stream may be initiated and conditionally enqueued or dequeued. Truncation or rounding fields may be provided such that source data elements may be truncated or rounded when transferred. Also, source data elements may be sign- or unsigned-extended when transferred.
    Type: Grant
    Filed: September 8, 2003
    Date of Patent: January 1, 2008
    Inventor: William C. Moyer
  • Patent number: 7293258
    Abstract: A data processor has a debug circuit arranged to monitor whether operand data used for execution of a program meets a debug exception condition. The debug exception condition tests a two or more of multi-bit subfields of a vector operand independently. Debug action is taken if one or more of the multi-bit subfields meet the corresponding conditions.
    Type: Grant
    Filed: May 17, 2000
    Date of Patent: November 6, 2007
    Assignee: NXP B.V.
    Inventors: Hendrikus Petrus Elisabeth Vranken, Kornelis Antonius Vissers, Fransiscus Wilhelmus Sijstermans
  • Patent number: 7254696
    Abstract: A functional-level instruction-set computing (FLIC) architecture executes higher-level functional instructions such as lookups and bit-compares of variable-length operands. Each FLIC processing-engine slice has specialized processing units including a lookup unit that searches for a matching entry in a lookup cache. Variable-length operands are stored in execution buffers. The operand length and location in the execution buffer are stored in fixed-length general-purpose registers (GPRs) that also store fixed-length operands. A copy/move unit moves data between input and output buffers and one or more FLIC processing-engine slices. Multiple contexts can each have a set of GPRs and execution buffers. An expansion buffer in a FLIC slice can be allocated to a context to expand that context's execution buffer for storing longer operands.
    Type: Grant
    Filed: December 12, 2002
    Date of Patent: August 7, 2007
    Assignee: Alacritech, Inc.
    Inventors: Millind Mittal, Mehul Kharidia, Tarun Kumar Tripathy, J. Sukarno Mertoguno
  • Patent number: 7230633
    Abstract: Methods and apparatuses for blending two images using vector table look up operations. In one aspect of the invention, a method to blend two images includes: loading a vector of keys into a vector register; converting the vector of keys into a first vector of blending factors for the first image and a second vector of blending factors for the second image using a plurality of look up tables; and computing an image attribute for the blended image using the blending factors.
    Type: Grant
    Filed: January 11, 2006
    Date of Patent: June 12, 2007
    Assignee: Apple Inc.
    Inventors: Steven Todd Weybrew, David Ligon, Ronald Gerard Langhi
  • Patent number: 7219212
    Abstract: A processor can achieve high code density while allowing higher performance than existing architectures, particularly for Digital Signal Processing (DSP) applications. In accordance with one aspect, the processor supports three possible instruction sizes while maintaining the simplicity of programming and allowing efficient physical implementation. Most of the application code can be encoded using two sets of narrow size instructions to achieve high code density. Adding a third (and larger, i.e. VLIW) instruction size allows the architecture to encode multiple operations per instruction for the performance critical section of the code. Further, each operation of the VLIW format instruction can optionally be a SIMD operation that operates upon vector data. A scheme for the optimal utilization (highest achievable performance for the given amount of hardware) of multiply-accumulate (MAC) hardware is also provided.
    Type: Grant
    Filed: February 25, 2005
    Date of Patent: May 15, 2007
    Assignee: Tensilica, Inc.
    Inventors: Himanshu A. Sanghavi, Earl A. Killian, James Robert Kennedy, Darin S. Petkov, Peng Tu, William A. Huffman
  • Patent number: 7216218
    Abstract: The present invention relates to the field of (micro)computer design and architecture, and in particular to microarchitecture associated with moving data values between a (micro)processor and memory components. Particularly, the present invention relates to a computer system with an processor architecture in which register addresses are generated with more than one execution channel controlled by one central processing unit with at least one load/store unit for loading and storing data objects, and at least one cache memory associated to the processor holding data objects accessed by the processor, wherein said processor's load/store unit contains a high speed memory directly interfacing said load/store unit to the cache and directly accessible by the cache memory for implementing scatter and gather operations. The present invention improves the performance of architectures with dual ported microprocessor implementations comprising two execution pipelines capable of two load/store data transactions per cycle.
    Type: Grant
    Filed: June 2, 2004
    Date of Patent: May 8, 2007
    Assignee: Broadcom Corporation
    Inventor: Sophie Wilson
  • Patent number: 7206857
    Abstract: A method is described that involves recognizing that an input queue state has reached a buffer's worth of information. The method also involves generating a first request to read a buffer's worth of information from an input RAM that implements the input queue. The method further involves recognizing that an output queue has room to receive information and that an intermediate queue that provides information to the output queue does not have information waiting to be forwarded to the output queue. The method also involves generating a second request to read information from the input RAM so that at least a portion of the room can be filled. The method also involves granting one of the first and second requests.
    Type: Grant
    Filed: May 10, 2002
    Date of Patent: April 17, 2007
    Assignee: Altera Corporation
    Inventors: Neil Mammen, Greg Maturi, Mammen Thomas
  • Patent number: 7200724
    Abstract: A data processor comprising: a register memory comprising an array of memory cells extending in two dimensions, the cells being located on rows in the first dimension and columns in the second dimension, each cell being addressable by means of an instruction specifying a pair of coordinates that identify the row and column of the cell in the array; and a processing unit capable of executing instructions that operate on a plurality of memory cells in the register, the instructions identifying the plurality of cells by means of a first instruction part specifying a pair of coordinates that identify a first cell in the array, and a second instruction part that identifies the configuration of the plurality of cells relative to the first cell; the data processor being arranged to interpret a first form of second instruction part as specifying a first group of cells all of which are located in the same row but in different columns, and to interpret a second form of second instruction part as specifying a first grou
    Type: Grant
    Filed: January 17, 2006
    Date of Patent: April 3, 2007
    Assignee: Broadcom Corporation
    Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
  • Patent number: 7197625
    Abstract: The present invention provides alignment and ordering of vector elements for SIMD processing. In the alignment of vector elements for SIMD processing, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. The first vector contains a first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of an aligned vector is determined. Next, a vector is extracted from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing. In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register.
    Type: Grant
    Filed: September 15, 2000
    Date of Patent: March 27, 2007
    Assignee: MIPS Technologies, Inc.
    Inventors: Timothy J. van Hook, Peter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian
  • Patent number: 7111156
    Abstract: A method and apparatus for enhancing flexibility of instruction ordering in a multi-thread processing system that performs multiply and accumulate operations is presented. A plurality of accumulation registers is provided for storing the results of an adder, wherein each of the plurality of accumulation registers corresponds to a different thread of the plurality of threads. The contents of each of the plurality of accumulation registers can be selected as an input to the adder such that the present accumulated value can be added to a subsequently calculated produce to generate a new accumulated value.
    Type: Grant
    Filed: April 21, 2000
    Date of Patent: September 19, 2006
    Assignee: ATI Technologies, Inc.
    Inventors: Michael Andrew Mang, Michael Mantor
  • Patent number: 7107435
    Abstract: A system and method for processing multiple arbitrary sized data elements in a register. A method of the invention comprises the steps of: creating a mask register that defines a set of arbitrary sized segments for a register; storing a plurality of arbitrary sized data elements in a segmented data register arranged in accordance with the mask register, wherein the arbitrary sized data elements are sign extended; simultaneously operating on each of the of the data elements in the segmented data register to generate a set of resulting data elements in response to a machine instruction, wherein the resulting data elements depend on each other; and unpacking the resulting data elements to provide a plurality of arbitrary sized results that are independent of each other.
    Type: Grant
    Filed: May 27, 2003
    Date of Patent: September 12, 2006
    Assignee: International Business Machines Corporation
    Inventors: Michael T. Brady, Jennifer Q. Trelewicz, Joan L. Mitchell
  • Patent number: 7100026
    Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by, e.g., steering each to one of two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.
    Type: Grant
    Filed: May 30, 2001
    Date of Patent: August 29, 2006
    Assignees: The Massachusetts Institute of Technology, The Board of Trustees of the Leland Stanford Junior University
    Inventors: William J. Dally, Scott Rixner, John D. Owens, Ujval J. Kapasi
  • Patent number: 7093102
    Abstract: Gather and scatter operations are used when elements of a vector which may be operated on in parallel are not located at successive addresses in memory. Prior data processing systems required complex address calculation hardware and other hardware to perform vector gather and scatter operations. By contrast, one embodiment of the present invention implements gather and scatter operations using a plurality of deposit and extract instructions. As a result, gather and scatter operations may be efficiently performed within a general purpose processing environment and without the need for dedicated gather/scatter hardware.
    Type: Grant
    Filed: March 29, 2000
    Date of Patent: August 15, 2006
    Assignee: Intel Corporation
    Inventor: Carole Dulong
  • Patent number: 7080216
    Abstract: A data processor comprising: a register memory comprising an array of memory cells extending in two dimensions, the cells being located on rows in the first dimension and columns in the second dimension, each cell being addressable by means of an instruction specifying a pair of coordinates that identify the row and column of the cell in the array; and a processing unit capable of executing instructions that operate on a plurality of memory cells in the register, the instructions identifying the plurality of cells by means of a first instruction part specifying a pair of coordinates that identify a first cell in the array, and a second instruction part that identifies the configuration of the plurality of cells relative to the first cell; the data processor being arranged to interpret a first form of second instruction part as specifying a first group of cells all of which are located in the same row but in different columns, and to interpret a second form of second instruction part as specifying a first grou
    Type: Grant
    Filed: October 31, 2002
    Date of Patent: July 18, 2006
    Assignee: Broadcom Corporation
    Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
  • Patent number: 7065413
    Abstract: In a method for controlling mechanisms or technical systems, the mechanisms or technical systems to be controlled are stored in a controller with their states, and with associated signal formers of sensors and actuators, whereby starting from a defined reference state at the onset of the activation of the controller, the actual states signaled by the technical system via the sensors are continuously compared with the specified state, the specified state being stored in the controller, and, based on this comparison, every deviation from the specified state is identified in the technical system, and, when initiated, a new instruction that changes the state of the mechanisms or of the technical system updates the specified state for the comparison and monitors the time till the acknowledgment of the new state, and sensor signals and comparable information exclusively serve the state identification of elementary functions and state changes exclusively ensue upon the initiation of elementary instructions.
    Type: Grant
    Filed: April 3, 2001
    Date of Patent: June 20, 2006
    Assignee: Technische Universitaet Dresden
    Inventors: Volker Moebius, Knut Grossmann
  • Patent number: 7055018
    Abstract: Methods and apparatuses for performing simultaneous table look-up using multiple look-up tables. In one aspect of the invention, an execution unit in a microprocessor includes: look-up memory and a first circuit coupled to the look-up memory. In response to the microprocessor receiving a first instruction, the first circuit partitions the look-up memory into a first plurality of look-up tables. In response to the microprocessor receiving a second instruction, the first circuit partitions the look-up memory into a second plurality of look-up tables; and the second plurality of look-up tables simultaneously look up a plurality of entries.
    Type: Grant
    Filed: December 31, 2001
    Date of Patent: May 30, 2006
    Assignee: Apple Computer, Inc.
    Inventors: Joseph P. Bratt, Sushma Shrikant Trivedi
  • Patent number: 7034849
    Abstract: Methods and apparatuses for blending two images using vector table look up operations. In one aspect of the invention, a method to blend two images includes: loading a vector of keys into a vector register; converting the vector of keys into a first vector of blending factors for the first image and a second vector of blending factors for the second image using a plurality of look up tables; and computing an image attribute for the blended image using the blending factors.
    Type: Grant
    Filed: December 31, 2001
    Date of Patent: April 25, 2006
    Assignee: Apple Computer, Inc.
    Inventors: Steven Todd Weybrew, David Ligon, Ronald Gerard Langhi
  • Patent number: 7036112
    Abstract: One embodiment of the present invention provides a system that facilitates implementing multi-mode specification-driven disassembler. During operation, the disassembler receives a machine-code version of a computer program. In order to disassemble a specific machine-code instruction from this machine-code version, the system compares the machine-code instruction against a set of instruction templates for assembly code instructions to identify a set of matching templates. Next, the system selects a matching template from the set of matching templates based on the state of a mode variable, which indicates a specificity mode for the disassembler. The system then disassembles the machine-code instruction using the operand fields defined by the matching template to produce a corresponding assembly code instruction.
    Type: Grant
    Filed: August 16, 2002
    Date of Patent: April 25, 2006
    Assignee: SUN Microsystems, Inc.
    Inventors: David M. Ungar, Mario I. Wolczko, Bernd J. W. Mathiske
  • Patent number: 7000099
    Abstract: A lookup operation is carried out on a data table by logically dividing the data table into a number of smaller sets of data that can be indexed with a single byte of data. Each set of data consists of two vectors, which constitute the operands for a permute instruction. Only a limited number of bits are required to index into the table during the execution of this instruction. The remaining bits of each index are used as masks into a series of select instructions. The select instruction chooses between two vector components, based on the mask, and places the selected components into a new vector. The mask is generated by shifting one of the higher order bits of the index to the most significant position, and then propagating that bit throughout a byte, for example by means of an arithmetic shift. This procedure is carried out for all of the index bytes in the vector, to generate a select mask.
    Type: Grant
    Filed: July 9, 2002
    Date of Patent: February 14, 2006
    Assignee: Apple Computer Inc.
    Inventor: Ali Sazegari
  • Patent number: 6963341
    Abstract: The present invention provides efficient ways to implement scan conversion and matrix transpose operations using vector multiplex operations in a SIMD processor. The present method provides a very fast and flexible way to implement different scan conversions, such as zigzag conversion, and matrix transpose for 2×2, 4×4, 8×8 blocks commonly used by all video compression and decompression algorithms.
    Type: Grant
    Filed: May 20, 2003
    Date of Patent: November 8, 2005
    Inventor: Tibet Mimar
  • Patent number: 6954927
    Abstract: A method for optimizing a software pipelineable loop in a software code is provided. The loop comprises one or more pipelined stages and one or more loop operations. The method comprises evaluating an initiation interval time (IN) for a pipelined stage of the loop. A loop operation time latency (Tld) and a number of loop operations (Np) from the pipelined stages to peel based on IN and Tld is then determined. The loop operation is peeled Np times and copied before the loop in the software code. A vector of registers is allocated and the results of the peeled loop operations and a result of an original loop operation is assigned to the vector of registers. Memory addresses for the results of the peeled loop operations and original loop operation are also assigned.
    Type: Grant
    Filed: October 4, 2001
    Date of Patent: October 11, 2005
    Assignee: Elbrus International
    Inventor: Alexander Y. Ostanevich
  • Patent number: 6931511
    Abstract: Methods and apparatuses for looking up vectors in parallel using vector table look up operations. In one aspect of the invention, a method to look up a plurality of data items indexed by a vector of indices includes: generating a second vector of indices in a vector register where each index of the second vector of indices is one of a first vector of indices and at least one index in the first vector of indices is replicated as a plurality of duplicated indices in the second vector of indices; and looking up simultaneously a first vector of data items from a plurality of look up tables using the second vector of indices.
    Type: Grant
    Filed: December 31, 2001
    Date of Patent: August 16, 2005
    Assignee: Apple Computer, Inc.
    Inventors: Steven Todd Weybrew, David Ligon, Ronald Gerard Langhi
  • Patent number: 6924802
    Abstract: A system, method, and computer program product are provided for generating display data. The data processing system loads coefficient values corresponding to a behavior of a selected function in pre-defined ranges of input data. The data processing system then determines, responsive to items of input data, the range of input data in which the selected function is to be estimated. The data processing system then selects, through the use of a vector permute function, the coefficient values, and evaluates an index function at the each of the items of input data. It then estimates the value of the selected function through parallel mathematical operations on the items of input data, the selected coefficient values, and the values of the index function, and, responsive to the one or more values of the selected function, generates display data.
    Type: Grant
    Filed: September 12, 2002
    Date of Patent: August 2, 2005
    Assignee: International Business Machines Corporation
    Inventors: Gordon Clyde Fossum, Harm Peter Hofstee, Barry L. Minor, Mark Richard Nutter
  • Patent number: 6915411
    Abstract: A digital signal processor (DSP) includes a SIMD-based organization wherein operations are executed on a plurality of single-instruction multiple data (SIMD) datapaths or stages connected in cascade. The functionality and data values at each stage may be different, including a different width (e.g., a different number of bits per value) in each stage. The operands and destination for data in a computational datapath are selected indirectly through vector pointer registers in a vector pointers datapath. Each vector pointer register contains a plurality of pointers into a register file of a computational datapath.
    Type: Grant
    Filed: July 18, 2002
    Date of Patent: July 5, 2005
    Assignee: International Business Machines Corporation
    Inventors: Jamie H. Moreno, Jeffrey Haskell Derby, Uzi Shvadron, Fredy Daniel Neeser, Victor Zyuban, Ayal Zaks, Shay Ben-David
  • Patent number: 6865517
    Abstract: A method, apparatus and computer product that enables a processor associated with a node in a computer system having various nodes, the nodes having sensors which provide data, and the nodes being connected by a communications facility acquiring local data from the sensor and remote data from other nodes via the data transfer facility. The nodes process data from a local sensor at the node and from remote sensors at other nodes; and analyze the local data, data from other nodes and local decisions made at and received from other nodes to make a local decision for action at the node. A local decision made at a node is in turn communicated to other nodes.
    Type: Grant
    Filed: December 11, 2002
    Date of Patent: March 8, 2005
    Assignee: International Business Machines Corporation
    Inventors: David F. Bantz, John S. Davis, II, Rafah A. Hosn, Nicholas M. Mitchell, Veronique Perret, Daby M. Sow, Jeremy B. Sussman
  • Publication number: 20040250044
    Abstract: The object of the invention is to efficiently perform indirect index vector reference. An element register of a vector register or a scalar register specified in the “index” is divided into multiple areas, and a particular index vector is acquired by selecting any of the divided areas. Accordingly, it is possible to store substantially multiple index vectors in one vector register, and therefore register resources can be efficiently used. The procedure for providing index vectors is similar to that for providing one index vector, and therefore the code size and the process cycles of the program are almost not increased. That is, according to the present invention, indirect index vector reference can be more efficiently performed.
    Type: Application
    Filed: March 17, 2004
    Publication date: December 9, 2004
    Applicant: SEIKO EPSON CORPORATION
    Inventor: Masakazu Isomura
  • Patent number: 6829696
    Abstract: A data processing system (e.g., microprocessor 30) for packing register data while storing it to memory and unpacking data read from memory while loading it into registers using single processor instructions. The system comprises a memory (42) and a central processing unit core (44) with at least one register file (76). The core is responsive to a load instruction (e.g., LDW_BH[U] instruction 184) to retrieve at least one data word from memory and parse the data word over selected parts of at least two data registers in the register file. The core is responsive to a store instruction (e.g., STBH_W instruction 198) to concatenate data from selected parts of at least two data registers into at least one data word and save the data word to memory. The number of data registers is greater than the number of data words parsed into or concatenated from the data registers. Both memory storage space and central processor unit resources are utilized efficiently when working with packed data.
    Type: Grant
    Filed: October 13, 2000
    Date of Patent: December 7, 2004
    Assignee: Texas Instruments Incorporated
    Inventors: Keith Balmer, Karl M. Guttag, Lewis Nardini
  • Publication number: 20040243788
    Abstract: The object of the invention is to efficiently perform a vector operation using a vector register. A vector processor is provided with a vector register forming a ring buffer, and any address of the ring buffer can be specified as the top address. Accordingly, when multiple vector data to be processed are overlapped, it is possible to circularly read or write the vector data stored in one vector register without storing the vector data in separate vector registers. Thus, it is possible to prevent the same data from being redundantly read as well as to decrease register resources to be required, thereby enabling an efficient vector operation using a vector register.
    Type: Application
    Filed: March 17, 2004
    Publication date: December 2, 2004
    Applicant: SEIKO EPSON CORPORATION
    Inventor: Masakazu Isomura
  • Patent number: 6816960
    Abstract: A vector artchitecture processing unit according to the present invention comprises a vector scatter (VSC) address coincidence detection unit 3 that comprises registers in which an area start address and an area end address of an area specified by an area-specified vector scatter instruction are stored; and a circuit that checks if the addresses specified by the area-specified vector scatter instruction overlap with an address to be accessed by a memory access instruction following the area-specified vector scatter instruction, wherein an instruction issue control unit 1 comprises a hold control circuit that holds the following memory access instruction in response to an address conflict signal from the VSC address conflict detector.
    Type: Grant
    Filed: July 10, 2001
    Date of Patent: November 9, 2004
    Assignee: NEC Corporation
    Inventor: Hisao Koyanagi
  • Patent number: 6813701
    Abstract: A compiler and vector data transfer instructions for use in a vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. The compiler identifies the use of vector data in an application program and implements one or more vector instructions for transferring the vector data between memory and registers used to perform calculations on the vector data. A vector is partitioned by the compiler into variable-sized streams which are transferred into and out of the processor as burst transactions. The compiler schedules transfers of vector streams required in a calculation so that calculations on a portion of the vector data are performed while a subsequent portion of the vector data is transferred. A vector buffer pool is partitioned into one or more vector buffers and each vector buffer is used at a specific time.
    Type: Grant
    Filed: August 17, 1999
    Date of Patent: November 2, 2004
    Assignee: NEC Electronics America, Inc.
    Inventor: Ahmad R. Ansari
  • Publication number: 20040193839
    Abstract: An active memory device includes a command engine that receives high level tasks from a host and generates corresponding sets of either DCU commands to a DRAM control unit or ACU commands to a processing array control unit. The DCU commands include memory addresses, which are also generated by the command engine, and the ACU command include instruction memory addresses corresponding to an address in an array control unit where processing array instructions are stored. The active memory device includes a vector processing and re-ordering system coupled to the array control unit and the memory device. The vector processing and re-ordering system re-orders data received from the memory device into a vector of contiguous data, process the data in accordance with an instruction received from the array control unit to provide results data, and passes the results data to the memory device.
    Type: Application
    Filed: July 28, 2003
    Publication date: September 30, 2004
    Inventor: Graham Kirsch
  • Publication number: 20040186980
    Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.
    Type: Application
    Filed: March 29, 2004
    Publication date: September 23, 2004
    Inventor: Ahmad R. Ansari
  • Patent number: 6795908
    Abstract: A method for processing scalar and vector executions, where vector executions may be “true” vector operations, CVA, or pseudo-vector operations, PVA. All three types of executions are processed using one architecture. In one embodiment, a compiler analyzes code to identify sections that are vectorizable, and applies either CVA, PVA, or a combination of the two to process these sections. Register overlay is provided for storing load address information and data in PVA mode. Within each CVA and PVA instruction, enable bits describe the data streaming function of the operation. A temporary memory, TM, accommodates variable size vectors, and is used in vector operations, similar to a vector register, to store temporary vectors.
    Type: Grant
    Filed: June 12, 2000
    Date of Patent: September 21, 2004
    Assignee: Freescale Semiconductor, Inc.
    Inventors: Lea Hwang Lee, William C. Moyer
  • Publication number: 20040181646
    Abstract: An apparatus and method are provided for updating one or more pluralities of pointers (i.e. one or more vector pointers) which are used for accessing one or more pluralities of data elements (i.e. one or more vector data elements) in a multi-ported memory. A first register file holds the vector pointers, a second register file holds stride data, and a plurality of functional units combine data from the second register file with data from the first register file. The results of combining the data are transferred to the first register file and represent updated vector pointers. Furthermore, a third register file is provided for holding modulus selector data to specify the size of a circular buffer for circular addressing.
    Type: Application
    Filed: March 14, 2003
    Publication date: September 16, 2004
    Applicant: International Business Machines Corporation
    Inventors: Shay Ben-David, Jeffrey Haskell Derby, Thomas W. Fox, Fredy Daniel Neeser, Jaime H. Moreno, Uzi Shvadron, Ayal Zaks
  • Patent number: 6789181
    Abstract: A method and computer for executing the method. A source program is translated into an object program, in a manner in which the translated object program has a different execution behavior than the source program. The translated object program is executed under a monitor capable of detecting any deviation from fully-correct interpretation before any side-effect of the different execution behavior is irreversibly committed. When the monitor detects the deviation, or when an interrupt occurs during execution of the object program, a state of the program is established corresponding to a state that would have occurred during an execution of the source program, and from which execution can continue. Execution of the source program continues primarily in a hardware emulator designed to execute instructions of an instruction set non-native to the computer.
    Type: Grant
    Filed: November 3, 1999
    Date of Patent: September 7, 2004
    Assignee: ATI International, SRL
    Inventors: John S. Yates, David L. Reese, Korbin S. Van Dyke, Paul H. Hohensee
  • Publication number: 20040172517
    Abstract: An orthogonal data converter for converting the components of a sequential vector component flow to a parallel vector component flow. The data converter has an input rotator configured to rotate corresponding vector components of the sequential vector component flow by a prescribed amount, and a bank of register files configured to store the rotated vector components. The converter also has an output rotator configured to rotate the position of the vector components read from the bank of register files by a prescribed amount. A controller of the converter is operative to control the addressing of the bank of register files and the rotating of the vector components. In this regard, the controller is operative to write the vector components to the bank of register files in a prescribed order and read the vector components in a prescribed order to generate the parallel vector component flow.
    Type: Application
    Filed: September 19, 2003
    Publication date: September 2, 2004
    Inventors: Boris Prokopenko, Timour Paltashev