Distributing Of Vector Data To Vector Registers Patents (Class 712/4)
-
Publication number: 20080288745Abstract: A method for performing parallel operations in a computer system when one or more memory hazards may be present, which may be implemented by a processor, is described. During operation, the processor receives instructions for detecting conflict between memory addresses in vectors when operations are performed in parallel using at least a portion of the vectors, and generating one or more predicate values corresponding to any detected conflict between the memory addresses, where a given predicate value indicates elements in at least the portion of the vector that can be processed in parallel. Next, the processor executes the instructions for detecting the conflict between the memory addresses and generating the one or more predicate values.Type: ApplicationFiled: July 11, 2008Publication date: November 20, 2008Applicant: APPLE INC.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Publication number: 20080256329Abstract: Methods, systems and computer program products for resolving multiple magnitudes assigned to a target vector are disclosed. A target vector that includes one or more target vector dimensions is received. One of the target vector dimensions is processed to determine a total number of magnitudes assigned to the processed target vector dimension. Also, a source vector that includes one or more source vector dimensions is received. The received source vector is processed to determine a total number of features associated with the source vector. When it is detected that the total number of magnitudes assigned to the processed target vector dimension exceeds one, one of the assigned magnitudes is selected based on one of the determined features associated with the source vector.Type: ApplicationFiled: April 13, 2007Publication date: October 16, 2008Inventors: Daniel T. Heinze, Mark L. Morsch
-
Publication number: 20080244220Abstract: A filter and method of filtering modifies the computation order to accommodate horizontal symmetric filtering, and modifies the source operands while modifying the SIMD computation, so as to eliminate such heavy overhead of transposing a pixel matrix.Type: ApplicationFiled: October 11, 2007Publication date: October 2, 2008Inventors: Guo Hui Lin, Yang Liu, Lu Wan, Min Zhu
-
Patent number: 7421565Abstract: A method and apparatus to correctly compute a vector-gather, vector-operate (e.g., vector add), and vector-scatter sequence, particularly when elements of the vector may be redundantly presented, as with indirectly addressed vector operations. For an add operation, one vector register is loaded with the “add-in” values, and another vector register is loaded with address values of “add to” elements to be gathered from memory into a third vector register. If the vector of address values has a plurality of elements that point to the same memory address, the algorithm should add all the “add in” values from elements corresponding to the elements having the duplicated addresses. An indirectly addressed load performs the “gather” operation to load the “add to” values. A vector add operation then adds corresponding elements from the “add in” vector to the “add to” vector. An indirectly addressed store then performs the “scatter” operation to store the results.Type: GrantFiled: August 18, 2003Date of Patent: September 2, 2008Assignee: Cray Inc.Inventor: James R. Kohn
-
Publication number: 20080209162Abstract: A processor for sequentially executing a plurality of programs using a plurality of register value groups stored in a memory that correspond one-to-one with the programs.Type: ApplicationFiled: April 28, 2008Publication date: August 28, 2008Inventors: Kazuya Furukawa, Tetsuya Tanaka, Nobuo Higaki, Kunihiko Hayashi, Hiroshi Kadota, Tokuzo Kiyohara, Kozo Kimura, Hideshi Nishida, Kazushi Kurata, Shigeki Fujii, Toshio Sugimura
-
Patent number: 7404065Abstract: In one embodiment, a method for flow optimization and prediction for vector streaming single instruction, multiple data (SIMD) extension (VSSE) memory operations is disclosed. The method comprises generating an optimized micro-operation (?op) flow for an instruction to operate on a vector if the instruction is predicted to be unmasked and unit-stride, the instruction to access elements in memory, and accessing via the optimized ?op flow two or more of the elements at the same time without determining masks of the two or more elements. Other embodiments are also described.Type: GrantFiled: December 21, 2005Date of Patent: July 22, 2008Assignee: Intel CorporationInventors: Stephan Jourdan, Per Hammarlund, Michael Fetterman, Michael P. Cornaby, Glenn Hinton, Avinash Sodani
-
Patent number: 7386703Abstract: A processor and method for processing matrix data. The processor includes M independent vector register files which are adapted to collectively store a matrix of L data elements. Each data element has B binary bits. The matrix has N rows and M columns, and L=N*M. Each column has K subcolumns. N?2, M?2, K?1, and B?1. Each row and each subcolumn is addressable. The processor does not duplicatively store the L data elements. The matrix includes a set of arrays such that each array is a row or subcolumn of the matrix. The processor may execute an instruction that performs an operation on a first array of the set of arrays, such that the operation is performed with selectivity with respect to the data elements of the first array.Type: GrantFiled: November 18, 2003Date of Patent: June 10, 2008Assignee: International Business Machines CorporationInventors: Peter A. Sandon, R. Michael P. West
-
Publication number: 20080126744Abstract: A method for operating a data processing system includes providing an application binary interface (ABI) which determines a set of non-contiguous volatile registers and a set of non-volatile registers. The set of non-contiguous volatile registers includes a plurality of general purpose registers (GPRs) and a plurality of special purpose registers (SPRs). The method includes providing less than three instructions which collectively load or store all of the set of non-contiguous volatile registers determined by the ABI. A system includes a set of volatile registers including a plurality of volatile GPRs, a plurality of volatile supervisor SPRs, and a plurality of volatile user SPRs, and execution circuitry for executing a first instruction that loads or stores the plurality of volatile supervisor SPRs, for executing a second instruction that loads or stores the plurality of volatile GPRs, and for executing a third instruction that loads or stores the plurality of volatile user SPRs.Type: ApplicationFiled: August 29, 2006Publication date: May 29, 2008Inventor: William C. Moyer
-
Operand Multiplexor Control Modifier Instruction in a Fine Grain Multithreaded Vector Microprocessor
Publication number: 20080126745Abstract: The present invention is generally related to integrated circuit devices, and more particularly, to methods, systems and design structures for the field of image processing, and more specifically to an instruction set for processing images. Vector processing may involve rearranging vector operands in one or more source registers prior to performing vector operations. Typically, rearranging of operands in source registers is done by issuing a plurality of permute instructions that require excessive usage of temporary registers. Furthermore, the permute instructions may cause dependencies between instructions executing in a pipeline, thereby adversely affecting performance. Embodiments of the invention provide a level of muxing between a register file and a vector unit that allow for rearrangement of vector operands in source registers prior to providing the operands to the vector unit, thereby obviating the need for permute instructions.Type: ApplicationFiled: October 26, 2007Publication date: May 29, 2008Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs -
Patent number: 7366873Abstract: A method and apparatus to correctly compute a vector-gather, vector-operate (e.g., vector add), and vector-scatter sequence, particularly when elements of the vector may be redundantly presented, as with indirectly addressed vector operations. For an add operation, one vector register is loaded with the “add-in” values, and another vector register is loaded with address values of “add to” elements to be gathered from memory into a third vector register. If the vector of address values has a plurality of elements that point to the same memory address, the algorithm should add all the “add in” values from elements corresponding to the elements having the duplicated addresses. An indirectly addressed load performs the “gather” operation to load the “add to” values. A vector add operation then adds corresponding elements from the “add in” vector to the “add to” vector. An indirectly addressed store then performs the “scatter” operation to store the results.Type: GrantFiled: August 18, 2003Date of Patent: April 29, 2008Assignee: Cray, Inc.Inventor: James R. Kohn
-
Publication number: 20080082784Abstract: The present invention is generally related to integrated circuit devices, and more particularly, to methods, systems and design structures for the field of image processing, and more specifically to vector units for supporting image processing. A dual vector unit implementation is described wherein two vector units are configured receive data from a common register file. The vector units may independently and simultaneously process instructions. Furthermore, the vector units may be adapted to perform scalar operations thereby integrating the vector and scalar processing. The vector units may also be configured to share resources to perform an operation, for example, a cross product operation.Type: ApplicationFiled: October 26, 2007Publication date: April 3, 2008Applicant: International Business Machines CorporationInventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
-
Patent number: 7353371Abstract: A method and device to copy data fields from one or more source packets to one or more result packets. In a SET function, adjacent data fields in a source packet is copied to respective destination data fields in a result packet governed by a field locator packet. In an ESET function, data fields in respective source packets are copied to adjacent data fields in a result packet governed by a field locator packet. In an EXTRACT function, data fields in a source packet are copied to adjacent data fields in a result packet governed by a field locator packet. In a SCATTER function, adjacent data fields in a source packet are copied to data fields in respective result packets governed by a field locator packet.Type: GrantFiled: December 5, 2002Date of Patent: April 1, 2008Assignee: Intel CorporationInventors: Corey Gee, Bapi Vinnakota
-
Patent number: 7350057Abstract: Described herein is a method and system for executing instructions. The system comprises a scalar unit for executing scalar instructions each defining a single value pair; a vector unit for executing vector instructions each defining multiple value pairs; and an instruction decoder for receiving a single stream of instructions including scalar instructions and vector instructions and operable to direct scalar instructions to the scalar unit and vector instructions to the vector unit. The vector unit can comprises a plurality of value processing units and a scalar result unit. The scalar unit can comprise a scalar register file. Communication between the vector unit and the scalar unit is enabled by allowing the vector unit to access the scalar register file and allowing the scalar unit to access output from the scalar result unit. The output of the scalar result unit may be based on the relative magnitudes of outputs from the plurality of value processing units.Type: GrantFiled: November 6, 2006Date of Patent: March 25, 2008Assignee: Broadcom CorporationInventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
-
Publication number: 20080072010Abstract: A system and method for performing vector arithmetic is disclosed. The method includes loading two operand vectors, each composed of a number of vector elements, into two storage locations. A selected arithmetic operation is performed on the operand vectors to produce a result vector having the number of vector elements. Each vector element of the result vector is associated with an arithmetic logic cell that has a first input that can receive any vector element from the first vector and a second input that can receive any vector element from the second vector. Accordingly each vector element of the result vector is a function of any two individual vector elements of the operand vectors. By applying the operand vector elements to the appropriate arithmetic logic cells, and by selecting the appropriate arithmetic operation, complex vector operations can be performed efficiently.Type: ApplicationFiled: September 18, 2006Publication date: March 20, 2008Applicant: Freescale Semiconductor, Inc.Inventor: Chengke Sheng
-
Patent number: 7315932Abstract: Various load and store instructions may be used to transfer multiple vector elements between registers in a register file and memory. A cnt parameter may be used to indicate a total number of elements to be transferred to or from memory, and an rcnt parameter may be used to indicate a maximum number of vector elements that may be transferred to or from a single register within a register file. Also, the instructions may use a variety of different addressing modes. The memory element size may be specified independently from the register element size such that source and destination sizes may differ within an instruction. With some instructions, a vector stream may be initiated and conditionally enqueued or dequeued. Truncation or rounding fields may be provided such that source data elements may be truncated or rounded when transferred. Also, source data elements may be sign- or unsigned-extended when transferred.Type: GrantFiled: September 8, 2003Date of Patent: January 1, 2008Inventor: William C. Moyer
-
Patent number: 7293258Abstract: A data processor has a debug circuit arranged to monitor whether operand data used for execution of a program meets a debug exception condition. The debug exception condition tests a two or more of multi-bit subfields of a vector operand independently. Debug action is taken if one or more of the multi-bit subfields meet the corresponding conditions.Type: GrantFiled: May 17, 2000Date of Patent: November 6, 2007Assignee: NXP B.V.Inventors: Hendrikus Petrus Elisabeth Vranken, Kornelis Antonius Vissers, Fransiscus Wilhelmus Sijstermans
-
Patent number: 7254696Abstract: A functional-level instruction-set computing (FLIC) architecture executes higher-level functional instructions such as lookups and bit-compares of variable-length operands. Each FLIC processing-engine slice has specialized processing units including a lookup unit that searches for a matching entry in a lookup cache. Variable-length operands are stored in execution buffers. The operand length and location in the execution buffer are stored in fixed-length general-purpose registers (GPRs) that also store fixed-length operands. A copy/move unit moves data between input and output buffers and one or more FLIC processing-engine slices. Multiple contexts can each have a set of GPRs and execution buffers. An expansion buffer in a FLIC slice can be allocated to a context to expand that context's execution buffer for storing longer operands.Type: GrantFiled: December 12, 2002Date of Patent: August 7, 2007Assignee: Alacritech, Inc.Inventors: Millind Mittal, Mehul Kharidia, Tarun Kumar Tripathy, J. Sukarno Mertoguno
-
Patent number: 7230633Abstract: Methods and apparatuses for blending two images using vector table look up operations. In one aspect of the invention, a method to blend two images includes: loading a vector of keys into a vector register; converting the vector of keys into a first vector of blending factors for the first image and a second vector of blending factors for the second image using a plurality of look up tables; and computing an image attribute for the blended image using the blending factors.Type: GrantFiled: January 11, 2006Date of Patent: June 12, 2007Assignee: Apple Inc.Inventors: Steven Todd Weybrew, David Ligon, Ronald Gerard Langhi
-
Patent number: 7219212Abstract: A processor can achieve high code density while allowing higher performance than existing architectures, particularly for Digital Signal Processing (DSP) applications. In accordance with one aspect, the processor supports three possible instruction sizes while maintaining the simplicity of programming and allowing efficient physical implementation. Most of the application code can be encoded using two sets of narrow size instructions to achieve high code density. Adding a third (and larger, i.e. VLIW) instruction size allows the architecture to encode multiple operations per instruction for the performance critical section of the code. Further, each operation of the VLIW format instruction can optionally be a SIMD operation that operates upon vector data. A scheme for the optimal utilization (highest achievable performance for the given amount of hardware) of multiply-accumulate (MAC) hardware is also provided.Type: GrantFiled: February 25, 2005Date of Patent: May 15, 2007Assignee: Tensilica, Inc.Inventors: Himanshu A. Sanghavi, Earl A. Killian, James Robert Kennedy, Darin S. Petkov, Peng Tu, William A. Huffman
-
Patent number: 7216218Abstract: The present invention relates to the field of (micro)computer design and architecture, and in particular to microarchitecture associated with moving data values between a (micro)processor and memory components. Particularly, the present invention relates to a computer system with an processor architecture in which register addresses are generated with more than one execution channel controlled by one central processing unit with at least one load/store unit for loading and storing data objects, and at least one cache memory associated to the processor holding data objects accessed by the processor, wherein said processor's load/store unit contains a high speed memory directly interfacing said load/store unit to the cache and directly accessible by the cache memory for implementing scatter and gather operations. The present invention improves the performance of architectures with dual ported microprocessor implementations comprising two execution pipelines capable of two load/store data transactions per cycle.Type: GrantFiled: June 2, 2004Date of Patent: May 8, 2007Assignee: Broadcom CorporationInventor: Sophie Wilson
-
Patent number: 7206857Abstract: A method is described that involves recognizing that an input queue state has reached a buffer's worth of information. The method also involves generating a first request to read a buffer's worth of information from an input RAM that implements the input queue. The method further involves recognizing that an output queue has room to receive information and that an intermediate queue that provides information to the output queue does not have information waiting to be forwarded to the output queue. The method also involves generating a second request to read information from the input RAM so that at least a portion of the room can be filled. The method also involves granting one of the first and second requests.Type: GrantFiled: May 10, 2002Date of Patent: April 17, 2007Assignee: Altera CorporationInventors: Neil Mammen, Greg Maturi, Mammen Thomas
-
Patent number: 7200724Abstract: A data processor comprising: a register memory comprising an array of memory cells extending in two dimensions, the cells being located on rows in the first dimension and columns in the second dimension, each cell being addressable by means of an instruction specifying a pair of coordinates that identify the row and column of the cell in the array; and a processing unit capable of executing instructions that operate on a plurality of memory cells in the register, the instructions identifying the plurality of cells by means of a first instruction part specifying a pair of coordinates that identify a first cell in the array, and a second instruction part that identifies the configuration of the plurality of cells relative to the first cell; the data processor being arranged to interpret a first form of second instruction part as specifying a first group of cells all of which are located in the same row but in different columns, and to interpret a second form of second instruction part as specifying a first grouType: GrantFiled: January 17, 2006Date of Patent: April 3, 2007Assignee: Broadcom CorporationInventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
-
Patent number: 7197625Abstract: The present invention provides alignment and ordering of vector elements for SIMD processing. In the alignment of vector elements for SIMD processing, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. The first vector contains a first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of an aligned vector is determined. Next, a vector is extracted from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing. In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register.Type: GrantFiled: September 15, 2000Date of Patent: March 27, 2007Assignee: MIPS Technologies, Inc.Inventors: Timothy J. van Hook, Peter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian
-
Patent number: 7111156Abstract: A method and apparatus for enhancing flexibility of instruction ordering in a multi-thread processing system that performs multiply and accumulate operations is presented. A plurality of accumulation registers is provided for storing the results of an adder, wherein each of the plurality of accumulation registers corresponds to a different thread of the plurality of threads. The contents of each of the plurality of accumulation registers can be selected as an input to the adder such that the present accumulated value can be added to a subsequently calculated produce to generate a new accumulated value.Type: GrantFiled: April 21, 2000Date of Patent: September 19, 2006Assignee: ATI Technologies, Inc.Inventors: Michael Andrew Mang, Michael Mantor
-
Patent number: 7107435Abstract: A system and method for processing multiple arbitrary sized data elements in a register. A method of the invention comprises the steps of: creating a mask register that defines a set of arbitrary sized segments for a register; storing a plurality of arbitrary sized data elements in a segmented data register arranged in accordance with the mask register, wherein the arbitrary sized data elements are sign extended; simultaneously operating on each of the of the data elements in the segmented data register to generate a set of resulting data elements in response to a machine instruction, wherein the resulting data elements depend on each other; and unpacking the resulting data elements to provide a plurality of arbitrary sized results that are independent of each other.Type: GrantFiled: May 27, 2003Date of Patent: September 12, 2006Assignee: International Business Machines CorporationInventors: Michael T. Brady, Jennifer Q. Trelewicz, Joan L. Mitchell
-
Patent number: 7100026Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by, e.g., steering each to one of two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.Type: GrantFiled: May 30, 2001Date of Patent: August 29, 2006Assignees: The Massachusetts Institute of Technology, The Board of Trustees of the Leland Stanford Junior UniversityInventors: William J. Dally, Scott Rixner, John D. Owens, Ujval J. Kapasi
-
Patent number: 7093102Abstract: Gather and scatter operations are used when elements of a vector which may be operated on in parallel are not located at successive addresses in memory. Prior data processing systems required complex address calculation hardware and other hardware to perform vector gather and scatter operations. By contrast, one embodiment of the present invention implements gather and scatter operations using a plurality of deposit and extract instructions. As a result, gather and scatter operations may be efficiently performed within a general purpose processing environment and without the need for dedicated gather/scatter hardware.Type: GrantFiled: March 29, 2000Date of Patent: August 15, 2006Assignee: Intel CorporationInventor: Carole Dulong
-
Patent number: 7080216Abstract: A data processor comprising: a register memory comprising an array of memory cells extending in two dimensions, the cells being located on rows in the first dimension and columns in the second dimension, each cell being addressable by means of an instruction specifying a pair of coordinates that identify the row and column of the cell in the array; and a processing unit capable of executing instructions that operate on a plurality of memory cells in the register, the instructions identifying the plurality of cells by means of a first instruction part specifying a pair of coordinates that identify a first cell in the array, and a second instruction part that identifies the configuration of the plurality of cells relative to the first cell; the data processor being arranged to interpret a first form of second instruction part as specifying a first group of cells all of which are located in the same row but in different columns, and to interpret a second form of second instruction part as specifying a first grouType: GrantFiled: October 31, 2002Date of Patent: July 18, 2006Assignee: Broadcom CorporationInventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
-
Patent number: 7065413Abstract: In a method for controlling mechanisms or technical systems, the mechanisms or technical systems to be controlled are stored in a controller with their states, and with associated signal formers of sensors and actuators, whereby starting from a defined reference state at the onset of the activation of the controller, the actual states signaled by the technical system via the sensors are continuously compared with the specified state, the specified state being stored in the controller, and, based on this comparison, every deviation from the specified state is identified in the technical system, and, when initiated, a new instruction that changes the state of the mechanisms or of the technical system updates the specified state for the comparison and monitors the time till the acknowledgment of the new state, and sensor signals and comparable information exclusively serve the state identification of elementary functions and state changes exclusively ensue upon the initiation of elementary instructions.Type: GrantFiled: April 3, 2001Date of Patent: June 20, 2006Assignee: Technische Universitaet DresdenInventors: Volker Moebius, Knut Grossmann
-
Patent number: 7055018Abstract: Methods and apparatuses for performing simultaneous table look-up using multiple look-up tables. In one aspect of the invention, an execution unit in a microprocessor includes: look-up memory and a first circuit coupled to the look-up memory. In response to the microprocessor receiving a first instruction, the first circuit partitions the look-up memory into a first plurality of look-up tables. In response to the microprocessor receiving a second instruction, the first circuit partitions the look-up memory into a second plurality of look-up tables; and the second plurality of look-up tables simultaneously look up a plurality of entries.Type: GrantFiled: December 31, 2001Date of Patent: May 30, 2006Assignee: Apple Computer, Inc.Inventors: Joseph P. Bratt, Sushma Shrikant Trivedi
-
Patent number: 7034849Abstract: Methods and apparatuses for blending two images using vector table look up operations. In one aspect of the invention, a method to blend two images includes: loading a vector of keys into a vector register; converting the vector of keys into a first vector of blending factors for the first image and a second vector of blending factors for the second image using a plurality of look up tables; and computing an image attribute for the blended image using the blending factors.Type: GrantFiled: December 31, 2001Date of Patent: April 25, 2006Assignee: Apple Computer, Inc.Inventors: Steven Todd Weybrew, David Ligon, Ronald Gerard Langhi
-
Patent number: 7036112Abstract: One embodiment of the present invention provides a system that facilitates implementing multi-mode specification-driven disassembler. During operation, the disassembler receives a machine-code version of a computer program. In order to disassemble a specific machine-code instruction from this machine-code version, the system compares the machine-code instruction against a set of instruction templates for assembly code instructions to identify a set of matching templates. Next, the system selects a matching template from the set of matching templates based on the state of a mode variable, which indicates a specificity mode for the disassembler. The system then disassembles the machine-code instruction using the operand fields defined by the matching template to produce a corresponding assembly code instruction.Type: GrantFiled: August 16, 2002Date of Patent: April 25, 2006Assignee: SUN Microsystems, Inc.Inventors: David M. Ungar, Mario I. Wolczko, Bernd J. W. Mathiske
-
Patent number: 7000099Abstract: A lookup operation is carried out on a data table by logically dividing the data table into a number of smaller sets of data that can be indexed with a single byte of data. Each set of data consists of two vectors, which constitute the operands for a permute instruction. Only a limited number of bits are required to index into the table during the execution of this instruction. The remaining bits of each index are used as masks into a series of select instructions. The select instruction chooses between two vector components, based on the mask, and places the selected components into a new vector. The mask is generated by shifting one of the higher order bits of the index to the most significant position, and then propagating that bit throughout a byte, for example by means of an arithmetic shift. This procedure is carried out for all of the index bytes in the vector, to generate a select mask.Type: GrantFiled: July 9, 2002Date of Patent: February 14, 2006Assignee: Apple Computer Inc.Inventor: Ali Sazegari
-
Patent number: 6963341Abstract: The present invention provides efficient ways to implement scan conversion and matrix transpose operations using vector multiplex operations in a SIMD processor. The present method provides a very fast and flexible way to implement different scan conversions, such as zigzag conversion, and matrix transpose for 2×2, 4×4, 8×8 blocks commonly used by all video compression and decompression algorithms.Type: GrantFiled: May 20, 2003Date of Patent: November 8, 2005Inventor: Tibet Mimar
-
Patent number: 6954927Abstract: A method for optimizing a software pipelineable loop in a software code is provided. The loop comprises one or more pipelined stages and one or more loop operations. The method comprises evaluating an initiation interval time (IN) for a pipelined stage of the loop. A loop operation time latency (Tld) and a number of loop operations (Np) from the pipelined stages to peel based on IN and Tld is then determined. The loop operation is peeled Np times and copied before the loop in the software code. A vector of registers is allocated and the results of the peeled loop operations and a result of an original loop operation is assigned to the vector of registers. Memory addresses for the results of the peeled loop operations and original loop operation are also assigned.Type: GrantFiled: October 4, 2001Date of Patent: October 11, 2005Assignee: Elbrus InternationalInventor: Alexander Y. Ostanevich
-
Patent number: 6931511Abstract: Methods and apparatuses for looking up vectors in parallel using vector table look up operations. In one aspect of the invention, a method to look up a plurality of data items indexed by a vector of indices includes: generating a second vector of indices in a vector register where each index of the second vector of indices is one of a first vector of indices and at least one index in the first vector of indices is replicated as a plurality of duplicated indices in the second vector of indices; and looking up simultaneously a first vector of data items from a plurality of look up tables using the second vector of indices.Type: GrantFiled: December 31, 2001Date of Patent: August 16, 2005Assignee: Apple Computer, Inc.Inventors: Steven Todd Weybrew, David Ligon, Ronald Gerard Langhi
-
Patent number: 6924802Abstract: A system, method, and computer program product are provided for generating display data. The data processing system loads coefficient values corresponding to a behavior of a selected function in pre-defined ranges of input data. The data processing system then determines, responsive to items of input data, the range of input data in which the selected function is to be estimated. The data processing system then selects, through the use of a vector permute function, the coefficient values, and evaluates an index function at the each of the items of input data. It then estimates the value of the selected function through parallel mathematical operations on the items of input data, the selected coefficient values, and the values of the index function, and, responsive to the one or more values of the selected function, generates display data.Type: GrantFiled: September 12, 2002Date of Patent: August 2, 2005Assignee: International Business Machines CorporationInventors: Gordon Clyde Fossum, Harm Peter Hofstee, Barry L. Minor, Mark Richard Nutter
-
Patent number: 6915411Abstract: A digital signal processor (DSP) includes a SIMD-based organization wherein operations are executed on a plurality of single-instruction multiple data (SIMD) datapaths or stages connected in cascade. The functionality and data values at each stage may be different, including a different width (e.g., a different number of bits per value) in each stage. The operands and destination for data in a computational datapath are selected indirectly through vector pointer registers in a vector pointers datapath. Each vector pointer register contains a plurality of pointers into a register file of a computational datapath.Type: GrantFiled: July 18, 2002Date of Patent: July 5, 2005Assignee: International Business Machines CorporationInventors: Jamie H. Moreno, Jeffrey Haskell Derby, Uzi Shvadron, Fredy Daniel Neeser, Victor Zyuban, Ayal Zaks, Shay Ben-David
-
Patent number: 6865517Abstract: A method, apparatus and computer product that enables a processor associated with a node in a computer system having various nodes, the nodes having sensors which provide data, and the nodes being connected by a communications facility acquiring local data from the sensor and remote data from other nodes via the data transfer facility. The nodes process data from a local sensor at the node and from remote sensors at other nodes; and analyze the local data, data from other nodes and local decisions made at and received from other nodes to make a local decision for action at the node. A local decision made at a node is in turn communicated to other nodes.Type: GrantFiled: December 11, 2002Date of Patent: March 8, 2005Assignee: International Business Machines CorporationInventors: David F. Bantz, John S. Davis, II, Rafah A. Hosn, Nicholas M. Mitchell, Veronique Perret, Daby M. Sow, Jeremy B. Sussman
-
Publication number: 20040250044Abstract: The object of the invention is to efficiently perform indirect index vector reference. An element register of a vector register or a scalar register specified in the “index” is divided into multiple areas, and a particular index vector is acquired by selecting any of the divided areas. Accordingly, it is possible to store substantially multiple index vectors in one vector register, and therefore register resources can be efficiently used. The procedure for providing index vectors is similar to that for providing one index vector, and therefore the code size and the process cycles of the program are almost not increased. That is, according to the present invention, indirect index vector reference can be more efficiently performed.Type: ApplicationFiled: March 17, 2004Publication date: December 9, 2004Applicant: SEIKO EPSON CORPORATIONInventor: Masakazu Isomura
-
Patent number: 6829696Abstract: A data processing system (e.g., microprocessor 30) for packing register data while storing it to memory and unpacking data read from memory while loading it into registers using single processor instructions. The system comprises a memory (42) and a central processing unit core (44) with at least one register file (76). The core is responsive to a load instruction (e.g., LDW_BH[U] instruction 184) to retrieve at least one data word from memory and parse the data word over selected parts of at least two data registers in the register file. The core is responsive to a store instruction (e.g., STBH_W instruction 198) to concatenate data from selected parts of at least two data registers into at least one data word and save the data word to memory. The number of data registers is greater than the number of data words parsed into or concatenated from the data registers. Both memory storage space and central processor unit resources are utilized efficiently when working with packed data.Type: GrantFiled: October 13, 2000Date of Patent: December 7, 2004Assignee: Texas Instruments IncorporatedInventors: Keith Balmer, Karl M. Guttag, Lewis Nardini
-
Publication number: 20040243788Abstract: The object of the invention is to efficiently perform a vector operation using a vector register. A vector processor is provided with a vector register forming a ring buffer, and any address of the ring buffer can be specified as the top address. Accordingly, when multiple vector data to be processed are overlapped, it is possible to circularly read or write the vector data stored in one vector register without storing the vector data in separate vector registers. Thus, it is possible to prevent the same data from being redundantly read as well as to decrease register resources to be required, thereby enabling an efficient vector operation using a vector register.Type: ApplicationFiled: March 17, 2004Publication date: December 2, 2004Applicant: SEIKO EPSON CORPORATIONInventor: Masakazu Isomura
-
Patent number: 6816960Abstract: A vector artchitecture processing unit according to the present invention comprises a vector scatter (VSC) address coincidence detection unit 3 that comprises registers in which an area start address and an area end address of an area specified by an area-specified vector scatter instruction are stored; and a circuit that checks if the addresses specified by the area-specified vector scatter instruction overlap with an address to be accessed by a memory access instruction following the area-specified vector scatter instruction, wherein an instruction issue control unit 1 comprises a hold control circuit that holds the following memory access instruction in response to an address conflict signal from the VSC address conflict detector.Type: GrantFiled: July 10, 2001Date of Patent: November 9, 2004Assignee: NEC CorporationInventor: Hisao Koyanagi
-
Patent number: 6813701Abstract: A compiler and vector data transfer instructions for use in a vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. The compiler identifies the use of vector data in an application program and implements one or more vector instructions for transferring the vector data between memory and registers used to perform calculations on the vector data. A vector is partitioned by the compiler into variable-sized streams which are transferred into and out of the processor as burst transactions. The compiler schedules transfers of vector streams required in a calculation so that calculations on a portion of the vector data are performed while a subsequent portion of the vector data is transferred. A vector buffer pool is partitioned into one or more vector buffers and each vector buffer is used at a specific time.Type: GrantFiled: August 17, 1999Date of Patent: November 2, 2004Assignee: NEC Electronics America, Inc.Inventor: Ahmad R. Ansari
-
Publication number: 20040193839Abstract: An active memory device includes a command engine that receives high level tasks from a host and generates corresponding sets of either DCU commands to a DRAM control unit or ACU commands to a processing array control unit. The DCU commands include memory addresses, which are also generated by the command engine, and the ACU command include instruction memory addresses corresponding to an address in an array control unit where processing array instructions are stored. The active memory device includes a vector processing and re-ordering system coupled to the array control unit and the memory device. The vector processing and re-ordering system re-orders data received from the memory device into a vector of contiguous data, process the data in accordance with an instruction received from the array control unit to provide results data, and passes the results data to the memory device.Type: ApplicationFiled: July 28, 2003Publication date: September 30, 2004Inventor: Graham Kirsch
-
Publication number: 20040186980Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.Type: ApplicationFiled: March 29, 2004Publication date: September 23, 2004Inventor: Ahmad R. Ansari
-
Patent number: 6795908Abstract: A method for processing scalar and vector executions, where vector executions may be “true” vector operations, CVA, or pseudo-vector operations, PVA. All three types of executions are processed using one architecture. In one embodiment, a compiler analyzes code to identify sections that are vectorizable, and applies either CVA, PVA, or a combination of the two to process these sections. Register overlay is provided for storing load address information and data in PVA mode. Within each CVA and PVA instruction, enable bits describe the data streaming function of the operation. A temporary memory, TM, accommodates variable size vectors, and is used in vector operations, similar to a vector register, to store temporary vectors.Type: GrantFiled: June 12, 2000Date of Patent: September 21, 2004Assignee: Freescale Semiconductor, Inc.Inventors: Lea Hwang Lee, William C. Moyer
-
Publication number: 20040181646Abstract: An apparatus and method are provided for updating one or more pluralities of pointers (i.e. one or more vector pointers) which are used for accessing one or more pluralities of data elements (i.e. one or more vector data elements) in a multi-ported memory. A first register file holds the vector pointers, a second register file holds stride data, and a plurality of functional units combine data from the second register file with data from the first register file. The results of combining the data are transferred to the first register file and represent updated vector pointers. Furthermore, a third register file is provided for holding modulus selector data to specify the size of a circular buffer for circular addressing.Type: ApplicationFiled: March 14, 2003Publication date: September 16, 2004Applicant: International Business Machines CorporationInventors: Shay Ben-David, Jeffrey Haskell Derby, Thomas W. Fox, Fredy Daniel Neeser, Jaime H. Moreno, Uzi Shvadron, Ayal Zaks
-
Patent number: 6789181Abstract: A method and computer for executing the method. A source program is translated into an object program, in a manner in which the translated object program has a different execution behavior than the source program. The translated object program is executed under a monitor capable of detecting any deviation from fully-correct interpretation before any side-effect of the different execution behavior is irreversibly committed. When the monitor detects the deviation, or when an interrupt occurs during execution of the object program, a state of the program is established corresponding to a state that would have occurred during an execution of the source program, and from which execution can continue. Execution of the source program continues primarily in a hardware emulator designed to execute instructions of an instruction set non-native to the computer.Type: GrantFiled: November 3, 1999Date of Patent: September 7, 2004Assignee: ATI International, SRLInventors: John S. Yates, David L. Reese, Korbin S. Van Dyke, Paul H. Hohensee
-
Publication number: 20040172517Abstract: An orthogonal data converter for converting the components of a sequential vector component flow to a parallel vector component flow. The data converter has an input rotator configured to rotate corresponding vector components of the sequential vector component flow by a prescribed amount, and a bank of register files configured to store the rotated vector components. The converter also has an output rotator configured to rotate the position of the vector components read from the bank of register files by a prescribed amount. A controller of the converter is operative to control the addressing of the bank of register files and the rotating of the vector components. In this regard, the controller is operative to write the vector components to the bank of register files in a prescribed order and read the vector components in a prescribed order to generate the parallel vector component flow.Type: ApplicationFiled: September 19, 2003Publication date: September 2, 2004Inventors: Boris Prokopenko, Timour Paltashev