Vector Processor Operation Patents (Class 712/7)
  • Patent number: 6665790
    Abstract: A system and method for processing operations that use data vectors each comprising a plurality of data elements, in accordance with the present invention, includes a vector data file comprising a plurality of storage elements for storing data elements of the data vectors. A pointer array is coupled by a bus to the vector data file. The pointer array includes a plurality of entries wherein each entry identifies at least one storage element in the vector data file. The at least one storage element stores at least one data element of the data vectors, wherein for at least one particular entry in the pointer array, the at least one storage element identified by the particular entry has an arbitrary starting address in the vector data file.
    Type: Grant
    Filed: February 29, 2000
    Date of Patent: December 16, 2003
    Assignee: International Business Machines Corporation
    Inventors: Clair John Glossner, III, Erdem Hokenek, David Meltzer, Mayan Moudgill
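The indirection described in this abstract can be illustrated with a minimal sketch. It assumes the vector data file is a flat list and each pointer-array entry is a starting address selecting a few consecutive storage elements. The name `read_vector`, the `elems_per_entry` parameter, and the wraparound at the end of the file are hypothetical choices for illustration, not details taken from the patent.

```python
def read_vector(data_file, pointer_array, elems_per_entry=1):
    """Gather a logical vector through a pointer array.

    Each pointer-array entry holds a starting address into the data file and
    selects `elems_per_entry` consecutive storage elements from there.
    """
    out = []
    for start in pointer_array:
        for offset in range(elems_per_entry):
            # Wrapping around the end of the file is an assumption of this sketch.
            out.append(data_file[(start + offset) % len(data_file)])
    return out


data_file = list(range(100, 116))   # 16 storage elements holding 100..115
pointers = [5, 0, 14]               # arbitrary, unaligned starting addresses
vector = read_vector(data_file, pointers, elems_per_entry=2)
```

Because every entry carries its own starting address, the logical vector need not be contiguous or aligned in the data file.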
  • Patent number: 6636828
    Abstract: The coefficient matrix, corresponding to the simultaneous linear equations to be solved, is divided into a plurality of row sets. The row sets as divided are processed in a parallel fashion, and entries specifying the nonzero elements contained in the first to nth row sets are added to the entry sets E1 to En. Moreover, in regard to each row set, fill-ins which take place at the time of eliminating the ith variable are obtained in a parallel fashion, and entries specifying the fill-ins are added to the entry sets E1 to En. The coefficient matrix is compressed using those entry sets E1 to En.
    Type: Grant
    Filed: May 10, 1999
    Date of Patent: October 21, 2003
    Assignee: NEC Electronics Corp.
    Inventor: Koutaro Hachiya
  • Publication number: 20030163667
    Abstract: A vector processing system for executing vector instructions, each instruction defining multiple pairs of values, an operation to be executed on each of said value pairs and a scalar modifier, the vector processing system comprising a plurality of parallel processing units, each arranged to receive one of said pairs of values and to implement the defined operation on said value pair to generate a respective result; and a scalar result unit for receiving the results of the parallel processing units and for using said results in a manner defined by the scalar modifier to generate a single output value for said instruction.
    Type: Application
    Filed: October 31, 2002
    Publication date: August 28, 2003
    Applicant: ALPHAMOSAIC LIMITED
    Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
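As a rough illustration of the instruction form in this abstract, the sketch below applies one operation to every value pair and then folds the per-lane results into a single output. The function name and the choice of `sum` and `max` as scalar modifiers are assumptions made for the example; the application does not list specific modifiers in its abstract.

```python
import operator


def vector_op_with_scalar_result(pairs, op, scalar_modifier):
    # Each "parallel processing unit" handles one value pair.
    lane_results = [op(a, b) for a, b in pairs]
    # The "scalar result unit" folds the lane results into a single value.
    return scalar_modifier(lane_results)


pairs = [(1, 4), (2, 5), (3, 6)]
dot = vector_op_with_scalar_result(pairs, operator.mul, sum)      # 4 + 10 + 18
largest = vector_op_with_scalar_result(pairs, operator.sub, max)  # every lane is -3
```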
  • Publication number: 20030159016
    Abstract: A data processor comprising: a register memory comprising an array of memory cells extending in two dimensions, the cells being located on rows in the first dimension and columns in the second dimension, each cell being addressable by means of an instruction specifying a pair of coordinates that identify the row and column of the cell in the array; and a processing unit capable of executing instructions that operate on a plurality of memory cells in the register, the instructions identifying the plurality of cells by means of a first instruction part specifying a pair of coordinates that identify a first cell in the array, and a second instruction part that identifies the configuration of the plurality of cells relative to the first cell; the data processor being arranged to interpret a first form of second instruction part as specifying a first group of cells all of which are located in the same row but in different columns, and to interpret a second form of second instruction part as specifying a first grou
    Type: Application
    Filed: October 31, 2002
    Publication date: August 21, 2003
    Applicant: ALPHAMOSAIC LIMITED
    Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
  • Patent number: 6571386
    Abstract: An optimizer (100) comprises a memory (110) and a processor (130). The memory stores a program (200) to be optimized and optimization software (301). Controlled by the optimization software, the processor (120) (a) determines local vectors (“local”) in instructions of the program (200) which indicate the use of resources by the instructions (use-vectors, exh-vectors); (b) scans the program (200) for Single-Entry-Single-Exit (SESE) structures (U, T, V, S); and (c) determines SESE vectors from the local vectors. The SESE vectors indicate the use of resources by the SESE structures and can be combined by the optimizer to obtain a program vector. When some instructions are modified, the optimizer (100) only re-calculates the SESE vector of the corresponding SESE and re-combines the old SESE vector with the modified SESE vector to determine a new program vector.
    Type: Grant
    Filed: March 13, 2000
    Date of Patent: May 27, 2003
    Assignee: Motorola, Inc.
    Inventors: Mikhail Figurin, Mikhail Okrugin, Dmitriy Barmenkov
  • Patent number: 6557097
    Abstract: A processing engine 10 provides computation of an output vector as a linear combination of N input vectors with N coefficients in an efficient manner. The processing engine includes a coefficient register 940 for holding a representation of each of N coefficients of a first input vector. A test unit 950 is provided for testing selected parts (e.g. bits) of the coefficient register for respective coefficient representations. An arithmetic unit 970 computes respective coordinates of an output vector by selective addition/subtraction of coordinates of a second input vector dependent on results of the coefficient representation tests. Power consumption can be kept low due to the use of a coefficient test operation in parallel with an ALU operation.
    Type: Grant
    Filed: October 1, 1999
    Date of Patent: April 29, 2003
    Assignee: Texas Instruments Incorporated
    Inventors: Gael Clave, Karim Djafarian, Gilbert Laurenti
  • Patent number: 6553486
    Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor by one or more application programs in a computer system. A compiler identifies the use of vector data in the application program and implements one or more vector instructions for transferring the vector data between memory and registers used to perform calculations on the vector data. The compiler also schedules transfers of portions of the vector data required in a calculation so that calculations on a portion of the vector data are performed while a subsequent portion of the vector data is transferred. A vector buffer pool is partitioned into one or more vector buffers based on configuration information including the number of vectors buffers required by an application program and the size required for each vector buffer. The vector buffers are allocated for exclusive use by an application program that is executing in the data processor.
    Type: Grant
    Filed: August 17, 1999
    Date of Patent: April 22, 2003
    Assignee: NEC Electronics, Inc.
    Inventor: Ahmad R. Ansari
  • Publication number: 20030070059
    Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by, e.g., steering each to one of two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.
    Type: Application
    Filed: May 30, 2001
    Publication date: April 10, 2003
    Inventors: William J. Dally, Scott Rixner, John Owens, Ujval J. Kapasi
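The data-steering idea in this abstract can be sketched as follows: a condition vector routes each element's index to one of two index vectors, after which each group is processed in a straight-line loop with no per-element branch. The function name and the sample workload (square roots for non-negative values, negation for the rest) are assumptions for illustration only.

```python
def split_by_condition(condition):
    """Steer each lane index to one of two index vectors."""
    true_idx = [i for i, c in enumerate(condition) if c]
    false_idx = [i for i, c in enumerate(condition) if not c]
    return true_idx, false_idx


values = [3, -1, 4, -1, 5, -9]
condition = [v >= 0 for v in values]        # the condition vector
pos_idx, neg_idx = split_by_condition(condition)

# Each group is now processed without any conditional inside the loop body.
roots = [values[i] ** 0.5 for i in pos_idx]
flipped = [-values[i] for i in neg_idx]
```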
  • Publication number: 20030037221
    Abstract: An improved processor implementation is described in which scalar and vector processing components are merged to reduce complexity. In particular, the implementation includes a scalar-vector register file for storing scalar and vector instructions, as well as a parallel vector unit comprising functional units that can process vector or scalar instructions as required. A further aspect of the invention provides the ability to disable unused functional units in the parallel vector unit, such as during a scalar operation, to achieve significant power savings.
    Type: Application
    Filed: August 14, 2001
    Publication date: February 20, 2003
    Applicant: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, Harm Peter Hofstee, Martin Edward Hopkins
  • Patent number: 6513107
    Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.
    Type: Grant
    Filed: August 17, 1999
    Date of Patent: January 28, 2003
    Assignee: NEC Electronics, Inc.
    Inventor: Ahmad R. Ansari
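The address check in this abstract can be sketched under several assumptions. The abstract does not give the exact formula, so the one below (stride counted in elements, width in bytes, end address pointing at the last byte touched) is a guess, as are the 4096-byte page size and the function names. The shift is applied here when the element width is a power of two, which is one reading of the abstract's "divisible by a factor of two" condition.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def burst_end_address(start, num_elements, stride, width):
    """Address of the last byte touched (stride in elements, width in bytes)."""
    if width & (width - 1) == 0:
        # Power-of-two width: the multiply becomes a left shift.
        step = stride << (width.bit_length() - 1)
    else:
        step = stride * width
    return start + (num_elements - 1) * step + width - 1


def same_page(start, end, page_size=PAGE_SIZE):
    return start // page_size == end // page_size


start = 0x1F00
end = burst_end_address(start, num_elements=8, stride=2, width=4)
fits_in_one_page = same_page(start, end)
```

A transfer that fails the `same_page` check would presumably need to be split or translated again at the page boundary; the abstract does not say which.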
  • Patent number: 6470440
    Abstract: An apparatus for compare and maximum/minimum and a method therefor are implemented. Selection circuitry selects a data value signal for outputting between a pair of vector operands and “true” and “false” comparison value signals for the corresponding operand data type. Each input operand may include a plurality of subvector operands of a preselected data type, each data type having a corresponding length. The selection circuitry selects the data value signal in response to a plurality of second signals. The second signals are generated from carry-out signals from the subvector operands, and first signals that are generated using instruction information for the executing instruction. The second signals may be generated by logically combining the first signals with carry propagate, carry generate and carry-out signals from carry lookahead logic receiving the subvector operands as input.
    Type: Grant
    Filed: May 20, 1999
    Date of Patent: October 22, 2002
    Assignee: International Business Machines Corporation
    Inventors: Huy Van Nguyen, Charles Philip Roth
  • Patent number: 6446193
    Abstract: A method and apparatus for reducing instruction cycles in a digital signal processor wherein the processor includes a multiplier unit, an adder, a memory, and at least one pair of first and second accumulators. The accumulators include respective guard, high and low parts. The method and apparatus enable vectoring the respective first and second high parts from the accumulators to define a single vectored register responsive to a single instruction cycle and processing the data in the vectored register.
    Type: Grant
    Filed: September 8, 1997
    Date of Patent: September 3, 2002
    Assignee: Agere Systems Guardian Corp.
    Inventors: Mazhar M. Alidina, Sivanand Simanapalli, Larry R. Tate
  • Patent number: 6401194
    Abstract: A vector processor provides a data path divided into smaller slices of data, with each slice processed in parallel with the other slices. Furthermore, an execution unit provides smaller arithmetic and functional units chained together to execute more complex microprocessor instructions requiring multiple cycles by sharing single-cycle operations, thereby reducing both costs and size of the microprocessor. One embodiment handles 288-bit data widths using 36-bit data path slices. Another embodiment executes integer multiply and multiply-and-accumulate and floating point add/subtract and multiply operations using single-cycle arithmetic logic units. Other embodiments support 8-bit, 9-bit, 16-bit, and 32-bit integer data types and 32-bit floating data types.
    Type: Grant
    Filed: January 28, 1997
    Date of Patent: June 4, 2002
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Le Trong Nguyen, Heonchul Park, Roney S. Wong, Ted Nguyen, Edward H. Yu
  • Publication number: 20020062435
    Abstract: A multi-streaming processor has multiple streams for processing multiple threads, and an instruction scheduler including a priority record of priority codes for one or more of the streams. The priority codes determine in some embodiments relative access to resources as well as which stream has access at any point in time. In other embodiments priorities are determined dynamically and altered on-the-fly, which may be done by various criteria, such as on-chip processing statistics, by executing one or more priority algorithms, by input from off-chip, according to stream loading, or by combinations of these and other methods. In one embodiment a special code is used for disabling a stream, and streams may be enabled and disabled dynamically by various methods, such as by on-chip events, processing statistics, input from off-chip, and by processor interrupts. Some specific applications are taught, including for IP-routers and digital signal processors.
    Type: Application
    Filed: December 16, 1998
    Publication date: May 23, 2002
    Inventors: Mario D. Nemirovsky, Adolfo M. Nemirovsky, Narendra Sankar
  • Patent number: 6385632
    Abstract: A system and method for evaluating one or more functions using a succession of CORDIC stages/iterations followed by a residual rotation. The succession of CORDIC stages are preferably partitioned into (a) a Z path which operates on an input angle and generates an output angle, and (b) an X/Y path which operates on an input point and generates an output point. The residual rotation rotates the output point by the output angle to generate a resultant point using a small angle approximation for sine and an accurate evaluation for sine of the output angle. The number of CORDIC stages in the succession is chosen so that the error in the coordinates of the resultant point induced by the approximation of sine is smaller than a desired amount. In particular, the number of CORDIC stages in the succession is chosen to be greater than or equal to (N+1)/3 in order to guarantee N bits of precision in coordinates of the resultant point. The Z path has a propagation time which is smaller than the X/Y path.
    Type: Grant
    Filed: June 18, 1999
    Date of Patent: May 7, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gwangwoo Choe, James R. MacDonald
  • Patent number: 6385633
    Abstract: The phase of a complex number I+jQ is computed using a hybrid lookup table and computation approach suitable for DSP implementation and useful in remote access/networking and wireless applications. An approximate phase θ˜ for an approximation complex number I˜+jQ˜ is determined through memory table lookup. This is added to a correction phase Δθ which is determined by calculation of a correction term C=(I˜*Q−Q˜*I)/(I*I˜+Q*Q˜) which represents the imaginary part divided by the real part of the complex multiplication of the complex number and the conjugate of the approximate complex number.
    Type: Grant
    Filed: June 30, 1999
    Date of Patent: May 7, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Timothy M. Schmidl
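A small sketch can make the hybrid approach concrete. The correction term C is taken directly from the abstract; everything else is an assumption for illustration: the 64-entry table, the nearest-entry search by largest dot product (a stand-in for the patent's memory lookup), and the small-angle step that uses C itself as the correction phase, since tan(x) is close to x near zero.

```python
import math

TABLE_SIZE = 64  # assumed table resolution
TABLE = [
    (math.cos(t), math.sin(t), t)
    for t in (2 * math.pi * k / TABLE_SIZE - math.pi for k in range(TABLE_SIZE))
]


def hybrid_phase(i, q):
    """Approximate the phase of a nonzero complex number i + jq."""
    # Lookup step (stand-in): pick the table direction with the largest dot product.
    i_a, q_a, theta_a = max(TABLE, key=lambda e: i * e[0] + q * e[1])
    # Correction term from the abstract: Im/Re of (I + jQ) * conj(I~ + jQ~).
    c = (i_a * q - q_a * i) / (i * i_a + q * q_a)
    # Small-angle assumption: the correction phase is taken to be C itself.
    return theta_a + c


estimate = hybrid_phase(2.5 * math.cos(0.3), 2.5 * math.sin(0.3))
```

With 64 table entries the residual angle is at most π/64, so the small-angle error stays below about 4e-5 radians. This sketch does not unwrap results near ±π.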
  • Patent number: 6336179
    Abstract: A first counter sequentially counts a plurality of numbers from respective sources requesting transfer of data. Each of the numbers represents an amount of isochronous data to transfer over the bus from the respective ones of the sources during a frame on a bus. A count value in a second counter is selectably incremented when the first counter is counting, to provide a remaining count value indicative of a remaining amount of data to transfer during the frame. The remaining count value in the second counter is decremented for each isochronous transfer on the bus after the remaining amount of data to transfer has been determined from all sources requesting transfer of isochronous data during the frame. A third counter tracks the time remaining in the frame and compares the remaining count value to the time remaining in the frame to determine a priority mode on the bus.
    Type: Grant
    Filed: August 21, 1998
    Date of Patent: January 1, 2002
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Dale E. Gulick
  • Patent number: 6324600
    Abstract: A method and an apparatus for controlling movement of data between any host and any network including a set of devices in a computing system environment having a main memory with a queuing mechanism having a plurality of queues capable of being shared between a plurality of independent processes running on at least one host and at least one I/O adapter. A finite-state machine (FSM) is provided in the main memory and the FSM is divided into two disjoint sets of states, one of which represents state-values processed by the host and set by the adapter, and the other of which represents state-values processed by the adapter and set by the host. Using each of these sets of states, free-running, non-deadlocking processes are provided within the host and the adapter so that the processes sequence circularly and continuously through a vector related to the FSMs.
    Type: Grant
    Filed: February 19, 1999
    Date of Patent: November 27, 2001
    Assignee: International Business Machines Corporation
    Inventors: Frank W. Brice, Richard P. Tarcza, Leslie W. Wyman
  • Patent number: 6324638
    Abstract: A processor capable of executing vector instructions includes at least an instruction sequencing unit and a vector processing unit that receives vector instructions to be executed from the instruction sequencing unit. The vector processing unit includes a plurality of multiply structures, each containing only a single multiply array, that each correspond to at least one element of a vector input operand. Utilizing the single multiply array, each of the plurality of multiply structures is capable of performing a multiplication operation on one element of a vector input operand and is also capable of performing a multiplication operation on multiple elements of a vector input operand concurrently. In an embodiment in which the maximum length of an element of a vector input operand is N bits, each of the plurality of multiply arrays can handle both N by N bit integer multiplication and M by M bit integer multiplication, where N is a non-unitary integer multiple of M.
    Type: Grant
    Filed: March 31, 1999
    Date of Patent: November 27, 2001
    Assignee: International Business Machines Corporation
    Inventors: Thomas Elmer, Michael Putrino
  • Patent number: 6219073
    Abstract: In a data processing apparatus, data is transferred in accordance with a meta-instruction embedded in the data. To put it in detail, first of all, a first meta-instruction is read out from an address ADDR0 stored in a tag address register Dn_TADR. Then, data following the first meta-instruction with a length specified by the meta-instruction is transferred. Subsequently, a second meta-instruction stored at an address ADDR2 specified in the first meta-instruction is read out and data following the second meta-instruction with a length specified by the second meta-instruction is transferred. A third next meta-instruction stored at an address ADDR1 specified in the second meta-instruction is further read out and data following the third meta-instruction with a length specified by the third meta-instruction is then transferred.
    Type: Grant
    Filed: March 25, 1998
    Date of Patent: April 17, 2001
    Assignee: Sony Computer Entertainment, Inc.
    Inventor: Masakazu Suzuoki
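The chained transfer in this abstract behaves much like walking a linked list. In the sketch below, each meta-instruction is modelled as a `(length, next_address)` tuple stored in memory immediately before its data. The tuple layout, the use of `None` to end the chain, and the concrete addresses are assumptions for illustration; only the ADDR0 to ADDR2 to ADDR1 ordering follows the abstract.

```python
def run_transfer_chain(memory, start_addr):
    """Follow embedded meta-instructions, transferring the data after each one."""
    transferred = []
    addr = start_addr
    while addr is not None:
        length, next_addr = memory[addr]               # the meta-instruction (tag)
        transferred.extend(memory[addr + 1 : addr + 1 + length])
        addr = next_addr                               # None ends the chain (assumed)
    return transferred


memory = [None] * 12
ADDR0, ADDR1, ADDR2 = 0, 4, 8
memory[ADDR0] = (2, ADDR2)
memory[1:3] = ["a0", "a1"]
memory[ADDR2] = (3, ADDR1)
memory[9:12] = ["c0", "c1", "c2"]
memory[ADDR1] = (1, None)
memory[5:6] = ["b0"]

stream = run_transfer_chain(memory, ADDR0)
```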
  • Patent number: 6212622
    Abstract: A processor employs ordering dependencies for load instruction operations upon store address instruction operations. The processor divides store operations into store address instruction operations and store data instruction operations. The store address instruction operations generate the address of the store, and the store data instruction operations route the corresponding data to the load/store unit. The processor maintains a store address dependency vector indicating each of the outstanding store addresses and records ordering dependencies upon the store address instruction operations for each load instruction operation. Accordingly, the load instruction operation is not scheduled until each prior store address instruction operation has been scheduled. Store addresses are available for dependency checking against the load address upon execution of the load instruction operation. If a memory dependency exists, it may be detected upon execution of the load instruction operation.
    Type: Grant
    Filed: August 24, 1998
    Date of Patent: April 3, 2001
    Assignee: Advanced Micro Devices, Inc.
    Inventor: David B. Witt
  • Patent number: 6212618
    Abstract: A method and apparatus for including in a processor, instructions for performing multiply-intra-add operations on packed data is described. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first and a second packed data. The processor performs operations on data elements in the first packed data and the second packed data to generate a plurality of data elements in a third packed data in response to receiving an instruction. At least two of the plurality of data elements in the third packed data store the result of multiply-intra-add operations.
    Type: Grant
    Filed: March 31, 1998
    Date of Patent: April 3, 2001
    Assignee: Intel Corporation
    Inventor: Patrice L. Roussel
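One plausible reading of "multiply-intra-add" is sketched below: corresponding elements of the two packed operands are multiplied, and adjacent products within the same result are then summed. The pairing of neighbouring lanes and the function name are assumptions; the abstract does not spell out which products are added together.

```python
def multiply_intra_add(a, b):
    """Multiply lane-wise, then sum adjacent products (assumed pairing)."""
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("operands must have the same, even number of elements")
    products = [x * y for x, y in zip(a, b)]
    return [products[k] + products[k + 1] for k in range(0, len(products), 2)]


result = multiply_intra_add([1, 2, 3, 4], [5, 6, 7, 8])  # [1*5 + 2*6, 3*7 + 4*8]
```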
  • Patent number: 6195747
    Abstract: A system and method for reducing data traffic between the processor and the system controller in a data processing system during the execution of a vector or matrix instruction. When the processor receives an operation command requiring that a large quantity of data be processed, the processor issues a local operation request containing the desired operation, addressing information of the operands and a destination location for the result to the system. The system controller includes a local operation unit for locally executing the local operation request issued from the processor. Because the operand data associated with the operation need not be transferred over the system bus connected between the processor and the system controller, the data traffic between the processor and the system controller is reduced.
    Type: Grant
    Filed: September 28, 1998
    Date of Patent: February 27, 2001
    Assignee: Mentor Arc Inc.
    Inventor: Chien-Tzu Hou
  • Patent number: 6175907
    Abstract: An apparatus and method for calculating a square root of an operand in a microprocessor are provided. The microprocessor has a plurality of square root instructions, each of which specifies a square root calculation precision. The apparatus includes translation logic and execution logic. The translation logic decodes the square root macro instruction into a plurality of prescribed-precision machine instructions according to the square root calculation precision specified by the plurality of square root instructions. The execution logic, coupled to the translation logic, receives the plurality of prescribed-precision machine instructions and calculates the square root of the operand according to the specified square root calculation precision. At least one of the plurality of square root instructions specifies the square root calculation precision such that less significant bits are calculated in the square root than are provided in the operand.
    Type: Grant
    Filed: July 17, 1998
    Date of Patent: January 16, 2001
    Assignee: IP First, L.L.C
    Inventors: Timothy A. Elliott, G. Glenn Henry
  • Patent number: 6061777
    Abstract: One aspect of the invention relates to a method for operating a processor. In one version of the invention, the method includes the steps of dispatching an instruction; determining a presently architected RMAP entry for the architectural register targeted by the dispatched instruction; selecting the RMAP entries which are associated with physical registers that contain operands for the dispatched instruction; updating a use indicator in the selected RMAP entries; determining whether the dispatched instruction is interruptible; and updating an architectural indicator and a historical indicator in the presently architected RMAP entry if the dispatched instruction is uninterruptible.
    Type: Grant
    Filed: October 28, 1997
    Date of Patent: May 9, 2000
    Assignee: International Business Machines Corporation
    Inventors: Hoichi Cheong, Paul Joseph Jordan, Hung Qui Le, Soummya Mallick
  • Patent number: 6058465
    Abstract: A vector processor architecture provides vector registers of fixed size having data elements of programmable size and type. The type and size for data elements are defined by instructions which manipulate operands associated with the vector registers. The data size defined by an instruction determines the number of the data elements in a vector register and the number of parallel operations performed to complete the instruction. One embodiment of the invention supports 8-bit, 9-bit, 16-bit, and 32-bit data element sizes of integer type for all sizes and floating point data type for the 32-bit data elements.
    Type: Grant
    Filed: August 19, 1996
    Date of Patent: May 2, 2000
    Inventor: Le Trong Nguyen
  • Patent number: 6047372
    Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU can be coupled either through a coprocessor bus or a local CPU bus to a conventional processor. The MEU employs vector registers, a vector ALU, and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers can be easily routed and combined at the vector ALU. The vector instructions employ special load/store instructions in combination with numerous operational instructions to carry out concurrent multimedia operations on the aligned operands.
    Type: Grant
    Filed: April 13, 1999
    Date of Patent: April 4, 2000
    Assignees: Compaq Computer Corp., Advanced Micro Devices, Inc.
    Inventors: John S. Thayer, Brian E. Longhenry, John G. Favor, Frederick D. Weber
  • Patent number: 6006315
    Abstract: A method is provided for writing a scalar value to a vector V1 without reading the vector from a storage device. A scalar value to be written into the vector at a specified position and a scalar value (index) representing such position are read from a storage device into an Arithmetic Logic Unit (ALU) of a vector processor. The ALU then generates another vector V2 having multiple copies of the scalar value to be written into V1. The ALU also generates a mask representing the index. The vector V2 is then delivered to the storage storing V1, but the mask is applied so that only one or more, but not all, copies of the scalar value are written from V2 to the storage. The rest of the vector V1 remains unchanged. The invention reduces register file read contention. Furthermore, if the updated V1 (i.e. V1 having the scalar value) is to be used in the next instruction, a copy of V1 is read from the storage and is updated from V2 and the mask, simultaneously with V1 being updated in the storage.
    Type: Grant
    Filed: October 18, 1996
    Date of Patent: December 21, 1999
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Heonchul Park
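The broadcast-and-mask write described in this abstract can be sketched in a few lines. The list standing in for the register storage and the function name `write_scalar` are assumptions for illustration; the point is that V1 is never read, only selectively overwritten.

```python
def write_scalar(storage, scalar, index):
    """Insert `scalar` at `index` using a broadcast vector and a write mask."""
    v2 = [scalar] * len(storage)                           # V2: copies of the scalar
    mask = [lane == index for lane in range(len(storage))]  # mask derived from the index
    for lane, enabled in enumerate(mask):
        if enabled:
            storage[lane] = v2[lane]                       # masked write; V1 is not read
    return storage


v1 = [10, 20, 30, 40]
write_scalar(v1, 99, 2)
```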
  • Patent number: 5996057
    Abstract: The data processing system of the present invention loads three input operands, including two input vectors and a control vector, into vector registers and performs a permutation of the two input vectors as specified by the control vector, and further stores the result of the operation as the output operand in an output register. The control vector consists of sixteen indices, each uniquely identifying a single byte of input data in either of the input registers, and can be specified in the operational code or be the result of a computation previously performed within the vector registers. The specification of the control vector allows a vector-matrix operation to be performed on the input vectors by rearranging or replicating the input operand bytes in the bytes of the output register as a function of the control vector.
    Type: Grant
    Filed: April 17, 1998
    Date of Patent: November 30, 1999
    Assignees: Apple, IBM, Motorola Inc.
    Inventors: Hunter Ledbetter Scales, III, Keith Everett Diefendorff, Brett Olsson, Pradeep Kumar Dubey, Ronald Ray Hochsprung
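The permutation in this abstract can be sketched as a byte-level table lookup: the two 16-byte inputs form a 32-byte pool, and each of the sixteen control indices picks one byte from it. Masking each index to its low five bits is an assumption of this sketch, and the function name `permute` is hypothetical.

```python
def permute(a, b, control):
    """Select 16 output bytes from the 32 bytes of a followed by b."""
    source = bytes(a) + bytes(b)
    # Only the low 5 bits of each index are used (assumed), giving 0..31.
    return bytes(source[idx & 0x1F] for idx in control)


a = b"ABCDEFGHIJKLMNOP"
b = b"abcdefghijklmnop"

# Rearranging: interleave the first eight bytes of each input.
interleave = [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23]
mixed = permute(a, b, interleave)

# Replicating: copy the last byte of b into every output position.
splat = permute(a, b, [31] * 16)
```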
  • Patent number: 5991865
    Abstract: A routable operand and selectable operation processor multimedia extension unit is employed to motion compensate MPEG video using improved vector processing. A vector processing unit executes an add and divide instruction that adds two vector registers and divides the result in a single instruction. This is implemented through loading a first vector register with a first plurality of elements from a source block. A second vector register is then loaded with a second plurality of elements that are adjacent to the first plurality of elements. The add and divide instruction is then executed on the first and second vector registers, yielding an interpolated source element that is stored in a resultant vector register.
    Type: Grant
    Filed: December 31, 1996
    Date of Patent: November 23, 1999
    Assignee: Compaq Computer Corporation
    Inventors: Brian E. Longhenry, Gary W. Thome, John S. Thayer
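The interpolation step in this abstract amounts to averaging each source element with its neighbour. The sketch below assumes a divisor of two with truncating integer division; the abstract only says the instruction "adds ... and divides", so the divisor, the rounding behaviour, and the function name are all assumptions.

```python
def add_and_divide(v1, v2):
    """Lane-wise (x + y) // 2, modelling a fused add-and-divide instruction."""
    return [(x + y) // 2 for x, y in zip(v1, v2)]


row = [10, 20, 30, 40, 50]        # one row of source-block samples
first = row[:-1]                  # first plurality of elements
adjacent = row[1:]                # elements adjacent to the first plurality
half_pel = add_and_divide(first, adjacent)
```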