Vector Processor Operation Patents (Class 712/7)
-
Patent number: 6665790
Abstract: A system and method for processing operations that use data vectors each comprising a plurality of data elements, in accordance with the present invention, includes a vector data file comprising a plurality of storage elements for storing data elements of the data vectors. A pointer array is coupled by a bus to the vector data file. The pointer array includes a plurality of entries wherein each entry identifies at least one storage element in the vector data file. The at least one storage element stores at least one data element of the data vectors, wherein for at least one particular entry in the pointer array, the at least one storage element identified by the particular entry has an arbitrary starting address in the vector data file.
Type: Grant
Filed: February 29, 2000
Date of Patent: December 16, 2003
Assignee: International Business Machines Corporation
Inventors: Clair John Glossner, III, Erdem Hokenek, David Meltzer, Mayan Moudgill
-
Patent number: 6636828
Abstract: The coefficient matrix, corresponding to the simultaneous linear equations to be solved, is divided into a plurality of row sets. The row sets as divided are processed in a parallel fashion, and entries specifying the nonzero elements contained in the first to nth row sets are added to the entry sets E1 to En. Moreover, in regard to each row set, fill-ins which take place at the time of eliminating the ith variable are obtained in a parallel fashion, and entries specifying the fill-ins are added to the entry sets E1 to En. The coefficient matrix is compressed using those entry sets E1 to En.
Type: Grant
Filed: May 10, 1999
Date of Patent: October 21, 2003
Assignee: NEC Electronics Corp.
Inventor: Koutaro Hachiya
-
Publication number: 20030163667
Abstract: A vector processing system for executing vector instructions, each instruction defining multiple pairs of values, an operation to be executed on each of said value pairs and a scalar modifier, the vector processing system comprising a plurality of parallel processing units, each arranged to receive one of said pairs of values and to implement the defined operation on said value pair to generate a respective result; and a scalar result unit for receiving the results of the parallel processing units and for using said results in a manner defined by the scalar modifier to generate a single output value for said instruction.
Type: Application
Filed: October 31, 2002
Publication date: August 28, 2003
Applicant: ALPHAMOSAIC LIMITED
Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
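The data flow in this abstract (one operation applied to every value pair, then a scalar modifier that folds the per-lane results into a single output) can be sketched as follows. This is only an illustrative model: the function name `vector_with_scalar_modifier` and the choice of Python callables for the operation and the modifier are assumptions, not anything taken from the application.

```python
import operator


def vector_with_scalar_modifier(pairs, op, modifier):
    """Apply `op` to each value pair, then reduce the lane results.

    `pairs`    : list of (a, b) tuples, one per (hypothetical) parallel unit
    `op`       : two-argument callable standing in for the defined operation
    `modifier` : callable that folds the list of results into one value,
                 standing in for the scalar modifier
    """
    # Each "parallel processing unit" handles exactly one pair.
    lane_results = [op(a, b) for a, b in pairs]
    # The "scalar result unit" produces a single output value.
    return modifier(lane_results)


pairs = [(1, 4), (2, 5), (3, 6)]

# Multiply per lane, then sum: a dot product.
dot = vector_with_scalar_modifier(pairs, operator.mul, sum)

# Absolute difference per lane, then take the maximum.
max_diff = vector_with_scalar_modifier(pairs, lambda a, b: abs(a - b), max)
```

With multiply as the operation and sum as the modifier, the single instruction behaves like a dot product; swapping the modifier changes the reduction without touching the per-lane work.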
-
Publication number: 20030159016
Abstract: A data processor comprising: a register memory comprising an array of memory cells extending in two dimensions, the cells being located on rows in the first dimension and columns in the second dimension, each cell being addressable by means of an instruction specifying a pair of coordinates that identify the row and column of the cell in the array; and a processing unit capable of executing instructions that operate on a plurality of memory cells in the register, the instructions identifying the plurality of cells by means of a first instruction part specifying a pair of coordinates that identify a first cell in the array, and a second instruction part that identifies the configuration of the plurality of cells relative to the first cell; the data processor being arranged to interpret a first form of second instruction part as specifying a first group of cells all of which are located in the same row but in different columns, and to interpret a second form of second instruction part as specifying a first grou
Type: Application
Filed: October 31, 2002
Publication date: August 21, 2003
Applicant: ALPHAMOSAIC LIMITED
Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
-
Patent number: 6571386
Abstract: An optimizer (100) comprises a memory (110) and a processor (130). The memory stores a program (200) to be optimized and optimization software (301). Controlled by the optimization software, the processor (120) (a) determines local vectors (“local”) in instructions of the program (200) which indicate the use of resources by the instructions (use-vectors, exh-vectors); (b) scans the program (200) for Single-Entry-Single-Exit (SESE) structures (U, T, V, S); and (c) determines SESE vectors from the local vectors. The SESE vectors indicate the use of resources by the SESE structures and can be combined by the optimizer to obtain a program vector. When some instructions are modified, then optimizer (100) only re-calculates the SESE vector of the corresponding SESE and re-combines the old SESE vector with the modified SESE vector to determine a new program vector.
Type: Grant
Filed: March 13, 2000
Date of Patent: May 27, 2003
Assignee: Motorola, Inc.
Inventors: Mikhail Figurin, Mikhail Okrugin, Dmitriy Barmenkov
-
Patent number: 6557097
Abstract: A processing engine 10 provides computation of an output vector as a linear combination of N input vectors with N coefficients in an efficient manner. The processing engine includes a coefficient register 940 for holding a representation of each of N coefficients of a first input vector. A test unit 950 is provided for testing selected parts (e.g. bits) of the coefficient register for respective coefficient representations. An arithmetic unit 970 computes respective coordinates of an output vector by selective addition/subtraction of coordinates of a second input vector dependent on results of the coefficient representation tests. Power consumption can be kept low due to the use of a coefficient test operation in parallel with an ALU operation.
Type: Grant
Filed: October 1, 1999
Date of Patent: April 29, 2003
Assignee: Texas Instruments Incorporated
Inventors: Gael Clave, Karim Djafarian, Gilbert Laurenti
-
Patent number: 6553486
Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor by one or more application programs in a computer system. A compiler identifies the use of vector data in the application program and implements one or more vector instructions for transferring the vector data between memory and registers used to perform calculations on the vector data. The compiler also schedules transfers of portions of the vector data required in a calculation so that calculations on a portion of the vector data are performed while a subsequent portion of the vector data is transferred. A vector buffer pool is partitioned into one or more vector buffers based on configuration information including the number of vector buffers required by an application program and the size required for each vector buffer. The vector buffers are allocated for exclusive use by an application program that is executing in the data processor.
Type: Grant
Filed: August 17, 1999
Date of Patent: April 22, 2003
Assignee: NEC Electronics, Inc.
Inventor: Ahmad R. Ansari
-
Publication number: 20030070059
Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by, e.g., steering each to one of two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.
Type: Application
Filed: May 30, 2001
Publication date: April 10, 2003
Inventors: William J. Dally, Scott Rixner, John Owens, Ujval J. Kapasi
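The partitioning step described in this abstract can be illustrated with a small software model: a condition vector steers each element's index into one of two index vectors, after which each group is processed in a straight line with no per-element branch. The helper name `partition_by_condition` and the sample data are hypothetical, and the sketch says nothing about how the hardware performs the steering.

```python
def partition_by_condition(values, condition):
    """Steer each element's index into one of two index vectors."""
    if len(values) != len(condition):
        raise ValueError("values and condition must have the same length")
    true_idx = [i for i, c in enumerate(condition) if c]
    false_idx = [i for i, c in enumerate(condition) if not c]
    return true_idx, false_idx


values = [5, -3, 8, -1, 0, 7]
condition = [v >= 0 for v in values]          # the condition vector
true_idx, false_idx = partition_by_condition(values, condition)

# After segregation, each group is processed without conditional operations.
doubled = [values[i] * 2 for i in true_idx]   # work for elements that passed
negated = [-values[i] for i in false_idx]     # work for elements that failed
```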
-
Publication number: 20030037221
Abstract: An improved processor implementation is described in which scalar and vector processing components are merged to reduce complexity. In particular, the implementation includes a scalar-vector register file for storing scalar and vector instructions, as well as a parallel vector unit comprising functional units that can process vector or scalar instructions as required. A further aspect of the invention provides the ability to disable unused functional units in the parallel vector unit, such as during a scalar operation, to achieve significant power savings.
Type: Application
Filed: August 14, 2001
Publication date: February 20, 2003
Applicant: International Business Machines Corporation
Inventors: Michael Karl Gschwind, Harm Peter Hofstee, Martin Edward Hopkins
-
Patent number: 6513107
Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.
Type: Grant
Filed: August 17, 1999
Date of Patent: January 28, 2003
Assignee: NEC Electronics, Inc.
Inventor: Ahmad R. Ansari
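A minimal sketch of the address arithmetic described in this abstract follows. Several details are assumptions made only for illustration: the stride is counted in elements, the width in bytes, the page size is 4096 bytes, and the shift is used whenever the element width is a power of two. The abstract does not give the exact formula, so the function names and the end-address expression are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def burst_end_address(start, count, stride, width):
    """Address of the last byte touched by a strided vector transfer.

    start  : byte address of the first element
    count  : number of data elements
    stride : distance between consecutive elements, in elements (assumed)
    width  : element width in bytes (assumed)
    """
    if is_power_of_two(width):
        # Multiply stride by width with a shift instead of a multiplier.
        step = stride << (width.bit_length() - 1)
    else:
        step = stride * width
    return start + (count - 1) * step + width - 1


def same_page(start, end, page_size=PAGE_SIZE):
    """True when both addresses fall inside the same page."""
    return start // page_size == end // page_size


end_a = burst_end_address(0x1000, count=8, stride=2, width=4)
end_b = burst_end_address(0x1FF0, count=8, stride=2, width=4)
```

In this model the first transfer stays inside one page, while the second crosses a page boundary and would have to be handled as more than one burst.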
-
Patent number: 6470440
Abstract: An apparatus for compare and maximum/minimum and a method therefor are implemented. Selection circuitry selects a data value signal for outputting between a pair of vector operands and “true” and “false” comparison value signals for the corresponding operand data type. Each input operand may include a plurality of subvector operands of a preselected data type, each data type having a corresponding length. The selection circuitry selects the data value signal in response to a plurality of second signals. The second signals are generated from carry-out signals from the subvector operands, and first signals that are generated using instruction information for the executing instruction. The second signals may be generated by logically combining the first signals with carry propagate, carry generate and carry-out signals from carry lookahead logic receiving the subvector operands as input.
Type: Grant
Filed: May 20, 1999
Date of Patent: October 22, 2002
Assignee: International Business Machines Corporation
Inventors: Huy Van Nguyen, Charles Philip Roth
-
Patent number: 6446193
Abstract: A method and apparatus for reducing instruction cycles in a digital signal processor wherein the processor includes a multiplier unit, an adder, a memory, and at least one pair of first and second accumulators. The accumulators include respective guard, high and low parts. The method and apparatus enable vectoring the respective first and second high parts from the accumulators to define a single vectored register responsive to a single instruction cycle and processing the data in the vectored register.
Type: Grant
Filed: September 8, 1997
Date of Patent: September 3, 2002
Assignee: Agere Systems Guardian Corp.
Inventors: Mazhar M. Alidina, Sivanand Simanapalli, Larry R. Tate
-
Patent number: 6401194
Abstract: A vector processor provides a data path divided into smaller slices of data, with each slice processed in parallel with the other slices. Furthermore, an execution unit provides smaller arithmetic and functional units chained together to execute more complex microprocessor instructions requiring multiple cycles by sharing single-cycle operations, thereby reducing both costs and size of the microprocessor. One embodiment handles 288-bit data widths using 36-bit data path slices. Another embodiment executes integer multiply and multiply-and-accumulate and floating point add/subtract and multiply operations using single-cycle arithmetic logic units. Other embodiments support 8-bit, 9-bit, 16-bit, and 32-bit integer data types and 32-bit floating data types.
Type: Grant
Filed: January 28, 1997
Date of Patent: June 4, 2002
Assignee: Samsung Electronics Co., Ltd.
Inventors: Le Trong Nguyen, Heonchul Park, Roney S. Wong, Ted Nguyen, Edward H. Yu
-
Publication number: 20020062435
Abstract: A multi-streaming processor has multiple streams for processing multiple threads, and an instruction scheduler including a priority record of priority codes for one or more of the streams. The priority codes determine in some embodiments relative access to resources as well as which stream has access at any point in time. In other embodiments priorities are determined dynamically and altered on-the-fly, which may be done by various criteria, such as on-chip processing statistics, by executing one or more priority algorithms, by input from off-chip, according to stream loading, or by combinations of these and other methods. In one embodiment a special code is used for disabling a stream, and streams may be enabled and disabled dynamically by various methods, such as by on-chip events, processing statistics, input from off-chip, and by processor interrupts. Some specific applications are taught, including for IP-routers and digital signal processors.
Type: Application
Filed: December 16, 1998
Publication date: May 23, 2002
Inventors: Mario D. Nemirovsky, Adolfo M. Nemirovsky, Narendra Sankar
-
Patent number: 6385632
Abstract: A system and method for evaluating one or more functions using a succession of CORDIC stages/iterations followed by a residual rotation. The succession of CORDIC stages are preferably partitioned into (a) a Z path which operates on an input angle and generates an output angle, and (b) an X/Y path which operates on an input point and generates an output point. The residual rotation rotates the output point by the output angle to generate a resultant point using a small angle approximation for sine and an accurate evaluation for sine of the output angle. The number of CORDIC stages in the succession is chosen so that the error in the coordinates of the resultant point induced by the approximation of sine is smaller than a desired amount. In particular, the number of CORDIC stages in the succession is chosen to be greater than or equal to (N+1)/3 in order to guarantee N bits of precision in coordinates of the resultant point. The Z path has a propagation time which is smaller than the X/Y path.
Type: Grant
Filed: June 18, 1999
Date of Patent: May 7, 2002
Assignee: Advanced Micro Devices, Inc.
Inventors: Gwangwoo Choe, James R. MacDonald
-
Patent number: 6385633
Abstract: The phase of a complex number I+jQ is computed using a hybrid lookup table and computation approach suitable for DSP implementation and useful in remote access/networking and wireless applications. An approximate phase θ˜ for an approximation complex number I˜+jQ˜ is determined through memory table lookup. This is added to a correction phase Δθ which is determined by calculation of a correction term C=(I˜*Q−Q˜*I)/(I*I˜+Q*Q˜) which represents the imaginary part divided by the real part of the complex multiplication of the complex number and the conjugate of the approximate complex number.
Type: Grant
Filed: June 30, 1999
Date of Patent: May 7, 2002
Assignee: Texas Instruments Incorporated
Inventor: Timothy M. Schmidl
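The hybrid approach in this abstract can be sketched numerically. Multiplying I+jQ by the conjugate of the approximation gives a real part I*I˜+Q*Q˜ and an imaginary part I˜*Q−Q˜*I, so the correction term C equals the tangent of the phase difference. The sketch below makes three assumptions that are not in the abstract: the table holds 16 evenly spaced unit vectors, the table entry is chosen by largest dot product, and the correction phase is taken as Δθ ≈ C (a small-angle approximation). The names `PHASE_TABLE`, `APPROX_POINTS` and `approximate_phase` are hypothetical.

```python
import math

TABLE_SIZE = 16  # assumed table size

# Lookup table: approximate phases and the matching unit vectors (I~, Q~).
PHASE_TABLE = [2 * math.pi * k / TABLE_SIZE for k in range(TABLE_SIZE)]
APPROX_POINTS = [(math.cos(t), math.sin(t)) for t in PHASE_TABLE]


def approximate_phase(i, q):
    """Phase of i + jq as table phase plus correction term C."""
    if i == 0 and q == 0:
        raise ValueError("phase of zero is undefined")
    # Pick the table entry closest in direction (largest dot product).
    k = max(
        range(TABLE_SIZE),
        key=lambda n: i * APPROX_POINTS[n][0] + q * APPROX_POINTS[n][1],
    )
    i_a, q_a = APPROX_POINTS[k]
    # C = Im / Re of (i + jq) * conj(i_a + j q_a) = tan(theta - theta~).
    c = (i_a * q - q_a * i) / (i * i_a + q * q_a)
    # Assumption: the phase difference is small enough that atan(C) ~ C.
    return PHASE_TABLE[k] + c


estimate = approximate_phase(3.0, 4.0)
reference = math.atan2(4.0, 3.0)
```

With 16 table entries the phase difference never exceeds π/16, so the small-angle step contributes an error of roughly C³/3, a few thousandths of a radian at worst. The returned value may differ from `math.atan2` by a multiple of 2π.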
-
Patent number: 6336179
Abstract: A first counter sequentially counts a plurality of numbers from respective sources requesting transfer of data. Each of the numbers represents an amount of isochronous data to transfer over the bus from the respective ones of the sources during a frame on a bus. A count value in a second counter is selectably incremented when the first counter is counting, to provide a remaining count value indicative of a remaining amount of data to transfer during the frame. The remaining count value in the second counter is decremented for each isochronous transfer on the bus after the remaining amount of data to transfer has been determined from all sources requesting transfer of isochronous data during the frame. A third counter tracks the time remaining in the frame and compares the remaining count value to the time remaining in the frame to determine a priority mode on the bus.
Type: Grant
Filed: August 21, 1998
Date of Patent: January 1, 2002
Assignee: Advanced Micro Devices, Inc.
Inventor: Dale E. Gulick
-
Patent number: 6324600
Abstract: A method and an apparatus for controlling movement of data between any host and any network including a set of devices in a computing system environment having a main memory with a queuing mechanism having a plurality of queues capable of being shared between a plurality of independent processes running on at least one host and at least one I/O adapter. A finite-state machine (FSM) is provided in the main memory and the FSM is divided into two disjoint sets of states, one of which represents state-values processed by the host and set by the adapter, and said other set represents state-values processed by the adapter and set by said host. Using each of these sets of states, free-running, non-deadlocking processes are provided within the host and the adapter so that the processes sequence circularly and continuously through a vector related to the FSMs.
Type: Grant
Filed: February 19, 1999
Date of Patent: November 27, 2001
Assignee: International Business Machines Corporation
Inventors: Frank W. Brice, Richard P. Tarcza, Leslie W. Wyman
-
Patent number: 6324638
Abstract: A processor capable of executing vector instructions includes at least an instruction sequencing unit and a vector processing unit that receives vector instructions to be executed from the instruction sequencing unit. The vector processing unit includes a plurality of multiply structures, each containing only a single multiply array, that each correspond to at least one element of a vector input operand. Utilizing the single multiply array, each of the plurality of multiply structures is capable of performing a multiplication operation on one element of a vector input operand and is also capable of performing a multiplication operation on multiple elements of a vector input operand concurrently. In an embodiment in which the maximum length of an element of a vector input operand is N bits, each of the plurality of multiply arrays can handle both N by N bit integer multiplication and M by M bit integer multiplication, where N is a non-unitary integer multiple of M.
Type: Grant
Filed: March 31, 1999
Date of Patent: November 27, 2001
Assignee: International Business Machines Corporation
Inventors: Thomas Elmer, Michael Putrino
-
Patent number: 6219073
Abstract: In a data processing apparatus, data is transferred in accordance with a meta-instruction embedded in the data. To put it in detail, first of all, a first meta-instruction is read out from an address ADDR0 stored in a tag address register Dn_TADR. Then, data following the first meta-instruction with a length specified by the meta-instruction is transferred. Subsequently, a second meta-instruction stored at an address ADDR2 specified in the first meta-instruction is read out and data following the second meta-instruction with a length specified by the second meta-instruction is transferred. A third next meta-instruction stored at an address ADDR1 specified in the second meta-instruction is further read out and data following the third meta-instruction with a length specified by the third meta-instruction is then transferred.
Type: Grant
Filed: March 25, 1998
Date of Patent: April 17, 2001
Assignee: Sony Computer Entertainment, Inc.
Inventor: Masakazu Suzuoki
-
Patent number: 6212622
Abstract: A processor employs ordering dependencies for load instruction operations upon store address instruction operations. The processor divides store operations into store address instruction operations and store data instruction operations. The store address instruction operations generate the address of the store, and the store data instruction operations route the corresponding data to the load/store unit. The processor maintains a store address dependency vector indicating each of the outstanding store addresses and records ordering dependencies upon the store address instruction operations for each load instruction operation. Accordingly, the load instruction operation is not scheduled until each prior store address instruction operation has been scheduled. Store addresses are available for dependency checking against the load address upon execution of the load instruction operation. If a memory dependency exists, it may be detected upon execution of the load instruction operation.
Type: Grant
Filed: August 24, 1998
Date of Patent: April 3, 2001
Assignee: Advanced Micro Devices, Inc.
Inventor: David B. Witt
-
Patent number: 6212618
Abstract: A method and apparatus for including in a processor, instructions for performing multiply-intra-add operations on packed data is described. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first and a second packed data. The processor performs operations on data elements in the first packed data and the second packed data to generate a plurality of data elements in a third packed data in response to receiving an instruction. At least two of the plurality of data elements in the third packed data store the result of multiply-intra-add operations.
Type: Grant
Filed: March 31, 1998
Date of Patent: April 3, 2001
Assignee: Intel Corporation
Inventor: Patrice L. Roussel
-
Patent number: 6195747
Abstract: A system and method for reducing data traffic between the processor and the system controller in a data processing system during the execution of a vector or matrix instruction. When the processor receives an operation command requiring that a large quantity of data be processed, the processor issues a local operation request containing the desired operation, addressing information of the operands and a destination location for the result to the system. The system controller includes a local operation unit for locally executing the local operation request issued from the processor. Because the operand data associated with the operation need not be transferred over the system bus connected between the processor and the system controller, the data traffic between the processor and the system controller is reduced.
Type: Grant
Filed: September 28, 1998
Date of Patent: February 27, 2001
Assignee: Mentor Arc Inc.
Inventor: Chien-Tzu Hou
-
Patent number: 6175907
Abstract: An apparatus and method for calculating a square root of an operand in a microprocessor are provided. The microprocessor has a plurality of square root instructions, each of which specifies a square root calculation precision. The apparatus includes translation logic and execution logic. The translation logic decodes the square root macro instruction into a plurality of prescribed-precision machine instructions according to the square root calculation precision specified by the plurality of square root instructions. The execution logic, coupled to the translation logic, receives the plurality of prescribed-precision machine instructions and calculates the square root of the operand according to the specified square root calculation precision. At least one of the plurality of square root instructions specifies the square root calculation precision such that less significant bits are calculated in the square root than are provided in the operand.
Type: Grant
Filed: July 17, 1998
Date of Patent: January 16, 2001
Assignee: IP First, L.L.C
Inventors: Timothy A. Elliott, G. Glenn Henry
-
Patent number: 6061777
Abstract: One aspect of the invention relates to a method for operating a processor. In one version of the invention, the method includes the steps of dispatching an instruction; determining a presently architected RMAP entry for the architectural register targeted by the dispatched instruction; selecting the RMAP entries which are associated with physical registers that contain operands for the dispatched instruction; updating a use indicator in the selected RMAP entries; determining whether the dispatched instruction is interruptible; and updating an architectural indicator and a historical indicator in the presently architected RMAP entry if the dispatched instruction is uninterruptible.
Type: Grant
Filed: October 28, 1997
Date of Patent: May 9, 2000
Assignee: International Business Machines Corporation
Inventors: Hoichi Cheong, Paul Joseph Jordan, Hung Qui Le, Soummya Mallick
-
Patent number: 6058465
Abstract: A vector processor architecture provides vector registers of fixed size having data elements of programmable size and type. The type and size for data elements are defined by instructions which manipulate operands associated with the vector registers. The data size defined by an instruction determines the number of the data elements in a vector register and the number of parallel operations performed to complete the instruction. One embodiment of the invention supports 8-bit, 9-bit, 16-bit, and 32-bit data element sizes of integer type for all sizes and floating point data type for the 32-bit data elements.
Type: Grant
Filed: August 19, 1996
Date of Patent: May 2, 2000
Inventor: Le Trong Nguyen
-
Patent number: 6047372
Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU can be coupled either through a coprocessor bus or a local CPU bus to a conventional processor. The MEU employs vector registers, a vector ALU, and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers can be easily routed and combined at the vector ALU. The vector instructions employ special load/store instructions in combination with numerous operational instructions to carry out concurrent multimedia operations on the aligned operands.
Type: Grant
Filed: April 13, 1999
Date of Patent: April 4, 2000
Assignees: Compaq Computer Corp., Advanced Micro Devices, Inc.
Inventors: John S. Thayer, Brian E. Longhenry, John G. Favor, Frederick D. Weber
-
Patent number: 6006315
Abstract: A method is provided for writing a scalar value to a vector V1 without reading the vector from a storage device. A scalar value to be written into the vector at a specified position and a scalar value (index) representing such position are read from a storage device into an Arithmetic Logic Unit (ALU) of a vector processor. The ALU then generates another vector V2 having multiple copies of the scalar value to be written into V1. ALU also generates a mask representing the index. The vector V2 is then delivered to the storage storing V1, but the mask is applied so that only one or more, but not all, copies of the scalar value are written from V2 to the storage. The rest of the vector V1 remains unchanged. The invention reduces register file read contention. Furthermore, if the updated V1 (i.e. V1 having the scalar value) is to be used in the next instruction, a copy of V1 is read from the storage and is updated from V2 and the mask, simultaneously with V1 being updated in the storage.
Type: Grant
Filed: October 18, 1996
Date of Patent: December 21, 1999
Assignee: Samsung Electronics Co., Ltd.
Inventor: Heonchul Park
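The masked write described in this abstract can be modelled in a few lines: the scalar is replicated into a temporary vector V2, a mask is derived from the index, and only the masked lane of the stored vector V1 is overwritten, so V1 itself is never read. The function name `write_scalar` and the use of a Python list as the "storage" are assumptions for illustration only.

```python
def write_scalar(storage, scalar, index):
    """Write `scalar` into lane `index` of `storage` via broadcast and mask."""
    lanes = len(storage)
    if not 0 <= index < lanes:
        raise IndexError("index outside the vector")
    v2 = [scalar] * lanes                             # copies of the scalar
    mask = [lane == index for lane in range(lanes)]   # mask built from index
    for lane in range(lanes):
        if mask[lane]:
            # Masked write: lanes with a cleared mask bit are left untouched,
            # and the old contents of V1 are never read.
            storage[lane] = v2[lane]
    return storage


v1 = [10, 20, 30, 40]
write_scalar(v1, 99, 2)
```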
-
Patent number: 5996057
Abstract: The data processing system of the present invention loads three input operands, including two input vectors and a control vector, into vector registers and performs a permutation of the two input vectors as specified by the control vector, and further stores the result of the operation as the output operand in an output register. The control vector consists of sixteen indices, each uniquely identifying a single byte of input data in either of the input registers, and can be specified in the operational code or be the result of a computation previously performed within the vector registers. The specification of the control vector allows a vector-matrix operation to be performed on the input vectors by rearranging or replicating the input operand bytes in the bytes of the output register as a function of the control vector.
Type: Grant
Filed: April 17, 1998
Date of Patent: November 30, 1999
Assignees: Apple, IBM, Motorola Inc.
Inventors: Hunter Ledbetter Scales, III, Keith Everett Diefendorff, Brett Olsson, Pradeep Kumar Dubey, Ronald Ray Hochsprung
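A byte-level model of the permutation in this abstract is short: the two 16-byte inputs are treated as one 32-byte source, and each of the sixteen control indices selects one source byte for the corresponding output byte. The function name `vector_permute` is hypothetical, and masking each index to its low five bits is an assumption made so that any byte value is a valid index; the abstract itself only says that each index identifies a single input byte.

```python
def vector_permute(a, b, control):
    """Build a 16-byte result by selecting bytes from a and b."""
    if not (len(a) == len(b) == len(control) == 16):
        raise ValueError("all three operands must be 16 bytes long")
    source = bytes(a) + bytes(b)      # indices 0-15 pick from a, 16-31 from b
    # Assumption: only the low 5 bits of each control byte are used.
    return bytes(source[idx & 0x1F] for idx in control)


a = bytes(range(16))
b = bytes(range(16, 32))

reversed_a = vector_permute(a, b, bytes(range(15, -1, -1)))   # rearrange
splat_b0 = vector_permute(a, b, bytes([16] * 16))             # replicate
interleave = vector_permute(
    a, b, bytes(x for i in range(8) for x in (i, 16 + i))
)
```

The three calls show the rearranging and replicating uses named in the abstract: a byte reversal, a broadcast of one byte to every position, and an interleave of the low halves of the two inputs.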
-
MPEG motion compensation using operand routing and performing add and divide in a single instruction
Patent number: 5991865
Abstract: A routable operand and selectable operation processor multimedia extension unit is employed to motion compensate MPEG video using improved vector processing. A vector processing unit executes an add and divide instruction that adds two vector registers and divides the result in a single instruction. This is implemented through loading a first vector register with a first plurality of elements from a source block. A second vector register is then loaded with a second plurality of elements that are adjacent to the first plurality of elements. The add and divide instruction is then executed on the first and second vector registers, yielding an interpolated source element that is stored in a resultant vector register.
Type: Grant
Filed: December 31, 1996
Date of Patent: November 23, 1999
Assignee: Compaq Computer Corporation
Inventors: Brian E. Longhenry, Gary W. Thome, John S. Thayer
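The interpolation step in this abstract amounts to an element-wise add followed by a divide. The sketch below assumes a divisor of 2 (averaging two neighbouring pixels) and truncating integer division; the abstract states neither the divisor nor the rounding rule, and the name `add_and_divide` is hypothetical.

```python
def add_and_divide(v1, v2, divisor=2):
    """Element-wise (a + b) // divisor, modelling one combined instruction."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    return [(a + b) // divisor for a, b in zip(v1, v2)]


row = [10, 20, 30, 40, 50]        # one row of a source block
first = row[:-1]                  # first plurality of elements
second = row[1:]                  # adjacent elements, shifted by one
interpolated = add_and_divide(first, second)
```

Here each output element is the midpoint between two horizontally adjacent source elements, which is the kind of interpolated value the abstract describes being stored in the resultant register.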