Distributing Of Vector Data To Vector Registers Patents (Class 712/4)
-
Patent number: 6789181
Abstract: A method and computer for executing the method. A source program is translated into an object program, in a manner in which the translated object program has a different execution behavior than the source program. The translated object program is executed under a monitor capable of detecting any deviation from fully-correct interpretation before any side-effect of the different execution behavior is irreversibly committed. When the monitor detects the deviation, or when an interrupt occurs during execution of the object program, a state of the program is established corresponding to a state that would have occurred during an execution of the source program, and from which execution can continue. Execution of the source program continues primarily in a hardware emulator designed to execute instructions of an instruction set non-native to the computer.
Type: Grant
Filed: November 3, 1999
Date of Patent: September 7, 2004
Assignee: ATI International, SRL
Inventors: John S. Yates, David L. Reese, Korbin S. Van Dyke, Paul H. Hohensee
-
Publication number: 20040172517
Abstract: An orthogonal data converter for converting the components of a sequential vector component flow to a parallel vector component flow. The data converter has an input rotator configured to rotate corresponding vector components of the sequential vector component flow by a prescribed amount, and a bank of register files configured to store the rotated vector components. The converter also has an output rotator configured to rotate the position of the vector components read from the bank of register files by a prescribed amount. A controller of the converter is operative to control the addressing of the bank of register files and the rotating of the vector components. In this regard, the controller is operative to write the vector components to the bank of register files in a prescribed order and read the vector components in a prescribed order to generate the parallel vector component flow.
Type: Application
Filed: September 19, 2003
Publication date: September 2, 2004
Inventors: Boris Prokopenko, Timour Paltashev
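The rotate, store, rotate pattern in this abstract can be illustrated with a small software model. The following Python sketch is an assumption-laden illustration, not the patented circuit: the abstract does not give the rotation amounts or the addressing order, so the sketch uses the common skewed-storage scheme (vector i is rotated by i on input, and output word j is rotated back by j). The names `rotate_right` and `sequential_to_parallel` are hypothetical.

```python
def rotate_right(items, amount):
    """Rotate a list to the right by `amount` positions (negative rotates left)."""
    items = list(items)
    n = len(items)
    amount %= n
    if amount == 0:
        return items
    return items[-amount:] + items[:-amount]


def sequential_to_parallel(vectors):
    """Convert N sequential N-component vectors into N parallel component words.

    Assumed scheme: one register-file bank per lane; each bank is accessed
    at a single address per step, which is why the data is skewed on input.
    """
    n = len(vectors)
    if any(len(v) != n for v in vectors):
        raise ValueError("this sketch needs N vectors of N components each")

    banks = [[None] * n for _ in range(n)]  # banks[bank][address]

    # Input side: rotate vector i by i, then write one element per bank at address i.
    for i, vec in enumerate(vectors):
        rotated = rotate_right(vec, i)
        for b in range(n):
            banks[b][i] = rotated[b]

    # Output side: bank b supplies address (b - j) mod n, which holds
    # component j of vector (b - j) mod n; rotating left by j restores lane order.
    out = []
    for j in range(n):
        word = [banks[b][(b - j) % n] for b in range(n)]
        out.append(rotate_right(word, -j))
    return out
```

Under these assumptions the result equals a plain transpose: output word j holds component j of every input vector, with vector p in lane p.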
-
Patent number: 6782470
Abstract: The register file of a processor includes embedded operand queues. The configuration of the register file into registers and operand queues is defined dynamically by a computer program. The programmer determines the trade-off between the number and size of the operand queue(s) versus the number of registers used for the program. The programmer partitions a portion of the registers into one or more operand queues. A given queue occupies a consecutive set of registers, although multiple queues need not occupy consecutive registers. An additional address bit is included to distinguish operand queue addresses from register addresses. Queue state logic tracks status information for each queue, including a header pointer, tail pointer, start address, end address and number of vacancies value. The program sets the locations and depth of a given operand queue within the register file.
Type: Grant
Filed: November 6, 2000
Date of Patent: August 24, 2004
Assignee: University of Washington
Inventors: Stefan G. Berg, Michael S. Grow, Weiyun Sun, Donglok Kim, Yongmin Kim
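As a rough illustration of the queue state described in this abstract, the Python sketch below models a register file with a single embedded circular queue. It is a minimal model under stated assumptions: the class name, the `is_queue` flag (standing in for the additional address bit), and the choice to reject plain accesses to queue registers are all hypothetical and are not taken from the patent.

```python
class RegisterFileWithQueue:
    """Register file with one embedded circular operand queue (illustrative)."""

    def __init__(self, num_regs, q_start, q_end):
        if not 0 <= q_start <= q_end < num_regs:
            raise ValueError("queue must lie inside the register file")
        self.regs = [0] * num_regs
        self.start, self.end = q_start, q_end  # inclusive queue bounds
        self.head = self.tail = q_start        # next slot to read / to write
        self.vacancies = q_end - q_start + 1   # free slots in the queue

    def _advance(self, ptr):
        # Wrap from the end address back to the start address.
        return self.start if ptr == self.end else ptr + 1

    def _check_plain(self, index):
        if self.start <= index <= self.end:
            raise ValueError("register is reserved for the operand queue")

    def write(self, is_queue, value, index=0):
        if is_queue:
            if self.vacancies == 0:
                raise OverflowError("operand queue is full")
            self.regs[self.tail] = value
            self.tail = self._advance(self.tail)
            self.vacancies -= 1
        else:
            self._check_plain(index)
            self.regs[index] = value

    def read(self, is_queue, index=0):
        if is_queue:
            if self.vacancies == self.end - self.start + 1:
                raise IndexError("operand queue is empty")
            value = self.regs[self.head]
            self.head = self._advance(self.head)
            self.vacancies += 1
            return value
        self._check_plain(index)
        return self.regs[index]
```

For example, `RegisterFileWithQueue(8, 4, 6)` reserves registers 4 through 6 as a three-entry queue and leaves the other five as ordinary registers.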
-
Publication number: 20040153624
Abstract: A “high availability” system comprises one or more switches under the control of multiple control processors (“CPs”). One of the CPs is deemed to be “active,” while the other CP is kept in a “standby” mode. Each CP generally has the same software load including a fabric state synchronization (“FSS”) facility. The FSSs of each CP communicate with each other. The state information pertaining to an active “image” is continuously provided to a standby copy of the image (“standby image”). The CPs' FSSs perform the function of synchronizing the standby image to the active image. The state information generally includes configuration and operational parameters and other information regarding the active image. By keeping the standby image synchronized to the active image, the standby image can be rapidly transitioned to the active mode if the active image experiences a fault and continue where the previous active image left off.
Type: Application
Filed: October 29, 2002
Publication date: August 5, 2004
Applicant: Brocade Communications Systems, Inc.
Inventors: Bill J. Zhou, Richard L. Hammons
-
Publication number: 20040128485
Abstract: According to one embodiment, a microprocessor is described. The microprocessor includes a scalar processor and a vector processor. The vector processor fuses multiple instructions that are to be processed. The fused instructions enable a single source register to simultaneously transmit its data contents to multiple math units.
Type: Application
Filed: December 27, 2002
Publication date: July 1, 2004
Inventor: Scott R. Nelson
-
Publication number: 20040128472
Abstract: A vector information processing apparatus has a CPU comprising a plurality of asynchronously operating units, a main memory for storing data, and a main memory controller for controlling the writing of data in the main memory. The main memory controller has a VSC address buffer for holding a storage address in the main memory for each element designated by a vector scatter instruction. The main memory controller is arranged to inhibit the outputting of a writing permission signal for the main memory which is generated according to a writing request for writing an element having a smaller element number, which has the same storage address as the storage address and which has not been processed in a sequence of element numbers, of writing requests for writing elements in the main memory which are issued respectively from the asynchronously operating units according to a vector scatter instruction.
Type: Application
Filed: July 22, 2003
Publication date: July 1, 2004
Applicant: NEC CORPORATION
Inventor: Hisao Koyanagi
-
Publication number: 20040117510
Abstract: Processor communication registers (PCRs) contained in each processor within a multiprocessor system and interconnected by a specialized bus provide enhanced processor communication. Each PCR stores identical processor communication information that is useful in pipelined or parallel multi-processing. Each processor has exclusive rights to store to a sector within each PCR and has continuous access to read the contents of its own PCR. Each processor updates its exclusive sector within all of the PCRs utilizing communication over the specialized bus, instantly allowing all of the other processors to see the change within the PCR data, and bypassing the cache subsystem.
Type: Application
Filed: December 12, 2002
Publication date: June 17, 2004
Applicant: International Business Machines Corporation
Inventors: Ravi Kumar Arimilli, Robert Alan Cargnoni, Derek Edward Williams, Kenneth Lee Wright
-
Publication number: 20040117595
Abstract: A system and method for calculating memory addresses in a partitioned memory in a processing system having a processing unit, input and output units, a program sequencer and an external interface. An address calculator includes a set of storage elements, such as registers, and an arithmetic unit for calculating a memory address of a vector element dependent upon values stored in the storage elements and the address of a previous vector element. The storage elements hold STRIDE, SKIP and SPAN values and optionally a TYPE value, relating to the spacing between elements in the same partition, the spacing between elements in the consecutive partitions, the number of elements in a partition and the size of a vector element, respectively.
Type: Application
Filed: September 8, 2003
Publication date: June 17, 2004
Inventors: James M. Norris, Philip E. May, Kent D. Moat, Raymond B. Essick, Brian G. Lucas
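One possible reading of the STRIDE, SKIP, SPAN and TYPE scheme is sketched below in Python. The abstract does not spell out the exact update rule, so this sketch assumes that each address is derived from the previous one by adding STRIDE inside a partition and SKIP when crossing from the last element of one partition to the first of the next, with both values scaled by the element size. The function name and parameter names are hypothetical.

```python
def vector_addresses(base, count, stride, skip, span, elem_size=1):
    """Generate `count` element addresses starting at `base`.

    Assumed meaning of the parameters (a reading of the abstract, not the claims):
      stride    - spacing, in elements, between neighbours in one partition
      skip      - spacing, in elements, from the last element of a partition
                  to the first element of the next partition
      span      - number of elements per partition
      elem_size - size of one element in bytes (the TYPE value)
    """
    if span < 1:
        raise ValueError("span must be at least 1")
    addresses = []
    addr = base
    left_in_partition = span
    for _ in range(count):
        addresses.append(addr)
        left_in_partition -= 1
        if left_in_partition == 0:
            addr += skip * elem_size      # jump to the next partition
            left_in_partition = span
        else:
            addr += stride * elem_size    # stay inside the partition
    return addresses
```

Because every address depends only on the previous address and a small counter, a hardware version would need one adder and a few registers, which matches the structure the abstract describes.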
-
Patent number: 6751725
Abstract: Methods and apparatuses to clear state for operation of a stack. According to one embodiment of the invention, a processor comprises a set of one or more storage areas and a decode unit. The set of one or more storage areas are to store a plurality of tags and a top of stack indication, where each of the plurality of tags is to indicate if a register is in an empty or non-empty state. The decode unit is to decode scalar floating point instructions and packed data instructions, where at least certain of said scalar floating point instructions specify registers in a stack referenced manner and at least certain of said packed data instructions specify registers in a non-stack referenced manner. In addition, the packed data instructions include an instruction to mark the end of blocks of the packed data instructions in programs. The processor also comprises circuitry to cause the plurality of tags to indicate the empty state responsive to execution of the instruction.
Type: Grant
Filed: February 16, 2001
Date of Patent: June 15, 2004
Assignee: Intel Corporation
Inventors: David Bistry, Larry Mennemeier, Alexander D. Peleg, Carole Dulong, Eiichi Kowashi, Millind Mittal, Benny Eitan
-
Publication number: 20040103262
Abstract: A system and method for processing operations that use data vectors each comprising a plurality of data elements, in accordance with the present invention, includes a vector data file comprising a plurality of storage elements for storing data elements of the data vectors. A pointer array is coupled by a bus to the vector data file. The pointer array includes a plurality of entries wherein each entry identifies at least one storage element in the vector data file. The at least one storage element stores at least one data element of the data vectors, wherein for at least one particular entry in the pointer array, the at least one storage element identified by the particular entry has an arbitrary starting address in the vector data file.
Type: Application
Filed: November 15, 2003
Publication date: May 27, 2004
Applicant: International Business Machines Corporation
Inventors: Clair John Glossner, Erdem Hokenek, David Meltzer, Mayan Moudgill
-
Patent number: 6742106
Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.
Type: Grant
Filed: January 28, 2003
Date of Patent: May 25, 2004
Assignee: NEC Electronics, Inc.
Inventor: Ahmad R. Ansari
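The ending-address and same-page checks in this abstract can be sketched in a few lines of Python. Several details are assumptions made for illustration only: the stride is taken to be measured in elements, the width in bytes, the page size is fixed at 4 KiB, and the shift is applied whenever the width is a power of two. The names `byte_stride` and `burst_extent` are hypothetical.

```python
PAGE_SHIFT = 12  # assumed 4 KiB pages; the abstract does not state a page size


def is_power_of_two(x):
    return x > 0 and (x & (x - 1)) == 0


def byte_stride(stride, width):
    """Stride in bytes; uses a shift instead of a multiply when width is 2**k."""
    if is_power_of_two(width):
        return stride << (width.bit_length() - 1)
    return stride * width


def burst_extent(start, count, stride, width):
    """Return (ending byte address, True if start and end share a page)."""
    if count < 1 or stride < 1 or width < 1:
        raise ValueError("count, stride and width must be positive")
    # Last element starts (count - 1) strides after the first and is `width` bytes long.
    end = start + (count - 1) * byte_stride(stride, width) + width - 1
    same_page = (start >> PAGE_SHIFT) == (end >> PAGE_SHIFT)
    return end, same_page
```

A transfer that fails the same-page check would presumably need to be split or translated again at the page boundary; the sketch only reports the condition.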
-
Patent number: 6728874
Abstract: A method and system for correctly processing both big endian and little endian vector data. If the vector has a little endian data order, each piece of data (such as a byte) within the vector is processed in order. If the vector has a big endian data order, each vector element is processed in order, but each piece of data within each vector element is processed in reverse order.
Type: Grant
Filed: October 10, 2000
Date of Patent: April 27, 2004
Assignee: Koninklijke Philips Electronics N.V.
Inventors: Frans W. Sijstermans, Evert-Jan D. Pol
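The ordering rule in this abstract is simple enough to state as a short Python sketch. The function name `processing_order` and the byte-oriented representation are assumptions chosen for illustration; the abstract only says "piece of data (such as a byte)".

```python
def processing_order(vector, elem_size, big_endian):
    """Return the bytes of `vector` in the order they would be processed.

    Little endian: every byte in storage order.
    Big endian: elements in storage order, bytes reversed inside each element.
    """
    if elem_size < 1 or len(vector) % elem_size:
        raise ValueError("vector length must be a multiple of the element size")
    if not big_endian:
        return list(vector)
    ordered = []
    for start in range(0, len(vector), elem_size):
        ordered.extend(reversed(vector[start:start + elem_size]))
    return ordered
```

With two-byte elements, the big-endian bytes `12 34 56 78` are visited as `34 12 78 56`, so the least significant byte of each element is still handled first.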
-
Patent number: 6701424
Abstract: A method and apparatus for loading and storing vectors from and to memory, including embedding a location identifier in bits comprising a vector load and store instruction, wherein the location identifier indicates a location in the vector where useful data ends. The vector load instruction further includes a value field that indicates a particular constant for use by the load/store unit to set locations in the vector register beyond the useful data with the constant. By embedding the ending location of the useful data in the instruction, bandwidth and memory are saved by only requiring that the useful data in the vector be loaded and stored.
Type: Grant
Filed: April 7, 2000
Date of Patent: March 2, 2004
Assignee: Nintendo Co., Ltd.
Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng
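A software analogue of this partial load is sketched below in Python. The function name `load_partial_vector` and its parameters are hypothetical; in the patent the useful-data location and the fill constant are encoded in instruction bits, which this sketch represents as ordinary arguments.

```python
def load_partial_vector(memory, addr, vector_len, useful_count, fill):
    """Load only the useful elements from memory and pad the rest with `fill`.

    `useful_count` plays the role of the location identifier (where the
    useful data ends) and `fill` the role of the instruction's value field.
    """
    if not 0 <= useful_count <= vector_len:
        raise ValueError("useful_count must be between 0 and vector_len")
    loaded = list(memory[addr:addr + useful_count])  # only these are fetched
    if len(loaded) != useful_count:
        raise IndexError("load runs past the end of memory")
    return loaded + [fill] * (vector_len - useful_count)
```

Only `useful_count` elements are read from memory, which is the source of the bandwidth saving the abstract mentions; the padding never touches memory.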
-
Patent number: 6681315
Abstract: A bit vector array apparatus provides a high speed method for processing network transmission controls. Complex data structures for controlling network access are represented in the simplest possible form as single bit vector elements. The bit vector elements are combined into bit vectors comprised of 32 single bit vector elements. The bit vectors are processed in parallel in the bit vector array apparatus, which is comprised of special-purpose bit manipulation functions to expedite the processing.
Type: Grant
Filed: November 26, 1997
Date of Patent: January 20, 2004
Assignee: International Business Machines Corporation
Inventors: Paul John Hilts, Brian Alan Youngman
-
Publication number: 20040006681
Abstract: A configuration of vector units, digital circuitry and associated instructions is disclosed for the parallel processing of multiple Viterbi decoder butterflies on a programmable digital signal processor (DSP) that is based on single-instruction-multiple-data (SIMD) principles and provides indirect access to vector elements. The disclosed configuration uses a processor with two vector units and associated registers, where the vector units are connected back to back for processing Viterbi decoder state metrics. Viterbi add instructions increment vectors of state metrics from a first register, performing a desired permutation of state metrics while reading them indirectly through vector pointers, and writing intermediate result vectors to a second register.
Type: Application
Filed: September 13, 2002
Publication date: January 8, 2004
Inventors: Jaime Humberto Moreno, Fredy Daniel Neeser
-
Publication number: 20040003200
Abstract: An interconnection device (300) with a number of links (306, 308, 310, 312 and 314), each link having a number of link input ports (302), link output ports (304) and storage registers (316). An input selection switch (402) is coupled to a selected link input port to receive an input data token. The storage registers (316) may be used to store input data tokens. A storage access switch (404) is coupled to the input selection switch (402) and to the storage registers (316) and may be used to select the current input data token or a token from the storage registers as an output data token. An output selection switch (406) receives the output data token and provides it to a selected link output port. The interconnection device may, for example, be used to connect the inputs and outputs of the processing elements of a vector processor or digital signal processor.
Type: Application
Filed: June 28, 2002
Publication date: January 1, 2004
Inventors: Philip E. May, Kent Donald Moat, Raymond B. Essick, Silviu Chiricescu, Brian Geoffrey Lucas, James M. Norris, Michael Allen Schuette, Ali Saidi
-
Patent number: 6665774
Abstract: A common scalar/vector data cache apparatus and method for a scalar/vector computer. One aspect of the present invention provides a computer system including a memory. The memory includes a plurality of sections. The computer system also includes a scalar/vector processor coupled to the memory using a plurality of separate address busses and a plurality of separate read-data busses wherein at least one of the sections of the memory is associated with each address bus and at least one of the sections of the memory is associated with each read-data bus. The processor further includes a plurality of scalar registers and a plurality of vector registers and operating on instructions which provide a reference address to a data word. The processor includes a scalar/vector cache unit that includes a cache array, and a FIFO unit that tracks (a.) an address in the cache array to which a read-data value will be placed when the read-data value is returned from the memory, and (b.
Type: Grant
Filed: October 16, 2001
Date of Patent: December 16, 2003
Assignee: Cray, Inc.
Inventors: Gregory J. Faanes, Eric P. Lundberg
-
Patent number: 6665790
Abstract: A system and method for processing operations that use data vectors each comprising a plurality of data elements, in accordance with the present invention, includes a vector data file comprising a plurality of storage elements for storing data elements of the data vectors. A pointer array is coupled by a bus to the vector data file. The pointer array includes a plurality of entries wherein each entry identifies at least one storage element in the vector data file. The at least one storage element stores at least one data element of the data vectors, wherein for at least one particular entry in the pointer array, the at least one storage element identified by the particular entry has an arbitrary starting address in the vector data file.
Type: Grant
Filed: February 29, 2000
Date of Patent: December 16, 2003
Assignee: International Business Machines Corporation
Inventors: Clair John Glossner, III, Erdem Hokenek, David Meltzer, Mayan Moudgill
-
Publication number: 20030221086
Abstract: Data processing apparatus and methods capable of executing vector instructions. Such apparatus preferably include a number of data buffers whose sizes are configurable in hardware and/or in software; a number of buffer control units adapted to control access to the data buffers, at least one buffer control unit including at least one programmable write pointer register, read pointer register, read stride register and vector length register; a number of execution units for executing vector instructions using input operands stored in data buffers and storing produced results to data buffers; and at least one Direct Memory Access channel transferring data to and from said buffers. Preferably, at least some of the data buffers are implemented in dual-ported fashion in order to allow at least two simultaneous accesses per buffer, including at least one read access and one write access.
Type: Application
Filed: February 13, 2003
Publication date: November 27, 2003
Inventors: Slobodan A. Simovich, Ivan P. Radivojevic, Erik Ramberg
-
Patent number: 6625720
Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector instructions are used for transferring the vector data between memory and registers used to perform calculations on the vector data. The transfers of portions of the vector data required in a calculation are scheduled so that calculations on a portion of the vector data are performed while a subsequent portion of the vector data is transferred. A vector buffer pool is partitioned into one or more vector buffers based on configuration information including the number of vector buffers required by an application program and the size required for each vector buffer. The vector buffers are allocated for exclusive use by an application program that is executing in the data processor. Vector data transfer instructions are posted in a vector transfer instruction queue and are executed in the order they are posted to the instruction queue.
Type: Grant
Filed: August 17, 1999
Date of Patent: September 23, 2003
Assignee: NEC Electronics, Inc.
Inventor: Ahmad R. Ansari
-
Publication number: 20030167387
Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.
Type: Application
Filed: January 28, 2003
Publication date: September 4, 2003
Applicant: NEC Electronics, Inc.
Inventor: Ahmad R. Ansari
-
Publication number: 20030159016
Abstract: A data processor comprising: a register memory comprising an array of memory cells extending in two dimensions, the cells being located on rows in the first dimension and columns in the second dimension, each cell being addressable by means of an instruction specifying a pair of coordinates that identify the row and column of the cell in the array; and a processing unit capable of executing instructions that operate on a plurality of memory cells in the register, the instructions identifying the plurality of cells by means of a first instruction part specifying a pair of coordinates that identify a first cell in the array, and a second instruction part that identifies the configuration of the plurality of cells relative to the first cell; the data processor being arranged to interpret a first form of second instruction part as specifying a first group of cells all of which are located in the same row but in different columns, and to interpret a second form of second instruction part as specifying a first grou
Type: Application
Filed: October 31, 2002
Publication date: August 21, 2003
Applicant: ALPHAMOSAIC LIMITED
Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
-
Publication number: 20030159017
Abstract: A data processor comprising: a first processor unit having a first register memory addressable in a first format; a second processor unit having a second register memory addressable in a second format, and being capable of retrieving data from the first processor unit; the second processor unit being capable of executing an instruction including an operand specified by means of a reference to data in the first register memory.
Type: Application
Filed: October 31, 2002
Publication date: August 21, 2003
Applicant: ALPHAMOSAIC LIMITED
Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
-
Patent number: 6571328
Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.
Type: Grant
Filed: August 1, 2001
Date of Patent: May 27, 2003
Assignee: Nintendo Co., Ltd.
Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
-
Patent number: 6553474
Abstract: A data processor in which a read operation, including misaligned data as operand data, can be performed in a single cycle. An alignment buffer having a register to hold data stored at one address in data memory is provided between the data memory and a data path unit. The alignment buffer outputs misaligned data by selecting misaligned data from data held in the register and data read from the data memory. The data held in the register is updated as word-aligned data is read out.
Type: Grant
Filed: January 24, 2001
Date of Patent: April 22, 2003
Assignee: Mitsubishi Denki Kabushiki Kaisha
Inventors: Hironobu Ito, Hisakazu Sato
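The idea of serving a misaligned read from one held word plus one freshly read word can be modelled in Python as follows. This is a behavioural sketch under assumptions, not the patented datapath: the 4-byte word size, the class name `AlignmentBuffer`, and the `memory_reads` counter (added only to make the saving visible) are all hypothetical.

```python
WORD = 4  # assumed word size in bytes


class AlignmentBuffer:
    """Serve misaligned word reads using one held word plus one memory read."""

    def __init__(self, memory):
        self.memory = memory      # bytes-like backing store
        self.held = None          # (word index, word bytes) kept in the register
        self.memory_reads = 0     # counts aligned accesses to the data memory

    def _read_word(self, index):
        self.memory_reads += 1
        return bytes(self.memory[index * WORD:(index + 1) * WORD])

    def read(self, addr):
        index, offset = divmod(addr, WORD)
        # Reuse the held word when it is the lower word of this access.
        if self.held is not None and self.held[0] == index:
            low = self.held[1]
        else:
            low = self._read_word(index)
        if offset == 0:
            self.held = (index, low)
            return low
        high = self._read_word(index + 1)
        self.held = (index + 1, high)   # register is updated with the new word
        return low[offset:] + high[:offset]
```

In this model the first misaligned read costs two memory accesses to prime the register, and each following sequential misaligned read costs one, which is the behaviour that lets a streaming misaligned read complete in a single cycle per word.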
-
Publication number: 20030074654
Abstract: A digital computer system automatically creates an Instruction Set Architecture (ISA) that potentially exploits VLIW instructions, vector operations, fused operations, and specialized operations with the goal of increasing the performance of a set of applications while keeping hardware cost below a designer specified limit, or with the goal of minimizing hardware cost given a required level of performance.
Type: Application
Filed: October 16, 2001
Publication date: April 17, 2003
Inventors: David William Goodwin, Dror Maydan, Ding-Kai Chen, Darin Stamenov Petkov, Steven Weng-Kiang Tjiang, Peng Tu, Christopher Rowen
-
Publication number: 20030056082
Abstract: An improved method and system for controlling free space distribution by key range within a database. In one embodiment, a data structure including key ranges of a plurality of database tables and indexes, and a plurality of key range free space parameters is created. The plurality of database tables and indexes may include a plurality of page sets, which may include rows of data and keys. Time values may be associated with the plurality of free space parameters. The key range free space parameters may have values assigned to them. The key range free space parameters may be user-defined or automatically generated using growth trend analysis, based on key range growth statistics. The rows of data and keys within the plurality of page sets may be redistributed by a reorganization process. The redistributing may reference the key ranges of the data structure and the key range free space parameters.
Type: Application
Filed: December 27, 2001
Publication date: March 20, 2003
Inventor: John D. Maxfield
-
Patent number: 6530011
Abstract: A method and an apparatus for implementing mixed scalar and vector values in a digital processing system. In one embodiment, a digital processing system, which contains a processing unit and memories, is capable of identifying a first data in a first scalar register and a second data in a vector register. Upon fetching the first data as a first operand and the second data as a second operand, the processing unit performs an operation between the first and second operands in response to an operator. After operations, the result is subsequently stored in a second scalar register.
Type: Grant
Filed: October 20, 1999
Date of Patent: March 4, 2003
Assignee: SandCraft, Inc.
Inventor: Jack H. Choquette
-
Patent number: 6513107
Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.
Type: Grant
Filed: August 17, 1999
Date of Patent: January 28, 2003
Assignee: NEC Electronics, Inc.
Inventor: Ahmad R. Ansari
-
Patent number: 6496902
Abstract: A common scalar/vector data cache apparatus and method for a scalar/vector computer. One aspect of the present invention provides a computer system including a memory. The memory includes a plurality of sections. The computer system also includes a scalar/vector processor coupled to the memory using a plurality of separate address busses and a plurality of separate read-data busses wherein at least one of the sections of the memory is associated with each address bus and at least one of the sections of the memory is associated with each read-data bus. The processor further includes a plurality of scalar registers and a plurality of vector registers and operating on instructions which provide a reference address to a data word. The processor includes a scalar/vector cache unit that includes a cache array, and a FIFO unit that tracks (a.) an address in the cache array to which a read-data value will be placed when the read-data value is returned from the memory, and (b.
Type: Grant
Filed: December 31, 1998
Date of Patent: December 17, 2002
Assignee: Cray Inc.
Inventors: Gregory J. Faanes, Eric P. Lundberg
-
Publication number: 20020144061
Abstract: A common scalar/vector data cache apparatus and method for a scalar/vector computer. One aspect of the present invention provides a computer system including a memory. The memory includes a plurality of sections. The computer system also includes a scalar/vector processor coupled to the memory using a plurality of separate address busses and a plurality of separate read-data busses wherein at least one of the sections of the memory is associated with each address bus and at least one of the sections of the memory is associated with each read-data bus. The processor further includes a plurality of scalar registers and a plurality of vector registers and operating on instructions which provide a reference address to a data word. The processor includes a scalar/vector cache unit that includes a cache array, and a FIFO unit that tracks (a.) an address in the cache array to which a read-data value will be placed when the read-data value is returned from the memory, and (b.
Type: Application
Filed: October 16, 2001
Publication date: October 3, 2002
Applicant: Cray Inc.
Inventors: Gregory J. Faanes, Eric P. Lundberg
-
Patent number: 6446193
Abstract: A method and apparatus for reducing instruction cycles in a digital signal processor wherein the processor includes a multiplier unit, an adder, a memory, and at least one pair of first and second accumulators. The accumulators include respective guard, high and low parts. The method and apparatus enable vectoring the respective first and second high parts from the accumulators to define a single vectored register responsive to a single instruction cycle and processing the data in the vectored register.
Type: Grant
Filed: September 8, 1997
Date of Patent: September 3, 2002
Assignee: Agere Systems Guardian Corp.
Inventors: Mazhar M. Alidina, Sivanand Simanapalli, Larry R. Tate
-
Patent number: 6401194
Abstract: A vector processor provides a data path divided into smaller slices of data, with each slice processed in parallel with the other slices. Furthermore, an execution unit provides smaller arithmetic and functional units chained together to execute more complex microprocessor instructions requiring multiple cycles by sharing single-cycle operations, thereby reducing both costs and size of the microprocessor. One embodiment handles 288-bit data widths using 36-bit data path slices. Another embodiment executes integer multiply and multiply-and-accumulate and floating point add/subtract and multiply operations using single-cycle arithmetic logic units. Other embodiments support 8-bit, 9-bit, 16-bit, and 32-bit integer data types and 32-bit floating data types.
Type: Grant
Filed: January 28, 1997
Date of Patent: June 4, 2002
Assignee: Samsung Electronics Co., Ltd.
Inventors: Le Trong Nguyen, Heonchul Park, Roney S. Wong, Ted Nguyen, Edward H. Yu
-
Publication number: 20020032848
Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.
Type: Application
Filed: August 1, 2001
Publication date: March 14, 2002
Applicant: Nintendo Co., Ltd.
Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
-
Publication number: 20020032710Abstract: According to the invention, a matrix of elements is processed in a processor. A first subset of matrix elements is loaded from a first location and a second subset of matrix elements is loaded from a second location. A third subset of matrix elements is stored in a first destination and a fourth subset of matrix elements is stored in a second destination. The loading and storing steps result from the same instruction issue.Type: ApplicationFiled: March 8, 2001Publication date: March 14, 2002Inventors: Ashley Saulsbury, Daniel S. Rice, Michael W. Parkin, Nyles Nettleton
-
Publication number: 20020026569Abstract: A method and apparatus for loading and storing vectors from and to memory, including embedding a location identifier in bits comprising a vector load and store instruction, wherein the location identifier indicates a location in the vector where useful data ends. The vector load instruction further includes a value field that indicates a particular constant for use by the load/store unit to set locations in the vector register beyond the useful data with the constant. By embedding the ending location of the useful data in the instruction, bandwidth and memory are saved by only requiring that the useful data in the vector be loaded and stored.Type: ApplicationFiled: August 1, 2001Publication date: February 28, 2002Applicant: Nintendo Co., Ltd.Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng
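The load behaviour described here might be modelled as below. The sketch assumes the location identifier is a count of useful elements and that the register holds eight elements; both are illustrative choices, and the names are hypothetical.

```python
def vector_load(memory: list[int], base: int, end_index: int,
                fill_value: int, vector_length: int = 8) -> list[int]:
    """Load only the useful elements from memory, then pad the rest of the
    register with the constant named in the instruction's value field."""
    if not 0 <= end_index <= vector_length:
        raise ValueError("end_index must lie within the vector register")
    useful = memory[base:base + end_index]
    return useful + [fill_value] * (vector_length - end_index)
```

Only `end_index` elements are read from memory, which is where the bandwidth saving in the abstract would come from.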
-
Publication number: 20020007449Abstract: A vector architecture processing unit according to the present invention comprises a vector scatter (VSC) address coincidence detection unit 3 that comprises registers in which an area start address and an area end address of an area specified by an area-specified vector scatter instruction are stored; and a circuit that checks if the addresses specified by the area-specified vector scatter instruction overlap with an address to be accessed by a memory access instruction following the area-specified vector scatter instruction, wherein an instruction issue control unit 1 comprises a hold control circuit that holds the following memory access instruction in response to an address conflict signal from the VSC address conflict detector.Type: ApplicationFiled: July 10, 2001Publication date: January 17, 2002Applicant: NEC CORPORATIONInventor: Hisao Koyanagi
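The overlap check can be sketched as a range intersection test. This assumes inclusive start and end addresses and a byte-sized access by default; the function name and the `access_size` parameter are hypothetical.

```python
def must_hold(area_start: int, area_end: int, access_addr: int, access_size: int = 1) -> bool:
    """Return True when a following memory access touches the scatter area
    [area_start, area_end] and should therefore be held at issue."""
    access_last = access_addr + access_size - 1
    return access_addr <= area_end and access_last >= area_start
```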
-
Patent number: 6334176Abstract: The data processing system loads three input operands, including two input vectors and a control vector, into vector registers and performs a permutation of the two input vectors as specified by the control vector, and further stores the result of the operation as the output operand in an output register. The control vector consists of sixteen indices, each uniquely identifying a single byte of input data in either of the input registers, and can be specified in the operational code or be the result of a computation previously performed within the vector registers. The control vector is specified by calculating the offset of a selected vector element of the input vector relative to a base address of the input vector and loading each element with an index equal to the relative offset. Alternatively, the generation of the alignment vector is made by performing a look-up within a look-up table.Type: GrantFiled: April 17, 1998Date of Patent: December 25, 2001Assignees: Motorola, Inc., International Business Machines Corporation, Apple Computer, Inc.Inventors: Hunter Ledbetter Scales, III, Keith Everett Diefendorff, Brett Olsson, Pradeep Kumar Dubey, Ronald Ray Hochsprung
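A byte permutation of this kind, with sixteen indices each selecting one of the 32 input bytes, can be sketched as follows. The helper `alignment_control` is one possible reading of the offset-based control vector in the abstract, and both function names are hypothetical.

```python
def permute(vec_a: bytes, vec_b: bytes, control: bytes) -> bytes:
    """Build a 16-byte result; each control index picks one byte out of the
    32 bytes formed by vec_a followed by vec_b."""
    source = vec_a + vec_b
    return bytes(source[i & 0x1F] for i in control)


def alignment_control(address: int) -> bytes:
    """Control vector for realigning data that starts at an unaligned address:
    element i holds (address mod 16) + i."""
    offset = address % 16
    return bytes(offset + i for i in range(16))
```

Feeding two adjacent aligned 16-byte blocks to `permute` with `alignment_control(address)` yields the 16 bytes that begin at the unaligned address.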
-
Patent number: 6282634Abstract: A floating point unit is provided with a register bank comprising 32 registers that may be used as either vector registers or scalar registers. A data processing instruction includes at least one register specifying field pointing to a register containing a data value to be used in that operation. An increase in the instruction bit space available to encode more opcodes or to allow for more registers is provided by encoding whether a register is to be treated as a vector or a scalar within the register field itself. Further, the register field for one register of the instruction may encode whether another register is a vector or a scalar. The registers can be initially accessed using the values within the register fields of the instruction independently of the opcode allowing for easier decode.Type: GrantFiled: May 27, 1998Date of Patent: August 28, 2001Assignee: ARM LimitedInventors: Christopher Neal Hinds, David Vivian Jaggar, David Terrence Matheny, David James Seal
-
Publication number: 20010014930Abstract: The invention relates to a new memory structure specially adapted for the storage of memory vectors. Each of the storage positions (#1, Mi-#M, Mi) of the memory has a length adapted to the length of large vectors and is parallelly arranged extending from an input and/or output for information and deeper into the memory. In this way each vector is stored undivided in a sequential order with the beginning of the vector at the input and/or output of the memory (memory field F1 in memory plane Mi). Addressing is made to the input and/or output of the memory. There are means (1IB-MIB, 1UB-MUB) acting like shift registers for the inputting and outputting of information in undivided sequence to/from the storage positions in the memory.Type: ApplicationFiled: December 8, 1997Publication date: August 16, 2001Inventor: INGEMAR SODERQUIST
-
Patent number: 6266758Abstract: The present invention provides alignment and ordering of vector elements for SIMD processing. In the alignment of vector elements for SIMD processing, one vector is loaded from a memory unit into a first register and another vector is loaded from the memory unit into a second register. The first vector contains a first byte of an aligned vector to be generated. Then, a starting byte specifying the first byte of an aligned vector is determined. Next, a vector is extracted from the first register and the second register beginning from the first bit in the first byte of the first register continuing through the bits in the second register. Finally, the extracted vector is replicated into a third register such that the third register contains a plurality of elements aligned for SIMD processing. In the ordering of vector elements for SIMD processing, a first vector is loaded from a memory unit into a first register and a second vector is loaded from the memory unit into a second register.Type: GrantFiled: March 5, 1999Date of Patent: July 24, 2001Assignee: MIPS Technologies, Inc.Inventors: Timothy J. van Hook, Perter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian
-
Patent number: 6212617Abstract: A parallel programming system provides a lazy collection oriented data type that reduces inter-processor communication in programs executed on parallel computers. The lazy collection oriented data type is provided as a data type in a parallel programming language. The parallel language supports both data-parallel and control-parallel operations. These operations take advantage of the lazy collection oriented data type to defer or reduce inter-processor communication until an operation on the data type requires that it be balanced across a set of processors.Type: GrantFiled: June 30, 1998Date of Patent: April 3, 2001Assignee: Microsoft CorporationInventor: Jonathan C. Hardwick
-
Patent number: 6189094Abstract: A floating point unit having a register bank containing a plurality of registers supports vector operations that execute a specified operation a plurality of times upon a sequence of data values from different registers. The register bank is divided into subsets, with the sequence of registers used in a vector operation wrapping within a subset. The subsets comprise disjoint, contiguous ranges of register numbers. The wrapping within ranges allows compact and efficient code to be provided for performing DSP operations, such as FIR filtering and matrix transformations.Type: GrantFiled: May 27, 1998Date of Patent: February 13, 2001Assignee: Arm LimitedInventors: Christopher Neal Hinds, David Vivian Jaggar, David Terrence Matheny, David James Seal
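Wrapping within a subset can be sketched with modular arithmetic. The subset size of eight registers and the `stride` parameter are assumptions made for illustration; the abstract only says the subsets are disjoint, contiguous ranges.

```python
BANK_SIZE = 8  # assumed number of registers per subset


def vector_register_sequence(start: int, length: int, stride: int = 1) -> list[int]:
    """List the registers a vector operation touches, wrapping inside the
    subset that contains the start register."""
    bank_base = start - start % BANK_SIZE
    position = start - bank_base
    return [bank_base + (position + i * stride) % BANK_SIZE for i in range(length)]
```

Starting at register 14 with four elements, the sequence wraps from 15 back to 8 rather than running on into the next subset.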
-
Patent number: 6173366Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU can be coupled either through a coprocessor bus or a local CPU bus to a conventional processor. The MEU employs vector registers, a vector ALU, and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers can be easily routed and combined at the vector ALU. The vector instructions employ special load/store instructions in combination with numerous operational instructions to carry out concurrent multimedia operations on the aligned operands.Type: GrantFiled: December 2, 1996Date of Patent: January 9, 2001Assignees: Compaq Computer Corp., Advanced Micro Devices, Inc.Inventors: John S. Thayer, John G. Favor, Frederick D. Weber
-
Patent number: 6141673Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU can be coupled either through a coprocessor bus or a local central processing unit (CPU) bus to a conventional processor. The MEU employs vector registers, a vector arithmetic logic unit (ALU), and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers can be easily routed and combined at the vector ALU.Type: GrantFiled: May 25, 1999Date of Patent: October 31, 2000Assignees: Advanced Micro Devices, Inc., Compaq Computer Corp.Inventors: John S. Thayer, John Gregory Favor, Frederick D. Weber
-
Patent number: 6098162Abstract: Vector shifting elements of a vector register by varying amounts in a single process is achieved in a vector supercomputer processor. A first vector register contains a set of operands, and a second vector register contains a set of shift counts, one shift count for each operand. Operands and shift counts are successively transferred to a vector shift functional unit, which shifts the operand by an amount equal to the value of the shift count. The shifted operands are stored in a third vector register. The vector shift functional unit also achieves word shifting of a predetermined number of vector register elements to different word locations of another vector register.Type: GrantFiled: August 24, 1998Date of Patent: August 1, 2000Assignee: Cray Research, Inc.Inventors: Alan J. Schiffleger, Ram K. Gupta, Christopher C. Hsiung
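The per-element shift can be sketched as below, assuming left shifts on 64-bit elements. The abstract specifies neither the direction nor the element width, and the function name is hypothetical.

```python
def vector_shift_left(operands: list[int], shift_counts: list[int], width: int = 64) -> list[int]:
    """Shift each operand by its own count, truncating results to the element width."""
    mask = (1 << width) - 1
    return [(op << count) & mask for op, count in zip(operands, shift_counts)]
```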
-
Patent number: 6088782Abstract: A Single Instruction Multiple Data processor apparatus for implementing algorithms using sliding window type data is shown. The implementation shifts the elements of a Destination Vector Register (406, 606) either automatically every time the destination register value is read or in response to a specific instruction (800). The shifting of the Destination Vector Register (406, 606) allows each processing element to operate on new data. As the destination vector (406, 606) elements are shifted, a new element is provided to the vector from a Source Vector Register (404, 604).Type: GrantFiled: July 10, 1997Date of Patent: July 11, 2000Assignee: Motorola Inc.Inventors: De-Lei Lee, L. Rodney Goke, William Carroll Anderson
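One way to picture the sliding-window behaviour: each shift drops the oldest destination element and takes the next element from the source register. The shift direction and the `position` parameter are assumptions, not details from the abstract.

```python
def shift_in(dest: list[int], source: list[int], position: int) -> list[int]:
    """Shift the destination vector by one element and append the source
    element at the given position, so each lane sees new data."""
    return dest[1:] + [source[position]]
```

Calling `shift_in` repeatedly with increasing `position` walks a fixed-size window across the combined destination and source data, as a FIR filter or correlation loop would.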
-
Patent number: 6085305Abstract: A processor including at least one execution unit generating out-of-order results and out-of-order condition codes. Precise architectural state of the processor is maintained by providing a results buffer having a number of slots and providing a condition code buffer having the same number of slots as the results buffer, each slot in the condition code buffer in one-to-one correspondence with a slot in the results buffer. Each live instruction in the processor is assigned a slot in the results buffer and the condition code buffer. Each speculative result produced by the execution units is stored in the assigned slot in the results buffer. When an instruction is retired, the results for that instruction are transferred to an architectural result register and any condition codes generated by that instruction are transferred to an architectural condition code register.Type: GrantFiled: June 25, 1997Date of Patent: July 4, 2000Assignee: Sun Microsystems, Inc.Inventors: Ramesh Panwar, Arjun Prabhu
-
Patent number: 6073158Abstract: A system and method for time slicing multiple received data streams utilizing multiple processors in such a manner as to ensure that all processors are running at full capability and are efficiently timesharing a global memory storage area. The received data streams are each divided into fixed portions called spans. The invention is operable for sequencing the movement of the time-sliced spans between the processors, adjusting the scheduling of particular ones of the time-sliced spans as a function of either processor availability or maintenance of real-time transmission of the received real-time time-sliced data streams.Type: GrantFiled: July 29, 1993Date of Patent: June 6, 2000Assignee: Cirrus Logic, Inc.Inventors: Robert Marshall Nally, John Charles Schafer
-
Patent number: 6065114Abstract: A computer-implemented method of switching contexts in a processor is provided. The processor includes a register stack (RS) that has first and second portions. The processor includes a register stack engine (RSE) to exchange information, in one of instruction execution dependent and independent modes between the second portion and the storage area. The computer implemented method of switching contexts includes the following steps: It is determined whether an interrupt occurred; a first register (IFM) configured to store a content of a second register (CFM) is invalidated, the CFM is configured to store control information related to the first portion; it is determined whether an interrupt handler needs to access the RS; and if so, the IFM is validated, the content of the CFM is copied to the IFM, and RSE is caused to exchange information between both the first and second portions of the RS and the storage area.Type: GrantFiled: April 21, 1998Date of Patent: May 16, 2000Assignee: Idea CorporationInventors: Achmed Rumi Zahir, Jonathan K. Ross, Carol Thompson, Cary Coutant, Prasad Raje, Sunil Saxena