Scalar/vector Processor Interface Patents (Class 712/3)
  • Patent number: 7676647
    Abstract: A processor device is disclosed that includes a register file with a combined condition code register for scalar and vector operations. The processor device utilizes the combined condition code register for scalar and vector operations. Further, a compare operation can store resulting bits in the combined condition code register and a conditional operation can utilize the combined condition code register bits for evaluating a condition.
    Type: Grant
    Filed: August 18, 2006
    Date of Patent: March 9, 2010
    Assignee: QUALCOMM Incorporated
    Inventors: Lucian Codrescu, Erich Plondke, Taylor Simpson
  • Patent number: 7661006
    Abstract: A computer implemented method, apparatus, and computer program product for managing symmetric multiprocessor interconnects. The process identifies functional communication connections between each processor in a plurality of processors on a multiprocessor to form identified functional communication connections. The process maps every functional communication connection between any two processors in the plurality of processors, based on the identified functional communication connections, to form an interconnect matrix. The process creates a path map using the interconnect matrix. The path map comprises a sequence of communication connections between the plurality of processors. The process initializes the plurality of processors using the path map.
    Type: Grant
    Filed: January 9, 2007
    Date of Patent: February 9, 2010
    Assignee: International Business Machines Corporation
    Inventors: Luai A. Abou-Emara, Mark David McLaughlin, Jorge N. Yanez
  • Publication number: 20090228657
    Abstract: An apparatus includes a vector unit to process a vector data, a cache memory which includes a plurality of cache lines to store a plurality of divisional data being sent from a main memory, each of the divisional data of vector data having been divided according to a capacity of a cache line, and a cache controller to send all of the divisional data as the vector data to the vector unit after the cache lines have stored all of the divisional data including the vector data.
    Type: Application
    Filed: February 6, 2009
    Publication date: September 10, 2009
    Applicant: NEC Corporation
    Inventor: Takashi Hagiwara
  • Publication number: 20090150647
    Abstract: A vectorizable execution unit is capable of being operated in a plurality of modes, with the processing lanes in the vectorizable execution unit grouped into different combinations of logical execution units in different modes. By doing so, processing lanes can be selectively grouped together to operate as different types of vector execution units and/or scalar execution units, and if desired, dynamically switched during runtime to process various types of instruction streams in a manner that is best suited for each type of instruction stream. As a consequence, a single vectorizable execution unit may be configurable, e.g., via software control, to operate either as a vector execution or a plurality of scalar execution units.
    Type: Application
    Filed: December 7, 2007
    Publication date: June 11, 2009
    Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
  • Patent number: 7457938
    Abstract: In one embodiment, the present invention includes a method for executing an operation on low order portions of first and second source operands using a first execution stack of a processor and executing the operation on high order portions of the first and second source operands using a second execution stack of the processor, where the operation in the second execution stack is staggered by one or more cycles from the operation in the first execution stack. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: November 25, 2008
    Assignee: Intel Corporation
    Inventors: Stephan Jourdan, Avinash Sodani, Michael Fetterman, Per Hammarlund, Ronak Singhal, Glenn Hinton
  • Patent number: 7437521
    Abstract: A method and apparatus to provide specifiable ordering between and among vector and scalar operations within a single streaming processor (SSP) via a local synchronization (Lsync) instruction that operates within a relaxed memory consistency model. Various aspects of that relaxed memory consistency model are described. Further, a combined memory synchronization and barrier synchronization (Msync) for a multistreaming processor (MSP) system is described. Also, a global synchronization (Gsync) instruction provides synchronization even outside a single MSP system is described. Advantageously, the pipeline or queue of pending memory requests does not need to be drained before the synchronization operation, nor is it required to refrain from determining addresses for and inserting subsequent memory accesses into the pipeline.
    Type: Grant
    Filed: August 18, 2003
    Date of Patent: October 14, 2008
    Assignee: Cray Inc.
    Inventors: Steven L. Scott, Gregory J. Faanes, Brick Stephenson, William T. Moore, Jr., James R. Kohn
  • Publication number: 20080222388
    Abstract: The dynamic efficient and accurate simulation of processor status flags is described. One exemplary embodiment includes simulation of processor status flags of a first CPU type on a second CPU type using simple arithmetic operations to calculate status flags in parallel, and by keeping an intermediate state that allows efficient calculation of status flags when they are needed. In this way, sufficient intermediate state exists to generate desired status flags either directly or with a simple operation.
    Type: Application
    Filed: March 5, 2007
    Publication date: September 11, 2008
    Applicant: Microsoft Corporation
    Inventor: Darek Mihocka
  • Patent number: 7350057
    Abstract: Described herein is a method and system for executing instructions. The system comprises a scalar unit for executing scalar instructions each defining a single value pair; a vector unit for executing vector instructions each defining multiple value pairs; and an instruction decoder for receiving a single stream of instructions including scalar instructions and vector instructions and operable to direct scalar instructions to the scalar unit and vector instructions to the vector unit. The vector unit can comprises a plurality of value processing units and a scalar result unit. The scalar unit can comprise a scalar register file. Communication between the vector unit and the scalar unit is enabled by allowing the vector unit to access the scalar register file and allowing the scalar unit to access output from the scalar result unit. The output of the scalar result unit may be based on the relative magnitudes of outputs from the plurality of value processing units.
    Type: Grant
    Filed: November 6, 2006
    Date of Patent: March 25, 2008
    Assignee: Broadcom Corporation
    Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
  • Patent number: 7334110
    Abstract: In a computer system having a scalar processing unit and a vector processing unit, wherein the vector processing unit includes a vector dispatch unit, a system and method of decoupling operation of the scalar processing unit from that of the vector processing unit, the method comprising sending a vector instruction from the scalar processing unit to the vector dispatch unit, wherein sending includes marking the vector instruction as complete if the vector instruction is not a vector memory instruction and if the vector instruction does not require scalar operands, reading a scalar operand, wherein reading includes transferring the scalar operand from the scalar processing unit to the vector dispatch unit, predispatching the vector instruction within the vector dispatch unit if the vector instruction is scalar committed, dispatching the predispatched vector instruction if all required operands are ready, and executing the dispatched vector instruction as a function of the scalar operand.
    Type: Grant
    Filed: August 18, 2003
    Date of Patent: February 19, 2008
    Assignee: Cray Inc.
    Inventors: Gregory J. Faanes, Steven L. Scott, Eric P. Lundberg, William T. Moore, Jr., Timothy J. Johnson
  • Patent number: 7305540
    Abstract: Methods and apparatuses for a data processing system are described herein. In one aspect of the invention, an exemplary apparatus includes a chip interconnect, a memory controller for controlling the host memory comprising DRAM memory, the memory controller coupled to the chip interconnect, a scalar processing unit coupled the chip interconnect wherein the scalar processing unit is capable of executing instructions to perform scalar data processing, a vector processing unit coupled the chip interconnect wherein the vector processing unit is capable of executing instructions to perform vector data processing, and an input/output (I/O) interface coupled to the chip interconnect wherein the I/O interface receives/transmits data from/to the scalar and/or vector processing units.
    Type: Grant
    Filed: December 31, 2001
    Date of Patent: December 4, 2007
    Assignee: Apple Inc.
    Inventors: Sushma Shrikant Trivedi, Joseph P. Bratt, Jack Benkual, Vaughn Todd Arnold, Derek Fujio Iwamoto
  • Patent number: 7254696
    Abstract: A functional-level instruction-set computing (FLIC) architecture executes higher-level functional instructions such as lookups and bit-compares of variable-length operands. Each FLIC processing-engine slice has specialized processing units including a lookup unit that searches for a matching entry in a lookup cache. Variable-length operands are stored in execution buffers. The operand length and location in the execution buffer are stored in fixed-length general-purpose registers (GPRs) that also store fixed-length operands. A copy/move unit moves data between input and output buffers and one or more FLIC processing-engine slices. Multiple contexts can each have a set of GPRs and execution buffers. An expansion buffer in a FLIC slice can be allocated to a context to expand that context's execution buffer for storing longer operands.
    Type: Grant
    Filed: December 12, 2002
    Date of Patent: August 7, 2007
    Assignee: Alacritech, Inc.
    Inventors: Millind Mittal, Mehul Kharidia, Tarun Kumar Tripathy, J. Sukarno Mertoguno
  • Patent number: 7032101
    Abstract: An apparatus and method in a high performance processor for issuing instructions, comprising; a classification logic for sorting instructions in a number of priority categories, a plurality of instruction queues storing the instruction of differing priorities, and a issue logic selecting from which queue to dispatch instructions for execution. This apparatus and method can be implemented in both in-order, and out-of-order execution processor architectures. The invention also involves instruction cloning, and use of various predictive techniques.
    Type: Grant
    Filed: February 26, 2002
    Date of Patent: April 18, 2006
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, Valentina Salapura
  • Patent number: 7028145
    Abstract: Protocol processor intended to be associated with at least one main processor of a system with a view to the execution of tasks to which the main processor is not suited. The protocol processor comprises a program part (30) including an incrementation register (31), a program memory (33) connected to the incrementation register (31) in order to receive addresses thereof, a decoding part (35) intended to receive instructions from the program memory (33) of the program part (30) with a view to executing an instruction in two cycles, and a data part (36) for executing the instruction.
    Type: Grant
    Filed: July 10, 1997
    Date of Patent: April 11, 2006
    Inventors: Gerard Chauvel, Francis Aussedat, Pierre Calippe
  • Patent number: 6963341
    Abstract: The present invention provides efficient ways to implement scan conversion and matrix transpose operations using vector multiplex operations in a SIMD processor. The present method provides a very fast and flexible way to implement different scan conversions, such as zigzag conversion, and matrix transpose for 2×2, 4×4, 8×8 blocks commonly used by all video compression and decompression algorithms.
    Type: Grant
    Filed: May 20, 2003
    Date of Patent: November 8, 2005
    Inventor: Tibet Mimar
  • Patent number: 6857061
    Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.
    Type: Grant
    Filed: April 7, 2000
    Date of Patent: February 15, 2005
    Assignee: Nintendo Co., Ltd.
    Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng
  • Patent number: 6839889
    Abstract: A method of implementing a scaleable architecture for a communications system is disclosed, based on minimizing a total gate count for the communications system to reduce cost, complexity, etc. The method considers the requirements of particular communications transmission process that is dividable into individual transmission tasks. A computational complexity for each of said N individual transmission tasks respectively, said computational complexity being based on a number of instructions per second (MIPs) required by a computational circuit to perform each of said N individual transmission tasks; a number of gates and/or transistors required to implement each of individual transmission task using a hardware based or software based computing circuit, etc. After determining an effective number of MIPs acheivable by such circuits, the N tasks are allocated in a gate efficient manner for a final design architecture, or for a working implementation in the field.
    Type: Grant
    Filed: March 1, 2001
    Date of Patent: January 4, 2005
    Assignee: Realtek Semiconductor Corp.
    Inventor: Ming-Kang Liu
  • Patent number: 6839828
    Abstract: There is provided a processor designed to operate in a plurality of modes for processing vector and scalar instructions. Register files are each for storing scalar and vector data and address information. A parallel vector unit, coupled to the register files, includes functional units configurable to operate in a vector operation mode and a scalar operation mode. The vector unit includes an apparatus for tightly coupling the functional units to perform an operation specified by a current instruction. Under a vector operation mode, the vector unit performs, in parallel, a single vector operation on a plurality of data elements. The operations performed on the plurality of data elements are each performed by a different functional unit of the vector unit. Under a scalar operation mode, the vector unit performs a scalar operation on a data element received from the register files in a functional unit within the vector unit.
    Type: Grant
    Filed: August 14, 2001
    Date of Patent: January 4, 2005
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, Harm Peter Hofstee, Martin Edward Hopkins
  • Publication number: 20040193837
    Abstract: A data processing system includes left and right data path processors coupled to an instruction cache. The left and right data path processors, respectively, are configured to execute left and right instruction words received in a single clock cycle from the instruction cache. The left and right data path processors are also configured to operate in a scalar mode and a vector mode. The processors (a) execute the left and right instruction words as two separate instructions in the scalar mode, and (b) execute the left and right instruction words as one instruction in the vector mode.
    Type: Application
    Filed: March 31, 2003
    Publication date: September 30, 2004
    Inventors: Patrick Devaney, David M. Keaton, Katsumi Murai
  • Publication number: 20040193838
    Abstract: A processing system includes left and right data path processors configured to execute instructions issued from an instruction cache. A vector instruction includes a first word configured for execution by the left data path processor and a second word configured for execution by the right data path processor. The first and second words are issued in the same clock cycle from the instruction cache, and are interlocked to jointly specify a single vector instruction. The first and second words include code for vector operation and code for vector control. The first and second words are concurrently executed to complete the vector operation, free-of any other instructions issued from the instruction cache.
    Type: Application
    Filed: March 31, 2003
    Publication date: September 30, 2004
    Inventors: Patrick Devaney, David M. Keaton, Katsumi Murai
  • Patent number: 6734874
    Abstract: A method, apparatus and article of manufacture are provided for handling both scalar and vector components during graphics processing. To accomplish this, vertex data is received in the form of vectors after which vector operations are performed on the vector vertex data. Next, scalar operations may be executed on an output of the vector operations, thereby rendering vertex data in the form of scalars. Such scalar vertex data may then be converted to vector vertex data for performing vector operations thereon.
    Type: Grant
    Filed: January 31, 2001
    Date of Patent: May 11, 2004
    Assignee: nVidia Corporation
    Inventors: John Erik Lindholm, Simon Moy, David B. Kirk, Paolo E. Sabella
  • Publication number: 20040078493
    Abstract: A system and method for enabling high-speed, low-latency global tree communications among processing nodes interconnected according to a tree network structure. The global tree network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations include one or more of: global broadcast operations downstream from a root node to leaf nodes of a virtual tree, global reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node in the virtual tree.
    Type: Application
    Filed: August 22, 2003
    Publication date: April 22, 2004
    Inventors: Matthias A Blumrich, Dong Chen, Paul W Coteus, Alan G Gara, Mark E Giampapa, Philip Heidelberger, Dirk Hoenicke, Burkhard D Steinmacher-Burow, Todd E Takken, Pavlos M Vranas
  • Patent number: 6665774
    Abstract: A common scalar/vector data cache apparatus and method for a scalar/vector computer. One aspect of the present invention provides a computer system including a memory. The memory includes a plurality of sections. The computer system also includes a scalar/vector processor coupled to the memory using a plurality of separate address busses and a plurality of separate read-data busses wherein at least one of the sections of the memory is associated with each address bus and at least one of the sections of the memory is associated with each read-data bus. The processor further includes a plurality of scalar registers and a plurality of vector registers and operating on instructions which provide a reference address to a data word. The processor includes a scalar/vector cache unit that includes a cache array, and a FIFO unit that tracks (a.) an address in the cache array to which a read-data value will be placed when the read-data value is returned from the memory, and (b.
    Type: Grant
    Filed: October 16, 2001
    Date of Patent: December 16, 2003
    Assignee: Cray, Inc.
    Inventors: Gregory J. Faanes, Eric P. Lundberg
  • Patent number: 6609189
    Abstract: The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path delays of many components in existing implementations grow quadratically with the issue width and the window size. This patent presents a novel way to reimplement these components and reduce their critical-path delay growth. It then describes an entire processor microarchitecture, called the Ultrascalar processor, that has better critical-path delay growth than existing superscalars. Most of our scalable designs are based on a single circuit, a cyclic segmented parallel prefix (cspp). We observe that processor components typically operate on a wrap-around sequence of instructions, computing some associative property of that sequence. For example, to assign an ALU to the oldest requesting instruction, each instruction in the instruction sequence must be told whether any preceding instructions are requesting an ALU.
    Type: Grant
    Filed: March 12, 1999
    Date of Patent: August 19, 2003
    Assignee: Yale University
    Inventors: Bradley C. Kuszmaul, Dana Sue Henry-Kuszmaul
  • Patent number: 6530011
    Abstract: A method and an apparatus for implementing mixed scalar and vector values in a digital processing system. In one embodiment, a digital processing system, which contains processing unit and memories, is capable of identifying a first data in a first scalar register and a second data in a vector register. Upon fetching the first data as a first operand and the second data as a second operand, the processing unit performs an operation between the first and second operands in response to an operator. After operations, the result is subsequently stored in a second scalar register.
    Type: Grant
    Filed: October 20, 1999
    Date of Patent: March 4, 2003
    Assignee: SandCraft, Inc.
    Inventor: Jack H. Choquette
  • Publication number: 20030037221
    Abstract: An improved processor implementation is described in which scalar and vector processing components are merged to reduce complexity. In particular, the implementation includes a scalar-vector register file for storing scalar and vector instructions, as well as a parallel vector unit comprising functional units that can process vector or scalar instructions as required. A further aspect of the invention provides the ability to disable unused functional units in the parallel vector unit, such as during a scalar operation, to achieve significant power savings.
    Type: Application
    Filed: August 14, 2001
    Publication date: February 20, 2003
    Applicant: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, Harm Peter Hofstee, Martin Edward Hopkins
  • Patent number: 6496902
    Abstract: A common scalar/vector data cache apparatus and method for a scalar/vector computer. One aspect of the present invention provides a computer system including a memory. The memory includes a plurality of sections. The computer system also includes a scalar/vector processor coupled to the memory using a plurality of separate address busses and a plurality of separate read-data busses wherein at least one of the sections of the memory is associated with each address bus and at least one of the sections of the memory is associated with each read-data bus. The processor further includes a plurality of scalar registers and a plurality of vector registers and operating on instructions which provide a reference address to a data word. The processor includes a scalar/vector cache unit that includes a cache array, and a FIFO unit that tracks (a.) an address in the cache array to which a read-data value will be placed when the read-data value is returned from the memory, and (b.
    Type: Grant
    Filed: December 31, 1998
    Date of Patent: December 17, 2002
    Assignee: Cray Inc.
    Inventors: Gregory J. Faanes, Eric P. Lundberg
  • Publication number: 20020099922
    Abstract: The memory access arithmetic operation instruction is executed in the data processing apparatus including a memory access pipeline and arithmetic operation pipeline. The decoding and development of the memory access arithmetic operation are carried out after the memory access arithmetic operation instruction is input to the memory access pipeline and the memory access results and the memory access arithmetic instruction are output to the arithmetic operation pipeline.
    Type: Application
    Filed: January 13, 1999
    Publication date: July 25, 2002
    Applicant: Fujitsu Limited
    Inventor: Mariko SAKAMOTO
  • Patent number: 6266686
    Abstract: A method in a computer system which includes receiving a first instruction which indicates indicates termination of execution of instructions which operate upon packed data stored in a first storage area. The first storage area is used for modifying data responsive to execution of floating point instructions. A plurality of tags is associated with the first storage area indicating that locations in the first storage area are either empty or non-empty responsive to the execution of the floating point instructions which modify data contained in the first storage area. Responsive to the receiving of the first instruction which indicates termination of execution of instructions which operate upon the packed data stored in the first storage area, the method sets only the plurality of tags to an empty state. In different embodiments, setting of the plurality of tags to a non-empty state occurs responsive to receiving a second instruction.
    Type: Grant
    Filed: March 4, 1999
    Date of Patent: July 24, 2001
    Assignee: Intel Corporation
    Inventors: David Bistry, Larry Mennemeier, Alexander D. Peleg, Carole Dulong, Eiichi Kowashi, Millind Mittal, Benny Eitan
  • Patent number: 6219775
    Abstract: A massively-parallel computer includes a plurality of processing nodes and at least one control node interconnected by a network. The network faciliates the transfer of data among the processing nodes and of commands from the control node to the processing nodes. Each processing node includes an interface for transmitting data over, and receiving data and commands from, the network, at least one memory module for storing data, a node processor and an auxiliary processor. The node processor receives commands received by the interface and processes data in response thereto, in the process generating memory access requests for facilitating the retrieval of data from or storage of data in the memory module. The node processor further controlling the transfer of data over the network by the interface. The auxiliary processor is connected to the memory module and the node processor.
    Type: Grant
    Filed: March 18, 1998
    Date of Patent: April 17, 2001
    Assignee: Thinking Machines Corporation
    Inventors: Jon P. Wade, Daniel R. Cassiday, Robert D. Lordi, Guy Lewis Steele, Jr., Margaret A. St. Pierre, Monica C. Wong-Chan, Zahi S. Abuhamdeh, David C. Douglas, Mahesh N. Ganmukhi, Jeffrey V. Hill, W. Daniel Hillis, Scott J. Smith, Shaw-Wen Yang, Robert C. Zak, Jr.
  • Patent number: 6212617
    Abstract: A parallel programming system provides a lazy collection oriented data type that reduces inter-processor communication in programs executed on parallel computers. The lazy collection oriented data type is provided as a data type in a parallel programming language. The parallel language supports both data-parallel and control-parallel operations. These operations take advantage of the lazy collection oriented data type to defer or reduce inter-processor communication until an operation on the data type requires that it be balanced across a set of processors.
    Type: Grant
    Filed: June 30, 1998
    Date of Patent: April 3, 2001
    Assignee: Microsoft Corporation
    Inventor: Jonathan C. Hardwick
  • Patent number: 6141673
    Abstract: A multimedia extension unit (MEU) is provided for performing various multimedia-type operations. The MEU can be coupled either through a coprocessor bus or a local central processing unit (CPU) bus to a conventional processor. The MEU employs vector registers, a vector arithmetic logic unit (ALU), and an operand routing unit (ORU) to perform a maximum number of the multimedia operations within as few instruction cycles as possible. Complex algorithms are readily performed by arranging operands upon the vector ALU in accordance with the desired algorithm flowgraph. The ORU aligns the operands within partitioned slots or sub-slots of the vector registers using vector instructions unique to the MEU. At the output of the ORU, operand pairs from vector source or destination registers can be easily routed and combined at the vector ALU.
    Type: Grant
    Filed: May 25, 1999
    Date of Patent: October 31, 2000
    Assignees: Advanced Micro Devices, Inc., Compaq Computer Corp.
    Inventors: John S. Thayer, John Gregory Favor, Frederick D. Weber
  • Patent number: 6106573
    Abstract: A microprocessor implements an instruction tracing mechanism that saves the state of the microprocessor without special hardware. Prior to the execution of a traced instruction, a trace microcode routine is implemented that saves the state of the microprocessor to external memory. The state information saved by the trace microcode routine varies depending upon the amount of data needed by the end user. After the state of the processor has been saved, the trace instruction is executed. State information that changed during the execution of the trace instruction is saved to memory prior to a subsequent instruction. The trace instruction mechanism advantageously requires minimal special hardware and expedites the saving of the processor state information.
    Type: Grant
    Filed: May 14, 1999
    Date of Patent: August 22, 2000
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Rupaka Mahalingaiah, James K. Pickett
  • Patent number: 6073158
    Abstract: A system and method for time slicing multiple received data streams utilizing multiple processors in such a manner as to ensure that all processors are running at full capability and are efficiently timesharing a global memory storage area. The received data streams are each divided into fixed portions called spans. The invention is operable for sequencing the movement of the time-sliced spans between the processors, adjusting the scheduling of particular ones of the time-sliced spans as a function of either processor availability or maintenance of real-time transmission of the received real-time time-sliced data streams.
    Type: Grant
    Filed: July 29, 1993
    Date of Patent: June 6, 2000
    Assignee: Cirrus Logic, Inc.
    Inventors: Robert Marshall Nally, John Charles Schafer
  • Patent number: 6061776
    Abstract: A distributed memory computer architecture associates separate memory blocks with their own processors, each of which executes the same program. A processor fetching data or instructions from its local memory also broadcasts that fetched data or instruction to the other processors to cut the time required for them to request this data. Runs of instruction and data local to one processor providing improved performance that is captured by the system as a whole by the ability of the other processors not executing local data or instructions to execute instructions out of order and return to find the data ready in buffer for rapid use.
    Type: Grant
    Filed: April 21, 1999
    Date of Patent: May 9, 2000
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Douglas C. Burger, Stefanos Kaxiras, James R. Goodman
  • Patent number: 6055620
    Abstract: A control apparatus and method is provided for controlling operations of functional units in systems. The control apparatus and method implement a set of operations that can include dependencies between the functional units of a system to complete each operation. For example, in an asynchronous digital processor, self-timing and inter-block communication are used to implement a self-timed scheduler. The self-timed scheduler and method implement an instruction set using a plurality of functional units of the asynchronous digital processor. A scheduler can include a scheduler decoder that decodes each instruction to generate functional unit schedule and control information, a communication device and a plurality of scheduler functional unit controllers, wherein each of the scheduler functional unit controllers corresponds to one of the plurality of functional units of a system.
    Type: Grant
    Filed: September 18, 1997
    Date of Patent: April 25, 2000
    Assignees: LG Semicon Co., Ltd., Cogency Technology Incorporated
    Inventors: Nigel C. Paver, Paul Day
  • Patent number: 6052769
    Abstract: A method comprises decoding a single instruction having a first operand identifying a plurality of bytes of packed data and a second operand identifying a corresponding plurality of byte masks. Each of the plurality of byte masks identified by the second operand of the single decoded instruction are analyzed, wherein select bytes of the plurality of bytes identified by the first operand are moved to an implicitly defined location based, at least in part, on the analysis of the individual byte masks identified by the second operand of the single decoded instruction.
    Type: Grant
    Filed: March 31, 1998
    Date of Patent: April 18, 2000
    Assignee: Intel Corporation
    Inventors: Thomas R. Huff, Shreekant Thakkar, Nathaniel Hoffman
  • Patent number: 6044453
    Abstract: A programmable circuit and method for a data processing apparatus is provided that allows an entire instruction or instruction set to be modified. According to the present invention, the instruction can be modified, for example, during initialization or execution. The programmable circuit for a data processing apparatus can include a plurality of functional units, each functional unit performing a set of prescribed operations. A programmable circuit that is capable of modifying an entire instruction. A controller that decodes a current instruction to perform a corresponding instruction task using the plurality of functional units and a communications device coupling the functional units, the programmable circuit and the controller.
    Type: Grant
    Filed: September 30, 1997
    Date of Patent: March 28, 2000
    Assignees: LG Semicon Co., Ltd., Cogency Technology Incorporated
    Inventor: Nigel C. Paver
  • Patent number: 6038654
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instructions in-order.
    Type: Grant
    Filed: June 23, 1999
    Date of Patent: March 14, 2000
    Assignee: Seiko Epson Corporation
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Patent number: 5991865
    Abstract: A routable operand and selectable operation processor multimedia extension unit is employed to motion compensate MPEG video using improved vector processing. A vector processing unit executes an add and divide instruction that adds two vector registers and divides the result in a single instruction. This is implemented through loading a first vector register with a first plurality of elements from a source block. A second vector register is then loaded with a second plurality of elements that are adjacent to the first plurality of elements. The add and divide instruction is then executed on the first and second vector registers, yielding an interpolated source element that is stored in a resultant vector register.
    Type: Grant
    Filed: December 31, 1996
    Date of Patent: November 23, 1999
    Assignee: Compaq Computer Corporation
    Inventors: Brian E. Longhenry, Gary W. Thome, John S. Thayer
  • Patent number: 5935230
    Abstract: At least two clusters of CPUs are present in a multiprocessor computer system. Each CPU cluster has a given number of CPUs, each CPU having an associated ID such as an ID number. An additional ID number, not associated with a CPU in the same cluster, is associated with the opposite CPU cluster that appears to the original cluster as a "phantom" processor. A round-robin bus arbitration scheme allows ordered ownership of a common bus within a first cluster until the ID reaches the "phantom" processor, at which time bus ownership passes to a CPU in the second cluster. This arrangement is preferably symmetric, so that when a CPU from the first cluster requests ownership of the bus, it is granted bus ownership by virtue of the first cluster's appearance to the second cluster as a "phantom" CPU.
    Type: Grant
    Filed: July 9, 1997
    Date of Patent: August 10, 1999
    Assignee: Amiga Development, LLC
    Inventors: Felix Pinai, Manhtien Phan