Patents by Inventor Gregory J. Faanes
Gregory J. Faanes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8601236Abstract: A processor core, comprises one or more vector units operable to change between a fine-grained vector mode having a shorter maximum vector length and a coarse-grained vector mode having a longer maximum vector length. Changing vector modes comprises halting all instruction stream execution in the core, flushing one or more registers in a register space, reconfiguring one or more vector registers in the register space, and restarting instruction execution in the core.Type: GrantFiled: February 29, 2012Date of Patent: December 3, 2013Assignee: Cray Inc.Inventors: Gregory J. Faanes, Eric P. Lundberg, Abdulla Bataineh, Timothy J. Johnson, Michael Parker, James Robert Kohn, Steven L. Scott, Robert Alverson
-
Patent number: 8307194Abstract: A method and apparatus to provide specifiable ordering between and among vector and scalar operations within a single streaming processor (SSP) via a local synchronization (Lsync) instruction that operates within a relaxed memory consistency model. Various aspects of that relaxed memory consistency model are described. Further, a combined memory synchronization and barrier synchronization (Msync) for a multistreaming processor (MSP) system is described. Also, a global synchronization (Gsync) instruction provides synchronization even outside a single MSP system is described. Advantageously, the pipeline or queue of pending memory requests does not need to be drained before the synchronization operation, nor is it required to refrain from determining addresses for and inserting subsequent memory accesses into the pipeline.Type: GrantFiled: August 18, 2003Date of Patent: November 6, 2012Assignee: Cray Inc.Inventors: Steven L. Scott, Gregory J. Faanes, Brick Stephenson, William T. Moore, Jr., James R. Kohn
-
Publication number: 20120221830Abstract: A processor core, comprises one or more vector units operable to change between a fine-grained vector mode having a shorter maximum vector length and a coarse-grained vector mode having a longer maximum vector length. Changing vector modes comprises halting all instruction stream execution in the core, flushing one or more registers in a register space, reconfiguring one or more vector registers in the register space, and restarting instruction execution in the core.Type: ApplicationFiled: February 29, 2012Publication date: August 30, 2012Applicant: CRAY INC.Inventors: Gregory J. Faanes, Eric P. Lundberg, Abdulla Bataineh, Timothy J. Johnson, Michael Parker, James Robert Kohn, Steven L. Scott, Robert Alverson
-
Publication number: 20120072704Abstract: A processor is operable to execute a bit matrix multiply instruction. In further examples, the processor is operable to perform a vector bit matrix multiply instruction, and is a part of a computerized system.Type: ApplicationFiled: February 3, 2011Publication date: March 22, 2012Applicant: Cray Inc.Inventors: Timothy J. Johnson, Gregory J. Faanes
-
Publication number: 20100318741Abstract: A multiprocessor computer system comprises a processing node having a plurality of processors and a local memory shared among processors in the node. An L1 data cache is local to each of the plurality of processors, and an L2 cache is local to each of the plurality of processors. An L3 cache is local the node but shared among the plurality of processors, and the L3 cache is a subset of data stored in the local memory. The L2 caches are subsets of the L3 cache, and the L1 caches are a subset of the L2 caches in the respective processors.Type: ApplicationFiled: June 12, 2009Publication date: December 16, 2010Applicant: Cray Inc.Inventors: Steven L. Scott, Gregory J. Faanes, Abdulla Bataineh, Michael Bye, Gerald A. Schwoerer, Dennis C. Abts
-
Patent number: 7743223Abstract: In a computer system having a plurality of processors connected to a shared memory, a system and method of decoupling an address from write data in a store to the shared memory. A write request address is generated for a memory write, wherein the write request address points to a memory location in shared memory. A write request is issued to the shared memory, wherein the write request includes the write request address. The write request address is noted in the shared memory and addresses in subsequent load and store requests are compared in share memory to the write request address. The write data is transferred to the shared memory and matched, within the shared memory, to the write request address. The write data is then stored into the shared memory as a function of the write request address.Type: GrantFiled: August 18, 2003Date of Patent: June 22, 2010Assignee: Cray Inc.Inventors: Steven L. Scott, Gregory J. Faanes
-
Publication number: 20100115234Abstract: A processor core, comprises one or more vector units operable to change between a fine-grained vector mode having a shorter maximum vector length and a coarse-grained vector mode having a longer maximum vector length. Changing vector modes comprises halting all instruction stream execution in the core, flushing one or more registers in a register space, reconfiguring one or more vector registers in the register space, and restarting instruction execution in the core.Type: ApplicationFiled: October 31, 2008Publication date: May 6, 2010Applicant: CRAY INC.Inventors: Gregory J. Faanes, Eric P. Lundberg, Abdulla Bataineh, Timothy J. Johnson, Michael Parker, James Robert Kohn, Steven L. Scott, Robert Alverson
-
Publication number: 20100115236Abstract: A multiprocessor computer system having a plurality of processing elements comprises one or more core-level hierarchical shared semaphore registers, wherein each core-level hierarchical shared semaphore register is coupled to a different processor core. Each hierarchical shared semaphore register is writable to each of a plurality of streams executing on the coupled processor core. One or more chip-level hierarchical shared semaphore registers are also coupled to plurality of processor cores, each chip-level hierarchical shared semaphore register writable to each of the plurality of processor cores.Type: ApplicationFiled: October 31, 2008Publication date: May 6, 2010Applicant: Cray inc.Inventors: Abdulla Bataineh, James Robert Kohn, Eric P. Lundberg, Timothy J. Johnson, Thomas L. Court, Gregory J. Faanes, Steven L. Scott
-
Publication number: 20100115232Abstract: A vector processor or vector processing computer has a first vector register operable to store two or more vector elements that together comprise a single first large integer and a second vector register operable to store two or more vector elements that together comprise a single second large integer. An adder having a carry-in bit is operable to add the large integer in the first vector register to the large integer in the second vector register by using the carry-in bit to add sequential elements of the vector registers.Type: ApplicationFiled: October 31, 2008Publication date: May 6, 2010Inventors: Timothy J. Johnson, Eric P. Lundberg, Michael Parker, Gregory J. Faanes
-
Publication number: 20090138680Abstract: A processor is operable to execute one or more vector atomic memory operations. A further embodiment provides support for atomic memory operations in a memory manger, which is operable to process atomic memory operations and to return a completion notification or a result.Type: ApplicationFiled: November 28, 2007Publication date: May 28, 2009Inventors: Timothy J. Johnson, Gregory J. Faanes
-
Patent number: 7519771Abstract: A novel system and method for processing memory instructions. One embodiment of the invention provides a method for processing a memory instruction. In this embodiment, the method includes obtaining a memory request; storing the memory request in an Initial Request Queue (IRQ); and processing the memory request from the IRQ by a cache controller, wherein processing includes: identifying a type of the memory request, and processing the memory request in both a local cache and an Force Order Queue (FOQ), wherein processing includes determining if a portion of an address associated with the memory request matches one or more partial addresses in the FOQ and, if the memory request misses in the cache and the address does not match one or more partial addresses in the FOQ, adding the memory request to the FOQ and allocating a cache line in the local cache corresponding to the local cache miss.Type: GrantFiled: August 18, 2003Date of Patent: April 14, 2009Assignee: Cray Inc.Inventors: Gregory J. Faanes, Eric P. Lundberg, Steven L. Scott, Robert J. Baird
-
Publication number: 20080288756Abstract: A processor is operable to execute a bit matrix multiply instruction. In further examples, the processor is operable to perform a vector bit matrix multiply instruction, and is a part of a computerized system.Type: ApplicationFiled: May 18, 2007Publication date: November 20, 2008Inventors: Timothy J. Johnson, Gregory J. Faanes
-
Patent number: 7437521Abstract: A method and apparatus to provide specifiable ordering between and among vector and scalar operations within a single streaming processor (SSP) via a local synchronization (Lsync) instruction that operates within a relaxed memory consistency model. Various aspects of that relaxed memory consistency model are described. Further, a combined memory synchronization and barrier synchronization (Msync) for a multistreaming processor (MSP) system is described. Also, a global synchronization (Gsync) instruction provides synchronization even outside a single MSP system is described. Advantageously, the pipeline or queue of pending memory requests does not need to be drained before the synchronization operation, nor is it required to refrain from determining addresses for and inserting subsequent memory accesses into the pipeline.Type: GrantFiled: August 18, 2003Date of Patent: October 14, 2008Assignee: Cray Inc.Inventors: Steven L. Scott, Gregory J. Faanes, Brick Stephenson, William T. Moore, Jr., James R. Kohn
-
Patent number: 7334110Abstract: In a computer system having a scalar processing unit and a vector processing unit, wherein the vector processing unit includes a vector dispatch unit, a system and method of decoupling operation of the scalar processing unit from that of the vector processing unit, the method comprising sending a vector instruction from the scalar processing unit to the vector dispatch unit, wherein sending includes marking the vector instruction as complete if the vector instruction is not a vector memory instruction and if the vector instruction does not require scalar operands, reading a scalar operand, wherein reading includes transferring the scalar operand from the scalar processing unit to the vector dispatch unit, predispatching the vector instruction within the vector dispatch unit if the vector instruction is scalar committed, dispatching the predispatched vector instruction if all required operands are ready, and executing the dispatched vector instruction as a function of the scalar operand.Type: GrantFiled: August 18, 2003Date of Patent: February 19, 2008Assignee: Cray Inc.Inventors: Gregory J. Faanes, Steven L. Scott, Eric P. Lundberg, William T. Moore, Jr., Timothy J. Johnson
-
Patent number: 6665774Abstract: A common scalar/vector data cache apparatus and method for a scalar/vector computer. One aspect of the present invention provides a computer system including a memory. The memory includes a plurality of sections. The computer system also includes a scalar/vector processor coupled to the memory using a plurality of separate address busses and a plurality of separate read-data busses wherein at least one of the sections of the memory is associated with each address bus and at least one of the sections of the memory is associated with each read-data bus. The processor further includes a plurality of scalar registers and a plurality of vector registers and operating on instructions which provide a reference address to a data word. The processor includes a scalar/vector cache unit that includes a cache array, and a FIFO unit that tracks (a.) an address in the cache array to which a read-data value will be placed when the read-data value is returned from the memory, and (b.Type: GrantFiled: October 16, 2001Date of Patent: December 16, 2003Assignee: Cray, Inc.Inventors: Gregory J. Faanes, Eric P. Lundberg
-
Patent number: 6496902Abstract: A common scalar/vector data cache apparatus and method for a scalar/vector computer. One aspect of the present invention provides a computer system including a memory. The memory includes a plurality of sections. The computer system also includes a scalar/vector processor coupled to the memory using a plurality of separate address busses and a plurality of separate read-data busses wherein at least one of the sections of the memory is associated with each address bus and at least one of the sections of the memory is associated with each read-data bus. The processor further includes a plurality of scalar registers and a plurality of vector registers and operating on instructions which provide a reference address to a data word. The processor includes a scalar/vector cache unit that includes a cache array, and a FIFO unit that tracks (a.) an address in the cache array to which a read-data value will be placed when the read-data value is returned from the memory, and (b.Type: GrantFiled: December 31, 1998Date of Patent: December 17, 2002Assignee: Cray Inc.Inventors: Gregory J. Faanes, Eric P. Lundberg
-
Publication number: 20020144061Abstract: A common scalar/vector data cache apparatus and method for a scalar/vector computer. One aspect of the present invention provides a computer system including a memory. The memory includes a plurality of sections. The computer system also includes a scalar/vector processor coupled to the memory using a plurality of separate address busses and a plurality of separate read-data busses wherein at least one of the sections of the memory is associated with each address bus and at least one of the sections of the memory is associated with each read-data bus. The processor further includes a plurality of scalar registers and a plurality of vector registers and operating on instructions which provide a reference address to a data word. The processor includes a scalar/vector cache unit that includes a cache array, and a FIFO unit that tracks (a.) an address in the cache array to which a read-data value will be placed when the read-data value is returned from the memory, and (b.Type: ApplicationFiled: October 16, 2001Publication date: October 3, 2002Applicant: Cray Inc.Inventors: Gregory J. Faanes, Eric P. Lundberg