Patents by Inventor Gregory J. Faanes

Gregory J. Faanes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8601236
    Abstract: A processor core comprises one or more vector units operable to change between a fine-grained vector mode having a shorter maximum vector length and a coarse-grained vector mode having a longer maximum vector length. Changing vector modes comprises halting all instruction stream execution in the core, flushing one or more registers in a register space, reconfiguring one or more vector registers in the register space, and restarting instruction execution in the core.
    Type: Grant
    Filed: February 29, 2012
    Date of Patent: December 3, 2013
    Assignee: Cray Inc.
    Inventors: Gregory J. Faanes, Eric P. Lundberg, Abdulla Bataineh, Timothy J. Johnson, Michael Parker, James Robert Kohn, Steven L. Scott, Robert Alverson
  • Patent number: 8307194
    Abstract: A method and apparatus to provide specifiable ordering between and among vector and scalar operations within a single streaming processor (SSP) via a local synchronization (Lsync) instruction that operates within a relaxed memory consistency model. Various aspects of that relaxed memory consistency model are described. Further, a combined memory synchronization and barrier synchronization (Msync) for a multistreaming processor (MSP) system is described. Also, a global synchronization (Gsync) instruction that provides synchronization even outside a single MSP system is described. Advantageously, the pipeline or queue of pending memory requests does not need to be drained before the synchronization operation, nor is it required to refrain from determining addresses for and inserting subsequent memory accesses into the pipeline.
    Type: Grant
    Filed: August 18, 2003
    Date of Patent: November 6, 2012
    Assignee: Cray Inc.
    Inventors: Steven L. Scott, Gregory J. Faanes, Brick Stephenson, William T. Moore, Jr., James R. Kohn
  • Publication number: 20120221830
    Abstract: A processor core comprises one or more vector units operable to change between a fine-grained vector mode having a shorter maximum vector length and a coarse-grained vector mode having a longer maximum vector length. Changing vector modes comprises halting all instruction stream execution in the core, flushing one or more registers in a register space, reconfiguring one or more vector registers in the register space, and restarting instruction execution in the core.
    Type: Application
    Filed: February 29, 2012
    Publication date: August 30, 2012
    Applicant: CRAY INC.
    Inventors: Gregory J. Faanes, Eric P. Lundberg, Abdulla Bataineh, Timothy J. Johnson, Michael Parker, James Robert Kohn, Steven L. Scott, Robert Alverson
  • Publication number: 20120072704
    Abstract: A processor is operable to execute a bit matrix multiply instruction. In further examples, the processor is operable to perform a vector bit matrix multiply instruction, and is a part of a computerized system.
    Type: Application
    Filed: February 3, 2011
    Publication date: March 22, 2012
    Applicant: Cray Inc.
    Inventors: Timothy J. Johnson, Gregory J. Faanes
  • Publication number: 20100318741
    Abstract: A multiprocessor computer system comprises a processing node having a plurality of processors and a local memory shared among processors in the node. An L1 data cache is local to each of the plurality of processors, and an L2 cache is local to each of the plurality of processors. An L3 cache is local to the node but shared among the plurality of processors, and the L3 cache is a subset of data stored in the local memory. The L2 caches are subsets of the L3 cache, and the L1 caches are a subset of the L2 caches in the respective processors.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: Cray Inc.
    Inventors: Steven L. Scott, Gregory J. Faanes, Abdulla Bataineh, Michael Bye, Gerald A. Schwoerer, Dennis C. Abts
  • Patent number: 7743223
    Abstract: In a computer system having a plurality of processors connected to a shared memory, a system and method of decoupling an address from write data in a store to the shared memory. A write request address is generated for a memory write, wherein the write request address points to a memory location in shared memory. A write request is issued to the shared memory, wherein the write request includes the write request address. The write request address is noted in the shared memory and addresses in subsequent load and store requests are compared in shared memory to the write request address. The write data is transferred to the shared memory and matched, within the shared memory, to the write request address. The write data is then stored into the shared memory as a function of the write request address.
    Type: Grant
    Filed: August 18, 2003
    Date of Patent: June 22, 2010
    Assignee: Cray Inc.
    Inventors: Steven L. Scott, Gregory J. Faanes
  • Publication number: 20100115234
    Abstract: A processor core comprises one or more vector units operable to change between a fine-grained vector mode having a shorter maximum vector length and a coarse-grained vector mode having a longer maximum vector length. Changing vector modes comprises halting all instruction stream execution in the core, flushing one or more registers in a register space, reconfiguring one or more vector registers in the register space, and restarting instruction execution in the core.
    Type: Application
    Filed: October 31, 2008
    Publication date: May 6, 2010
    Applicant: CRAY INC.
    Inventors: Gregory J. Faanes, Eric P. Lundberg, Abdulla Bataineh, Timothy J. Johnson, Michael Parker, James Robert Kohn, Steven L. Scott, Robert Alverson
  • Publication number: 20100115236
    Abstract: A multiprocessor computer system having a plurality of processing elements comprises one or more core-level hierarchical shared semaphore registers, wherein each core-level hierarchical shared semaphore register is coupled to a different processor core. Each hierarchical shared semaphore register is writable to each of a plurality of streams executing on the coupled processor core. One or more chip-level hierarchical shared semaphore registers are also coupled to a plurality of processor cores, each chip-level hierarchical shared semaphore register writable to each of the plurality of processor cores.
    Type: Application
    Filed: October 31, 2008
    Publication date: May 6, 2010
    Applicant: Cray Inc.
    Inventors: Abdulla Bataineh, James Robert Kohn, Eric P. Lundberg, Timothy J. Johnson, Thomas L. Court, Gregory J. Faanes, Steven L. Scott
  • Publication number: 20100115232
    Abstract: A vector processor or vector processing computer has a first vector register operable to store two or more vector elements that together comprise a single first large integer and a second vector register operable to store two or more vector elements that together comprise a single second large integer. An adder having a carry-in bit is operable to add the large integer in the first vector register to the large integer in the second vector register by using the carry-in bit to add sequential elements of the vector registers.
    Type: Application
    Filed: October 31, 2008
    Publication date: May 6, 2010
    Inventors: Timothy J. Johnson, Eric P. Lundberg, Michael Parker, Gregory J. Faanes
  • Publication number: 20090138680
    Abstract: A processor is operable to execute one or more vector atomic memory operations. A further embodiment provides support for atomic memory operations in a memory manager, which is operable to process atomic memory operations and to return a completion notification or a result.
    Type: Application
    Filed: November 28, 2007
    Publication date: May 28, 2009
    Inventors: Timothy J. Johnson, Gregory J. Faanes
  • Patent number: 7519771
    Abstract: A novel system and method for processing memory instructions. One embodiment of the invention provides a method for processing a memory instruction. In this embodiment, the method includes obtaining a memory request; storing the memory request in an Initial Request Queue (IRQ); and processing the memory request from the IRQ by a cache controller, wherein processing includes: identifying a type of the memory request, and processing the memory request in both a local cache and a Force Order Queue (FOQ), wherein processing includes determining if a portion of an address associated with the memory request matches one or more partial addresses in the FOQ and, if the memory request misses in the cache and the address does not match one or more partial addresses in the FOQ, adding the memory request to the FOQ and allocating a cache line in the local cache corresponding to the local cache miss.
    Type: Grant
    Filed: August 18, 2003
    Date of Patent: April 14, 2009
    Assignee: Cray Inc.
    Inventors: Gregory J. Faanes, Eric P. Lundberg, Steven L. Scott, Robert J. Baird
  • Publication number: 20080288756
    Abstract: A processor is operable to execute a bit matrix multiply instruction. In further examples, the processor is operable to perform a vector bit matrix multiply instruction, and is a part of a computerized system.
    Type: Application
    Filed: May 18, 2007
    Publication date: November 20, 2008
    Inventors: Timothy J. Johnson, Gregory J. Faanes
  • Patent number: 7437521
    Abstract: A method and apparatus to provide specifiable ordering between and among vector and scalar operations within a single streaming processor (SSP) via a local synchronization (Lsync) instruction that operates within a relaxed memory consistency model. Various aspects of that relaxed memory consistency model are described. Further, a combined memory synchronization and barrier synchronization (Msync) for a multistreaming processor (MSP) system is described. Also, a global synchronization (Gsync) instruction that provides synchronization even outside a single MSP system is described. Advantageously, the pipeline or queue of pending memory requests does not need to be drained before the synchronization operation, nor is it required to refrain from determining addresses for and inserting subsequent memory accesses into the pipeline.
    Type: Grant
    Filed: August 18, 2003
    Date of Patent: October 14, 2008
    Assignee: Cray Inc.
    Inventors: Steven L. Scott, Gregory J. Faanes, Brick Stephenson, William T. Moore, Jr., James R. Kohn
  • Patent number: 7334110
    Abstract: In a computer system having a scalar processing unit and a vector processing unit, wherein the vector processing unit includes a vector dispatch unit, a system and method of decoupling operation of the scalar processing unit from that of the vector processing unit, the method comprising sending a vector instruction from the scalar processing unit to the vector dispatch unit, wherein sending includes marking the vector instruction as complete if the vector instruction is not a vector memory instruction and if the vector instruction does not require scalar operands, reading a scalar operand, wherein reading includes transferring the scalar operand from the scalar processing unit to the vector dispatch unit, predispatching the vector instruction within the vector dispatch unit if the vector instruction is scalar committed, dispatching the predispatched vector instruction if all required operands are ready, and executing the dispatched vector instruction as a function of the scalar operand.
    Type: Grant
    Filed: August 18, 2003
    Date of Patent: February 19, 2008
    Assignee: Cray Inc.
    Inventors: Gregory J. Faanes, Steven L. Scott, Eric P. Lundberg, William T. Moore, Jr., Timothy J. Johnson
  • Patent number: 6665774
    Abstract: A common scalar/vector data cache apparatus and method for a scalar/vector computer. One aspect of the present invention provides a computer system including a memory. The memory includes a plurality of sections. The computer system also includes a scalar/vector processor coupled to the memory using a plurality of separate address busses and a plurality of separate read-data busses wherein at least one of the sections of the memory is associated with each address bus and at least one of the sections of the memory is associated with each read-data bus. The processor further includes a plurality of scalar registers and a plurality of vector registers and operating on instructions which provide a reference address to a data word. The processor includes a scalar/vector cache unit that includes a cache array, and a FIFO unit that tracks (a.) an address in the cache array to which a read-data value will be placed when the read-data value is returned from the memory, and (b.
    Type: Grant
    Filed: October 16, 2001
    Date of Patent: December 16, 2003
    Assignee: Cray, Inc.
    Inventors: Gregory J. Faanes, Eric P. Lundberg
  • Patent number: 6496902
    Abstract: A common scalar/vector data cache apparatus and method for a scalar/vector computer. One aspect of the present invention provides a computer system including a memory. The memory includes a plurality of sections. The computer system also includes a scalar/vector processor coupled to the memory using a plurality of separate address busses and a plurality of separate read-data busses wherein at least one of the sections of the memory is associated with each address bus and at least one of the sections of the memory is associated with each read-data bus. The processor further includes a plurality of scalar registers and a plurality of vector registers and operating on instructions which provide a reference address to a data word. The processor includes a scalar/vector cache unit that includes a cache array, and a FIFO unit that tracks (a.) an address in the cache array to which a read-data value will be placed when the read-data value is returned from the memory, and (b.
    Type: Grant
    Filed: December 31, 1998
    Date of Patent: December 17, 2002
    Assignee: Cray Inc.
    Inventors: Gregory J. Faanes, Eric P. Lundberg
  • Publication number: 20020144061
    Abstract: A common scalar/vector data cache apparatus and method for a scalar/vector computer. One aspect of the present invention provides a computer system including a memory. The memory includes a plurality of sections. The computer system also includes a scalar/vector processor coupled to the memory using a plurality of separate address busses and a plurality of separate read-data busses wherein at least one of the sections of the memory is associated with each address bus and at least one of the sections of the memory is associated with each read-data bus. The processor further includes a plurality of scalar registers and a plurality of vector registers and operating on instructions which provide a reference address to a data word. The processor includes a scalar/vector cache unit that includes a cache array, and a FIFO unit that tracks (a.) an address in the cache array to which a read-data value will be placed when the read-data value is returned from the memory, and (b.
    Type: Application
    Filed: October 16, 2001
    Publication date: October 3, 2002
    Applicant: Cray Inc.
    Inventors: Gregory J. Faanes, Eric P. Lundberg
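
Illustrative note on publication 20100115232 above: its abstract describes adding two large integers that are each spread across the elements of a vector register, with a carry-in bit linking sequential elements. The idea can be sketched in software as a multiword addition. This is only a rough sketch and not the claimed hardware design: the 64-bit element width, the least-significant-element-first ordering, and the names add_large, to_elems, and from_elems are all assumptions made for illustration and do not come from the abstract.

```python
# Sketch of large-integer addition across vector elements with a carried bit.
# Assumptions (not from the abstract): 64-bit elements, least significant
# element first, and the hypothetical helper names used below.

ELEM_BITS = 64
ELEM_MASK = (1 << ELEM_BITS) - 1


def to_elems(value, count):
    """Split a non-negative integer into `count` elements, low element first."""
    return [(value >> (ELEM_BITS * i)) & ELEM_MASK for i in range(count)]


def from_elems(elems):
    """Reassemble an integer from its elements, low element first."""
    return sum(elem << (ELEM_BITS * i) for i, elem in enumerate(elems))


def add_large(a_elems, b_elems, carry_in=0):
    """Add two equal-length element lists, passing the carry between elements.

    Returns (sum_elems, carry_out), where carry_out is the carry produced by
    the most significant element.
    """
    if len(a_elems) != len(b_elems):
        raise ValueError("operands must have the same number of elements")
    carry = carry_in
    sum_elems = []
    for a, b in zip(a_elems, b_elems):
        total = a + b + carry
        sum_elems.append(total & ELEM_MASK)  # low 64 bits stay in this element
        carry = total >> ELEM_BITS           # 0 or 1, fed to the next element
    return sum_elems, carry
```

For example, with x = 2**130 + 12345 and y = 2**128 - 1 split into three elements each, from_elems of the returned elements plus the carry-out shifted left by 192 bits reproduces x + y.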