Patents by Inventor Simon Steely

Simon Steely has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230008856
    Abstract: A DNN accelerator can perform fixed-point emulation of floating-point computation. In a multiplication operation on two floating-point matrices, the DNN accelerator determines an extreme exponent for a row in the first floating-point matrix and determines another extreme exponent for a column in the second floating-point matrix. The row and column can be converted to fixed-point vectors based on the extreme exponents. The two fixed-point vectors are fed into a PE array in the DNN accelerator. The PE array performs a multiplication operation on the two fixed-point vectors and generates a fixed-point inner product. The fixed-point inner product can be converted back to a floating-point inner product based on the extreme exponents. The floating-point inner product is an element in the matrix resulting from the multiplication operation on the two floating-point matrices. The matrix can be accumulated with another matrix resulting from a fixed-point emulation of a floating-point matrix multiplication.
    Type: Application
    Filed: September 5, 2022
    Publication date: January 12, 2023
    Inventors: Gregory Henry, Kermin ChoFleming, Simon Steely, Jr.
  • Patent number: 10817291
    Abstract: Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA.
    Type: Grant
    Filed: March 30, 2019
    Date of Patent: October 27, 2020
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Rohan Sharma, Simon Steely, Jr., Chinmay Ashok, Kent D. Glossop, Dennis Bradford, Paul Caprioli, Louise Huot, Kermin ChoFleming, Barry Tannenbaum
  • Publication number: 20200310797
    Abstract: Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA.
    Type: Application
    Filed: March 30, 2019
    Publication date: October 1, 2020
    Inventors: Jesus Corbal, Rohan Sharma, Simon Steely, Jr., Chinmay Ashok, Kent D. Glossop, Dennis Bradford, Paul Caprioli, Louise Huot, Kermin ChoFleming, Barry Tannenbaum
  • Publication number: 20200210358
    Abstract: Systems, methods, and apparatuses relating to in-network storage for a configurable spatial accelerator are described.
    Type: Application
    Filed: December 29, 2018
    Publication date: July 2, 2020
    Inventors: Kermin ChoFleming, Simon Steely, Jr., Kent Glossop
  • Patent number: 10678724
    Abstract: Systems, methods, and apparatuses relating to in-network storage for a configurable spatial accelerator are described.
    Type: Grant
    Filed: December 29, 2018
    Date of Patent: June 9, 2020
    Assignee: Intel Corporation
    Inventors: Kermin ChoFleming, Simon Steely, Jr., Kent Glossop
  • Patent number: 10402176
    Abstract: Methods, apparatus, systems and articles of manufacture to compile code to generate dataflow code are described. An example compiler apparatus includes an intermediate representation transformer to transform input software code to intermediate representation code; an instruction selector to insert machine instructions of a target execution platform in the intermediate representation code to generate machine intermediate representation code; and a target machine transformer to: convert a portion of the machine intermediate representation code to dataflow code to generate dataflow intermediate representation code; and allocate registers within the dataflow intermediate representation code.
    Type: Grant
    Filed: December 27, 2017
    Date of Patent: September 3, 2019
    Assignee: Intel Corporation
    Inventors: Kent Glossop, Kermin Fleming, Yongzhi Zhang, Simon Steely, Jr., Jim Sukha, Uma Srinivasan
  • Publication number: 20190042217
    Abstract: Methods, apparatus, systems and articles of manufacture to compile code to generate dataflow code are described. An example compiler apparatus includes an intermediate representation transformer to transform input software code to intermediate representation code; an instruction selector to insert machine instructions of a target execution platform in the intermediate representation code to generate machine intermediate representation code; and a target machine transformer to: convert a portion of the machine intermediate representation code to dataflow code to generate dataflow intermediate representation code; and allocate registers within the dataflow intermediate representation code.
    Type: Application
    Filed: December 27, 2017
    Publication date: February 7, 2019
    Inventors: Kent Glossop, Kermin Fleming, Yongzhi Zhang, Simon Steely, Jr., James Sukha, Uma Srinivasan
  • Patent number: 9740617
    Abstract: Methods and apparatuses to control cache line coherence are described. A hardware processor may include a first processor core with a cache to store a cache line, a second set of processor cores that each include a cache to store a copy of the cache line, and cache coherence logic to aggregate in a tag directory an acknowledgment message from each of the second set of processor cores in response to a request from the first processor core to modify the copy of the cache line in each of the second set of processor cores and send a consolidated acknowledgment message to the first processor core.
    Type: Grant
    Filed: December 23, 2014
    Date of Patent: August 22, 2017
    Assignee: Intel Corporation
    Inventors: Samantika Sury, Simon Steely, Jr., William Hasenplaugh, Joel Emer, David Webb
  • Patent number: 9727482
    Abstract: Method and apparatus to efficiently manage data in caches. Data in caches may be managed based on priorities assigned to the data. Data may be requested by a process using a virtual address of the data. The requested data may be assigned a priority by a component in a computer system called an address range priority assigner (ARP). The ARP may assign a particular priority to the requested data if the virtual address of the requested data is within a particular range of virtual addresses. The particular priority assigned may be high priority and the particular range of virtual addresses may be smaller than a cache's capacity.
    Type: Grant
    Filed: September 26, 2016
    Date of Patent: August 8, 2017
    Assignee: Intel Corporation
    Inventors: Simon Steely, Jr., Samantika S. Sury, William C. Hasenplaugh
  • Publication number: 20170010974
    Abstract: Method and apparatus to efficiently manage data in caches. Data in caches may be managed based on priorities assigned to the data. Data may be requested by a process using a virtual address of the data. The requested data may be assigned a priority by a component in a computer system called an address range priority assigner (ARP). The ARP may assign a particular priority to the requested data if the virtual address of the requested data is within a particular range of virtual addresses. The particular priority assigned may be high priority and the particular range of virtual addresses may be smaller than a cache's capacity.
    Type: Application
    Filed: September 26, 2016
    Publication date: January 12, 2017
    Inventors: Simon Steely, Jr., Samantika S. Sury, William C. Hasenplaugh
  • Patent number: 9477610
    Abstract: Method and apparatus to efficiently manage data in caches. Data in caches may be managed based on priorities assigned to the data. Data may be requested by a process using a virtual address of the data. The requested data may be assigned a priority by a component in a computer system called an address range priority assigner (ARP). The ARP may assign a particular priority to the requested data if the virtual address of the requested data is within a particular range of virtual addresses. The particular priority assigned may be high priority and the particular range of virtual addresses may be smaller than a cache's capacity.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: October 25, 2016
    Assignee: Intel Corporation
    Inventors: Simon Steely, Jr., Samantika Subramaniam, William C. Hasenplaugh
  • Publication number: 20160179674
    Abstract: Methods and apparatuses to control cache line coherence are described. A hardware processor may include a first processor core with a cache to store a cache line, a second set of processor cores that each include a cache to store a copy of the cache line, and cache coherence logic to aggregate in a tag directory an acknowledgment message from each of the second set of processor cores in response to a request from the first processor core to modify the copy of the cache line in each of the second set of processor cores and send a consolidated acknowledgment message to the first processor core.
    Type: Application
    Filed: December 23, 2014
    Publication date: June 23, 2016
    Inventors: Samantika Sury, Simon Steely, Jr., William Hasenplaugh, Joel Emer, David Webb
  • Patent number: 8769201
    Abstract: A technique to enable resource allocation optimization within a computer system. In one embodiment, a gradient partition algorithm (GPA) module is used to continually measure performance and adjust allocation to shared resources among a plurality of data classes in order to achieve optimal performance.
    Type: Grant
    Filed: December 2, 2008
    Date of Patent: July 1, 2014
    Assignee: Intel Corporation
    Inventors: William Hasenplaugh, Joel Emer, Tryggve Fossum, Aamer Jaleel, Simon Steely
  • Publication number: 20130339621
    Abstract: Method and apparatus to efficiently manage data in caches. Data in caches may be managed based on priorities assigned to the data. Data may be requested by a process using a virtual address of the data. The requested data may be assigned a priority by a component in a computer system called an address range priority assigner (ARP). The ARP may assign a particular priority to the requested data if the virtual address of the requested data is within a particular range of virtual addresses. The particular priority assigned may be high priority and the particular range of virtual addresses may be smaller than a cache's capacity.
    Type: Application
    Filed: December 23, 2011
    Publication date: December 19, 2013
    Inventors: Simon Steely, Jr., Samantika Subramaniam, William C. Hasenplaugh
  • Publication number: 20100138609
    Abstract: A technique to enable resource allocation optimization within a computer system. In one embodiment, a gradient partition algorithm (GPA) module is used to continually measure performance and adjust allocation to shared resources among a plurality of data classes in order to achieve optimal performance.
    Type: Application
    Filed: December 2, 2008
    Publication date: June 3, 2010
    Inventors: William Hasenplaugh, Joel Emer, Tryggve Fossum, Aamer Jaleel, Simon Steely
  • Publication number: 20060230233
    Abstract: A technique to share cache lines among a plurality of bus agents. Embodiments of the invention comprise at least one technique to allow a number of agents, such as a processor or software program being executed by a processor, within a computer system or computer network to transfer ownership of a locked (“owned”) cache line, under certain circumstances, without incurring as much of the operational overhead and resulting performance degradation of many prior art techniques.
    Type: Application
    Filed: April 11, 2005
    Publication date: October 12, 2006
    Inventors: Simon Steely, Stephen Doren
  • Publication number: 20060143400
    Abstract: An embodiment of the present invention is a technique to perform replacement in a non-uniform access cache structure. A cache memory stores data and associated tags in a non-uniform access manner. The cache memory has a plurality of memory banks arranged according to a distance hierarchy with respect to one of a processor and a processor core. The distance hierarchy includes a lowest latency bank and a highest latency bank. A controller performs a non-uniform pseudo least recently used (LRU) replacement on the cache memory.
    Type: Application
    Filed: December 29, 2004
    Publication date: June 29, 2006
    Inventor: Simon Steely
  • Publication number: 20060041724
    Abstract: A technique to share cache lines among a plurality of bus agents. Embodiments of the invention comprise at least one technique to allow a number of agents, such as a processor or software program being executed by a processor, within a computer system or computer network to access a locked (“owned”) cache line, under certain circumstances, without incurring as much of the operational overhead and resulting performance degradation of many prior art techniques.
    Type: Application
    Filed: August 17, 2004
    Publication date: February 23, 2006
    Inventors: Simon Steely, Stephen Van Doren
  • Publication number: 20050240731
    Abstract: Methods for storing replacement data in a multi-way associative cache are disclosed. One method comprises logically dividing the cache's cache sets into segments of at least one cache way; searching a cache set in accordance with a segment search sequence for a segment currently comprising a way which has not yet been accessed during a current cycle of the segment search sequence; searching the current segment in accordance with a way search sequence for a way which has not yet been accessed during a current way search cycle; and storing the replacement data in a first way which has not yet been accessed during a current cycle of the way search sequence. A cache controller that performs such methods is also disclosed.
    Type: Application
    Filed: April 22, 2004
    Publication date: October 27, 2005
    Inventor: Simon Steely
  • Publication number: 20050198187
    Abstract: A multi-processor system includes a requesting node that provides a first request for data to a home node. The requesting node is operative to provide a second request for the data to at least one predicted node in parallel with the first request. The requesting node receives at least one coherent copy of the data from at least one of the home node and the at least one predicted node.
    Type: Application
    Filed: January 15, 2004
    Publication date: September 8, 2005
    Inventors: Gregory Tierney, Simon Steely
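
Illustrative sketch (not taken from any of the filings): the abstract of publication 20230008856 describes converting a row and a column to fixed-point vectors using an "extreme exponent", computing an integer inner product, and converting the result back to floating point. The Python below is one possible reading of that flow, offered only as an aid to understanding. The function names, the choice of the largest exponent as the extreme exponent, and the `frac_bits` parameter are all assumptions made for this example and do not come from the application.

```python
import math


def max_exponent(values):
    """Return the largest binary exponent among the nonzero values.

    Assumption: the 'extreme exponent' is taken to be the maximum exponent.
    """
    exps = [math.frexp(v)[1] for v in values if v != 0.0]
    return max(exps) if exps else 0


def to_fixed(values, exponent, frac_bits):
    """Scale each value by 2**(frac_bits - exponent) and round to an integer."""
    return [round(math.ldexp(v, frac_bits - exponent)) for v in values]


def emulated_dot(row, col, frac_bits=24):
    """Inner product of two float vectors computed with integer arithmetic."""
    e_row = max_exponent(row)  # shared exponent for the row
    e_col = max_exponent(col)  # shared exponent for the column
    fx_row = to_fixed(row, e_row, frac_bits)
    fx_col = to_fixed(col, e_col, frac_bits)
    # Integer multiply-accumulate, standing in for the PE array.
    acc = sum(a * b for a, b in zip(fx_row, fx_col))
    # Undo both scalings to recover a floating-point result.
    return math.ldexp(acc, e_row + e_col - 2 * frac_bits)


if __name__ == "__main__":
    print(emulated_dot([1.5, -2.0, 0.25], [4.0, 0.5, 8.0]))
```

In this sketch, values much smaller than the largest element of their row or column lose low-order bits when scaled, so `frac_bits` trades accuracy against the width of the integer accumulator.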