Patents by Inventor Kermin ChoFleming

Kermin ChoFleming has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12332802
    Abstract: An embodiment of an integrated circuit comprises circuitry to generate a cache tag for data to be stored in a cache memory, store a first portion of the cache tag in a primary tag memory, and store a second portion of the cache tag in a secondary tag memory, wherein a size of the first portion is smaller than a size of the second portion. Other embodiments are disclosed and claimed.
    Type: Grant
    Filed: June 21, 2021
    Date of Patent: June 17, 2025
    Assignee: Intel Corporation
    Inventors: Kermin ChoFleming, Yu Bai, Ping Zou
  • Patent number: 12204478
    Abstract: Examples include techniques for near data acceleration for a multi-core architecture. A near data processor included in a memory controller of a processor may access data maintained in a memory device coupled with the near data processor via one or more memory channels responsive to a work request to execute a kernel, an application or a loop routine using the accessed data to generate values. The near data processor provides an indication to the requestor of the work request that values have been generated.
    Type: Grant
    Filed: March 19, 2021
    Date of Patent: January 21, 2025
    Assignee: Intel Corporation
    Inventors: Swapna Raj, Samantika S. Sury, Kermin ChoFleming, Simon C. Steely, Jr.
  • Publication number: 20230008856
    Abstract: A DNN accelerator can perform fixed-point emulation of floating-point computation. In a multiplication operation on two floating-point matrices, the DNN accelerator determines an extreme exponent for a row in the first floating-point matrix and determines another extreme exponent for a column in the second floating-point matrix. The row and column can be converted to fixed-point vectors based on the extreme exponents. The two fixed-point vectors are fed into a PE array in the DNN accelerator. The PE array performs a multiplication operation on the two fixed-point vectors and generates a fixed-point inner product. The fixed-point inner product can be converted back to a floating-point inner product based on the extreme exponents. The floating-point inner product is an element in the matrix resulting from the multiplication operation on the two floating-point matrices. The matrix can be accumulated with another matrix resulting from a fixed-point emulation of a floating-point matrix multiplication.
    Type: Application
    Filed: September 5, 2022
    Publication date: January 12, 2023
    Inventors: Gregory Henry, Kermin ChoFleming, Simon Steely, Jr.
  • Publication number: 20220405209
    Abstract: An embodiment of an integrated circuit comprises circuitry to generate a cache tag for data to be stored in a cache memory, store a first portion of the cache tag in a primary tag memory, and store a second portion of the cache tag in a secondary tag memory, wherein a size of the first portion is smaller than a size of the second portion. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: June 21, 2021
    Publication date: December 22, 2022
    Applicant: Intel Corporation
    Inventors: Kermin ChoFleming, Yu Bai, Ping Zou
  • Publication number: 20220222177
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for improving data transfer for heterogeneous programs. An example apparatus includes instructions in the apparatus, and processor circuitry to at least one of execute or instantiate the instructions to determine a runtime associated with executing a code object by a heterogeneous electronic device based on at least one of a location of a memory object or a data transfer penalty, the data transfer penalty associated with access of the memory object in response to execution of the code object, identify a memory operation for the memory object based on the runtime, and generate an executable file based on the memory operation, the executable file, when executed, to cause execution of the code object by at least one of first hardware or second hardware of the heterogeneous electronic device based on the memory operation.
    Type: Application
    Filed: March 31, 2022
    Publication date: July 14, 2022
    Inventors: Kermin ChoFleming, Swapna Raj
  • Patent number: 11385873
    Abstract: Systems, apparatuses and methods may provide for technology that determines that a control loop is to be executed for an unspecified number of iterations and automatically forces the control loop to be executed for a fixed number of iterations in addition to the unspecified number of iterations, where execution of the control loop for the fixed number of iterations is conducted in parallel. In one example, the technology also removes one or more dataflow tokens associated with the execution of the control loop for the fixed number of iterations.
    Type: Grant
    Filed: December 7, 2020
    Date of Patent: July 12, 2022
    Assignee: Intel Corporation
    Inventor: Kermin ChoFleming
  • Patent number: 11249683
    Abstract: Systems, apparatuses and methods may provide for technology that determines a plurality of memory operations associated with a data-flow graph that represents a computer code, where a spatial architecture executes the data-flow graph and the spatial architecture includes a plurality of memory controllers, randomly assigns one or more of the plurality of memory operations to one or more of the plurality of memory controllers to generate a first allocation of the plurality of memory operations to the memory controllers, and determines that the first allocation is to be stored as a permanent memory allocation based on a first performance metric associated with the first allocation.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: February 15, 2022
    Assignee: Intel Corporation
    Inventors: Yu Bai, Kermin ChoFleming
  • Patent number: 11037050
    Abstract: Systems, methods, and apparatuses relating to arbitration among a plurality of memory interface circuits in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for improved memory sub-system design via arbitration and the improvements to arbitration discussed herein.
    Type: Grant
    Filed: June 29, 2019
    Date of Patent: June 15, 2021
    Assignee: Intel Corporation
    Inventors: Krishna N. Vinod, Sujoyita Kaushikkar, Aniket S. Kakade, Kermin ChoFleming, Ping Zou, Alexey Suprun, Bhavya K. Daya
  • Publication number: 20210165642
    Abstract: Systems, apparatuses and methods may provide for technology that determines that a control loop is to be executed for an unspecified number of iterations and automatically forces the control loop to be executed for a fixed number of iterations in addition to the unspecified number of iterations, where execution of the control loop for the fixed number of iterations is conducted in parallel. In one example, the technology also removes one or more dataflow tokens associated with the execution of the control loop for the fixed number of iterations.
    Type: Application
    Filed: December 7, 2020
    Publication date: June 3, 2021
    Inventor: Kermin ChoFleming
  • Patent number: 10915471
    Abstract: Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for an improved memory sub-system design via the improvements to allocation discussed herein.
    Type: Grant
    Filed: March 30, 2019
    Date of Patent: February 9, 2021
    Assignee: Intel Corporation
    Inventors: Kermin ChoFleming, Yu Bai, Simon C. Steely
  • Publication number: 20200410323
    Abstract: Systems, methods, and apparatuses relating to arbitration among a plurality of memory interface circuits in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for improved memory sub-system design via arbitration and the improvements to arbitration discussed herein.
    Type: Application
    Filed: June 29, 2019
    Publication date: December 31, 2020
    Inventors: Krishna N. Vinod, Sujoyita Kaushikkar, Aniket S. Kakade, Kermin ChoFleming, Ping Zou, Alexey Suprun, Bhavya K. Daya
  • Publication number: 20200409709
    Abstract: Systems, methods, and apparatuses relating to time-multiplexing circuitry in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; and a time-multiplexed, circuit switched interconnect network between the plurality of processing elements. In another embodiment, a configurable spatial accelerator (CSA) includes a plurality of time-multiplexed processing elements; and a time-multiplexed, circuit switched interconnect network between the plurality of time-multiplexed processing elements.
    Type: Application
    Filed: June 29, 2019
    Publication date: December 31, 2020
    Inventors: Kermin ChoFleming, Simon C. Steely, Jr., Mitchell Diamond
  • Patent number: 10860301
    Abstract: Systems, apparatuses and methods may provide for technology that determines that a control loop is to be executed for an unspecified number of iterations and automatically forces the control loop to be executed for a fixed number of iterations in addition to the unspecified number of iterations, where execution of the control loop for the fixed number of iterations is conducted in parallel. In one example, the technology also removes one or more dataflow tokens associated with the execution of the control loop for the fixed number of iterations.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: December 8, 2020
    Assignee: Intel Corporation
    Inventor: Kermin ChoFleming
  • Patent number: 10817291
    Abstract: Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA.
    Type: Grant
    Filed: March 30, 2019
    Date of Patent: October 27, 2020
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Rohan Sharma, Simon Steely, Jr., Chinmay Ashok, Kent D. Glossop, Dennis Bradford, Paul Caprioli, Louise Huot, Kermin ChoFleming, Barry Tannenbaum
  • Publication number: 20200310994
    Abstract: Systems, methods, and apparatuses relating to memory interface circuit allocation in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for an improved memory sub-system design via the improvements to allocation discussed herein.
    Type: Application
    Filed: March 30, 2019
    Publication date: October 1, 2020
    Inventors: Kermin ChoFleming, Yu Bai, Simon C. Steely
  • Publication number: 20200310797
    Abstract: Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA.
    Type: Application
    Filed: March 30, 2019
    Publication date: October 1, 2020
    Inventors: Jesus Corbal, Rohan Sharma, Simon Steely, Jr., Chinmay Ashok, Kent D. Glossop, Dennis Bradford, Paul Caprioli, Louise Huot, Kermin ChoFleming, Barry Tannenbaum
  • Publication number: 20200210358
    Abstract: Systems, methods, and apparatuses relating to in-network storage for a configurable spatial accelerator are described.
    Type: Application
    Filed: December 29, 2018
    Publication date: July 2, 2020
    Inventors: Kermin ChoFleming, Simon Steely, Jr., Kent Glossop
  • Publication number: 20200210113
    Abstract: Systems, apparatuses and methods may provide for technology that determines a plurality of memory operations associated with a data-flow graph that represents a computer code, where a spatial architecture executes the data-flow graph and the spatial architecture includes a plurality of memory controllers, randomly assigns one or more of the plurality of memory operations to one or more of the plurality of memory controllers to generate a first allocation of the plurality of memory operations to the memory controllers, and determines that the first allocation is to be stored as a permanent memory allocation based on a first performance metric associated with the first allocation.
    Type: Application
    Filed: March 13, 2020
    Publication date: July 2, 2020
    Applicant: Intel Corporation
    Inventors: Yu Bai, Kermin ChoFleming
  • Patent number: 10678724
    Abstract: Systems, methods, and apparatuses relating to in-network storage for a configurable spatial accelerator are described.
    Type: Grant
    Filed: December 29, 2018
    Date of Patent: June 9, 2020
    Assignee: Intel Corporation
    Inventors: Kermin ChoFleming, Simon Steely, Jr., Kent Glossop
  • Publication number: 20190317744
    Abstract: Systems, apparatuses and methods may provide for technology that determines that a control loop is to be executed for an unspecified number of iterations and automatically forces the control loop to be executed for a fixed number of iterations in addition to the unspecified number of iterations, where execution of the control loop for the fixed number of iterations is conducted in parallel. In one example, the technology also removes one or more dataflow tokens associated with the execution of the control loop for the fixed number of iterations.
    Type: Application
    Filed: June 28, 2019
    Publication date: October 17, 2019
    Inventor: Kermin ChoFleming
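
Illustrative sketch for publication 20230008856: the abstract describes converting a row and a column to fixed-point vectors using a shared "extreme exponent", multiplying them as integers, and converting the inner product back to floating point. The Python below is a minimal software sketch of that general idea, not the claimed hardware design. It assumes the extreme exponent is the largest binary exponent in each vector, and the function names and the frac_bits parameter are hypothetical.

```python
import math


def max_exponent(values):
    """Largest binary exponent among the nonzero values (0 if all are zero).

    Assumption: the "extreme exponent" is taken to be the maximum exponent.
    """
    exps = [math.frexp(v)[1] for v in values if v != 0.0]
    return max(exps) if exps else 0


def to_fixed(values, shared_exp, frac_bits):
    """Scale each value by 2**(frac_bits - shared_exp) and round to an integer."""
    return [round(math.ldexp(v, frac_bits - shared_exp)) for v in values]


def emulated_dot(row, col, frac_bits=24):
    """Inner product computed with integer arithmetic only, then rescaled."""
    row_exp = max_exponent(row)
    col_exp = max_exponent(col)
    row_fx = to_fixed(row, row_exp, frac_bits)
    col_fx = to_fixed(col, col_exp, frac_bits)
    # Integer multiply-accumulate, standing in for the PE array.
    acc = sum(a * b for a, b in zip(row_fx, col_fx))
    # Undo both scale factors to recover a floating-point result.
    return math.ldexp(float(acc), row_exp + col_exp - 2 * frac_bits)


def emulated_matmul(a, b, frac_bits=24):
    """Matrix product where every element is an emulated inner product."""
    cols = list(zip(*b))
    return [[emulated_dot(row, col, frac_bits) for col in cols] for row in a]
```

Values much smaller than the largest element of their row or column lose precision when scaled to the shared exponent, so the choice of frac_bits trades accuracy against integer width in this sketch.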