Patents by Inventor Samantika S. Sury
Samantika S. Sury has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11537520Abstract: Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.Type: GrantFiled: October 5, 2021Date of Patent: December 27, 2022Assignee: Intel CorporationInventors: Doddaballapur N. Jayasimha, Samantika S. Sury, Christopher J. Hughes, Jonas Svennebring, Yen-Cheng Liu, Stephen R. Van Doren, David A. Koufaty
-
Patent number: 11500636Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations.Type: GrantFiled: February 24, 2020Date of Patent: November 15, 2022Assignee: Intel CorporationInventors: Christopher J. Hughes, Joseph Nuzman, Jonas Svennebring, Doddaballapur N. Jayasimha, Samantika S. Sury, David A. Koufaty, Niall D. McDonnell, Yen-Cheng Liu, Stephen R. Van Doren, Stephen J. Robinson
-
Publication number: 20220206945Abstract: Disclosed embodiments relate to atomic memory operations. In one example, an apparatus includes multiple processor cores, a cache hierarchy, a local execution unit, and a remote execution unit, and an adaptive remote atomic operation unit. The cache hierarchy includes a local cache at a first level and a shared cache at a second level. The local execution unit is to perform an atomic operation at the first level if the local cache is a storing a cache line including data for the atomic operation. The remote execution unit is to perform the atomic operation at the second level. The adaptive remote atomic operation unit is to determine whether to perform the first atomic operation at the first level or at the second level and whether to copy the cache line from the shared cache to the local cache.Type: ApplicationFiled: December 25, 2020Publication date: June 30, 2022Applicant: Intel CorporationInventors: Carl J. Beckmann, Samantika S. Sury, Christopher J. Hughes, Lingxiang Xiang, Rahul Agrawal
-
Publication number: 20220107897Abstract: Examples described herein relate to circuitry to selectively disable cache snoop operations issued by a particular processor or its cache manager based on data in a memory address range, to be accessed by the particular processor, having been flushed from one or more other cache devices accessible to other processors. At or after completion of flushing or scrubbing data in the memory address range to memory, the particular processor or its cache manager do not issue snoop operations for accesses to the memory address range. In response to an access by some other device to the memory address range, the processor or cache manager may resume issuing snoop operations.Type: ApplicationFiled: December 15, 2021Publication date: April 7, 2022Inventors: David KEPPEL, Swapna RAJ, Kermin CHOFLEMING, Samantika S. SURY
-
Publication number: 20220091983Abstract: Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.Type: ApplicationFiled: October 5, 2021Publication date: March 24, 2022Inventors: Doddaballapur N. Jayasimha, Samantika S. Sury, Christopher J. Hughes, Jonas Svennebring, Yen-Cheng Liu, Stephen R. Van Doren, David A. Koufaty
-
Patent number: 11138112Abstract: Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.Type: GrantFiled: April 11, 2019Date of Patent: October 5, 2021Assignee: Intel CorporationInventors: Doddaballapur N. Jayasimha, Samantika S. Sury, Christopher J. Hughes, Jonas Svennebring, Yen-Cheng Liu, Stephen R. Van Doren, David A. Koufaty
-
Publication number: 20210224213Abstract: Examples include techniques for near data acceleration for a multi-core architecture. A near data processor included in a memory controller of a processor may access data maintained in a memory device coupled with the near data processor via one or more memory channels responsive to a work request to execute a kernel, an application or a loop routine using the accessed data to generate values. The near data processor provides an indication to the requestor of the work request that values have been generated.Type: ApplicationFiled: March 19, 2021Publication date: July 22, 2021Inventors: Swapna RAJ, Samantika S. SURY, Kermin CHOFLEMING, Simon C. STEELY, JR.
-
Patent number: 11068264Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.Type: GrantFiled: August 9, 2019Date of Patent: July 20, 2021Assignee: Intel CorporationInventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
-
Publication number: 20200319886Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations.Type: ApplicationFiled: February 24, 2020Publication date: October 8, 2020Applicant: Intel CorporationInventors: Christopher J. Hughes, Joseph Nuzman, Jonas Svennebring, Doddaballapur N. Jayasimha, Samantika S. Sury, David A. Koufaty, Niall D. McDonnell, Yen-Cheng Liu, Stephen R. Van Doren, Stephen J. Robinson
-
Publication number: 20200285578Abstract: Apparatus, method, and system for implementing a software-transparent hardware predictor for core-to-core data communication optimization are described herein. An embodiment of the apparatus includes a plurality of hardware processor cores each including a private cache; a shared cache that is communicatively coupled to and shared by the plurality of hardware processor cores; and a predictor circuit. The predictor circuit is to track activities relating to a plurality of monitored cache lines in the private cache of a producer hardware processor core (producer core) and to enable a cache line push operation upon determining a target hardware processor core (target core) based on the tracked activities. An execution of the cache line push operation is to cause a plurality of unmonitored cache lines in the private cache of the producer core to be moved to the private cache of the target core.Type: ApplicationFiled: March 18, 2020Publication date: September 10, 2020Applicant: Intel CorporationInventors: Ren Wang, Joseph Nuzman, Samantika S. Sury, Andrew J. Herdrich, Namakkal N. Venkatesan, Anil Vasudevan, Tsung-Yuan C. Tai, Niall D. McDonnell
-
Publication number: 20200233814Abstract: Examples described herein relate to a computing system supporting custom page sized ranges for an application to map contiguous memory regions instead of many smaller sized pages. An application can request a custom range size. An operating system can allocate a contiguous physical memory region to a virtual address range by specifying a custom range sizes that are larger or smaller than the normal general page sizes. Virtual-to-physical address translation can occur using an address range circuitry and translation lookaside buffer in parallel. The address range circuitry can determine if a custom entry is available to use to identify a physical address translation for the virtual address. Physical address translation can be performed by transforming the virtual address in some examples.Type: ApplicationFiled: February 10, 2020Publication date: July 23, 2020Inventors: Farah E. FARGO, Mitchell DIAMOND, David KEPPEL, Samantika S. SURY, Binh PHAM, Shobha VISSAPRAGADA
-
Patent number: 10635590Abstract: Apparatus, method, and system for implementing a software-transparent hardware predictor for core-to-core data communication optimization are described herein. An embodiment of the apparatus includes a plurality of hardware processor cores each including a private cache; a shared cache that is communicatively coupled to and shared by the plurality of hardware processor cores; and a predictor circuit. The predictor circuit is to track activities relating to a plurality of monitored cache lines in the private cache of a producer hardware processor core (producer core) and to enable a cache line push operation upon determining a target hardware processor core (target core) based on the tracked activities. An execution of the cache line push operation is to cause a plurality of unmonitored cache lines in the private cache of the producer core to be moved to the private cache of the target core.Type: GrantFiled: September 29, 2017Date of Patent: April 28, 2020Assignee: Intel CorporationInventors: Ren Wang, Joseph Nuzman, Samantika S. Sury, Andrew J. Herdrich, Namakkal N. Venkatesan, Anil Vasudevan, Tsung-Yuan C. Tai, Niall D. McDonnell
-
Patent number: 10572260Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations.Type: GrantFiled: December 29, 2017Date of Patent: February 25, 2020Assignee: Intel CorporationInventors: Christopher J. Hughes, Joseph Nuzman, Jonas Svennebring, Doddaballapur N. Jayasimha, Samantika S. Sury, David A. Koufaty, Niall D. McDonnell, Yen-Cheng Liu, Stephen R. Van Doren, Stephen J. Robinson
-
Publication number: 20190384601Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.Type: ApplicationFiled: August 9, 2019Publication date: December 19, 2019Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, JR., Samantika S. Sury
-
Patent number: 10445234Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In an embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an atomic operation when an incoming operand set arrives at the plurality of processing elements.Type: GrantFiled: July 1, 2017Date of Patent: October 15, 2019Assignee: Intel CorporationInventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr., Samantika S. Sury
-
Patent number: 10430252Abstract: In an embodiment, a processor includes a plurality of cores and synchronization logic. The synchronization logic includes circuitry to: receive a first memory request and a second memory request; determine whether the second memory request is in contention with the first memory request; and in response to a determination that the second memory request is in contention with the first memory request, process the second memory request using a non-blocking cache coherence protocol. Other embodiments are described and claimed.Type: GrantFiled: November 15, 2018Date of Patent: October 1, 2019Assignee: Intel CorporationInventors: Samantika S. Sury, Robert G. Blankenship, Simon C. Steely, Jr.
-
Patent number: 10387319Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. The processor also includes a streamer element to prefetch the incoming operand set from two or more levels of a memory system.Type: GrantFiled: July 1, 2017Date of Patent: August 20, 2019Assignee: Intel CorporationInventors: Michael C. Adler, Chiachen Chou, Neal C. Crago, Kermin Fleming, Kent D. Glossop, Aamer Jaleel, Pratik M. Marolia, Simon C. Steely, Jr., Samantika S. Sury
-
Patent number: 10379855Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.Type: GrantFiled: September 30, 2016Date of Patent: August 13, 2019Assignee: Intel CorporationInventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
-
Publication number: 20190243761Abstract: Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.Type: ApplicationFiled: April 11, 2019Publication date: August 8, 2019Applicant: Intel CorporationInventors: Doddaballapur N. Jayasimha, Samantika S. Sury, Christopher J. Hughes, Jonas Svennebring, Yen-Cheng Liu, Stephen R. Van Doren, David A. Koufaty
-
Publication number: 20190205139Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations.Type: ApplicationFiled: December 29, 2017Publication date: July 4, 2019Inventors: Christopher J. Hughes, Joseph Nuzman, Jonas Svennebring, Doddaballapur N. Jayasimha, Samantika S. Sury, David A. Koufaty, Niall D. McDonnell, Yen-Cheng Liu, Stephen R. Van Doren, Stephen J. Robinson