Patents by Inventor Mike MacPherson
Mike MacPherson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240411717Abstract: Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data is to be prefetched extending to the current overfetch boundary.Type: ApplicationFiled: June 10, 2024Publication date: December 12, 2024Applicant: Intel CorporationInventors: Altug Koker, Lakshminarayanan Striramassarma, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Sean Coleman, Varghese George, Pattabhiraman K, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Jayakrishna P S, Prasoonkumar Surti
-
Publication number: 20240403259Abstract: Methods and apparatus relating to techniques for data compression. In an example, an apparatus comprises a processor receive a data compression instruction for a memory segment; and in response to the data compression instruction, compress a sequence of identical memory values in response to a determination that the sequence of identical memory values has a length which exceeds a threshold. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: July 22, 2024Publication date: December 5, 2024Applicant: Intel CorporationInventors: Abhishek R. Appu, Altug Koker, Aravindh Anantaraman, Elmoustapha Ould-Ahmed-Vall, Joydeep Ray, Mike MacPherson, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Subramaniam Maiyuran, Vasanth Ranganathan, Jayakrishna P S, Pattabhiraman K, Sudhakar Kamma
-
Patent number: 12153541Abstract: Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data is to be prefetched extending to the current overfetch boundary.Type: GrantFiled: February 17, 2022Date of Patent: November 26, 2024Assignee: INTEL CORPORATIONInventors: Altug Koker, Lakshminarayanan Striramassarma, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Sean Coleman, Varghese George, Pattabhiraman K, Mike MacPherson, Subramaniam Maiyuran, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Jayakrishna P S, Prasoonkumar Surti
-
Patent number: 12141094Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides techniques to optimize training and inference on a systolic array when using sparse data. One embodiment provides techniques to use decompression information when performing sparse compute operations. One embodiment enables the disaggregation of special function compute arrays via a shared reg file. One embodiment enables packed data compress and expand operations on a GPGPU. One embodiment provides techniques to exploit block sparsity within the cache hierarchy of a GPGPU.Type: GrantFiled: March 14, 2020Date of Patent: November 12, 2024Assignee: Intel CorporationInventors: Prasoonkumar Surti, Subramaniam Maiyuran, Valentin Andrei, Abhishek Appu, Varghese George, Altug Koker, Mike Macpherson, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Lakshminarayanan Striramassarma, SungYe Kim
-
Publication number: 20240354559Abstract: A mechanism is described for facilitating smart distribution of resources for deep learning autonomous machines. A method of embodiments, as described herein, includes detecting one or more sets of data from one or more sources over one or more networks, and introducing a library to a neural network application to determine optimal point at which to apply frequency scaling without degrading performance of the neural network application at a computing device.Type: ApplicationFiled: April 25, 2024Publication date: October 24, 2024Applicant: Intel CorporationInventors: Rajkishore Barik, Brian T. Lewis, Murali Sundaresan, Jeffrey Jackson, Feng Chen, Xiaoming Chen, Mike Macpherson
-
Publication number: 20240345990Abstract: Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling with multicasting of data via copy operation, and Providing Page Migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU.Type: ApplicationFiled: April 4, 2024Publication date: October 17, 2024Applicant: Intel CorporationInventors: Lakshminarayanan Striramassarma, Prasoonkumar Surti, Varghese George, Ben Ashbaugh, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Nicolas Galoppo Von Borries, Altug Koker, Mike Macpherson, Subramaniam Maiyuran, Nilay Mistry, Elmoustapha Ould-Ahmed-Vall, Selvakumar Panneer, Vasanth Ranganathan, Joydeep Ray, Ankur Shah, Saurabh Tangri
-
Patent number: 12117962Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.Type: GrantFiled: August 16, 2023Date of Patent: October 15, 2024Assignee: INTEL CORPORATIONInventors: Joydeep Ray, Aravindh Anantaraman, Abhishek R. Appu, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Valentin Andrei, Subramaniam Maiyuran, Nicolas Galoppo Von Borries, Varghese George, Mike Macpherson, Ben Ashbaugh, Murali Ramadoss, Vikranth Vemulapalli, William Sadler, Jonathan Pearce, Sungye Kim
-
Patent number: 12099461Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, an apparatus comprises a cache memory, a high-bandwidth memory, a shader core communicatively coupled to the cache memory and comprising a processing element to decompress a first data element extracted from an in-memory database in the cache memory and having a first bit length to generate a second data element having a second bit length, greater than the first bit length, and an arithmetic logic unit (ALU) to compare the data element to a target value provided in a query of the in-memory database. Other embodiments are also disclosed and claimed.Type: GrantFiled: March 14, 2020Date of Patent: September 24, 2024Assignee: INTEL CORPORATIONInventors: Abhishek R. Appu, Altug Koker, Aravindh Anantaraman, Elmoustapha Ould-Ahmed-Vall, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Mike Macpherson, Subramaniam Maiyuran, Joydeep Ray, Lakshminarayanan Striramassarma, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Prasoonkumar Surti, David Puffer, James Valerio, Ankur N. Shah
-
Patent number: 12093210Abstract: Methods and apparatus relating to techniques for data compression. In an example, an apparatus comprises a processor receive a data compression instruction for a memory segment; and in response to the data compression instruction, compress a sequence of identical memory values in response to a determination that the sequence of identical memory values has a length which exceeds a threshold. Other embodiments are also disclosed and claimed.Type: GrantFiled: March 14, 2020Date of Patent: September 17, 2024Assignee: INTEL CORPORATIONInventors: Abhishek R. Appu, Altug Koker, Aravindh Anantaraman, Elmoustapha Ould-Ahmed-Vall, Joydeep Ray, Mike Macpherson, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Subramaniam Maiyuran, Vasanth Ranganathan, Jayakrishna P S, K Pattabhiraman, Sudhakar Kamma
-
Patent number: 12079155Abstract: Embodiments described herein include software, firmware, and hardware that provides techniques to enable deterministic scheduling across multiple general-purpose graphics processing units. One embodiment provides a multi-GPU architecture with uniform latency. One embodiment provides techniques to distribute memory output based on memory chip thermals. One embodiment provides techniques to enable thermally aware workload scheduling. One embodiment provides techniques to enable end to end contracts for workload scheduling on multiple GPUs.Type: GrantFiled: March 14, 2020Date of Patent: September 3, 2024Assignee: Intel CorporationInventors: Joydeep Ray, Selvakumar Panneer, Saurabh Tangri, Ben Ashbaugh, Scott Janus, Abhishek Appu, Varghese George, Ravishankar Iyer, Nilesh Jain, Pattabhiraman K, Altug Koker, Mike MacPherson, Josh Mastronarde, Elmoustapha Ould-Ahmed-Vall, Jayakrishna P. S, Eric Samson
-
Patent number: 12066975Abstract: Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data is to be prefetched extending to the current overfetch boundary.Type: GrantFiled: March 14, 2020Date of Patent: August 20, 2024Assignee: INTEL CORPORATIONInventors: Altug Koker, Lakshminarayanan Striramassarma, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Sean Coleman, Varghese George, K Pattabhiraman, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, S Jayakrishna P, Prasoonkumar Surti
-
Publication number: 20240256456Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the Li cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.Type: ApplicationFiled: December 20, 2023Publication date: August 1, 2024Applicant: Intel CorporationInventors: Vikranth Vemulapalli, Lakshminarayanan Striramassarma, Mike MacPherson, Aravindh Anantaraman, Ben Ashbaugh, Murali Ramadoss, William B. Sadler, Jonathan Pearce, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Jr., Prasoonkumar Surti, Nicolas Galoppo von Borries, Joydeep Ray, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Altug Koker, Sungye Kim, Subramaniam Maiyuran, Valentin Andrei
-
Publication number: 20240256483Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.Type: ApplicationFiled: January 17, 2024Publication date: August 1, 2024Applicant: Intel CorporationInventors: Altug Koker, Varghese George, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Niranjan Cooray, Nicolas Galoppo Von Borries, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, David Puffer, Vasanth Ranganathan, Joydeep Ray, Ankur N. Shah, Lakshminarayanan Striramassarma, Prasoonkumar Surti, Saurabh Tangri
-
Patent number: 12001944Abstract: A mechanism is described for facilitating smart distribution of resources for deep learning autonomous machines. A method of embodiments, as described herein, includes detecting one or more sets of data from one or more sources over one or more networks, and introducing a library to a neural network application to determine an optimal point at which to apply frequency scaling without degrading performance of the neural network application at a computing device.Type: GrantFiled: July 27, 2022Date of Patent: June 4, 2024Assignee: INTEL CORPORATIONInventors: Rajkishore Barik, Brian T. Lewis, Murali Sundaresan, Jeffrey Jackson, Feng Chen, Xiaoming Chen, Mike Macpherson
-
Patent number: 11995029Abstract: Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling with multicasting of data via copy operation, and Providing Page Migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU.Type: GrantFiled: March 14, 2020Date of Patent: May 28, 2024Assignee: Intel CorporationInventors: Lakshminarayanan Striramassarma, Prasoonkumar Surti, Varghese George, Ben Ashbaugh, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Nicolas Galoppo Von Borries, Altug Koker, Mike Macpherson, Subramaniam Maiyuran, Nilay Mistry, Elmoustapha Ould-Ahmed-Vall, Selvakumar Panneer, Vasanth Ranganathan, Joydeep Ray, Ankur Shah, Saurabh Tangri
-
Publication number: 20240161227Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.Type: ApplicationFiled: December 7, 2023Publication date: May 16, 2024Applicant: Intel CorporationInventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
-
Publication number: 20240161226Abstract: Embodiments are generally directed to memory prefetching in multiple GPU environment. An embodiment of an apparatus includes multiple processors including a host processor and multiple graphics processing units (GPUs) to process data, each of the GPUs including a prefetcher and a cache; and a memory for storage of data, the memory including a plurality of memory elements, wherein the prefetcher of each of the GPUs is to prefetch data from the memory to the cache of the GPU; and wherein the prefetcher of a GPU is prohibited from prefetching from a page that is not owned by the GPU or by the host processor.Type: ApplicationFiled: November 16, 2023Publication date: May 16, 2024Applicant: Intel CorporationInventors: Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Nicolas Galoppo von Borries, Varghese George, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Mike Macpherson, Subramaniam Maiyuran
-
Patent number: 11934342Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.Type: GrantFiled: March 14, 2020Date of Patent: March 19, 2024Assignee: INTEL CORPORATIONInventors: Altug Koker, Varghese George, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Niranjan Cooray, Nicolas Galoppo Von Borries, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, David Puffer, Vasanth Ranganathan, Joydeep Ray, Ankur N. Shah, Lakshminarayanan Striramassarma, Prasoonkumar Surti, Saurabh Tangri
-
Publication number: 20240086357Abstract: Systems and methods for updating remote memory side caches in a multi-GPU configuration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a first memory, a first memory side cache memory, a first communication fabric, and a first memory management unit (MMU). The graphics processor includes a second graphics processing unit (GPU) having a second memory, a second memory side cache memory, a second memory management unit (MMU), and a second communication fabric that is communicatively coupled to the first communication fabric. The first MMU is configured to control memory requests for the first memory, to update content in the first memory, to update content in the first memory side cache memory, and to determine whether to update the content in the second memory side cache memory.Type: ApplicationFiled: November 21, 2023Publication date: March 14, 2024Applicant: Intel CorporationInventors: Altug Koker, Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Sean Coleman, Nicolas Galoppo Von Borries, Varghese George, Pattabhiraman K, SungYe Kim, Mike Macpherson, Subramaniam Maiyuran, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, James Valerio
-
Publication number: 20240086356Abstract: Embodiments described herein provide techniques to facilitate instruction-based control of memory attributes. One embodiment provides a graphics processor comprising a processing resource, a memory device, a cache coupled with the processing resources and the memory, and circuitry to process a memory access message received from the processing resource. The memory access message enables access to data of the memory device. To process the memory access message, the circuitry is configured to determine one or more cache attributes that indicate whether the data should be read from or stored the cache. The cache attributes may be provided by the memory access message or stored in state data associated with the data to be accessed by the access message.Type: ApplicationFiled: October 20, 2023Publication date: March 14, 2024Applicant: Intel CorporationInventors: Joydeep Ray, Altug Koker, Varghese George, Mike Macpherson, Aravindh Anantaraman, Abhishek R. Appu, Elmoustapha Ould-Ahmed-Vall, Nicolas Galoppo von Borries, Ben J. Ashbaugh