Patents by Inventor Altug Koker

Altug Koker has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11954062
    Abstract: Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment enables for virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogenous processing system having near and far regions of the same level of a cache hierarchy.
    Type: Grant
    Filed: March 14, 2020
    Date of Patent: April 9, 2024
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Niranjan Cooray, Subramaniam Maiyuran, Altug Koker, Prasoonkumar Surti, Varghese George, Valentin Andrei, Abhishek Appu, Guadalupe Garcia, Pattabhiraman K, Sungye Kim, Sanjay Kumar, Pratik Marolia, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, William Sadler, Lakshminarayanan Striramassarma
  • Patent number: 11948224
    Abstract: One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.
    Type: Grant
    Filed: November 1, 2022
    Date of Patent: April 2, 2024
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
  • Publication number: 20240103910
    Abstract: Apparatuses to synchronize lanes that diverge or threads that drift are disclosed. In one embodiment, a graphics multiprocessor includes a queue having an initial state of groups with a first group having threads of first and second instruction types and a second group having threads of the first and second instruction types. A regroup engine (or regroup circuitry) regroups threads into a third group having threads of the first instruction type and a fourth group having threads of the second instruction type.
    Type: Application
    Filed: October 5, 2023
    Publication date: March 28, 2024
    Applicant: Intel Corporation
    Inventors: Valentin Andrei, Subramaniam Maiyuran, SungYe Kim, Varghese George, Altug Koker, Aravindh Anantaraman
  • Patent number: 11934934
    Abstract: An apparatus to facilitate optimization of a convolutional neural network (CNN) is disclosed. The apparatus includes optimization logic to receive a CNN model having a list of instructions and including pruning logic to optimize the list of instructions by eliminating branches in the list of instructions that comprise a weight value of 0.
    Type: Grant
    Filed: April 17, 2017
    Date of Patent: March 19, 2024
    Assignee: Intel Corporation
    Inventors: Liwei Ma, Elmoustapha Ould- Ahmed-Vall, Barath Lakshmanan, Ben J. Ashbaugh, Jingyi Jin, Jeremy Bottleson, Mike B. Macpherson, Kevin Nealis, Dhawal Srivastava, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Altug Koker, Abhishek R. Appu
  • Patent number: 11934342
    Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.
    Type: Grant
    Filed: March 14, 2020
    Date of Patent: March 19, 2024
    Assignee: INTEL CORPORATION
    Inventors: Altug Koker, Varghese George, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Niranjan Cooray, Nicolas Galoppo Von Borries, Mike MacPherson, Subramaniam Maiyuran, ElMoustapha Ould-Ahmed-Vall, David Puffer, Vasanth Ranganathan, Joydeep Ray, Ankur N. Shah, Lakshminarayanan Striramassarma, Prasoonkumar Surti, Saurabh Tangri
  • Publication number: 20240086199
    Abstract: An apparatus to facilitate thread scheduling is disclosed. The apparatus includes logic to store barrier usage data based on a magnitude of barrier messages in an application kernel and a scheduler to schedule execution of threads across a plurality of multiprocessors based on the barrier usage data.
    Type: Application
    Filed: August 4, 2023
    Publication date: March 14, 2024
    Applicant: Intel Corporation
    Inventors: Balaji Vembu, Abhishek R. Appu, Joydeep Ray, Altug Koker
  • Publication number: 20240086138
    Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past and/or what it is predicted that the user will look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.
    Type: Application
    Filed: September 26, 2023
    Publication date: March 14, 2024
    Inventors: Eric J. Asperheim, Subramaniam Maiyuran, Kiran C. Veernapu, Sanjeev S. Jahagirdar, Balaji Vembu, Devan Burke, Philip R. Laws, Kamal Sinha, Abhishek R. Appu, Elmoustapha Ould-Ahmed-Vall, Peter L. Doyle, Joydeep Ray, Travis T. Schluessler, John H. Feit, Nikos Kaburlasos, Jacek Kwiatkowski, Altug Koker
  • Publication number: 20240086357
    Abstract: Systems and methods for updating remote memory side caches in a multi-GPU configuration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a first memory, a first memory side cache memory, a first communication fabric, and a first memory management unit (MMU). The graphics processor includes a second graphics processing unit (GPU) having a second memory, a second memory side cache memory, a second memory management unit (MMU), and a second communication fabric that is communicatively coupled to the first communication fabric. The first MMU is configured to control memory requests for the first memory, to update content in the first memory, to update content in the first memory side cache memory, and to determine whether to update the content in the second memory side cache memory.
    Type: Application
    Filed: November 21, 2023
    Publication date: March 14, 2024
    Applicant: Intel Corporation
    Inventors: Altug Koker, Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Sean Coleman, Nicolas Galoppo Von Borries, Varghese George, Pattabhiraman K, SungYe Kim, Mike Macpherson, Subramaniam Maiyuran, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, James Valerio
  • Publication number: 20240086356
    Abstract: Embodiments described herein provide techniques to facilitate instruction-based control of memory attributes. One embodiment provides a graphics processor comprising a processing resource, a memory device, a cache coupled with the processing resources and the memory, and circuitry to process a memory access message received from the processing resource. The memory access message enables access to data of the memory device. To process the memory access message, the circuitry is configured to determine one or more cache attributes that indicate whether the data should be read from or stored the cache. The cache attributes may be provided by the memory access message or stored in state data associated with the data to be accessed by the access message.
    Type: Application
    Filed: October 20, 2023
    Publication date: March 14, 2024
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, Varghese George, Mike Macpherson, Aravindh Anantaraman, Abhishek R. Appu, Elmoustapha Ould-Ahmed-Vall, Nicolas Galoppo von Borries, Ben J. Ashbaugh
  • Publication number: 20240078630
    Abstract: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output results to the streaming buffer. The compute core is operable to perform AI inference processing based on data consumed from the streaming buffer and output AI inference processing results to the memory.
    Type: Application
    Filed: October 19, 2023
    Publication date: March 7, 2024
    Applicant: Intel Corporation
    Inventors: Subramaniam Maiyuran, Durgaprasad Bilagi, Joydeep Ray, Scott Janus, Sanjeev Jahagirdar, Brent Insko, Lidong Xu, Abhishek R. Appu, James Holland, Vasanth Ranganathan, Nikos Kaburlasos, Altug Koker, Xinmin Tian, Guei-Yuan Lueh, Changliang Wang
  • Patent number: 11921635
    Abstract: Embodiments described herein provide a scalable coherency tracking implementation that utilizes shared virtual memory to manage data coherency. In one embodiment, coherency tracking granularity is reduced relative to existing coherency tracking solutions, with coherency tracking storage memory moved to memory as a page table metadata. For example and in one embodiment, storage for coherency state is moved from dedicated hardware blocks to system memory, effectively providing a directory structure that is limitless in size.
    Type: Grant
    Filed: March 3, 2021
    Date of Patent: March 5, 2024
    Assignee: Intel Corporation
    Inventor: Altug Koker
  • Patent number: 11922535
    Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
    Type: Grant
    Filed: February 13, 2023
    Date of Patent: March 5, 2024
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
  • Patent number: 11908542
    Abstract: Prior knowledge of access pattern is leveraged to improve energy dissipation for general matrix operations. This improves memory access energy for a multitude of applications such as image processing, deep neural networks, and scientific computing workloads, for example. In some embodiments, prior knowledge of access pattern allows for burst read and/or write operations. As such, burst mode solution can provide energy savings in both READ (RD) and WRITE (WR) operations. For machine learning or inference, the weight values are known ahead in time (e.g., inference operation), and so the unused bytes in the cache line are exploited to store a sparsity map that is used for disabling read from either upper or lower half of the cache line, thus saving dynamic capacitance.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: February 20, 2024
    Assignee: Intel Corporation
    Inventors: Charles Augustine, Somnath Paul, Turbo Majumder, Iqbal Rajwani, Andrew Lines, Altug Koker, Lakshminarayanan Striramassarma, Muhammad Khellah
  • Patent number: 11900665
    Abstract: A graphics processor can include a processing cluster array including a plurality of processing clusters coupled with the plurality of memory controllers, each processing cluster of the plurality of processing clusters including a plurality of streaming multiprocessors, the processing cluster array configured for partitioning into a plurality of partitions. The plurality of partitions include a first partition including a first plurality of streaming multiprocessors configured to perform operations for a first neural network, The operations for the first neural network are isolated to the first partition. The plurality of partitions also include a second partition including a second plurality of streaming multiprocessors configured to perform operations for a second neural network. The operations for the second neural network are isolated to the second partition and protected from operations performed for the first neural network.
    Type: Grant
    Filed: July 25, 2023
    Date of Patent: February 13, 2024
    Assignee: Intel Corporation
    Inventors: Barnan Das, Mayuresh M. Varerkar, Narayan Biswal, Stanley J. Baran, Gokcen Cilingir, Nilesh V. Shah, Archie Sharma, Sherine Abdelhak, Praneetha Kotha, Neelay Pandit, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Abhishek R. Appu, Altug Koker, Joydeep Ray
  • Patent number: 11899614
    Abstract: Embodiments described herein provide techniques to facilitate instruction-based control of memory attributes. One embodiment provides a graphics processor comprising a processing resource, a memory device, a cache coupled with the processing resources and the memory, and circuitry to process a memory access message received from the processing resource. The memory access message enables access to data of the memory device. To process the memory access message, the circuitry is configured to determine one or more cache attributes that indicate whether the data should be read from or stored the cache. The cache attributes may be provided by the memory access message or stored in state data associated with the data to be accessed by the access message.
    Type: Grant
    Filed: June 24, 2022
    Date of Patent: February 13, 2024
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, Varghese George, Mike Macpherson, Aravindh Anantaraman, Abhishek R. Appu, Elmoustapha Ould-Ahmed-Vall, Nicolas Galoppo von Borries, Ben J. Ashbaugh
  • Publication number: 20240045830
    Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: August 16, 2023
    Publication date: February 8, 2024
    Applicant: Intel Corporation
    Inventors: Joydeep RAY, Aravindh ANANTARAMAN, Abhishek R. APPU, Altug KOKER, Elmoustapha OULD-AHMED-VALL, Valentin ANDREI, Subramaniam MAIYURAN, Nicolas GALOPPO VON BORRIES, Varghese GEORGE, Mike MACPHERSON, Ben ASHBAUGH, Murali RAMADOSS, Vikranth VEMULAPALLI, William SADLER, Jonathan PEARCE, Sungye KIM
  • Patent number: 11892950
    Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
    Type: Grant
    Filed: July 15, 2022
    Date of Patent: February 6, 2024
    Assignee: INTEL CORPORATION
    Inventors: Vikranth Vemulapalli, Lakshminarayanan Striramassarma, Mike MacPherson, Aravindh Anantaraman, Ben Ashbaugh, Murali Ramadoss, William B. Sadler, Jonathan Pearce, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Jr., Prasoonkumar Surti, Nicolas Galoppo von Borries, Joydeep Ray, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Altug Koker, Sungye Kim, Subramaniam Maiyuran, Valentin Andrei
  • Publication number: 20240012767
    Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions, an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page; and set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.
    Type: Application
    Filed: July 25, 2023
    Publication date: January 11, 2024
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Michael Macpherson, Aravindh V. Anantaraman, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Varghese George, Abhishek Appu, Prasoonkumar Surti
  • Publication number: 20240013338
    Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
    Type: Application
    Filed: September 20, 2023
    Publication date: January 11, 2024
    Applicant: Intel Corporation
    Inventors: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
  • Publication number: 20240013337
    Abstract: A mechanism is described for detecting, at training time, information related to one or more tasks to be performed by the one or more processors according to a training dataset for a neural network, analyzing the information to determine one or more portions of hardware of a processor of the one or more processors that is configurable to support the one or more tasks, configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks, and monitoring utilization of the hardware via a hardware unit of the graphics processor and, via a scheduler of the graphics processor, adjusting allocation of the one or more tasks to the one or more portions of the hardware based on the utilization.
    Type: Application
    Filed: July 13, 2023
    Publication date: January 11, 2024
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, John C. Weast, Mike B. Macpherson, Linda L. Hurd, Sara S. Baghsorkhi, Justin E. Gottschlich, Prasoonkumar Surti, Chandrasekaran Sakthivel, Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Kamal Sinha, Joydeep Ray, Balaji Vembu, Sanjeev Jahagirdar, Vasanth Ranganathan, Dukhwan Kim