Patents by Inventor Onur Kayiran
Onur Kayiran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11966328
Abstract: A memory module includes register selection logic to select alternate local source and/or destination registers to process PIM commands. The register selection logic uses an address-based register selection approach to select an alternate local source and/or destination register based upon address data specified by a PIM command and a split address maintained by a memory module. The register selection logic may alternatively use a register data-based approach to select an alternate local source and/or destination register based upon data stored in one or more local registers. A PIM-enabled memory module configured with the register selection logic described herein is capable of selecting an alternate local source and/or destination register to process PIM commands at or near the PIM execution unit where the PIM commands are executed.
Type: Grant
Filed: December 18, 2020
Date of Patent: April 23, 2024
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Onur Kayiran, Mohamed Assem Ibrahim, Shaizeen Aga
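The address-based selection the abstract describes can be sketched in a few lines. Everything here is illustrative: the function and register names, and the convention that addresses at or above the split go to the alternate register, are assumptions rather than the patented implementation.

```python
def select_register(command_address: int, split_address: int,
                    primary_reg: str, alternate_reg: str) -> str:
    """Pick the local source/destination register for a PIM command.

    Commands whose address falls at or above the module's split address
    are steered to the alternate register; all others use the primary.
    """
    return alternate_reg if command_address >= split_address else primary_reg
```

For example, with a split address of 0x1000, a PIM command addressed to 0x2000 would select the alternate register while one addressed to 0x0800 would keep the primary.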
-
Publication number: 20240111489
Abstract: A processing unit includes a plurality of adders and a plurality of carry bit generation circuits. The plurality of adders add first and second X bit binary portion values of a first Y bit binary value and a second Y bit binary value. Y is a multiple of X. The plurality of adders further generate first carry bits. The plurality of carry bit generation circuits are coupled to the plurality of adders, respectively, and receive the first carry bits. The plurality of carry bit generation circuits generate second carry bits based on the first carry bits. The plurality of adders use the second carry bits to add the first and second X bit binary portions of the first and second Y bit binary values, respectively.
Type: Application
Filed: September 29, 2022
Publication date: April 4, 2024
Inventors: Onur Kayiran, Michael Estlick, Masab Ahmad, Gabriel H. Loh
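The two-pass structure in the abstract resembles a carry-select scheme, and can be modeled in software: a first pass of independent X-bit adds produces the first carry bits, a carry stage derives the carry actually entering each chunk, and a second pass re-adds with those carries. The generate/propagate arrangement below is a guess at the publication's circuit, used only to make the sketch concrete.

```python
def chunked_add(a: int, b: int, y_bits: int = 32, x_bits: int = 8) -> int:
    """Add two Y-bit values using Y/X X-bit adders plus carry generation."""
    assert y_bits % x_bits == 0
    n, mask = y_bits // x_bits, (1 << x_bits) - 1
    parts_a = [(a >> (i * x_bits)) & mask for i in range(n)]
    parts_b = [(b >> (i * x_bits)) & mask for i in range(n)]

    # Pass 1: each adder sums its portions with no carry-in, producing a
    # partial sum and a "first carry bit".
    partial = [pa + pb for pa, pb in zip(parts_a, parts_b)]
    first_carry = [s >> x_bits for s in partial]

    # Carry generation circuits: derive the "second carry bits" (the carry
    # entering each chunk). A chunk generates a carry when its first carry
    # bit is set, and propagates an incoming carry when its partial sum is
    # all ones.
    second_carry = [0] * n
    for i in range(1, n):
        propagate = (partial[i - 1] & mask) == mask
        second_carry[i] = first_carry[i - 1] | (second_carry[i - 1] if propagate else 0)

    # Pass 2: the adders re-add each portion with its incoming carry.
    result = 0
    for i in range(n):
        result |= ((parts_a[i] + parts_b[i] + second_carry[i]) & mask) << (i * x_bits)
    return result
```

The point of the carry stage is that all n chunk additions in each pass are independent, so the critical path is one X-bit add plus the carry network rather than a full Y-bit ripple.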
-
Publication number: 20240004656
Abstract: Methods and systems are disclosed for processing a vector by a vector processor. Techniques disclosed include receiving predicated instructions by a scheduler, each of which is associated with an opcode, a vector of elements, and a predicate. The techniques further include executing the predicated instructions. Executing a predicated instruction includes compressing, based on an index derived from a predicate of the instruction, elements in a vector of the instruction, where the elements in the vector are contiguously mapped, then, after the mapped elements are processed, decompressing the processed mapped elements, where the processed mapped elements are reverse mapped based on the index.
Type: Application
Filed: June 29, 2022
Publication date: January 4, 2024
Applicant: Advanced Micro Devices, Inc.
Inventors: Elliott David Binder, Onur Kayiran, Masab Ahmad
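The compress-process-decompress flow above is straightforward to model in software. The helper below is a behavioral sketch only (names and the element-wise `op` are illustrative), not the hardware scheme claimed in the publication.

```python
def execute_predicated(vector, predicate, op):
    """Run op over only the active lanes of a vector.

    The predicate is turned into an index of active lanes, the active
    elements are packed contiguously (compression), processed densely,
    and then scattered back to their original positions (reverse mapping).
    """
    index = [i for i, active in enumerate(predicate) if active]
    compressed = [vector[i] for i in index]   # contiguous mapping
    processed = [op(x) for x in compressed]   # dense execution
    result = list(vector)
    for pos, i in enumerate(index):           # decompression / reverse map
        result[i] = processed[pos]
    return result
```

For example, `execute_predicated([1, 2, 3, 4], [True, False, True, False], lambda x: x * 10)` touches only lanes 0 and 2, yielding `[10, 2, 30, 4]`; inactive lanes pass through unchanged.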
-
Publication number: 20230401154
Abstract: A system and method for efficiently accessing sparse data for a workload are described. In various implementations, a computing system includes an integrated circuit and a memory for storing tasks of a workload that includes sparse accesses of data items stored in one or more tables. The integrated circuit receives a user query, and generates a result based on multiple data items targeted by the user query. To reduce the latency of processing the workload even with sparse lookup operations performed on the one or more tables, a prefetch engine of the integrated circuit stores a subset of data items in prefetch data storage. The prefetch engine also determines which data items to store in the prefetch data storage based on one or more of a frequency of reuse, a distance or latency of access of a corresponding table of the one or more tables, or other criteria.
Type: Application
Filed: June 8, 2022
Publication date: December 14, 2023
Inventors: Mohamed Assem Abd ElMohsen Ibrahim, Onur Kayiran, Shaizeen Dilawarhusen Aga, Yasuko Eckert
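One way to combine the two criteria the abstract names (reuse frequency and table access latency) is a simple weighted score; the scoring function and the top-k selection below are assumptions for illustration, not the engine's actual policy.

```python
def choose_prefetch_items(access_counts, table_latency, item_table, capacity):
    """Rank data items by reuse frequency weighted by the access latency of
    their home table, and keep the top `capacity` items in prefetch storage.

    access_counts: item -> number of accesses observed
    table_latency: table id -> relative access latency
    item_table:    item -> id of the table holding it
    """
    def score(item):
        return access_counts[item] * table_latency[item_table[item]]
    return set(sorted(access_counts, key=score, reverse=True)[:capacity])
```

Under this scoring, a moderately reused item in a slow (e.g. remote) table can outrank a hot item in a fast table, which matches the intuition that prefetching saves the most time where misses are most expensive.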
-
Publication number: 20230205872
Abstract: A method includes receiving an indication that a number of activations of a memory structure exceeds a threshold number of activations for a time period, and in response to the indication, throttling instruction execution for a thread issuing the activations.
Type: Application
Filed: December 23, 2021
Publication date: June 29, 2023
Inventors: Jagadish B. Kotra, Onur Kayiran, John Kalamatianos, Alok Garg
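The mechanism reduces to a per-thread activation counter over a time window. This sketch assumes a simple tumbling window and per-thread counts; both are illustrative choices, not details from the publication.

```python
class ActivationThrottle:
    """Count row activations per thread within a time window; when a
    thread's count exceeds the threshold, flag it for throttling."""

    def __init__(self, threshold: int, window: int):
        self.threshold = threshold
        self.window = window
        self.counts = {}
        self.window_start = 0

    def record_activation(self, thread_id, now) -> bool:
        if now - self.window_start >= self.window:  # new time period
            self.counts.clear()
            self.window_start = now
        self.counts[thread_id] = self.counts.get(thread_id, 0) + 1
        # True means: throttle instruction execution for this thread.
        return self.counts[thread_id] > self.threshold
```

With a threshold of 2 per 100 time units, the third activation by a thread inside one window trips the throttle, and the count resets when the window rolls over.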
-
Publication number: 20230098421
Abstract: Methods and apparatuses include a processing unit which helps control the speed and computational resources required for arithmetic operations of two numbers in a first format. The control unit of the processing unit approximates the arithmetic operations using a plurality of decomposed numbers in a second format that facilitates faster calculations than the first format, such that performing arithmetic operations using the decomposed numbers is capable of approximating the results of the arithmetic operations of the two numbers in the first format.
Type: Application
Filed: September 30, 2021
Publication date: March 30, 2023
Inventors: Onur Kayiran, Mohamed Assem Abd ElMohsen Ibrahim, Shaizeen Aga
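A common instance of this idea is approximating a float32 multiply with several bfloat16-precision terms; the publication does not name these formats, so the decomposition below is an assumed example of the technique rather than the claimed method. Each float32 value is split as x ≈ hi + lo, where hi and lo each fit in bfloat16, and the product keeps three of the four cross terms (the lo·lo term is below the retained precision).

```python
import struct

def bf16_trunc(x: float) -> float:
    """Truncate a value to bfloat16 precision (top 16 bits of float32)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def decompose(x: float):
    """Split a float32 value into two bfloat16 terms: x ~ hi + lo."""
    hi = bf16_trunc(x)
    lo = bf16_trunc(x - hi)
    return hi, lo

def approx_mul(a: float, b: float) -> float:
    """Approximate a*b from the decomposed terms, dropping lo*lo."""
    ah, al = decompose(a)
    bh, bl = decompose(b)
    return ah * bh + ah * bl + al * bh
```

The payoff is that all three partial products run on narrow (bfloat16-width) multipliers, which are cheaper and faster than a full float32 multiplier, while the summed result stays close to the float32 answer.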
-
Publication number: 20230065546
Abstract: An electronic device includes a plurality of nodes, each node having a processor that performs operations for processing instances of input data through a model, a local memory that stores a separate portion of model data for the model, and a controller. The controller identifies model data that meets one or more predetermined conditions in the separate portion of the model data in the local memory in some or all of the nodes that is accessible by the processors when processing the instances of input data through the model. The controller then copies the model data that meets the one or more predetermined conditions from the separate portion of the model data in the local memory in some or all of the nodes to local memories in other nodes. In this way, the controller distributes model data that meets the one or more predetermined conditions among the nodes, making that model data available to the nodes without performing remote memory accesses.
Type: Application
Filed: September 29, 2021
Publication date: March 2, 2023
Inventors: Mohamed Assem Abd ElMohsen Ibrahim, Onur Kayiran, Shaizeen Aga
-
Patent number: 11507522
Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
Type: Grant
Filed: December 6, 2019
Date of Patent: November 22, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Sooraj Puthoor, Kishore Punniyamurthy, Onur Kayiran, Xianwei Zhang, Yasuko Eckert, Johnathan Alsop, Bradford Michael Beckmann
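The two vectors in the abstract can be sketched as plain lists: one lookup keyed by lane information, and one element-wise promotion step triggered by an event. How the index is derived from lane information and how promotions saturate are assumptions here, not details from the patent.

```python
def assign_priority(lane_info: int, priority_vector: list) -> int:
    """Index into the priority vector with lane-specific information to get
    the memory-request priority for one work-item."""
    return priority_vector[lane_info % len(priority_vector)]

def promote(priority_vector: list, promotion_vector: list,
            max_priority: int) -> list:
    """On a triggering event, apply the promotion vector to produce the
    second priority vector used by subsequent wavefronts (saturating at
    the maximum priority level)."""
    return [min(p + d, max_priority)
            for p, d in zip(priority_vector, promotion_vector)]
```

Subsequent wavefronts would then call `assign_priority` against the promoted vector, so lanes that previously lagged (the memory-divergent ones) get their requests serviced earlier.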
-
Patent number: 11403221
Abstract: A system and method for efficiently processing memory requests are described. A computing system includes multiple compute units, multiple caches of a memory hierarchy and a communication fabric. A compute unit generates a memory access request that misses in a higher level cache, which sends a miss request to a lower level shared cache. During servicing of the miss request, the lower level cache merges identification information of multiple memory access requests targeting a same cache line from multiple compute units into a merged memory access response. The lower level shared cache continues to insert information into the merged memory access response until the lower level shared cache is ready to issue the merged memory access response. An intermediate router in the communication fabric broadcasts the merged memory access response into multiple memory access responses to send to corresponding compute units.
Type: Grant
Filed: September 24, 2020
Date of Patent: August 2, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Onur Kayiran, Yasuko Eckert, Mark Henry Oskin, Gabriel H. Loh, Steven E. Raasch, Maxim V. Kazakov
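The merge-then-fan-out behavior can be modeled with a small bookkeeping class. The field names and the (compute unit, request id) pairing are illustrative assumptions; the patent's actual response format is not specified here.

```python
class MergedResponse:
    """Collect identification info for requests that miss on the same cache
    line, then fan the single merged response back out into one response
    per requesting compute unit (as an intermediate router would)."""

    def __init__(self, line_address: int):
        self.line_address = line_address
        self.requesters = []  # (compute_unit, request_id) pairs

    def merge(self, compute_unit: int, request_id: int) -> None:
        """Fold another same-line request into the pending response."""
        self.requesters.append((compute_unit, request_id))

    def expand(self, data: bytes) -> list:
        """Split the merged response into per-compute-unit responses."""
        return [(cu, rid, data) for cu, rid in self.requesters]
```

The win is bandwidth: one cache line travels through most of the fabric once, and is only duplicated at the router closest to the destinations.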
-
Patent number: 11398831
Abstract: Temporal link encoding, including: identifying a data type of a data value to be transmitted; determining that the data type is included in one or more data types for temporal encoding; and transmitting the data value using temporal encoding.
Type: Grant
Filed: May 7, 2020
Date of Patent: July 26, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Onur Kayiran, Steven Raasch, Sergey Blagodurov, Jagadish B. Kotra
-
Publication number: 20220197647
Abstract: A memory module includes register selection logic to select alternate local source and/or destination registers to process PIM commands. The register selection logic uses an address-based register selection approach to select an alternate local source and/or destination register based upon address data specified by a PIM command and a split address maintained by a memory module. The register selection logic may alternatively use a register data-based approach to select an alternate local source and/or destination register based upon data stored in one or more local registers. A PIM-enabled memory module configured with the register selection logic described herein is capable of selecting an alternate local source and/or destination register to process PIM commands at or near the PIM execution unit where the PIM commands are executed.
Type: Application
Filed: December 18, 2020
Publication date: June 23, 2022
Inventors: Onur Kayiran, Mohamed Assem Ibrahim, Shaizeen Aga
-
Publication number: 20220188493
Abstract: Methods, devices, and systems for information communication. Information transmitted from a host to a graphics processing unit (GPU) is received by information analysis circuitry of a field-programmable gate array (FPGA). A pattern in the information is determined by the information analysis circuitry. A predicted information pattern is determined, by the information analysis circuitry, based on the information. An indication of the predicted information pattern is transmitted to the host. Responsive to a signal from the host based on the predicted information pattern, the FPGA is reprogrammed to implement decompression circuitry based on the predicted information pattern. In some implementations, the information includes a plurality of packets. In some implementations, the predicted information pattern includes a pattern in a plurality of packets. In some implementations, the predicted information pattern includes a zero data pattern.
Type: Application
Filed: December 10, 2020
Publication date: June 16, 2022
Applicant: Advanced Micro Devices, Inc.
Inventors: Kevin Y. Cheng, Sooraj Puthoor, Onur Kayiran
-
Patent number: 11360891
Abstract: A method of dynamic cache configuration includes determining, for a first clustering configuration, whether a current cache miss rate exceeds a miss rate threshold. The first clustering configuration includes a plurality of graphics processing unit (GPU) compute units clustered into a first plurality of compute unit clusters. The method further includes clustering, based on the current cache miss rate exceeding the miss rate threshold, the plurality of GPU compute units into a second clustering configuration having a second plurality of compute unit clusters fewer than the first plurality of compute unit clusters.
Type: Grant
Filed: March 15, 2019
Date of Patent: June 14, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Mohamed Assem Ibrahim, Onur Kayiran, Yasuko Eckert, Gabriel H. Loh
-
Publication number: 20220091980
Abstract: A system and method for efficiently processing memory requests are described. A computing system includes multiple compute units, multiple caches of a memory hierarchy and a communication fabric. A compute unit generates a memory access request that misses in a higher level cache, which sends a miss request to a lower level shared cache. During servicing of the miss request, the lower level cache merges identification information of multiple memory access requests targeting a same cache line from multiple compute units into a merged memory access response. The lower level shared cache continues to insert information into the merged memory access response until the lower level shared cache is ready to issue the merged memory access response. An intermediate router in the communication fabric broadcasts the merged memory access response into multiple memory access responses to send to corresponding compute units.
Type: Application
Filed: September 24, 2020
Publication date: March 24, 2022
Inventors: Onur Kayiran, Yasuko Eckert, Mark Henry Oskin, Gabriel H. Loh, Steven E. Raasch, Maxim V. Kazakov
-
Publication number: 20210351787
Abstract: Temporal link encoding, including: identifying a data type of a data value to be transmitted; determining that the data type is included in one or more data types for temporal encoding; and transmitting the data value using temporal encoding.
Type: Application
Filed: May 7, 2020
Publication date: November 11, 2021
Inventors: Onur Kayiran, Steven Raasch, Sergey Blagodurov, Jagadish B. Kotra
-
Publication number: 20210255871
Abstract: A technique for processing qubits in a quantum computing device is provided. The technique includes determining that, in a first cycle, a first quantum processing region is to perform a first quantum operation that does not use a qubit that is stored in the first quantum processing region, identifying a second quantum processing region that is to perform a second quantum operation at a second cycle that is later than the first cycle, wherein the second quantum operation uses the qubit, determining that between the first cycle and the second cycle, no quantum operations are performed in the second quantum processing region, and moving the qubit from the first quantum processing region to the second quantum processing region.
Type: Application
Filed: February 18, 2020
Publication date: August 19, 2021
Applicant: Advanced Micro Devices, Inc.
Inventors: Onur Kayiran, Jieming Yin, Yasuko Eckert
-
Patent number: 11068458
Abstract: A portion of a graph dataset is generated for each computing node in a distributed computing system by, for each subject vertex in a graph, recording for the computing node an offset for the subject vertex, where the offset references a first position in an edge array for the computing node, and for each edge of a set of edges coupled with the subject vertex in the graph, calculating an edge value for the edge based on a connected vertex identifier identifying a vertex coupled with the subject vertex via the edge. When the edge value is assigned to the first position, the edge value is determined by a first calculation, and when the edge value is assigned to a position subsequent to the first position, the edge value is determined by a second calculation. In the computing node, the edge value is recorded in the edge array.
Type: Grant
Filed: November 27, 2018
Date of Patent: July 20, 2021
Assignee: Advanced Micro Devices, Inc.
Inventors: Mohamed Assem Ibrahim, Onur Kayiran, Yasuko Eckert
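The offset-plus-edge-array layout described here is the classic compressed sparse row (CSR) representation. As a concrete stand-in for the patent's two calculations, the sketch below stores the first neighbor of each vertex as an absolute ID and later neighbors as deltas from the previous one; that delta scheme is an assumption for illustration, not the patented calculation.

```python
def build_csr_partition(vertices, adjacency):
    """Build the per-node offset and edge arrays (CSR layout) for a graph
    partition.

    vertices:  the subject vertices assigned to this computing node
    adjacency: vertex -> iterable of connected vertex identifiers
    """
    offsets, edges = [], []
    for v in vertices:
        offsets.append(len(edges))    # offset: first position for v's edges
        neighbors = sorted(adjacency.get(v, []))
        for i, u in enumerate(neighbors):
            if i == 0:
                edges.append(u)       # first position: absolute neighbor ID
            else:
                # subsequent positions: delta from the previous neighbor
                edges.append(u - neighbors[i - 1])
    return offsets, edges
```

Because neighbor lists are sorted, the deltas are small non-negative integers, which is what makes a two-calculation encoding attractive for compressing large distributed graphs.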
-
Publication number: 20210173796
Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
Type: Application
Filed: December 6, 2019
Publication date: June 10, 2021
Inventors: Sooraj Puthoor, Kishore Punniyamurthy, Onur Kayiran, Xianwei Zhang, Yasuko Eckert, Johnathan Alsop, Bradford Michael Beckmann
-
Patent number: 10938709
Abstract: A method includes receiving, from an origin computing node, a first communication addressed to multiple destination computing nodes in a processor interconnect fabric, measuring a first set of one or more communication metrics associated with a transmission path to one or more of the multiple destination computing nodes, and for each of the destination computing nodes, based on the set of communication metrics, selecting between a multicast transmission mode and a unicast transmission mode as a transmission mode for transmitting the first communication to the destination computing node.
Type: Grant
Filed: December 18, 2018
Date of Patent: March 2, 2021
Assignee: Advanced Micro Devices, Inc.
Inventors: Mohamed Assem Ibrahim, Onur Kayiran, Yasuko Eckert, Jieming Yin
-
Patent number: 10838864
Abstract: A miss in a cache by a thread in a wavefront is detected. The wavefront includes a plurality of threads that are executing a memory access request concurrently on a corresponding plurality of processor cores. A priority is assigned to the thread based on whether the memory access request is addressed to a local memory or a remote memory. The memory access request for the thread is performed based on the priority. In some cases, the cache is selectively bypassed depending on whether the memory access request is addressed to the local or remote memory. A cache block is requested in response to the miss. The cache block is biased towards a least recently used position in response to requesting the cache block from the local memory and towards a most recently used position in response to requesting the cache block from the remote memory.
Type: Grant
Filed: May 30, 2018
Date of Patent: November 17, 2020
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Michael W. Boyer, Onur Kayiran, Yasuko Eckert, Steven Raasch, Muhammad Shoaib Bin Altaf
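The local-vs-remote biasing in this abstract amounts to choosing where a filled block enters the replacement stack. The two helpers below are a minimal sketch, assuming a simple LRU stack numbered 0 (evict next) to num_ways - 1 (most recently used) and a two-level priority; neither convention comes from the patent.

```python
def insertion_position(is_remote: bool, num_ways: int) -> int:
    """Choose where a filled cache block enters the LRU stack.

    Blocks fetched from local memory are biased toward the LRU end
    (cheap to refetch, so evict sooner); blocks fetched from remote
    memory are biased toward the MRU end (expensive to refetch, so
    keep longer). Position 0 is the LRU end, num_ways - 1 the MRU end.
    """
    return num_ways - 1 if is_remote else 0

def request_priority(is_remote: bool) -> int:
    """Assign the thread's miss request a priority by destination:
    remote-memory requests outrank local ones."""
    return 1 if is_remote else 0
```

In an 8-way set this places local fills at position 0 and remote fills at position 7, so remote lines survive roughly a full set's worth of local churn before eviction.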