Patents by Inventor Alexander Lyashevsky

Alexander Lyashevsky has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Low power and low latency GPU coprocessor for persistent computing

Patent number: 11625807

Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.

Type: Grant

Filed: February 22, 2021

Date of Patent: April 11, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor

FUSED INSTRUCTION TO ACCELERATE PERFORMANCE OF SECURE HASH ALGORITHM 2 (SHA-2) WORKLOADS IN A GRAPHICS ENVIRONMENT

Publication number: 20220416999

Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the function control to be scheduled to an integer pipeline of the execution resource; and execute the sub-function of the fused SHA instruction in an integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an xor operation.
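
For illustration only: the SHA-256 message schedule defined in FIPS 180-4 uses two small functions that each merge rotate, shift, and xor steps, which is the kind of merged operation the abstract describes. The sketch below writes them in plain Python. The helper names (`rotr32`, `small_sigma0`, `small_sigma1`) are hypothetical, and nothing here reflects the actual instruction encoding, which the abstract does not give.

```python
MASK32 = 0xFFFFFFFF  # SHA-256 operates on 32-bit words


def rotr32(x, n):
    """Rotate a 32-bit word right by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma0(x):
    """SHA-256 schedule function: ROTR7(x) xor ROTR18(x) xor SHR3(x)."""
    x &= MASK32
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)


def small_sigma1(x):
    """SHA-256 schedule function: ROTR17(x) xor ROTR19(x) xor SHR10(x)."""
    x &= MASK32
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10)
```

Written this way, each function takes two rotates, one shift, and two xors per word; presumably that per-word overhead is what a fused instruction is meant to reduce.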

Type: Application

Filed: June 25, 2021

Publication date: December 29, 2022

Applicant: Intel Corporation

Inventors: Supratim Pal, Wajdi Feghali, Changwon Rhee, Wei-Yu Chen, Timothy R. Bauer, Alexander Lyashevsky

LARGE INTEGER MULTIPLICATION ENHANCEMENTS FOR GRAPHICS ENVIRONMENT

Publication number: 20220413848

Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.
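
As a rough sketch of what a chain of multiplication operations for a large integer can look like, the Python below performs schoolbook multiplication on 16-bit limbs, where every inner step is a single multiply-and-add. The limb width, the helper names, and the `mad` function are assumptions made for this example; the abstract does not describe the hardware data path beyond the double precision multiplier and the 48-bit output.

```python
LIMB_BITS = 16                      # assumed limb width for this sketch
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(n):
    """Split a non-negative integer into little-endian limbs."""
    limbs = []
    while True:
        limbs.append(n & LIMB_MASK)
        n >>= LIMB_BITS
        if n == 0:
            return limbs


def from_limbs(limbs):
    """Reassemble little-endian limbs into one integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def mad(a, b, c):
    """Multiply-and-add: a * b + c (hypothetical stand-in for a MAD instruction)."""
    return a * b + c


def big_mul(x, y):
    """Multiply two non-negative integers as a chain of MAD steps."""
    a, b = to_limbs(x), to_limbs(y)
    out = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # One MAD per limb pair: product plus running partial sum and carry.
            t = mad(ai, bj, out[i + j] + carry)
            out[i + j] = t & LIMB_MASK
            carry = t >> LIMB_BITS
        out[i + len(b)] = carry
    return from_limbs(out)
```

With 16-bit limbs the intermediate value `t` never exceeds 32 bits, so each step fits a fixed-width multiplier; a wider limb would need a correspondingly wider result.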

Type: Application

Filed: June 25, 2021

Publication date: December 29, 2022

Applicant: Intel Corporation

Inventors: Supratim Pal, Li-An Tang, Changwon Rhee, Timothy R. Bauer, Alexander Lyashevsky, Jiasheng Chen

METHODS, SYSTEMS, ARTICLES OF MANUFACTURE AND APPARATUS TO IMPROVE ALGORITHMIC SOLVER PERFORMANCE

Publication number: 20210319323

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve algorithmic solver performance. An example apparatus includes graph transforming circuitry to generate a vector representation corresponding to a graph input, vector classification circuitry to generate a node embedding machine learning classifier, the node embedding machine learning classifier to cause an output layer of probabilities corresponding to nodes of the graph input, loss calculating circuitry to train a model based on a target algorithmic function, the loss calculating circuitry to inject a solution diversity to reduce equivalent solution error of the target algorithmic function, and algorithmic solving circuitry to calculate solutions based on ranked ones of the output layer of probabilities.

Type: Application

Filed: June 24, 2021

Publication date: October 14, 2021

Inventors: Alexey Titov, Alexander Lyashevsky, Lukasz Kuszner

LOW POWER AND LOW LATENCY GPU COPROCESSOR FOR PERSISTENT COMPUTING

Publication number: 20210201439

Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.

Type: Application

Filed: February 22, 2021

Publication date: July 1, 2021

Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor

Low power and low latency GPU coprocessor for persistent computing

Patent number: 10929944

Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.

Type: Grant

Filed: November 23, 2016

Date of Patent: February 23, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor

LOW POWER AND LOW LATENCY GPU COPROCESSOR FOR PERSISTENT COMPUTING

Publication number: 20180144435

Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.

Type: Application

Filed: November 23, 2016

Publication date: May 24, 2018

Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor

Per-block sort for performance enhancement of parallel processors

Patent number: 9740511

Abstract: A method of enhancing performance of an application executing in a parallel processor and a system for executing the method are disclosed. A block size for input to the application is determined. Input is partitioned into blocks having the block size. Input within each block is sorted. The application is executed with the sorted input.
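
A minimal sketch of the partition-then-sort step, assuming the input is a flat Python list and that the block size has already been chosen (the abstract does not say how it is determined). The function name `per_block_sort` is hypothetical.

```python
def per_block_sort(data, block_size):
    """Partition data into consecutive blocks and sort within each block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    out = []
    for start in range(0, len(data), block_size):
        # Elements are reordered only inside their own block.
        out.extend(sorted(data[start:start + block_size]))
    return out
```

For example, `per_block_sort([5, 1, 4, 2, 3, 0], 3)` sorts `[5, 1, 4]` and `[2, 3, 0]` separately. Only the order inside each block changes, so no element ever leaves its block; the application would then run on the returned list.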

Type: Grant

Filed: June 4, 2015

Date of Patent: August 22, 2017

Assignee: Advanced Micro Devices, Inc.

Inventor: Alexander Lyashevsky

System for parallel intra-prediction decoding of video data

Patent number: 9565433

Abstract: A system for decoding video data includes a processing unit. The processing unit includes a plurality of processing pipelines and a driver. The driver includes a decoder configured to generate a plurality of intermediate control maps containing control information including an indication of which macro blocks or portions of macro blocks may be processed in parallel in the plurality of processing pipelines.
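
One common way to express which macro blocks can be processed together is a wavefront schedule. The sketch below is not taken from the patent: it assumes each macro block depends on its left, top-left, top, and top-right neighbours, as is typical for H.264 intra prediction, and groups blocks by the index x + 2y so that every block's dependencies fall in earlier groups. The function name `wavefront_groups` is hypothetical.

```python
def wavefront_groups(width, height):
    """Group macro block coordinates (x, y) into waves that can run in parallel.

    Assumed dependency pattern: left, top-left, top, top-right.
    Blocks sharing the same value of x + 2*y do not depend on each other.
    """
    groups = {}
    for y in range(height):
        for x in range(width):
            groups.setdefault(x + 2 * y, []).append((x, y))
    # Waves must be processed in increasing order of their index.
    return [groups[wave] for wave in sorted(groups)]
```

Each inner list plays the role of one entry in a control map: the blocks it names could be issued to separate pipelines at the same time, under the dependency assumption stated above.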

Type: Grant

Filed: December 28, 2012

Date of Patent: February 7, 2017

Assignee: ATI Technologies ULC

Inventors: Alexander Lyashevsky, Jason Yang, Arcot J. Preetham

PER-BLOCK SORT FOR PERFORMANCE ENHANCEMENT OF PARALLEL PROCESSORS

Publication number: 20160357580

Abstract: A method of enhancing performance of an application executing in a parallel processor and a system for executing the method are disclosed. A block size for input to the application is determined. Input is partitioned into blocks having the block size. Input within each block is sorted. The application is executed with the sorted input.

Type: Application

Filed: June 4, 2015

Publication date: December 8, 2016

Applicant: Advanced Micro Devices, Inc.

Inventor: Alexander Lyashevsky

System and method for providing low latency to applications using heterogeneous processors

Patent number: 9495718

Abstract: Methods, apparatuses, and computer readable media are disclosed for responding to requests. A method of responding to requests may include receiving requests comprising callback functions. The one or more requests may be received in a first memory associated with processors of a first type, which may be CPUs. The requests may be moved to a second memory. The second memory may be associated with processors of a second type, which may be GPUs. GPU threads may process the requests to determine a result for the requests, when a number of the requests is at least a threshold number. The method may include moving the results to the first memory. The method may include the CPUs executing the one or more callback functions with the corresponding result. A GPU persistent thread may check the number of requests to determine when a threshold number of requests is reached.
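
The request flow in this abstract can be sketched on a single CPU as a queue that is drained only once a threshold number of requests has accumulated. This is a stand-in, not GPU code: the class name `ThresholdDispatcher`, the `process_batch` parameter (which plays the role of the GPU threads), and the explicit `poll` call (which plays the role of the persistent thread's check) are all assumptions made for illustration.

```python
class ThresholdDispatcher:
    """Collect requests with callbacks and process them in batches."""

    def __init__(self, threshold, process_batch):
        self.threshold = threshold
        self.process_batch = process_batch  # stand-in for the GPU threads
        self.pending = []                   # stand-in for the second memory

    def submit(self, payload, callback):
        """Queue one request together with its callback function."""
        self.pending.append((payload, callback))

    def poll(self):
        """Process the queue if at least `threshold` requests are waiting."""
        if len(self.pending) < self.threshold:
            return False
        batch, self.pending = self.pending, []
        results = self.process_batch([payload for payload, _ in batch])
        # Hand each result back to the callback supplied with its request.
        for (_, callback), result in zip(batch, results):
            callback(result)
        return True
```

A caller would `submit` requests as they arrive and call `poll` repeatedly; nothing is processed until the threshold is met, after which every queued callback receives its own result.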

Type: Grant

Filed: June 7, 2013

Date of Patent: November 15, 2016

Assignee: Advanced Micro Devices, Inc.

Inventor: Alexander Lyashevsky

Software only intra-compute unit redundant multithreading for GPUs

Patent number: 9367372

Abstract: A system, method and computer program product to execute a first and a second work-item, and compare the signature variable of the first work-item to the signature variable of the second work-item. The first and the second work-items are mapped to an identifier via software. This mapping ensures that the first and second work-items execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-items independently, the underlying computation of the first and second work-item can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-items are compared only at specified comparison points.
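
The idea of mapping two work-items to one identifier and comparing their signature variables at a comparison point can be sketched sequentially in Python. The function name `run_redundant`, the mapping `physical_id // 2`, and the `fault_at` hook that simulates a hardware fault are hypothetical; the abstract does not specify how the mapping or the signature is computed, and here the signature is simply the kernel's result.

```python
def run_redundant(kernel, inputs, fault_at=None):
    """Run each logical work-item twice and compare the two signatures.

    fault_at is a hypothetical hook that corrupts one physical work-item's
    result so the comparison has something to catch.
    """
    n = len(inputs)
    signatures = [None] * (2 * n)
    for physical_id in range(2 * n):
        logical_id = physical_id // 2  # software mapping: two copies per id
        result = kernel(inputs[logical_id])
        if physical_id == fault_at:
            result += 1  # simulated fault; assumes a numeric result
        signatures[physical_id] = result  # signature variable
    # Comparison point: each redundant pair is checked once, at the end.
    mismatched = [i for i in range(n)
                  if signatures[2 * i] != signatures[2 * i + 1]]
    results = [signatures[2 * i] for i in range(n)]
    return results, mismatched
```

Because both copies run the same code on the same data, any entry in `mismatched` points to a logical work-item whose two executions disagreed. Comparing only at the end, rather than after every operation, mirrors the abstract's point about limiting the performance cost.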

Type: Grant

Filed: June 18, 2013

Date of Patent: June 14, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan

Software only inter-compute unit redundant multithreading for GPUs

Patent number: 9274904

Abstract: A system, method and computer program product to execute a first and a second work-group, and compare the signature variables of the first work-group to the signature variables of the second work-group via a synchronization mechanism. The first and the second work-group are mapped to an identifier via software. This mapping ensures that the first and second work-groups execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-groups independently, the underlying computation of the first and second work-groups can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-groups are compared only at specified comparison points.

Type: Grant

Filed: June 18, 2013

Date of Patent: March 1, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan

Parallel decoding method and system for highly compressed data

Patent number: 9055306

Abstract: Embodiments of a method and system for decoding video data are described herein. In various embodiments, a high-compression-ratio codec (such as H.264) is part of the encoding scheme for the video data. Embodiments pre-process control maps that were generated from encoded video data, and generate intermediate control maps comprising information regarding decoding the video data. The control maps include information regarding rearranging the video data to be processed in parallel on multiple pipelines of a graphics processing unit (GPU) so as to optimize the use of the multiple pipelines. In an embodiment, decoding is performed on a frame basis such that each of multiple, distinct decoding operations is performed on an entire frame at one time. In other embodiments, processing of different frames is interleaved.

Type: Grant

Filed: August 31, 2006

Date of Patent: June 9, 2015

Assignee: ATI Technologies ULC

Inventors: Alexander Lyashevsky, Jason Yang, Arcot J. Preetham

Method and system for inter-prediction in decoding of video data

Patent number: 9049461

Abstract: Embodiments of a method and system for inter-prediction in decoding video data are described herein. In various embodiments, a high-compression-ratio codec (such as H.264) is part of the encoding scheme for the video data. Embodiments pre-process control maps that were generated from encoded video data, and generate intermediate control maps comprising information regarding decoding the video data. The control maps indicate which units of video data in a frame are to be processed using an inter-prediction operation. In an embodiment, inter-prediction is performed on a frame basis such that inter-prediction is performed on an entire frame at one time. In other embodiments, processing of different frames is interleaved. Embodiments increase the efficiency of the inter-prediction so as to allow decoding of high-compression-ratio encoded video data on personal computers or comparable equipment without special, additional decoding hardware.

Type: Grant

Filed: August 31, 2006

Date of Patent: June 2, 2015

Assignee: ATI Technologies ULC

Inventors: Alexander Lyashevsky, Jason Yang, Arcot J. Preetham

Software Only Intra-Compute Unit Redundant Multithreading for GPUs

Publication number: 20140368513

Abstract: A system, method and computer program product to execute a first and a second work-item, and compare the signature variable of the first work-item to the signature variable of the second work-item. The first and the second work-items are mapped to an identifier via software. This mapping ensures that the first and second work-items execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-items independently, the underlying computation of the first and second work-item can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-items are compared only at specified comparison points.

Type: Application

Filed: June 18, 2013

Publication date: December 18, 2014

Inventors: Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan

Software Only Inter-Compute Unit Redundant Multithreading for GPUs

Publication number: 20140373028

Abstract: A system, method and computer program product to execute a first and a second work-group, and compare the signature variables of the first work-group to the signature variables of the second work-group via a synchronization mechanism. The first and the second work-group are mapped to an identifier via software. This mapping ensures that the first and second work-groups execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-groups independently, the underlying computation of the first and second work-groups can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-groups are compared only at specified comparison points.

Type: Application

Filed: June 18, 2013

Publication date: December 18, 2014

Inventors: Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan

SYSTEM AND METHOD FOR PROVIDING LOW LATENCY TO APPLICATIONS USING HETEROGENEOUS PROCESSORS

Publication number: 20130328891

Abstract: Methods, apparatuses, and computer readable media are disclosed for responding to requests. A method of responding to requests may include receiving requests comprising callback functions. The one or more requests may be received in a first memory associated with processors of a first type, which may be CPUs. The requests may be moved to a second memory. The second memory may be associated with processors of a second type, which may be GPUs. GPU threads may process the requests to determine a result for the requests, when a number of the requests is at least a threshold number. The method may include moving the results to the first memory. The method may include the CPUs executing the one or more callback functions with the corresponding result. A GPU persistent thread may check the number of requests to determine when a threshold number of requests is reached.

Type: Application

Filed: June 7, 2013

Publication date: December 12, 2013

Inventor: Alexander Lyashevsky

Resolution enhancement of video stream based on spatial and temporal correlation

Patent number: 8487929

Abstract: A method and computer program product are provided for resolution enhancement of a video stream based on spatial and temporal correlation. For instance, the method can include predicting interpolated pixels for an image frame of the video stream based on a spatial correlation of pixels in the image frame. The method can also include generating one or more motion vectors for the image frame. Based on the spatially-correlated pixels and the one or more motion vectors, an enhanced image can be reconstructed. Further, the method can include providing a correction factor to one or more pixels in the enhanced image frame.

Type: Grant

Filed: May 3, 2010

Date of Patent: July 16, 2013

Assignee: Advanced Micro Devices, Inc.

Inventor: Alexander Lyashevsky

Method and system for parallel intra-prediction decoding of video data

Patent number: 8345756

Abstract: Embodiments of a method and system for intra-prediction in decoding video data are described herein. In various embodiments, a high-compression-ratio codec (such as H.264) is part of the encoding scheme for the video data. Embodiments pre-process control maps that were generated from encoded video data, and generate intermediate control maps comprising information regarding decoding the video data. The control maps indicate which units of video data in a frame are to be processed using an intra-prediction operation. In an embodiment, intra-prediction is performed on a frame basis such that intra-prediction is performed on an entire frame at one time. In other embodiments, processing of different frames is interleaved. Embodiments increase the efficiency of the intra-prediction so as to allow decoding of high-compression-ratio encoded video data on personal computers or comparable equipment without special, additional decoding hardware.

Type: Grant

Filed: August 31, 2006

Date of Patent: January 1, 2013

Assignee: ATI Technologies, Inc.

Inventors: Alexander Lyashevsky, Jason Yang, Arcot J. Preetham
