Patents by Inventor Alexander Lyashevsky

Alexander Lyashevsky has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11625807
    Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
    Type: Grant
    Filed: February 22, 2021
    Date of Patent: April 11, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
  • Publication number: 20220416999
    Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the function control to be scheduled to an integer pipeline of the execution resource; and execute the sub-function of the fused SHA instruction in an integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an xor operation.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Supratim Pal, Wajdi Feghali, Changwon Rhee, Wei-Yu Chen, Timothy R. Bauer, Alexander Lyashevsky
  • Publication number: 20220413848
    Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Supratim Pal, Li-An Tang, Changwon Rhee, Timothy R. Bauer, Alexander Lyashevsky, Jiasheng Chen
  • Publication number: 20210319323
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve algorithmic solver performance. An example apparatus includes graph transforming circuitry to generate a vector representation corresponding to a graph input, vector classification circuitry to generate a node embedding machine learning classifier, the node embedding machine learning classifier to cause an output layer of probabilities corresponding to nodes of the graph input, loss calculating circuitry to train a model based on a target algorithmic function, the loss calculating circuitry to inject a solution diversity to reduce equivalent solution error of the target algorithmic function, and algorithmic solving circuitry to calculate solutions based on ranked ones of the output layer of probabilities.
    Type: Application
    Filed: June 24, 2021
    Publication date: October 14, 2021
    Inventors: Alexey Titov, Alexander Lyashevsky, Lukasz Kuszner
  • Publication number: 20210201439
    Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
    Type: Application
    Filed: February 22, 2021
    Publication date: July 1, 2021
    Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
  • Patent number: 10929944
    Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
    Type: Grant
    Filed: November 23, 2016
    Date of Patent: February 23, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
  • Publication number: 20180144435
    Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
    Type: Application
    Filed: November 23, 2016
    Publication date: May 24, 2018
    Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
  • Patent number: 9740511
    Abstract: A method of enhancing performance of an application executing in a parallel processor and a system for executing the method are disclosed. A block size for input to the application is determined. Input is partitioned into blocks having the block size. Input within each block is sorted. The application is executed with the sorted input.
    Type: Grant
    Filed: June 4, 2015
    Date of Patent: August 22, 2017
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventor: Alexander Lyashevsky
  • Patent number: 9565433
    Abstract: A system for decoding video data includes a processing unit. The processing unit includes a plurality of processing pipelines and a driver. The driver includes a decoder configured to generate a plurality of intermediate control maps containing control information including an indication of which macro blocks or portions of macro blocks may be processed in parallel in the plurality of processing pipelines.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: February 7, 2017
    Assignee: ATI TECHNOLOGIES ULC
    Inventors: Alexander Lyashevsky, Jason Yang, Arcot J. Preetham
  • Publication number: 20160357580
    Abstract: A method of enhancing performance of an application executing in a parallel processor and a system for executing the method are disclosed. A block size for input to the application is determined. Input is partitioned into blocks having the block size. Input within each block is sorted. The application is executed with the sorted input.
    Type: Application
    Filed: June 4, 2015
    Publication date: December 8, 2016
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Alexander Lyashevsky
  • Patent number: 9495718
    Abstract: Methods, apparatuses, and computer readable media are disclosed for responding to requests. A method of responding to requests may include receiving requests comprising callback functions. The one or more requests may be received in a first memory associated with processors of a first type, which may be CPUs. The requests may be moved to a second memory. The second memory may be associated with processors of a second type, which may be GPUs. GPU threads may process the requests to determine a result for the requests, when a number of the requests is at least a threshold number. The method may include moving the results to the first memory. The method may include the CPUs executing the one or more callback functions with the corresponding result. A GPU persistent thread may check the number of requests to determine when a threshold number of requests is reached.
    Type: Grant
    Filed: June 7, 2013
    Date of Patent: November 15, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Alexander Lyashevsky
  • Patent number: 9367372
    Abstract: A system, method and computer program product to execute a first and a second work-item, and compare the signature variable of the first work-item to the signature variable of the second work-item. The first and the second work-items are mapped to an identifier via software. This mapping ensures that the first and second work-items execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-items independently, the underlying computation of the first and second work-item can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-items are compared only at specified comparison points.
    Type: Grant
    Filed: June 18, 2013
    Date of Patent: June 14, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan
  • Patent number: 9274904
    Abstract: A system, method and computer program product to execute a first and a second work-group, and compare the signature variables of the first work-group to the signature variables of the second work-group via a synchronization mechanism. The first and the second work-group are mapped to an identifier via software. This mapping ensures that the first and second work-groups execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-groups independently, the underlying computation of the first and second work-groups can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-groups are compared only at specified comparison points.
    Type: Grant
    Filed: June 18, 2013
    Date of Patent: March 1, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan
  • Patent number: 9055306
    Abstract: Embodiments of a method and system for decoding video data are described herein. In various embodiments, a high-compression-ratio codec (such as H.264) is part of the encoding scheme for the video data. Embodiments pre-process control maps that were generated from encoded video data, and generating intermediate control maps comprising information regarding decoding the video data. The control maps include information regarding rearranging the video data to be processed in parallel on multiple pipelines of a graphics processing unit (GPU) so as to optimize the use of the multiple pipelines. In an embodiment, decoding is performed on a frame basis such that each of multiple, distinct decoding operations is performed on an entire frame at one time. In other embodiments, processing of different frames is interleaved.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: June 9, 2015
    Assignee: ATI Technologies ULC
    Inventors: Alexander Lyashevsky, Jason Yang, Arcot J. Preetham
  • Patent number: 9049461
    Abstract: Embodiments of a method and system for inter-prediction in decoding video data are described herein. In various embodiments, a high-compression-ratio codec (such as H.264) is part of the encoding scheme for the video data. Embodiments pre-process control maps that were generated from encoded video data, and generating intermediate control maps comprising information regarding decoding the video data. The control maps indicate which units of video data in a frame are to be processed using an inter-prediction operation. In an embodiment, inter-prediction is performed on a frame basis such that inter-prediction is performed on an entire frame at one time. In other embodiments, processing of different frames is interleaved. Embodiments increase the efficiency of the inter-prediction such as to allow decoding of high-compression-ratio encoded video data on personal computers or comparable equipment without special, additional decoding hardware.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: June 2, 2015
    Assignee: ATI Technologies ULC
    Inventors: Alexander Lyashevsky, Jason Yang, Arcot J Preetham
  • Publication number: 20140368513
    Abstract: A system, method and computer program product to execute a first and a second work-item, and compare the signature variable of the first work-item to the signature variable of the second work-item. The first and the second work-items are mapped to an identifier via software. This mapping ensures that the first and second work-items execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-items independently, the underlying computation of the first and second work-item can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-items are compared only at specified comparison points.
    Type: Application
    Filed: June 18, 2013
    Publication date: December 18, 2014
    Inventors: Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan
  • Publication number: 20140373028
    Abstract: A system, method and computer program product to execute a first and a second work-group, and compare the signature variables of the first work-group to the signature variables of the second work-group via a synchronization mechanism. The first and the second work-group are mapped to an identifier via software. This mapping ensures that the first and second work-groups execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-groups independently, the underlying computation of the first and second work-groups can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-groups are compared only at specified comparison points.
    Type: Application
    Filed: June 18, 2013
    Publication date: December 18, 2014
    Inventors: Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan
  • Publication number: 20130328891
    Abstract: Methods, apparatuses, and computer readable media are disclosed for responding to requests. A method of responding to requests may include receiving requests comprising callback functions. The one or more requests may be received in a first memory associated with processors of a first type, which may be CPUs. The requests may be moved to a second memory. The second memory may be associated with processors of a second type, which may be GPUs. GPU threads may process the requests to determine a result for the requests, when a number of the requests is at least a threshold number. The method may include moving the results to the first memory. The method may include the CPUs executing the one or more callback functions with the corresponding result. A GPU persistent thread may check the number of requests to determine when a threshold number of requests is reached.
    Type: Application
    Filed: June 7, 2013
    Publication date: December 12, 2013
    Inventor: Alexander Lyashevsky
  • Patent number: 8487929
    Abstract: A method and computer program product are provided for resolution enhancement of a video stream based on spatial and temporal correlation. For instance, the method can include predicting interpolated pixels for an image frame of the video stream based on a spatial correlation of pixels in the image frame. The method can also include generating one or more motion vectors for the image frame. Based on the spatially-correlated pixels and the one or more motion vectors, an enhanced image can be reconstructed. Further, the method can include providing a correction factor to one or more pixels in the enhanced image frame.
    Type: Grant
    Filed: May 3, 2010
    Date of Patent: July 16, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Alexander Lyashevsky
  • Patent number: 8345756
    Abstract: Embodiments of a method and system for intra-prediction in decoding video data are described herein. In various embodiments, a high-compression-ratio codec (such as H.264) is part of the encoding scheme for the video data. Embodiments pre-process control maps that were generated from encoded video data, and generating intermediate control maps comprising information regarding decoding the video data. The control maps indicate which units of video data in a frame are to be processed using an intra-prediction operation. In an embodiment, intra-prediction is performed on a frame basis such that intra-prediction is performed on an entire frame at one time. In other embodiments, processing of different frames is interleaved. Embodiments increase the efficiency of the intra-prediction such as to allow decoding of high-compression-ratio encoded video data on personal computers or comparable equipment without special, additional decoding hardware.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: January 1, 2013
    Assignee: ATI Technologies, Inc.
    Inventors: Alexander Lyashevsky, Jason Yang, Arcot J. Preetham