Patents by Inventor Alexander Lyashevsky
Alexander Lyashevsky has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11625807
Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
Type: Grant
Filed: February 22, 2021
Date of Patent: April 11, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
-
Publication number: 20220416999
Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the function control to be scheduled to an integer pipeline of the execution resource; and execute the sub-function of the fused SHA instruction in an integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an xor operation.
Type: Application
Filed: June 25, 2021
Publication date: December 29, 2022
Applicant: Intel Corporation
Inventors: Supratim Pal, Wajdi Feghali, Changwon Rhee, Wei-Yu Chen, Timothy R. Bauer, Alexander Lyashevsky
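For orientation, the kind of merged rotate, shift, and xor operation the abstract mentions can be illustrated with the two SHA-256 message-schedule functions defined in the SHA-2 standard. This is only a sketch: the abstract does not say which sub-functions the fused instruction covers, and the names `rotr32`, `small_sigma0`, and `small_sigma1` are chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF  # SHA-256 operates on 32-bit words


def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma0(x: int) -> int:
    """SHA-256 message-schedule function: two rotates, one shift, xor-combined."""
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)


def small_sigma1(x: int) -> int:
    """SHA-256 message-schedule function: two rotates, one shift, xor-combined."""
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10)
```

In software each function costs several separate integer instructions per word, which is the sort of sequence a single fused instruction could plausibly collapse.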
-
Publication number: 20220413848
Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.
Type: Application
Filed: June 25, 2021
Publication date: December 29, 2022
Applicant: Intel Corporation
Inventors: Supratim Pal, Li-An Tang, Changwon Rhee, Timothy R. Bauer, Alexander Lyashevsky, Jiasheng Chen
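To show what a "chain of multiplication operations" looks like in general, here is a minimal sketch of schoolbook multiplication over fixed-width limbs, where every inner step is one multiply-and-add. The 32-bit limb width and the helper names (`to_limbs`, `from_limbs`, `mad`, `multiply_limbs`) are assumptions made for this example; the abstract's double precision multiplier and 48 bit output are not modeled.

```python
LIMB_BITS = 32                      # assumed limb width, not taken from the abstract
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(n: int) -> list[int]:
    """Split a non-negative integer into little-endian limbs."""
    limbs = []
    while n:
        limbs.append(n & LIMB_MASK)
        n >>= LIMB_BITS
    return limbs or [0]


def from_limbs(limbs: list[int]) -> int:
    """Reassemble little-endian limbs into one integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def mad(a: int, b: int, c: int) -> int:
    """One multiply-and-add step: a * b + c."""
    return a * b + c


def multiply_limbs(x: list[int], y: list[int]) -> list[int]:
    """Schoolbook product of two limb arrays as a chain of MAD steps."""
    result = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        carry = 0
        for j, yj in enumerate(y):
            # Partial product plus the running column value and incoming carry.
            acc = mad(xi, yj, result[i + j] + carry)
            result[i + j] = acc & LIMB_MASK
            carry = acc >> LIMB_BITS
        result[i + len(y)] += carry
    return result
```

Multiplying two n-limb numbers this way takes n squared MAD steps, so the cost of each step dominates the cost of the whole multiplication.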
-
Publication number: 20210319323
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve algorithmic solver performance. An example apparatus includes graph transforming circuitry to generate a vector representation corresponding to a graph input, vector classification circuitry to generate a node embedding machine learning classifier, the node embedding machine learning classifier to cause an output layer of probabilities corresponding to nodes of the graph input, loss calculating circuitry to train a model based on a target algorithmic function, the loss calculating circuitry to inject a solution diversity to reduce equivalent solution error of the target algorithmic function, and algorithmic solving circuitry to calculate solutions based on ranked ones of the output layer of probabilities.
Type: Application
Filed: June 24, 2021
Publication date: October 14, 2021
Inventors: Alexey Titov, Alexander Lyashevsky, Lukasz Kuszner
-
Publication number: 20210201439
Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
Type: Application
Filed: February 22, 2021
Publication date: July 1, 2021
Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
-
Patent number: 10929944
Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
Type: Grant
Filed: November 23, 2016
Date of Patent: February 23, 2021
Assignee: Advanced Micro Devices, Inc.
Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
-
Publication number: 20180144435
Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
Type: Application
Filed: November 23, 2016
Publication date: May 24, 2018
Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
-
Patent number: 9740511
Abstract: A method of enhancing performance of an application executing in a parallel processor and a system for executing the method are disclosed. A block size for input to the application is determined. Input is partitioned into blocks having the block size. Input within each block is sorted. The application is executed with the sorted input.
Type: Grant
Filed: June 4, 2015
Date of Patent: August 22, 2017
Assignee: ADVANCED MICRO DEVICES, INC.
Inventor: Alexander Lyashevsky
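The partition-then-sort step described in this abstract can be sketched in a few lines. The function name `sort_within_blocks` is hypothetical, and the block size is simply passed in, since the abstract does not say how it is determined.

```python
def sort_within_blocks(data: list, block_size: int) -> list:
    """Partition data into consecutive blocks and sort each block independently.

    Elements never move across block boundaries; only the order inside
    each block changes. The last block may be shorter than block_size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    out = []
    for start in range(0, len(data), block_size):
        out.extend(sorted(data[start:start + block_size]))
    return out
```

For example, `sort_within_blocks([5, 1, 4, 2, 9, 3, 8], 3)` sorts the blocks `[5, 1, 4]`, `[2, 9, 3]`, and `[8]` separately, and the application would then run on the concatenated result.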
-
Patent number: 9565433
Abstract: A system for decoding video data includes a processing unit. The processing unit includes a plurality of processing pipelines and a driver. The driver includes a decoder configured to generate a plurality of intermediate control maps containing control information including an indication of which macro blocks or portions of macro blocks may be processed in parallel in the plurality of processing pipelines.
Type: Grant
Filed: December 28, 2012
Date of Patent: February 7, 2017
Assignee: ATI TECHNOLOGIES ULC
Inventors: Alexander Lyashevsky, Jason Yang, Arcot J. Preetham
-
Publication number: 20160357580
Abstract: A method of enhancing performance of an application executing in a parallel processor and a system for executing the method are disclosed. A block size for input to the application is determined. Input is partitioned into blocks having the block size. Input within each block is sorted. The application is executed with the sorted input.
Type: Application
Filed: June 4, 2015
Publication date: December 8, 2016
Applicant: Advanced Micro Devices, Inc.
Inventor: Alexander Lyashevsky
-
Patent number: 9495718
Abstract: Methods, apparatuses, and computer readable media are disclosed for responding to requests. A method of responding to requests may include receiving requests comprising callback functions. The one or more requests may be received in a first memory associated with processors of a first type, which may be CPUs. The requests may be moved to a second memory. The second memory may be associated with processors of a second type, which may be GPUs. GPU threads may process the requests to determine a result for the requests, when a number of the requests is at least a threshold number. The method may include moving the results to the first memory. The method may include the CPUs executing the one or more callback functions with the corresponding result. A GPU persistent thread may check the number of requests to determine when a threshold number of requests is reached.
Type: Grant
Filed: June 7, 2013
Date of Patent: November 15, 2016
Assignee: Advanced Micro Devices, Inc.
Inventor: Alexander Lyashevsky
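The request flow in this abstract (queue requests with callbacks, process them as a batch once a threshold is reached, then invoke each callback with its result) can be modeled on a single machine as follows. This is a plain-Python sketch, not GPU code: `BatchingDispatcher`, `submit`, and `poll` are hypothetical names, and `process_batch` stands in for the work the abstract assigns to GPU threads.

```python
from typing import Any, Callable, List, Tuple


class BatchingDispatcher:
    """Collects (payload, callback) requests and processes them in batches."""

    def __init__(self, threshold: int,
                 process_batch: Callable[[List[Any]], List[Any]]):
        self.threshold = threshold
        self.process_batch = process_batch      # stand-in for the batch processor
        self.pending: List[Tuple[Any, Callable[[Any], None]]] = []

    def submit(self, payload: Any, callback: Callable[[Any], None]) -> None:
        """Queue a request together with its callback."""
        self.pending.append((payload, callback))

    def poll(self) -> int:
        """Process the queue only if it holds at least `threshold` requests.

        Returns the number of requests completed by this call.
        """
        if len(self.pending) < self.threshold:
            return 0
        batch, self.pending = self.pending, []
        results = self.process_batch([payload for payload, _ in batch])
        for (_, callback), result in zip(batch, results):
            callback(result)                    # callback runs on the submitting side
        return len(batch)
```

In this model, `poll` plays the role the abstract gives to the persistent thread that checks whether the threshold number of requests has been reached.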
-
Patent number: 9367372
Abstract: A system, method and computer program product to execute a first and a second work-item, and compare the signature variable of the first work-item to the signature variable of the second work-item. The first and the second work-items are mapped to an identifier via software. This mapping ensures that the first and second work-items execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-items independently, the underlying computation of the first and second work-item can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-items are compared only at specified comparison points.
Type: Grant
Filed: June 18, 2013
Date of Patent: June 14, 2016
Assignee: Advanced Micro Devices, Inc.
Inventors: Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan
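The idea of mapping two work-items to one identifier in software and comparing their signatures can be sketched sequentially as below. Everything here is illustrative: `run_with_redundancy` and the modulo mapping are assumptions, the kernel is an ordinary Python callable rather than GPU code, and the comparison happens once per pair rather than at the comparison points the abstract mentions.

```python
from typing import Any, Callable, Dict, List


def run_with_redundancy(kernel: Callable[[int, Any], Any],
                        data: Any,
                        num_work_items: int) -> List[int]:
    """Run each logical work-item twice and report identifiers that disagree.

    Global ids k and k + num_work_items // 2 are mapped in software to the
    same logical identifier k, so both execute the same code on the same data.
    """
    logical = num_work_items // 2
    signatures: Dict[int, Any] = {}
    mismatches: List[int] = []
    for global_id in range(2 * logical):
        ident = global_id % logical             # software mapping to an identifier
        signature = kernel(ident, data)
        if ident not in signatures:
            signatures[ident] = signature       # first copy records its signature
        elif signatures[ident] != signature:
            mismatches.append(ident)            # second copy disagrees: fault detected
    return mismatches
```

An empty return value means every redundant pair produced the same signature; any listed identifier marks a computation whose two executions diverged.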
-
Patent number: 9274904
Abstract: A system, method and computer program product to execute a first and a second work-group, and compare the signature variables of the first work-group to the signature variables of the second work-group via a synchronization mechanism. The first and the second work-group are mapped to an identifier via software. This mapping ensures that the first and second work-groups execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-groups independently, the underlying computation of the first and second work-groups can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-groups are compared only at specified comparison points.
Type: Grant
Filed: June 18, 2013
Date of Patent: March 1, 2016
Assignee: Advanced Micro Devices, Inc.
Inventors: Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan
-
Patent number: 9055306
Abstract: Embodiments of a method and system for decoding video data are described herein. In various embodiments, a high-compression-ratio codec (such as H.264) is part of the encoding scheme for the video data. Embodiments pre-process control maps that were generated from encoded video data, and generating intermediate control maps comprising information regarding decoding the video data. The control maps include information regarding rearranging the video data to be processed in parallel on multiple pipelines of a graphics processing unit (GPU) so as to optimize the use of the multiple pipelines. In an embodiment, decoding is performed on a frame basis such that each of multiple, distinct decoding operations is performed on an entire frame at one time. In other embodiments, processing of different frames is interleaved.
Type: Grant
Filed: August 31, 2006
Date of Patent: June 9, 2015
Assignee: ATI Technologies ULC
Inventors: Alexander Lyashevsky, Jason Yang, Arcot J. Preetham
-
Patent number: 9049461
Abstract: Embodiments of a method and system for inter-prediction in decoding video data are described herein. In various embodiments, a high-compression-ratio codec (such as H.264) is part of the encoding scheme for the video data. Embodiments pre-process control maps that were generated from encoded video data, and generating intermediate control maps comprising information regarding decoding the video data. The control maps indicate which units of video data in a frame are to be processed using an inter-prediction operation. In an embodiment, inter-prediction is performed on a frame basis such that inter-prediction is performed on an entire frame at one time. In other embodiments, processing of different frames is interleaved. Embodiments increase the efficiency of the inter-prediction such as to allow decoding of high-compression-ratio encoded video data on personal computers or comparable equipment without special, additional decoding hardware.
Type: Grant
Filed: August 31, 2006
Date of Patent: June 2, 2015
Assignee: ATI Technologies ULC
Inventors: Alexander Lyashevsky, Jason Yang, Arcot J Preetham
-
Publication number: 20140368513
Abstract: A system, method and computer program product to execute a first and a second work-item, and compare the signature variable of the first work-item to the signature variable of the second work-item. The first and the second work-items are mapped to an identifier via software. This mapping ensures that the first and second work-items execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-items independently, the underlying computation of the first and second work-item can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-items are compared only at specified comparison points.
Type: Application
Filed: June 18, 2013
Publication date: December 18, 2014
Inventors: Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan
-
Publication number: 20140373028
Abstract: A system, method and computer program product to execute a first and a second work-group, and compare the signature variables of the first work-group to the signature variables of the second work-group via a synchronization mechanism. The first and the second work-group are mapped to an identifier via software. This mapping ensures that the first and second work-groups execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-groups independently, the underlying computation of the first and second work-groups can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-groups are compared only at specified comparison points.
Type: Application
Filed: June 18, 2013
Publication date: December 18, 2014
Inventors: Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan
-
Publication number: 20130328891
Abstract: Methods, apparatuses, and computer readable media are disclosed for responding to requests. A method of responding to requests may include receiving requests comprising callback functions. The one or more requests may be received in a first memory associated with processors of a first type, which may be CPUs. The requests may be moved to a second memory. The second memory may be associated with processors of a second type, which may be GPUs. GPU threads may process the requests to determine a result for the requests, when a number of the requests is at least a threshold number. The method may include moving the results to the first memory. The method may include the CPUs executing the one or more callback functions with the corresponding result. A GPU persistent thread may check the number of requests to determine when a threshold number of requests is reached.
Type: Application
Filed: June 7, 2013
Publication date: December 12, 2013
Inventor: Alexander Lyashevsky
-
Patent number: 8487929
Abstract: A method and computer program product are provided for resolution enhancement of a video stream based on spatial and temporal correlation. For instance, the method can include predicting interpolated pixels for an image frame of the video stream based on a spatial correlation of pixels in the image frame. The method can also include generating one or more motion vectors for the image frame. Based on the spatially-correlated pixels and the one or more motion vectors, an enhanced image can be reconstructed. Further, the method can include providing a correction factor to one or more pixels in the enhanced image frame.
Type: Grant
Filed: May 3, 2010
Date of Patent: July 16, 2013
Assignee: Advanced Micro Devices, Inc.
Inventor: Alexander Lyashevsky
-
Patent number: 8345756
Abstract: Embodiments of a method and system for intra-prediction in decoding video data are described herein. In various embodiments, a high-compression-ratio codec (such as H.264) is part of the encoding scheme for the video data. Embodiments pre-process control maps that were generated from encoded video data, and generating intermediate control maps comprising information regarding decoding the video data. The control maps indicate which units of video data in a frame are to be processed using an intra-prediction operation. In an embodiment, intra-prediction is performed on a frame basis such that intra-prediction is performed on an entire frame at one time. In other embodiments, processing of different frames is interleaved. Embodiments increase the efficiency of the intra-prediction such as to allow decoding of high-compression-ratio encoded video data on personal computers or comparable equipment without special, additional decoding hardware.
Type: Grant
Filed: August 31, 2006
Date of Patent: January 1, 2013
Assignee: ATI Technologies, Inc.
Inventors: Alexander Lyashevsky, Jason Yang, Arcot J. Preetham