Patents by Inventor Boris Beylin

Boris Beylin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Architecture and execution for efficient mixed precision computations in single instruction multiple data/thread (SIMD/T) devices

Patent number: 10061592

Abstract: A method for improving power, performance, area (PPA) for mixed precision computations in a processing environment. The method includes determining a braiding factor as a number of units of work encoded into a physical thread. A value of the braiding factor is determined based on a mix of precision requirements presented for individual units of work. Units of work are classified as instructions for applied code transformation based on associated precision requirements for the processing environment. Instruction inputs from specified registers are packed together into a destination register according to the determined value of the braiding factor. The packed instructions presented in vector form are executed with an instruction set architecture configured for executing packed instructions of different precisions.

Type: Grant

Filed: March 30, 2015

Date of Patent: August 28, 2018

Assignee: Samsung Electronics Co., Ltd.

Inventors: Maxim Lukyanov, Alexander Grosul, Mitchell Alsup, Boris Beylin
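
The packing step described in the abstract above can be illustrated with a minimal sketch. It assumes a 32-bit physical register, lane widths of 8, 16, or 32 bits, and a rule where the widest precision requirement sets the braiding factor. The names `braiding_factor`, `pack`, `unpack`, and `packed_add` are hypothetical and are not taken from the patent; the code models only bit packing and a lane-wise add, not the claimed instruction set architecture.

```python
REGISTER_BITS = 32  # assumed width of one physical register


def braiding_factor(required_bits):
    """Units of work per register (assumption: widest requirement decides)."""
    widest = max(required_bits)
    for lane_bits in (8, 16, 32):
        if widest <= lane_bits:
            return REGISTER_BITS // lane_bits
    raise ValueError("precision requirement exceeds register width")


def pack(values, factor):
    """Pack `factor` unsigned lane values into one register-sized integer."""
    if len(values) != factor:
        raise ValueError("need exactly one value per lane")
    lane_bits = REGISTER_BITS // factor
    mask = (1 << lane_bits) - 1
    reg = 0
    for lane, value in enumerate(values):
        reg |= (value & mask) << (lane * lane_bits)
    return reg


def unpack(reg, factor):
    """Split a packed register back into its lane values."""
    lane_bits = REGISTER_BITS // factor
    mask = (1 << lane_bits) - 1
    return [(reg >> (lane * lane_bits)) & mask for lane in range(factor)]


def packed_add(a, b, factor):
    """Lane-wise add; each lane wraps around independently of its neighbours."""
    sums = [x + y for x, y in zip(unpack(a, factor), unpack(b, factor))]
    return pack(sums, factor)
```

With two 16-bit units of work the factor is 2, so `packed_add(pack([1, 2], 2), pack([3, 4], 2), 2)` unpacks to `[4, 6]`.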
Redundancy elimination in single instruction multiple data/thread (SIMD/T) execution processing

Patent number: 10061591

Abstract: A method for reducing execution of redundant threads in a processing environment. The method includes detecting threads that include redundant work among many different threads. Multiple threads from the detected threads are grouped into one or more thread clusters based on determining same thread computation results. Execution of all but a particular one thread in each of the one or more thread clusters is suppressed. The particular one thread in each of the one or more thread clusters is executed. Results determined from execution of the particular one thread in each of the one or more thread clusters are broadcast to other threads in each of the one or more thread clusters.

Type: Grant

Filed: February 26, 2015

Date of Patent: August 28, 2018

Assignee: Samsung Electronics Company, Ltd.

Inventors: Boris Beylin, John Brothers, Santosh Abraham, Lingjie Xu, Maxim Lukyanov, Alex Grosul
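
The cluster, suppress, and broadcast sequence in the abstract above can be sketched as follows. This is an illustration only: it assumes threads run a pure function, so identical inputs imply identical results, and it uses that as the clustering test. The function name `run_with_redundancy_elimination` and the tuple-based input format are hypothetical.

```python
from collections import defaultdict


def run_with_redundancy_elimination(kernel, thread_inputs):
    """Run `kernel` once per cluster of threads that share the same inputs.

    Returns (per-thread results, number of kernel executions).
    Assumption: `kernel` is pure, so equal inputs give equal results.
    """
    # Group thread IDs into clusters keyed by their (hashable) inputs.
    clusters = defaultdict(list)
    for tid, inputs in enumerate(thread_inputs):
        clusters[inputs].append(tid)

    results = [None] * len(thread_inputs)
    executed = 0
    for inputs, tids in clusters.items():
        value = kernel(*inputs)  # only one thread of the cluster executes
        executed += 1
        for tid in tids:         # broadcast the result to the whole cluster
            results[tid] = value
    return results, executed
```

For four threads with inputs `[(2, 3), (2, 3), (4, 5), (2, 3)]` and a multiply kernel, the sketch executes the kernel twice and returns `[6, 6, 20, 6]`.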
Control flow in a thread-based environment without branching

Patent number: 9727341

Abstract: A method for computing in a thread-based environment provides manipulating an execution mask to enable and disable threads when executing multiple conditional function clauses for process instructions. Execution lanes are controlled based on execution participation for the process instructions for reducing resource consumption. Execution of one or more particular schedulable structures that include multiple process instructions is skipped based on the execution mask and activating instructions.

Type: Grant

Filed: August 12, 2014

Date of Patent: August 8, 2017

Assignee: Samsung Electronics Co., Ltd.

Inventors: Mitchell Alsup, Yang Jiao, Boris Beylin, Maxim Lukyanov, Alexander Grosul
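
A minimal sketch of mask-driven control flow, in the spirit of the abstract above: both clauses of a conditional are visited in sequence, each lane participates only where its mask bit is set, and a clause whose mask is entirely clear is skipped. The function `masked_if_else` and its list-of-lanes model are hypothetical simplifications, not the patented mechanism.

```python
def masked_if_else(values, cond, then_op, else_op):
    """Apply then_op/else_op per lane using an execution mask, with no per-lane branch.

    Returns (output lanes, names of clauses skipped because no lane was active).
    """
    cond_bits = [bool(cond(v)) for v in values]
    out = list(values)
    skipped = []
    clauses = (
        ("then", cond_bits, then_op),
        ("else", [not bit for bit in cond_bits], else_op),
    )
    for name, mask, op in clauses:
        if not any(mask):
            # No lane participates: skip the whole clause.
            skipped.append(name)
            continue
        for lane, active in enumerate(mask):
            if active:
                out[lane] = op(values[lane])
    return out, skipped
```

Calling it with `[1, -2, 3]`, a `v < 0` condition, negation as the "then" clause, and doubling as the "else" clause gives `([2, 2, 6], [])`; with `[1, 2]` the "then" clause is skipped entirely.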
Trace-based instruction execution processing

Patent number: 9483264

Abstract: A method for executing instructions in a thread processing environment includes determining multiple requirements that must be satisfied and resources that must be available for executing multiple instructions. The multiple instructions are encapsulated into a schedulable structure. A header is configured for the schedulable structure with information including the determined multiple requirements and resources. The schedulable structure is scheduled for executing each of the multiple instructions using the information.

Type: Grant

Filed: August 12, 2014

Date of Patent: November 1, 2016

Assignee: Samsung Electronics Co., Ltd.

Inventors: Mitchell Alsup, Boris Beylin, Michael Shebanow, SungSoo Park
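
The idea of a schedulable structure whose header carries its requirements can be sketched as below. The header fields (`registers_needed`, `wait_events`), the class names, and the `try_schedule` function are assumptions chosen for illustration; the abstract does not specify which requirements or resources the header records.

```python
from dataclasses import dataclass


@dataclass
class TraceHeader:
    registers_needed: int                 # assumed resource requirement
    wait_events: frozenset = frozenset()  # assumed events that must have completed


@dataclass
class Trace:
    header: TraceHeader
    instructions: list  # callables that each take and mutate a state dict


def try_schedule(trace, free_registers, completed_events, state):
    """Run every instruction in the trace only if the header's needs are met."""
    header = trace.header
    if header.registers_needed > free_registers:
        return False
    if not header.wait_events <= completed_events:
        return False
    for instruction in trace.instructions:
        instruction(state)
    return True
```

Because the check happens once per trace, the instructions inside it run back to back without any further per-instruction test in this model.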
Control flow optimization for efficient program code execution on a processor

Patent number: 9292269

Abstract: A method includes identifying a divergent region of interest (DRI) not including a post dominator node thereof within a control flow graph, and introducing a decision node in the control flow graph such that the decision node post-dominates an entry point of the DRI and is dominated by the entry point. The method also includes redirecting a regular control flow path within the control flow graph from another node previously coupled to the DRI to the decision node, and redirecting a runaway path from the another node to the decision node. Further, the method includes marking the runaway path to differentiate the runaway path from the regular control flow path, and directing control flow from the decision node to an originally intended destination of each of the regular control flow path and the runaway path based on the marking to provide for program thread synchronization and optimization within the DRI.

Type: Grant

Filed: January 31, 2014

Date of Patent: March 22, 2016

Assignee: NVIDIA Corporation

Inventors: Shekhar Vasant Divekar, Balajikrishna Atukuri, Boris Beylin

REDUNDANCY ELIMINATION IN SINGLE INSTRUCTION MULTIPLE DATA/THREAD (SIMD/T) EXECUTION PROCESSING

Publication number: 20150378733

Abstract: A method for reducing execution of redundant threads in a processing environment. The method includes detecting threads that include redundant work among many different threads. Multiple threads from the detected threads are grouped into one or more thread clusters based on determining same thread computation results. Execution of all but a particular one thread in each of the one or more thread clusters is suppressed. The particular one thread in each of the one or more thread clusters is executed. Results determined from execution of the particular one thread in each of the one or more thread clusters are broadcast to other threads in each of the one or more thread clusters.

Type: Application

Filed: February 26, 2015

Publication date: December 31, 2015

Inventors: Boris Beylin, John Brothers, Santosh Abraham, Lingjie Xu, Maxim Lukyanov, Alex Grosul

ARCHITECTURE AND EXECUTION FOR EFFICIENT MIXED PRECISION COMPUTATIONS IN SINGLE INSTRUCTION MULTIPLE DATA/THREAD (SIMD/T) DEVICES

Publication number: 20150378741

Abstract: A method for improving power, performance, area (PPA) for mixed precision computations in a processing environment. The method includes determining a braiding factor as a number of units of work encoded into a physical thread. A value of the braiding factor is determined based on a mix of precision requirements presented for individual units of work. Units of work are classified as instructions for applied code transformation based on associated precision requirements for the processing environment. Instruction inputs from specified registers are packed together into a destination register according to the determined value of the braiding factor. The packed instructions presented in vector form are executed with an instruction set architecture configured for executing packed instructions of different precisions.

Type: Application

Filed: March 30, 2015

Publication date: December 31, 2015

Inventors: Maxim Lukyanov, Alexander Grosul, Mitchell Alsup, Boris Beylin

CONTROL FLOW WITHOUT BRANCHING

Publication number: 20150324198

Abstract: A method for computing in a thread-based environment provides manipulating an execution mask to enable and disable threads when executing multiple conditional function clauses for process instructions. Execution lanes are controlled based on execution participation for the process instructions for reducing resource consumption. Execution of one or more particular schedulable structures that include multiple process instructions is skipped based on the execution mask and activating instructions.

Type: Application

Filed: August 12, 2014

Publication date: November 12, 2015

Inventors: Mitchell Alsup, Yang Jiao, Boris Beylin, Maxim Lukyanov, Alexander Grosul

TRACE-BASED INSTRUCTION EXECUTION PROCESSING

Publication number: 20150324228

Abstract: A method for executing instructions in a thread processing environment includes determining multiple requirements that must be satisfied and resources that must be available for executing multiple instructions. The multiple instructions are encapsulated into a schedulable structure. A header is configured for the schedulable structure with information including the determined multiple requirements and resources. The schedulable structure is scheduled for executing each of the multiple instructions using the information.

Type: Application

Filed: August 12, 2014

Publication date: November 12, 2015

Inventors: Mitchell Alsup, Boris Beylin, Michael Shebanow, SungSoo Park

Efficient placement of texture barrier instructions

Patent number: 9142005

Abstract: One embodiment of the present invention sets forth a technique for placing texture barrier instructions within a thread program to advantageously enable efficient and correct operation of the thread program. A thread program compiler statically determines a pending request count needed to progress beyond a particular texture barrier instruction, which blocks execution of subsequent instructions that depend on previously requested data. Each instance of the thread program blocks execution at the barrier instruction until a pending request count condition is satisfied. This technique may advantageously reduce power consumption in a graphics processing unit by eliminating power consumption associated with conventional, generalized scoreboard resources.

Type: Grant

Filed: August 20, 2012

Date of Patent: September 22, 2015

Assignee: NVIDIA CORPORATION

Inventors: Maxim Lukyanov, Boris Beylin, Robert Steven Glanville, Alexander Grosul
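
One way to picture a statically determined pending request count is the sketch below. It assumes a toy instruction format of `(op, register)` pairs, texture requests that complete in issue order, and a barrier operand meaning "block until at most this many requests are still pending". All of these, and the name `place_texture_barriers`, are illustrative assumptions rather than details from the patent.

```python
def place_texture_barriers(program):
    """Insert ("barrier", n) before the first use of each texture result.

    Assumptions: ("tex", r) issues a request that writes register r,
    ("use", r) reads r, and requests complete in the order they were issued.
    """
    request_index = {}  # destination register -> index of the request writing it
    issued = 0          # requests issued so far
    waited_up_to = -1   # newest request already guaranteed complete
    out = []
    for op, reg in program:
        if op == "tex":
            request_index[reg] = issued
            issued += 1
        elif op == "use" and reg in request_index:
            idx = request_index[reg]
            if idx > waited_up_to:
                # Requests issued after idx may still be in flight.
                out.append(("barrier", issued - 1 - idx))
                waited_up_to = idx
        out.append((op, reg))
    return out
```

In this model the count is known at compile time from the instruction order alone, so no runtime scoreboard entry per register is needed to decide when a use may proceed.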
CONTROL FLOW OPTIMIZATION FOR EFFICIENT PROGRAM CODE EXECUTION ON A PROCESSOR

Publication number: 20150220314

Abstract: A method includes identifying a divergent region of interest (DRI) not including a post dominator node thereof within a control flow graph, and introducing a decision node in the control flow graph such that the decision node post-dominates an entry point of the DRI and is dominated by the entry point. The method also includes redirecting a regular control flow path within the control flow graph from another node previously coupled to the DRI to the decision node, and redirecting a runaway path from the another node to the decision node. Further, the method includes marking the runaway path to differentiate the runaway path from the regular control flow path, and directing control flow from the decision node to an originally intended destination of each of the regular control flow path and the runaway path based on the marking to provide for program thread synchronization and optimization within the DRI.

Type: Application

Filed: January 31, 2014

Publication date: August 6, 2015

Applicant: NVIDIA Corporation

Inventors: Shekhar Vasant Divekar, Balajikrishna Atukuri, Boris Beylin

EFFICIENT PLACEMENT OF TEXTURE BARRIER INSTRUCTIONS

Publication number: 20140049549

Abstract: One embodiment of the present invention sets forth a technique for placing texture barrier instructions within a thread program to advantageously enable efficient and correct operation of the thread program. A thread program compiler statically determines a pending request count needed to progress beyond a particular texture barrier instruction, which blocks execution of subsequent instructions that depend on previously requested data. Each instance of the thread program blocks execution at the barrier instruction until a pending request count condition is satisfied. This technique may advantageously reduce power consumption in a graphics processing unit by eliminating power consumption associated with conventional, generalized scoreboard resources.

Type: Application

Filed: August 20, 2012

Publication date: February 20, 2014

Inventors: Maxim Lukyanov, Boris Beylin, Robert Steven Glanville, Alexander Grosul

Retargetting an application program for execution by a general purpose processor

Patent number: 8612732

Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) so that they can be executed by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization-independent instructions. The instructions are classified as convergent or divergent, and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.

Type: Grant

Filed: March 19, 2009

Date of Patent: December 17, 2013

Assignee: NVIDIA Corporation

Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy, Boris Beylin, Jayant B. Kolhe, Douglas Saylor
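
The effect of partitioning at a synchronization point and inserting thread loops can be sketched on a toy kernel. The kernel here is invented for illustration: each GPU thread would compute a private value, store it to shared memory, wait at a barrier, and then read its neighbour's slot. On the CPU, each region becomes a loop over thread IDs, and the private value that lives across the barrier is replicated into a per-thread array. The names `kernel_on_cpu`, `local_x`, and `shared` are hypothetical.

```python
def kernel_on_cpu(n_threads):
    """Serial CPU version of a toy two-region kernel with one barrier."""
    shared = [0] * n_threads
    local_x = [0] * n_threads  # per-thread variable replicated across regions
    out = [0] * n_threads

    # Region 1: everything before the barrier, run for every thread.
    for tid in range(n_threads):
        local_x[tid] = tid * 2
        shared[tid] = local_x[tid]

    # The barrier is implied: the first thread loop has finished for all threads.

    # Region 2: everything after the barrier, run for every thread.
    for tid in range(n_threads):
        out[tid] = local_x[tid] + shared[(tid + 1) % n_threads]
    return out
```

For four threads this returns `[2, 6, 10, 6]`, which matches what the threads would compute if they really ran in parallel and synchronized at the barrier.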
Insertion of multithreaded execution synchronization points in a software program

Patent number: 8381203

Abstract: A compiler is configured to determine a set of points in a flow graph for a software program where multithreaded execution synchronization points are inserted to synchronize divergent threads for SIMD processing. MIMD execution of divergent threads is allowed and execution of the divergent threads proceeds until a synchronization point is reached. When all of the threads reach the synchronization point, synchronous execution resumes. The synchronization points are needed to ensure proper execution of certain instructions that require synchronous execution, as defined in some graphics APIs, and when synchronous execution improves performance based on a SIMD architecture.

Type: Grant

Filed: November 3, 2006

Date of Patent: February 19, 2013

Assignee: NVIDIA Corporation

Inventors: Boris Beylin, Robert Steven Glanville

String search scheme in a distributed architecture

Patent number: 8321440

Abstract: Methods and apparatuses for searching network data for one or more predetermined strings are disclosed. In one embodiment, the string search is a multi-stage search where the stages of the search are performed by different hardware components. In one embodiment, in a first search stage, a first processor performs a comparison of blocks of incoming data to determine whether the blocks potentially represent the beginning of one of the predetermined strings. If a potential predetermined string is identified, a second processor performs a further search to determine whether the string matches one of the predetermined strings. Because the first processor searches only for the beginning of the predetermined strings, the first stage comparison can be performed quickly, which improves network performance as compared to more detailed searching. The second stage is performed by the second processor, which allows the first processor to search for potential matching strings.

Type: Grant

Filed: March 7, 2011

Date of Patent: November 27, 2012

Assignee: Intel Corporation

Inventor: Boris Beylin
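
The two-stage structure in the abstract above can be sketched in software. Both stages run in one process here, whereas the abstract assigns them to different processors. The sketch assumes every pattern is at least `prefix_len` bytes long; the function name `two_stage_search` and the fixed-length prefix filter are illustrative choices, not details from the patent.

```python
def two_stage_search(data, patterns, prefix_len=2):
    """Find pattern occurrences in `data` using a cheap prefix filter first.

    Returns (candidate offsets from stage 1, confirmed (offset, pattern) matches).
    Assumption: each pattern has at least `prefix_len` bytes.
    """
    prefixes = {p[:prefix_len] for p in patterns}

    # Stage 1: flag offsets whose first bytes look like the start of a pattern.
    candidates = [
        i
        for i in range(len(data) - prefix_len + 1)
        if data[i:i + prefix_len] in prefixes
    ]

    # Stage 2: run the full comparison only at the flagged offsets.
    matches = []
    for i in candidates:
        for p in patterns:
            if data.startswith(p, i):
                matches.append((i, p))
    return candidates, matches
```

Searching `b"add admin"` for `b"admin"` and `b"adobe"` flags offsets 0 and 4 in the first stage, and only offset 4 survives the second stage.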
STRING SEARCH SCHEME IN A DISTRIBUTED ARCHITECTURE

Publication number: 20110173232

Abstract: Methods and apparatuses for searching network data for one or more predetermined strings are disclosed. In one embodiment, the string search is a multi-stage search where the stages of the search are performed by different hardware components. In one embodiment, in a first search stage, a first processor performs a comparison of blocks of incoming data to determine whether the blocks potentially represent the beginning of one of the predetermined strings. If a potential predetermined string is identified, a second processor performs a further search to determine whether the string matches one of the predetermined strings. Because the first processor searches only for the beginning of the predetermined strings, the first stage comparison can be performed quickly, which improves network performance as compared to more detailed searching. The second stage is performed by the second processor, which allows the first processor to search for potential matching strings.

Type: Application

Filed: March 7, 2011

Publication date: July 14, 2011

Applicant: INTEL CORPORATION

Inventor: Boris Beylin

Debugging tool for debugging multi-threaded programs

Patent number: 7945900

Abstract: A method includes running a debugging tool in regard to a program which is undergoing debugging. The program may support multi-threaded operation. The method further includes presenting an option to a user via the debugging tool with respect to a program instruction in a first thread of the program. The program instruction may be for putting an item of data into a queue. The method also includes, if the user exercises the option, identifying a program instruction in a second thread of the program. The second thread is different from the first thread. The identified program instruction in the second thread may be for getting the item of data from the queue. The method further includes stopping execution of the program at the identified program instruction in the second thread.

Type: Grant

Filed: April 29, 2004

Date of Patent: May 17, 2011

Assignee: Marvell International Ltd.

Inventors: Cheng-Hsueh Hsieh, Jason Dai, Boris Beylin

String search scheme in a distributed architecture

Patent number: 7917509

Abstract: Methods and apparatuses for searching network data for one or more predetermined strings are disclosed. In one embodiment, the string search is a multi-stage search where the stages of the search are performed by different hardware components. In one embodiment, in a first search stage, a first processor performs a comparison of blocks of incoming data to determine whether the blocks potentially represent the beginning of one of the predetermined strings. If a potential predetermined string is identified, a second processor performs a further search to determine whether the string matches one of the predetermined strings. Because the first processor searches only for the beginning of the predetermined strings, the first stage comparison can be performed quickly, which improves network performance as compared to more detailed searching. The second stage is performed by the second processor, which allows the first processor to search for potential matching strings.

Type: Grant

Filed: November 5, 2007

Date of Patent: March 29, 2011

Assignee: Intel Corporation

Inventor: Boris Beylin

Method and apparatus for register allocation in presence of hardware constraints

Patent number: 7681187

Abstract: A method and apparatus for optimizing register allocation during scheduling and execution of program code in a hardware environment. The program code can be compiled to optimize execution given predetermined hardware constraints. The hardware constraints can include the number of register read and write operations that can be performed in a given processor pass. The optimizer can initially schedule the program using virtual registers and a goal of minimizing the amount of active registers at any time. The optimizer reschedules the program to assign the virtual registers to actual physical registers in a manner that minimizes the number of processor passes used to execute the program.

Type: Grant

Filed: March 31, 2005

Date of Patent: March 16, 2010

Assignee: NVIDIA Corporation

Inventors: Michael G. Ludwig, Jayant B. Kolhe, Robert Steven Glanville, Geoffrey C. Berry, Boris Beylin, Michael T. Bunnell

RETARGETTING AN APPLICATION PROGRAM FOR EXECUTION BY A GENERAL PURPOSE PROCESSOR

Publication number: 20090259832

Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) so that they can be executed by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization-independent instructions. The instructions are classified as convergent or divergent, and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.

Type: Application

Filed: March 19, 2009

Publication date: October 15, 2009

Inventors: Vinod GROVER, Bastiaan Joannes Matheus AARTS, Michael MURPHY, Boris BEYLIN, Jayant B. KOLHE, Douglas SAYLOR
