Patents by Inventor Boris Beylin

Boris Beylin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10061592
    Abstract: A method for improving power, performance, area (PPA) for mixed precision computations in a processing environment. The method includes determining a braiding factor as a number of units of work encoded into a physical thread. A value of the braiding factor is determined based on a mix of precision requirements presented for individual units of work. Units of work are classified as instructions for applied code transformation based on associated precision requirements for the processing environment. Instruction inputs from specified registers are packed together into a destination register according to the determined value of the braiding factor. The packed instructions presented in vector form are executed with an instruction set architecture configured for executing packed instructions of different precisions.
    Type: Grant
    Filed: March 30, 2015
    Date of Patent: August 28, 2018
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Maxim Lukyanov, Alexander Grosul, Mitchell Alsup, Boris Beylin
  • Patent number: 10061591
    Abstract: A method for reducing execution of redundant threads in a processing environment. The method includes detecting threads that include redundant work among many different threads. Multiple threads from the detected threads are grouped into one or more thread clusters based on determining same thread computation results. Execution of all but a particular one thread in each of the one or more thread clusters is suppressed. The particular one thread in each of the one or more thread clusters is executed. Results determined from execution of the particular one thread in each of the one or more thread clusters are broadcasted to other threads in each of the one or more thread clusters.
    Type: Grant
    Filed: February 26, 2015
    Date of Patent: August 28, 2018
    Assignee: Samsung Electronics Company, Ltd.
    Inventors: Boris Beylin, John Brothers, Santosh Abraham, Lingjie Xu, Maxim Lukyanov, Alex Grosul
  • Patent number: 9727341
    Abstract: A method for computing in a thread-based environment includes manipulating an execution mask to enable and disable threads when executing multiple conditional function clauses for process instructions. Execution lanes are controlled based on execution participation for the process instructions for reducing resource consumption. Execution of one or more particular schedulable structures that include multiple process instructions is skipped based on the execution mask and activating instructions.
    Type: Grant
    Filed: August 12, 2014
    Date of Patent: August 8, 2017
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Mitchell Alsup, Yang Jiao, Boris Beylin, Maxim Lukyanov, Alexander Grosul
  • Patent number: 9483264
    Abstract: A method for executing instructions in a thread processing environment includes determining multiple requirements that must be satisfied and resources that must be available for executing multiple instructions. The multiple instructions are encapsulated into a schedulable structure. A header is configured for the schedulable structure with information including the determined multiple requirements and resources. The schedulable structure is scheduled for executing each of the multiple instructions using the information.
    Type: Grant
    Filed: August 12, 2014
    Date of Patent: November 1, 2016
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Mitchell Alsup, Boris Beylin, Michael Shebanow, SungSoo Park
  • Patent number: 9292269
    Abstract: A method includes identifying a divergent region of interest (DRI) not including a post dominator node thereof within a control flow graph, and introducing a decision node in the control flow graph such that the decision node post-dominates an entry point of the DRI and is dominated by the entry point. The method also includes redirecting a regular control flow path within the control flow graph from another node previously coupled to the DRI to the decision node, and redirecting a runaway path from the another node to the decision node. Further, the method includes marking the runaway path to differentiate the runaway path from the regular control flow path, and directing control flow from the decision node to an originally intended destination of each of the regular control flow path and the runaway path based on the marking to provide for program thread synchronization and optimization within the DRI.
    Type: Grant
    Filed: January 31, 2014
    Date of Patent: March 22, 2016
    Assignee: NVIDIA Corporation
    Inventors: Shekhar Vasant Divekar, Balajikrishna Atukuri, Boris Beylin
  • Publication number: 20150378733
    Abstract: A method for reducing execution of redundant threads in a processing environment. The method includes detecting threads that include redundant work among many different threads. Multiple threads from the detected threads are grouped into one or more thread clusters based on determining same thread computation results. Execution of all but a particular one thread in each of the one or more thread clusters is suppressed. The particular one thread in each of the one or more thread clusters is executed. Results determined from execution of the particular one thread in each of the one or more thread clusters are broadcasted to other threads in each of the one or more thread clusters.
    Type: Application
    Filed: February 26, 2015
    Publication date: December 31, 2015
    Inventors: Boris Beylin, John Brothers, Santosh Abraham, Lingjie Xu, Maxim Lukyanov, Alex Grosul
  • Publication number: 20150378741
    Abstract: A method for improving power, performance, area (PPA) for mixed precision computations in a processing environment. The method includes determining a braiding factor as a number of units of work encoded into a physical thread. A value of the braiding factor is determined based on a mix of precision requirements presented for individual units of work. Units of work are classified as instructions for applied code transformation based on associated precision requirements for the processing environment. Instruction inputs from specified registers are packed together into a destination register according to the determined value of the braiding factor. The packed instructions presented in vector form are executed with an instruction set architecture configured for executing packed instructions of different precisions.
    Type: Application
    Filed: March 30, 2015
    Publication date: December 31, 2015
    Inventors: Maxim Lukyanov, Alexander Grosul, Mitchell Alsup, Boris Beylin
  • Publication number: 20150324198
    Abstract: A method for computing in a thread-based environment includes manipulating an execution mask to enable and disable threads when executing multiple conditional function clauses for process instructions. Execution lanes are controlled based on execution participation for the process instructions for reducing resource consumption. Execution of one or more particular schedulable structures that include multiple process instructions is skipped based on the execution mask and activating instructions.
    Type: Application
    Filed: August 12, 2014
    Publication date: November 12, 2015
    Inventors: Mitchell Alsup, Yang Jiao, Boris Beylin, Maxim Lukyanov, Alexander Grosul
  • Publication number: 20150324228
    Abstract: A method for executing instructions in a thread processing environment includes determining multiple requirements that must be satisfied and resources that must be available for executing multiple instructions. The multiple instructions are encapsulated into a schedulable structure. A header is configured for the schedulable structure with information including the determined multiple requirements and resources. The schedulable structure is scheduled for executing each of the multiple instructions using the information.
    Type: Application
    Filed: August 12, 2014
    Publication date: November 12, 2015
    Inventors: Mitchell Alsup, Boris Beylin, Michael Shebanow, SungSoo Park
  • Patent number: 9142005
    Abstract: One embodiment of the present invention sets forth a technique for placing texture barrier instructions within a thread program to advantageously enable efficient and correct operation of the thread program. A thread program compiler statically determines a pending request count needed to progress beyond a particular texture barrier instruction, which blocks execution of subsequent instructions that depend on previously requested data. Each instance of the thread program blocks execution at the barrier instruction until a pending request count condition is satisfied. This technique may advantageously reduce power consumption in a graphics processing unit by eliminating power consumption associated with conventional, generalized scoreboard resources.
    Type: Grant
    Filed: August 20, 2012
    Date of Patent: September 22, 2015
    Assignee: NVIDIA Corporation
    Inventors: Maxim Lukyanov, Boris Beylin, Robert Steven Glanville, Alexander Grosul
  • Publication number: 20150220314
    Abstract: A method includes identifying a divergent region of interest (DRI) not including a post dominator node thereof within a control flow graph, and introducing a decision node in the control flow graph such that the decision node post-dominates an entry point of the DRI and is dominated by the entry point. The method also includes redirecting a regular control flow path within the control flow graph from another node previously coupled to the DRI to the decision node, and redirecting a runaway path from the another node to the decision node. Further, the method includes marking the runaway path to differentiate the runaway path from the regular control flow path, and directing control flow from the decision node to an originally intended destination of each of the regular control flow path and the runaway path based on the marking to provide for program thread synchronization and optimization within the DRI.
    Type: Application
    Filed: January 31, 2014
    Publication date: August 6, 2015
    Applicant: NVIDIA Corporation
    Inventors: Shekhar Vasant Divekar, Balajikrishna Atukuri, Boris Beylin
  • Publication number: 20140049549
    Abstract: One embodiment of the present invention sets forth a technique for placing texture barrier instructions within a thread program to advantageously enable efficient and correct operation of the thread program. A thread program compiler statically determines a pending request count needed to progress beyond a particular texture barrier instruction, which blocks execution of subsequent instructions that depend on previously requested data. Each instance of the thread program blocks execution at the barrier instruction until a pending request count condition is satisfied. This technique may advantageously reduce power consumption in a graphics processing unit by eliminating power consumption associated with conventional, generalized scoreboard resources.
    Type: Application
    Filed: August 20, 2012
    Publication date: February 20, 2014
    Inventors: Maxim Lukyanov, Boris Beylin, Robert Steven Glanville, Alexander Grosul
  • Patent number: 8612732
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) so that they can be executed by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization-independent instructions. The instructions are classified as convergent or divergent, and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Grant
    Filed: March 19, 2009
    Date of Patent: December 17, 2013
    Assignee: NVIDIA Corporation
    Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy, Boris Beylin, Jayant B. Kolhe, Douglas Saylor
  • Patent number: 8381203
    Abstract: A compiler is configured to determine a set of points in a flow graph for a software program where multithreaded execution synchronization points are inserted to synchronize divergent threads for SIMD processing. MIMD execution of divergent threads is allowed and execution of the divergent threads proceeds until a synchronization point is reached. When all of the threads reach the synchronization point, synchronous execution resumes. The synchronization points are needed to ensure proper execution of certain instructions that require synchronous execution as defined in some graphics APIs, and when synchronous execution improves performance based on a SIMD architecture.
    Type: Grant
    Filed: November 3, 2006
    Date of Patent: February 19, 2013
    Assignee: NVIDIA Corporation
    Inventors: Boris Beylin, Robert Steven Glanville
  • Patent number: 8321440
    Abstract: Methods and apparatuses for searching network data for one or more predetermined strings are disclosed. In one embodiment, the string search is a multi-stage search where the stages of the search are performed by different hardware components. In one embodiment, in a first search stage, a first processor performs a comparison of blocks of incoming data to determine whether the blocks potentially represent the beginning of one of the predetermined strings. If a potential predetermined string is identified, a second processor performs a further search to determine whether the string matches one of the predetermined strings. Because the first processor searches only for the beginning of the predetermined strings, the first stage comparison can be performed quickly, which improves network performance as compared to more detailed searching. The second stage is performed by the second processor, which allows the first processor to search for potential matching strings.
    Type: Grant
    Filed: March 7, 2011
    Date of Patent: November 27, 2012
    Assignee: Intel Corporation
    Inventor: Boris Beylin
  • Publication number: 20110173232
    Abstract: Methods and apparatuses for searching network data for one or more predetermined strings are disclosed. In one embodiment, the string search is a multi-stage search where the stages of the search are performed by different hardware components. In one embodiment, in a first search stage, a first processor performs a comparison of blocks of incoming data to determine whether the blocks potentially represent the beginning of one of the predetermined strings. If a potential predetermined string is identified, a second processor performs a further search to determine whether the string matches one of the predetermined strings. Because the first processor searches only for the beginning of the predetermined strings, the first stage comparison can be performed quickly, which improves network performance as compared to more detailed searching. The second stage is performed by the second processor, which allows the first processor to search for potential matching strings.
    Type: Application
    Filed: March 7, 2011
    Publication date: July 14, 2011
    Applicant: Intel Corporation
    Inventor: Boris Beylin
  • Patent number: 7945900
    Abstract: A method includes running a debugging tool in regard to a program which is undergoing debugging. The program may support multi-threaded operation. The method further includes presenting an option to a user via the debugging tool with respect to a program instruction in a first thread of the program. The program instruction may be for putting an item of data into a queue. The method also includes, if the user exercises the option, identifying a program instruction in a second thread of the program. The second thread is different from the first thread. The identified program instruction in the second thread may be for getting the item of data from the queue. The method further includes stopping execution of the program at the identified program instruction in the second thread.
    Type: Grant
    Filed: April 29, 2004
    Date of Patent: May 17, 2011
    Assignee: Marvell International Ltd.
    Inventors: Cheng-Hsueh Hsieh, Jason Dai, Boris Beylin
  • Patent number: 7917509
    Abstract: Methods and apparatuses for searching network data for one or more predetermined strings are disclosed. In one embodiment, the string search is a multi-stage search where the stages of the search are performed by different hardware components. In one embodiment, in a first search stage, a first processor performs a comparison of blocks of incoming data to determine whether the blocks potentially represent the beginning of one of the predetermined strings. If a potential predetermined string is identified, a second processor performs a further search to determine whether the string matches one of the predetermined strings. Because the first processor searches only for the beginning of the predetermined strings, the first stage comparison can be performed quickly, which improves network performance as compared to more detailed searching. The second stage is performed by the second processor, which allows the first processor to search for potential matching strings.
    Type: Grant
    Filed: November 5, 2007
    Date of Patent: March 29, 2011
    Assignee: Intel Corporation
    Inventor: Boris Beylin
  • Patent number: 7681187
    Abstract: A method and apparatus for optimizing register allocation during scheduling and execution of program code in a hardware environment. The program code can be compiled to optimize execution given predetermined hardware constraints. The hardware constraints can include the number of register read and write operations that can be performed in a given processor pass. The optimizer can initially schedule the program using virtual registers and a goal of minimizing the amount of active registers at any time. The optimizer reschedules the program to assign the virtual registers to actual physical registers in a manner that minimizes the number of processor passes used to execute the program.
    Type: Grant
    Filed: March 31, 2005
    Date of Patent: March 16, 2010
    Assignee: NVIDIA Corporation
    Inventors: Michael G. Ludwig, Jayant B. Kolhe, Robert Steven Glanville, Geoffrey C. Berry, Boris Beylin, Michael T. Bunnell
  • Publication number: 20090259832
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) so that they can be executed by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization-independent instructions. The instructions are classified as convergent or divergent, and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Application
    Filed: March 19, 2009
    Publication date: October 15, 2009
    Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy, Boris Beylin, Jayant B. Kolhe, Douglas Saylor
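
The multi-stage string search summarized in the abstracts for patents 8321440 and 7917509 can be loosely illustrated in software. The sketch below is only an informal, single-process approximation and is not the patented method: the abstracts describe two separate hardware processors, whereas here both stages are ordinary functions. The names `PREFIX_LEN`, `build_prefix_index`, `first_stage`, `second_stage`, and `search`, as well as the two-byte block size and the sample patterns, are assumptions made for illustration.

```python
from collections import defaultdict

PREFIX_LEN = 2  # assumed block size compared in the first stage


def build_prefix_index(patterns, prefix_len=PREFIX_LEN):
    """Group the predetermined strings by their leading block."""
    index = defaultdict(list)
    for pattern in patterns:
        if len(pattern) < prefix_len:
            raise ValueError("pattern shorter than the prefix block")
        index[pattern[:prefix_len]].append(pattern)
    return index


def first_stage(data, index, prefix_len=PREFIX_LEN):
    """Cheap pass: yield offsets whose leading block could begin a pattern."""
    for offset in range(len(data) - prefix_len + 1):
        if data[offset:offset + prefix_len] in index:
            yield offset


def second_stage(data, offset, index, prefix_len=PREFIX_LEN):
    """Detailed pass: confirm which candidate patterns fully match at offset."""
    candidates = index[data[offset:offset + prefix_len]]
    return [p for p in candidates if data.startswith(p, offset)]


def search(data, patterns):
    """Run both stages and return (offset, pattern) pairs for confirmed matches."""
    index = build_prefix_index(patterns)
    hits = []
    for offset in first_stage(data, index):
        for pattern in second_stage(data, offset, index):
            hits.append((offset, pattern))
    return hits


if __name__ == "__main__":
    packet = b"GET /etc/passwd HTTP"
    signatures = [b"/etc/passwd", b"/etc/shadow", b"cmd.exe"]
    print(search(packet, signatures))
```

In this sketch the first stage only does a dictionary lookup on a short block, so most offsets are rejected cheaply, and the full comparison runs only on the few offsets that pass. That mirrors the division of labor the abstracts describe, without claiming anything about the actual hardware design.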