Patents by Inventor Arch D. Robison

Arch D. Robison has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10528345
    Abstract: Instructions and logic provide atomic range operations in a multiprocessing system. In one embodiment an atomic range modification instruction specifies an address for a set of range indices. The instruction locks access to the set of range indices and loads the range indices to check the range size. The range size is compared with a size sufficient to perform the range modification. If the range size is sufficient to perform the range modification, the range modification is performed and one or more modified range indices of the set of range indices is stored back to memory. Otherwise an error signal is set when the range size is not sufficient to perform said range modification. Access to the set of range indices is unlocked responsive to completion of the atomic range modification instruction. Embodiments may include atomic increment next instructions, add next instructions, decrement end instructions, and/or subtract end instructions.
    Type: Grant
    Filed: March 27, 2015
    Date of Patent: January 7, 2020
    Assignee: Intel Corporation
    Inventors: Ilan Pardo, Oren Ben-Kiki, Arch D. Robison, Nadav Chachmon, James H. Cownie
  • Patent number: 9760410
    Abstract: Technologies for multithreaded synchronization including a computing device having a many-core processor. Each processor core includes multiple hardware threads. A hardware thread executed by a processor core enters a synchronization barrier and synchronizes with other hardware threads executed by the same processor core. After synchronization, the hardware thread synchronizes with a source hardware thread that may be executed by a different processor core. The source hardware thread may be assigned using an n-way shuffle of all hardware threads, where n is the number of hardware threads per processor core. The hardware thread resynchronizes with the other hardware threads executed by the same processor core. The hardware thread alternately synchronizes with the source hardware thread and the other hardware threads executed by the same processor core until all hardware threads have synchronized. The computing device may reduce a Boolean value over the synchronization barrier.
    Type: Grant
    Filed: December 12, 2014
    Date of Patent: September 12, 2017
    Assignee: Intel Corporation
    Inventor: Arch D. Robison
  • Patent number: 9747108
    Abstract: A processor of an aspect includes a plurality of processor elements, and a first processor element. The first processor element may perform a user-level fork instruction of a software thread. The first processor element may include a decoder to decode the user-level fork instruction. The user-level fork instruction is to indicate at least one instruction address. The first processor element may also include a user-level thread fork module. The user-level fork module, in response to the user-level fork instruction being decoded, may configure each of the plurality of processor elements to perform instructions in parallel. Other processors, methods, systems, and instructions are disclosed.
    Type: Grant
    Filed: March 27, 2015
    Date of Patent: August 29, 2017
    Assignee: Intel Corporation
    Inventors: Oren Ben-Kiki, Ilan Pardo, Arch D. Robison, James H. Cownie
  • Patent number: 9690552
    Abstract: Technologies for generating composable library functions include a first computing device that includes a library compiler configured to compile a composable library and second computing device that includes an application compiler configured to compose library functions of the composable library based on a plurality of abstractions written at different levels of abstractions. For example, the abstractions may include an algorithm abstraction at a high level, a blocked-algorithm abstraction at medium level, and a region-based code abstraction at a low level. Other embodiments are described and claimed herein.
    Type: Grant
    Filed: December 27, 2014
    Date of Patent: June 27, 2017
    Assignee: Intel Corporation
    Inventors: Hongbo Rong, Peng Tu, Tatiana Shpeisman, Hai Liu, Todd A. Anderson, Youfeng Wu, Paul M. Petersen, Victor W. Lee, P. G. Lowney, Arch D. Robison, Cheng Wang
  • Patent number: 9507594
    Abstract: A predicated instruction compilation system includes a control flow graph generation module to generate a control flow graph of a program code to be compiled into the predicated instructions to be executed on a processor that does not include any program counter. Each of the instructions includes a predicate guard and a predicate update. The compilation system also includes a control flow transformation module to automatically generate the predicate guard and an update to the predicate state on the processor. A computer-implemented method of compiling a program code into predicated instructions is also described.
    Type: Grant
    Filed: July 2, 2013
    Date of Patent: November 29, 2016
    Assignee: Intel Corporation
    Inventor: Arch D. Robison
  • Publication number: 20160283237
    Abstract: Instructions and logic provide atomic range operations in a multiprocessing system. In one embodiment an atomic range modification instruction specifies an address for a set of range indices. The instruction locks access to the set of range indices and loads the range indices to check the range size. The range size is compared with a size sufficient to perform the range modification. If the range size is sufficient to perform the range modification, the range modification is performed and one or more modified range indices of the set of range indices is stored back to memory. Otherwise an error signal is set when the range size is not sufficient to perform said range modification. Access to the set of range indices is unlocked responsive to completion of the atomic range modification instruction. Embodiments may include atomic increment next instructions, add next instructions, decrement end instructions, and/or subtract end instructions.
    Type: Application
    Filed: March 27, 2015
    Publication date: September 29, 2016
    Inventors: Ilan Pardo, Oren Ben-Kiki, Arch D. Robison, Nadav Chachmon, James H. Cownie
  • Publication number: 20160283245
    Abstract: A processor of an aspect includes a plurality of processor elements, and a first processor element. The first processor element may perform a user-level fork instruction of a software thread. The first processor element may include a decoder to decode the user-level fork instruction. The user-level fork instruction is to indicate at least one instruction address. The first processor element may also include a user-level thread fork module. The user-level fork module, in response to the user-level fork instruction being decoded, may configure each of the plurality of processor elements to perform instructions in parallel. Other processors, methods, systems, and instructions are disclosed.
    Type: Application
    Filed: March 27, 2015
    Publication date: September 29, 2016
    Applicant: INTEL CORPORATION
    Inventors: Oren Ben-Kiki, Ilan Pardo, Arch D. Robison, James H. Cownie
  • Publication number: 20160170813
    Abstract: Technologies for multithreaded synchronization including a computing device having a many-core processor. Each processor core includes multiple hardware threads. A hardware thread executed by a processor core enters a synchronization barrier and synchronizes with other hardware threads executed by the same processor core. After synchronization, the hardware thread synchronizes with a source hardware thread that may be executed by a different processor core. The source hardware thread may be assigned using an n-way shuffle of all hardware threads, where n is the number of hardware threads per processor core. The hardware thread resynchronizes with the other hardware threads executed by the same processor core. The hardware thread alternately synchronizes with the source hardware thread and the other hardware threads executed by the same processor core until all hardware threads have synchronized. The computing device may reduce a Boolean value over the synchronization barrier.
    Type: Application
    Filed: December 12, 2014
    Publication date: June 16, 2016
    Inventor: Arch D. Robison
  • Publication number: 20160170812
    Abstract: Technologies for multithreaded synchronization and work stealing include a computing device executing two or more threads in a thread team. A thread executes all of the tasks in its task queue and then exchanges its associated task stolen flag value with false and stores that value in a temporary flag. Subsequently, the thread enters a basic synchronization barrier. The computing device performs a logical-OR reduction over the temporary flags of the thread team to produce a reduction value. While waiting for other threads of the thread team to enter the barrier, the thread may steal a task from a victim thread and set the task stolen flag of the victim thread to true. After exiting the basic synchronization barrier, if the reduction value is true, the thread repeats exchanging the task stolen flag value and entering the basic synchronization barrier. Other embodiments are described and claimed.
    Type: Application
    Filed: December 12, 2014
    Publication date: June 16, 2016
    Inventors: Arch D. Robison, Alejandro Duran Gonzalez
  • Patent number: 9348658
    Abstract: Technologies for multithreaded synchronization and work stealing include a computing device executing two or more threads in a thread team. A thread executes all of the tasks in its task queue and then exchanges its associated task stolen flag value with false and stores that value in a temporary flag. Subsequently, the thread enters a basic synchronization barrier. The computing device performs a logical-OR reduction over the temporary flags of the thread team to produce a reduction value. While waiting for other threads of the thread team to enter the barrier, the thread may steal a task from a victim thread and set the task stolen flag of the victim thread to true. After exiting the basic synchronization barrier, if the reduction value is true, the thread repeats exchanging the task stolen flag value and entering the basic synchronization barrier. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 12, 2014
    Date of Patent: May 24, 2016
    Assignee: Intel Corporation
    Inventors: Arch D. Robison, Alejandro Duran Gonzalez
  • Publication number: 20150012729
    Abstract: A predicated instruction compilation system includes a control flow graph generation module to generate a control flow graph of a program code to be compiled into the predicated instructions to be executed on a processor that does not include any program counter. Each of the instructions includes a predicate guard and a predicate update. The compilation system also includes a control flow transformation module to automatically generate the predicate guard and an update to the predicate state on the processor. A computer-implemented method of compiling a program code into predicated instructions is also described.
    Type: Application
    Filed: July 2, 2013
    Publication date: January 8, 2015
    Inventor: Arch D. Robison
  • Patent number: 8707324
    Abstract: Implementing fair scalable reader writer mutual exclusion for access to a critical section by a plurality of processing threads is accomplished by creating a first queue node for a first thread, the first queue node representing a request by the first thread to access the critical section; setting at least one pointer within a queue to point to the first queue node, the queue representing at least one thread desiring access to the critical section; waiting until a condition is met, the condition comprising the first queue node having no preceding write requests as indicated by at least one predecessor queue node on the queue; permitting the first thread to enter the critical section in response to the condition being met; and causing the first thread to release a spin lock, the spin lock acquired by a second thread of the plurality of processing threads.
    Type: Grant
    Filed: February 27, 2012
    Date of Patent: April 22, 2014
    Assignee: Intel Corporation
    Inventors: Alexey Kukanov, Arch D. Robison
  • Patent number: 8468509
    Abstract: A method for computing a trip count for a loop in advance of the execution of the loop is provided. The method comprises identifying the elements of a loop; returning infinity, if a first index value satisfies a first condition and that a first step size is equal to zero; modifying the first index value and the first step size, if the first index value satisfies the first condition, when the first step size is not equal to zero, and the first step size is greater than half of a first modulus; returning the result computed by applying a formula that divides the difference between a first condition value and the first index value by the first step size and rounds up to a next integer when there is a non-zero remainder; and returning a second trip count for a second loop based on the elements of the first loop.
    Type: Grant
    Filed: March 27, 2008
    Date of Patent: June 18, 2013
    Assignee: Intel Corporation
    Inventor: Arch D. Robison
  • Publication number: 20120198471
    Abstract: Implementing fair scalable reader writer mutual exclusion for access to a critical section by a plurality of processing threads is accomplished by creating a first queue node for a first thread, the first queue node representing a request by the first thread to access the critical section; setting at least one pointer within a queue to point to the first queue node, the queue representing at least one thread desiring access to the critical section; waiting until a condition is met, the condition comprising the first queue node having no preceding write requests as indicated by at least one predecessor queue node on the queue; permitting the first thread to enter the critical section in response to the condition being met; and causing the first thread to release a spin lock, the spin lock acquired by a second thread of the plurality of processing threads.
    Type: Application
    Filed: February 27, 2012
    Publication date: August 2, 2012
    Inventors: Alexey Kukanov, Arch D. Robison
  • Patent number: 8108867
    Abstract: A method, device, system, and computer readable medium are disclosed. In one embodiment the method includes managing one or more threads attempting to steal task work from one or more other threads. The method will block a thread from stealing a mailed task that is also residing in another thread's task pool. The blocking occurs when the mailed task was mailed to an idle third thread. Additionally, some tasks are deferred instead of immediately spawned.
    Type: Grant
    Filed: June 24, 2008
    Date of Patent: January 31, 2012
    Assignee: Intel Corporation
    Inventor: Arch D. Robison
  • Publication number: 20090320040
    Abstract: A method, device, system, and computer readable medium are disclosed. In one embodiment the method includes managing one or more threads attempting to steal task work from one or more other threads. The method will block a thread from stealing a mailed task that is also residing in another thread's task pool. The blocking occurs when the mailed task was mailed to an idle third thread. Additionally, some tasks are deferred instead of immediately spawned.
    Type: Application
    Filed: June 24, 2008
    Publication date: December 24, 2009
    Inventor: Arch D. Robison
  • Patent number: 7624386
    Abstract: A dependence graph having a linear number of edges and one or more tie vertices is generated by constructing a tree of nodes, receiving requests to create cut and/or fan vertices corresponding to each node, adjusting a frontier of nodes up or down, and creating one or more cut or fan vertices, zero or more tie vertices, and at least one predecessor edge.
    Type: Grant
    Filed: December 16, 2004
    Date of Patent: November 24, 2009
    Assignee: Intel Corporation
    Inventor: Arch D. Robison
  • Publication number: 20090248776
    Abstract: A method for computing a trip count for a loop in advance of the execution of the loop is provided. The method comprises identifying the elements of a loop; returning infinity, if a first index value satisfies a first condition and that a first step size is equal to zero; modifying the first index value and the first step size, if the first index value satisfies the first condition, when the first step size is not equal to zero, and the first step size is greater than half of a first modulus; returning the result computed by applying a formula that divides the difference between a first condition value and the first index value by the first step size and rounds up to a next integer when there is a non-zero remainder; and returning a second trip count for a second loop based on the elements of the first loop.
    Type: Application
    Filed: March 27, 2008
    Publication date: October 1, 2009
    Inventor: Arch D. Robison
  • Publication number: 20090125519
    Abstract: A method, apparatus and system for, in a computing apparatus, comparing a measure of data contention for a group of operations protected by a lock to a predetermined threshold for data contention, and comparing a measure of lock contention for the group of operations to a predetermined threshold for lock contention, eliding the lock for concurrently executing two or more of the operations of the group using two or more threads when the measure of data contention is approximately less than or equal to the predetermined threshold for data contention and the measure of lock contention is approximately greater than or equal to a predetermined threshold for lock contention, and acquiring the lock for executing two or more of the of operations of the group in a serialized manner when the measure of data contention is approximately greater than or equal to the predetermined threshold for data contention and the measure of lock contention is approximately less than or equal to a predetermined threshold for lock conte
    Type: Application
    Filed: November 13, 2007
    Publication date: May 14, 2009
    Inventors: Arch D. Robison, Paul M. Petersen
  • Patent number: 7257808
    Abstract: A system and method to reduce the size of source code in a processing system are described. Multiple subgraph structures are identified within a graph structure constructed for multiple source code instructions in a program. Unifiable variables that are not simultaneously used in the source code instructions are identified within each subgraph structure. Finally, one or more unifiable instructions from a tine of a corresponding subgraph structure are transferred to a handle of the corresponding subgraph structure, each unifiable instruction containing one or more unifiable variables.
    Type: Grant
    Filed: January 3, 2002
    Date of Patent: August 14, 2007
    Assignee: Intel Corporation
    Inventor: Arch D. Robison