Patents by Inventor Arch D. Robison

Arch D. Robison has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Instructions and logic to provide atomic range modification operations

Patent number: 10528345

Abstract: Instructions and logic provide atomic range operations in a multiprocessing system. In one embodiment an atomic range modification instruction specifies an address for a set of range indices. The instruction locks access to the set of range indices and loads the range indices to check the range size. The range size is compared with a size sufficient to perform the range modification. If the range size is sufficient to perform the range modification, the range modification is performed and one or more modified range indices of the set of range indices are stored back to memory. Otherwise an error signal is set when the range size is not sufficient to perform said range modification. Access to the set of range indices is unlocked responsive to completion of the atomic range modification instruction. Embodiments may include atomic increment next instructions, add next instructions, decrement end instructions, and/or subtract end instructions.

Type: Grant

Filed: March 27, 2015

Date of Patent: January 7, 2020

Assignee: Intel Corporation

Inventors: Ilan Pardo, Oren Ben-Kiki, Arch D. Robison, Nadav Chachmon, James H. Cownie
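
The lock, check, modify, unlock sequence in the abstract above can be illustrated in software. The following is a minimal Python sketch, not the patented instruction: the class name IndexRange and the methods add_next and subtract_end are hypothetical, a threading.Lock stands in for the hardware locking, and a False return value stands in for the error signal.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class IndexRange:
    """A half-open range [next, end) of indices shared between threads."""

    next: int
    end: int
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def add_next(self, count: int):
        """Claim `count` indices from the front; return (ok, first_claimed)."""
        # Entering the `with` block locks access to the range indices;
        # leaving it (by any return) unlocks them again.
        with self._lock:
            size = self.end - self.next  # load the indices and check the size
            if size < count:
                return False, None  # stand-in for the error signal
            first = self.next
            self.next += count  # store the modified index back
            return True, first

    def subtract_end(self, count: int):
        """Claim `count` indices from the back; return (ok, first_claimed)."""
        with self._lock:
            if self.end - self.next < count:
                return False, None
            self.end -= count
            return True, self.end
```

With a range created as IndexRange(next=0, end=10), a call to add_next(4) claims indices 0 through 3 and a following subtract_end(4) claims indices 6 through 9. A further add_next(4) fails and leaves the range unchanged, because only two indices remain.
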
Technologies for fast synchronization barriers for many-core processing

Patent number: 9760410

Abstract: Technologies for multithreaded synchronization include a computing device having a many-core processor. Each processor core includes multiple hardware threads. A hardware thread executed by a processor core enters a synchronization barrier and synchronizes with other hardware threads executed by the same processor core. After synchronization, the hardware thread synchronizes with a source hardware thread that may be executed by a different processor core. The source hardware thread may be assigned using an n-way shuffle of all hardware threads, where n is the number of hardware threads per processor core. The hardware thread resynchronizes with the other hardware threads executed by the same processor core. The hardware thread alternately synchronizes with the source hardware thread and the other hardware threads executed by the same processor core until all hardware threads have synchronized. The computing device may reduce a Boolean value over the synchronization barrier.

Type: Grant

Filed: December 12, 2014

Date of Patent: September 12, 2017

Assignee: Intel Corporation

Inventor: Arch D. Robison
User-level fork and join processors, methods, systems, and instructions

Patent number: 9747108

Abstract: A processor of an aspect includes a plurality of processor elements, and a first processor element. The first processor element may perform a user-level fork instruction of a software thread. The first processor element may include a decoder to decode the user-level fork instruction. The user-level fork instruction is to indicate at least one instruction address. The first processor element may also include a user-level thread fork module. The user-level fork module, in response to the user-level fork instruction being decoded, may configure each of the plurality of processor elements to perform instructions in parallel. Other processors, methods, systems, and instructions are disclosed.

Type: Grant

Filed: March 27, 2015

Date of Patent: August 29, 2017

Assignee: Intel Corporation

Inventors: Oren Ben-Kiki, Ilan Pardo, Arch D. Robison, James H. Cownie
Technologies for low-level composable high performance computing libraries

Patent number: 9690552

Abstract: Technologies for generating composable library functions include a first computing device that includes a library compiler configured to compile a composable library and a second computing device that includes an application compiler configured to compose library functions of the composable library based on a plurality of abstractions written at different levels of abstraction. For example, the abstractions may include an algorithm abstraction at a high level, a blocked-algorithm abstraction at a medium level, and a region-based code abstraction at a low level. Other embodiments are described and claimed herein.

Type: Grant

Filed: December 27, 2014

Date of Patent: June 27, 2017

Assignee: Intel Corporation

Inventors: Hongbo Rong, Peng Tu, Tatiana Shpeisman, Hai Liu, Todd A. Anderson, Youfeng Wu, Paul M. Petersen, Victor W. Lee, P. G. Lowney, Arch D. Robison, Cheng Wang
Method and system of compiling program code into predicated instructions for execution on a processor without a program counter

Patent number: 9507594

Abstract: A predicated instruction compilation system includes a control flow graph generation module to generate a control flow graph of a program code to be compiled into the predicated instructions to be executed on a processor that does not include any program counter. Each of the instructions includes a predicate guard and a predicate update. The compilation system also includes a control flow transformation module to automatically generate the predicate guard and an update to the predicate state on the processor. A computer-implemented method of compiling a program code into predicated instructions is also described.

Type: Grant

Filed: July 2, 2013

Date of Patent: November 29, 2016

Assignee: Intel Corporation

Inventor: Arch D. Robison
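
To illustrate what a predicate guard and a predicate update mean on a machine without a program counter, here is a toy interpreter in Python. It is an illustrative sketch only, not the compilation method the patent describes: the function run_predicated, the predicate names init, test and body, and the register names are all hypothetical. The program sums the integers 1 through n.

```python
def run_predicated(n):
    """Sum 1..n on a toy machine driven by predicate bits, not a program counter."""
    regs = {"acc": 0, "i": 1, "n": n}
    preds = {"init"}  # predicate state: the set of predicate bits currently true

    def op_init(r):
        r["acc"], r["i"] = 0, 1

    def op_nop(r):
        pass

    def op_body(r):
        r["acc"] += r["i"]
        r["i"] += 1

    # Each instruction is (guard, operation, update). The guard names the
    # predicate bit that must be true for the instruction to fire; the update
    # returns the set of predicate bits that are true afterwards.
    program = [
        ("body", op_body, lambda r: {"test"}),
        ("init", op_init, lambda r: {"test"}),
        ("test", op_nop, lambda r: {"body"} if r["i"] <= r["n"] else set()),
    ]

    while preds:  # when no predicate is true, nothing can fire and the machine halts
        for guard, op, update in program:
            if guard in preds:
                op(regs)
                preds = update(regs)
                break
    return regs["acc"]
```

The instructions are deliberately listed out of order: control flow is carried entirely by the predicate state, so the position of an instruction in the list does not affect the result.
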
USER-LEVEL FORK AND JOIN PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS

Publication number: 20160283245

Abstract: A processor of an aspect includes a plurality of processor elements, and a first processor element. The first processor element may perform a user-level fork instruction of a software thread. The first processor element may include a decoder to decode the user-level fork instruction. The user-level fork instruction is to indicate at least one instruction address. The first processor element may also include a user-level thread fork module. The user-level fork module, in response to the user-level fork instruction being decoded, may configure each of the plurality of processor elements to perform instructions in parallel. Other processors, methods, systems, and instructions are disclosed.

Type: Application

Filed: March 27, 2015

Publication date: September 29, 2016

Applicant: INTEL CORPORATION

Inventors: Oren Ben-Kiki, Ilan Pardo, Arch D. Robison, James H. Cownie
INSTRUCTIONS AND LOGIC TO PROVIDE ATOMIC RANGE OPERATIONS

Publication number: 20160283237

Abstract: Instructions and logic provide atomic range operations in a multiprocessing system. In one embodiment an atomic range modification instruction specifies an address for a set of range indices. The instruction locks access to the set of range indices and loads the range indices to check the range size. The range size is compared with a size sufficient to perform the range modification. If the range size is sufficient to perform the range modification, the range modification is performed and one or more modified range indices of the set of range indices are stored back to memory. Otherwise an error signal is set when the range size is not sufficient to perform said range modification. Access to the set of range indices is unlocked responsive to completion of the atomic range modification instruction. Embodiments may include atomic increment next instructions, add next instructions, decrement end instructions, and/or subtract end instructions.

Type: Application

Filed: March 27, 2015

Publication date: September 29, 2016

Inventors: Ilan Pardo, Oren Ben-Kiki, Arch D. Robison, Nadav Chachmon, James H. Cownie
TECHNOLOGIES FOR FAST SYNCHRONIZATION BARRIERS FOR MANY-CORE PROCESSING

Publication number: 20160170813

Abstract: Technologies for multithreaded synchronization include a computing device having a many-core processor. Each processor core includes multiple hardware threads. A hardware thread executed by a processor core enters a synchronization barrier and synchronizes with other hardware threads executed by the same processor core. After synchronization, the hardware thread synchronizes with a source hardware thread that may be executed by a different processor core. The source hardware thread may be assigned using an n-way shuffle of all hardware threads, where n is the number of hardware threads per processor core. The hardware thread resynchronizes with the other hardware threads executed by the same processor core. The hardware thread alternately synchronizes with the source hardware thread and the other hardware threads executed by the same processor core until all hardware threads have synchronized. The computing device may reduce a Boolean value over the synchronization barrier.

Type: Application

Filed: December 12, 2014

Publication date: June 16, 2016

Inventor: Arch D. Robison
TECHNOLOGIES FOR EFFICIENT SYNCHRONIZATION BARRIERS WITH WORK STEALING SUPPORT

Publication number: 20160170812

Abstract: Technologies for multithreaded synchronization and work stealing include a computing device executing two or more threads in a thread team. A thread executes all of the tasks in its task queue and then exchanges its associated task stolen flag value with false and stores that value in a temporary flag. Subsequently, the thread enters a basic synchronization barrier. The computing device performs a logical-OR reduction over the temporary flags of the thread team to produce a reduction value. While waiting for other threads of the thread team to enter the barrier, the thread may steal a task from a victim thread and set the task stolen flag of the victim thread to true. After exiting the basic synchronization barrier, if the reduction value is true, the thread repeats exchanging the task stolen flag value and entering the basic synchronization barrier. Other embodiments are described and claimed.

Type: Application

Filed: December 12, 2014

Publication date: June 16, 2016

Inventors: Arch D. Robison, Alejandro Duran Gonzalez
Technologies for efficient synchronization barriers with work stealing support

Patent number: 9348658

Abstract: Technologies for multithreaded synchronization and work stealing include a computing device executing two or more threads in a thread team. A thread executes all of the tasks in its task queue and then exchanges its associated task stolen flag value with false and stores that value in a temporary flag. Subsequently, the thread enters a basic synchronization barrier. The computing device performs a logical-OR reduction over the temporary flags of the thread team to produce a reduction value. While waiting for other threads of the thread team to enter the barrier, the thread may steal a task from a victim thread and set the task stolen flag of the victim thread to true. After exiting the basic synchronization barrier, if the reduction value is true, the thread repeats exchanging the task stolen flag value and entering the basic synchronization barrier. Other embodiments are described and claimed.

Type: Grant

Filed: December 12, 2014

Date of Patent: May 24, 2016

Assignee: Intel Corporation

Inventors: Arch D. Robison, Alejandro Duran Gonzalez
Method and system of compiling program code into predicated instructions for execution on a processor without a program counter

Publication number: 20150012729

Abstract: A predicated instruction compilation system includes a control flow graph generation module to generate a control flow graph of a program code to be compiled into the predicated instructions to be executed on a processor that does not include any program counter. Each of the instructions includes a predicate guard and a predicate update. The compilation system also includes a control flow transformation module to automatically generate the predicate guard and an update to the predicate state on the processor. A computer-implemented method of compiling a program code into predicated instructions is also described.

Type: Application

Filed: July 2, 2013

Publication date: January 8, 2015

Inventor: Arch D. Robison
Fair scalable reader-writer mutual exclusion

Patent number: 8707324

Abstract: Implementing fair scalable reader writer mutual exclusion for access to a critical section by a plurality of processing threads is accomplished by creating a first queue node for a first thread, the first queue node representing a request by the first thread to access the critical section; setting at least one pointer within a queue to point to the first queue node, the queue representing at least one thread desiring access to the critical section; waiting until a condition is met, the condition comprising the first queue node having no preceding write requests as indicated by at least one predecessor queue node on the queue; permitting the first thread to enter the critical section in response to the condition being met; and causing the first thread to release a spin lock, the spin lock acquired by a second thread of the plurality of processing threads.

Type: Grant

Filed: February 27, 2012

Date of Patent: April 22, 2014

Assignee: Intel Corporation

Inventors: Alexey Kukanov, Arch D. Robison
Advance trip count computation in a concurrent processing environment

Patent number: 8468509

Abstract: A method for computing a trip count for a loop in advance of the execution of the loop is provided. The method comprises identifying the elements of a loop; returning infinity, if a first index value satisfies a first condition and that a first step size is equal to zero; modifying the first index value and the first step size, if the first index value satisfies the first condition, when the first step size is not equal to zero, and the first step size is greater than half of a first modulus; returning the result computed by applying a formula that divides the difference between a first condition value and the first index value by the first step size and rounds up to a next integer when there is a non-zero remainder; and returning a second trip count for a second loop based on the elements of the first loop.

Type: Grant

Filed: March 27, 2008

Date of Patent: June 18, 2013

Assignee: Intel Corporation

Inventor: Arch D. Robison
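
The core of the formula in the abstract above (divide the distance to the limit by the step size and round up on a non-zero remainder) can be sketched in Python. This is a simplified illustration under stated assumptions: it uses unbounded integers and a loop of the form "while index < limit: index += step", so it omits the adjustment the abstract describes for step sizes greater than half the modulus. The function name trip_count is hypothetical.

```python
import math


def trip_count(index, limit, step):
    """Trip count of `while index < limit: index += step`, computed up front."""
    if not index < limit:
        return 0  # the loop condition fails before the first iteration
    if step == 0:
        return math.inf  # the condition holds and the index never moves
    if step < 0:
        # Assumption: unbounded integers, so the index moves away from the
        # limit forever. Fixed-width arithmetic would wrap around instead.
        return math.inf
    diff = limit - index
    # Integer division, rounded up when there is a non-zero remainder.
    return diff // step + (1 if diff % step else 0)
```

For example, trip_count(0, 10, 3) counts the iterations at index 0, 3, 6 and 9, so the result is 4, while trip_count(0, 9, 3) divides evenly and gives 3.
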
FAIR SCALABLE READER-WRITER MUTUAL EXCLUSION

Publication number: 20120198471

Abstract: Implementing fair scalable reader writer mutual exclusion for access to a critical section by a plurality of processing threads is accomplished by creating a first queue node for a first thread, the first queue node representing a request by the first thread to access the critical section; setting at least one pointer within a queue to point to the first queue node, the queue representing at least one thread desiring access to the critical section; waiting until a condition is met, the condition comprising the first queue node having no preceding write requests as indicated by at least one predecessor queue node on the queue; permitting the first thread to enter the critical section in response to the condition being met; and causing the first thread to release a spin lock, the spin lock acquired by a second thread of the plurality of processing threads.

Type: Application

Filed: February 27, 2012

Publication date: August 2, 2012

Inventors: Alexey Kukanov, Arch D. Robison
Preserving hardware thread cache affinity via procrastination

Patent number: 8108867

Abstract: A method, device, system, and computer readable medium are disclosed. In one embodiment the method includes managing one or more threads attempting to steal task work from one or more other threads. The method will block a thread from stealing a mailed task that is also residing in another thread's task pool. The blocking occurs when the mailed task was mailed to an idle third thread. Additionally, some tasks are deferred instead of immediately spawned.

Type: Grant

Filed: June 24, 2008

Date of Patent: January 31, 2012

Assignee: Intel Corporation

Inventor: Arch D. Robison
Preserving hardware thread cache affinity via procrastination

Publication number: 20090320040

Abstract: A method, device, system, and computer readable medium are disclosed. In one embodiment the method includes managing one or more threads attempting to steal task work from one or more other threads. The method will block a thread from stealing a mailed task that is also residing in another thread's task pool. The blocking occurs when the mailed task was mailed to an idle third thread. Additionally, some tasks are deferred instead of immediately spawned.

Type: Application

Filed: June 24, 2008

Publication date: December 24, 2009

Inventor: Arch D. Robison
Fast tree-based generation of a dependence graph

Patent number: 7624386

Abstract: A dependence graph having a linear number of edges and one or more tie vertices is generated by constructing a tree of nodes, receiving requests to create cut and/or fan vertices corresponding to each node, adjusting a frontier of nodes up or down, and creating one or more cut or fan vertices, zero or more tie vertices, and at least one predecessor edge.

Type: Grant

Filed: December 16, 2004

Date of Patent: November 24, 2009

Assignee: Intel Corporation

Inventor: Arch D. Robison
ADVANCE TRIP COUNT COMPUTATION IN A CONCURRENT PROCESSING ENVIRONMENT

Publication number: 20090248776

Abstract: A method for computing a trip count for a loop in advance of the execution of the loop is provided. The method comprises identifying the elements of a loop; returning infinity, if a first index value satisfies a first condition and that a first step size is equal to zero; modifying the first index value and the first step size, if the first index value satisfies the first condition, when the first step size is not equal to zero, and the first step size is greater than half of a first modulus; returning the result computed by applying a formula that divides the difference between a first condition value and the first index value by the first step size and rounds up to a next integer when there is a non-zero remainder; and returning a second trip count for a second loop based on the elements of the first loop.

Type: Application

Filed: March 27, 2008

Publication date: October 1, 2009

Inventor: Arch D. Robison
Device, system, and method for regulating software lock elision mechanisms

Publication number: 20090125519

Abstract: A method, apparatus and system for, in a computing apparatus, comparing a measure of data contention for a group of operations protected by a lock to a predetermined threshold for data contention, and comparing a measure of lock contention for the group of operations to a predetermined threshold for lock contention, eliding the lock for concurrently executing two or more of the operations of the group using two or more threads when the measure of data contention is approximately less than or equal to the predetermined threshold for data contention and the measure of lock contention is approximately greater than or equal to a predetermined threshold for lock contention, and acquiring the lock for executing two or more of the operations of the group in a serialized manner when the measure of data contention is approximately greater than or equal to the predetermined threshold for data contention and the measure of lock contention is approximately less than or equal to a predetermined threshold for lock conte

Type: Application

Filed: November 13, 2007

Publication date: May 14, 2009

Inventors: Arch D. Robison, Paul M. Petersen
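
The two threshold comparisons in the abstract above amount to a small decision rule, sketched here in Python. The function name choose_lock_mode and the return values are hypothetical. The abstract as listed is cut off and does not say what happens when neither condition holds, so the "keep" result (leave the current mode unchanged) is an assumption, as is checking the elide condition first when both conditions hold at exact equality. The sketch also uses exact comparisons where the abstract says "approximately".

```python
def choose_lock_mode(data_contention, lock_contention, data_threshold, lock_threshold):
    """Return "elide", "acquire", or "keep" (assumed: leave the mode unchanged)."""
    # Little data contention but heavy lock contention: threads rarely touch
    # the same data, so run the operations concurrently without the lock.
    if data_contention <= data_threshold and lock_contention >= lock_threshold:
        return "elide"
    # Heavy data contention and light lock contention: serialize on the lock.
    if data_contention >= data_threshold and lock_contention <= lock_threshold:
        return "acquire"
    # Not covered by the abstract as listed; assumed to keep the current mode.
    return "keep"
```
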
System and method to reduce the size of source code in a processing system

Patent number: 7257808

Abstract: A system and method to reduce the size of source code in a processing system are described. Multiple subgraph structures are identified within a graph structure constructed for multiple source code instructions in a program. Unifiable variables that are not simultaneously used in the source code instructions are identified within each subgraph structure. Finally, one or more unifiable instructions from a tine of a corresponding subgraph structure are transferred to a handle of the corresponding subgraph structure, each unifiable instruction containing one or more unifiable variables.

Type: Grant

Filed: January 3, 2002

Date of Patent: August 14, 2007

Assignee: Intel Corporation

Inventor: Arch D. Robison
