Patents by Inventor Steven K. Reinhardt

Steven K. Reinhardt has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Network interface controller-based scheduling of processing tasks in a distributed computing system

Patent number: 10963309

Abstract: Techniques for scheduling processing tasks in a device having multiple computing elements are disclosed. A network interface controller of the device receives processing tasks, for execution on the computing elements, from a network that is external to the device. The network interface controller schedules the tasks for execution on the computing devices based on policy data available to the network interface controller. A scheduler within the network interface controller, which can be implemented as a standalone processing unit (such as a microcontroller, a programmable processing core, or an application specific integrated circuit), performs such scheduling, thereby freeing the central processing unit of the device from the burden of performing scheduling operations. The scheduler schedules the tasks according to any technically feasible scheduling technique.

Type: Grant

Filed: September 16, 2016

Date of Patent: March 30, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Abhisek Pan, Steven K. Reinhardt
GPU remote communication with triggered operations

Patent number: 10936533

Abstract: Methods, devices, and systems for transmitting data over a computer communications network are disclosed. A queue of communications commands can be pre-generated using a central processing unit (CPU) and stored in a device memory of a network interface controller (NIC). Thereafter, if a graphics processing unit (GPU) has data to communicate to a remote GPU, it can store the data in a send buffer, where the location in the buffer is pointed to by a pre-generated command. The GPU can then signal to the interface device that the data is ready, triggering execution of the pre-generated command to send the data.

Type: Grant

Filed: October 18, 2016

Date of Patent: March 2, 2021

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Michael W. LeBeane, Steven K. Reinhardt
HARDWARE ACCELERATED NEURAL NETWORK SUBGRAPHS

Publication number: 20190286972

Abstract: Technology related to hardware accelerated neural network subgraphs is disclosed. In one example of the disclosed technology, a method for compiling a neural network model is disclosed. The method includes identifying a subgraph of the neural network model to partition from the neural network model. An interface can be inserted between the neural network model and a partitioned version of the identified subgraph. The partitioned version can be adapted to be evaluated with a neural network accelerator. The identified subgraph can be compiled to the neural network accelerator to generate configuration information for the neural network accelerator. The neural network accelerator can be configured with the configuration information to provide an accelerated version of the subgraph.

Type: Application

Filed: May 4, 2018

Publication date: September 19, 2019

Applicant: Microsoft Technology Licensing, LLC

Inventors: Ahmad Mahdi El Husseini, Christian Boehn, Friedel van Megen, Amanda Grace Rapsang, Steven K. Reinhardt
HARDWARE ACCELERATED NEURAL NETWORK SUBGRAPHS

Publication number: 20190286973

Abstract: Technology related to hardware accelerated neural network subgraphs is disclosed. In one example of the disclosed technology, a method includes receiving source code specifying a neural network model. The source code includes an application programming interface (API) marking a subgraph of the neural network model as targeted for hardware acceleration. The method includes compiling the subgraph to the neural network accelerator target to generate configuration information for the hardware accelerator. The method includes configuring the hardware accelerator to evaluate the neural network model, where the hardware accelerator is configured using the configuration information.

Type: Application

Filed: May 4, 2018

Publication date: September 19, 2019

Applicant: Microsoft Technology Licensing, LLC

Inventors: Ratna Kumar Kovvuri, Ahmad Mahdi El Husseini, Steven K. Reinhardt, Daniel Lo, Eric S. Chung, Sarabjit Singh Seera, Friedel van Megen, Alessandro Forin
Wavefront resource virtualization

Patent number: 10360652

Abstract: A processor comprising hardware logic configured to execute a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.

Type: Grant

Filed: June 13, 2014

Date of Patent: July 23, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
Message aggregation, combining and compression for efficient data communications in GPU-based clusters

Patent number: 10320695

Abstract: A system and method for efficient management of network traffic in highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.

Type: Grant

Filed: May 26, 2016

Date of Patent: June 11, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
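
The message reduction described in the abstract of patent 10320695 can be illustrated with a minimal sketch. It shows only one way to reduce the number of messages (grouping payloads by destination) and does not cover combining or compression. The function name `aggregate_messages` and the `(destination, payload)` message shape are assumptions made for illustration, not details from the patent.

```python
from collections import defaultdict


def aggregate_messages(messages):
    """Group (destination, payload) pairs into one message per destination.

    Hypothetical helper: each returned message carries the list of payloads
    that were originally addressed to the same destination.
    """
    by_destination = defaultdict(list)
    for destination, payload in messages:
        by_destination[destination].append(payload)
    # dicts preserve insertion order, so destinations appear in first-seen order
    return [(dest, payloads) for dest, payloads in by_destination.items()]


original = [("node1", b"a"), ("node2", b"b"), ("node1", b"c")]
combined = aggregate_messages(original)
```

In this sketch three original messages become two, one per destination, which is the kind of reduction in message count the abstract refers to.
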
Conditional atomic operations in single instruction multiple data processors

Patent number: 10209990

Abstract: A conditional fetch-and-phi operation tests a memory location to determine if the memory location stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be concurrently executed by a plurality of concurrently executing threads, such as the threads of a wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.

Type: Grant

Filed: June 2, 2015

Date of Patent: February 19, 2019

Assignee: Advanced Micro Devices, Inc.

Inventors: David A. Wood, Steven K. Reinhardt, Bradford M. Beckmann, Marc S. Orr
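
The conditional fetch-and-phi operation in the abstract of patent 10209990 can be approximated with a sequential simulation. This is a sketch under stated assumptions, not the patented hardware mechanism: memory is a plain dictionary, the selected thread is assumed to be the one with the lowest id, and the outcome is handed to every thread whether or not the CAS succeeds. The names `compare_and_swap` and `conditional_fetch_and_phi` are hypothetical.

```python
def compare_and_swap(memory, addr, expected, new):
    """Simulated CAS: returns (success, value observed before the operation)."""
    old = memory[addr]
    if old == expected:
        memory[addr] = new
        return True, old
    return False, old


def conditional_fetch_and_phi(memory, addr, expected, phi, thread_ids):
    """One selected thread performs the CAS; all threads receive its result."""
    leader = min(thread_ids)  # assumption: lowest thread id is selected
    success, old = compare_and_swap(memory, addr, expected, phi(expected))
    # The other threads "await the results": here they simply get a copy.
    results = {tid: (success, old) for tid in thread_ids}
    return leader, results


memory = {0: 5}
leader, first = conditional_fetch_and_phi(memory, 0, 5, lambda v: v + 1, [3, 1, 2])
_, second = conditional_fetch_and_phi(memory, 0, 5, lambda v: v + 1, [3, 1, 2])
```

The first call finds the expected value 5 and stores 6. The second call finds 6 instead of 5, so the memory location is left unchanged and every thread sees a failed result.
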
GPU REMOTE COMMUNICATION WITH TRIGGERED OPERATIONS

Publication number: 20180107627

Abstract: Methods, devices, and systems for transmitting data over a computer communications network are disclosed. A queue of communications commands can be pre-generated using a central processing unit (CPU) and stored in a device memory of a network interface controller (NIC). Thereafter, if a graphics processing unit (GPU) has data to communicate to a remote GPU, it can store the data in a send buffer, where the location in the buffer is pointed to by a pre-generated command. The GPU can then signal to the interface device that the data is ready, triggering execution of the pre-generated command to send the data.

Type: Application

Filed: October 18, 2016

Publication date: April 19, 2018

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Steven K. Reinhardt
NETWORK INTERFACE CONTROLLER-BASED SCHEDULING OF PROCESSING TASKS IN A DISTRIBUTED COMPUTING SYSTEM

Publication number: 20180081715

Abstract: Techniques for scheduling processing tasks in a device having multiple computing elements are disclosed. A network interface controller of the device receives processing tasks, for execution on the computing elements, from a network that is external to the device. The network interface controller schedules the tasks for execution on the computing devices based on policy data available to the network interface controller. A scheduler within the network interface controller, which can be implemented as a standalone processing unit (such as a microcontroller, a programmable processing core, or an application specific integrated circuit), performs such scheduling, thereby freeing the central processing unit of the device from the burden of performing scheduling operations. The scheduler schedules the tasks according to any technically feasible scheduling technique.

Type: Application

Filed: September 16, 2016

Publication date: March 22, 2018

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael W. LeBeane, Abhisek Pan, Steven K. Reinhardt
METHOD AND APPARATUS FOR TIME-BASED SCHEDULING OF TASKS

Publication number: 20170161114

Abstract: A computing device is disclosed. The computing device includes an Accelerated Processing Unit (APU) including at least a first Heterogeneous System Architecture (HSA) computing device and at least a second HSA computing device, the second computing device being a different type than the first computing device, and an HSA Memory Management Unit (HMMU) allowing the APU to communicate with at least one memory. The computing task is enqueued on an HSA-managed queue that is set to run on the at least first HSA computing device or the at least second HSA computing device. The computing task is re-enqueued on the HSA-managed queue based on a repetition flag that triggers the number of times the computing task is re-enqueued. The repetition field is decremented each time the computing task is re-enqueued. The repetition field may include a special value (e.g., -1) to allow re-enqueuing of the computing task indefinitely.

Type: Application

Filed: December 8, 2015

Publication date: June 8, 2017

Applicant: Advanced Micro Devices, Inc.

Inventors: Walter B. Benton, Steven K. Reinhardt
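
The re-enqueuing behavior described in the abstract of publication 20170161114 can be illustrated with a minimal sketch. It assumes an ordinary in-memory queue rather than an HSA-managed queue, and the names `Task`, `run_queue`, and `REPEAT_FOREVER` are hypothetical. The `max_steps` cap is not part of the described method; it is added only so that the indefinite case terminates in the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

REPEAT_FOREVER = -1  # hypothetical special value: re-enqueue indefinitely


@dataclass
class Task:
    name: str
    action: Callable[[], None]
    repetitions: int  # remaining re-enqueues, or REPEAT_FOREVER


def run_queue(queue, max_steps):
    """Run tasks in FIFO order, re-enqueuing each per its repetition field."""
    executed = []
    for _ in range(max_steps):
        if not queue:
            break
        task = queue.popleft()
        task.action()
        executed.append(task.name)
        if task.repetitions == REPEAT_FOREVER:
            queue.append(task)       # special value: never decremented
        elif task.repetitions > 0:
            task.repetitions -= 1    # decremented on each re-enqueue
            queue.append(task)
    return executed


finite = run_queue(deque([Task("a", lambda: None, 2)]), max_steps=10)
endless = run_queue(deque([Task("b", lambda: None, REPEAT_FOREVER)]), max_steps=5)
```

A task with a repetition field of 2 runs three times in this sketch (the initial run plus two re-enqueues), while the task marked with the special value keeps running until the artificial step cap is reached.
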
Remote task queuing by networked computing devices

Patent number: 9582402

Abstract: The described embodiments include a networking subsystem in a second computing device that is configured to receive a task message from a first computing device. Based on the task message, the networking subsystem updates an entry in a task queue with task information from the task message. A processing subsystem in the second computing device subsequently retrieves the task information from the task queue and performs the corresponding task. In these embodiments, the networking subsystem processes the task message (e.g., stores the task information in the task queue) without causing the processing subsystem to perform operations for processing the task message.

Type: Grant

Filed: January 26, 2014

Date of Patent: February 28, 2017

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Steven K. Reinhardt, Michael L. Chu, Vinod Tipparaju, Walter B. Benton
CONDITIONAL ATOMIC OPERATIONS AT A PROCESSOR

Publication number: 20160357551

Abstract: A conditional fetch-and-phi operation tests a memory location to determine if the memory location stores a specified value and, if so, modifies the value at the memory location. The conditional fetch-and-phi operation can be implemented so that it can be concurrently executed by a plurality of concurrently executing threads, such as the threads of a wavefront at a GPU. To execute the conditional fetch-and-phi operation, one of the concurrently executing threads is selected to execute a compare-and-swap (CAS) operation at the memory location, while the other threads await the results. The CAS operation tests the value at the memory location and, if the CAS operation is successful, the value is passed to each of the concurrently executing threads.

Type: Application

Filed: June 2, 2015

Publication date: December 8, 2016

Inventors: David A. Wood, Steven K. Reinhardt, Bradford M. Beckmann, Marc S. Orr
MESSAGE AGGREGATION, COMBINING AND COMPRESSION FOR EFFICIENT DATA COMMUNICATIONS IN GPU-BASED CLUSTERS

Publication number: 20160352598

Abstract: A system and method for efficient management of network traffic in highly data parallel computing. A processing node includes one or more processors capable of generating network messages. A network interface is used to receive and send network messages across a network. The processing node reduces at least one of a number or a storage size of the original network messages into one or more new network messages. The new network messages are sent to the network interface to send across the network.

Type: Application

Filed: May 26, 2016

Publication date: December 1, 2016

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann, Shuai Che, David A. Wood
Conditional notification mechanism

Patent number: 9411663

Abstract: The described embodiments comprise a first hardware context. The first hardware context receives, from a second hardware context, an indication of a memory location and a condition to be met by the memory location. The first hardware context then sends a signal to the second hardware context when the memory location meets the condition.

Type: Grant

Filed: March 1, 2013

Date of Patent: August 9, 2016

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
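
The conditional notification idea in the abstract of patent 9411663 can be sketched in software. This is only an illustration under assumptions: the patent describes hardware contexts, whereas the hypothetical `Monitor` class below models the first context as an object that checks registered conditions whenever a value is stored, and models the signal as a callback. Treating each registration as one-shot is also an assumption.

```python
class Monitor:
    """Hypothetical stand-in for the first hardware context."""

    def __init__(self):
        self.memory = {}
        self.watches = []  # list of (addr, condition, callback)

    def register(self, addr, condition, callback):
        """Record a memory location and the condition it must meet."""
        self.watches.append((addr, condition, callback))

    def store(self, addr, value):
        """Write a value, then signal any watcher whose condition is now met."""
        self.memory[addr] = value
        remaining = []
        for w_addr, condition, callback in self.watches:
            if w_addr == addr and condition(value):
                callback(addr, value)  # the "signal" to the second context
            else:
                remaining.append((w_addr, condition, callback))
        self.watches = remaining  # assumption: a watch fires at most once


signals = []
monitor = Monitor()
monitor.register(0x10, lambda v: v >= 3, lambda a, v: signals.append((a, v)))
monitor.store(0x10, 1)   # condition not met, no signal
monitor.store(0x10, 3)   # condition met, one signal
monitor.store(0x10, 4)   # watch already consumed, no further signal
```

The waiting side does not poll the location itself in this sketch; it is notified only when the stored value satisfies the registered condition.
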
Method for memory consistency among heterogeneous computer components

Patent number: 9361118

Abstract: A method, computer program product, and system are described that determine the correctness of using memory operations in a computing device with heterogeneous computer components. Embodiments include an optimizer based on the characteristics of a Sequential Consistency for Heterogeneous-Race-Free (SC for HRF) model that analyzes a program and determines the correctness of the ordering of events in the program. HRF models include combinations of the properties: scope order, scope inclusion, and scope transitivity. The optimizer can determine when a program is heterogeneous-race-free in accordance with an SC for HRF memory consistency model. For example, the optimizer can analyze a portion of program code, respect the properties of the SC for HRF model, and determine whether a value produced by a store memory event will be a candidate for a value observed by a load memory event. In addition, the optimizer can determine whether reordering of events is possible.

Type: Grant

Filed: May 12, 2014

Date of Patent: June 7, 2016

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Derek R. Hower, Mark D. Hill, David Wood, Steven K. Reinhardt, Benedict R. Gaster, Blake A. Hechtman, Bradford M. Beckmann
Simulating vector execution

Patent number: 9342334

Abstract: A system and method for simulating new instructions without compiler support for the new instructions. A simulator detects a given region in code generated by a compiler. The given region may be a candidate for vectorization or may be a region already vectorized. In response to the detection, the simulator suspends execution of a time-based simulation. The simulator then serially executes the region for at least two iterations using a functional-based simulation and using instructions with operands which correspond to P or fewer lanes of single-instruction-multiple-data (SIMD) execution. The value P is a maximum number of lanes of SIMD execution supported both by the compiler. The simulator stores checkpoint state during the serial execution. In response to determining no inter-iteration memory dependencies exist, the simulator returns to the time-based simulation and resumes execution using N-wide vector instructions.

Type: Grant

Filed: June 22, 2012

Date of Patent: May 17, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Bradford M. Beckmann, Nilay Vaish, Steven K. Reinhardt
Conditional notification mechanism

Patent number: 9256535

Abstract: The described embodiments comprise a computing device with a first processor core and a second processor core. In some embodiments, during operations, the first processor core receives, from the second processor core, an indication of a memory location and a flag. The first processor core then stores the flag in a first cache line in a cache in the first processor core and stores the indication of the memory location separately in a second cache line in the cache. Upon encountering a predetermined result when evaluating a condition for the indicated memory location, the first processor core updates the flag in the first cache line. Based on the update of the flag, the first processor core causes the second processor core to perform an operation.

Type: Grant

Filed: April 4, 2013

Date of Patent: February 9, 2016

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Steven K. Reinhardt, Marc S. Orr, Bradford M. Beckmann
Wavefront Resource Virtualization

Publication number: 20150363903

Abstract: A processor comprising hardware logic configured to execute a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.

Type: Application

Filed: June 13, 2014

Publication date: December 17, 2015

Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
Message broadcast with router bypassing

Patent number: 9015448

Abstract: A processor and method for broadcasting data among a plurality of processing cores is disclosed. The processor includes a plurality of processing cores connected by point-to-point connections. A first of the processing cores includes a router that includes at least an allocation unit and an output port. The allocation unit is configured to determine that respective input buffers on at least two others of the processing cores are available to receive given data. The output port is usable by the router to send the given data across one of the point-to-point connections. The router is configured to send the given data contingent on determining that the respective input buffers are available. Furthermore, the processor is configured to deliver the data to the at least two other processing cores in response to the first processing core sending the data once across the point-to-point connection.

Type: Grant

Filed: June 17, 2010

Date of Patent: April 21, 2015

Assignee: Advanced Micro Devices, Inc.

Inventors: Tushar Krishna, Bradford M. Beckmann, Steven K. Reinhardt
METHOD FOR MEMORY CONSISTENCY AMONG HETEROGENEOUS COMPUTER COMPONENTS

Publication number: 20140337587

Abstract: A method, computer program product, and system are described that determine the correctness of using memory operations in a computing device with heterogeneous computer components. Embodiments include an optimizer based on the characteristics of a Sequential Consistency for Heterogeneous-Race-Free (SC for HRF) model that analyzes a program and determines the correctness of the ordering of events in the program. HRF models include combinations of the properties: scope order, scope inclusion, and scope transitivity. The optimizer can determine when a program is heterogeneous-race-free in accordance with an SC for HRF memory consistency model. For example, the optimizer can analyze a portion of program code, respect the properties of the SC for HRF model, and determine whether a value produced by a store memory event will be a candidate for a value observed by a load memory event. In addition, the optimizer can determine whether reordering of events is possible.

Type: Application

Filed: May 12, 2014

Publication date: November 13, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Derek R. Hower, Mark D. Hill, David Wood, Steven K. Reinhardt, Benedict R. Gaster, Blake A. Hechtman, Bradford M. Beckmann
