Patents by Inventor Skyler Arron Windh
Skyler Arron Windh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250021317
Abstract: Devices and techniques for parallelizing loops that have loop-dependent variables are described herein. A system includes a processing device; and a memory device configured to store instructions, which when executed by the processing device, cause the processing device to perform operations comprising: accessing, by a compiler executing on a processing device, a computer code listing; determining that the computer code listing includes a loop with a loop-carried dependency variable; optimizing the loop for parallel execution by removing the loop-carried dependency variable; and compiling the computer code listing into executable software code with the loop executable in parallel in hardware.
Type: Application
Filed: July 10, 2024
Publication date: January 16, 2025
Inventors: Bashar Romanous, Skyler Arron Windh, Patrick Estep
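Illustrative sketch (not from the patent): the abstract does not say which transformation the compiler applies, so the example below uses a well-known case, a running sum, to show what removing a loop-carried dependency can look like. In the original loop each iteration needs `total` from the previous one; the rewritten version gives each chunk a private accumulator and combines them at the end. The function names and the chunk count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_sum(values):
    """Original loop: `total` is carried from one iteration to the next."""
    total = 0
    for v in values:
        total += v  # depends on `total` from the previous iteration
    return total


def chunked_sum(values, chunks=4):
    """Rewritten loop: each chunk has a private accumulator, so chunks are independent."""
    size = max(1, -(-len(values) // chunks))  # ceiling division, at least 1
    parts = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        # No chunk reads another chunk's accumulator
        partials = list(pool.map(sum, parts))
    return sum(partials)  # short final combine step
```

For integers the two functions agree exactly; with floating-point values the regrouped additions may round slightly differently.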
-
Publication number: 20240362024
Abstract: Schedule instructions of a program for execution on a coarse grained reconfigurable array having a plurality of tiles operable in parallel. The program identifies data flows through memory locations represented by memory variables and identifies instructions configured to transform data in the data flows. Based on a hardware profile identifying features of the coarse grained reconfigurable array, a scheduler is configured to generate a memory map. The memory map identifies, for each respective memory variable in the program, one of the tiles that contains a memory location represented by the respective memory variable. Based on the memory map reducing possible choices for a brute force search, the scheduler assigns the instructions to the tiles for execution, and determines timing of execution of the instructions in the tiles.
Type: Application
Filed: July 11, 2024
Publication date: October 31, 2024
Inventors: Allan Kennedy Porterfield, Skyler Arron Windh, Bashar Romanous
-
Publication number: 20240354121
Abstract: An exploration tool of a design space of configurations to execute a data flow program using circuit tiles of a coarse grained reconfigurable array. The tool can identify different configurations for the program and determine performance metrics of the configurations. A user of the tool can provide one or more criteria in a request to the tool; and in response, the tool can identify, from the different configurations and based on the one or more criteria applied to the performance metrics, a first configuration of executing the program on the coarse grained reconfigurable array. For example, the tool can use a toolchain to generate the configurations and use a simulator to run simulations of executions of the program according to the configurations. The tool can compare attributes determined by the toolchain and the simulator for consistency in detecting errors or defects in the toolchain and the simulator.
Type: Application
Filed: February 28, 2024
Publication date: October 24, 2024
Inventors: Bashar Romanous, Patrick Alan Estep, Skyler Arron Windh
-
Patent number: 12039335
Abstract: Schedule instructions of a program for execution on a coarse grained reconfigurable array having a plurality of tiles operable in parallel. The program identifies data flows through memory locations represented by memory variables and identifies instructions configured to transform data in the data flows. Based on a hardware profile identifying features of the coarse grained reconfigurable array, a scheduler is configured to generate a memory map. The memory map identifies, for each respective memory variable in the program, one of the tiles that contains a memory location represented by the respective memory variable. Based on the memory map reducing possible choices for a brute force search, the scheduler assigns the instructions to the tiles for execution, and determines timing of execution of the instructions in the tiles.
Type: Grant
Filed: March 25, 2022
Date of Patent: July 16, 2024
Assignee: Micron Technology, Inc.
Inventors: Allan Kennedy Porterfield, Skyler Arron Windh, Bashar Romanous
-
Patent number: 11829758
Abstract: Disclosed in some examples, are systems, methods, devices, and machine readable mediums which use improved dynamic programming algorithms to pack conditional branch instructions. Conditional code branches may be modeled as directed acyclic graphs (DAGs) which have a topological ordering. These DAGs may be used to construct a dynamic programming table to find a partial mapping of one path onto the other path using dynamic programming algorithms.
Type: Grant
Filed: March 13, 2023
Date of Patent: November 28, 2023
Assignee: Micron Technology, Inc.
Inventors: Skyler Arron Windh, Gongyu Wang
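Illustrative sketch (not from the patent): the abstract does not give the recurrence it uses, so this example substitutes a standard longest-common-subsequence table as one way a dynamic program can find a partial, order-preserving mapping between two branch paths. Each path is assumed to be already flattened into a topologically ordered list of operation names; `map_paths` and the operation names are hypothetical.

```python
def map_paths(path_a, path_b):
    """Return index pairs (i, j) where path_a[i] could be packed with path_b[j]."""
    n, m = len(path_a), len(path_b)
    # table[i][j] = size of the best mapping between path_a[i:] and path_b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if path_a[i] == path_b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one optimal mapping
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if path_a[i] == path_b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs
```

With a then-path of `load, add, mul, store` and an else-path of `load, sub, mul, store`, the mapping pairs up the `load`, `mul`, and `store` operations and leaves `add` and `sub` unmapped.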
-
Patent number: 11815935
Abstract: An assembly language program for a coarse grained reconfiguration array (CGRA), having dispatch interface information indicating operations to be performed via a dispatch interface of the CGRA to receive an input, memory interface information indicating operations to be performed via one or more memory interfaces of the CGRA, tile memory information indicating memory variables referring to memory locations to be implemented in tile memories of the CGRA, a flow description specifying one or more synchronous data flows, through the memory locations referenced via the memory variables in the tile memory information, to produce a result from the input using the CGRA.
Type: Grant
Filed: March 25, 2022
Date of Patent: November 14, 2023
Assignee: Micron Technology, Inc.
Inventors: Skyler Arron Windh, Allan Kennedy Porterfield, Douglas John Vanesko, Randall Paul Meyer, Patrick Alan Estep, Bashar Romanous
-
Patent number: 11789790
Abstract: Devices and techniques for triggering early termination of cooperating processes in a processor are described herein. A system includes multiple memory-compute nodes, wherein a memory-compute node comprises: event manager circuitry configured to establish a broadcast channel to receive event messages; and thread manager circuitry configured to organize a plurality of threads to perform portions of a cooperative task, wherein the plurality of threads each monitor the broadcast channel to receive event messages on the broadcast channel, and wherein upon achieving a threshold operation, the thread manager circuitry is to use the event manager circuitry to broadcast, on the broadcast channel, an event message indicating that the cooperative task is complete, causing other threads, in response to receiving the event message, to terminate execution of their respective portions of the cooperative task.
Type: Grant
Filed: November 10, 2022
Date of Patent: October 17, 2023
Assignee: Micron Technology, Inc.
Inventors: Patrick Estep, Skyler Arron Windh, Tony M. Brewer
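Illustrative sketch (not from the patent): the abstract describes hardware event and thread manager circuitry; the example below is only a software analogy of the same idea, using Python's `threading.Event` to stand in for the broadcast channel. Several threads search portions of a list, and the first one to find the target signals the others to stop. The function name `cooperative_search`, the strided partitioning, and the worker count are assumptions made for illustration.

```python
import threading


def cooperative_search(data, target, workers=4):
    """Search `data` with several threads; the first hit tells the rest to stop."""
    done = threading.Event()  # stands in for the broadcast channel
    result = []               # filled in by whichever thread finds the target

    def worker(start):
        # Each thread scans its own strided portion of the data
        for idx in range(start, len(data), workers):
            if done.is_set():  # event received: abandon the remaining work
                return
            if data[idx] == target:
                result.append(idx)
                done.set()     # signal that the cooperative task is complete
                return

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0] if result else None
```

The function returns the index of the target, or `None` if no thread finds it. If the target occurs more than once, which index is reported depends on thread timing.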
-
Patent number: 11782725
Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. The tiles can be arranged in an array or grid and can be communicatively coupled. In an example, a first node can include a tile cluster of N memory-compute tiles, and the N memory-compute tiles can be coupled using a first portion of a synchronous compute fabric. Operations performed by the respective processing and storage elements of the N memory-compute tiles can be selectively enabled or disabled based on information in a mask field of data propagated through the first portion of the synchronous compute fabric.
Type: Grant
Filed: August 16, 2021
Date of Patent: October 10, 2023
Assignee: Micron Technology, Inc.
Inventors: Bryan Hornung, Skyler Arron Windh
-
Publication number: 20230315415
Abstract: An assembly language program for a coarse grained reconfiguration array (CGRA), having dispatch interface information indicating operations to be performed via a dispatch interface of the CGRA to receive an input, memory interface information indicating operations to be performed via one or more memory interfaces of the CGRA, tile memory information indicating memory variables referring to memory locations to be implemented in tile memories of the CGRA, a flow description specifying one or more synchronous data flows, through the memory locations referenced via the memory variables in the tile memory information, to produce a result from the input using the CGRA.
Type: Application
Filed: March 25, 2022
Publication date: October 5, 2023
Inventors: Skyler Arron Windh, Allan Kennedy Porterfield, Douglas John Vanesko, Randall Paul Meyer, Patrick Alan Estep, Bashar Romanous
-
Publication number: 20230305842
Abstract: Control a coarse grained reconfigurable array during execution of an assembly language program identifying data flows through memory locations represented by memory variables. For example, a lowering program can be configured to receive the assembly language program, a hardware profile of the coarse grained reconfigurable array, and an instruction execution schedule to generate a configuration usable to control the coarse grained reconfigurable array. The lowering program can identify tile memories used to implement the memory locations represented by the memory variables in the assembly language program, and trace the data flows specified in the assembly language program. Using timing of instruction execution identified in the schedule, the lowering program can determine timing and controls for the dispatch interface, memory interfaces, and internal connections within tiles of the coarse grained reconfigurable array during execution of the assembly language program.
Type: Application
Filed: March 25, 2022
Publication date: September 28, 2023
Inventors: Skyler Arron Windh, Douglas John Vanesko
-
Publication number: 20230306272
Abstract: An artificial neural network is trained via reinforcement learning to receive first data representative of execution dependency conditions of instructions of a program, second data representative of a schedule of a first portion of the instructions of the program for execution in a device having a plurality of circuits units operable in parallel, and third data identifying a next instruction selected from a second portion of the instructions of the program remaining to be scheduled for execution in the device. The artificial neural network selects a placement of the next instruction in one of the circuit units from a plurality of possible placements of the next instruction in the device. Performance of placements of instructions being tested in search for a valid schedule for running the program in the device can be measured to generate samples to train the artificial neural network via reinforcement learning.
Type: Application
Filed: March 16, 2023
Publication date: September 28, 2023
Inventors: Andre Xian Ming Chang, Abhishek Chaurasia, Parth Khopkar, Bashar Romanous, Patrick Alan Estep, Skyler Arron Windh, Eugenio Culurciello, Sheik Dawood Beer Mohideen
-
Publication number: 20230305848
Abstract: Schedule instructions of a program for execution on a coarse grained reconfigurable array having a plurality of tiles operable in parallel. The program identifies data flows through memory locations represented by memory variables and identifies instructions configured to transform data in the data flows. Based on a hardware profile identifying features of the coarse grained reconfigurable array, a scheduler is configured to generate a memory map. The memory map identifies, for each respective memory variable in the program, one of the tiles that contains a memory location represented by the respective memory variable. Based on the memory map reducing possible choices for a brute force search, the scheduler assigns the instructions to the tiles for execution, and determines timing of execution of the instructions in the tiles.
Type: Application
Filed: March 25, 2022
Publication date: September 28, 2023
Inventors: Allan Kennedy Porterfield, Skyler Arron Windh, Bashar Romanous
-
Patent number: 11720475
Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, a state of the tiles that are executing the thread are saved and offloaded from the HTF to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows for the developer to step through the code and to examine values to find bugs.
Type: Grant
Filed: November 21, 2022
Date of Patent: August 8, 2023
Assignee: Micron Technology, Inc.
Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
-
Patent number: 11698853
Abstract: Latency in a node-based compute-near-memory system can be problematic. A solution to the problem can include or use a dedicated software-based cache at each node. The cache can be configured to store information received from each of the other nodes in the system. In an example, the cache can be populated during a breadth first search algorithm to store frontier information from each of the other nodes.
Type: Grant
Filed: June 29, 2021
Date of Patent: July 11, 2023
Assignee: Micron Technology, Inc.
Inventors: Skyler Arron Windh, Randall Meyer
-
Publication number: 20230214219
Abstract: Disclosed in some examples, are systems, methods, devices, and machine readable mediums which use improved dynamic programming algorithms to pack conditional branch instructions. Conditional code branches may be modeled as directed acyclic graphs (DAGs) which have a topological ordering. These DAGs may be used to construct a dynamic programming table to find a partial mapping of one path onto the other path using dynamic programming algorithms.
Type: Application
Filed: March 13, 2023
Publication date: July 6, 2023
Inventors: Skyler Arron Windh, Gongyu Wang
-
Publication number: 20230079727
Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, a state of the tiles that are executing the thread are saved and offloaded from the HTF to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows for the developer to step through the code and to examine values to find bugs.
Type: Application
Filed: November 21, 2022
Publication date: March 16, 2023
Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
-
Patent number: 11604650
Abstract: Disclosed in some examples, are systems, methods, devices, and machine readable mediums which use improved dynamic programming algorithms to pack conditional branch instructions. Conditional code branches may be modeled as directed acyclic graphs (DAGs) which have a topological ordering. These DAGs may be used to construct a dynamic programming table to find a partial mapping of one path onto the other path using dynamic programming algorithms.
Type: Grant
Filed: August 11, 2021
Date of Patent: March 14, 2023
Assignee: Micron Technology, Inc.
Inventors: Skyler Arron Windh, Gongyu Wang
-
Publication number: 20230074452
Abstract: Devices and techniques for triggering early termination of cooperating processes in a processor are described herein. A system includes multiple memory-compute nodes, wherein a memory-compute node comprises: event manager circuitry configured to establish a broadcast channel to receive event messages; and thread manager circuitry configured to organize a plurality of threads to perform portions of a cooperative task, wherein the plurality of threads each monitor the broadcast channel to receive event messages on the broadcast channel, and wherein upon achieving a threshold operation, the thread manager circuitry is to use the event manager circuitry to broadcast, on the broadcast channel, an event message indicating that the cooperative task is complete, causing other threads, in response to receiving the event message, to terminate execution of their respective portions of the cooperative task.
Type: Application
Filed: November 10, 2022
Publication date: March 9, 2023
Inventors: Patrick Estep, Skyler Arron Windh, Tony M. Brewer
-
Publication number: 20230056246
Abstract: A first set of multiple coordinate data structure elements describing non-zero values of an input matrix may be loaded to a compute element. A first set of input vector values having input vector row numbers corresponding to input matrix column numbers of the first set of multiple coordinate data structure elements may also be loaded to the compute element. Multiple parallel processing lanes of the compute element may be used to update multiple partial accumulation values, where each partial accumulation value corresponds to an output vector row and one of the multiple parallel processing lanes. At least a portion of the partial accumulation values corresponding to the first input matrix row may be summed across at least a portion of the parallel processing lanes to generate a first output vector row value.
Type: Application
Filed: August 3, 2021
Publication date: February 23, 2023
Inventors: Skyler Arron Windh, Douglas Vanesko
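Illustrative sketch (not from the patent): the abstract describes a sparse matrix-vector product over coordinate-format elements with per-lane partial accumulators. The example below mimics that data flow in plain Python, with the lanes simulated sequentially rather than run in parallel hardware. The function name `spmv_coo`, the `(row, col, value)` tuple layout, and the round-robin assignment of elements to lanes are assumptions made for illustration.

```python
def spmv_coo(entries, x, num_rows, lanes=4):
    """Multiply a sparse matrix given as (row, col, value) triples by vector x."""
    # One partial accumulator per (lane, output row)
    partial = [[0.0] * num_rows for _ in range(lanes)]
    for k, (row, col, value) in enumerate(entries):
        lane = k % lanes  # assumed round-robin: element k goes to lane k mod lanes
        # The input vector row used is the element's matrix column
        partial[lane][row] += value * x[col]
    # Sum each output row across the lanes to get the final output vector
    return [sum(partial[lane][r] for lane in range(lanes)) for r in range(num_rows)]
```

Because every lane writes only to its own accumulators, the lanes never contend for the same partial value; the only cross-lane step is the final per-row sum.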
-
Publication number: 20230058935
Abstract: A hybrid threading processor (HTP) supports thread creation by executing an instruction that indicates an amount of storage space to reserve for return values. Before a thread is created, the indicated amount of space is reserved. The newly created child thread sends a return packet back to the parent thread when the child thread completes. The thread writes its return information into the reserved space and waits for the parent thread to execute a thread join instruction. The thread join instruction takes the returned information from the reserved space and transfers it to the parent thread's register state. The reserved space is released once the child thread is joined. Using a configurable amount of space for each child thread may allow for more child threads to be executed simultaneously.
Type: Application
Filed: August 18, 2021
Publication date: February 23, 2023
Inventors: Tony Brewer, Patrick Estep, Skyler Arron Windh