Patents by Inventor Skyler Arron Windh

Skyler Arron Windh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250021317
    Abstract: Devices and techniques for parallelizing loops that have loop-dependent variables are described herein. A system includes a processing device; and a memory device configured to store instructions, which when executed by the processing device, cause the processing device to perform operations comprising: accessing, by a compiler executing on a processing device, a computer code listing; determining that the computer code listing includes a loop with a loop-carried dependency variable; optimizing the loop for parallel execution by removing the loop-carried dependency variable; and compiling the computer code listing into executable software code with the loop executable in parallel in hardware.
    Type: Application
    Filed: July 10, 2024
    Publication date: January 16, 2025
    Inventors: Bashar Romanous, Skyler Arron Windh, Patrick Estep
  • Publication number: 20240362024
    Abstract: Schedule instructions of a program for execution on a coarse grained reconfigurable array having a plurality of tiles operable in parallel. The program identifies data flows through memory locations represented by memory variables and identifies instructions configured to transform data in the data flows. Based on a hardware profile identifying features of the coarse grained reconfigurable array, a scheduler is configured to generate a memory map. The memory map identifies, for each respective memory variable in the program, one of the tiles that contains a memory location represented by the respective memory variable. Based on the memory map reducing possible choices for a brute force search, the scheduler assigns the instructions to the tiles for execution, and determines timing of execution of the instructions in the tiles.
    Type: Application
    Filed: July 11, 2024
    Publication date: October 31, 2024
    Inventors: Allan Kennedy Porterfield, Skyler Arron Windh, Bashar Romanous
  • Publication number: 20240354121
    Abstract: An exploration tool of a design space of configurations to execute a data flow program using circuit tiles of a coarse grained reconfigurable array. The tool can identify different configurations for the program and determine performance metrics of the configurations. A user of the tool can provide one or more criteria in a request to the tool; and in response, the tool can identify, from the different configurations and based on the one or more criteria applied to the performance metrics, a first configuration of executing the program on the coarse grained reconfigurable array. For example, the tool can use a toolchain to generate the configurations and use a simulator to run simulations of executions of the program according to the configurations. The tool can compare attributes determined by the toolchain and the simulator for consistency in detecting errors or defects in the toolchain and the simulator.
    Type: Application
    Filed: February 28, 2024
    Publication date: October 24, 2024
    Inventors: Bashar Romanous, Patrick Alan Estep, Skyler Arron Windh
  • Patent number: 12039335
    Abstract: Schedule instructions of a program for execution on a coarse grained reconfigurable array having a plurality of tiles operable in parallel. The program identifies data flows through memory locations represented by memory variables and identifies instructions configured to transform data in the data flows. Based on a hardware profile identifying features of the coarse grained reconfigurable array, a scheduler is configured to generate a memory map. The memory map identifies, for each respective memory variable in the program, one of the tiles that contains a memory location represented by the respective memory variable. Based on the memory map reducing possible choices for a brute force search, the scheduler assigns the instructions to the tiles for execution, and determines timing of execution of the instructions in the tiles.
    Type: Grant
    Filed: March 25, 2022
    Date of Patent: July 16, 2024
    Assignee: Micron Technology, Inc.
    Inventors: Allan Kennedy Porterfield, Skyler Arron Windh, Bashar Romanous
  • Patent number: 11829758
    Abstract: Disclosed in some examples, are systems, methods, devices, and machine readable mediums which use improved dynamic programming algorithms to pack conditional branch instructions. Conditional code branches may be modeled as directed acyclic graphs (DAGs) which have a topological ordering. These DAGs may be used to construct a dynamic programming table to find a partial mapping of one path onto the other path using dynamic programming algorithms.
    Type: Grant
    Filed: March 13, 2023
    Date of Patent: November 28, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Skyler Arron Windh, Gongyu Wang
  • Patent number: 11815935
    Abstract: An assembly language program for a coarse grained reconfiguration array (CGRA), having dispatch interface information indicating operations to be performed via a dispatch interface of the CGRA to receive an input, memory interface information indicating operations to be performed via one or more memory interfaces of the CGRA, tile memory information indicating memory variables referring to memory locations to be implemented in tile memories of the CGRA, a flow description specifying one or more synchronous data flows, through the memory locations referenced via the memory variables in the tile memory information, to produce a result from the input using the CGRA.
    Type: Grant
    Filed: March 25, 2022
    Date of Patent: November 14, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Skyler Arron Windh, Allan Kennedy Porterfield, Douglas John Vanesko, Randall Paul Meyer, Patrick Alan Estep, Bashar Romanous
  • Patent number: 11789790
    Abstract: Devices and techniques for triggering early termination of cooperating processes in a processor are described herein. A system includes multiple memory-compute nodes, wherein a memory-compute node comprises: event manager circuitry configured to establish a broadcast channel to receive event messages; and thread manager circuitry configured to organize a plurality of threads to perform portions of a cooperative task, wherein the plurality of threads each monitor the broadcast channel to receive event messages on the broadcast channel, and wherein upon achieving a threshold operation, the thread manager circuitry is to use the event manager circuitry to broadcast, on the broadcast channel, an event message indicating that the cooperative task is complete, causing other threads, in response to receiving the event message, to terminate execution of their respective portions of the cooperative task.
    Type: Grant
    Filed: November 10, 2022
    Date of Patent: October 17, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Patrick Estep, Skyler Arron Windh, Tony M. Brewer
  • Patent number: 11782725
    Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. The tiles can be arranged in an array or grid and can be communicatively coupled. In an example, a first node can include a tile cluster of N memory-compute tiles, and the N memory-compute tiles can be coupled using a first portion of a synchronous compute fabric. Operations performed by the respective processing and storage elements of the N memory-compute tiles can be selectively enabled or disabled based on information in a mask field of data propagated through the first portion of the synchronous compute fabric.
    Type: Grant
    Filed: August 16, 2021
    Date of Patent: October 10, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Bryan Hornung, Skyler Arron Windh
  • Publication number: 20230315415
    Abstract: An assembly language program for a coarse grained reconfiguration array (CGRA), having dispatch interface information indicating operations to be performed via a dispatch interface of the CGRA to receive an input, memory interface information indicating operations to be performed via one or more memory interfaces of the CGRA, tile memory information indicating memory variables referring to memory locations to be implemented in tile memories of the CGRA, a flow description specifying one or more synchronous data flows, through the memory locations referenced via the memory variables in the tile memory information, to produce a result from the input using the CGRA.
    Type: Application
    Filed: March 25, 2022
    Publication date: October 5, 2023
    Inventors: Skyler Arron Windh, Allan Kennedy Porterfield, Douglas John Vanesko, Randall Paul Meyer, Patrick Alan Estep, Bashar Romanous
  • Publication number: 20230305842
    Abstract: Control a coarse grained reconfigurable array during execution of an assembly language program identifying data flows through memory locations represented by memory variables. For example, a lowering program can be configured to receive the assembly language program, a hardware profile of the coarse grained reconfigurable array, and an instruction execution schedule to generate a configuration usable to control the coarse grained reconfigurable array. The lowering program can identify tile memories used to implement the memory locations represented by the memory variables in the assembly language program, and trace the data flows specified in the assembly language program. Using timing of instruction execution identified in the schedule, the lowering program can determine timing and controls for the dispatch interface, memory interfaces, and internal connections within tiles of the coarse grained reconfigurable array during execution of the assembly language program.
    Type: Application
    Filed: March 25, 2022
    Publication date: September 28, 2023
    Inventors: Skyler Arron Windh, Douglas John Vanesko
  • Publication number: 20230306272
    Abstract: An artificial neural network is trained via reinforcement learning to receive first data representative of execution dependency conditions of instructions of a program, second data representative of a schedule of a first portion of the instructions of the program for execution in a device having a plurality of circuit units operable in parallel, and third data identifying a next instruction selected from a second portion of the instructions of the program remaining to be scheduled for execution in the device. The artificial neural network selects a placement of the next instruction in one of the circuit units from a plurality of possible placements of the next instruction in the device. Performance of placements of instructions being tested in search for a valid schedule for running the program in the device can be measured to generate samples to train the artificial neural network via reinforcement learning.
    Type: Application
    Filed: March 16, 2023
    Publication date: September 28, 2023
    Inventors: Andre Xian Ming Chang, Abhishek Chaurasia, Parth Khopkar, Bashar Romanous, Patrick Alan Estep, Skyler Arron Windh, Eugenio Culurciello, Sheik Dawood Beer Mohideen
  • Publication number: 20230305848
    Abstract: Schedule instructions of a program for execution on a coarse grained reconfigurable array having a plurality of tiles operable in parallel. The program identifies data flows through memory locations represented by memory variables and identifies instructions configured to transform data in the data flows. Based on a hardware profile identifying features of the coarse grained reconfigurable array, a scheduler is configured to generate a memory map. The memory map identifies, for each respective memory variable in the program, one of the tiles that contains a memory location represented by the respective memory variable. Based on the memory map reducing possible choices for a brute force search, the scheduler assigns the instructions to the tiles for execution, and determines timing of execution of the instructions in the tiles.
    Type: Application
    Filed: March 25, 2022
    Publication date: September 28, 2023
    Inventors: Allan Kennedy Porterfield, Skyler Arron Windh, Bashar Romanous
  • Patent number: 11720475
    Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, a state of the tiles that are executing the thread is saved and offloaded from the HTF to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows for the developer to step through the code and to examine values to find bugs.
    Type: Grant
    Filed: November 21, 2022
    Date of Patent: August 8, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
  • Patent number: 11698853
    Abstract: Latency in a node-based compute-near-memory system can be problematic. A solution to the problem can include or use a dedicated software-based cache at each node. The cache can be configured to store information received from each of the other nodes in the system. In an example, the cache can be populated during a breadth first search algorithm to store frontier information from each of the other nodes.
    Type: Grant
    Filed: June 29, 2021
    Date of Patent: July 11, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Skyler Arron Windh, Randall Meyer
  • Publication number: 20230214219
    Abstract: Disclosed in some examples, are systems, methods, devices, and machine readable mediums which use improved dynamic programming algorithms to pack conditional branch instructions. Conditional code branches may be modeled as directed acyclic graphs (DAGs) which have a topological ordering. These DAGs may be used to construct a dynamic programming table to find a partial mapping of one path onto the other path using dynamic programming algorithms.
    Type: Application
    Filed: March 13, 2023
    Publication date: July 6, 2023
    Inventors: Skyler Arron Windh, Gongyu Wang
  • Publication number: 20230079727
    Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, a state of the tiles that are executing the thread is saved and offloaded from the HTF to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows for the developer to step through the code and to examine values to find bugs.
    Type: Application
    Filed: November 21, 2022
    Publication date: March 16, 2023
    Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
  • Patent number: 11604650
    Abstract: Disclosed in some examples, are systems, methods, devices, and machine readable mediums which use improved dynamic programming algorithms to pack conditional branch instructions. Conditional code branches may be modeled as directed acyclic graphs (DAGs) which have a topological ordering. These DAGs may be used to construct a dynamic programming table to find a partial mapping of one path onto the other path using dynamic programming algorithms.
    Type: Grant
    Filed: August 11, 2021
    Date of Patent: March 14, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Skyler Arron Windh, Gongyu Wang
  • Publication number: 20230074452
    Abstract: Devices and techniques for triggering early termination of cooperating processes in a processor are described herein. A system includes multiple memory-compute nodes, wherein a memory-compute node comprises: event manager circuitry configured to establish a broadcast channel to receive event messages; and thread manager circuitry configured to organize a plurality of threads to perform portions of a cooperative task, wherein the plurality of threads each monitor the broadcast channel to receive event messages on the broadcast channel, and wherein upon achieving a threshold operation, the thread manager circuitry is to use the event manager circuitry to broadcast, on the broadcast channel, an event message indicating that the cooperative task is complete, causing other threads, in response to receiving the event message, to terminate execution of their respective portions of the cooperative task.
    Type: Application
    Filed: November 10, 2022
    Publication date: March 9, 2023
    Inventors: Patrick Estep, Skyler Arron Windh, Tony M. Brewer
  • Publication number: 20230056246
    Abstract: A first set of multiple coordinate data structure elements describing non-zero values of an input matrix may be loaded to a compute element. A first set of input vector values having input vector row numbers corresponding to input matrix column numbers of the first set of multiple coordinate data structure elements may also be loaded to the compute element. Multiple parallel processing lanes of the compute element may be used to update multiple partial accumulation values, where each partial accumulation value corresponds to an output vector row and one of the multiple parallel processing lanes. At least a portion of the partial accumulation values corresponding to the first input matrix row may be summed across at least a portion of the parallel processing lanes to generate a first output vector row value.
    Type: Application
    Filed: August 3, 2021
    Publication date: February 23, 2023
    Inventors: Skyler Arron Windh, Douglas Vanesko
  • Publication number: 20230058935
    Abstract: A hybrid threading processor (HTP) supports thread creation by executing an instruction that indicates an amount of storage space to reserve for return values. Before a thread is created, the indicated amount of space is reserved. The newly created child thread sends a return packet back to the parent thread when the child thread completes. The thread writes its return information into the reserved space and waits for the parent thread to execute a thread join instruction. The thread join instruction takes the returned information from the reserved space and transfers it to the parent thread's register state. The reserved space is released once the child thread is joined. Using a configurable amount of space for each child thread may allow for more child threads to be executed simultaneously.
    Type: Application
    Filed: August 18, 2021
    Publication date: February 23, 2023
    Inventors: Tony Brewer, Patrick Estep, Skyler Arron Windh