Patents by Inventor Douglas Vanesko

Douglas Vanesko has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240111538
    Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that provide for more efficient coarse-grained reconfigurable array (CGRA) execution by assigning different initiation intervals to different processing elements (PEs) executing the same code base. The initiation intervals may be multiples of each other, and the PE with the lowest initiation interval may be used to execute instructions of the code that are to be executed more frequently than other instructions, which may be assigned to PEs with higher initiation intervals.
    Type: Application
    Filed: November 30, 2023
    Publication date: April 4, 2024
    Inventors: Douglas Vanesko, Tony M. Brewer
  • Publication number: 20240070112
    Abstract: Devices and techniques for loading contexts in a coarse-grained reconfigurable array processor are described herein. A system or apparatus may include context load circuitry operable to load context for a coarse-grained reconfigurable array processor, where the context load circuitry is configured to: (a) receive a kernel identifier; (b) access a first registry to obtain a context mask base address; (c) determine a context mask address from the context mask base address and the kernel identifier; (d) access a second registry to obtain a context state base address; (e) determine a context state address from the context state base address and the kernel identifier; (f) use a context mask at the context mask address to determine corresponding active context state; and (g) load the corresponding active context state into the coarse-grained reconfigurable array processor.
    Type: Application
    Filed: August 31, 2022
    Publication date: February 29, 2024
    Inventors: Bryan Hornung, Douglas Vanesko, David Patrick
  • Patent number: 11907718
    Abstract: Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.
    Type: Grant
    Filed: August 18, 2021
    Date of Patent: February 20, 2024
    Assignee: Micron Technology, Inc.
    Inventors: Douglas Vanesko, Bryan Hornung, Patrick Estep
  • Patent number: 11861366
    Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that provide for more efficient coarse-grained reconfigurable array (CGRA) execution by assigning different initiation intervals to different processing elements (PEs) executing the same code base. The initiation intervals may be multiples of each other, and the PE with the lowest initiation interval may be used to execute instructions of the code that are to be executed more frequently than other instructions, which may be assigned to PEs with higher initiation intervals.
    Type: Grant
    Filed: August 11, 2021
    Date of Patent: January 2, 2024
    Assignee: Micron Technology, Inc.
    Inventors: Douglas Vanesko, Tony M. Brewer
  • Patent number: 11802957
    Abstract: A synthetic-aperture radar (SAR) antenna emits radar pulses and receives their reflections. SAR is typically used on a moving platform, such as an aircraft, drone, or spacecraft. Because the position of the antenna changes between the time a radar pulse is emitted and the time its reflection is received, the synthetic aperture of the radar is increased, giving greater accuracy than a conventional beam-scanning radar of the same physical size. The pulse data is processed, using a backprojection algorithm, to generate a two-dimensional image that can be used for navigation. The order in which the SAR data is processed can affect the likelihood of cache hits when accessing the data. Because accessing data from cache instead of memory storage reduces both access time and power consumption, devices that access more data from cache have greater battery life and range.
    Type: Grant
    Filed: April 26, 2021
    Date of Patent: October 31, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Patrick Estep, Tony M. Brewer, Bryan Hornung, Douglas Vanesko
  • Patent number: 11789642
    Abstract: A dispatch element interfaces with a host processor and dispatches threads to one or more tiles of a hybrid threading fabric. Data structures in memory to be used by a tile may be identified by a starting address and a size, included as parameters provided by the host. The dispatch element sends a command to a memory interface to transfer the identified data to the tile that will use the data. Thus, when the tile begins processing the thread, the data is already available in local memory of the tile and does not need to be accessed from the memory controller. Data may be transferred by the dispatch element while the tile is performing operations for another thread, increasing the percentage of the tile's operations that perform useful work and reducing the percentage that merely retrieve data.
    Type: Grant
    Filed: June 28, 2021
    Date of Patent: October 17, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Douglas Vanesko, Bryan Hornung, Tony M. Brewer
  • Patent number: 11709796
    Abstract: Various examples are directed to systems and methods in which a first flow controller of a first synchronous flow may receive an instruction to execute a first loop using the first synchronous flow. The first flow controller may determine a first iteration index for a first iteration of the first loop. The first flow controller may send, to a first compute element of the first synchronous flow, a first synchronous message to initiate a first synchronous flow thread for executing the first iteration of the first loop. The first synchronous message may comprise the first iteration index. The first compute element may execute an input/output operation at a first location of a first compute element memory indicated by the first iteration index.
    Type: Grant
    Filed: August 16, 2021
    Date of Patent: July 25, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Bryan Hornung, Douglas Vanesko
  • Patent number: 11675588
    Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. A first tile in a first node can include a processor with a processor output and a first register network configured to receive information from the processor output and information from one or more of the multiple other tiles in the first node. In response to an output instruction and a delay instruction, the register network can provide an output signal to one of the multiple other tiles in the first node. Based on the output instruction, the output signal can include one or the other of the information from the processor output and the information from one or more of the multiple other tiles in the first node. A timing characteristic of the output signal can depend on the delay instruction.
    Type: Grant
    Filed: August 20, 2021
    Date of Patent: June 13, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Douglas Vanesko, Tony M. Brewer, Gongyu Wang
  • Publication number: 20230067771
    Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. A first tile in a first node can include a processor with a processor output and a first register network configured to receive information from the processor output and information from one or more of the multiple other tiles in the first node. In response to an output instruction and a delay instruction, the register network can provide an output signal to one of the multiple other tiles in the first node. Based on the output instruction, the output signal can include one or the other of the information from the processor output and the information from one or more of the multiple other tiles in the first node. A timing characteristic of the output signal can depend on the delay instruction.
    Type: Application
    Filed: August 20, 2021
    Publication date: March 2, 2023
    Inventors: Douglas Vanesko, Tony M. Brewer, Gongyu Wang
  • Publication number: 20230055320
    Abstract: Various examples are directed to systems and methods for performing operations in a reconfigurable compute fabric. A dispatch interface may send a first asynchronous message to a first flow controller of a first synchronous flow. The first asynchronous message may instruct the first flow controller to begin execution of a first-level loop. The first synchronous flow may send a second asynchronous message to a second flow controller of a second synchronous flow. The second asynchronous message may instruct the second flow controller to execute a second-level loop. The first flow controller may receive a third asynchronous message indicating that the second-level loop has completed and that a synchronous flow thread is free for executing a next iteration of the first-level loop.
    Type: Application
    Filed: August 16, 2021
    Publication date: February 23, 2023
    Inventors: Douglas Vanesko, Bryan Hornung
  • Publication number: 20230056246
    Abstract: A first set of multiple coordinate data structure elements describing non-zero values of an input matrix may be loaded to a compute element. A first set of input vector values having input vector row numbers corresponding to input matrix column numbers of the first set of multiple coordinate data structure elements may also be loaded to the compute element. Multiple parallel processing lanes of the compute element may be used to update multiple partial accumulation values, where each partial accumulation value corresponds to an output vector row and one of the multiple parallel processing lanes. At least a portion of the partial accumulation values corresponding to the first input matrix row may be summed across at least a portion of the parallel processing lanes to generate a first output vector row value.
    Type: Application
    Filed: August 3, 2021
    Publication date: February 23, 2023
    Inventors: Skyler Arron Windh, Douglas Vanesko
  • Publication number: 20230050687
    Abstract: Various examples are directed to systems and methods in which a first flow controller of a first synchronous flow may receive an instruction to execute a first loop using the first synchronous flow. The first flow controller may determine a first iteration index for a first iteration of the first loop. The first flow controller may send, to a first compute element of the first synchronous flow, a first synchronous message to initiate a first synchronous flow thread for executing the first iteration of the first loop. The first synchronous message may comprise the first iteration index. The first compute element may execute an input/output operation at a first location of a first compute element memory indicated by the first iteration index.
    Type: Application
    Filed: August 16, 2021
    Publication date: February 16, 2023
    Inventors: Bryan Hornung, Douglas Vanesko
  • Publication number: 20230051544
    Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that provide for more efficient coarse-grained reconfigurable array (CGRA) execution by assigning different initiation intervals to different processing elements (PEs) executing the same code base. The initiation intervals may be multiples of each other, and the PE with the lowest initiation interval may be used to execute instructions of the code that are to be executed more frequently than other instructions, which may be assigned to PEs with higher initiation intervals.
    Type: Application
    Filed: August 11, 2021
    Publication date: February 16, 2023
    Inventors: Douglas Vanesko, Tony M. Brewer
  • Publication number: 20220413742
    Abstract: A dispatch element interfaces with a host processor and dispatches threads to one or more tiles of a hybrid threading fabric. Data structures in memory to be used by a tile may be identified by a starting address and a size, included as parameters provided by the host. The dispatch element sends a command to a memory interface to transfer the identified data to the tile that will use the data. Thus, when the tile begins processing the thread, the data is already available in local memory of the tile and does not need to be accessed from the memory controller. Data may be transferred by the dispatch element while the tile is performing operations for another thread, increasing the percentage of the tile's operations that perform useful work and reducing the percentage that merely retrieve data.
    Type: Application
    Filed: June 28, 2021
    Publication date: December 29, 2022
    Inventors: Douglas Vanesko, Bryan Hornung, Tony M. Brewer
  • Publication number: 20220413804
    Abstract: Two commands each perform a partial complex multiply and accumulate. By using these two commands together, a full complex multiply and accumulate operation is performed. As compared to traditional implementations, this reduces the number of commands used from eight (four multiplies, a subtraction and three adds) to two. In some example embodiments, a single-instruction/multiple-data (SIMD) architecture is used to enable each command to perform multiple partial complex multiply and accumulate operations simultaneously, further increasing efficiency. One application of a complex multiply and accumulate is in generating images from pulse data of a radar or lidar. For example, an image may be generated from a synthetic aperture radar (SAR) on an autonomous vehicle (e.g., a drone). The image may be provided to a trained machine learning model that generates an output. Based on the output, inputs to control circuits of the autonomous vehicle are generated.
    Type: Application
    Filed: June 28, 2021
    Publication date: December 29, 2022
    Inventors: Douglas Vanesko, Bryan Hornung
  • Publication number: 20220317972
    Abstract: Devices and techniques for hardware for concurrent sine and cosine determination are described herein. A first sequence of bits representing an angle of a line from an origin to a unit circle can be obtained. A quadrant of the unit circle for the line is determined, and the two least significant bits of the first sequence of bits are replaced with an encoding for the quadrant. The angle is translated to a base quadrant angle, and sine and cosine operations are performed on a portion of a second sequence of bits (derived from the first sequence of bits) to create intermediate sine and cosine solutions in the base quadrant. The quadrant encoding in the first sequence of bits is then used to create the final sine and cosine solutions in the actual quadrant from the intermediate solutions.
    Type: Application
    Filed: August 18, 2021
    Publication date: October 6, 2022
    Inventors: Douglas Vanesko, Tony M. Brewer, Bryan Hornung, Patrick Estep
  • Publication number: 20220317283
    Abstract: A synthetic-aperture radar (SAR) antenna emits radar pulses and receives their reflections. SAR is typically used on a moving platform, such as an aircraft, drone, or spacecraft. Because the position of the antenna changes between the time a radar pulse is emitted and the time its reflection is received, the synthetic aperture of the radar is increased, giving greater accuracy than a conventional beam-scanning radar of the same physical size. The pulse data is processed, using a backprojection algorithm, to generate a two-dimensional image that can be used for navigation. The order in which the SAR data is processed can affect the likelihood of cache hits when accessing the data. Because accessing data from cache instead of memory storage reduces both access time and power consumption, devices that access more data from cache have greater battery life and range.
    Type: Application
    Filed: April 26, 2021
    Publication date: October 6, 2022
    Inventors: Patrick Estep, Tony M. Brewer, Bryan Hornung, Douglas Vanesko
  • Publication number: 20220318162
    Abstract: Linear interpolation is performed within a memory system. The memory system receives a floating-point index into an integer-indexed memory array. The memory system accesses the values at the two adjacent integer indices, performs the linear interpolation, and provides the resulting interpolated value. In many system architectures, the critical limitation on system performance is the data transfer rate between memory and processing elements. Accordingly, reducing the amount of data transferred improves overall system performance and reduces power consumption.
    Type: Application
    Filed: April 26, 2021
    Publication date: October 6, 2022
    Inventors: Bryan Hornung, Tony M. Brewer, Douglas Vanesko, Patrick Estep
  • Publication number: 20220206804
    Abstract: Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.
    Type: Application
    Filed: August 18, 2021
    Publication date: June 30, 2022
    Inventors: Douglas Vanesko, Bryan Hornung, Patrick Estep
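The in-memory linear interpolation described in publication 20220318162 can be illustrated with a small software model. This is a minimal sketch, assuming a simple array-backed memory; the class InterpolatingMemory, its read_interpolated method, and the values_transferred counter are hypothetical names, not the patented hardware interface. The point it shows is that the two adjacent values are combined on the memory side, so only one result is returned to the requester.

```python
import math


class InterpolatingMemory:
    """Hypothetical model of a memory that performs linear interpolation itself."""

    def __init__(self, values):
        self._values = list(values)
        self.values_transferred = 0  # values returned to the requester so far

    def read_interpolated(self, index):
        """Return the value at a floating-point index into the integer-indexed array."""
        if not 0.0 <= index <= len(self._values) - 1:
            raise IndexError("index outside the array")
        lo = math.floor(index)                   # lower adjacent integer index
        hi = min(lo + 1, len(self._values) - 1)  # upper adjacent integer index
        frac = index - lo                        # fractional distance past lo
        result = self._values[lo] + (self._values[hi] - self._values[lo]) * frac
        self.values_transferred += 1             # one value crosses the interface, not two
        return result


mem = InterpolatingMemory([0.0, 10.0, 20.0])
print(mem.read_interpolated(1.25))  # 12.5
```

Reading two raw values and interpolating in the processor would move twice as much data per lookup; the model only counts transfers to make that difference visible.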
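Publication 20220413804 splits a complex multiply and accumulate into two commands, each performing a partial operation. The abstract does not say how the terms are divided, so the split below is an assumption: one hypothetical command accumulates the two products that use the real part of a, and the other accumulates the two products that use the imaginary part of a. Applied back to back, they compute acc + a * b with the usual identity (ar*br - ai*bi) + j(ar*bi + ai*br). Complex numbers are plain (real, imag) tuples, and the function names are illustrative only.

```python
def cmac_real_a_terms(acc, a, b):
    """Hypothetical first command: add the two products that use Re(a)."""
    acc_re, acc_im = acc
    a_re, _ = a
    b_re, b_im = b
    return (acc_re + a_re * b_re, acc_im + a_re * b_im)


def cmac_imag_a_terms(acc, a, b):
    """Hypothetical second command: add the two products that use Im(a)."""
    acc_re, acc_im = acc
    _, a_im = a
    b_re, b_im = b
    # j * j = -1, so Im(a) * Im(b) is subtracted from the real part
    return (acc_re - a_im * b_im, acc_im + a_im * b_re)


def complex_dot(xs, ys):
    """Accumulate sum(x * y) using only the two partial commands."""
    acc = (0.0, 0.0)
    for a, b in zip(xs, ys):
        acc = cmac_imag_a_terms(cmac_real_a_terms(acc, a, b), a, b)
    return acc


print(complex_dot([(1, 2), (3, -1)], [(3, 4), (0, 2)]))  # (-3.0, 16.0)
```

In a SIMD version, as the abstract suggests, each command would apply the same partial update to several lanes at once; the scalar loop here only shows the arithmetic.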
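The quadrant reduction behind publication 20220317972 (compute sine and cosine once in a base quadrant, then use the quadrant to produce the final results) can be sketched in software. This is a generic sketch, not the claimed bit layout: it assumes a hypothetical 16-bit fixed-point angle in which 2**16 units equal one full turn and takes the quadrant from the two most significant bits, whereas the abstract describes writing a quadrant encoding into the two least significant bits. math.sin and math.cos stand in for whatever base-quadrant hardware is used.

```python
import math

ANGLE_BITS = 16  # hypothetical fixed-point width: 2**16 units == one full turn


def sin_cos(angle_code):
    """Return (sine, cosine) of a fixed-point angle via base-quadrant evaluation."""
    full_turn = 1 << ANGLE_BITS
    quarter = full_turn >> 2
    angle_code %= full_turn
    quadrant = angle_code >> (ANGLE_BITS - 2)  # top two bits select the quadrant
    base = angle_code & (quarter - 1)          # angle within the base quadrant
    theta = base * (2 * math.pi / full_turn)
    s, c = math.sin(theta), math.cos(theta)    # intermediate base-quadrant results
    # Map into the actual quadrant: sin(t + 90deg) = cos t, cos(t + 90deg) = -sin t, etc.
    return [(s, c), (c, -s), (-s, -c), (-c, s)][quadrant]


print(sin_cos(1 << 14))  # a quarter turn: sine 1.0, cosine -0.0
```

Only one quarter of the circle has to be supported by the underlying sine/cosine evaluator; the other three quadrants reuse its results with sign changes and a swap.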
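The sparse matrix-vector scheme in publication 20230056246 (coordinate-format non-zeros loaded in sets, per-lane partial accumulations for each output row, then a sum across lanes) can be modeled as follows. This is a minimal sketch, assuming the non-zeros are given as (row, column, value) tuples and that each set holds one element per lane; the function name and the lanes parameter are hypothetical, and the inner loop runs sequentially where hardware lanes would run in parallel.

```python
def coo_spmv(elements, x, num_rows, lanes=4):
    """Compute y = A @ x from COO non-zeros using per-lane partial sums."""
    # partial[row][lane] is the partial accumulation for one output row in one lane
    partial = [[0.0] * lanes for _ in range(num_rows)]
    for start in range(0, len(elements), lanes):
        batch = elements[start:start + lanes]  # one set loaded to the compute element
        for lane, (row, col, value) in enumerate(batch):
            # input vector row number matches the matrix column number
            partial[row][lane] += value * x[col]
    # Reduce across lanes to produce each output vector row value
    return [sum(lane_sums) for lane_sums in partial]


# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] stored as coordinate-format non-zeros
nonzeros = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
print(coo_spmv(nonzeros, [1.0, 2.0, 3.0], num_rows=3))  # [7.0, 6.0, 19.0]
```

Keeping a separate partial sum per lane avoids two lanes updating the same accumulator in the same step; the cross-lane sum is deferred until a row's contributions are complete.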
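Steps (a) through (g) of publication 20240070112 amount to two base-plus-offset address calculations followed by a mask-driven selective load. The sketch below models them in software under loudly stated assumptions: the registries and memory are dictionaries, each kernel's mask and state block sit at a fixed hypothetical stride from their base addresses, and each mask bit enables one fixed-size context state entry. None of the constants or key names come from the patent.

```python
MASK_STRIDE = 8     # hypothetical bytes reserved per kernel's context mask
STATE_STRIDE = 256  # hypothetical bytes reserved per kernel's context state block
WORD = 8            # hypothetical size of one context state entry


def load_context(kernel_id, mask_registry, state_registry, memory):
    """Return {slot: state} for the context entries enabled by the kernel's mask."""
    # (b)-(c): context mask address from the mask base address and kernel identifier
    mask_addr = mask_registry["context_mask_base"] + kernel_id * MASK_STRIDE
    # (d)-(e): context state address from the state base address and kernel identifier
    state_addr = state_registry["context_state_base"] + kernel_id * STATE_STRIDE
    mask = memory[mask_addr]
    loaded = {}
    # (f)-(g): only entries whose mask bit is set are active and get loaded
    for slot in range(STATE_STRIDE // WORD):
        if (mask >> slot) & 1:
            loaded[slot] = memory[state_addr + slot * WORD]
    return loaded


memory = {116: 0b101, 1512: "a", 1520: "b", 1528: "c"}
print(load_context(2, {"context_mask_base": 100}, {"context_state_base": 1000}, memory))
# {0: 'a', 2: 'c'}
```

The mask lets a kernel that uses only part of the array skip loading unused state, which is the practical benefit of consulting it before the load.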