Patents Examined by Shawn Doman
-
Patent number: 12118397
Abstract: The present disclosure describes techniques for accelerating data processing by offloading thread computation. An application may be started based on creating and executing a process by a host, the process associated with a plurality of threads. Creating a plurality of computation threads on a storage device may be requested based on determining that the storage device represents a computational storage. The plurality of computation threads may be created based on preloading a plurality of libraries in the storage device. The plurality of libraries may comprise executable codes associated with the plurality of threads. Data processing associated with the plurality of threads may be offloaded to the storage device using the plurality of computation threads. Activities associated with the plurality of computation threads may be managed by the process.
Type: Grant
Filed: September 15, 2022
Date of Patent: October 15, 2024
Assignees: Lemon Inc., Beijing Youzhuju Network Technology Co. Ltd.
Inventors: Viacheslav Dubeyko, Jian Wang
-
Patent number: 12112141
Abstract: A method for performing a convolution operation includes storing, a convolution kernel in a first storage device, the convolution kernel having dimensions x by y; storing, in a second storage device, a first subset of element values of an input feature map having dimensions n by m; performing a first simultaneous multiplication, of each value of the first subset of element values of the input feature map with a first element value from among the x*y elements of the convolution kernel; for each remaining value of the x*y elements of the convolution kernel, performing, a simultaneous multiplication of the remaining value with a corresponding subset of element values of the input feature map; for each simultaneous multiplication, storing, result of the simultaneous multiplication in an accumulator; and outputting, the values of the accumulator as a first row of an output feature map.
Type: Grant
Filed: June 12, 2020
Date of Patent: October 8, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Ali Shafiee Ardestani, Joseph Hassoun
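The accumulation order described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `first_output_row`, the stride of 1, the absence of padding, and the unflipped kernel (cross-correlation, as is common in neural networks) are all assumptions made for illustration. Each kernel element is multiplied against a whole slice of one input row at once, and the products are summed into an accumulator that ends up holding the first output row.

```python
def first_output_row(feature_map, kernel):
    """Compute row 0 of a stride-1, unpadded convolution, one kernel element at a time.

    Hypothetical helper for illustration only; the patent does not define this function.
    """
    kernel_h = len(kernel)
    kernel_w = len(kernel[0])
    out_w = len(feature_map[0]) - kernel_w + 1
    accumulator = [0] * out_w

    for i in range(kernel_h):
        for j in range(kernel_w):
            weight = kernel[i][j]
            # The subset of input values that this kernel element touches
            # when producing the first output row.
            subset = feature_map[i][j:j + out_w]
            # Hardware would perform these multiplications simultaneously;
            # here they are expressed as one list comprehension.
            products = [weight * value for value in subset]
            accumulator = [acc + p for acc, p in zip(accumulator, products)]

    return accumulator


feature_map = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
print(first_output_row(feature_map, kernel))  # [6, 8]
```

Iterating over kernel elements in the outer loops keeps one weight fixed per step, which is what lets all multiplications for that weight be issued together.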
-
Patent number: 12112167
Abstract: Embodiments for gathering and scattering matrix data by row are disclosed. In an embodiment, a processor includes a storage matrix, a decoder, and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode and a first operand field to specify a set of irregularly spaced memory locations. The execution circuitry is to, in response to the decoded instruction, calculate a set of addresses corresponding to the set of irregularly spaced memory locations and transfer a set of rows of data between the storage and the set of irregularly spaced memory locations.
Type: Grant
Filed: June 27, 2020
Date of Patent: October 8, 2024
Assignee: Intel Corporation
Inventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Menachem Adelman, Evangelos Georganas, Mark J. Charney, Nikita A. Shustrov, Sara Baghsorkhi
-
Patent number: 12106099
Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an asynchronous packet network having a plurality of data transmission lines forming a data path transmitting operand data; a synchronous mesh communication network; a plurality of configurable circuits arranged in an array, each configurable circuit of the plurality of configurable circuits coupled to the asynchronous packet network and to the synchronous mesh communication network, each configurable circuit of the plurality of configurable circuits adapted to perform a plurality of computations; each configurable circuit of the plurality of configurable circuits comprising: a memory storing operand data; and an execution or write mask generator adapted to generate an execution mask or a write mask identifying valid bits or bytes transmitted on the data path or stored in the memory for a current or next computation.
Type: Grant
Filed: October 8, 2023
Date of Patent: October 1, 2024
Assignee: Micron Technology, Inc.
Inventor: Tony M. Brewer
-
Patent number: 12106111
Abstract: A prediction unit includes a first predictor that provides an output comprising a hashed fetch address of a current fetch block in response to an input. The first predictor input comprises a hashed fetch address of a previous fetch block that immediately precedes the current fetch block in program execution order. A second predictor provides an output comprising a fetch address of a next fetch block that immediately succeeds the current fetch block in program execution order in response to an input. The second predictor input comprises the hashed fetch address of the current fetch block output by the first predictor.
Type: Grant
Filed: August 2, 2022
Date of Patent: October 1, 2024
Assignee: Ventana Micro Systems Inc.
Inventors: John G. Favor, Michael N. Michael
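The chaining of the two predictors can be pictured with a small Python sketch. Everything here is a hypothetical software model rather than the patented circuit: the class `TwoStagePredictor`, the XOR-fold hash, the 12-bit hash width, and the dictionary tables are assumptions chosen only to show how the first predictor's output feeds the second.

```python
class TwoStagePredictor:
    """Toy model: previous-block hash -> current-block hash -> next fetch address."""

    def __init__(self, hash_bits=12):
        self.hash_bits = hash_bits
        self.first = {}   # hashed previous fetch address -> hashed current fetch address
        self.second = {}  # hashed current fetch address -> next fetch address

    def hash_address(self, address):
        # Assumed hash: fold the upper bits onto the lower bits with XOR.
        mask = (1 << self.hash_bits) - 1
        return (address ^ (address >> self.hash_bits)) & mask

    def train(self, previous, current, following):
        """Record one observed sequence of three consecutive fetch blocks."""
        prev_hash = self.hash_address(previous)
        cur_hash = self.hash_address(current)
        self.first[prev_hash] = cur_hash
        self.second[cur_hash] = following

    def predict_next(self, previous):
        """Predict the block after the current one, given only the previous block."""
        cur_hash = self.first.get(self.hash_address(previous))
        if cur_hash is None:
            return None
        return self.second.get(cur_hash)


predictor = TwoStagePredictor()
predictor.train(previous=0x1000, current=0x1040, following=0x2000)
print(hex(predictor.predict_next(0x1000)))  # 0x2000
```

In this model the second lookup never needs the current block's full fetch address, only the hashed value produced by the first lookup.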
-
Patent number: 12106107
Abstract: A memory device includes a memory having a memory bank, a processor in memory (PIM) circuit, and control logic. The PIM circuit includes instruction memory storing at least one instruction provided from a host. The PIM circuit is configured to process an operation using data provided by the host or data read from the memory bank and to store at least one instruction provided by the host. The control logic is configured to decode a command/address received from the host to generate a decoding result and to perform a control operation so that one of i) a memory operation on the memory bank is performed and ii) the PIM circuit performs a processing operation, based on the decoding result. A counting value of a program counter instructing a position of the instruction memory is controlled in response to the command/address instructing the processing operation be performed.
Type: Grant
Filed: March 31, 2023
Date of Patent: October 1, 2024
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Sukhan Lee, Shinhaeng Kang, Namsung Kim, Seongil O, Hak-Soo Yu
-
Patent number: 12045615
Abstract: A system, e.g., a system on a chip (SOC), may include one or more processors. A processor may execute an instruction synchronization barrier (ISB) instruction to enforce an ordering constraint on instructions. To execute the ISB instruction, the processor may determine whether contexts of the processor required for execution of instructions older than the ISB instruction are consumed for the older instructions. Responsive to determining that the contexts are consumed for the older instructions, the processor may initiate fetching of an instruction younger than the ISB instruction, without waiting for the older instructions to retire.
Type: Grant
Filed: September 16, 2022
Date of Patent: July 23, 2024
Assignee: Apple Inc.
Inventors: Deepankar Duggal, Kulin N Kothari, Mridul Agarwal, Chang Xu, Yanran Yang, Richard F Russo, Yuan C Chou, Douglas C Holman
-
Patent number: 12038868
Abstract: Devices and techniques for loading contexts in a coarse-grained reconfigurable array processor are described herein. A system or apparatus may include context load circuitry operable to load context for a coarse-grained reconfigurable array processor, where the context load circuitry is configured to: (a) receive a kernel identifier; (b) access a first registry to obtain a context mask base address; (c) determine a context mask address from the context mask base address and the kernel identifier; (d) access a second registry to obtain a context state base address; (e) determine a context state address from the context state base address and the kernel identifier; (f) use a context mask at the context mask address to determine corresponding active context state; and (g) load the corresponding active context state into the coarse-grained reconfigurable array processor.
Type: Grant
Filed: August 31, 2022
Date of Patent: July 16, 2024
Assignee: Micron Technology, Inc.
Inventors: Bryan Hornung, Douglas Vanesko, David Patrick
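Steps (a) through (g) of this abstract can be walked through in a short Python sketch. The abstract does not say how the addresses are derived from the base address and kernel identifier, so the `base + kernel_id * stride` arithmetic below is an assumption, as are the function name `load_active_context`, the registry keys, the stride defaults, and the dictionary standing in for memory.

```python
def load_active_context(kernel_id, mask_registry, state_registry, memory,
                        mask_stride=1, state_stride=8):
    """Return the context state slots that the kernel's context mask marks active.

    Hypothetical model: addresses are assumed to be base + kernel_id * stride.
    """
    # (b), (c): locate this kernel's context mask.
    mask_base = mask_registry["context_mask_base"]
    mask_address = mask_base + kernel_id * mask_stride

    # (d), (e): locate this kernel's context state block.
    state_base = state_registry["context_state_base"]
    state_address = state_base + kernel_id * state_stride

    # (f): each set bit in the mask selects one slot of context state.
    mask = memory[mask_address]
    active = {}
    for slot in range(state_stride):
        if (mask >> slot) & 1:
            # (g): only active slots are loaded.
            active[slot] = memory[state_address + slot]
    return active


memory = {102: 0b101}
memory.update({1016 + slot: slot * 10 for slot in range(8)})
loaded = load_active_context(
    kernel_id=2,
    mask_registry={"context_mask_base": 100},
    state_registry={"context_state_base": 1000},
    memory=memory,
)
print(loaded)  # {0: 0, 2: 20}
```

Consulting the mask first means inactive slots are never read, which is the apparent point of keeping the mask separate from the state.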
-
Patent number: 12026518
Abstract: An apparatus for parallel processing includes a memory and one or more processors, at least one of which operates a single instruction, multiple data (SIMD) model, and each of which are coupled to the memory. The processors are configured to process data samples associated with one or multiple chains or graphs of data processors, which chains or graphs describe processing steps to be executed repeatedly on data samples that are a subset of temporally ordered samples. The processors are additionally configured to dynamically schedule one or multiple sets of the samples associated with the one or multiple chains or graphs of data processors to reduce latency of processing of the data samples associated with a single chain or graph of data processors or different chains and graphs of data processors.
Type: Grant
Filed: September 13, 2022
Date of Patent: July 2, 2024
Assignee: BRAINGINES SA
Inventors: Markus Steinberger, Alexander Talashov, Aleksandrs Procopcuks, Vasilii Sumatokhin
-
Patent number: 12001847
Abstract: A processor may include an instruction pipeline that executes program instructions in-order according to a program order. During operation, the instruction pipeline may detect that data is missing for a first instruction. In response, the instruction pipeline may send a request to load the missing data for the first instruction. However, the instruction pipeline may not necessarily stall operation to wait for the missing data to be loaded. Instead, the instruction pipeline may continue executing instructions subsequent to the first instruction. During the continued execution, the instruction pipeline may detect that data is missing for a second instruction, and send a request to load the missing data for the second instruction. The instruction pipeline may continue such operation until it determines that a condition occurs that prevents the continued execution. When the condition occurs, the instruction pipeline may stop the continued execution, and then re-execute the first instruction.
Type: Grant
Filed: August 30, 2022
Date of Patent: June 4, 2024
Assignee: Apple Inc.
Inventors: Justin M Deinlein, Michael L Karm, Brett S Feero, David E Kroesche
-
Patent number: 11995443
Abstract: Reuse of branch information queue entries for multiple instances of predicted control instructions in captured loops in a processor, and related methods and computer-readable media. The processor establishes and updates a branch entry in a branch information queue (BIQ) circuit with branch information in response to a speculative prediction made for a predicted control instruction. The branch information is used for making and tracking flow path predictions for predicted control instructions as well as verifying such predictions against its resolution for possible misprediction recovery. The processor is configured to reuse the same branch entry in the BIQ circuit for each instance of the predicted control instruction. This conserves space in the BIQ circuit, which allows for a smaller sized BIQ circuit to be used thus conserving area and power consumption. The branch information for each instance of a predicted control instruction within a loop remains consistent.
Type: Grant
Filed: October 4, 2022
Date of Patent: May 28, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Daren Eugene Streett, Rami Mohammad Al Sheikh
-
Patent number: 11983141
Abstract: A system for executing an application on a pool of reconfigurable processors with first and second reconfigurable processors having first and second architectures that are different from each other is presented. The system comprises an archive of configuration files with first and second configuration files for executing the application on the first and second reconfigurable processors, respectively, and a host system that is operatively coupled to the first and second reconfigurable processors. The host system comprises a runtime processor that allocates reconfigurable processors for executing the application and an auto-discovery module that is configured to perform discovery of whether the reconfigurable processors include at least one of the first reconfigurable processors and whether the reconfigurable processors include at least one of the second reconfigurable processors.
Type: Grant
Filed: September 9, 2022
Date of Patent: May 14, 2024
Assignee: SambaNova Systems, Inc.
Inventors: Greg Dykema, Maran Wilson, Guoyao Feng, Kuan Zhou, Tianyu Sun, Taylor Lee, Kin Hing Leung, Arnav Goel, Conrad Turlik, Milad Sharif
-
Patent number: 11983538
Abstract: Techniques are disclosed relating to a processor load-store unit. In some embodiments, the load-store unit is configured to execute load/store instructions in parallel using first and second pipelines and first and second tag memory arrays. In tag write conflict situations, the load-store unit may arbitrate between the first and second pipelines to ensure the first and second tag memory array contents remain identical. In some embodiments, a data cache tag replay scheme is utilized. In some embodiments, executing load/store instructions in parallel with fills, probes, and store-updates, using separate but identical tag memory arrays, may advantageously improve performance.
Type: Grant
Filed: April 18, 2022
Date of Patent: May 14, 2024
Assignee: Cadence Design Systems, Inc.
Inventors: Robert T. Golla, Ajay A. Ingle
-
Patent number: 11977890
Abstract: Stateful microbranch instructions, including: generating, based on an instruction, a first one or more microinstructions including a stateful microbranch instruction, wherein the stateful microbranch instruction includes: an address of a next instruction after the instruction; a branch target address; one or more microcode attributes; and executing the first one or more microinstructions.
Type: Grant
Filed: December 30, 2021
Date of Patent: May 7, 2024
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Magiting M. Talisayon, Luca Schiano, Neil N. Marketkar, Yueh-Chuan Tzeng
-
Patent number: 11977895
Abstract: Examples described herein relate to a graphics processing unit (GPU) coupled to the memory device, the GPU configured to: execute an instruction thread; determine if a dual directional signal barrier is associated with the instruction thread; and based on clearance of the dual directional signal barrier for a particular signal barrier identifier and a mode of operation, indicate a clearance of the dual directional signal barrier for the mode of operation, wherein the dual directional signal barrier is to provide a single barrier to gate activity of one or more producers based on activity of one or more consumers or gate activity of one or more consumers based on activity of one or more producers.
Type: Grant
Filed: December 22, 2020
Date of Patent: May 7, 2024
Assignee: Intel Corporation
Inventors: Sabareesh Ganapathy, Fangwen Fu, Hong Jiang, James Valerio
-
Patent number: 11977896
Abstract: An apparatus, method and computer program, the apparatus comprising processing circuitry to execute instructions, issue circuitry to issue the instructions for execution by the processing circuitry, and candidate instruction storage circuitry to store a plurality of condition-dependent instructions, each specifying at least one condition. The issue circuitry is configured to issue a given condition-dependent instruction in response to a determination or a prediction of the at least one condition specified by the given condition-dependent instruction being met, and when the given condition-dependent instruction is a sequence-start instruction, the issue circuitry is responsive to the determination or prediction to issue a sequence of instructions comprising the sequence-start instruction and at least one subsequent instruction.
Type: Grant
Filed: September 12, 2022
Date of Patent: May 7, 2024
Assignee: Arm Limited
Inventors: Matthew James Walker, Mbou Eyole, Giacomo Gabrielli, Balaji Venu, Wei Wang
-
Patent number: 11966740
Abstract: A processor comprising: a register file comprising a group of operand registers for holding data values, each operand register being a fixed number of bits in length for holding a respective data value of that length; and processing logic comprising floating point logic for performing floating point operations on data values in the register file, the floating point logic is configured to process the fixed number of bits in the respective data value according to a floating point format comprising a set of mantissa bits and a set of exponent bits. The processing logic is operable to select between a plurality of different variants of the floating point format, at least some of the variants having a different size sets of mantissa bits and exponent bits relative to one another.
Type: Grant
Filed: August 10, 2021
Date of Patent: April 23, 2024
Assignee: Graphcore Limited
Inventors: Mrudula Gore, Alan Alexander
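The effect of reinterpreting the same fixed-width bit pattern under different exponent/mantissa splits can be shown with a minimal Python sketch. The function `decode_float`, the IEEE-style exponent bias, the subnormal handling, and the choice of 16-bit examples are assumptions for illustration; the abstract does not specify the encodings, and this sketch deliberately does not treat the all-ones exponent as infinity or NaN.

```python
def decode_float(bits, exponent_bits, mantissa_bits):
    """Decode a sign/exponent/mantissa bit pattern under a chosen field split.

    Assumes an IEEE-style bias of 2**(exponent_bits - 1) - 1 and subnormals
    when the exponent field is zero. The all-ones exponent is decoded as an
    ordinary value here (no infinity/NaN handling).
    """
    sign = (bits >> (exponent_bits + mantissa_bits)) & 1
    exponent = (bits >> mantissa_bits) & ((1 << exponent_bits) - 1)
    mantissa = bits & ((1 << mantissa_bits) - 1)

    bias = (1 << (exponent_bits - 1)) - 1
    fraction = mantissa / (1 << mantissa_bits)

    if exponent == 0:
        # Subnormal: no implicit leading one.
        magnitude = fraction * 2.0 ** (1 - bias)
    else:
        magnitude = (1.0 + fraction) * 2.0 ** (exponent - bias)
    return -magnitude if sign else magnitude


pattern = 0x3C00
print(decode_float(pattern, exponent_bits=5, mantissa_bits=10))  # 1.0
print(decode_float(pattern, exponent_bits=8, mantissa_bits=7))   # 0.0078125
```

The same 16 bits decode to 1.0 with a 5-bit exponent and to 2^-7 with an 8-bit exponent, which is the trade-off between range and precision that selectable variants expose.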
-
Patent number: 11934834
Abstract: Instruction scheduling in a processor using operation source parent tracking. A source parent is a producer instruction whose execution generates a produced value consumed by a consumer instruction. The processor is configured to track identifying operation source parent information for instructions processed in a pipeline and providing such operation source parent information to a scheduling circuit along with the associated consumer instruction. The scheduling circuit is configured to perform instruction scheduling using operation source parent tracking on received instruction(s) to be scheduled for execution. The processor is configured to compare sources and destinations for each of the instructions to be scheduled based on the operation source parent information to determine instructions ready for scheduling for execution.
Type: Grant
Filed: October 19, 2021
Date of Patent: March 19, 2024
Assignee: Ampere Computing LLC
Inventors: Sean Philip Mirkes, Jason Anthony Bessette
-
Patent number: 11934833
Abstract: A streaming engine employed in a digital signal processor specifies a fixed read only data stream. Once fetched the data stream is stored in two head registers for presentation to functional units in the fixed order. Data use by the functional unit is preferably controlled using the input operand fields of the corresponding instruction. A first read only operand coding supplies data from the first head register. A first read/advance operand coding supplies data from the first head register and also advances the stream to the next sequential data elements. Corresponding second read only operand coding and second read/advance operand coding operate similarly with the second head register. A third read only operand coding supplies double width data from both head registers.
Type: Grant
Filed: December 21, 2021
Date of Patent: March 19, 2024
Assignee: Texas Instruments Incorporated
Inventor: Joseph Zbiciak
-
Patent number: 11907718
Abstract: Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.
Type: Grant
Filed: August 18, 2021
Date of Patent: February 20, 2024
Assignee: Micron Technology, Inc.
Inventors: Douglas Vanesko, Bryan Hornung, Patrick Estep