Patents Examined by Jacob Petranek
  • Patent number: 12141584
    Abstract: Disclosed herein are embodiments related to a power efficient multi-bit storage system. In one configuration, the multi-bit storage system includes a first storage circuit, a second storage circuit, a prediction circuit, and a clock gating circuit. In one aspect, the first storage circuit updates a first output bit according to a first input bit, in response to a trigger signal, and the second storage circuit updates a second output bit according to a second input bit, in response to the trigger signal. In one aspect, the prediction circuit generates a trigger enable signal indicating whether at least one of the first output bit or the second output bit is predicted to change a state. In one aspect, the clock gating circuit generates the trigger signal based on the trigger enable signal.
    Type: Grant
    Filed: July 7, 2022
    Date of Patent: November 12, 2024
    Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY LIMITED
    Inventors: Kai-Chi Huang, Chi-Lin Liu, Wei-Hsiang Ma, Shang-Chih Hsieh
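    Illustrative sketch: a minimal Python behavioral model of the idea, not the patented circuit. The GatedRegister class and its two-bit width are invented here; the "prediction" is simply input XOR current output, and the trigger only fires when some bit would actually change.

      class GatedRegister:
          def __init__(self):
              self.q = [0, 0]  # first and second output bits

          def predict_enable(self, d):
              # Trigger enable is asserted iff at least one output bit
              # is predicted to change state (input differs from output).
              return any(di != qi for di, qi in zip(d, self.q))

          def clock(self, d, clk):
              # Clock gating: the trigger fires only when the clock is high
              # AND the prediction circuit asserts the enable, so the
              # storage circuits never toggle uselessly.
              trigger = clk and self.predict_enable(d)
              if trigger:
                  self.q = list(d)
              return trigger

      reg = GatedRegister()
      print(reg.clock([1, 0], clk=1), reg.q)  # True [1, 0]: bits change, clock passes
      print(reg.clock([1, 0], clk=1), reg.q)  # False [1, 0]: no change, clock gated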
  • Patent number: 12135679
    Abstract: In an embodiment, a system on chip includes at least one master device, at least one slave device, a connection interface configured to route signals between the at least one master device and the at least one slave device, the connection interface configured to operate according to configuration parameters, and a configuration bus connected to the connection interface, wherein the configuration bus is configured to deliver new configuration parameters to the connection interface so as to adapt operation of the connection interface.
    Type: Grant
    Filed: June 8, 2022
    Date of Patent: November 5, 2024
    Assignee: STMicroelectronics S.r.l.
    Inventors: Antonino Mondello, Salvatore Pisasale
  • Patent number: 12130744
    Abstract: A multi-core processor configured to improve processing performance in certain computing contexts is provided. The multi-core processor includes multiple processing cores that implement barrel threading to execute multiple instruction threads in parallel while ensuring that the effects of an idle instruction or thread upon the performance of the processor are minimized. The multiple cores can also share a common data cache, thereby minimizing the need for expensive and complex mechanisms to mitigate inter-cache coherency issues. Barrel threading can minimize the latency impacts associated with a shared data cache. In some examples, the multi-core processor can also include a serial processor configured to execute single-threaded programming code that may not yield satisfactory performance in a processing environment that employs barrel threading.
    Type: Grant
    Filed: February 15, 2022
    Date of Patent: October 29, 2024
    Assignee: Mobileye Vision Technologies Ltd.
    Inventors: Yosef Kreinin, Yosi Arbeli, Gil Israel Dogon
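    Illustrative sketch: a toy Python model of barrel threading, not Mobileye's design. The barrel_schedule function and the None-as-stall convention are assumptions; each cycle the next thread slot in round-robin order issues one instruction, so an idle thread costs at most its own slot.

      def barrel_schedule(threads, cycles):
          """threads: iterators yielding instructions, or None for a stalled cycle."""
          n = len(threads)
          issued = []
          for slot in range(cycles):
              instr = next(threads[slot % n], None)
              if instr is not None:
                  issued.append((slot % n, instr))
              # The slot advances regardless, so a stalled or exhausted
              # thread never blocks the other threads' issue slots.
          return issued

      t0 = iter(["add", "mul", "ld"])
      t1 = iter(["sub", None, "st"])  # None models one stalled cycle
      print(barrel_schedule([t0, t1], cycles=6))
      # [(0, 'add'), (1, 'sub'), (0, 'mul'), (0, 'ld'), (1, 'st')]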
  • Patent number: 12130915
    Abstract: Systems, methods, and apparatuses relating to microarchitectural mechanisms for the prevention of side-channel attacks are disclosed herein. In one embodiment, a processor core includes an instruction fetch circuit to fetch instructions; a branch target buffer comprising a plurality of entries that each include a thread identification (TID) and a privilege level bit; and a branch predictor, coupled to the instruction fetch circuit and the branch target buffer, to predict a target instruction corresponding to a branch instruction based on at least one entry of the plurality of entries in the branch target buffer, and cause the target instruction to be fetched by the instruction fetch circuit.
    Type: Grant
    Filed: February 1, 2022
    Date of Patent: October 29, 2024
    Assignee: Intel Corporation
    Inventors: Robert S. Chappell, Jared W. Stark, IV, Joseph Nuzman, Stephen Robinson, Jason W. Brandt
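    Illustrative sketch: a Python toy of the tagging idea, not Intel's microarchitecture. The BTB class and its fields are hypothetical; the point is that a prediction is only returned when the requesting thread ID and privilege level match the context that trained the entry.

      class BTB:
          def __init__(self):
              self.entries = {}  # branch PC -> (tid, priv, predicted target)

          def train(self, pc, tid, priv, target):
              self.entries[pc] = (tid, priv, target)

          def predict(self, pc, tid, priv):
              entry = self.entries.get(pc)
              if entry is None:
                  return None
              e_tid, e_priv, target = entry
              # Side-channel hardening: ignore entries installed by a
              # different hardware thread or at a different privilege level.
              if e_tid != tid or e_priv != priv:
                  return None
              return target

      btb = BTB()
      btb.train(pc=0x400, tid=0, priv=0, target=0x480)
      print(btb.predict(0x400, tid=0, priv=0))  # 1152 (0x480): same context
      print(btb.predict(0x400, tid=1, priv=0))  # None: cross-thread use blocked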
  • Patent number: 12111789
    Abstract: The present disclosure is directed to a distributed graphics processing unit (GPU) architecture that includes an array of processing nodes. Each processing node may include a GPU node that is coupled to its own fast memory unit and its own storage unit. The fast memory unit and storage unit may be integrated into a single unit or may be separately coupled to the GPU node. The processing node may have its fast memory unit coupled to both the GPU node and the storage unit. The various architectures provide a GPU-based system that may be treated as a storage unit, such as a solid state drive (SSD), that performs onboard processing to carry out memory-oriented operations. In this respect, the system may be viewed as a “smart drive” for big-data near-storage processing.
    Type: Grant
    Filed: April 22, 2020
    Date of Patent: October 8, 2024
    Assignee: Micron Technology, Inc.
    Inventor: Dmitri Yudanov
  • Patent number: 12112205
    Abstract: Data format conversion processing of an accelerator accessed by a processor of a computing environment is reduced. The processor and accelerator use different data formats, and the accelerator is configured to perform an input conversion to convert data from a processor data format to an accelerator data format prior to performing an operation using the data, and an output conversion to convert resultant data from accelerator data format back to processor data format after performing the operation. The reducing includes determining that adjoining operations of a process to run on the processor and accelerator are to be performed by the accelerator, where the adjoining operations include a source operation and a destination operation. Further, the reducing includes blocking the output data format conversion of the source operation and the input data format conversion for the destination operation.
    Type: Grant
    Filed: July 11, 2023
    Date of Patent: October 8, 2024
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Qi Liang, Yi Xuan Zhang, Gui Yu Jiang
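    Illustrative sketch: a hypothetical scheduling pass in Python, not IBM's implementation. The plan_conversions function and the 'cpu'/'acc' device tags are invented; conversions are emitted only at boundaries where the producing and consuming devices differ, which elides the convert-out/convert-in pair between adjoining accelerator operations.

      def plan_conversions(ops):
          """ops: list of (name, device) pairs, device in {'cpu', 'acc'}.
          Returns the operation list with explicit format conversions."""
          plan = []
          for i, (name, dev) in enumerate(ops):
              prev_dev = ops[i - 1][1] if i > 0 else "cpu"
              next_dev = ops[i + 1][1] if i + 1 < len(ops) else "cpu"
              if dev == "acc" and prev_dev != "acc":
                  plan.append("convert_in")   # processor -> accelerator format
              plan.append(name)
              if dev == "acc" and next_dev != "acc":
                  plan.append("convert_out")  # accelerator -> processor format
          return plan

      # matmul feeds gelu on the accelerator: no conversion between them.
      print(plan_conversions([("matmul", "acc"), ("gelu", "acc"), ("sum", "cpu")]))
      # ['convert_in', 'matmul', 'gelu', 'convert_out', 'sum']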
  • Patent number: 12112399
    Abstract: A lens distortion correction function operates by backmapping output images to the uncorrected, distorted input images. As a vision image processor completes processing on the image data lines needed for the lens distortion correction function to operate on a group of output, undistorted image lines, the lens distortion correction function begins processing the image data. Overlapping the operations in this way reduces image processing pipeline delays. The vision image processor provides output image data to a circular buffer in SRAM, rather than providing it to DRAM. The lens distortion correction function operates from the image data in the circular buffer. By operating from the SRAM circular buffer, access to the DRAM for the highly fragmented backmapping image data read operations is removed, improving available DRAM bandwidth. By using a circular buffer, less space is needed in the SRAM. The improved memory operations further reduce image processing pipeline delays.
    Type: Grant
    Filed: November 8, 2021
    Date of Patent: October 8, 2024
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Niraj Nandan, Rajasekhar Reddy Allu, Mihir Narendra Mody
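    Illustrative sketch: a simplified Python line buffer, not TI's hardware; the CircularLineBuffer class is hypothetical and the backmapping math is omitted. It shows the SRAM-saving idea: the consumer can read any recent line while the producer overwrites lines older than the buffer depth.

      class CircularLineBuffer:
          def __init__(self, num_lines):
              self.lines = [None] * num_lines  # small SRAM-resident window
              self.newest = -1                 # most recently produced line

          def push(self, line_no, data):
              self.lines[line_no % len(self.lines)] = data
              self.newest = line_no

          def get(self, line_no):
              # Valid only while the line has not been overwritten by a
              # newer one; the backmapper must stay within this window.
              assert self.newest - line_no < len(self.lines), "line evicted"
              return self.lines[line_no % len(self.lines)]

      buf = CircularLineBuffer(num_lines=4)
      for n in range(6):             # producer streams lines 0..5
          buf.push(n, f"line{n}")
      print(buf.get(3), buf.get(5))  # line3 line5: both still resident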
  • Patent number: 12106101
    Abstract: Techniques are disclosed for a vector processor architecture that enables data interpolation in accordance with multiple dimensions, such as one-, two-, and three-dimensional linear interpolation. The vector processor architecture includes a vector processor and accompanying vector addressable memory that enable a simultaneous retrieval of multiple entries in the vector addressable memory to facilitate linear interpolation calculations. The vector processor architecture vastly increases the speed at which such calculations may occur compared to conventional processing architectures. Example implementations include the calculation of digital pre-distortion (DPD) coefficients for use with radio frequency (RF) transmitter chains to support multi-band applications.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: October 1, 2024
    Assignee: Intel Corporation
    Inventors: Kameran Azadet, Joseph Williams, Zoran Zivkovic
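    Illustrative sketch: the one-dimensional case in Python with NumPy, not Intel's vector processor. The per-lane gathers of table[i] and table[i + 1] stand in for the simultaneous multi-entry retrieval that the vector addressable memory provides.

      import numpy as np

      table = np.array([0.0, 10.0, 20.0, 30.0])  # LUT sampled at x = 0, 1, 2, 3
      x = np.array([0.5, 1.25, 2.75])            # query points, one per lane

      i = np.floor(x).astype(int)                # base index per lane
      frac = x - i                               # fractional position per lane
      # Each lane needs two neighbouring entries at once; a vector
      # addressable memory returns both in a single access.
      y = table[i] * (1 - frac) + table[i + 1] * frac
      print(y)                                   # [ 5.  12.5 27.5]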
  • Patent number: 12099846
    Abstract: A data processing apparatus comprises receiver circuitry for receiving instructions from each of a plurality of requester devices. Processing circuitry executes the instructions associated with each of a subset of the requester devices at a time and arbitration circuitry determines the subset of the requester devices and causes the instructions associated with each of the subset of the requester devices to be executed next. In response to the receiver circuitry receiving an instruction of a predetermined type from one of the requester devices outside the subset of requester devices, the arbitration circuitry causes the instruction of the predetermined type to be executed next.
    Type: Grant
    Filed: August 9, 2021
    Date of Patent: September 24, 2024
    Assignee: Arm Limited
    Inventors: Frederic Claude Marie Piry, Cédric Denis Robert Airaud, Natalya Bondarenko, Luca Maroncelli, Geoffray Matthieu Lacourba
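    Illustrative sketch: the arbitration policy in Python, with the details assumed; the arbitrate function, the deque of pending work, and "barrier" as the predetermined instruction type are all invented for illustration. An instruction of that type from a requester outside the active subset is served before anything else.

      import collections

      URGENT = "barrier"  # hypothetical stand-in for the predetermined type

      def arbitrate(active_subset, pending):
          """pending: deque of (requester, instruction); pops what runs next."""
          # An out-of-subset instruction of the predetermined type preempts.
          for idx, (req, instr) in enumerate(pending):
              if req not in active_subset and instr == URGENT:
                  del pending[idx]
                  return req, instr
          # Otherwise serve the requesters in the current subset.
          for idx, (req, instr) in enumerate(pending):
              if req in active_subset:
                  del pending[idx]
                  return req, instr
          return None

      q = collections.deque([("A", "add"), ("C", "barrier"), ("B", "mul")])
      print(arbitrate({"A", "B"}, q))  # ('C', 'barrier'): C is outside the subset
      print(arbitrate({"A", "B"}, q))  # ('A', 'add')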
  • Patent number: 12093695
    Abstract: This disclosure generally relates to a method and system for processing asynchronous and distributed training tasks. Training a large-scale deep neural network (DNN) model with large-scale training data is time-consuming. The method creates a work queue (Q) with a predefined number of tasks comprising training data. Central processing unit (CPU) and graphics processing unit (GPU) information is fetched from the current environment to initiate an asynchronous parallel process on the work queue (Q) that trains a set of deep learning models with optimized resources: a data pre-processing technique computes transformed training data, and an asynchronous model training technique then trains the set of deep learning models on each GPU asynchronously with the transformed training data, based on a set of asynchronous model parameters.
    Type: Grant
    Filed: February 22, 2023
    Date of Patent: September 17, 2024
    Assignee: TATA CONSULTANCY SERVICES LIMITED
    Inventors: Amit Kalele, Ravindran Subbiah, Anubhav Jain
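    Illustrative sketch: the queueing pattern in Python with the actual training stubbed out. The preprocess and train_on_gpu names, and the thread-per-GPU layout, are assumptions; the point is that a shared work queue of a predefined size is drained asynchronously by one worker per discovered GPU.

      import queue
      import threading

      def preprocess(task):
          return f"transformed({task})"  # stand-in for data pre-processing

      def train_on_gpu(gpu_id, work_q, results):
          while True:
              try:
                  task = work_q.get_nowait()
              except queue.Empty:
                  return
              data = preprocess(task)
              results.append((gpu_id, f"model<{data}>"))  # stand-in for training
              work_q.task_done()

      work_q = queue.Queue()
      for t in range(4):  # the predefined number of tasks
          work_q.put(f"task{t}")

      results = []
      gpus = [0, 1]       # pretend resource discovery found two GPUs
      workers = [threading.Thread(target=train_on_gpu, args=(g, work_q, results))
                 for g in gpus]
      for w in workers:
          w.start()
      for w in workers:
          w.join()
      print(sorted(results))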
  • Patent number: 12093694
    Abstract: Techniques and mechanisms for providing branch prediction information to facilitate instruction decoding by a processor. In an embodiment, entries of a branch target buffer (BTB) each identify, for a corresponding instruction, whether a prediction based on the instruction (if any) is eligible to be communicated, with another prediction, in a single fetch cycle. A branch prediction unit of the processor determines a linear address of a fetch region which is under consideration, and performs a search of the BTB based on the linear address. A result of the search is evaluated to detect any hit entry which indicates a double prediction eligibility. In another embodiment, where it is determined that double prediction eligibility is indicated for an earliest one of the instructions represented by the hit entries, multiple predictions are communicated in a single fetch cycle.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: September 17, 2024
    Assignee: Intel Corporation
    Inventors: Mathew Lowes, Jonathan Combs, Martin Licht
  • Patent number: 12086598
    Abstract: The present disclosure provides new and innovative systems and methods for processing out-of-order events. In an example, a computer-implemented method includes obtaining data, committing the obtained data to a fixed-size storage pool, the fixed-size storage pool including a plurality of slots and a pool index including a fixed-length array, by acquiring a slot in the plurality of slots, locking the acquired slot, storing the obtained data in the acquired slot, updating the pool index for the storage pool by updating an element in the array corresponding to the acquired slot, the element storing an indication of the obtained data, and unlocking the acquired slot, and transmitting an indication that the data is available.
    Type: Grant
    Filed: August 13, 2021
    Date of Patent: September 10, 2024
    Assignee: Red Hat, Inc.
    Inventors: Andrea Tarocchi, Francesco Nigro
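    Illustrative sketch: the commit path in Python, with locking granularity and index layout assumed; the SlotPool class is hypothetical. Each slot carries its own lock, and the fixed-length pool index is updated while the slot is held, matching the commit sequence the abstract lists.

      import threading

      class SlotPool:
          def __init__(self, size):
              self.slots = [None] * size            # fixed-size storage pool
              self.index = [None] * size            # fixed-length pool index
              self.locks = [threading.Lock() for _ in range(size)]
              self.next = 0

          def commit(self, data):
              slot = self.next % len(self.slots)    # acquire a slot
              self.next += 1
              with self.locks[slot]:                # lock the acquired slot
                  self.slots[slot] = data           # store the obtained data
                  self.index[slot] = id(data)       # an indication of the data
              return slot                           # "data available" signal

      pool = SlotPool(size=4)
      print(pool.commit({"event": "e1", "ts": 17}))  # -> 0
      print(pool.commit({"event": "e0", "ts": 12}))  # -> 1: out of order is fine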
  • Patent number: 12079157
    Abstract: Argument registers in a reconfigurable processor are loaded from a runtime program running on a host processor. The runtime program stores a configuration file in a memory. A program load controller reads the configuration file from the memory and distributes it to configurable units in the reconfigurable processor, which sequentially shift it into a shift register of the configuration data store. The runtime program stores an argument load file in the memory, and a fast argument load (FAL) controller reads the argument load file from memory and distributes (value, control) tuples to the configurable units in the reconfigurable processor. The configurable units process the tuples by writing the value directly into an argument register, made up of a portion of the shift register in the configuration data store specified by the control of the tuple, without shifting the value through the shift register.
    Type: Grant
    Filed: February 2, 2023
    Date of Patent: September 3, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Manish K. Shah, Gregory Frederick Grohoski
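    Illustrative sketch: both load paths modeled in Python; the ConfigStore class, the 8-bit word size, and the LSB-first layout are assumptions. Program load shifts configuration in serially, while fast argument load overwrites the addressed word of the shift register in place, as a (value, control) tuple directs.

      WORD = 8  # assumed width of one argument register

      class ConfigStore:
          def __init__(self, nbits):
              self.bits = [0] * nbits           # the shift register

          def shift_in(self, bit):
              # Program load path: configuration enters one bit at a time.
              self.bits = self.bits[1:] + [bit]

          def load_argument(self, value, control):
              # FAL path: 'control' selects the argument register, whose
              # word is written directly, with no shifting of the value.
              lo = control * WORD
              self.bits[lo:lo + WORD] = [(value >> i) & 1 for i in range(WORD)]

      cfg = ConfigStore(nbits=24)               # three 8-bit argument registers
      for bit in (1, 0, 1):                     # a few program-load shifts
          cfg.shift_in(bit)
      cfg.load_argument(value=0xA5, control=1)  # write argument register 1
      print(cfg.bits[8:16])                     # [1, 0, 1, 0, 0, 1, 0, 1] (0xA5, LSB first)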
  • Patent number: 12073215
    Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a storage unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to the one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
    Type: Grant
    Filed: December 16, 2019
    Date of Patent: August 27, 2024
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
    Inventors: Yao Zhang, Bingrui Wang
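    Illustrative sketch: generic Q-format fixed-point arithmetic in Python, not necessarily the scheme claimed here; FRAC and the rounding choices are assumptions. Values carry an implicit scale of 2^-8, so multiplies run on plain integers and are rescaled by a shift.

      FRAC = 8  # 8 fractional bits, i.e. an implicit scale of 1/256

      def to_fixed(x):
          return round(x * (1 << FRAC))  # quantize a real value

      def from_fixed(q):
          return q / (1 << FRAC)         # recover the real value

      def fx_mul(a, b):
          # Integer multiply, then shift to remove the doubled scale.
          return (a * b) >> FRAC

      a, b = to_fixed(1.5), to_fixed(-0.25)
      print(from_fixed(fx_mul(a, b)))    # -0.375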
  • Patent number: 12072834
    Abstract: This application describes a hardware accelerator and a device for accelerating neural network computations. An example accelerator may include multiple cores and a central processing unit (CPU) respectively associated with DDRs, a data exchange interface connecting a host device to the accelerator, and a three-layer NoC architecture. The three-layer NoC architecture includes an outer-layer NoC configured to transfer data between the host device and the DDRs; a middle-layer NoC configured to transfer data among the plurality of cores; and an inner-layer NoC within each core, including a cross-bar network for broadcasting weights and activations of neural networks from a global buffer of the core to a plurality of processing entity (PE) clusters within the core.
    Type: Grant
    Filed: May 15, 2023
    Date of Patent: August 27, 2024
    Assignee: Moffett International Co., Limited
    Inventors: Xiaoqian Zhang, Zhibin Xiao
  • Patent number: 12072836
    Abstract: A reconfigurable processor includes an array of configurable units connected by a bus system. Each configurable unit has a configuration data store, organized as a shift register, to store configuration data. The configuration data store also includes individually addressable argument registers, respectively made up of word-sized portions of the shift register, to provide arguments to the configurable unit. The configurable unit also includes program load logic to shift data into the configuration data store, and argument load logic to directly load data into the argument registers without shifting the received argument data through the shift register. A program load controller is associated with the array to respond to a program load command by executing a program load process, and a fast argument load (FAL) controller is associated with the array to respond to an FAL command by executing an FAL process.
    Type: Grant
    Filed: February 2, 2023
    Date of Patent: August 27, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Manish K. Shah, Gregory Frederick Grohoski
  • Patent number: 12061910
    Abstract: A processor unit for multiply and accumulate (“MAC”) operations is provided. The present invention may include the processor unit having a plurality of MAC units for performing a set of MAC operations. The present invention may include each MAC unit having an execution unit and a one-write one-read (“1W/1R”) register file, where the 1W/1R register file may have at least one accumulator. The present invention may include the execution unit of each MAC unit being configured to perform a subset of MAC operations by computing a product of a set of values received from another register file of the processor unit and adding the computed product to the at least one accumulator. The present invention may include each MAC unit being configured to perform the respective subset of MAC operations in a single clock cycle.
    Type: Grant
    Filed: December 5, 2019
    Date of Patent: August 13, 2024
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jentje Leenstra, Andreas Wagner, Jose E. Moreira, Brian W. Thompto
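    Illustrative sketch: the MAC structure in Python; the MACUnit class and the four-lane split are invented, and Python says nothing about the claimed single-cycle timing. Each unit adds its products into a private accumulator, and the lane accumulators are combined at the end.

      class MACUnit:
          def __init__(self):
              self.acc = 0  # accumulator in the unit's own 1W/1R register file

          def mac(self, a, b):
              self.acc += a * b  # one multiply-accumulate operation
              return self.acc

      # Four MAC units working on independent lanes of a dot product.
      units = [MACUnit() for _ in range(4)]
      x = [1, 2, 3, 4, 5, 6, 7, 8]
      w = [1, 1, 2, 2, 3, 3, 4, 4]
      for i in range(len(x)):
          units[i % 4].mac(x[i], w[i])
      print(sum(u.acc for u in units))  # 110 == dot(x, w)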
  • Patent number: 12045618
    Abstract: The invention provides a data processing apparatus and a data processing method for generating prefetches of data for use during execution of instructions by processing circuitry. The prefetches that are generated are based on a nested prefetch pattern. The nested prefetch pattern comprises a first pattern and a second pattern. The first pattern is defined by a first address offset between sequentially accessed addresses and a first observed number of the sequentially accessed addresses separated by the first address offset. The second pattern is defined by a second address offset between sequential iterations of the first pattern and a second observed number of the sequential iterations of the first pattern separated by the second address offset.
    Type: Grant
    Filed: March 23, 2021
    Date of Patent: July 23, 2024
    Assignee: Arm Limited
    Inventors: Natalya Bondarenko, Stefano Ghiggini, Geoffray Matthieu Lacourba, Cédric Denis Robert Airaud
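    Illustrative sketch: the nested pattern's address generation in Python; nested_prefetch_addrs and its parameter names are invented labels for the two offsets and two observed counts, and the prefetch issue and training logic are omitted. A typical match is rows of a matrix walked with a fixed column stride.

      def nested_prefetch_addrs(base, off1, count1, off2, count2):
          addrs = []
          for outer in range(count2):       # iterations of the first pattern
              row = base + outer * off2     # second address offset re-bases
              for inner in range(count1):   # sequentially accessed addresses
                  addrs.append(row + inner * off1)
          return addrs

      # 3 accesses 8 bytes apart, repeated twice at a 64-byte stride:
      print([hex(a) for a in nested_prefetch_addrs(0x1000, 8, 3, 64, 2)])
      # ['0x1000', '0x1008', '0x1010', '0x1040', '0x1048', '0x1050']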
  • Patent number: 12045653
    Abstract: A computer-implemented method and related systems for reducing memory access stalls and memory allocation requests in data-intensive applications are provided. Invariants associated with execution paths that access data in a memory of the data-intensive application are identified. At least one field specialization technique using at least one speccode segment is then applied. The speccode segment exploits the identified invariants, thereby reducing at least one of memory stalls and memory allocation requests in a data-intensive application. The field specialization technique may include specialized software prefetching, a data distribution-based hash function, process to CPU binding, memory segment reuse, or memory layout optimization, or any combination thereof.
    Type: Grant
    Filed: June 22, 2018
    Date of Patent: July 23, 2024
    Assignee: DATAWARE VENTURES, LLC
    Inventors: Rui Zhang, Richard T. Snodgrass, Christian Convey
  • Patent number: 12045611
    Abstract: In one example, a method comprises: receiving input codes, wherein the input codes represent a computational dataflow graph; traversing the computational dataflow graph to identify single-entry-single-exit (SESE) subgraphs of the computational dataflow graph, wherein each SESE subgraph has a sequence of nodes comprising a root node and a child node and representing a sequence of element-wise operators, wherein the root node receives a single input tensor, and wherein the child node outputs a single output tensor; determining a merged operator for each SESE subgraph; and generating executable instructions for the computational dataflow graph to be executed by a hardware accelerator having a first execution unit and a second execution unit, wherein the executable instructions comprise first executable instructions for the merged operators targeted at the first execution unit, and second executable instructions for other operators of the computational dataflow graph targeted at the second execution unit.
    Type: Grant
    Filed: August 7, 2023
    Date of Patent: July 23, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Ron Diamant, Hongbin Zheng, Drazen Borkovic, Haichen Li
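    Illustrative sketch: chain merging on a toy linear graph in Python; merge_chains and the ELEMENTWISE set are assumptions, and real SESE detection handles general graphs rather than a flat list. Consecutive element-wise operators collapse into one fused operator destined for one execution unit, while other operators keep their own instructions.

      ELEMENTWISE = {"relu", "add_const", "mul_const", "exp"}

      def merge_chains(nodes):
          """nodes: op names of a linear dataflow graph, in execution order."""
          merged, chain = [], []
          for op in nodes:
              if op in ELEMENTWISE:
                  chain.append(op)          # extend the single-entry chain
              else:
                  if chain:                 # chain ends: emit a merged operator
                      merged.append("fused(" + "+".join(chain) + ")")
                      chain = []
                  merged.append(op)         # non-element-wise op stays separate
          if chain:
              merged.append("fused(" + "+".join(chain) + ")")
          return merged

      print(merge_chains(["matmul", "add_const", "relu", "matmul", "exp"]))
      # ['matmul', 'fused(add_const+relu)', 'matmul', 'fused(exp)']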