Patents Examined by Jacob Petranek
-
Patent number: 12164917
Abstract: A system including one or more processors configured to receive a transpose instruction indicating to transpose a source matrix to a result matrix, provide data elements of the source matrix to input switching circuits, reorder the data elements using the input switching circuits, provide the data elements from the input switching circuits to one or more lanes of a datapath, provide the data elements from the datapath to output switching circuits, undo the reordering of the data elements using the output switching circuits, and provide the data elements from the output switching circuits to a result matrix. Each respective lane of the datapath receiving data elements receives multiple data elements directed to different respective non-overlapping portions of the lane.
Type: Grant
Filed: May 17, 2023
Date of Patent: December 10, 2024
Assignee: Google LLC
Inventors: Vinayak Anand Gokhale, Matthew Leever Hedlund, Matthew William Ashcraft, Indranil Chakraborty
-
Patent number: 12153920
Abstract: Systems, methods, and apparatuses relating to instructions to multiply values of one are described.
Type: Grant
Filed: December 13, 2019
Date of Patent: November 26, 2024
Assignee: Intel Corporation
Inventors: Mohamed Elmalaki, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 12153923
Abstract: A supplemental computing system can provide card services while saving processing power of a data center for other tasks. For example, the supplemental computing system described herein can include a processor and a memory that includes instructions that are executable by the processor to perform operations. The operations can include receiving a first subset of card requests. The operations can further include performing at least one servicing task to a card request resulting in an altered card request. Additionally, the operations can include selecting, for each altered card request in the first subset, a secondary card processor from at least one secondary card processor. The operations can also include transforming the altered card request into a secondary card processor specific card request suitable for the selected secondary card processor. The operations can include submitting the secondary card processor specific card request to the selected secondary card processor.
Type: Grant
Filed: August 3, 2023
Date of Patent: November 26, 2024
Assignee: Truist Bank
Inventors: Naga Mrudula Kalyani Chitturi, Glenn S. Bruce, Manikandan Dhanabalan, Gopinath Rajagopal, Harish Dindi, Vijay Srinivasan, Jay Poole
-
Patent number: 12153927
Abstract: Merging branch target buffer entries includes maintaining, in a branch target buffer, an entry corresponding to a first branch instruction, where the entry identifies a first branch target address for the first branch instruction and a second branch target address for a second branch instruction; and accessing, based on the first branch instruction, the entry.
Type: Grant
Filed: June 1, 2020
Date of Patent: November 26, 2024
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Thomas Clouqueur, Marius Evers, Aparna Mandke, Steven R. Havlir, Robert Cohen, Anthony Jarvis
-
Patent number: 12153926
Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
Type: Grant
Filed: December 21, 2023
Date of Patent: November 26, 2024
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: John Kalamatianos, Michael T. Clark, Marius Evers, William L. Walker, Paul Moyer, Jay Fleischman, Jagadish B. Kotra
-
Patent number: 12141584
Abstract: Disclosed herein are embodiments related to a power efficient multi-bit storage system. In one configuration, the multi-bit storage system includes a first storage circuit, a second storage circuit, a prediction circuit, and a clock gating circuit. In one aspect, the first storage circuit updates a first output bit according to a first input bit, in response to a trigger signal, and the second storage circuit updates a second output bit according to a second input bit, in response to the trigger signal. In one aspect, the prediction circuit generates a trigger enable signal indicating whether at least one of the first output bit or the second output bit is predicted to change a state. In one aspect, the clock gating circuit generates the trigger signal based on the trigger enable signal.
Type: Grant
Filed: July 7, 2022
Date of Patent: November 12, 2024
Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY LIMITED
Inventors: Kai-Chi Huang, Chi-Lin Liu, Wei-Hsiang Ma, Shang-Chih Hsieh
-
Patent number: 12135679
Abstract: In an embodiment a system on chip includes at least one master device, at least one slave device, a connection interface configured to route signals between the at least one master device and the at least one slave device, the connection interface configured to operate according to configuration parameters, and a configuration bus connected to the connection interface, wherein the configuration bus is configured to deliver new configuration parameters to the connection interface so as to adapt operation of the connection interface.
Type: Grant
Filed: June 8, 2022
Date of Patent: November 5, 2024
Assignee: STMicroelectronics S.r.l.
Inventors: Antonino Mondello, Salvatore Pisasale
-
Patent number: 12130744
Abstract: A multi-core processor configured to improve processing performance in certain computing contexts is provided. The multi-core processor includes multiple processing cores that implement barrel threading to execute multiple instruction threads in parallel while ensuring that the effects of an idle instruction or thread upon the performance of the processor are minimized. The multiple cores can also share a common data cache, thereby minimizing the need for expensive and complex mechanisms to mitigate inter-cache coherency issues. The barrel threading can minimize the latency impacts associated with a shared data cache. In some examples, the multi-core processor can also include a serial processor configured to execute single threaded programming code that may not yield satisfactory performance in a processing environment that employs barrel threading.
Type: Grant
Filed: February 15, 2022
Date of Patent: October 29, 2024
Assignee: Mobileye Vision Technologies Ltd.
Inventors: Yosef Kreinin, Yosi Arbeli, Gil Israel Dogon
-
Patent number: 12130915
Abstract: Systems, methods, and apparatuses relating to microarchitectural mechanisms for the prevention of side-channel attacks are disclosed herein. In one embodiment, a processor core includes an instruction fetch circuit to fetch instructions; a branch target buffer comprising a plurality of entries that each include a thread identification (TID) and a privilege level bit; and a branch predictor, coupled to the instruction fetch circuit and the branch target buffer, to predict a target instruction corresponding to a branch instruction based on at least one entry of the plurality of entries in the branch target buffer, and cause the target instruction to be fetched by the instruction fetch circuit.
Type: Grant
Filed: February 1, 2022
Date of Patent: October 29, 2024
Assignee: Intel CORPORATION
Inventors: Robert S. Chappell, Jared W. Stark, IV, Joseph Nuzman, Stephen Robinson, Jason W. Brandt
-
Patent number: 12112205
Abstract: Data format conversion processing of an accelerator accessed by a processor of a computing environment is reduced. The processor and accelerator use different data formats, and the accelerator is configured to perform an input conversion to convert data from a processor data format to an accelerator data format prior to performing an operation using the data, and an output conversion to convert resultant data from accelerator data format back to processor data format after performing the operation. The reducing includes determining that adjoining operations of a process to run on the processor and accelerator are to be performed by the accelerator, where the adjoining operations include a source operation and destination operation. Further, the reducing includes blocking an output data format conversion of the source operation and an input data format conversion of the input data for the destination operation.
Type: Grant
Filed: July 11, 2023
Date of Patent: October 8, 2024
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Qi Liang, Yi Xuan Zhang, Gui Yu Jiang
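The conversion-blocking idea in the abstract above can be illustrated with a minimal sketch. It assumes a simple linear chain in which each operation consumes the previous operation's output; the function name `plan_conversions` and the `"cpu"`/`"acc"` device tags are hypothetical and do not come from the patent.

```python
def plan_conversions(devices):
    """Decide which format conversions each operation still needs.

    devices: list of "cpu" or "acc" tags, one per operation, in execution
    order (hypothetical representation; a linear chain is assumed).
    Returns a list of (input_conversion, output_conversion) booleans.
    """
    plan = []
    for i, dev in enumerate(devices):
        if dev != "acc":
            # Processor operations already use the processor data format.
            plan.append((False, False))
            continue
        prev_on_acc = i > 0 and devices[i - 1] == "acc"
        next_on_acc = i + 1 < len(devices) and devices[i + 1] == "acc"
        # Block the input conversion when the source operation ran on the
        # accelerator, and the output conversion when the destination
        # operation will run on the accelerator.
        plan.append((not prev_on_acc, not next_on_acc))
    return plan
```

For two adjoining accelerator operations surrounded by processor operations, only the first input conversion and the last output conversion remain in the plan.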
-
Patent number: 12112399
Abstract: A lens distortion correction function operates by backmapping output images to the uncorrected, distorted input images. As a vision image processor completes processing on the image data lines needed for the lens distortion correction function to operate on a group of output, undistorted image lines, the lens distortion correction function begins processing the image data. This improves image processing pipeline delays by overlapping the operations. The vision image processor provides output image data to a circular buffer in SRAM, rather than providing it to DRAM. The lens distortion correction function operates from the image data in the circular buffer. By operating from the SRAM circular buffer, access to the DRAM for the highly fragmented backmapping image data read operations is removed, improving available DRAM bandwidth. By using a circular buffer, less space is needed in the SRAM. The improved memory operations further improve the image processing pipeline delays.
Type: Grant
Filed: November 8, 2021
Date of Patent: October 8, 2024
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Niraj Nandan, Rajasekhar Reddy Allu, Mihir Narendra Mody
-
Patent number: 12111789
Abstract: The present disclosure is directed to a distributed graphics processor unit (GPU) architecture that includes an array of processing nodes. Each processing node may include a GPU node that is coupled to its own fast memory unit and its own storage unit. The fast memory unit and storage unit may be integrated into a single unit or may be separately coupled to the GPU node. The processing node may have its fast memory unit coupled to both the GPU node and the storage node. The various architectures provide a GPU-based system that may be treated as a storage unit, such as a solid state drive (SSD) that performs onboard processing to perform memory-oriented operations. In this respect, the system may be viewed as a “smart drive” for big-data near-storage processing.
Type: Grant
Filed: April 22, 2020
Date of Patent: October 8, 2024
Assignee: Micron Technology, Inc.
Inventor: Dmitri Yudanov
-
Patent number: 12106101
Abstract: Techniques are disclosed for a vector processor architecture that enables data interpolation in accordance with multiple dimensions, such as one-, two-, and three-dimensional linear interpolation. The vector processor architecture includes a vector processor and accompanying vector addressable memory that enable a simultaneous retrieval of multiple entries in the vector addressable memory to facilitate linear interpolation calculations. The vector processor architecture vastly increases the speed at which such calculations may occur compared to conventional processing architectures. Example implementations include the calculation of digital pre-distortion (DPD) coefficients for use with radio frequency (RF) transmitter chains to support multi-band applications.
Type: Grant
Filed: December 23, 2020
Date of Patent: October 1, 2024
Assignee: Intel Corporation
Inventors: Kameran Azadet, Joseph Williams, Zoran Zivkovic
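As background for the abstract above, the following is a minimal software sketch of one- and two-dimensional linear interpolation over a lookup table. It is not the patented hardware design; the function names and the convention that coordinates are fractional table indices are assumptions made for illustration. It shows that each interpolated value needs two neighbouring table entries in 1D and four in 2D, which is presumably why retrieving several entries at once helps.

```python
def lerp(a, b, t):
    """Linear blend between a and b for t in [0, 1]."""
    return a + (b - a) * t


def interp1d(table, x):
    """Interpolate a 1D table at fractional index x, 0 <= x <= len(table) - 1."""
    i = min(int(x), len(table) - 2)  # clamp so that entry i + 1 exists
    return lerp(table[i], table[i + 1], x - i)


def interp2d(table, x, y):
    """Bilinear interpolation: table[row][col] read at column x and row y."""
    i = min(int(y), len(table) - 2)
    # Two 1D interpolations (four table entries in total), then a blend.
    return lerp(interp1d(table[i], x), interp1d(table[i + 1], x), y - i)
```

Three-dimensional interpolation follows the same pattern: two bilinear results from adjacent planes are blended along the third axis, for eight table entries per output value.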
-
Patent number: 12099846
Abstract: A data processing apparatus comprises receiver circuitry for receiving instructions from each of a plurality of requester devices. Processing circuitry executes the instructions associated with each of a subset of the requester devices at a time and arbitration circuitry determines the subset of the requester devices and causes the instructions associated with each of the subset of the requester devices to be executed next. In response to the receiver circuitry receiving an instruction of a predetermined type from one of the requester devices outside the subset of requester devices, the arbitration circuitry causes the instruction of the predetermined type to be executed next.
Type: Grant
Filed: August 9, 2021
Date of Patent: September 24, 2024
Assignee: Arm Limited
Inventors: Frederic Claude Marie Piry, Cédric Denis Robert Airaud, Natalya Bondarenko, Luca Maroncelli, Geoffray Matthieu Lacourba
-
Patent number: 12093695
Abstract: This disclosure relates generally to a method and system to process asynchronous and distributed training tasks. Training a large-scale deep neural network (DNN) model with large-scale training data is time-consuming. The method creates a work queue (Q) with a set of predefined number of tasks comprising training data. Here, a set of central processing units (CPUs) information and a set of graphics processing units (GPUs) information are fetched from the current environment to initiate a parallel process asynchronously on the work queue (Q) to train a set of deep learning models with optimized resources: a data pre-processing technique computes transformed training data, and an asynchronous model training technique trains the set of deep learning models on each GPU asynchronously with the transformed training data, based on a set of asynchronous model parameters.
Type: Grant
Filed: February 22, 2023
Date of Patent: September 17, 2024
Assignee: TATA CONSULTANCY SERVICES LIMITED
Inventors: Amit Kalele, Ravindran Subbiah, Anubhav Jain
-
Patent number: 12093694
Abstract: Techniques and mechanisms for providing branch prediction information to facilitate instruction decoding by a processor. In an embodiment, entries of a branch prediction table (BTB) each identify, for a corresponding instruction, whether a prediction based on the instruction (if any) is eligible to be communicated, with another prediction, in a single fetch cycle. A branch prediction unit of the processor determines a linear address of a fetch region which is under consideration, and performs a search of the BTB based on the linear address. A result of the search is evaluated to detect any hit entry which indicates a double prediction eligibility. In another embodiment, where it is determined that double prediction eligibility is indicated for an earliest one of the instructions represented by the hit entries, multiple predictions are communicated in a single fetch cycle.
Type: Grant
Filed: March 26, 2021
Date of Patent: September 17, 2024
Assignee: Intel Corporation
Inventors: Mathew Lowes, Jonathan Combs, Martin Licht
-
Patent number: 12086598
Abstract: The present disclosure provides new and innovative systems and methods for processing out-of-order events. In an example, a computer-implemented method includes obtaining data, committing the obtained data to a fixed-size storage pool, the fixed-size storage pool including a plurality of slots and a pool index including a fixed-length array, by acquiring a slot in the plurality of slots, locking the acquired slot, storing the obtained data in the acquired slot, updating the pool index for the storage pool by updating an element in the array corresponding to the acquired slot, the element storing an indication of the obtained data, and unlocking the acquired slot, and transmitting an indication that the data is available.
Type: Grant
Filed: August 13, 2021
Date of Patent: September 10, 2024
Assignee: Red Hat, Inc.
Inventors: Andrea Tarocchi, Francesco Nigro
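The commit sequence listed in the abstract above (acquire a slot, lock it, store the data, update the index element, unlock) can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: the class name `FixedPool`, the use of one `threading.Lock` per slot, the `key` argument standing in for "an indication of the obtained data", and returning the slot number as the availability indication are all assumptions.

```python
import threading


class FixedPool:
    """Fixed-size pool: a set of slots plus a fixed-length index array."""

    def __init__(self, size):
        self.slots = [None] * size   # stored data, one entry per slot
        self.index = [None] * size   # pool index: one element per slot
        self.locks = [threading.Lock() for _ in range(size)]

    def commit(self, key, data):
        """Store data in a free slot and return the slot number."""
        for i, lock in enumerate(self.locks):
            # Acquire and lock a slot without blocking on busy ones.
            if self.index[i] is None and lock.acquire(blocking=False):
                try:
                    if self.index[i] is not None:
                        continue  # another thread filled it first
                    self.slots[i] = data   # store the obtained data
                    self.index[i] = key    # update the index element
                    return i               # signals the data is available
                finally:
                    lock.release()         # unlock the acquired slot
        raise RuntimeError("pool is full")
```

Because both the slot list and the index array are allocated once at construction, the pool never grows; a caller that finds the pool full has to retry or drop the event, a policy this sketch leaves open.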
-
Patent number: 12079157
Abstract: Argument registers in a reconfigurable processor are loaded from a runtime program running on a host processor. The runtime program stores a configuration file in a memory. A program load controller reads the configuration file from the memory and distributes it to configurable units in the reconfigurable processor which sequentially shift it into a shift register of the configuration data store. The runtime program stores an argument load file in the memory and a fast argument load (FAL) controller reads the argument load file from memory and distributes (value, control) tuples to the configuration units in the reconfigurable processor. The configurable units process the tuples by writing the value directly into an argument register made up of a portion of the shift register in the configuration data store specified by the control of the tuple without shifting the value through the shift register.
Type: Grant
Filed: February 2, 2023
Date of Patent: September 3, 2024
Assignee: SambaNova Systems, Inc.
Inventors: Manish K. Shah, Gregory Frederick Grohoski
-
Patent number: 12073215
Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
Type: Grant
Filed: December 16, 2019
Date of Patent: August 27, 2024
Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
Inventors: Yao Zhang, Bingrui Wang
-
Patent number: 12072834
Abstract: This application describes a hardware accelerator and a device for accelerating neural network computations. An example accelerator may include multiple cores and a central processing unit (CPU) respectively associated with DDRs, a data exchange interface connecting a host device to the accelerator, and a three-layer NoC architecture. The three-layer NoC architecture includes an outer-layer NoC configured to transfer data between the host device and the DDRs, a middle-layer NoC configured to transfer data among the plurality of cores; and an inner-layer NoC within each core and including a cross-bar network for broadcasting weights and activations of neural networks from a global buffer of the core to a plurality of processing entity (PE) clusters within the core.
Type: Grant
Filed: May 15, 2023
Date of Patent: August 27, 2024
Assignee: Moffett International Co., Limited
Inventors: Xiaoqian Zhang, Zhibin Xiao