Patents Examined by Jacob Petranek

System and method for processing convolutions on crossbar-based neural network accelerators for increased inference throughput

Patent number: 12254395

Abstract: Systems and methods are provided to improve traditional chip processing. Using crossbar computations, the convolution layer can be flattened into vectors, and the vectors can be grouped into a matrix where each row or column is a flattened filter. Each submatrix of the input corresponding to a position of a convolution window is also flattened into a vector. The convolution is computed as the dot product of each input vector and the filter matrix. Using intra-crossbar computations, the unused space of the crossbars is used to store replicas of the filters matrices and the unused space in XIN is used to store more elements of the input. In inter-crossbar computations, the unused crossbars are used to store replicas of the filters matrices and the unused XINs are used to store more elements of the input. Then, the method performs multiple convolution iterations in a single step.

Type: Grant

Filed: September 21, 2020

Date of Patent: March 18, 2025

Assignee: Hewlett Packard Enterprise Development LP

Inventors: Glaucimar Da Silva Aguiar, Francisco Plínio Oliveira Silveira, Eun Sub Lee, Rodrigo Jose Da Rosa Antunes, Joaquim Gomes Da Costa Eulalio De Souza, Martin Foltin, Jefferson Rodrigo Alves Cavalcante, Lucas Leite, Arthur Carvalho Walraven Da Cunha, Monycky Vasconcelos Frazao, Alex Ferreira Ramires Trajano
Processor embedded streaming buffer

Patent number: 12248333

Abstract: Techniques are disclosed for the use of local buffers integrated into the execution units of a vector processor architecture. The use of local buffers results in less communication across the interconnection network implemented by vector processors, and increases interconnection network bandwidth, increases the speed of computations, and decreases power usage.

Type: Grant

Filed: June 25, 2021

Date of Patent: March 11, 2025

Assignee: Intel Corporation

Inventor: Joseph Williams
Overlay layer for network of processor cores

Patent number: 12248430

Abstract: Methods and systems related to the efficient execution of complex computations by a multicore processor and the movement of data among the various processing cores in the multicore processor are disclosed. A multicore processor includes a set of processing cores and associated sets of processing pipelines, core controllers, routers, and network interface units. The multicore processor also includes a computation layer, for conducting computations using the set of processing cores, with executable instructions for the set of processing pipelines which are executed by the set of core controllers. The multicore processor also includes a network-on-chip layer, for connecting the set of processing cores in the multicore processor, with executable instructions for the set of routers and the set of network interface units.

Type: Grant

Filed: September 14, 2022

Date of Patent: March 11, 2025

Assignee: Tenstorrent Inc.

Inventors: Davor Capalija, Ivan Matosevic, Jasmina Vasiljevic, Utku Aydonat, Andrew Lewycky, S. Alexander Chin, Ljubisa Bajic, Alex Cejkov, Milos Trajkovic
Supporting 8-bit floating point format operands in a computing architecture

Patent number: 12242846

Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprises one or more sets of interconnected multipliers, shifters, and adder, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.

Type: Grant

Filed: March 27, 2024

Date of Patent: March 4, 2025

Assignee: INTEL CORPORATION

Inventors: Naveen Mellempudi, Subramaniam Maiyuran, Varghese George, Fangwen Fu, Shuai Mu, Supratim Pal, Wei Xiong
Apparatus and method for temperature-constrained frequency control and scheduling

Patent number: 12235792

Abstract: An apparatus and method for temperature-constrained frequency control and scheduling. For example, one embodiment of a processor comprises: a plurality of cores; power management circuitry to control a frequency of each core of the plurality of cores based, at least in part, on a temperature associated with one or more cores of the plurality of cores, the power management circuitry comprising: a temperature limit-driven frequency controller to determine a first frequency limit value based on a temperature of a corresponding core reaching a first threshold; frequency prediction hardware logic to predict a temperature-constrained frequency of the corresponding core based on the first frequency limit value and an initial frequency limit value; and performance determination hardware logic to determine a new performance value for the corresponding core based on the temperature-constrained frequency, the new performance value to be provided to a task scheduler.

Type: Grant

Filed: March 30, 2023

Date of Patent: February 25, 2025

Assignee: Intel Corporation

Inventors: Jianwei Dai, Somvir Singh Dahiya, Mahesh Kumar P, Stephen H. Gunther, Sapumal Wijeratne, Mark Gallina
Processor cores using content object identifiers for routing and computation

Patent number: 12236237

Abstract: Processor cores using content object identifiers for routing and computation are disclosed. One method includes executing a complex computation using a set of processing cores. The method includes routing a set of content objects using a set of content object identifiers and executing a set of instructions. The set of instructions are defined using a set of operand identifiers. The operand identifiers represent content object identifiers in the set of content object identifiers. The content objects can be routed according to a named data networking (NDN) or content-centric networking (CCN) paradigm with the content object identifiers mentioned above serving as the names for the computation data being routed by the network.

Type: Grant

Filed: March 31, 2023

Date of Patent: February 25, 2025

Assignee: Tenstorrent Inc.

Inventors: Davor Capalija, Ljubisa Bajic, Jasmina Vasiljevic, Yongbum Kim
Loop driven region based frontend translation control for performant and secure data-space guided micro-sequencing

Patent number: 12235791

Abstract: Methods and apparatus relating to loop driven region based frontend translation control for performant and secure data-space guided micro-sequencing are described. In an embodiment, Data-space Translation Logic (DTL) circuitry receives a static input and a dynamic input and generates one or more outputs based at least in part on the static input and the dynamic input. A frontend counter generates a count value for the dynamic input based at least in part on an incremented/decremented counter value and a next counter value from the DTL circuitry. The DTL circuitry is capable to receive a new dynamic input prior to consumption of the one or more outputs. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: August 23, 2021

Date of Patent: February 25, 2025

Assignee: Intel Corporation

Inventors: Kameswar Subramaniam, Christopher Russell
Atomic operation predictor to predict whether an atomic operation will complete successfully

Patent number: 12229557

Abstract: In an embodiment, a processor comprises an atomic predictor circuit to predict whether or not an atomic operation will complete successfully. The prediction may be used when a subsequent load operation to the same memory location as the atomic operation is executed, to determine whether or not to forward store data from the atomic operation to the subsequent load operation. If the prediction is successful, the store data may be forwarded. If the prediction is unsuccessful, the store data may not be forwarded. In cases where an atomic operation has been failing (not successfully performing the store operation), the prediction may prevent the forwarding of the store data and thus may prevent a subsequent flush of the load.

Type: Grant

Filed: March 11, 2024

Date of Patent: February 18, 2025

Assignee: Apple Inc.

Inventors: Brian R. Mestan, Gideon N. Levinsky, Michael L. Karm
Multiple system-on-chip arrangement for vehicle computing systems

Patent number: 12229079

Abstract: A computing system can include a first system on chip (SoC) and a second SoC. Each SoC can comprise a memory in which the SoC publishes state information. For the first SoC, the state information can correspond to a set of tasks being performed by the first SoC, where the first SoC utilizes a plurality of computational components to perform the set of tasks. The first SoC can directly access the memory of the first SoC to dynamically read the state information published by the first SoC. In a backup role, the second SoC maintains a subset of its computational components in a low power state. When the second SoC detects a trigger while reading the state information published in the first memory of the first SoC, the second SoC powers the subset of computational components to take over the set of tasks.

Type: Grant

Filed: May 10, 2023

Date of Patent: February 18, 2025

Assignee: Mercedes-Benz Group AG

Inventor: Francois Piednoel
Generating masks from decoded instructions to apply to fetched instructions for unmasking

Patent number: 12217052

Abstract: A method for executing a machine code using a microprocessor includes, after an operation of decoding a current loaded instruction, constructing a mask from the signals generated by an instruction decoder in response to decoding of the current loaded instruction by the decoder. The constructed mask varies as a function of the current loaded instruction. Subsequently, before an operation of decoding a next loaded instruction, the next loaded instruction is unmasked using the constructed mask.

Type: Grant

Filed: March 23, 2022

Date of Patent: February 4, 2025

Assignee: Commissariat à l'Energie Atomique et aux Energies Alternatives

Inventors: Gaëtan Leplus, Olivier Savry
Neural processing device, processing element included therein and method for operating various formats of neural processing device

Patent number: 12210872

Abstract: A neural processing device, a processing element included therein and a method for operating various formats of the neural processing device are provided. The neural processing device includes at least one neural processor, a shared memory shared by the at least one neural processor, and a global interconnection configured to transmit data between the at least one neural processor and the shared memory, wherein each of the at least one neural processor comprises at least one processing element, each of the at least one processing element receives an input in a first format and thereby performs an operation, and receives an input in a second format that is different from the first format and thereby performs an operation if a format conversion signal is received, and the first format and the second format have a same number of bits.

Type: Grant

Filed: March 12, 2024

Date of Patent: January 28, 2025

Assignee: Rebellions Inc.

Inventors: Karim Charfi, Jinwook Oh
Overlay layer hardware unit for network of processor cores

Patent number: 12210478

Abstract: Methods and systems for executing an application data flow graph on a set of computational nodes are disclosed. The computational nodes can each include a programmable controller from a set of programmable controllers, a memory from a set of memories, a network interface unit from a set of network interface units, and an endpoint from a set of endpoints. A disclosed method comprises configuring the programmable controllers with instructions. The method also comprises independently and asynchronously executing the instructions using the set of programmable controllers in response to a set of events exchanged between the programmable controllers themselves, between the programmable controllers and the network interface units, and between the programmable controllers and the set of endpoints. The method also comprises transitioning data in the set of memories on the computational nodes in accordance with the application data flow graph and in response to the execution of the instructions.

Type: Grant

Filed: May 11, 2023

Date of Patent: January 28, 2025

Assignee: Tenstorrent Inc.

Inventors: Ivan Matosevic, Davor Capalija, Jasmina Vasiljevic, Utku Aydonat, S. Alexander Chin, Djordje Maksimovic, Ljubisa Bajic
Stochastic hyperdimensional arithmetic computing

Patent number: 12204899

Abstract: Stochastic hyperdimensional arithmetic computing is provided. Hyperdimensional computing (HDC) is a neurally-inspired computation model working based on the observation that the human brain operates on high-dimensional representations of data, called hypervectors. Although HDC is powerful in reasoning and association of the abstract information, it is weak on feature extraction from complex data. Consequently, most existing HDC solutions rely on expensive pre-processing algorithms for feature extraction. This disclosure proposes StocHD, a novel end-to-end hyperdimensional system that supports accurate, efficient, and robust learning over raw data. StocHD expands HDC functionality to the computing area by mathematically defining stochastic arithmetic over HDC hypervectors. StocHD enables an entire learning application (including feature extractor) to process using HDC data representation, enabling uniform, efficient, robust, and highly parallel computation.

Type: Grant

Filed: May 13, 2022

Date of Patent: January 21, 2025

Assignee: The Regents of the University of California

Inventor: Mohsen Imani
Dynamic designation of instructions as sensitive for constraining instruction execution

Patent number: 12204904

Abstract: Described herein are systems and methods for dynamic designation of instructions as sensitive. For example, some methods include detecting that a first instruction of a first process has been designated as a sensitive instruction; checking whether a sensitive handling enable indicator in a process state register storing a state of the first process is enabled; responsive to detection of the sensitive instruction and enablement of the sensitive handling enable indicator, invoking a constraint for execution of the first instruction; executing the first instruction subject to the constraint; and executing a second instruction of the first process without the constraint.

Type: Grant

Filed: February 7, 2022

Date of Patent: January 21, 2025

Assignee: Marvell Asia Pte, Ltd.

Inventor: Shubhendu Sekhar Mukherjee
Processing elements array that includes delay queues between processing elements to hold shared data

Patent number: 12190224

Abstract: A processing element architecture adapted to a convolution comprises a plurality of processing elements and a delayed queue circuit. The plurality of processing elements includes a first processing element and a second processing element, wherein the first processing element and the second processing element perform the convolution according to a shared datum at least. The delayed queue circuit connects to the first processing element and connects to the second processing element. The delayed queue circuit receives the shared datum sent by the first processing element, and sends the shared datum to the second processing element after receiving the shared datum and waiting for a time interval.

Type: Grant

Filed: December 29, 2020

Date of Patent: January 7, 2025

Assignee: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE

Inventors: Yao-Hua Chen, Yu-Xiang Yen, Wan-Shan Hsieh, Chih-Tsun Huang, Juin-Ming Lu, Jing-Jia Liou
Transposing at-speed in a vector-matrix accelerator

Patent number: 12164917

Abstract: A system including one or more processors configured to receive a transpose instruction indicating to transpose a source matrix to a result matrix, provide data elements of the source matrix to input switching circuits, reorder the data elements using the input switching circuits, provide the data elements from the input switching circuits to one or more lanes of a datapath, provide the data elements from the datapath to output switching circuits, undo the reordering of the data elements using the output switching circuits, and provide the data elements from the output switching circuits to a result matrix. Each respective lane of the datapath receiving data elements receives multiple data elements directed to different respective non-overlapping portions of the lane.

Type: Grant

Filed: May 17, 2023

Date of Patent: December 10, 2024

Assignee: Google LLC

Inventors: Vinayak Anand Gokhale, Matthew Leever Hedlund, Matthew William Ashcraft, Indranil Chakraborty
Processor-guided execution of offloaded instructions using fixed function operations

Patent number: 12153926

Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.

Type: Grant

Filed: December 21, 2023

Date of Patent: November 26, 2024

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: John Kalamatianos, Michael T. Clark, Marius Evers, William L. Walker, Paul Moyer, Jay Fleischman, Jagadish B. Kotra
Apparatuses, methods, and systems for instructions to multiply values of one

Patent number: 12153920

Abstract: Systems, methods, and apparatuses relating to instructions to multiply values of one are described.

Type: Grant

Filed: December 13, 2019

Date of Patent: November 26, 2024

Assignee: Intel Corporation

Inventors: Mohamed Elmalaki, Elmoustapha Ould-Ahmed-Vall
Multi-card processor access framework

Patent number: 12153923

Abstract: A supplemental computing system can provide card services while saving processing power of a data center for other tasks. For example, the supplemental computing system described herein can include a processor and a memory that includes instructions that are executable by the processor to perform operations. The operations can include receiving a first subset of card requests. The operations can further include performing at least one servicing task to a card request resulting in an altered card request. Additionally, the operations can include selecting, for each altered card request in the first subset, a secondary card processor from at least one secondary card processor. The operations can also include transforming the altered card request into a secondary card processor specific card request suitable for the selected secondary card processor. The operations can include submitting the secondary card processor specific card request to the selected secondary card processor.

Type: Grant

Filed: August 3, 2023

Date of Patent: November 26, 2024

Assignee: Truist Bank

Inventors: Naga Mrudula Kalyani Chitturi, Glenn S. Bruce, Manikandan Dhanabalan, Gopinath Rajagopal, Harish Dindi, Vijay Srinivasan, Jay Poole
Merged branch target buffer entries

Patent number: 12153927

Abstract: Merging branch target buffer entries includes maintaining, in a branch target buffer, an entry corresponding to first branch instruction, where the entry identifies a first branch target address for the first branch instruction and a second branch target address for a second branch instruction; and accessing, based on the first branch instruction, the entry.

Type: Grant

Filed: June 1, 2020

Date of Patent: November 26, 2024

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Thomas Clouqueur, Marius Evers, Aparna Mandke, Steven R. Havlir, Robert Cohen, Anthony Jarvis

1 2 3 4 5 … next