Patents Examined by Zachary K Huson
  • Patent number: 11580353
    Abstract: Embodiments relate to a neural engine circuit that includes an input buffer circuit, a kernel extract circuit, and a multiply-accumulator (MAC) circuit. The MAC circuit receives input data from the input buffer circuit and a kernel coefficient from the kernel extract circuit. The MAC circuit contains several multiply-add (MAD) circuits and accumulators used to perform neural network operations on the received input data and kernel coefficients. The MAD circuits are configured to support both fixed-point precision (e.g., INT8) and floating-point precision (e.g., FP16) of operands. In floating-point mode, each MAD circuit multiplies the integer bits of the input data and kernel coefficients and adds their exponent bits to determine a binary point for alignment. In fixed-point mode, input data and kernel coefficients are multiplied directly. In both operating modes, the output data is stored in an accumulator and may be sent back as accumulated values for further multiply-add operations in subsequent processing cycles. (A behavioral sketch in Python follows this entry.)
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: February 14, 2023
    Assignee: Apple Inc.
    Inventor: Christopher L. Mills
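    A minimal Python sketch of the dual-mode multiply-add described above. The (mantissa, exponent) operand encoding, function names, and operand values are illustrative assumptions, not Apple's actual FP16 datapath.

    ```python
    # Hypothetical behavioral model of a dual-mode MAD stage. Fixed-point
    # mode multiplies and accumulates directly; floating-point mode
    # multiplies the integer (mantissa) bits and adds the exponent bits to
    # place the binary point, as the abstract describes.

    def mad_fixed(x: int, w: int, acc: int) -> int:
        """Fixed-point mode: INT8-style operands multiplied and accumulated."""
        return acc + x * w

    def mad_float(x, w, acc: float) -> float:
        """Floating-point mode: x and w are (mantissa, exponent) pairs."""
        x_mant, x_exp = x
        w_mant, w_exp = w
        return acc + (x_mant * w_mant) * 2.0 ** (x_exp + w_exp)

    acc = 0
    for x, w in [(3, 5), (-2, 7)]:                       # fixed-point pairs
        acc = mad_fixed(x, w, acc)
    print(acc)                                           # 15 - 14 = 1

    facc = 0.0
    for x, w in [((3, -1), (5, 0)), ((2, 2), (7, -3))]:  # (mantissa, exponent)
        facc = mad_float(x, w, facc)
    print(facc)                                          # 7.5 + 7.0 = 14.5
    ```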
  • Patent number: 11579880
    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, or a rectangular sub-tile of the specified 2D matrix, wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector; decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from the 1D vector, to move contents of the specified 1D vector to the specified group of elements. (A toy software model follows this entry.)
    Type: Grant
    Filed: April 26, 2021
    Date of Patent: February 14, 2023
    Assignee: Intel Corporation
    Inventors: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
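    A toy software model of the move described above, assuming "row" and "column" group selectors; the real instruction encodes the group and direction in its opcode and operand fields, and also supports partial rows/columns and sub-tiles.

    ```python
    # Hypothetical model of the tile<->vector move. Only whole rows and
    # columns are handled here; the patented instruction covers more group
    # shapes.

    def move_to_vector(tile, group, index):
        """Copy a row or column of a 2D tile into a 1D vector."""
        if group == "row":
            return list(tile[index])
        if group == "col":
            return [row[index] for row in tile]
        raise ValueError(group)

    def move_from_vector(tile, vec, group, index):
        """Copy a 1D vector back into a row or column of a 2D tile."""
        if group == "row":
            tile[index][:] = vec
        elif group == "col":
            for row, v in zip(tile, vec):
                row[index] = v

    tile = [[1, 2], [3, 4]]
    v = move_to_vector(tile, "col", 1)     # [2, 4]
    move_from_vector(tile, [9, 9], "row", 0)
    print(v, tile)                         # [2, 4] [[9, 9], [3, 4]]
    ```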
  • Patent number: 11579843
    Abstract: Methods, systems, and apparatuses related to performing bit string accumulation within a compute or memory device are described. A logic circuit with processing capability and a register within or near memory, for example, can perform multiple iterations of a recursive operation using several bit strings. Results of the various iterations may be written to the register, and subsequent iterations of the recursive operation using the bit strings may be performed. Results of the iterations of recursive operations may be accumulated within the register. Accumulated results may be written as data to another register or to memory that is external to or separate from the logic circuit. (See the sketch after this entry.)
    Type: Grant
    Filed: June 15, 2020
    Date of Patent: February 14, 2023
    Assignee: Micron Technology, Inc.
    Inventor: Vijay S. Ramesh
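    A minimal sketch of the near-memory accumulation flow, with the bit-string operands abstracted as plain integers and the register and external memory modeled as Python variables.

    ```python
    # Each iteration of a recursive operation (here multiply-accumulate)
    # writes its result back to a register inside the logic circuit; only
    # the final accumulated value is written out to separate storage.

    internal_register = 0          # register within/near the logic circuit
    external_memory = {}           # memory separate from the logic circuit

    a_strings = [2, 4, 6]          # stand-ins for bit-string operands
    b_strings = [3, 5, 7]

    for a, b in zip(a_strings, b_strings):
        internal_register += a * b             # result accumulates in place

    external_memory["result"] = internal_register   # final write-out
    print(external_memory)                          # {'result': 68}
    ```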
  • Patent number: 11568906
    Abstract: Apparatuses and methods for writing and storing parameter codes for operating parameters, and selecting between the parameter codes to set an operating condition for a memory are disclosed. An example apparatus includes a first mode register and a second mode register. The first mode register is configured to store first and second parameter codes for a same operating parameter. The second mode register is configured to store a parameter code for a control parameter to select between the first and second parameter codes to set a current operating condition for the operating parameter. An example method includes storing in a first register a first parameter code for an operating parameter used to set a first memory operating condition, and further includes storing in a second register a second parameter code for the operating parameter used to set a second memory operating condition. (A small model of the selection scheme follows this entry.)
    Type: Grant
    Filed: April 6, 2021
    Date of Patent: January 31, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Dean D. Gans, Daniel C. Skinner
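    A small Python model of the selection scheme: one register holds two candidate codes for the same operating parameter, and a second register's control code picks which is in effect. The field layout and code values are assumptions.

    ```python
    # Two parameter codes live in the first mode register; the second mode
    # register selects between them, so switching operating conditions is a
    # single control write rather than a rewrite of the parameter code.

    class ModeRegisters:
        def __init__(self, code_a: int, code_b: int):
            self.mr1 = (code_a, code_b)  # two codes for one parameter
            self.mr2 = 0                 # control code: selects A or B

        def select(self, which: int):
            self.mr2 = which

        def operating_code(self) -> int:
            return self.mr1[self.mr2]

    regs = ModeRegisters(code_a=0b0101, code_b=0b0011)  # e.g. latency codes
    print(regs.operating_code())   # 5 (code A sets the current condition)
    regs.select(1)
    print(regs.operating_code())   # 3 (code B active; MR1 untouched)
    ```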
  • Patent number: 11567769
    Abstract: A data pipeline circuit includes an upstream interface circuit that receives sequential data and a downstream interface circuit that transfers the sequential data to a downstream circuit. A ready signal indicates the downstream circuit is ready to receive the sequential data. The data pipeline circuit includes a first data latch, a second data latch, and a first status latch. The first data latch receives the sequential data. The first status latch generates an available signal that is asserted to indicate the second data latch is available to receive the sequential data. The second data latch receives the sequential data in response to the available signal being asserted and the ready signal indicating the downstream circuit is not ready to receive the sequential data on the data output. Limiting the conditions in which the sequential data is stored in the second data latch significantly reduces power consumption of the data pipeline circuit. (A cycle-level sketch follows this entry.)
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: January 31, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Thomas Detwiler, Steven J. Urish
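    The arrangement resembles the classic two-register "skid buffer". Below is a cycle-by-cycle Python model with invented signal names; the point to notice is that the second latch is written only when it is available and the downstream is stalled, which is what limits its switching activity.

    ```python
    # Behavioral model of the two-latch stage: the second (skid) latch
    # captures data only on backpressure, and is drained first to keep the
    # sequential order.

    class SkidBuffer:
        def __init__(self):
            self.first = None     # first data latch
            self.second = None    # second data latch (skid)
            self.available = True # status latch: second latch is free

        def cycle(self, data_in, downstream_ready):
            out = None
            if downstream_ready:
                if self.second is not None:        # drain skid latch first
                    out, self.second = self.second, None
                    self.available = True
                elif self.first is not None:
                    out, self.first = self.first, None
            elif self.first is not None and self.available:
                # Downstream stalled: park the in-flight word in latch two.
                self.second, self.first = self.first, None
                self.available = False
            if data_in is not None and self.first is None:
                self.first = data_in
            return out

    buf = SkidBuffer()
    print(buf.cycle("A", downstream_ready=False))  # None; A in first latch
    print(buf.cycle("B", downstream_ready=False))  # None; A parked, B in first
    print(buf.cycle(None, downstream_ready=True))  # A
    print(buf.cycle(None, downstream_ready=True))  # B
    ```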
  • Patent number: 11561799
    Abstract: An execution unit comprising a processing pipeline configured to perform calculations to evaluate a plurality of mathematical functions. The processing pipeline comprises a plurality of stages through which each calculation for evaluating a mathematical function progresses to an end result. Each of a plurality of processing circuits in the pipeline is configured to perform an operation on input values during at least one stage of the plurality of stages. The plurality of processing circuits include multiplier circuits. A first multiplier circuit and a second multiplier circuit are configured to operate in parallel, performing their processing at the same stage in the pipeline. A third multiplier circuit is arranged in series with the first and second multiplier circuits and processes their outputs. (See the sketch below this entry.)
    Type: Grant
    Filed: June 3, 2021
    Date of Patent: January 24, 2023
    Assignee: GRAPHCORE LIMITED
    Inventor: Jonathan Mangnall
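    A sketch of the multiplier arrangement: stage one runs two multipliers in parallel, and a third multiplier in series consumes both of their outputs. The operand wiring (computing x**4 here) is an assumed example, not a function the patent specifies.

    ```python
    # Two parallel multipliers feed one serial multiplier: three multiplies
    # complete in two pipeline stages instead of three.

    def pipeline(a, b, c, d):
        p1 = a * b        # stage 1: first multiplier
        p2 = c * d        # stage 1: second multiplier, in parallel
        return p1 * p2    # stage 2: third multiplier, in series

    x = 3.0
    print(pipeline(x, x, x, x))   # x**4 = 81.0
    ```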
  • Patent number: 11550585
    Abstract: A method and apparatus are provided for processing accelerator instructions in a data processing apparatus, where a block of one or more accelerator instructions is executable on a host processor or on an accelerator device. For an instruction executed on the host processor and referencing a first virtual address, the instruction is issued to an instruction queue of the host processor and executed by the host processor, the executing including translating, by translation hardware of the host processor, the first virtual address to a first physical address. For an instruction executed on the accelerator device and referencing the first virtual address, the first virtual address is translated, by the translation hardware, to a second physical address and the instruction is sent to the accelerator device referencing the second physical address. An accelerator task may be initiated by writing configuration data to an accelerator job queue. (A minimal model follows this entry.)
    Type: Grant
    Filed: March 23, 2021
    Date of Patent: January 10, 2023
    Assignee: Arm Limited
    Inventors: Roxana Rusitoru, Jonathan Curtis Beard, Alexander Sebastian Bischoff
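    A minimal model of the dual-target translation: the same virtual address resolves to one physical address for host execution and another for the accelerator. The page-table dictionaries and block layout are stand-ins for the MMU state.

    ```python
    # The host's translation hardware produces different physical addresses
    # depending on where the accelerator-instruction block will run.

    host_page_table = {0x1000: 0x8000}    # VA -> PA for host execution
    accel_page_table = {0x1000: 0xA000}   # VA -> PA visible to accelerator

    def execute(block, on_accelerator: bool):
        va = block["address"]
        if on_accelerator:
            pa = accel_page_table[va]           # translate before sending
            return ("sent_to_accelerator", pa)  # instruction carries the PA
        pa = host_page_table[va]                # translate during execution
        return ("executed_on_host", pa)

    blk = {"op": "accel_add", "address": 0x1000}
    print(execute(blk, on_accelerator=False))  # ('executed_on_host', 32768)
    print(execute(blk, on_accelerator=True))   # ('sent_to_accelerator', 40960)
    ```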
  • Patent number: 11550620
    Abstract: Apparatuses and methods are disclosed for performing data processing operations in main processing circuitry and delegating certain tasks to auxiliary processing circuitry. User-specified instructions executed by the main processing circuitry comprise a task dispatch specification specifying an indication of the auxiliary processing circuitry and multiple data words defining a delegated task comprising at least one virtual address indicator. In response to the task dispatch specification, the main processing circuitry performs virtual-to-physical address translation with respect to the at least one virtual address indicator to derive at least one physical address indicator, and issues to the auxiliary processing circuitry a task dispatch memory write transaction comprising the indication of the auxiliary processing circuitry and the multiple data words, wherein the at least one virtual address indicator in the multiple data words is substituted by the at least one physical address indicator. (A sketch of the substitution follows this entry.)
    Type: Grant
    Filed: March 3, 2021
    Date of Patent: January 10, 2023
    Assignee: Arm Limited
    Inventors: Håkan Lars-Göran Persson, Frederic Claude Marie Piry, Matthew Lucien Evans, Albin Pierrick Tonnerre
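    A sketch of the dispatch-time substitution: the data words of the task descriptor are scanned, and each virtual address indicator is replaced by its translation before the write transaction goes out. The (tag, value) word layout is an assumption.

    ```python
    # Virtual address indicators in the descriptor's data words are
    # substituted with physical ones before the task dispatch memory write
    # transaction is issued to the auxiliary circuitry.

    page_table = {0x2000: 0xB000}

    def dispatch(accel_id, words):
        translated = [("PA", page_table[v]) if tag == "VA" else (tag, v)
                      for tag, v in words]
        return {"target": accel_id, "words": translated}  # write transaction

    task = [("OP", 7), ("VA", 0x2000), ("LEN", 64)]
    print(dispatch(accel_id=1, words=task))
    # {'target': 1, 'words': [('OP', 7), ('PA', 45056), ('LEN', 64)]}
    ```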
  • Patent number: 11550745
    Abstract: Techniques are disclosed relating to address mapping for message signaled interrupts. In some embodiments, an apparatus includes interrupt control circuitry configured to process, from multiple client circuits, message signaled interrupts that include addresses in an interrupt controller address space. First and second interface controller circuitry may control respective peripheral interfaces for multiple devices. Remap control circuitry may be configured to access a first table based on at least a portion of the virtual address of a first message signaled interrupt from the first interface controller circuitry and generate a first address in the interrupt controller address space based on an accessed entry in the first table, and to access a second table based on at least a portion of the virtual address of a second message signaled interrupt from the second interface controller circuitry and generate a second address in the interrupt controller address space based on an accessed entry in the second table. (A table-lookup sketch follows this entry.)
    Type: Grant
    Filed: September 21, 2021
    Date of Patent: January 10, 2023
    Assignee: Apple Inc.
    Inventor: John H. Kelm
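    A table-lookup sketch of per-controller MSI remapping: part of the incoming address indexes a table owned by that interface controller, and the entry supplies an address in the interrupt controller's space. The index granularity and offset carry-over are assumed details.

    ```python
    # Each interface controller has its own remap table; the remapped
    # address lands in the interrupt controller address space.

    remap_tables = {
        "pcie0": {0x0: 0x1000, 0x1: 0x1040},   # controller 0's table
        "pcie1": {0x0: 0x2000},                # controller 1's table
    }

    def remap_msi(controller: str, virt_addr: int) -> int:
        index = (virt_addr >> 12) & 0xF        # portion of the VA as index
        entry = remap_tables[controller][index]
        return entry | (virt_addr & 0x3F)      # assumed offset carry-over

    print(hex(remap_msi("pcie0", 0x1020)))     # 0x1060
    print(hex(remap_msi("pcie1", 0x0004)))     # 0x2004
    ```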
  • Patent number: 11544191
    Abstract: Hardware accelerators for accelerated grouped convolution operations. A first buffer of a hardware accelerator may receive a first row of an input feature map (IFM) from a memory. A first group comprising a plurality of tiles may receive the first row of the IFM. A plurality of processing elements of the first group may compute a portion of a first row of an output feature map (OFM) based on the first row of the IFM and a kernel. A second buffer of the accelerator may receive a third row of the IFM from the memory. A second group comprising a plurality of tiles may receive the third row of the IFM. A plurality of processing elements of the second group may compute a portion of a third row of the OFM based on the third row of the IFM and the kernel as part of a grouped convolution operation. (A toy row-mapping model follows this entry.)
    Type: Grant
    Filed: March 26, 2020
    Date of Patent: January 3, 2023
    Assignee: INTEL CORPORATION
    Inventors: Ambili Vengallur, Bharat Daga, Pradeep K. Janedula, Bijoy Pazhanimala, Aravind Babu Srinivasan
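    A toy model of the row distribution: group one gets IFM row 0 and produces OFM row 0 while group two gets IFM row 2 and produces OFM row 2, both with the same kernel. Real hardware convolves 2D windows across channel groups; a 1-D per-row dot product is used here only to show the row-to-group mapping.

    ```python
    # Two groups of tiles work on different IFM rows in parallel, sharing
    # one kernel, as in the buffered row distribution described above.

    def row_conv(ifm_row, kernel):
        k = len(kernel)
        return [sum(ifm_row[i + j] * kernel[j] for j in range(k))
                for i in range(len(ifm_row) - k + 1)]

    ifm = [[1, 2, 3, 4],    # row 0 -> first buffer -> group 1
           [5, 6, 7, 8],
           [9, 1, 2, 3]]    # row 2 -> second buffer -> group 2
    kernel = [1, -1]

    ofm_row0 = row_conv(ifm[0], kernel)   # group 1's processing elements
    ofm_row2 = row_conv(ifm[2], kernel)   # group 2's, concurrently
    print(ofm_row0, ofm_row2)             # [-1, -1, -1] [8, -1, -1]
    ```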
  • Patent number: 11537399
    Abstract: In an embodiment, a processor supports one or more compression assist instructions which may be employed in compression software to improve the performance of the processor when performing compression/decompression. That is, the compression/decompression task may be performed more rapidly and consume less power when the compression assist instructions are employed than when they are not. In some cases, the cost of a more effective, more complex compression algorithm may be reduced to the cost of a less effective, less complex compression algorithm.
    Type: Grant
    Filed: July 12, 2021
    Date of Patent: December 27, 2022
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Ali Sazegari
  • Patent number: 11531607
    Abstract: According to certain embodiments, a system includes one or more processors and one or more computer-readable non-transitory storage media comprising instructions that, when executed by the one or more processors, cause one or more components to perform operations including: executing a software process of a secondary instance, the secondary instance running in parallel with a primary instance and associated with a plurality of cores including a bootstrap core; registering a non-maskable interrupt for the bootstrap core in the secondary instance; determining whether the secondary instance is in a fault state and, if so, halting the plurality of cores associated with the secondary instance without impact to the primary instance; and recovering the bootstrap core by switching a context of the bootstrap core from the secondary instance to the primary instance via the non-maskable interrupt. (A flow sketch follows this entry.)
    Type: Grant
    Filed: April 21, 2020
    Date of Patent: December 20, 2022
    Assignee: CISCO TECHNOLOGY, INC.
    Inventors: Amit Chandra, Nivin Lawrence, Etienne Martineau
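    A hypothetical flow sketch of the recovery path, with cores modeled as dictionaries: a fault in the secondary instance halts only that instance's cores, and the pre-registered NMI then switches the bootstrap core's context back to the primary instance.

    ```python
    # Only the secondary instance's cores are halted; the primary instance
    # keeps running, and the bootstrap core is reclaimed via the NMI.

    def handle_fault(secondary_cores, bootstrap_core):
        for core in secondary_cores:
            core["halted"] = True        # halt secondary's cores only
        # NMI handler on the bootstrap core: switch secondary -> primary
        bootstrap_core["instance"] = "primary"
        bootstrap_core["halted"] = False

    cores = [{"id": 4, "halted": False, "instance": "secondary"},
             {"id": 5, "halted": False, "instance": "secondary"}]
    bootstrap = cores[0]
    handle_fault(cores, bootstrap)
    print(bootstrap)   # {'id': 4, 'halted': False, 'instance': 'primary'}
    print(cores[1])    # {'id': 5, 'halted': True, 'instance': 'secondary'}
    ```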
  • Patent number: 11526361
    Abstract: Devices and techniques for variable pipeline length in a barrel-multithreaded processor are described herein. A completion time for an instruction can be determined prior to insertion into a pipeline of a processor. A conflict between the instruction and a different instruction can be detected based on the completion time. Here, the different instruction is already in the pipeline, and the conflict is detected when the completion time equals the previously determined completion time for the different instruction. A difference between the completion time and an unconflicted completion time can then be calculated and completion of the instruction delayed by that difference. (A scheduling sketch follows this entry.)
    Type: Grant
    Filed: October 20, 2020
    Date of Patent: December 13, 2022
    Assignee: Micron Technology, Inc.
    Inventor: Tony Brewer
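    A scheduling sketch of the conflict check: an instruction's completion cycle is computed before issue, and if another in-flight instruction already owns that cycle, the new instruction is pushed to the next free cycle and the added delay recorded. The set-based bookkeeping is an assumption.

    ```python
    # Completion times are reserved before insertion into the pipeline;
    # same-cycle completions are conflicts, resolved by delaying.

    in_flight_completions = set()

    def schedule(issue_cycle: int, latency: int) -> int:
        completion = issue_cycle + latency       # unconflicted completion
        delayed = completion
        while delayed in in_flight_completions:  # conflict on that cycle
            delayed += 1
        in_flight_completions.add(delayed)
        return delayed - completion              # cycles of added delay

    print(schedule(0, 5))   # 0 -> completes at cycle 5, no conflict
    print(schedule(2, 3))   # 1 -> cycle 5 taken, delayed to cycle 6
    print(schedule(3, 3))   # 1 -> cycle 6 taken, delayed to cycle 7
    ```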
  • Patent number: 11513994
    Abstract: Systems, methods, and apparatus improve synchronization of trigger timing when triggers are configured over a serial bus. A data communication apparatus has an interface circuit that couples the data communication apparatus to a serial bus and is configured to receive a clock signal from the serial bus, a plurality of counters configured to count pulses in the clock signal, and a controller configured to receive a datagram from the serial bus, the datagram including a plurality of data bytes corresponding to the plurality of counters, configure each of the plurality of counters with a count value based on content of a corresponding data byte when the corresponding data byte is received from the datagram, cause each of the counters to refrain from counting until all of the counters have been configured with count values, and actuate a trigger when a counter associated with the trigger has counted to zero. (A countdown model follows this entry.)
    Type: Grant
    Filed: January 14, 2021
    Date of Patent: November 29, 2022
    Assignee: QUALCOMM Incorporated
    Inventors: Lalan Jee Mishra, Umesh Srikantiah, Richard Dominic Wietfeldt
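    A countdown model of the synchronization scheme: each data byte of the datagram loads one counter, no counter runs until every counter has been configured, and a trigger actuates when its counter reaches zero. Clock pulses are modeled as loop iterations.

    ```python
    # Counters are held until all are loaded from the datagram, so triggers
    # configured over the (slow) serial bus still fire on a common timebase.

    def run_triggers(datagram_bytes):
        counters = [None] * len(datagram_bytes)
        for i, byte in enumerate(datagram_bytes):
            counters[i] = byte            # configure one counter per byte
        fired, cycle = [], 0              # counting begins only here
        while any(c > 0 for c in counters):
            cycle += 1                    # one clock pulse from the bus
            for i in range(len(counters)):
                if counters[i] > 0:
                    counters[i] -= 1
                    if counters[i] == 0:
                        fired.append((cycle, f"trigger{i}"))
        return fired

    print(run_triggers([2, 5, 2]))
    # [(2, 'trigger0'), (2, 'trigger2'), (5, 'trigger1')]
    ```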
  • Patent number: 11513800
    Abstract: A method for executing new instructions includes receiving an instruction and determining whether the received instruction is a new instruction according to an operation code of the received instruction. When the received instruction is a new instruction, the basic decoding information of the received instruction is stored in a private register. The system then enters a system management mode and, in that mode, simulates the execution of the received instruction according to the basic decoding information stored in the private register; the basic decoding information includes the operation code. (A control-flow sketch follows this entry.)
    Type: Grant
    Filed: September 10, 2021
    Date of Patent: November 29, 2022
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventors: Weilin Wang, Yingbing Guan, Mengchen Yang
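    A control-flow sketch of the path for new instructions: unrecognized opcodes have their basic decoding information parked in a private register, the processor enters system management mode (SMM), and an SMM handler simulates the instruction from the saved information. The opcodes and handler table are invented for illustration.

    ```python
    # Known opcodes execute normally; new ones are emulated in SMM using
    # the decode information stored in the private register.

    KNOWN_OPCODES = {0x01: lambda state: state.update(r0=state["r0"] + 1)}
    SMM_EMULATORS = {0x9A: lambda state: state.update(r0=state["r0"] * 2)}

    private_register = {}

    def execute(opcode, state):
        if opcode in KNOWN_OPCODES:
            KNOWN_OPCODES[opcode](state)        # normal hardware path
            return
        private_register["decode_info"] = {"opcode": opcode}
        enter_smm(state)                        # trap to SMM

    def enter_smm(state):
        info = private_register["decode_info"]
        SMM_EMULATORS[info["opcode"]](state)    # software simulates it

    state = {"r0": 3}
    execute(0x01, state)   # hardware path: r0 = 4
    execute(0x9A, state)   # emulated in SMM: r0 = 8
    print(state)           # {'r0': 8}
    ```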
  • Patent number: 11507821
    Abstract: The purpose of the present invention is to provide an efficient and versatile neural network circuit while significantly reducing the size and cost of the circuit. The neural network circuit comprises: memory cells 1, provided in the same number as the pieces of input data I, each of which performs a multiplication function by which a piece of one-bit input data I is multiplied by a weighting coefficient W; and a majority determination circuit 2, which performs an addition/application function by which the multiplication results of the memory cells 1 are added up, an activation function is applied to the addition result, and one-bit output data is outputted. (A one-bit-neuron model follows this entry.)
    Type: Grant
    Filed: May 19, 2017
    Date of Patent: November 22, 2022
    Assignee: Tokyo Institute of Technology
    Inventor: Masato Motomura
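    A one-bit-neuron model of the circuit: each "memory cell" multiplies a one-bit input by its stored weight, and the majority determination circuit adds the products and applies the activation to emit one output bit. Treating the one-bit multiply as bit-equality (XNOR) is an assumption borrowed from common binarized-network practice.

    ```python
    # N memory cells each produce one product; the majority circuit sums
    # them and outputs 1 iff more than half the cells agree.

    def binary_neuron(inputs, weights):
        products = [1 if i == w else 0 for i, w in zip(inputs, weights)]
        return 1 if sum(products) * 2 > len(products) else 0

    print(binary_neuron([1, 0, 1, 1], [1, 1, 1, 0]))  # 2 of 4 agree -> 0
    print(binary_neuron([1, 0, 1], [1, 0, 0]))        # 2 of 3 agree -> 1
    ```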
  • Patent number: 11500629
    Abstract: A multiplying-and-accumulating (MAC) circuit includes a multiplying circuit and an adding circuit. The multiplying circuit includes a first multiplier and a second multiplier, and each of the first multiplier and the second multiplier performs a multiplying calculation on first input data with N bits and second input data with M bits to output multiplication result data with (N+M) bits (where "N" and "M" are natural numbers equal to or greater than one). The adding circuit includes an adder which performs an adding calculation on the multiplication result data of the first multiplier and the multiplication result data of the second multiplier to output addition result data with (N+M) bits. (A width-tracking sketch follows this entry.)
    Type: Grant
    Filed: January 8, 2021
    Date of Patent: November 15, 2022
    Assignee: SK hynix Inc.
    Inventor: Choung Ki Song
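    A width-tracking sketch of the datapath: an N-bit by M-bit unsigned multiply always fits in N+M bits, and the adder here keeps (N+M)-bit results as the abstract states, so any carry-out is assumed truncated in this model.

    ```python
    # Two (N+M)-bit products are added into an (N+M)-bit result.

    N, M = 4, 4
    MASK = (1 << (N + M)) - 1          # (N+M)-bit result width

    def multiply(a: int, b: int) -> int:
        assert a < (1 << N) and b < (1 << M)
        return a * b                   # fits in N+M bits by construction

    def mac(a1, b1, a2, b2) -> int:
        return (multiply(a1, b1) + multiply(a2, b2)) & MASK

    print(mac(3, 5, 2, 7))             # 29
    print(mac(15, 15, 15, 15))         # 450 truncated to 8 bits = 194
    ```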
  • Patent number: 11494326
    Abstract: To perform complex arithmetic operations in neural networks without compromising the performance of the neural network accelerator, a programmable computation unit is integrated with a direct memory access (DMA) engine that is used to exchange neural network parameters between the neural network accelerator and system memory. The DMA engine may include a calculation circuit operable to perform a multiply-and-add calculation on a set of operands, and an operand selector circuit operable to select a source for each operand of the calculation circuit. The DMA engine may also include a control circuit operable to retrieve a meta-descriptor for performing a computation, configure the operand selector circuit based on the meta-descriptor, and use the calculation circuit to perform the computation based on the meta-descriptor to generate a computation result. (A descriptor-driven sketch follows this entry.)
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: November 8, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Kun Xu, Ron Diamant
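    A descriptor-driven sketch of the in-DMA computation: the control logic reads a meta-descriptor, points each operand of the multiply-and-add at a source (an immediate or a memory location), and produces the result while data moves. The descriptor fields and selector tags are invented for illustration.

    ```python
    # The operand selector resolves each operand's source per the
    # meta-descriptor; the calculation circuit then runs a multiply-and-add.

    memory = {0x10: 2.0, 0x14: 0.5}

    def fetch_operand(selector):
        kind, value = selector
        return memory[value] if kind == "mem" else value   # "imm" literal

    def dma_compute(meta_descriptor):
        a = fetch_operand(meta_descriptor["a"])
        b = fetch_operand(meta_descriptor["b"])
        c = fetch_operand(meta_descriptor["c"])
        return a * b + c

    desc = {"a": ("mem", 0x10), "b": ("imm", 3.0), "c": ("mem", 0x14)}
    print(dma_compute(desc))    # 2.0 * 3.0 + 0.5 = 6.5
    ```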
  • Patent number: 11487680
    Abstract: An apparatus and a method are disclosed. In the apparatus, a memory management unit includes: a first cache unit, adapted to store a plurality of first source operands and one first write address; a second cache unit, adapted to store at least one pair of a second source operand and a second destination address; a write cache module, adapted to discriminate between destination addresses of a plurality of store instructions, so as to store, in the first cache unit, a plurality of source operands corresponding to consecutive destination addresses, and to store, in the second cache unit, non-consecutive destination addresses and source operands corresponding to the non-consecutive destination addresses, where the first write address is an initial address of the consecutive destination addresses; and a bus transmission module, adapted to transmit the plurality of first source operands and the first write address in the first cache unit to a memory through a bus in a write burst transmission mode. (A coalescing sketch follows this entry.)
    Type: Grant
    Filed: September 10, 2020
    Date of Patent: November 1, 2022
    Assignee: ALIBABA GROUP HOLDING LIMITED
    Inventors: Xiaoyan Xiang, Yimin Lu, Chaojun Zhao
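    A coalescing sketch of the split the write cache module performs: stores to consecutive destination addresses collect behind one initial address for a single burst transmission, while non-consecutive stores keep their individual (address, operand) pairs. The single-run policy is a simplifying assumption.

    ```python
    # Consecutive-address stores go to the first cache unit (base address
    # plus operands, sent as one write burst); the rest go to the second
    # cache unit as (address, operand) pairs.

    def classify_stores(stores):
        burst_base, burst_data = None, []
        scattered = []
        for addr, operand in stores:
            if burst_data and addr == burst_base + len(burst_data):
                burst_data.append(operand)        # extends the run
            elif not burst_data:
                burst_base, burst_data = addr, [operand]
            else:
                scattered.append((addr, operand)) # second cache unit
        return {"burst": (burst_base, burst_data), "pairs": scattered}

    stores = [(0x100, "a"), (0x101, "b"), (0x102, "c"), (0x200, "d")]
    print(classify_stores(stores))
    # {'burst': (256, ['a', 'b', 'c']), 'pairs': [(512, 'd')]}
    ```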
  • Patent number: 11481352
    Abstract: An example includes detecting a bus turn-around (BTA) sequence after detecting a voltage level; sending a BTA acknowledgement in response to the BTA sequence; and sending a configuration command to a peripheral device after the interface is initialized based on the BTA acknowledgement. (A sequence sketch follows this entry.)
    Type: Grant
    Filed: December 26, 2020
    Date of Patent: October 25, 2022
    Assignee: Intel Corporation
    Inventors: Zhenyu Zhu, Nobuyuki Suzuki, Anoop Mukker, Daniel Nemiroff, David W. Vogel
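    A sequence sketch of the bring-up, with the event names invented: the host sees a voltage level, receives the BTA sequence, acknowledges it, and only then sends the peripheral a configuration command.

    ```python
    # The configuration command is gated on the BTA acknowledgement, i.e.
    # on the interface being initialized.

    def bring_up(events):
        voltage_seen, sent = False, []
        for ev in events:
            if ev == "voltage_level":
                voltage_seen = True
            elif ev == "bta_sequence" and voltage_seen:
                sent.append("bta_ack")         # acknowledge the BTA
                sent.append("config_command")  # interface now initialized
        return sent

    print(bring_up(["voltage_level", "bta_sequence"]))
    # ['bta_ack', 'config_command']
    ```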