Controlling Moving, Shifting, Or Rotation Operations (epo) Patents (Class 712/E9.034)
  • Patent number: 12164917
    Abstract: A system including one or more processors configured to receive a transpose instruction indicating to transpose a source matrix to a result matrix, provide data elements of the source matrix to input switching circuits, reorder the data elements using the input switching circuits, provide the data elements from the input switching circuits to one or more lanes of a datapath, provide the data elements from the datapath to output switching circuits, undo the reordering of the data elements using the output switching circuits, and provide the data elements from the output switching circuits to a result matrix. Each respective lane of the datapath receiving data elements receives multiple data elements directed to different respective non-overlapping portions of the lane.
    Type: Grant
    Filed: May 17, 2023
    Date of Patent: December 10, 2024
    Assignee: Google LLC
    Inventors: Vinayak Anand Gokhale, Matthew Leever Hedlund, Matthew William Ashcraft, Indranil Chakraborty
  • Patent number: 12141583
    Abstract: An apparatus has processing circuitry with execution units to perform operations, physical registers to store data, and forwarding circuitry to forward the data from the physical registers to the execution units. The forwarding circuitry provides an incomplete set of connections between the physical registers and the execution units such that, for each of at least some of the physical registers, the physical register is connected to only a subset of the execution units. The apparatus also has register renaming circuitry to map logical registers identified by the operations to respective physical registers and register reorganisation circuitry to monitor upcoming operations and to determine, based on the upcoming operations and the connections provided by the forwarding circuitry, whether to perform a register reorganisation procedure to change a mapping between the logical registers and the physical registers.
    Type: Grant
    Filed: September 13, 2022
    Date of Patent: November 12, 2024
    Assignee: Arm Limited
    Inventors: Xiaoyang Shen, Zichao Xie
  • Patent number: 12124848
    Abstract: An information processing apparatus according to the present invention includes: a load instruction generating unit configured to generate an instruction to continuously access a memory in which a real part and an imaginary part composing complex data are alternately arranged, in accordance with arrangement of the real part and the imaginary part, and load the real part and the imaginary part as respective elements of a vector register; and an operation instruction generating unit configured to generate a vector operation instruction including an instruction to perform a vector operation of elements corresponding to element numbers different from each other between two vector registers and an instruction to perform a masked vector operation.
    Type: Grant
    Filed: August 21, 2019
    Date of Patent: October 22, 2024
    Assignee: NEC CORPORATION
    Inventor: Kento Iwakawa
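    Illustration: a rough Python sketch of how the interleaved layout in this abstract could serve a complex multiply. Plain lists stand in for vector registers. The helper names (`swap_pairs`, `masked_op`, `complex_multiply`) and the choice of multiplication as the operation are assumptions for illustration, not details from the patent.

```python
def swap_pairs(v):
    """Pair each element with its neighbour: lane i reads lane i ^ 1."""
    return [v[i ^ 1] for i in range(len(v))]


def masked_op(op, a, b, mask, passthrough):
    """Apply op lane by lane where mask is set; keep passthrough elsewhere."""
    return [op(x, y) if m else p
            for x, y, m, p in zip(a, b, mask, passthrough)]


def complex_multiply(a, b):
    """Multiply interleaved complex vectors [re0, im0, re1, im1, ...]."""
    n = len(a)
    same = [x * y for x, y in zip(a, b)]               # [ar*br, ai*bi, ...]
    cross = [x * y for x, y in zip(a, swap_pairs(b))]  # [ar*bi, ai*br, ...]
    even = [i % 2 == 0 for i in range(n)]
    odd = [not e for e in even]
    # Even lanes receive the real part: ar*br - ai*bi.
    out = masked_op(lambda x, y: x - y, same, swap_pairs(same), even, [0] * n)
    # Odd lanes receive the imaginary part: ar*bi + ai*br.
    return masked_op(lambda x, y: x + y, cross, swap_pairs(cross), odd, out)
```

    For example, `complex_multiply([1, 2], [3, 4])` computes (1 + 2i)(3 + 4i) without ever de-interleaving the data.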
  • Patent number: 12099847
    Abstract: A data processing apparatus comprises: execution circuitry to execute instructions in order to perform data processing operations specified by those instructions; a plurality of registers to store data values for access by the execution circuitry when performing the data processing operations, each register having an associated physical register identifier; register rename circuitry to select physical register identifiers to associate with architectural register identifiers specified by the instructions; and rename storage having a plurality of entries, each entry being associated with one of the architectural register identifiers and used by the register rename circuitry to indicate a physical register identifier selected for association with that one of the architectural register identifiers; the register rename circuitry comprising an execute unit, and being responsive to detection of an early execute condition for a given instruction, the early execute condition requiring at least detection that each sour
    Type: Grant
    Filed: January 26, 2023
    Date of Patent: September 24, 2024
    Assignee: Arm Limited
    Inventors: Quentin Éric Nouvel, Luca Nassi, Adrien Pesle
  • Patent number: 12039332
    Abstract: Detailed herein are embodiment systems, processors, and methods for matrix move. For example, a processor comprising decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to move each data element of the identified source matrix operand to corresponding data element position of the identified destination matrix operand is described.
    Type: Grant
    Filed: January 28, 2022
    Date of Patent: July 16, 2024
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Jesus Corbal, Dan Baum, Alexander Heinecke, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 12032490
    Abstract: A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values are sorted in an order indicated by the vector sort instruction, and storing the sorted vector in a storage location.
    Type: Grant
    Filed: December 1, 2022
    Date of Patent: July 9, 2024
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy David Anderson, Mujibur Rahman
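    Illustration: a minimal Python sketch of a lane sort whose order is chosen by the instruction. The abstract does not say how the hardware sorts; the odd-even transposition network used here, and the `descending` flag standing in for the order indicated by the instruction, are assumptions.

```python
def vector_sort(lanes, descending=False):
    """Sort lane values with an odd-even transposition network (assumed)."""
    v = list(lanes)
    n = len(v)
    for rnd in range(n):
        # Alternate between even-indexed and odd-indexed neighbour pairs.
        for i in range(rnd % 2, n - 1, 2):
            out_of_order = v[i] < v[i + 1] if descending else v[i] > v[i + 1]
            if out_of_order:
                v[i], v[i + 1] = v[i + 1], v[i]
    return v
```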
  • Patent number: 11977886
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.
    Type: Grant
    Filed: March 28, 2022
    Date of Patent: May 7, 2024
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Menachem Adelman, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Milind B. Girkar, Zeev Sperber, Mark J. Charney, Rinat Rappoport, Jesus Corbal, Stanislav Shwartsman, Igor Yanover, Alexander F. Heinecke, Barukh Ziv, Dan Baum, Yuri Gebil, Raanan Sade
  • Patent number: 11941397
    Abstract: Techniques to take advantage of the single-instruction-multiple-data (SIMD) capabilities of a processor to process data blocks can include implementing an instruction to fuse the data blocks together. The fuse input instruction can have a first input vector, a second input vector, a select input, a first output vector, and a second output vector. The fuse input instruction selects a portion of the first input vector and a portion of the second input vector based on the select input, sign extends the selected portion of the first input vector and the selected portion of the second input vector, and shuffles data elements of the sign extended portion of the first input vector with data elements of the sign extended portion of the second input vector to generate the first and second output vectors.
    Type: Grant
    Filed: May 31, 2022
    Date of Patent: March 26, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Xiaodan Tan, Paul Gilbert Meyer
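    Illustration: a small Python sketch of the select, sign-extend, and shuffle steps named in this abstract. The 8-bit element width, the meaning of `select` (0 for the lower half, anything else for the upper half), and the interleaving pattern are assumptions; the patent may define them differently.

```python
def sign_extend(value, bits=8):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def fuse_input(vec_a, vec_b, select):
    """Return two output vectors built from halves of two equal-length inputs."""
    half = len(vec_a) // 2
    lo, hi = (0, half) if select == 0 else (half, 2 * half)
    part_a = [sign_extend(x) for x in vec_a[lo:hi]]
    part_b = [sign_extend(x) for x in vec_b[lo:hi]]
    # Shuffle: alternate elements from the two sign-extended portions.
    interleaved = [x for pair in zip(part_a, part_b) for x in pair]
    return interleaved[:half], interleaved[half:]
```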
  • Patent number: 11907723
    Abstract: A data processing apparatus is provided. Rename circuitry performs a register rename stage of a pipeline by storing, in storage circuitry, mappings between registers. Each of the mappings is associated with an elimination field value. Operation elimination circuitry replaces an operation that indicates an action is to be performed on data from a source register and stored in a destination register, with a new mapping in the storage circuitry that references the destination register and has the elimination field value set. Operation circuitry responds to a subsequent operation that accesses the destination register when the elimination field value is set by obtaining contents of the source register, performing the action on the contents to obtain a result, and returning the result.
    Type: Grant
    Filed: March 21, 2022
    Date of Patent: February 20, 2024
    Assignee: Arm Limited
    Inventors: Nicholas Andrew Plante, Joseph Michael Pusdesris, Jungsoo Kim
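    Illustration: a simplified Python model of the deferred operation described in this abstract. The class and method names are hypothetical, and the model ignores the case where the source register is overwritten before the destination is read, which real hardware would have to handle.

```python
class RenameTable:
    """Toy rename storage with an elimination flag per mapping."""

    def __init__(self, regs):
        self.regs = dict(regs)  # register name -> stored value
        self.mappings = {}      # dest -> (source, action, eliminated flag)

    def eliminate(self, dest, source, action):
        """Record a mapping instead of executing dest = action(source)."""
        self.mappings[dest] = (source, action, True)

    def read(self, reg):
        """Return reg's value, applying any deferred action on access."""
        entry = self.mappings.get(reg)
        if entry and entry[2]:
            source, action, _ = entry
            return action(self.read(source))
        return self.regs[reg]
```

    A read of the destination then yields the result even though the original operation never ran and the stored value of the destination was never written.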
  • Patent number: 11803382
    Abstract: A digital data processor includes a multi-stage butterfly network, which is configured to, in response to a look up table read instruction, receive look up table data from an intermediate register, reorder the look up table data based on control signals comprising look up table configuration register data, and write the reordered look up table data to a destination register specified by the look up table read instruction.
    Type: Grant
    Filed: September 2, 2022
    Date of Patent: October 31, 2023
    Assignee: Texas Instruments Incorporated
    Inventors: Naveen Bhoria, Duc Bui, Dheera Balasubramanian Samudrala, Rama Venkatasubramanian
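    Illustration: a minimal Python sketch of reordering through a multi-stage butterfly network. The control layout (one bit per switch, one list per stage, with stage s pairing elements 2**s apart) is an assumption; the abstract only says the control signals come from look up table configuration register data.

```python
def butterfly_reorder(data, controls):
    """Pass data through log2(n) stages of 2x2 switches.

    len(data) must be a power of two. controls[s] holds one bit per
    switch in stage s; a set bit swaps the two elements of that pair.
    """
    n = len(data)
    out = list(data)
    for s in range(n.bit_length() - 1):
        dist = 1 << s
        # The lower element of each pair has bit s of its index clear.
        pairs = [i for i in range(n) if not i & dist]
        for bit, i in zip(controls[s], pairs):
            if bit:
                out[i], out[i + dist] = out[i + dist], out[i]
    return out
```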
  • Patent number: 11775307
    Abstract: A cellular modem processor can include dedicated processing engines that implement specific, complex data processing operations. The processing engines can be arranged in pipelines, with different processing engines executing different steps in a sequence of operations. Flow control or data synchronization between pipeline stages can be provided using a hybrid of firmware-based flow control and hardware-based data dependency management. Firmware instructions can define data flow by reference to a virtual address space associated with pipeline buffers. A hardware interlock controller within the pipeline can track and enforce the data dependencies for the pipeline.
    Type: Grant
    Filed: September 24, 2021
    Date of Patent: October 3, 2023
    Assignee: Apple Inc.
    Inventors: Steve Hengchen Hsu, Thirunathan Sutharsan, Mohanned Omar Sinnokrot, On Wa Yeung
  • Patent number: 11720356
    Abstract: An apparatus comprises an instruction decoder to decode instructions, processing circuitry to perform data processing in response to the instructions decoded by the instruction decoder, and memory attribute checking circuitry to check whether a memory access request issued by the processing circuitry satisfies access permissions specified in a plurality of memory attribute entries. Each memory attribute entry specifies access permissions for a corresponding address region of variable size within an address space. In response to a range checking instruction specifying address identifying parameters for identifying a first address and a second address, the instruction decoder controls the processing circuitry to set, in at least one software-accessible storage location, a status value indicative of whether the first address and the second address correspond to the same memory attribute entry.
    Type: Grant
    Filed: August 20, 2019
    Date of Patent: August 8, 2023
    Assignee: Arm Limited
    Inventor: Thomas Christopher Grocutt
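    Illustration: a short Python sketch of the check this abstract describes, namely whether two addresses fall in the same memory attribute entry. Representing each entry as an inclusive `(base, limit)` pair and returning a boolean status are assumptions for illustration.

```python
def find_entry(regions, address):
    """Return the index of the first region containing address, or None."""
    for index, (base, limit) in enumerate(regions):
        if base <= address <= limit:
            return index
    return None


def range_check(regions, first, second):
    """Status value: True only if both addresses hit the same entry."""
    a = find_entry(regions, first)
    b = find_entry(regions, second)
    return a is not None and a == b
```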
  • Patent number: 11720362
    Abstract: An apparatus and method for a tensor permutation engine. The TPE may include a read address generation unit (AGU) to generate a plurality of read addresses for the plurality of tensor data elements in a first storage and a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage. The TPE may include a shuffle register bank comprising a register to read tensor data elements from the plurality of read addresses generated by the read AGU, a first register bank to receive the tensor data elements, and a shift register to receive a lowest tensor data element from each bank in the first register bank, each tensor data element in the shift register to be written to a write address from the plurality of write addresses generated by the write AGU.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: August 8, 2023
    Assignee: Intel Corporation
    Inventor: Berkin Akin
  • Patent number: 11681528
    Abstract: An apparatus and method for a tensor permutation engine. The TPE may include a read address generation unit (AGU) to generate a plurality of read addresses for the plurality of tensor data elements in a first storage and a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage. The TPE may include a shuffle register bank comprising a register to read tensor data elements from the plurality of read addresses generated by the read AGU, a first register bank to receive the tensor data elements, and a shift register to receive a lowest tensor data element from each bank in the first register bank, each tensor data element in the shift register to be written to a write address from the plurality of write addresses generated by the write AGU.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: June 20, 2023
    Assignee: Intel Corporation
    Inventor: Berkin Akin
  • Patent number: 11650817
    Abstract: A processor includes a register file comprising a length register, a vector register file comprising a plurality of vector registers, a mask register file comprising a plurality of mask registers, and a vector instruction execution circuit to execute a masked vector instruction comprising a first length register identifier representing the length register, a first vector register identifier representing a first vector register of the vector register file, and a first mask register identifier representing a first mask register of the mask register file, wherein the length register is to store a length value representing a number of operations to be applied to data elements stored in the first vector register, the first mask register is to store a plurality of mask bits, and a first mask bit of the plurality of mask bits determines whether a corresponding first one of the operations causes an effect.
    Type: Grant
    Filed: September 18, 2019
    Date of Patent: May 16, 2023
    Assignee: Optimum Semiconductor Technologies Inc.
    Inventors: Mayan Moudgill, Murugappan Senthilvelan
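    Illustration: a minimal Python sketch of a masked vector operation bounded by a length value. The function name and the unary `op` are assumptions; lanes that are masked off or beyond the length are simply left unchanged here.

```python
def masked_vector_op(op, vec, length, mask):
    """Apply op to the first `length` elements whose mask bit is set."""
    out = list(vec)
    for i in range(min(length, len(vec))):
        if mask[i]:
            out[i] = op(vec[i])
    return out
```

    With `length = 4`, the fifth element is untouched even if its mask bit is set.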
  • Patent number: 11635967
    Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: April 25, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari, Maxim V. Kazakov
  • Patent number: 11616596
    Abstract: A cellular modem processor can include dedicated processing engines that implement specific, complex data processing operations. To implement physical downlink shared channel (PDSCH) decoding, a cellular modem can include a pipeline having multiple processing engines, with the processing engine including functional units that execute instructions corresponding to different stages in the PDSCH decoding process. Flow control and data synchronization between instructions can be provided using a hybrid of firmware-based flow control and hardware-based data dependency management.
    Type: Grant
    Filed: September 24, 2021
    Date of Patent: March 28, 2023
    Assignee: Apple Inc.
    Inventors: Thirunathan Sutharsan, Mohanned Omar Sinnokrot
  • Patent number: 11604648
    Abstract: A method to transpose source data in a processor in response to a vector bit transpose instruction includes specifying, in respective fields of the vector bit transpose instruction, a source register containing the source data and a destination register to store transposed data. The method also includes executing the vector bit transpose instruction by interpreting N×N bits of the source data as a two-dimensional array having N rows and N columns, creating transposed source data by transposing the bits by reversing a row index and a column index for each bit, and storing the transposed source data in the destination register.
    Type: Grant
    Filed: June 22, 2021
    Date of Patent: March 14, 2023
    Assignee: Texas Instruments Incorporated
    Inventors: Joseph Zbiciak, Dheera Balasubramanian Samudrala, Duc Bui
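    Illustration: a small Python sketch of the N×N bit transpose described in this abstract. Storing the bit at (row, column) in bit position `row * n + col` of an integer is an assumed layout.

```python
def bit_transpose(value, n=8):
    """Treat the low n*n bits of value as an n x n array and transpose it."""
    result = 0
    for row in range(n):
        for col in range(n):
            if (value >> (row * n + col)) & 1:
                # Swap the row and column index of every set bit.
                result |= 1 << (col * n + row)
    return result
```

    Applying the function twice returns the original value, since transposition is its own inverse.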
  • Patent number: 11595154
    Abstract: A cellular modem processor can include dedicated processing engines that implement specific, complex data processing operations. To implement PDCCH decoding, a cellular modem can include a pipeline having multiple processing engines, with the processing engines including functional units that execute instructions corresponding to different stages in the PDCCH decoding process. Flow control and data synchronization between instructions can be provided using a hybrid of firmware-based flow control and hardware-based data dependency management.
    Type: Grant
    Filed: September 24, 2021
    Date of Patent: February 28, 2023
    Assignee: Apple Inc.
    Inventors: Steve Hengchen Hsu, Mohanned Omar Sinnokrot
  • Patent number: 11561982
    Abstract: In a database environment including a plurality of logical object definitions having relationships defined according to a schema, and logical object instances following the logical object definitions include attribute names and respective attribute values indicating status of an enterprise in an enterprise resource planning system, the method can receive a starting exception definition specifying a first query against the logical object instances and derive a new exception definition based on the starting exception definition and one or more stored, acted-upon exception definition proposals. The first query can include one or more initial situational trigger conditions. The new exception definition can specify a second query against the logical object instances and the second query can include one or more modified situational trigger conditions.
    Type: Grant
    Filed: February 19, 2020
    Date of Patent: January 24, 2023
    Assignee: SAP SE
    Inventors: Axel Herbst, Knut Manske
  • Patent number: 11561828
    Abstract: Accelerated synchronization operations using fine grain dependency check are disclosed. A graphics multiprocessor includes a plurality of execution units and synchronization circuitry that is configured to determine availability of at least one execution unit. The synchronization circuitry is to perform a fine grain dependency check of availability of dependent data or operands in shared local memory or cache when at least one execution unit is available.
    Type: Grant
    Filed: May 11, 2021
    Date of Patent: January 24, 2023
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Varghese George, Altug Koker, Aravindh Anantaraman, SungYe Kim, Valentin Andrei, Joydeep Ray
  • Patent number: 11544060
    Abstract: An image processor is described. The image processor includes a two dimensional shift register array that couples certain ones of its array locations to support execution of a shift instruction. The shift instruction is to include mask information. The mask information is to specify which of the array locations are to be written to with information being shifted. The two dimensional shift register array includes masking logic circuitry to write the information being shifted into specified ones of the array locations in accordance with the mask information.
    Type: Grant
    Filed: February 8, 2021
    Date of Patent: January 3, 2023
    Assignee: Google LLC
    Inventor: Albert Meixner
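    Illustration: a minimal Python sketch of a masked shift over a two dimensional array. The parameter names, the shift direction convention, and the `fill` value for data shifted in from outside the array are assumptions.

```python
def masked_shift(array, dx, dy, mask, fill=0):
    """Shift array contents by (dx, dy), writing only where mask is set.

    Location (r, c) receives the value from (r - dy, c - dx); locations
    whose mask bit is clear keep their original contents.
    """
    rows, cols = len(array), len(array[0])
    out = [row[:] for row in array]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                sr, sc = r - dy, c - dx
                inside = 0 <= sr < rows and 0 <= sc < cols
                out[r][c] = array[sr][sc] if inside else fill
    return out
```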
  • Patent number: 11520629
    Abstract: The subject technology provides for dynamic task allocation for neural network models. The subject technology determines an operation performed at a node of a neural network model. The subject technology assigns an annotation to indicate whether the operation is better performed on a CPU or a GPU based at least in part on hardware capabilities of a target platform. The subject technology determines whether the neural network model includes a second layer. The subject technology, in response to determining that the neural network model includes a second layer, for each node of the second layer of the neural network model, determines a second operation performed at the node. Further the subject technology assigns a second annotation to indicate whether the second operation is better performed on the CPU or the GPU based at least in part on the hardware capabilities of the target platform.
    Type: Grant
    Filed: January 29, 2020
    Date of Patent: December 6, 2022
    Assignee: Apple Inc.
    Inventors: Francesco Rossi, Gaurav Kapoor, Michael R. Siracusa, William B. March
  • Patent number: 11507130
    Abstract: Apparatuses, systems, and methods for distributing a global counter value in a multi-socket SoC complex. In exemplary aspects, an apparatus comprises a first system-on-a-chip (SoC) in a first socket and a second SoC in a second socket. The apparatus further comprises a reset circuit coupled to the first SoC and the second SoC, a reset synchronization circuit coupled to the reset circuit, the first SoC, and the second SoC, and a global counter clock signal coupled to the reset synchronization circuit, the first SoC, and the second SoC. The reset synchronization circuit is configured to generate a global counter reset signal in response to a reset signal received from the reset circuit and to distribute the global counter reset signal to the first SoC and the second SoC substantially simultaneously.
    Type: Grant
    Filed: February 3, 2021
    Date of Patent: November 22, 2022
    Assignee: Ampere Computing LLC
    Inventors: Kha Hong Nguyen, Brian Thomas Chase, Sean Philip Mirkes, Phil Mitchell, Graham B. Whitted, III
  • Patent number: 11481290
    Abstract: An apparatus and a method of operating a data processing apparatus, and simulators thereof, are disclosed. Data processing circuitry performs data processing operations in response to instructions, where some sets of instructions may be defined as a transaction which are to be performed atomically with respect to other operations performed by the data processing circuitry. When a synchronous exception occurs during a transaction the transaction is aborted and an exception counter is incremented. When the counter reaches a threshold value a transaction failure signal is generated, allowing, if appropriate, a response to this number of exceptions causing transaction aborts to be carried out.
    Type: Grant
    Filed: April 8, 2019
    Date of Patent: October 25, 2022
    Assignee: Arm Limited
    Inventors: Matthew James Horsnell, Grigorios Magklis, Stephan Diestelhorst
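    Illustration: a toy Python model of the abort counter in this abstract. The class name, method name, and return value are hypothetical; the model only captures the count-and-threshold behaviour, not how the counter is reset or how software responds to the signal.

```python
class TransactionMonitor:
    """Counts transaction aborts caused by synchronous exceptions."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0
        self.failure_signalled = False

    def on_synchronous_exception(self):
        """Abort the transaction, bump the counter, signal at threshold."""
        self.count += 1
        if self.count >= self.threshold:
            self.failure_signalled = True
        return "aborted"
```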
  • Patent number: 11461244
    Abstract: Implementations described provide hardware support for the co-existence of restricted and non-restricted encryption keys on a computing system. Such hardware support may comprise a processor having a core, a hardware register to store a bit range to identify a number of bits, of physical memory addresses, that define key identifiers (IDs) and a partition key ID identifying a boundary between non-restricted and restricted key IDs. The core may allocate at least one of the non-restricted key IDs to a software program, such as a hypervisor. The core may further allocate a restricted key ID to a trust domain whose trust computing base does not comprise the software program. A memory controller coupled to the core may allocate a physical page of a memory to the trust domain, wherein data of the physical page of the memory is to be encrypted with an encryption key associated with the restricted key ID.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: October 4, 2022
    Assignee: Intel Corporation
    Inventors: Ido Ouziel, Arie Aharon, Dror Caspi, Baruch Chaikin, Jacob Doweck, Gideon Gerzon, Barry E. Huntley, Francis X. McKeen, Gilbert Neiger, Carlos V. Rozas, Ravi L. Sahita, Vedvyas Shanbhogue, Assaf Zaltsman, Hormuzd M. Khosravi
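    Illustration: a short Python sketch of extracting a key ID from the upper bits of a physical address and classifying it against a partition key ID. The 52-bit address width, the placement of the key ID in the topmost bits, and the convention that IDs at or above the partition are the restricted ones are all assumptions; the abstract only states that the partition key ID marks the boundary.

```python
def key_id(phys_addr, key_bits, addr_width=52):
    """Extract the key ID held in the top key_bits of the address."""
    shift = addr_width - key_bits
    return (phys_addr >> shift) & ((1 << key_bits) - 1)


def is_restricted(kid, partition_key_id):
    """Assumed convention: IDs at or above the partition are restricted."""
    return kid >= partition_key_id
```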
  • Patent number: 9015453
    Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: April 21, 2015
    Assignee: Intel Corporation
    Inventors: Alexander D. Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
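    Illustration: a minimal Python sketch of the element ordering this abstract describes, where the destination alternates elements from the two source registers. Lists stand in for packed registers, and the function name is hypothetical.

```python
def unpack_interleave(src1, src2):
    """Interleave elements: src1[0], src2[0], src1[1], src2[1], ..."""
    return [x for pair in zip(src1, src2) for x in pair]
```

    With the first and third elements taken from the first source and the second and fourth from the second source, `unpack_interleave(["A0", "A1"], ["B0", "B1"])` places them in the adjacent order the abstract lists.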
  • Patent number: 8914617
    Abstract: Methods and apparatus relating to a hardware move elimination and/or next page prefetching are described. In some embodiments, a logic may provide hardware move eliminations based on stored data. In an embodiment, a next page prefetcher is disclosed. Other embodiments are also described and claimed.
    Type: Grant
    Filed: December 24, 2010
    Date of Patent: December 16, 2014
    Assignee: Intel Corporation
    Inventors: Shlomo Raikin, David J. Sager, Zeev Sperber, Evgeni Krimer, Ori Lempel, Stanislav Shwartsman, Adi Yoaz, Omer Golz
  • Patent number: 8793475
    Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: July 29, 2014
    Assignee: Intel Corporation
    Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan