Controlling Moving, Shifting, Or Rotation Operations (epo) Patents (Class 712/E9.034)
-
Patent number: 12164917Abstract: A system including one or more processors configured to receive a transpose instruction indicating to transpose a source matrix to a result matrix, provide data elements of the source matrix to input switching circuits, reorder the data elements using the input switching circuits, provide the data elements from the input switching circuits to one or more lanes of a datapath, provide the data elements from the datapath to output switching circuits, undo the reordering of the data elements using the output switching circuits, and provide the data elements from the output switching circuits to a result matrix. Each respective lane of the datapath receiving data elements receives multiple data elements directed to different respective non-overlapping portions of the lane.Type: GrantFiled: May 17, 2023Date of Patent: December 10, 2024Assignee: Google LLCInventors: Vinayak Anand Gokhale, Matthew Leever Hedlund, Matthew William Ashcraft, Indranil Chakraborty
-
Patent number: 12141583Abstract: An apparatus has processing circuitry with execution units to perform operations, physical registers to store data, and forwarding circuitry to forward the data from the physical registers to the execution units. The forwarding circuitry provides an incomplete set of connections between the physical registers and the execution units such that, for each of at least some of the physical registers, the physical register is connected to only a subset of the execution units. The apparatus also has register renaming circuitry to map logical registers identified by the operations to respective physical registers and register reorganisation circuitry to monitor upcoming operations and to determine, based on the upcoming operations and the connections provided by the forwarding circuitry, whether to perform a register reorganisation procedure to change a mapping between the logical registers and the physical registers.Type: GrantFiled: September 13, 2022Date of Patent: November 12, 2024Assignee: Arm LimitedInventors: Xiaoyang Shen, Zichao Xie
-
Patent number: 12124848Abstract: An information processing apparatus according to the present invention includes: a load instruction generating unit configured to generate an instruction to continuously access a memory in which a real part and an imaginary part composing complex data are alternately arranged, in accordance with arrangement of the real part and the imaginary part, and load the real part and the imaginary part as respective elements of a vector register; and an operation instruction generating unit configured to generate a vector operation instruction including an instruction to perform a vector operation of elements corresponding to element numbers different from each other between two vector registers and an instruction to perform a masked vector operation.Type: GrantFiled: August 21, 2019Date of Patent: October 22, 2024Assignee: NEC CORPORATIONInventor: Kento Iwakawa
-
Patent number: 12099847Abstract: A data processing apparatus comprises: execution circuitry to execute instructions in order to perform data processing operations specified by those instructions; a plurality of registers to store data values for access by the execution circuitry when performing the data processing operations, each register having an associated physical register identifier; register rename circuitry to select physical register identifiers to associate with architectural register identifiers specified by the instructions; and rename storage having a plurality of entries, each entry being associated with one of the architectural register identifiers and used by the register rename circuitry to indicate a physical register identifier selected for association with that one of the architectural register identifiers; the register rename circuitry comprising an execute unit, and being responsive to detection of an early execute condition for a given instruction, the early execute condition requiring at least detection that each sourType: GrantFiled: January 26, 2023Date of Patent: September 24, 2024Assignee: Arm LimitedInventors: Quentin Éric Nouvel, Luca Nassi, Adrien Pesle
-
Patent number: 12039332Abstract: Detailed herein are embodiment systems, processors, and methods for matrix move. For example, a processor comprising decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to move each data element of the identified source matrix operand to corresponding data element position of the identified destination matrix operand is described.Type: GrantFiled: January 28, 2022Date of Patent: July 16, 2024Assignee: Intel CorporationInventors: Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Jesus Corbal, Dan Baum, Alexander Heinecke, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 12032490Abstract: A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values are sorted in an order indicated by the vector sort instruction, and storing the sorted vector in a storage location.Type: GrantFiled: December 1, 2022Date of Patent: July 9, 2024Assignee: Texas Instruments IncorporatedInventors: Timothy David Anderson, Mujibur Rahman
-
Patent number: 11977886Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.Type: GrantFiled: March 28, 2022Date of Patent: May 7, 2024Assignee: Intel CorporationInventors: Robert Valentine, Menachem Adelman, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Milind B. Girkar, Zeev Sperber, Mark J. Charney, Rinat Rappoport, Jesus Corbal, Stanislav Shwartsman, Igor Yanover, Alexander F. Heinecke, Barukh Ziv, Dan Baum, Yuri Gebil, Raanan Sade
-
Patent number: 11941397Abstract: Techniques to take advantage of the single-instruction-multiple-data (SIMD) capabilities of a processor to process data blocks can include implementing an instruction to fuse the data blocks together. The fuse input instruction can have a first input vector, a second input vector, a select input, a first output vector, and a second output vector. The fuse input instruction selects a portion of the first input vector and a portion of the second input vector based on the select input, sign extends the selected portion of the first input vector and the selected portion of the second input vector, and shuffles data elements of the sign extended portion of the first input vector with data elements of the sign extended portion of the second input vector to generate the first and second output vectors.Type: GrantFiled: May 31, 2022Date of Patent: March 26, 2024Assignee: Amazon Technologies, Inc.Inventors: Xiaodan Tan, Paul Gilbert Meyer
-
Patent number: 11907723Abstract: A data processing apparatus is provided. Rename circuitry performs a register rename stage of a pipeline by storing, in storage circuitry, mappings between registers. Each of the mappings is associated with an elimination field value. Operation elimination circuitry replaces an operation that indicates an action is to be performed on data from a source register and stored in a destination register, with a new mapping in the storage circuitry that references the destination register and has the elimination field value set. Operation circuitry responds to a subsequent operation that accesses the destination register when the elimination field value is set; by obtaining contents of the source register, performing the action on the contents to obtain a result, and returning the result.Type: GrantFiled: March 21, 2022Date of Patent: February 20, 2024Assignee: Arm LimitedInventors: Nicholas Andrew Plante, Joseph Michael Pusdesris, Jungsoo Kim
-
Patent number: 11803382Abstract: A digital data processor includes a multi-stage butterfly network, which is configured to, in response to a look up table read instruction, receive look up table data from an intermediate register, reorder the look up table data based on control signals comprising look up table configuration register data, and write the reordered look up table data to a destination register specified by the look up table read instruction.Type: GrantFiled: September 2, 2022Date of Patent: October 31, 2023Assignee: Texas Instruments IncorporatedInventors: Naveen Bhoria, Duc Bui, Dheera Balasubramanian Samudrala, Rama Venkatasubramanian
-
Patent number: 11775307Abstract: A cellular modem processor can include dedicated processing engines that implement specific, complex data processing operations. The processing engines can be arranged in pipelines, with different processing engines executing different steps in a sequence of operations. Flow control or data synchronization between pipeline stages can be provided using a hybrid of firmware-based flow control and hardware-based data dependency management. Firmware instructions can define data flow by reference to a virtual address space associated with pipeline buffers. A hardware interlock controller within the pipeline can track and enforce the data dependencies for the pipeline.Type: GrantFiled: September 24, 2021Date of Patent: October 3, 2023Assignee: Apple Inc.Inventors: Steve Hengchen Hsu, Thirunathan Sutharsan, Mohanned Omar Sinnokrot, On Wa Yeung
-
Patent number: 11720356Abstract: An apparatus comprises an instruction decoder to decode instructions, processing circuitry to perform data processing in response to the instructions decoded by the instruction decoder, and memory attribute checking circuitry to check whether a memory access request issued by the processing circuitry satisfies access permissions specified in a plurality of memory attribute entries. Each memory attribute entry specifies access permissions for a corresponding address region of variable size within an address space. In response to a range checking instruction specifying address identifying parameters for identifying a first address and a second address, the instruction decoder controls the processing circuitry to set, in at least one software-accessible storage location, a status value indicative of whether the first address and the second address correspond to the same memory attribute entry.Type: GrantFiled: August 20, 2019Date of Patent: August 8, 2023Assignee: Arm LimitedInventor: Thomas Christopher Grocutt
-
Patent number: 11720362Abstract: An apparatus and method for a tensor permutation engine. The TPE may include a read address generation unit (AGU) to generate a plurality of read addresses for the plurality of tensor data elements in a first storage and a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage. The TPE may include a shuffle register bank comprising a register to read tensor data elements from the plurality of read addresses generated by the read AGU, a first register bank to receive the tensor data elements, and a shift register to receive a lowest tensor data element from each bank in the first register bank, each tensor data element in the shift register to be written to a write address from the plurality of write addresses generated by the write AGU.Type: GrantFiled: December 22, 2020Date of Patent: August 8, 2023Assignee: Intel CorporationInventor: Berkin Akin
-
Patent number: 11681528Abstract: An apparatus and method for a tensor permutation engine. The TPE may include a read address generation unit (AGU) to generate a plurality of read addresses for the plurality of tensor data elements in a first storage and a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage. The TPE may include a shuffle register bank comprising a register to read tensor data elements from the plurality of read addresses generated by the read AGU, a first register bank to receive the tensor data elements, and a shift register to receive a lowest tensor data element from each bank in the first register bank, each tensor data element in the shift register to be written to a write address from the plurality of write addresses generated by the write AGU.Type: GrantFiled: December 22, 2020Date of Patent: June 20, 2023Assignee: Intel CorporationInventor: Berkin Akin
-
Patent number: 11650817Abstract: A processor includes a register file comprising a length register, a vector register file comprising a plurality of vector registers, a mask register file comprising a plurality of mask registers, and a vector instruction execution circuit to execute a masked vector instruction comprising a first length register identifier representing the length register, a first vector register identifier representing a first vector register of the vector register file, and a first mask register identifier representing a first mask register of the mask register file, wherein the length register is to store a length value representing a number of operations to be applied to data elements stored in the first vector register, the first mask register is to store a plurality of mask bits, and a first mask bit of the plurality of mask bits determines whether a corresponding first one of the operations causes an effect.Type: GrantFiled: September 18, 2019Date of Patent: May 16, 2023Assignee: Optimum Semiconductor Technologies Inc.Inventors: Mayan Moudgill, Murugappan Senthilvelan
-
Patent number: 11635967Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.Type: GrantFiled: September 25, 2020Date of Patent: April 25, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari, Maxim V. Kazakov
-
Patent number: 11616596Abstract: A cellular modem processor can include dedicated processing engines that implement specific, complex data processing operations. To implement physical downlink shared channel (PDSCH) decoding, a cellular modem can include a pipeline having multiple processing engines, with the processing engine including functional units that execute instructions corresponding to different stages in the PDSCH decoding process. Flow control and data synchronization between instructions can be provided using a hybrid of firmware-based flow control and hardware-based data dependency management.Type: GrantFiled: September 24, 2021Date of Patent: March 28, 2023Assignee: Apple Inc.Inventors: Thirunathan Sutharsan, Mohanned Omar Sinnokrot
-
Patent number: 11604648Abstract: A method to transpose source data in a processor in response to a vector bit transpose instruction includes specifying, in respective fields of the vector bit transpose instruction, a source register containing the source data and a destination register to store transposed data. The method also includes executing the vector bit transpose instruction by interpreting N×N bits of the source data as a two-dimensional array having N rows and N columns, creating transposed source data by transposing the bits by reversing a row index and a column index for each bit, and storing the transposed source data in the destination register.Type: GrantFiled: June 22, 2021Date of Patent: March 14, 2023Assignee: Texas Instruments IncorporatedInventors: Joseph Zbiciak, Dheera Balasubramanian Samudrala, Duc Bui
-
Patent number: 11595154Abstract: A cellular modem processor can include dedicated processing engines that implement specific, complex data processing operations. To implement PDCCH decoding, a cellular modem can include a pipeline having multiple processing engines, with the processing engines including functional units that execute instructions corresponding to different stages in the PDCCH decoding process. Flow control and data synchronization between instructions can be provided using a hybrid of firmware-based flow control and hardware-based data dependency management.Type: GrantFiled: September 24, 2021Date of Patent: February 28, 2023Assignee: Apple Inc.Inventors: Steve Hengchen Hsu, Mohanned Omar Sinnokrot
-
Patent number: 11561982Abstract: In a database environment including a plurality of logical object definitions having relationships defined according to a schema, and logical object instances following the logical object definitions include attribute names and respective attribute values indicating status of an enterprise in an enterprise resource planning system, the method can receive a starting exception definition specifying a first query against the logical object instances and derive a new exception definition based on the starting exception definition and one or more stored, acted-upon exception definition proposals. The first query can include one or more initial situational trigger conditions. The new exception definition can specify a second query against the logical object instances and the second query can include one or more modified situational trigger conditions.Type: GrantFiled: February 19, 2020Date of Patent: January 24, 2023Assignee: SAP SEInventors: Axel Herbst, Knut Manske
-
Patent number: 11561828Abstract: Accelerated synchronization operations using fine grain dependency check are disclosed. A graphics multiprocessor includes a plurality of execution units and synchronization circuitry that is configured to determine availability of at least one execution unit. The synchronization circuitry to perform a fine grain dependency check of availability of dependent data or operands in shared local memory or cache when at least one execution unit is available.Type: GrantFiled: May 11, 2021Date of Patent: January 24, 2023Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Varghese George, Altug Koker, Aravindh Anantaraman, SungYe Kim, Valentin Andrei, Joydeep Ray
-
Patent number: 11544060Abstract: An image processor is described. The image processor includes a two dimensional shift register array that couples certain ones of its array locations to support execution of a shift instruction. The shift instruction is to include mask information. The mask information is to specify which of the array locations are to be written to with information being shifted. The two dimensional shift register array includes masking logic circuitry to write the information being shifted into specified ones of the array locations in accordance with the mask information.Type: GrantFiled: February 8, 2021Date of Patent: January 3, 2023Assignee: Google LLCInventor: Albert Meixner
-
Patent number: 11520629Abstract: The subject technology provides for dynamic task allocation for neural network models. The subject technology determines an operation performed at a node of a neural network model. The subject technology assigns an annotation to indicate whether the operation is better performed on a CPU or a GPU based at least in part on hardware capabilities of a target platform. The subject technology determines whether the neural network model includes a second layer. The subject technology, in response to determining that the neural network model includes a second layer, for each node of the second layer of the neural network model, determines a second operation performed at the node. Further the subject technology assigns a second annotation to indicate whether the second operation is better performed on the CPU or the GPU based at least in part on the hardware capabilities of the target platform.Type: GrantFiled: January 29, 2020Date of Patent: December 6, 2022Assignee: Apple Inc.Inventors: Francesco Rossi, Gaurav Kapoor, Michael R. Siracusa, William B. March
-
Patent number: 11507130Abstract: Apparatuses, systems, and methods for distributing a global counter value in a multi-socket SoC complex. In exemplary aspects, an apparatus comprises a first system-on-a-chip (SoC) in a first socket and a second SoC in a second socket. The apparatus further comprises a reset circuit coupled to the first SoC and the second SoC, a reset synchronization circuit coupled to the reset circuit, the first SoC, and the second SoC, and a global counter clock signal coupled to the reset synchronization circuit, the first SoC, and the second SoC. The reset synchronization circuit is configured to generate a global counter reset signal in response to a reset signal received from the reset circuit and to distribute the global counter reset signal to the first SoC and the second SoC substantially simultaneously.Type: GrantFiled: February 3, 2021Date of Patent: November 22, 2022Assignee: Ampere Computing LLCInventors: Kha Hong Nguyen, Brian Thomas Chase, Sean Philip Mirkes, Phil Mitchell, Graham B. Whitted, III
-
Patent number: 11481290Abstract: An apparatus and a method of operating a data processing apparatus, and simulators thereof, are disclosed. Data processing circuitry performs data processing operations in response to instructions, where some sets of instructions may be defined as a transaction which are to be performed atomically with respect to other operations performed by the data processing circuitry. When a synchronous exception occurs during a transaction the transaction is aborted and an exception counter is incremented. When the counter reaches a threshold value a transaction failure signal is generated, allowing, if appropriate a response to this number of exceptions causing transaction aborts to be carried out.Type: GrantFiled: April 8, 2019Date of Patent: October 25, 2022Assignee: Arm LimitedInventors: Matthew James Horsnell, Grigorios Magklis, Stephan Diestelhorst
-
Patent number: 11461244Abstract: Implementations described provide hardware support for the co-existence of restricted and non-restricted encryption keys on a computing system. Such hardware support may comprise a processor having a core, a hardware register to store a bit range to identify a number of bits, of physical memory addresses, that define key identifiers (IDs) and a partition key ID identifying a boundary between non-restricted and restricted key IDs. The core may allocate at least one of the non-restricted key IDs to a software program, such as a hypervisor. The core may further allocate a restricted key ID to a trust domain whose trust computing base does not comprise the software program. A memory controller coupled to the core may allocate a physical page of a memory to the trust domain, wherein data of the physical page of the memory is to be encrypted with an encryption key associated with the restricted key ID.Type: GrantFiled: December 20, 2018Date of Patent: October 4, 2022Assignee: Intel CorporationInventors: Ido Ouziel, Arie Aharon, Dror Caspi, Baruch Chaikin, Jacob Doweck, Gideon Gerzon, Barry E. Huntley, Francis X. McKeen, Gilbert Neiger, Carlos V. Rozas, Ravi L. Sahita, Vedvyas Shanbhogue, Assaf Zaltsman, Hormuzd M. Khosravi
-
Patent number: 9015453Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.Type: GrantFiled: December 29, 2012Date of Patent: April 21, 2015Assignee: Intel CorporationInventors: Alexander D. Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
-
Patent number: 8914617Abstract: Methods and apparatus relating to a hardware move elimination and/or next page prefetching are described. In some embodiments, a logic may provide hardware move eliminations based on stored data. In an embodiment, a next page prefetcher is disclosed. Other embodiments are also described and claimed.Type: GrantFiled: December 24, 2010Date of Patent: December 16, 2014Assignee: Intel CorporationInventors: Shlomo Raikin, David J. Sager, Zeev Sperber, Evgeni Krimer, Ori Lempel, Stanislav Shwartsman, Adi Yoaz, Omer Golz
-
Patent number: 8793475Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.Type: GrantFiled: December 29, 2012Date of Patent: July 29, 2014Assignee: Intel CorporationInventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan