Patents Examined by Corey S Faherty
-
Patent number: 11544070Abstract: The present disclosure is directed to systems and methods for mitigating or eliminating the effectiveness of a side-channel based attack, such as one or more classes of an attack commonly known as Spectre. Novel instruction prefixes, and in certain embodiments one or more corresponding instruction prefix parameters, may be provided to enforce a serialized order of execution for particular instructions without serializing an entire instruction flow, thereby improving performance and mitigation reliability over existing solutions. In addition, improved mitigation of such attacks is provided by randomizing both the execution branch history as well as the source address of each vulnerable indirect branch, thereby eliminating the conditions required for such attacks.Type: GrantFiled: July 28, 2021Date of Patent: January 3, 2023Assignee: Intel CorporationInventors: Rodrigo Branco, Kekai Hu, Ke Sun, Henrique Kawakami
-
Patent number: 11537395Abstract: This application relates to a method for optimizing algorithm performance using precision scaling, wherein the method according to an embodiment of present invention comprises obtaining a number of iterations of a unit operation according to precisions of the algorithm including the unit operation that is repeatedly performed, wherein the precisions include a first precision and a second precision, and the number of iterations include a first number of iterations corresponding to the first precision and a second number of iterations corresponding to the second precision; inspecting available precisions of a device on which the algorithm is to be executed, wherein the available precisions include a first available precision corresponding to the first precision and a second available precision corresponding to the second precision; determining an optimal precision by repeatedly performing the unit operation corresponding to an initial operation of the algorithm using the inspected available precision; and repeaType: GrantFiled: November 4, 2021Date of Patent: December 27, 2022Assignee: Industry-University Cooperation Foundation Hanyang UniversityInventors: Yongjun Park, Seokwon Kang, Sang Wook Kim, Hong Kyun Bae, Jae Seo Yu, Kyunghwan Choi
-
Patent number: 11537397Abstract: Systems, apparatuses, and methods for efficiently sharing registers among threads are disclosed. A system includes at least a processor, control logic, and a register file with a plurality of registers. The processor assigns a base set of registers to each thread of a plurality of threads executing on the processor. When a given thread needs more than the base set of registers to execute a given phase of program code, the given thread executes an acquire instruction to acquire exclusive access to an extended set of registers from a shared resource pool. When the given thread no longer needs additional registers, the given thread executes a release instruction to release the extended set of registers back into the shared register pool for other threads to use. In one implementation, the compiler inserts acquire and release instructions into the program code based on a register liveness analysis performed during compilation.Type: GrantFiled: March 26, 2018Date of Patent: December 27, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Farzad Khorasani, Amin Farmahini-Farahani, Nuwan S. Jayasena
-
Patent number: 11531922Abstract: An apparatus and method for scalable qubit addressing. For example, one embodiment of a processor comprises: a decoder comprising quantum instruction decode circuitry to decode quantum instructions to generate quantum microoperations (uops) and non-quantum decode circuitry to decode non-quantum instructions to generate non-quantum uops; execution circuitry comprising: an address generation unit (AGU) to generate a system memory address responsive to execution of one or more of the non-quantum uops; and quantum index generation circuitry to generate quantum index values responsive to execution of one or more of the quantum uops, each quantum index value uniquely identifying a quantum bit (qubit) in a quantum processor; wherein to generate a first quantum index value for a first quantum uop, the quantum index generation circuitry is to read the first quantum index value from a first architectural register identified by the first quantum uop.Type: GrantFiled: September 27, 2018Date of Patent: December 20, 2022Assignee: Intel CorporationInventor: Xiang Zou
-
Patent number: 11520586Abstract: A renaming unit configured to rename source operands of instructions in a group. A renaming register maintains architectural to physical register mappings. Architectural to physical register mappings propagate from the renaming register through a chain of update units (U) over bus lines denoted with the architectural registers 0 to L. Update units (U) sequentially, in program order, insert physical register identifiers PR(i) allocated to instructions I(i) with destination operands DOP(i) on bus lines denoted with the destination operands DOP(i). Source operands of an instruction I(i) may be renamed to physical register identifiers after physical register identifiers allocated to instructions older than I(i) are sequentially, in program order, inserted on the bus lines, but before physical register identifiers allocated to I(i) and younger instructions are inserted on the bus lines. A source operand SOP(i) is renamed to a physical register identifier that propagates on a bus line denoted with SOP(i).Type: GrantFiled: July 8, 2021Date of Patent: December 6, 2022Inventor: Dejan Spasov
-
Patent number: 11507520Abstract: In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.Type: GrantFiled: March 1, 2021Date of Patent: November 22, 2022Assignee: Texas Instruments IncorporatedInventors: Duc Quang Bui, Joseph Raymond Michael Zbiciak
-
Patent number: 11500642Abstract: Provided is a method for assigning register tags to instructions at issue time. The method comprises receiving an instruction for execution by a microprocessor. The method further comprises dispatching the instruction to an issue queue without assigning a register tag to the instruction. The method further comprises determining that the instruction is ready to issue. In response to determining that the instruction is ready to issue, the method comprises assigning an available register tag to the instruction. The method further comprises issuing the instruction.Type: GrantFiled: November 10, 2020Date of Patent: November 15, 2022Assignee: International Busines Machines CorporationInventors: Steven J. Battle, Jentje Leenstra, Brian D. Barrick, Dung Q. Nguyen, Brian W. Thompto
-
Patent number: 11500636Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations.Type: GrantFiled: February 24, 2020Date of Patent: November 15, 2022Assignee: Intel CorporationInventors: Christopher J. Hughes, Joseph Nuzman, Jonas Svennebring, Doddaballapur N. Jayasimha, Samantika S. Sury, David A. Koufaty, Niall D. McDonnell, Yen-Cheng Liu, Stephen R. Van Doren, Stephen J. Robinson
-
Patent number: 11487541Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.Type: GrantFiled: November 30, 2020Date of Patent: November 1, 2022Assignee: Intel CorporationInventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
-
Patent number: 11481215Abstract: The present disclosure provides a computing method that is applied to a computing device. The computing device includes: a memory, a register unit, and a matrix computing unit. The method includes the following steps: controlling, by the computing device, the matrix computing unit to obtain a first operation instruction, where the first operation instruction includes a matrix reading instruction for a matrix required for executing the instruction; controlling, by the computing device, an operating unit to send a reading command to the memory according to the matrix reading instruction; and controlling, by the computing device, the operating unit to read a matrix corresponding to the matrix reading instruction in a batch reading manner, and executing the first operation instruction on the matrix. The technical solutions in the present disclosure have the advantages of fast computing speed and high efficiency.Type: GrantFiled: January 17, 2020Date of Patent: October 25, 2022Assignee: Cambricon (Xi'an) Semiconductor Co., Ltd.Inventors: Tianshi Chen, Shaoli Liu, Zai Wang, Shuai Hu
-
Patent number: 11481384Abstract: An apparatus is provided comprising storage elements to store data blocks, where each data block has capability metadata associated therewith identifying whether the data block specifies a capability, at least one capability type being a bounded pointer. Processing circuitry is then arranged to be responsive to a bulk capability metadata operation identifying a plurality of the storage elements, to perform an operation on the capability metadata associated with each data block stored in the plurality of storage elements. Via a single specified operation, this hence enables query and/or modification operations to be performed on multiple items of capability metadata, hence providing more efficient access to such capability metadata.Type: GrantFiled: March 29, 2017Date of Patent: October 25, 2022Assignee: Arm LimitedInventors: Graeme Peter Barnes, Stuart David Biles
-
Patent number: 11461105Abstract: Methods and systems are disclosed using an execution pipeline on a multi-processor platform for deep learning network execution. In one example, a network workload analyzer receives a workload, analyzes a computation distribution of the workload, and groups the network nodes into groups. A network executor assigns each group to a processing core of the multi-core platform so that the respective processing core handle computation tasks of the received workload for the respective group.Type: GrantFiled: April 7, 2017Date of Patent: October 4, 2022Assignee: Intel CorporationInventors: Liu Yang, Anbang Yao
-
Patent number: 11461095Abstract: The present disclosure relates to a method of storing, by a load and store circuit or other processing means, a variable precision floating point value to a memory address of a memory, the method comprising: reducing the bit length of the variable precision floating point value to no more than a size limit, and storing the variable precision floating point value to one of a plurality of storage zones in the memory, each of the plurality of storage zones having a storage space equal to or greater than the size limit (MBB).Type: GrantFiled: March 6, 2020Date of Patent: October 4, 2022Assignees: Commissariat à l'Energie Atomique et aux Energies Alternatives, Institut National des Sciences Appliquées de LyonInventors: Andrea Bocco, Florent Dupont De Dinechin, Yves Durand
-
Patent number: 11455167Abstract: Disclosed embodiments relate to efficient complex vector multiplication. In one example, an apparatus includes execution circuitry, responsive to an instruction having fields to specify multiplier, multiplicand, and summand complex vectors, to perform two operations: first, to generate a double-even multiplicand by duplicating even elements of the specified multiplicand, and to generate a temporary vector using a fused multiply-add (FMA) circuit having A, B, and C inputs set to the specified multiplier, the double-even multiplicand, and the specified summand, respectively, and second, to generate a double-odd multiplicand by duplicating odd elements of the specified multiplicand, to generate a swapped multiplier by swapping even and odd elements of the specified multiplier, and to generate a result using a second FMA circuit having its even product negated, and having A, B, and C inputs set to the swapped multiplier, the double-odd multiplicand, and the temporary vector, respectively.Type: GrantFiled: December 2, 2019Date of Patent: September 27, 2022Assignee: Intel CoporationInventors: Raanan Sade, Thierry Pons, Amit Gradstein, Zeev Sperber, Mark J. Charney, Robert Valentine, Eyal Oz-Sinay
-
Patent number: 11449341Abstract: The invention relates to an electronic device for data processing, which includes an execution unit with a temporary register, a register file, a first feedback path from the data output of the execution unit to the register file, a second feedback path from the data output of the execution unit to the temporary register, a switch configured to connect the first feedback path and/or the second feedback path, and a logic stage coupled to control the switch. The control stage is configured to control the switch to connect the second feedback path if the data output of an execution unit is used as an operand in the subsequent operation of an execution unit.Type: GrantFiled: September 9, 2019Date of Patent: September 20, 2022Assignee: Texas Instruments IncorporatedInventors: Marko Krüger, Steven Bartling, Markus Kösler
-
Patent number: 11436015Abstract: A digital data processor includes a multi-stage butterfly network, which is configured to, in response to a look up table read instruction, receive look up table data from an intermediate register, reorder the look up table data based on control signals comprising look up table configuration register data, and write the reordered look up table data to a destination register specified by the look up table read instruction.Type: GrantFiled: September 13, 2019Date of Patent: September 6, 2022Assignee: Texas Instmments IncorporatedInventors: Naveen Bhoria, Duc Bui, Dheera Balasubramanian Samudrala, Rama Venkatasubramanian
-
Patent number: 11436186Abstract: An algorithmic matching pipelined compiler and a reusable algorithmic pipelined core comprise a high throughput processor system. The reusable algorithmic pipelined core is a reconfigurable processing core with a pipelined structure comprising a processor with a setup interface for programming any of a plurality of operations as determined by setup data, a logic decision processor for programming a look up table, a loop counter and a constant register, and a block of memory. This can be used to perform functions. A reconfigurable, programmable circuit routes data and results from one core to another core and/or IO controller and/or interrupt generator, as required to complete an algorithm without further intervention from a central or peripheral processor during processing of an algorithm.Type: GrantFiled: June 22, 2018Date of Patent: September 6, 2022Assignee: ICAT LLCInventors: Robert D Catiller, Daniel Roig, Gnanashanmugam Elumalai
-
Patent number: 11422969Abstract: This disclosure relates to a distributed processing system for configuring multiple processing channels. The distributed processing system includes a main processor, such as an ARM processor, communicatively coupled to a plurality of co-processors, such as stream processors. The co-processors can execute instructions in parallel with each other and interrupt the ARM processor. Longer latency instructions can be executed by the main processor and lower latency instructions can be executed by the co-processors. There are several ways that a stream can be triggered in the distributed processing system. In an embodiment, the distributed processing system is a stream processor system that includes an ARM processor and stream processors configured to access different register sets. The stream processors can include a main stream processor and stream processors in respective transmit and receive channels. The stream processor system can be implemented in a radio system to configure the radio for operation.Type: GrantFiled: June 26, 2020Date of Patent: August 23, 2022Assignee: Analog Devices, Inc.Inventors: Manish J. Manglani, Shipra Bhal, Christopher Mayer
-
Patent number: 11416259Abstract: Disclosed herein are system, method, and computer program product embodiments for utilizing look-ahead-staging (LAS) to guarantee the ability to rollback and reconstruct a package while minimizing locking duration and enabling multiple packages to be processed in a data pipeline simultaneously. An embodiment operates by receiving a package from a source system for processing through a data pipeline. The embodiment stores the package in a persistent storage together with a respective package status. The embodiment transmits the package to the data pipeline in response to the storing. The embodiment receives a commit notification for the package from a target system in response to the transmitting. The embodiment then removes the package from the persistent storage in response to receiving the commit notification for the package.Type: GrantFiled: December 11, 2020Date of Patent: August 16, 2022Assignee: SAP SEInventors: Daniel Bos, Dan Liu, Tobias Karpstein
-
Patent number: 11416253Abstract: A processor includes two or more branch target buffer (BTB) tables for branch prediction, each BTB table storing entries of a different target size or width or storing entries of a different branch type. Each BTB entry includes at least a tag and a target address. For certain branch types that only require a few target address bits, the respective BTB tables are narrower thereby allowing for more BTB entries in the processor separated into respective BTB tables by branch instruction type. An increased number of available BTB entries are stored in a same or a less space in the processor thereby increasing a speed of instruction processing. BTB tables can be defined that do not store any target address and rely on a decode unit to provide it. High value BTB entries have dedicated storage and are therefore less likely to be evicted than low value BTB entries.Type: GrantFiled: July 10, 2020Date of Patent: August 16, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Thomas Clouqueur, Anthony Jarvis