Patents Examined by William V Nguyen
  • Patent number: 10289418
    Abstract: Techniques are provided for handling a trap encountered in a thread that is part of a thread array that is being executed in a plurality of execution units. In these techniques, a data structure with an identifier associated with the thread is updated to indicate that the trap occurred during the execution of the thread array. Also in these techniques, the execution units execute a trap handling routine that includes a context switch. The execution units perform this context switch for at least one of the execution units as part of the trap handling routine while allowing the remaining execution units to exit the trap handling routine before the context switch. One advantage of the disclosed techniques is that the trap handling routine operates efficiently in parallel processors.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: May 14, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Gerald F. Luiz, Philip Alexander Cuadra, Luke Durant, Shirish Gadre, Robert Ohannessian, Lacky V. Shah, Nicholas Wang, Arthur Danskin
  • Patent number: 10248425
    Abstract: A processor including physical registers, a reorder buffer, a master free list, a slave free list, a master recycle circuit, and a slave recycle circuit. The reorder buffer includes instruction entries in which each entry stores physical register indexes for recycling physical registers. The reorder buffer retires up to N instructions in each processor cycle. Each master and slave free list includes N input ports and stores physical register indexes, in which the master free list stores indexes of physical registers to be allocated to instructions being issued. When an instruction is retired, the master recycle circuit routes a first physical register index stored in an instruction entry of the instruction to an input port of the master free list, and the slave recycle circuit routes a second physical register index stored in the instruction entry of the instruction to an input port of the slave free list.
    Type: Grant
    Filed: August 12, 2016
    Date of Patent: April 2, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventor: Xiaolong Fei
  • Patent number: 10235172
    Abstract: A branch predictor for predicting branch instructions performs different branch prediction operations for branches executing in a transaction than those not-executing in a transaction, including suppressing branch prediction functions based on progress of a re-execution of a previously aborted transaction, the transaction buffering data and committing the buffered data to memory when the transaction completes, but discarding the buffered data when the transaction aborts.
    Type: Grant
    Filed: June 2, 2014
    Date of Patent: March 19, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K Gschwind, Valentina Salapura
  • Patent number: 10229035
    Abstract: Generating instructions, in particular for mailbox verification in a simulation environment. A sequence of instructions is received, as well as selection data representative of a plurality of commands including a special command. Repeatedly selecting one of the plurality of commands and outputting an instruction based on the selected command. The outputting of an instruction includes outputting a next instruction in the sequence of instructions if the selected command is the special command, and outputting an instruction associated with the command if the selected command is not the special command.
    Type: Grant
    Filed: November 9, 2017
    Date of Patent: March 12, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Joerg Deutschle, Ursel Hahn, Joerg Walter, Ernst-Dieter Weissenberger
  • Patent number: 10223125
    Abstract: An execution slice circuit for a processor core has multiple parallel instruction execution slices and provides flexible and efficient use of internal resources. The execution slice circuit includes a master execution slice for receiving instructions of a first instruction stream and a slave execution slice for receiving instructions of a second instruction stream and instructions of the first instruction stream that require an execution width greater than a width of the slices. The execution slice circuit also includes a control logic that detects when a first instruction of the first instruction stream has the greater width and controls the slave execution slice to reserve a first issue cycle for issuing the first instruction in parallel across the master execution slice and the slave execution slice.
    Type: Grant
    Filed: July 30, 2018
    Date of Patent: March 5, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jeffrey Carl Brownscheidle, Sundeep Chadha, Maureen Anne Delaney, Hung Qui Le, Dung Quoc Nguyen, Brian William Thompto
  • Patent number: 10209986
    Abstract: A method of an aspect includes receiving a floating point rounding instruction. The floating point rounding instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point that each of the one or more floating point data elements are to be rounded to, and indicates a destination storage location. A result is stored in the destination storage location in response to the floating point rounding instruction. The result includes one or more rounded result floating point data elements. Each of the one or more rounded result floating point data elements includes one of the floating point data elements of the source, in a corresponding position, which has been rounded to the indicated number of fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: February 19, 2019
    Assignee: Intel Corporation
    Inventors: Jesus Corbal San Adrian, Cristina S. Anderson, Robert Valentine, Bret Toll, Amit Gradstein, Simon Rubanovich, Benny Eitan
  • Patent number: 10203955
    Abstract: Instructions and logic provide SIMD vector packed tuple cross-comparison functionality. Some processor embodiments include first and second registers with a variable plurality of data fields, each of the data fields to store an element of a first data type. The processor executes a SIMD instruction for vector packed tuple cross-comparison in some embodiments, which for each data field of a portion of data fields in a tuple of the first register, compares its corresponding element with every element of a corresponding portion of data fields in a tuple of the second register and sets a mask bit corresponding to each element of the second register portion, in a bit-mask corresponding to each unmasked element of the corresponding first register portion, according to the corresponding comparison. In some embodiments bit-masks are shifted by corresponding elements in data fields of a third register. The comparison type is indicated by an immediate operand.
    Type: Grant
    Filed: December 31, 2014
    Date of Patent: February 12, 2019
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Christopher J. Hughes, Mark J. Charney, Zeev Sperber, Amit Gradstein, Simon Rubanovich, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil
  • Patent number: 10191742
    Abstract: A processor saves micro-architectural contexts to increase the efficiency of code execution and power management. Power management hardware during runtime monitors execution of a code block. The code block has been compiled to have a reserved space appended to one end of the code block. The reserved space includes a metadata block associated with the code block or an identifier of the metadata block. The hardware stores a micro-architectural context of the processor in the metadata block. The micro-architectural context includes performance data resulting from a first execution of the code block. The hardware reads the metadata block upon a second execution of the code block and tunes the second execution based on the performance data.
    Type: Grant
    Filed: March 30, 2012
    Date of Patent: January 29, 2019
    Assignee: Intel Corporation
    Inventors: Efraim Rotem, Eliezer Weissmann, Boris Ginzburg, Alon Naveh, Nadav Shulman, Ronny Ronen
  • Patent number: 10185566
    Abstract: In one embodiment, the present invention includes a multicore processor having first and second cores to independently execute instructions, the first core visible to an operating system (OS) and the second core transparent to the OS and heterogeneous from the first core. A task controller, which may be included in or coupled to the multicore processor, can cause dynamic migration of a first process scheduled by the OS to the first core to the second core transparently to the OS. Other embodiments are described and claimed.
    Type: Grant
    Filed: April 27, 2012
    Date of Patent: January 22, 2019
    Assignee: Intel Corporation
    Inventors: Alon Naveh, Yuval Yosef, Eliezer Weissmann, Anil Aggarwal, Efraim Rotem, Avi Mendelson, Ronny Ronen, Boris Ginzburg, Michael Mishaeli, Scott D. Hahn, David A. Koufaty, Ganapati Srinivasa, Guy Therien
  • Patent number: 10162635
    Abstract: An apparatus includes a network interface, memory, and a processor. The processor is coupled with the network interface and memory. The processor is configured to determine that an instruction instance is a branch instruction instance. Responsive to a determination that an instruction instance is a branch instruction instance, the processor is configured to obtain a branch prediction for the branch instruction instance and a confidence value of the branch prediction. The processor is further configured to determine that the confidence for the branch prediction is low based on the confidence value, and responsive to such a determination, generate predicated instruction instances based on the branch instruction instance.
    Type: Grant
    Filed: May 17, 2016
    Date of Patent: December 25, 2018
    Assignee: International Business Machines Corporation
    Inventor: Michael Karl Gschwind
  • Patent number: 10152321
    Abstract: A processor includes a core to execute an instruction and logic to determine that the instruction will require strided data converted from source data in memory. The strided data is to include corresponding indexed elements from structures in the source data to be loaded into a same register to be used to execute the instruction. The core also includes logic to load source data into preliminary vector registers. The source data is to be unaligned as resident in the vector registers. The core includes logic to apply blend instructions to contents of the preliminary vector registers to cause corresponding indexed elements from the plurality of structures to be loaded into respective interim vector registers, and to apply further blend instructions to contents of the interim vector registers to cause additional indexed elements from the structures to be loaded into respective source vector registers.
    Type: Grant
    Filed: December 18, 2015
    Date of Patent: December 11, 2018
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Joonmoo Huh
  • Patent number: 10133581
    Abstract: An execution slice circuit for a processor core has multiple parallel instruction execution slices and provides flexible and efficient use of internal resources. The execution slice circuit includes a master execution slice for receiving instructions of a first instruction stream and a slave execution slice for receiving instructions of a second instruction stream and instructions of the first instruction stream that require an execution width greater than a width of the slices. The execution slice circuit also includes a control logic that detects when a first instruction of the first instruction stream has the greater width and controls the slave execution slice to reserve a first issue cycle for issuing the first instruction in parallel across the master execution slice and the slave execution slice.
    Type: Grant
    Filed: January 13, 2015
    Date of Patent: November 20, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jeffrey Carl Brownscheidle, Sundeep Chadha, Maureen Anne Delaney, Hung Qui Le, Dung Quoc Nguyen, Brian William Thompto
  • Patent number: 10120834
    Abstract: A signal processing device including: one or more vector processors configured to perform vector processing to a signal using a parameter, one or more scalar processors configured to perform scalar processing for generating the parameter, a first circuit coupled to the one or more vector processors and the one or more scalar processors and configured to transfer the parameter from the one or more scalar processors to the one or more vector processors, and a second circuit coupled to the one or more vector processors and another circuit that inputs the signal to the second circuit, and configured to transfer the signal among the one or more vector processors and the other circuit.
    Type: Grant
    Filed: May 30, 2014
    Date of Patent: November 6, 2018
    Assignee: FUJITSU LIMITED
    Inventor: Noboru Kobayashi
  • Patent number: 10114652
    Abstract: A method and circuit arrangement provide support for a hybrid pipeline that dynamically switches between out-of-order and in-order modes. The hybrid pipeline may selectively execute instructions from at least one instruction stream that require the high performance capabilities provided by out-of-order processing in the out-of-order mode. The hybrid pipeline may also execute instructions that have strict power requirements in the in-order mode where the in-order mode conserves more power compared to the out-of-order mode. Each stage in the hybrid pipeline may be activated and fully functional when the hybrid pipeline is in the out-of-order mode. However, stages in the hybrid pipeline not used for the in-order mode may be deactivated and bypassed by the instructions when the hybrid pipeline dynamically switches from the out-of-order mode to the in-order mode. The deactivated stages may then be reactivated when the hybrid pipeline dynamically switches from the in-order mode to the out-of-order mode.
    Type: Grant
    Filed: April 12, 2016
    Date of Patent: October 30, 2018
    Assignee: International Business Machines Corporation
    Inventors: Miguel Comparan, Andrew D. Hilton, Hans M. Jacobson, Brian M. Rogers, Robert A. Shearer, Ken V. Vu, Alfred T. Watson, III
  • Patent number: 10114643
    Abstract: Various embodiments are generally directed to techniques to detect a return-oriented programming (ROP) attack by verifying target addresses of branch instructions during execution. An apparatus includes a processor component, and a comparison component for execution by the processor component to determine whether there is a matching valid target address for a target address of a branch instruction associated with a translated portion of a routine in a table comprising valid target addresses. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 23, 2013
    Date of Patent: October 30, 2018
    Assignee: INTEL CORPORATION
    Inventors: Koichi Yamada, Palanivelra Shanmugavelayutham, Arvind Krishnaswamy, Jason M. Agron, Jiwei Lu
  • Patent number: 10108424
    Abstract: The disclosure provides a micro-processing system operable in a hardware decoder mode and in a translation mode. In the hardware decoder mode, the hardware decoder receives and decodes non-native ISA instructions into native instructions for execution in a processing pipeline. In the translation mode, native translations of non-native ISA instructions are executed in the processing pipeline without using the hardware decoder. The system includes a code portion profile stored in hardware that changes dynamically in response to use of the hardware decoder to execute portions of non-native ISA code. The code portion profile is then used to dynamically form new native translations executable in the translation mode.
    Type: Grant
    Filed: March 14, 2013
    Date of Patent: October 23, 2018
    Assignee: Nvidia Corporation
    Inventors: Nathan Tuck, Alexander Klaiber, Ross Segelken, David Dunn, Ben Hertzberg, Rupert Brauch, Thomas Kistler, Guillermo J. Rozas, Madhu Swarna
  • Patent number: 10102003
    Abstract: Intelligent context management for thread switching is achieved by determining that a register bank has not been used by a thread for a predetermined number of dispatches, and responsively disabling the register bank for use by that thread. A counter is incremented each time the thread is dispatched but the register bank goes unused. Usage or non-usage of the register bank is inferred by comparing a previous checksum for the register bank to a current checksum. If the previous and current checksums match, the system concludes that the register bank has not been used. If a thread attempts to access a disabled bank, the processor takes an interrupt, enables the bank, and resets the corresponding counter. For a system utilizing transactional memory, it is preferable to enable all of the register banks when thread processing begins to avoid aborted transactions from register banks disabled by lazy context management techniques.
    Type: Grant
    Filed: February 28, 2013
    Date of Patent: October 16, 2018
    Assignee: International Business Machines Corporation
    Inventor: Randal C. Swanberg
  • Patent number: 10083033
    Abstract: A method and apparatus are described for efficient register reclamation. For example, one embodiment of an apparatus comprises: single usage detection and tagging logic to examine a sequence of instructions to detect logical registers used by the sequence of instructions that have a single use and to tag an instruction as a single usage instruction if the instruction is a consumer of a logical register that has a single use; an allocator to allocate processor resources to execute the sequence of instructions, the processor resources including physical registers mapped to logical registers to execute the sequence of instructions; and register reclamation logic to free up a logical to physical mapping of a single use register in response to detecting the tag provided by the instruction tagging logic.
    Type: Grant
    Filed: March 10, 2015
    Date of Patent: September 25, 2018
    Assignee: Intel Corporation
    Inventors: Sebastian Winkel, Girish Venkatasubramanian, Tyler N. Sondag, Rolf Kassa
  • Patent number: 10078518
    Abstract: Intelligent context management for thread switching is achieved by determining that a register bank has not been used by a thread for a predetermined number of dispatches, and responsively disabling the register bank for use by that thread. A counter is incremented each time the thread is dispatched but the register bank goes unused. Usage or non-usage of the register bank is inferred by comparing a previous checksum for the register bank to a current checksum. If the previous and current checksums match, the system concludes that the register bank has not been used. If a thread attempts to access a disabled bank, the processor takes an interrupt, enables the bank, and resets the corresponding counter. For a system utilizing transactional memory, it is preferable to enable all of the register banks when thread processing begins to avoid aborted transactions from register banks disabled by lazy context management techniques.
    Type: Grant
    Filed: November 1, 2012
    Date of Patent: September 18, 2018
    Assignee: International Business Machines Corporation
    Inventor: Randal C. Swanberg
  • Patent number: 10061591
    Abstract: A method for reducing execution of redundant threads in a processing environment. The method includes detecting threads that include redundant work among many different threads. Multiple threads from the detected threads are grouped into one or more thread clusters based on determining same thread computation results. Execution of all but a particular one thread in each of the one or more thread clusters is suppressed. The particular one thread in each of the one or more thread clusters is executed. Results determined from execution of the particular one thread in each of the one or more thread clusters are broadcasted to other threads in each of the one or more thread clusters.
    Type: Grant
    Filed: February 26, 2015
    Date of Patent: August 28, 2018
    Assignee: Samsung Electronics Company, Ltd.
    Inventors: Boris Beylin, John Brothers, Santosh Abraham, Lingjie Xu, Maxim Lukyanov, Alex Grosul