From Multiple Instruction Streams, E.g., Multistreaming (epo) Patents (Class 712/E9.053)
  • Patent number: 11508124
    Abstract: A processing system includes hull shader circuitry that launches thread groups including one or more primitives. The hull shader circuitry also generates tessellation factors that indicate subdivisions of the primitives. The processing system also includes throttling circuitry that estimates a primitive launch time interval for the domain shader based on the tessellation factors and selectively throttles launching of the thread groups from the hull shader circuitry based on the primitive launch time interval of the domain shader and a hull shader latency. In some cases, the throttling circuitry includes a first counter that is incremented in response to launching a thread group from the buffer and a second counter that modifies the first counter based on a measured latency of the domain shader.
    Type: Grant
    Filed: December 15, 2020
    Date of Patent: November 22, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Nishank Pathak
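The throttling decision in the 11508124 abstract can be illustrated with a toy Python model. Everything quantitative here is an assumption, not AMD's design. The domain shader cost is modeled as a hypothetical fixed number of cycles per unit of tessellation factor. A first counter tracks the estimated domain shader work still outstanding, and it is charged when a group's tessellation factors are recorded. Measured domain shader progress, standing in for the second counter, drains that estimate. A new hull shader thread group may launch only when the outstanding work would drain within the hull shader latency, so its output arrives roughly when the domain shader can use it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HullShaderThrottle:
    hs_latency: int               # assumed cycles for a hull shader thread group to finish
    cycles_per_factor: int = 4    # hypothetical domain shader cost per unit of tessellation factor
    pending_ds_cycles: int = 0    # "first counter": estimated domain shader work still queued

    def estimate_ds_interval(self, tess_factors: Sequence[int]) -> int:
        """Estimate the domain shader launch interval for one thread group's primitives."""
        return sum(tess_factors) * self.cycles_per_factor

    def may_launch(self) -> bool:
        # Throttle while the queued domain shader work outlasts the hull shader latency.
        return self.pending_ds_cycles <= self.hs_latency

    def record_group(self, tess_factors: Sequence[int]) -> None:
        # Increment the first counter by the group's estimated domain shader cost.
        self.pending_ds_cycles += self.estimate_ds_interval(tess_factors)

    def on_ds_progress(self, measured_cycles: int) -> None:
        # Measured domain shader progress (the role of the second counter) reduces the estimate.
        self.pending_ds_cycles = max(0, self.pending_ds_cycles - measured_cycles)
```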
  • Patent number: 11086691
    Abstract: A producer-consumer technique includes creating a pool of consumer threads. Producer threads can enqueue work items on a work queue. Consumer threads from the consumer pool are activated to process work items on the work queue. Only one consumer thread at a time is activated from the consumer pool, while the remaining consumer threads in the pool wait for an activation event. When signaled by a producer thread, the activated consumer thread pops all the work items from the work queue for processing. The activated consumer thread then signals another consumer thread in the consumer pool by generating an activation event. When the consumer thread has processed its work items, it places itself back in the consumer pool by blocking to wait for an activation event.
    Type: Grant
    Filed: May 17, 2019
    Date of Patent: August 10, 2021
    Assignee: SAP SE
    Inventor: Muhammed Sharique
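The single-active-consumer scheme in the 11086691 abstract can be sketched in Python with the standard `threading` module. This is an illustration, not SAP's implementation. The class name `SingleActiveConsumerPool` and its methods are hypothetical. A one-permit semaphore stands in for the activation event, and a condition variable carries the producer's signal to whichever consumer currently holds the permit.

```python
import threading
from collections import deque


class SingleActiveConsumerPool:
    """Sketch: a pool of consumers in which only one is activated at a time."""

    def __init__(self, n_consumers, handle):
        self._handle = handle
        self._queue = deque()
        self._cond = threading.Condition()
        self._activation = threading.Semaphore(1)  # one activation permit for the whole pool
        self._stopped = False
        self._threads = [threading.Thread(target=self._consume) for _ in range(n_consumers)]
        for t in self._threads:
            t.start()

    def submit(self, item):
        """Producer side: enqueue a work item and signal the activated consumer."""
        with self._cond:
            self._queue.append(item)
            self._cond.notify()

    def stop(self):
        """Drain remaining work, then let every consumer exit."""
        with self._cond:
            self._stopped = True
            self._cond.notify_all()
        for t in self._threads:
            t.join()

    def _consume(self):
        while True:
            self._activation.acquire()  # block in the pool until activated
            with self._cond:
                while not self._queue and not self._stopped:
                    self._cond.wait()  # only the activated consumer ever waits here
                batch = list(self._queue)  # pop *all* queued work items at once
                self._queue.clear()
                done = self._stopped and not batch
            self._activation.release()  # activate another consumer from the pool
            if done:
                return
            for item in batch:
                self._handle(item)  # process outside the lock, then rejoin the pool
```

Because only the permit holder waits on the condition variable, `notify()` wakes exactly the activated consumer rather than the whole pool, and each batch is processed after the permit has been handed to the next consumer.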
  • Patent number: 10977075
    Abstract: An apparatus comprising: a processing unit configured to execute a plurality of threads; a profiling unit configured to: profile the operation of the processing unit over a time period to generate an activity profile indicating when each of the plurality of threads is executed by the processing unit over the time period; analyse the generated activity profile to determine whether a signature of the processing unit's thread execution for the time period matches a signature indicating a baseline of thread execution for the processing unit; output an alert signal if the signature of the processing unit's thread execution for the time period does not match the signature indicating a baseline of thread execution for the processing unit.
    Type: Grant
    Filed: April 10, 2019
    Date of Patent: April 13, 2021
    Assignee: Mentor Graphics Corporation
    Inventor: Gajinder Singh Panesar
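One way to make the activity-profile comparison in the 10977075 abstract concrete is the Python sketch below. The signature it uses (the fraction of sampled time slices in which each thread was running) and the per-thread tolerance are assumptions for illustration; the abstract does not specify how signatures are computed or matched.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence

Signature = Dict[Hashable, float]


def signature(samples: Sequence[Hashable]) -> Signature:
    """Fraction of sampled time slices in which each thread was executing."""
    counts = Counter(samples)
    return {tid: n / len(samples) for tid, n in counts.items()}


def matches(observed: Signature, baseline: Signature, tolerance: float = 0.1) -> bool:
    """True if every thread's share is within `tolerance` of its baseline share."""
    threads = set(observed) | set(baseline)
    return all(abs(observed.get(t, 0.0) - baseline.get(t, 0.0)) <= tolerance for t in threads)


def should_alert(samples: Sequence[Hashable], baseline: Signature, tolerance: float = 0.1) -> bool:
    """True if this period's thread-execution signature departs from the baseline."""
    return not matches(signature(samples), baseline, tolerance)
```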
  • Patent number: 10950230
    Abstract: Included are a speech recognition result obtainer that obtains a speech recognition result, which is text data obtained by speech recognition processing; a priority obtainer that obtains a priority corresponding to each of a plurality of tasks, each task being identified by one of a plurality of dialog processes based on the speech recognition result; and a dialog processing controller that causes a plurality of devices to perform distributed execution of the plurality of mutually different dialog processes. Based on the priority, the dialog processing controller provides control information corresponding to a task identified by the distributed execution to an executer that operates based on the control information.
    Type: Grant
    Filed: October 23, 2017
    Date of Patent: March 16, 2021
    Assignee: PANASONIC INTELLECTUAL PROPERTY CORPORATION OF AMERICA
    Inventor: Yoshihiro Kojima
  • Patent number: 10565024
    Abstract: Generic Concurrency Restriction (GCR) may divide a set of threads waiting to acquire a lock into two sets: an active set currently able to contend for the lock, and a passive set waiting for an opportunity to join the active set and contend for the lock. The number of threads in the active set may be limited to a predefined maximum or even a single thread. Generic Concurrency Restriction may be implemented as a wrapper around an existing lock implementation. Generic Concurrency Restriction may, in some embodiments, be unfair (e.g., to some threads) over the short term, but may improve the overall throughput of the underlying multithreaded application via passivation of a portion of the waiting threads.
    Type: Grant
    Filed: October 19, 2016
    Date of Patent: February 18, 2020
    Assignee: Oracle International Corporation
    Inventors: David Dice, Alex Kogan
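A deliberately simplified Python version of concurrency restriction as a lock wrapper, in the spirit of the 10565024 abstract, is sketched below. A bounded semaphore caps the active set at `max_active`, and the threads blocked on it form the passive set. The class name is hypothetical. The sketch omits the policies the abstract alludes to, such as deciding which waiting threads to passivate or when to reactivate them.

```python
import threading


class ConcurrencyRestrictedLock:
    """Wraps an existing lock so that at most `max_active` threads contend for it."""

    def __init__(self, inner=None, max_active=1):
        self._inner = inner if inner is not None else threading.Lock()
        # Permit holders form the active set; threads blocked on the semaphore form the passive set.
        self._active = threading.BoundedSemaphore(max_active)

    def acquire(self):
        self._active.acquire()  # join the active set, or wait passively
        self._inner.acquire()   # contend for the underlying lock

    def release(self):
        self._inner.release()
        self._active.release()  # let one passive thread join the active set

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
```

With `max_active=1` the underlying lock never sees more than one contender; larger values trade some contention for less handoff latency.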
  • Patent number: 10360128
    Abstract: A system and method for the dynamic scaling of concurrent processing threads are provided. The system may include a scheduler, a master controller, a thread controller, a process invoker, a reprocess validator, and a server cluster comprising various managed servers. The master controller may generate processing thread messages during an initial processing run. Thereafter, the master controller may dynamically scale the processing thread messages based on process performance data and system performance data.
    Type: Grant
    Filed: January 23, 2017
    Date of Patent: July 23, 2019
    Assignee: AMERICAN EXPRESS TRAVEL RELATED SERVICES COMPANY, INC.
    Inventor: Krishna K. Lingamneni
  • Patent number: 10354085
    Abstract: Techniques for simulating exclusive use of a processor core amongst multiple logical partitions (LPARs) include providing hardware thread-dependent status information in response to access requests by the LPARs that is reflective of exclusive use of the processor by the LPAR accessing the hardware thread-dependent information. The information returned in response to the access requests is transformed if the requestor is a program executing at a privilege level lower than the hypervisor privilege level, so that each logical partition views the processor as though it has exclusive use of the processor. The techniques may be implemented by a logical circuit block within the processor core that transforms the hardware thread-specific information to a logical representation of the hardware thread-specific information or the transformation may be performed by program instructions of an interrupt handler that traps access to the physical register containing the information.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: July 16, 2019
    Assignee: International Business Machines Corporation
    Inventors: Giles R. Frazier, Bruce Mealey, Naresh Nayar
  • Patent number: 10235178
    Abstract: Embodiments relate to improving user experiences when executing binary code that has been translated from other binary code. Binary code (instructions) for a source instruction set architecture (ISA) cannot natively execute on a processor that implements a target ISA. The instructions in the source ISA are binary-translated to instructions in the target ISA and are executed on the processor. The overhead of performing binary translation and/or the overhead of executing binary-translated code are compensated for by increasing the speed at which the translated code is executed, relative to non-translated code. Translated code may be executed on hardware that has one or more power-performance parameters of the processor set to increase the performance of the processor with respect to the translated code. The increase in power-performance for translated code may be proportional to the degree of translation overhead.
    Type: Grant
    Filed: June 2, 2017
    Date of Patent: March 19, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hee jun Park, Mehmet Iyigun
  • Patent number: 10013348
    Abstract: A liveness-based memory allocation module operates so that a program thread invoking the module is provided with an allocation of memory that includes a reserve of free heap slots beyond the invoking thread's immediate requirements. The module receives a parameter representing a thread execution window from an invoking thread; calculates a liveness metric based upon the parameter; calculates, based upon the parameter, a reserve of memory to be passed to the invoking thread; and returns a block of memory corresponding to the calculated reserve. Equations, algorithms, and sampling strategies for calculating liveness metrics are disclosed, as well as a method for adaptive control of the module to achieve a balance between memory efficiency and potential contention as specified by a single control parameter.
    Type: Grant
    Filed: September 10, 2015
    Date of Patent: July 3, 2018
    Assignee: UNIVERSITY OF ROCHESTER
    Inventors: Pengcheng Li, Chen Ding
  • Patent number: 10013240
    Abstract: A first processing element is configured to execute a first thread and one or more second processing elements are configured to execute one or more second threads that are redundant to the first thread. The first thread and the one or more second threads are to selectively bypass one or more comparisons of results of operations performed by the first thread and the one or more second threads depending on whether an event trigger for the comparison has occurred a configurable number of times since a previous comparison of previously encoded values of the results. In some cases the comparison can be performed based on hashed (or encoded) values of the results of a current operation and one or more previous operations.
    Type: Grant
    Filed: June 21, 2016
    Date of Patent: July 3, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Daniel I. Lowell
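The comparison-bypass idea in the 10013240 abstract can be modeled in Python by folding each redundant thread's results into a running hash and comparing digests only once every `compare_every` trigger events. The use of SHA-256 via `hashlib`, the class name, and counting every recorded operation as a trigger event are assumptions of this sketch.

```python
import hashlib


class LazyRedundancyCheck:
    """Compares redundant threads' results only every `compare_every` events."""

    def __init__(self, n_threads, compare_every):
        self.compare_every = compare_every  # configurable number of trigger events
        self._hashes = [hashlib.sha256() for _ in range(n_threads)]
        self._events = 0
        self.comparisons = 0

    def on_result(self, results):
        """Fold one operation's results (one value per redundant thread) into running hashes.

        Returns False only when a comparison is performed and the threads disagree.
        """
        for h, value in zip(self._hashes, results):
            h.update(repr(value).encode())  # encode current result on top of previous ones
        self._events += 1
        if self._events < self.compare_every:
            return True  # bypass the comparison this time
        self._events = 0
        self.comparisons += 1
        return len({h.hexdigest() for h in self._hashes}) == 1
```

Because the hashes accumulate, a divergence that occurs between comparison points is still caught at the next comparison.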
  • Patent number: 9766951
    Abstract: A method for synchronizing multiple processing units, comprises the steps of configuring a synchronization register in a target processing unit so that its content is overwritten only by bits that are set in words written in the synchronization register; assigning a distinct bit position of the synchronization register to each processing unit; and executing a program thread in each processing unit. When the program thread of a current processing unit reaches a synchronization point, the method comprises writing in the synchronization register of the target processing unit a word in which the bit position assigned to the current processing unit is set, and suspending the program thread. When all the bits assigned to the processing units are set in the synchronization register, the suspended program threads are resumed.
    Type: Grant
    Filed: May 26, 2015
    Date of Patent: September 19, 2017
    Assignee: KALRAY
    Inventors: Thomas Champseix, Benoît Dupont De Dinechin, Pierre Guironnet De Massas
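The synchronization method in the 9766951 abstract behaves like a barrier built on an OR-only register. The Python sketch below models the register with a condition variable. The class name is hypothetical, and clearing the register after each completed round is an assumption made so the barrier can be reused; the abstract does not describe a reset.

```python
import threading


class SyncRegister:
    """Models a synchronization register whose writes can only set bits."""

    def __init__(self, n_units):
        self._all_bits = (1 << n_units) - 1  # one distinct bit position per processing unit
        self._value = 0
        self._round = 0                      # counts completed synchronizations
        self._cond = threading.Condition()

    def synchronize(self, unit):
        """Called when `unit`'s program thread reaches a synchronization point."""
        with self._cond:
            my_round = self._round
            self._value |= 1 << unit         # write a word with only this unit's bit set
            if self._value == self._all_bits:
                self._value = 0              # reset for reuse (an assumption of this sketch)
                self._round += 1
                self._cond.notify_all()      # resume every suspended thread
            else:
                while self._round == my_round:
                    self._cond.wait()        # suspend until all bits are set
```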
  • Patent number: 9736270
    Abstract: An operation (such as a relational query) may be processed on a processing engine (such as a relational database server) on behalf of a client. Conventional processing involves delivering the operation to the processing engine, which executes the entire operation to completion and returns a result data set. It may be more efficient to allocate part of the operation to be performed on the client, but a developer may be unable or unavailable to rewrite the operation in a distributed manner. Instead, the operation may be automatically partitioned into a pre-engine client portion, a processing engine portion, and a post-engine client portion, and the instructions of each portion may be automatically allocated respectively to the client, the server, and the client. The partitioning may be adjusted to conserve computing resources, such as bandwidth and storage, and the instructions may be reordered to improve the processing of the operation.
    Type: Grant
    Filed: January 25, 2013
    Date of Patent: August 15, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Erik Meijer, Dinesh Chandrakant Kulkarni, Matthew J. Warren, Anders Hejlsberg
  • Patent number: 9047079
    Abstract: A technique for indicating a safe shared resource condition with respect to a disabled thread provides a fast indication to other hardware threads that a temporarily disabled thread can no longer impact shared resources, such as shared special-purpose registers and translation look-aside buffers within the processor core. Signals from pipelines within the core indicate whether any of the instructions pending in the pipeline impact the shared resources, and if not, the thread disable status is presented to the other threads via a state change in a thread status register. Upon receiving an indication that a particular hardware thread is to be disabled, control logic halts the dispatch of instructions for the particular hardware thread and then waits until any indication that a shared resource is impacted by an instruction has cleared. The control logic then updates the thread status to indicate that the thread is disabled.
    Type: Grant
    Filed: March 30, 2012
    Date of Patent: June 2, 2015
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Becky Bruce, Giles R. Frazier, Bradly G. Frey, Kumar K. Gala, Cathy May, Michael D. Snyder, Gary Whisenhunt, James Xenidis
  • Patent number: 9021237
    Abstract: A method and circuit arrangement utilize a low latency variable transfer network between the register files of multiple processing cores in a multi-core processor chip to support fine grained parallelism of virtual threads across multiple hardware threads. The communication of a variable over the variable transfer network may be initiated by a move from a local register in a register file of a source processing core to a variable register that is allocated to a destination hardware thread in a destination processing core, so that the destination hardware thread can then move the variable from the variable register to a local register in the destination processing core.
    Type: Grant
    Filed: December 20, 2011
    Date of Patent: April 28, 2015
    Assignee: International Business Machines Corporation
    Inventors: Miguel Comparan, Russell D. Hoover, Robert A. Shearer, Alfred T. Watson, III
  • Patent number: 9015504
    Abstract: A multi-threaded microprocessor for processing instructions in threads, including, in one embodiment, (1) at least one processor pipeline for the instructions; (2) a storage for a thread power management configuration; and (3) a power control circuit coupled to said at least one processor pipeline and responsive to said storage for thread power management configuration to control power used by different parts of the at least one processor pipeline depending on the threads, wherein said power control circuit is operable to establish different power voltages in different parts of the at least one processor pipeline depending on the threads.
    Type: Grant
    Filed: January 6, 2011
    Date of Patent: April 21, 2015
    Assignee: Texas Instruments Incorporated
    Inventor: Thang Tran
  • Patent number: 9009448
    Abstract: Disclosed is an architecture, system and method for performing multi-thread DFA descents on a single input stream. An executer performs DFA transitions from a plurality of threads, each starting at a different point in an input stream. A plurality of executers may operate in parallel to each other, and a plurality of thread contexts operate concurrently within each executer to maintain the context of each thread that is state transitioning. A scheduler in each executer arbitrates instructions for the threads into at least one pipeline where the instructions are executed. Tokens may be output from each of the plurality of executers to a token processor, which sorts and filters the tokens into dispatch order.
    Type: Grant
    Filed: January 18, 2012
    Date of Patent: April 14, 2015
    Assignee: Intel Corporation
    Inventors: Michael Ruehle, Umesh Ramkrishnarao Kasture, Vinay Janardan Naik, Nayan Amrutlal Suthar, Robert J. McMillen
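A Python sketch of the idea in the 9009448 abstract follows: several DFA descents over one input, each starting at a different offset, run concurrently, and their tokens are then sorted into dispatch order. The DFA (recognizing `ab+`), the `(start, end)` token format, and the use of a thread pool in place of hardware executers and thread contexts are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical DFA recognizing "ab+": state 0 is the start, state 2 is accepting.
TRANSITIONS = {(0, "a"): 1, (1, "b"): 2, (2, "b"): 2}
ACCEPTING = {2}


def descend(text, start):
    """One DFA descent beginning at `start`; returns the tokens it emits."""
    state, tokens = 0, []
    for pos in range(start, len(text)):
        state = TRANSITIONS.get((state, text[pos]))
        if state is None:
            break  # dead state: this descent ends
        if state in ACCEPTING:
            tokens.append((start, pos + 1))
    return tokens


def scan(text, workers=4):
    """Run one descent per start offset concurrently, then sort tokens into dispatch order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_descent = pool.map(lambda s: descend(text, s), range(len(text)))
        tokens = [tok for toks in per_descent for tok in toks]
    return sorted(tokens)  # stands in for the token processor's ordering step
```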
  • Patent number: 8972704
    Abstract: A code section of a computer program to be executed by a computing device includes memory barrier instructions. Where the code section satisfies a threshold, the code section is modified by enclosing it within a transaction that employs hardware transactional memory of the computing device and removing the memory barrier instructions from the code section. Execution of the enclosed code section can be monitored to yield monitoring results. Where the monitoring results satisfy an abort threshold corresponding to excessive aborting of the enclosed code section, the code section is split into code sub-sections, and each code sub-section is enclosed within a separate transaction that employs the hardware transactional memory. Splitting the code section into sub-sections and enclosing each sub-section within a separate transaction can decrease how often the code section aborts during execution.
    Type: Grant
    Filed: December 15, 2011
    Date of Patent: March 3, 2015
    Assignee: International Business Machines Corporation
    Inventors: Toshihiko Koju, Takuya Nakaike, Ali Ijaz Sheikh, Harold Wade Cain, III, Maged M. Michael
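The transformation in the 8972704 abstract can be outlined in Python as a planning step over a list of instructions. This is only a model: real hardware transactional memory cannot be driven from Python, the barrier set (the POWER-style mnemonics `sync` and `lwsync`) is illustrative, the `abort_rate` callable stands in for the monitoring results, and recursive halving is one assumed way of choosing sub-sections.

```python
from typing import Callable, List

BARRIERS = {"sync", "lwsync"}  # illustrative memory-barrier mnemonics

Section = List[str]


def transactionalize(section: Section, abort_rate: Callable[[Section], float],
                     abort_threshold: float) -> List[Section]:
    """Return the instruction lists to run, one per hardware transaction."""
    body = [ins for ins in section if ins not in BARRIERS]  # barriers removed inside transactions
    return _split(body, abort_rate, abort_threshold)


def _split(body: Section, abort_rate: Callable[[Section], float],
           abort_threshold: float) -> List[Section]:
    if len(body) <= 1 or abort_rate(body) <= abort_threshold:
        return [body]  # aborts acceptable: keep a single transaction
    mid = len(body) // 2  # excessive aborts: split into two sub-sections, each its own transaction
    return (_split(body[:mid], abort_rate, abort_threshold)
            + _split(body[mid:], abort_rate, abort_threshold))
```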
  • Patent number: 8959319
    Abstract: Embodiments of the present invention provide systems, methods, and computer program products for improving divergent conditional branches in code being executed by a processor. For example, in an embodiment, a method comprises detecting a conditional statement of a program being simultaneously executed by a plurality of threads, determining which threads evaluate a condition of the conditional statement as true and which threads evaluate the condition as false, pushing an identifier associated with the larger set of the threads onto a stack, executing code associated with a smaller set of the threads, and executing code associated with the larger set of the threads.
    Type: Grant
    Filed: December 2, 2011
    Date of Patent: February 17, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mark Leather, Norman Rubin, Brian D. Emberling, Michael Mantor
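The ordering described in the 8959319 abstract can be simulated in Python for a single divergent `if` across SIMD lanes. The function name, the use of lane lists as the "identifier" pushed onto the stack, and the per-lane body callables are assumptions. One commonly cited reason for running the smaller set first is that, under nested divergence, it bounds the depth of the deferred-work stack; the abstract itself does not state the motivation.

```python
def execute_branch(conditions, then_body, else_body):
    """Serialize a divergent `if` across lanes; conditions[i] is lane i's evaluation.

    Returns the order in which (lane, body result) pairs were produced.
    """
    true_lanes = [i for i, c in enumerate(conditions) if c]
    false_lanes = [i for i, c in enumerate(conditions) if not c]
    groups = [(true_lanes, then_body), (false_lanes, else_body)]
    (small, small_body), (large, large_body) = sorted(groups, key=lambda g: len(g[0]))
    stack = [(large, large_body)]  # push an identifier for the larger set
    trace = [(lane, small_body(lane)) for lane in small]  # smaller set executes first
    lanes, body = stack.pop()  # then resume the deferred, larger set
    trace += [(lane, body(lane)) for lane in lanes]
    return trace
```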
  • Patent number: 8788793
    Abstract: A processor including L computing units, L being an integer of 2 or greater, the processor comprising: an instruction buffer including M×Z instruction storage areas each storing one instruction, M instruction streams being input in a state of being distinguished from each other, each of the M instruction streams including Z instructions, M and Z each being an integer of 2 or greater, M×Z being equal to or greater than L; an order information holding unit holding order information that indicates an order of the M×Z instruction storage areas; an extraction unit operable to extract instructions from the M×Z instruction storage areas; and a control unit operable to cause the extraction unit to extract L instructions in executable state from the M×Z instruction storage areas in accordance with the order indicated by the order information, and input the instructions into different ones of the L computing units.
    Type: Grant
    Filed: May 18, 2010
    Date of Patent: July 22, 2014
    Assignee: Panasonic Corporation
    Inventor: Hiroyuki Morishita
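The extraction step in the 8788793 abstract reduces to a scan over the M×Z storage areas in the order given by the order information. In the Python sketch below, the buffer layout `buffer[m][z]`, the `(m, z)` order list, and the `executable` predicate are assumptions, and the loop stands in for whatever selection logic the processor actually uses.

```python
def extract(buffer, order, executable, num_units):
    """Pick up to `num_units` (L) executable instructions in the stored priority order.

    buffer[m][z]: instruction in stream m, slot z (None if the area is empty).
    order:        sequence of (m, z) pairs giving the order of the storage areas.
    Returns a list whose k-th entry is sent to computing unit k.
    """
    issued = []
    for m, z in order:
        instr = buffer[m][z]
        if instr is not None and executable(instr):
            issued.append(instr)
            if len(issued) == num_units:
                break  # every computing unit has an instruction
    return issued
```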
  • Patent number: 8639884
    Abstract: Systems and methods are disclosed for multi-threading computer systems. In a computer system executing multiple program threads in a processing unit, a first load/store execution unit is configured to handle instructions from a first program thread and a second load/store execution unit is configured to handle instructions from a second program thread. When the computer system is executing a single program thread, the first and second load/store execution units are reconfigured to handle instructions from the single program thread, and a Level 1 (L1) data cache is reconfigured with a first port to communicate with the first load/store execution unit and a second port to communicate with the second load/store execution unit.
    Type: Grant
    Filed: February 28, 2011
    Date of Patent: January 28, 2014
    Assignee: Freescale Semiconductor, Inc.
    Inventor: Thang M. Tran
  • Patent number: 8078840
    Abstract: A fetch director in a multithreaded microprocessor that concurrently executes instructions of N threads is disclosed. The N threads request to fetch instructions from an instruction cache. In a given selection cycle, some of the threads may not be requesting to fetch instructions. The fetch director includes a circuit for selecting one of the threads in a round-robin fashion to provide its fetch address to the instruction cache. The circuit 1-bit left rotatively increments a first addend by a second addend to generate a sum that is ANDed with the inverse of the first addend to generate a 1-hot vector indicating which of the threads is selected next. The first addend is an N-bit vector where each bit is false if the corresponding thread is requesting to fetch instructions from the instruction cache. The second addend is a 1-hot vector indicating the last selected thread.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: December 13, 2011
    Assignee: MIPS Technologies, Inc.
    Inventors: Soumya Banerjee, Michael Gottlieb Jensen
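The selection arithmetic in the 8078840 abstract can be checked with a small Python model. The reading used here is an interpretation, not MIPS's circuit: the second addend (the 1-hot last-selected vector) is rotated left by one bit before being added to the first addend, and the carry out of the top bit is wrapped back into bit 0 to make the addition rotative. The carry then ripples across threads that are not requesting and stops at the first requesting thread after the last one selected; ANDing with the inverse of the first addend keeps only that bit. The function name is hypothetical.

```python
def next_fetch_thread(requests: int, last_selected: int, n: int) -> int:
    """Round-robin pick among N threads using an add-and-mask trick.

    requests:      N-bit vector, bit i set if thread i wants to fetch.
    last_selected: 1-hot vector of the previously selected thread.
    Returns a 1-hot vector for the next thread, or 0 if none is requesting.
    """
    mask = (1 << n) - 1
    first = ~requests & mask  # first addend: bit is 1 when the thread is NOT requesting
    # Second addend rotated left by one bit, so the search starts after the last pick.
    second = ((last_selected << 1) | (last_selected >> (n - 1))) & mask
    total = first + second
    total = (total & mask) + (total >> n)  # end-around carry: the add wraps past thread N-1
    return total & ~first & mask  # keep only the requesting bit the carry landed on
```

A result of 0 means no thread is requesting. Seeding `last_selected` with the top bit (`1 << (n - 1)`) makes the first selection start from thread 0.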