From Multiple Instruction Streams, E.g., Multistreaming (epo) Patents (Class 712/E9.053)

Throttling hull shaders based on tessellation factors in a graphics pipeline

Patent number: 11508124

Abstract: A processing system includes hull shader circuitry that launches thread groups including one or more primitives. The hull shader circuitry also generates tessellation factors that indicate subdivisions of the primitives. The processing system also includes throttling circuitry that estimates a primitive launch time interval for the domain shader based on the tessellation factors and selectively throttles launching of the thread groups from the hull shader circuitry based on the primitive launch time interval of the domain shader and a hull shader latency. In some cases, the throttling circuitry includes a first counter that is incremented in response to launching a thread group from the buffer and a second counter that modifies the first counter based on a measured latency of the domain shader.

Type: Grant

Filed: December 15, 2020

Date of Patent: November 22, 2022

Assignee: Advanced Micro Devices, Inc.

Inventor: Nishank Pathak

Producer-consumer communication using multi-work consumers

Patent number: 11086691

Abstract: A producer-consumer technique includes creating a pool of consumer threads. Producer threads can enqueue work items on a work queue. Consumer threads from the consumer pool are activated to process work items on the work queue. Only one consumer thread at a time is activated from the consumer pool, while the remaining consumer threads in the pool wait for an activation event. When signaled by a producer thread, the activated consumer thread pops all the work items from the work queue for processing. The activated consumer thread then signals another consumer thread in the consumer pool by generating an activation event. When the consumer thread has processed its work items, it places itself back in the consumer pool by blocking to wait for an activation event.
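The consumer-pool scheme in this abstract can be illustrated with a minimal Python sketch. It assumes a single activation token (a `threading.Semaphore(1)`) to model "only one activated consumer" and a `threading.Condition` for the producer's signal; `MultiWorkConsumerPool`, `produce`, and `shutdown` are hypothetical names, not taken from the patent.

```python
import threading
from collections import deque


class MultiWorkConsumerPool:
    """Hypothetical sketch: at most one consumer is 'activated' at a time.

    The activated consumer waits for a producer signal, pops *all* queued work
    items, hands activation to another pooled consumer, then processes its batch.
    """

    def __init__(self, num_consumers, process_batch):
        self._items = deque()
        self._cond = threading.Condition()          # guards _items and _stopped
        self._activation = threading.Semaphore(1)   # one token = one activated consumer
        self._stopped = False
        self._process_batch = process_batch
        self._threads = [threading.Thread(target=self._consume) for _ in range(num_consumers)]
        for t in self._threads:
            t.start()

    def produce(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()   # wake the activated consumer (the only thread waiting here)

    def shutdown(self):
        with self._cond:
            self._stopped = True
            self._cond.notify_all()
        for t in self._threads:
            t.join()

    def _consume(self):
        while True:
            self._activation.acquire()       # wait in the pool for an activation event
            with self._cond:
                while not self._items and not self._stopped:
                    self._cond.wait()        # activated: wait for a producer's signal
                batch = list(self._items)    # pop every pending work item at once
                self._items.clear()
                stopped = self._stopped
            self._activation.release()       # activation event for another pooled consumer
            if batch:
                self._process_batch(batch)
            elif stopped:
                return                       # queue drained and no more work will arrive
```

Because the activated consumer releases the token before processing its batch, another consumer can already be waiting for new work while earlier batches are still being processed.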

Type: Grant

Filed: May 17, 2019

Date of Patent: August 10, 2021

Assignee: SAP SE

Inventor: Muhammed Sharique

Performance profiling for a multithreaded processor

Patent number: 10977075

Abstract: An apparatus comprising: a processing unit configured to execute a plurality of threads; a profiling unit configured to: profile the operation of the processing unit over a time period to generate an activity profile indicating when each of the plurality of threads is executed by the processing unit over the time period; analyse the generated activity profile to determine whether a signature of the processing unit's thread execution for the time period matches a signature indicating a baseline of thread execution for the processing unit; output an alert signal if the signature of the processing unit's thread execution for the time period does not match the signature indicating a baseline of thread execution for the processing unit.
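As a rough illustration of comparing an activity profile against a baseline signature, the sketch below assumes the profile is a list of sampled thread IDs and that a signature is each thread's share of the samples. Both representations, the `tolerance` parameter, and the function names are hypothetical, since the abstract does not define how signatures are formed or matched.

```python
from collections import Counter


def activity_signature(profile, threads):
    """Share of sampled time slots in which each thread was executing.

    `profile` is an assumed non-empty list of thread IDs, one per sampling
    interval (None for idle slots).
    """
    counts = Counter(profile)
    total = len(profile)
    return {t: counts[t] / total for t in threads}


def should_alert(profile, baseline, tolerance=0.05):
    """Return True (raise the alert signal) if any thread's share deviates
    from the baseline signature by more than `tolerance`."""
    signature = activity_signature(profile, baseline)
    return any(abs(signature[t] - baseline[t]) > tolerance for t in baseline)
```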

Type: Grant

Filed: April 10, 2019

Date of Patent: April 13, 2021

Assignee: Mentor Graphics Corporation

Inventor: Gajinder Singh Panesar

Information processing device and information processing method

Patent number: 10950230

Abstract: Included are: a speech recognition result obtainer that obtains a speech recognition result, which is text data obtained by speech recognition processing; a priority obtainer that obtains a priority for each of a plurality of tasks that are respectively identified by a plurality of dialog processes based on the speech recognition result; and a dialog processing controller that causes a plurality of devices to perform distributed execution of the plurality of mutually different dialog processes. Based on the priority, the dialog processing controller provides control information in accordance with a task identified by the distributed execution to an executer that operates based on the control information.

Type: Grant

Filed: October 23, 2017

Date of Patent: March 16, 2021

Assignee: PANASONIC INTELLECTUAL PROPERTY CORPORATION OF AMERICA

Inventor: Yoshihiro Kojima

Generic concurrency restriction

Patent number: 10565024

Abstract: Generic Concurrency Restriction (GCR) may divide a set of threads waiting to acquire a lock into two sets: an active set currently able to contend for the lock, and a passive set waiting for an opportunity to join the active set and contend for the lock. The number of threads in the active set may be limited to a predefined maximum or even a single thread. Generic Concurrency Restriction may be implemented as a wrapper around an existing lock implementation. Generic Concurrency Restriction may, in some embodiments, be unfair (e.g., to some threads) over the short term, but may improve the overall throughput of the underlying multithreaded application via passivation of a portion of the waiting threads.
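Because the abstract describes GCR as a wrapper around an existing lock, the idea can be sketched in Python by placing a counting semaphore in front of any lock object: threads blocked on the semaphore play the role of the passive set, and threads past it form the active set. `RestrictedLock` and `max_active` are hypothetical names, and this sketch makes no attempt at long-term fairness.

```python
import threading


class RestrictedLock:
    """Hypothetical wrapper letting at most `max_active` threads contend for `inner_lock`."""

    def __init__(self, inner_lock, max_active=1):
        self._inner = inner_lock
        self._active_slots = threading.BoundedSemaphore(max_active)

    def acquire(self):
        self._active_slots.acquire()   # passive set: wait for room in the active set
        self._inner.acquire()          # active set: contend for the underlying lock

    def release(self):
        self._inner.release()
        self._active_slots.release()   # let one passive thread join the active set

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
```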

Type: Grant

Filed: October 19, 2016

Date of Patent: February 18, 2020

Assignee: Oracle International Corporation

Inventors: David Dice, Alex Kogan

System and method for dynamic scaling of concurrent processing threads

Patent number: 10360128

Abstract: A system and method for the dynamic scaling of concurrent processing threads are provided. The system may include a scheduler, a master controller, a thread controller, a process invoker, a reprocess validator, and a server cluster comprising various managed servers. The master controller may generate processing thread messages during an initial processing run. Thereafter, the master controller may dynamically scale the processing thread messages based on process performance data and system performance data.

Type: Grant

Filed: January 23, 2017

Date of Patent: July 23, 2019

Assignee: AMERICAN EXPRESS TRAVEL RELATED SERVICES COMPANY, INC.

Inventor: Krishna K. Lingamneni

Providing logical partitions with hardware-thread specific information reflective of exclusive use of a processor core

Patent number: 10354085

Abstract: Techniques for simulating exclusive use of a processor core amongst multiple logical partitions (LPARs) include providing hardware thread-dependent status information in response to access requests by the LPARs that is reflective of exclusive use of the processor by the LPAR accessing the hardware thread-dependent information. The information returned in response to the access requests is transformed if the requestor is a program executing at a privilege level lower than the hypervisor privilege level, so that each logical partition views the processor as though it has exclusive use of the processor. The techniques may be implemented by a logical circuit block within the processor core that transforms the hardware thread-specific information to a logical representation of the hardware thread-specific information or the transformation may be performed by program instructions of an interrupt handler that traps access to the physical register containing the information.

Type: Grant

Filed: December 29, 2017

Date of Patent: July 16, 2019

Assignee: International Business Machines Corporation

Inventors: Giles R. Frazier, Bruce Mealey, Naresh Nayar

Performance scaling for binary translation

Patent number: 10235178

Abstract: Embodiments relate to improving user experiences when executing binary code that has been translated from other binary code. Binary code (instructions) for a source instruction set architecture (ISA) cannot natively execute on a processor that implements a target ISA. The instructions in the source ISA are binary-translated to instructions in the target ISA and are executed on the processor. The overhead of performing binary translation and/or the overhead of executing binary-translated code are compensated for by increasing the speed at which the translated code is executed, relative to non-translated code. Translated code may be executed on hardware that has one or more power-performance parameters of the processor set to increase the performance of the processor with respect to the translated code. The increase in power-performance for translated code may be proportional to the degree of translation overhead.

Type: Grant

Filed: June 2, 2017

Date of Patent: March 19, 2019

Assignee: Microsoft Technology Licensing, LLC

Inventors: Hee jun Park, Mehmet Iyigun

Parallel memory allocator employing liveness metrics

Patent number: 10013348

Abstract: A liveness-based memory allocation module operates so that a program thread invoking the module is provided with an allocation of memory that includes a reserve of free heap slots beyond the immediate requirements of the invoking thread. The module receives a parameter representing a thread execution window from an invoking thread; calculates a liveness metric based upon the parameter; calculates, based upon the parameter, a reserve of memory to be passed to the invoking thread; and returns a block of memory corresponding to the calculated reserve. Equations, algorithms, and sampling strategies for calculating liveness metrics are disclosed, as well as a method for adaptive control of the module to achieve a balance between memory efficiency and potential contention as specified by a single control parameter.

Type: Grant

Filed: September 10, 2015

Date of Patent: July 3, 2018

Assignee: UNIVERSITY OF ROCHESTER

Inventors: Pengcheng Li, Chen Ding

Fingerprinting of redundant threads using compiler-inserted transformation code

Patent number: 10013240

Abstract: A first processing element is configured to execute a first thread and one or more second processing elements are configured to execute one or more second threads that are redundant to the first thread. The first thread and the one or more second threads are to selectively bypass one or more comparisons of results of operations performed by the first thread and the one or more second threads depending on whether an event trigger for the comparison has occurred a configurable number of times since a previous comparison of previously encoded values of the results. In some cases the comparison can be performed based on hashed (or encoded) values of the results of a current operation and one or more previous operations.
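To illustrate fingerprint-based comparison with configurable skipping, the sketch below folds each redundant thread's results into a running SHA-256 hash and compares the two hashes only every `compare_every` events, plus once at the end. The use of `hashlib`, `repr`-based encoding, and a fixed event count as the trigger are all assumptions; the patent's compiler-inserted transformation code is not modeled.

```python
import hashlib


def detect_divergence(results_a, results_b, compare_every):
    """Return the event count at which the two fingerprints are found to
    differ, or None if they still match after the last event."""
    fp_a, fp_b = hashlib.sha256(), hashlib.sha256()
    count = 0
    for count, (ra, rb) in enumerate(zip(results_a, results_b), start=1):
        fp_a.update(repr(ra).encode())   # encode the result into thread A's fingerprint
        fp_b.update(repr(rb).encode())   # and into the redundant thread's fingerprint
        if count % compare_every == 0 and fp_a.digest() != fp_b.digest():
            return count                 # comparison triggered and mismatch found
    if fp_a.digest() != fp_b.digest():   # final check covers any trailing events
        return count
    return None
```

Because each fingerprint accumulates every earlier result, skipping comparisons delays detection of a mismatch but does not lose it, barring a hash collision.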

Type: Grant

Filed: June 21, 2016

Date of Patent: July 3, 2018

Assignee: Advanced Micro Devices, Inc.

Inventor: Daniel I. Lowell

Hardware synchronization barrier between processing units

Patent number: 9766951

Abstract: A method for synchronizing multiple processing units, comprises the steps of configuring a synchronization register in a target processing unit so that its content is overwritten only by bits that are set in words written in the synchronization register; assigning a distinct bit position of the synchronization register to each processing unit; and executing a program thread in each processing unit. When the program thread of a current processing unit reaches a synchronization point, the method comprises writing in the synchronization register of the target processing unit a word in which the bit position assigned to the current processing unit is set, and suspending the program thread. When all the bits assigned to the processing units are set in the synchronization register, the suspended program threads are resumed.
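The set-only synchronization register can be modeled in software with a bitwise OR guarded by a condition variable. In this sketch, `SyncRegisterBarrier` is a hypothetical name, unit `i` owns bit `i`, and clearing the register once all bits are set (tracked by a generation counter so the barrier can be reused) is an assumption, since the abstract does not say how the register is reset.

```python
import threading


class SyncRegisterBarrier:
    """Hypothetical software model of a set-only synchronization register."""

    def __init__(self, num_units):
        self._full_mask = (1 << num_units) - 1   # every unit's bit set
        self._register = 0
        self._generation = 0
        self._cond = threading.Condition()

    def arrive(self, unit_id):
        with self._cond:
            gen = self._generation
            self._register |= 1 << unit_id       # a write can only set bits, never clear them
            if self._register == self._full_mask:
                self._register = 0               # assumed reset for the next barrier episode
                self._generation += 1
                self._cond.notify_all()          # resume all suspended threads
            else:
                while gen == self._generation:
                    self._cond.wait()            # suspend this unit's program thread
```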

Type: Grant

Filed: May 26, 2015

Date of Patent: September 19, 2017

Assignee: KALRAY

Inventors: Thomas Champseix, Benoît Dupont De Dinechin, Pierre Guironnet De Massas

Automated client/server operation partitioning

Patent number: 9736270

Abstract: An operation (such as a relational query) may be processed on a processing engine (such as a relational database server) on behalf of a client. A conventional processing involves the delivery of the operation to the processing engine, which executes the entire operation to completion and returns a result data set. It may be more efficient to allocate part of the operation to be performed on the client, but a developer may be unable or unavailable to rewrite the operation in a distributed manner. Instead, the operation may be automatically partitioned into a pre-engine client portion, a processing engine portion, and a client portion, and the instructions of each portion may be automatically allocated respectively to the client, the server, and the client. The partitioning may be adjusted to conserve computing resources, such as bandwidth and storage, and the instructions may be reordered to improve the processing of the operation.

Type: Grant

Filed: January 25, 2013

Date of Patent: August 15, 2017

Assignee: Microsoft Technology Licensing, LLC

Inventors: Erik Meijer, Dinesh Chandrakant Kulkarni, Matthew J. Warren, Anders Hejlsberg

Indicating disabled thread to other threads when contending instructions complete execution to ensure safe shared resource condition

Patent number: 9047079

Abstract: A technique for indicating a safe shared resource condition with respect to a disabled thread provides a mechanism for quickly indicating to other hardware threads that a temporarily disabled thread can no longer impact shared resources, such as shared special-purpose registers and translation look-aside buffers within the processor core. Signals from pipelines within the core indicate whether any of the instructions pending in the pipeline impact the shared resources and, if not, the thread disable status is presented to the other threads via a state change in a thread status register. Upon receiving an indication that a particular hardware thread is to be disabled, control logic halts the dispatch of instructions for the particular hardware thread, and then waits until any indication that a shared resource is impacted by an instruction has cleared. Then the control logic updates the thread status to indicate the thread is disabled.

Type: Grant

Filed: March 30, 2012

Date of Patent: June 2, 2015

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Becky Bruce, Giles R. Frazier, Bradly G. Frey, Kumar K. Gala, Cathy May, Michael D. Snyder, Gary Whisenhunt, James Xenidis

Low latency variable transfer network communicating variable written to source processing core variable register allocated to destination thread to destination processing core variable register allocated to source thread

Patent number: 9021237

Abstract: A method and circuit arrangement utilize a low latency variable transfer network between the register files of multiple processing cores in a multi-core processor chip to support fine grained parallelism of virtual threads across multiple hardware threads. The communication of a variable over the variable transfer network may be initiated by a move from a local register in a register file of a source processing core to a variable register that is allocated to a destination hardware thread in a destination processing core, so that the destination hardware thread can then move the variable from the variable register to a local register in the destination processing core.

Type: Grant

Filed: December 20, 2011

Date of Patent: April 28, 2015

Assignee: International Business Machines Corporation

Inventors: Miguel Comparan, Russell D. Hoover, Robert A. Shearer, Alfred T. Watson, III

Managing power of thread pipelines according to clock frequency and voltage specified in thread registers

Patent number: 9015504

Abstract: A multi-threaded microprocessor for processing instructions in threads, including, in one embodiment, (1) at least one processor pipeline for the instructions; (2) a storage for a thread power management configuration; and (3) a power control circuit coupled to said at least one processor pipeline and responsive to said storage for thread power management configuration to control power used by different parts of the at least one processor pipeline depending on the threads, wherein said power control circuit is operable to establish different power voltages in different parts of the at least one processor pipeline depending on the threads.

Type: Grant

Filed: January 6, 2011

Date of Patent: April 21, 2015

Assignee: Texas Instruments Incorporated

Inventor: Thang Tran

Multithreaded DFA architecture for finding rules match by concurrently performing at varying input stream positions and sorting result tokens

Patent number: 9009448

Abstract: Disclosed is an architecture, system and method for performing multi-thread DFA descents on a single input stream. An executer performs DFA transitions from a plurality of threads, each starting at a different point in an input stream. A plurality of executers may operate in parallel to each other, and a plurality of thread contexts operate concurrently within each executer to maintain the context of each thread as it performs state transitions. A scheduler in each executer arbitrates instructions for the threads into at least one pipeline, where the instructions are executed. Tokens may be output from each of the plurality of executers to a token processor, which sorts and filters the tokens into dispatch order.
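The multi-start DFA descent can be sketched with a thread pool in which each task walks a DFA from a different input position, after which the collected tokens are sorted. The DFA (for the pattern `ab+`), the `(start, end)` token format, and sorting by end position as the dispatch order are all illustrative assumptions; token filtering is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical DFA for the pattern "ab+": state 0 --a--> 1 --b--> 2 --b--> 2
TRANSITIONS = {(0, "a"): 1, (1, "b"): 2, (2, "b"): 2}
ACCEPTING = {2}


def descend(data, start):
    """One thread's DFA descent beginning at input position `start`."""
    state, tokens = 0, []
    for pos in range(start, len(data)):
        state = TRANSITIONS.get((state, data[pos]))
        if state is None:                    # no transition: this descent ends
            break
        if state in ACCEPTING:
            tokens.append((start, pos + 1))  # token = (match start, match end)
    return tokens


def scan(data, workers=4):
    """Run one descent per start position in parallel, then sort all tokens."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_thread = pool.map(lambda s: descend(data, s), range(len(data)))
        tokens = [tok for toks in per_thread for tok in toks]
    return sorted(tokens, key=lambda t: (t[1], t[0]))
```

For example, `scan("abbxab")` yields the tokens `(0, 2)`, `(0, 3)`, and `(4, 6)`, even though they were produced by independent descents.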

Type: Grant

Filed: January 18, 2012

Date of Patent: April 14, 2015

Assignee: Intel Corporation

Inventors: Michael Ruehle, Umesh Ramkrishnarao Kasture, Vinay Janardan Naik, Nayan Amrutlal Suthar, Robert J. McMillen

Code section optimization by removing memory barrier instruction and enclosing within a transaction that employs hardware transaction memory

Patent number: 8972704

Abstract: A code section of a computer program to be executed by a computing device includes memory barrier instructions. Where the code section satisfies a threshold, the code section is modified by enclosing it within a transaction that employs hardware transactional memory of the computing device and removing the memory barrier instructions from the code section. Execution of the code section as enclosed within the transaction can be monitored to yield monitoring results. Where the monitoring results satisfy an abort threshold corresponding to excessive aborting of the execution of the enclosed code section, the code section is split into code sub-sections, and each code sub-section is enclosed within a separate transaction that employs the hardware transactional memory. Splitting the code section into code sub-sections and enclosing each code sub-section within a separate transaction can decrease the occurrence of the code section aborting during execution.

Type: Grant

Filed: December 15, 2011

Date of Patent: March 3, 2015

Assignee: International Business Machines Corporation

Inventors: Toshihiko Koju, Takuya Nakaike, Ali Ijaz Sheikh, Harold Wade Cain, III, Maged M. Michael

Executing first instructions for smaller set of SIMD threads diverging upon conditional branch instruction

Patent number: 8959319

Abstract: Embodiments of the present invention provide systems, methods, and computer program products for improving divergent conditional branches in code being executed by a processor. For example, in an embodiment, a method comprises detecting a conditional statement of a program being simultaneously executed by a plurality of threads, determining which threads evaluate a condition of the conditional statement as true and which threads evaluate the condition as false, pushing an identifier associated with the larger set of the threads onto a stack, executing code associated with a smaller set of the threads, and executing code associated with the larger set of the threads.
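A scalar simulation of the divergence handling described above: lanes are split by the condition, the larger lane set is pushed onto a stack, the smaller set's code runs first, and the larger set is then popped and run. Function and parameter names are hypothetical, and real SIMD hardware would track active lanes with execution masks rather than Python lists.

```python
def execute_divergent_if(values, cond, then_fn, else_fn):
    """Simulate one instruction stream over `values` (one value per lane).

    Returns (results, order), where `order` records which path ran first.
    """
    taken = [i for i, v in enumerate(values) if cond(v)]
    not_taken = [i for i, v in enumerate(values) if not cond(v)]
    paths = [("then", taken, then_fn), ("else", not_taken, else_fn)]
    smaller, larger = sorted(paths, key=lambda p: len(p[1]))
    stack = [larger]                 # push the larger lane set; it runs later
    results, order = list(values), []

    def run(path):
        name, lanes, fn = path
        order.append(name)
        for i in lanes:              # only the lanes in this set are active
            results[i] = fn(values[i])

    run(smaller)                     # execute code for the smaller set first
    run(stack.pop())                 # then pop and execute the larger set
    return results, order
```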

Type: Grant

Filed: December 2, 2011

Date of Patent: February 17, 2015

Assignee: Advanced Micro Devices, Inc.

Inventors: Mark Leather, Norman Rubin, Brian D. Emberling, Michael Mantor

Instruction issue to plural computing units from plural stream buffers based on priority in instruction order table

Patent number: 8788793

Abstract: A processor including L computing units, L being an integer of 2 or greater, the processor comprising: an instruction buffer including M×Z instruction storage areas each storing one instruction, M instruction streams being input in a state of being distinguished from each other, each of the M instruction streams including Z instructions, M and Z each being an integer of 2 or greater, M×Z being equal to or greater than L; an order information holding unit holding order information that indicates an order of the M×Z instruction storage areas; an extraction unit operable to extract instructions from the M×Z instruction storage areas; and a control unit operable to cause the extraction unit to extract L instructions in executable state from the M×Z instruction storage areas in accordance with the order indicated by the order information, and input the instructions into different ones of the L computing units.
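The extraction step can be sketched as a scan over the M×Z storage areas in the sequence held by the order information, issuing up to L executable instructions to distinct computing units. The dict-based buffer keyed by `(stream, index)`, the `is_executable` predicate, and the function name are assumptions for illustration.

```python
def issue_instructions(buffer, order, is_executable, num_units):
    """Pick up to `num_units` (L) executable instructions in priority order.

    buffer: dict mapping (stream, index) storage areas to instructions
    order:  list of storage-area keys, highest priority first
    Returns a list of (unit, storage_area, instruction).
    """
    issued = []
    for area in order:
        if len(issued) == num_units:
            break                                      # every computing unit is occupied
        instr = buffer.get(area)
        if instr is not None and is_executable(instr):
            issued.append((len(issued), area, instr))  # next unused computing unit
    return issued
```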

Type: Grant

Filed: May 18, 2010

Date of Patent: July 22, 2014

Assignee: Panasonic Corporation

Inventor: Hiroyuki Morishita

Systems and methods for configuring load/store execution units

Patent number: 8639884

Abstract: Systems and methods are disclosed for multi-threading computer systems. In a computer system executing multiple program threads in a processing unit, a first load/store execution unit is configured to handle instructions from a first program thread and a second load/store execution unit is configured to handle instructions from a second program thread. When the computer system is executing a single program thread, the first and second load/store execution units are reconfigured to handle instructions from the single program thread, and a Level 1 (L1) data cache is reconfigured with a first port to communicate with the first load/store execution unit and a second port to communicate with the second load/store execution unit.

Type: Grant

Filed: February 28, 2011

Date of Patent: January 28, 2014

Assignee: Freescale Semiconductor, Inc.

Inventor: Thang M. Tran

Thread instruction fetch based on prioritized selection from plural round-robin outputs for different thread states

Patent number: 8078840

Abstract: A fetch director in a multithreaded microprocessor that concurrently executes instructions of N threads is disclosed. The N threads request to fetch instructions from an instruction cache. In a given selection cycle, some of the threads may not be requesting to fetch instructions. The fetch director includes a circuit for selecting one of the threads in a round-robin fashion to provide its fetch address to the instruction cache. The circuit 1-bit left rotatively increments a first addend by a second addend to generate a sum that is ANDed with the inverse of the first addend to generate a 1-hot vector indicating which of the threads is selected next. The first addend is an N-bit vector where each bit is false if the corresponding thread is requesting to fetch instructions from the instruction cache. The second addend is a 1-hot vector indicating the last selected thread.
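One reading of the round-robin circuit, sketched with Python integers as N-bit vectors: the last-selected 1-hot vector is rotated left by one bit, added to the inverted request vector with an end-around carry (the "rotative" part), and the sum is ANDed with the inverse of that first addend, i.e. the request vector. This interpretation of the abstract's wording and the function name are assumptions; the prioritized selection among several round-robin outputs named in the title is not modeled.

```python
def next_fetch_thread(requesting, last_selected, n):
    """Return a 1-hot vector naming the next thread to fetch, or 0 if none request.

    requesting:    N-bit vector, bit i set if thread i wants to fetch
    last_selected: 1-hot N-bit vector of the previously selected thread
    """
    mask = (1 << n) - 1
    first = ~requesting & mask               # first addend: bit is 0 for requesting threads
    # Rotate the last-selected 1-hot vector left by one bit.
    second = ((last_selected << 1) | (last_selected >> (n - 1))) & mask
    total = first + second
    total = (total & mask) + (total >> n)    # end-around carry makes the addition rotative
    return total & ~first & mask             # AND with the inverse of the first addend
```

For example, with four threads where threads 1 and 3 are requesting and thread 1 was selected last, the carry ripples past non-requesting thread 2 and the function returns `0b1000`, selecting thread 3.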

Type: Grant

Filed: December 30, 2008

Date of Patent: December 13, 2011

Assignee: MIPS Technologies, Inc.

Inventors: Soumya Banerjee, Michael Gottlieb Jensen