Patents by Inventor Xinmin Tian

Xinmin Tian has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Fast lock-free post-wait synchronization for exploiting parallelism on multi-core processors

Patent number: 7571301

Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.

Type: Grant

Filed: March 31, 2006

Date of Patent: August 4, 2009

Assignee: Intel Corporation

Inventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee
Apparatus, systems, and methods for execution-driven loop splitting and load-safe code hosting

Patent number: 7549146

Abstract: Techniques for execution-driven loop splitting and load-safe code hosting are provided. Compiled code includes statements associated with an original loop and statements associated with an alternative loop. The alternative loop reproduces the original loop except for conditional load-safe invariant expressions that appeared in the original loop and that are separated out of the alternative loop. During processing, once the conditional load-safe invariant expressions are computed and referenced for a first time within the original loop, processing dynamically switches to the alternative loop where the conditional load-safe invariant expressions are computed outside of the alternative loop and referenced from within the alternative loop.

Type: Grant

Filed: June 21, 2005

Date of Patent: June 16, 2009

Assignee: Intel Corporation

Inventors: Xinmin Tian, Milind B. Girkar
Programmable event driven yield mechanism which may activate other threads

Patent number: 7487502

Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.

Type: Grant

Filed: February 19, 2003

Date of Patent: February 3, 2009

Assignee: Intel Corporation

Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Piyush Desai
Programming environment for heterogeneous processor resource integration

Publication number: 20080256330

Abstract: Compiling a source code program for a heterogeneous multi-core processor having a first instruction sequencer, having a first instruction set architecture, an accelerator to the first instruction sequencer, wherein the accelerator comprises a heterogeneous resource with respect to the first instruction sequencer having a second instruction set architecture, the source code program having specified therein a region of source code for the first instruction set architecture of the processor and a region of source code for the second instruction set architecture of the processor.

Type: Application

Filed: April 13, 2007

Publication date: October 16, 2008

Inventors: Perry Wang, Jamison Collins, Gautham Chinya, Hong Jiang, Hong Wang, Xinmin Tian, Guei-Yuan Lueh
METHOD AND APPARATUS FOR EXPLOITING THREAD-LEVEL PARALLELISM

Publication number: 20080244549

Abstract: According to one example embodiment, there is disclosed herein uses partial recurrence relaxation for parallelizing DOACROSS loops on multi-core computer architectures. By one example definition, a DOACROSS may be a loop that allows successive iterations executing by overlapping; that is, all iterations must impose a partial execution order. According to one embodiment, the inventive subject matter may be used to transform the dependence structure of a given loop with recurrences for maximal degree of thread-level parallelism (TLP), where the threads can be mapped on to either different logical processors (in a hyperthreaded processor) or can be mapped onto different physical cores (or processors) in a multi-core processor.

Type: Application

Filed: March 31, 2007

Publication date: October 2, 2008

Inventors: Arun Kejariwal, Xinmin Tian, Wei Li, Milind B. Girkar
Methods and apparatuses for thread management of multi-threading

Patent number: 7398521

Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having a most bottom order, determining resources allocated to one or more child threads spawned from the current thread, and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.

Type: Grant

Filed: February 13, 2004

Date of Patent: July 8, 2008

Assignee: Intel Corporation

Inventors: Gerolf F. Hoflehner, Shih-wei Liao, Xinmin Tian, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
METHODS AND APPARATUS TO PROVIDE PARAMETERIZED OFFLOADING ON MULTIPROCESSOR ARCHITECTURES

Publication number: 20080163183

Abstract: Methods and apparatus to provide parameterized offloading in multiprocessor systems are disclosed. An example method includes partitioning source code into a first task and a second task, and compiling object code from the source code, such that the first task is compiled to execute on a first processor core and the second task is compiled to execute on a second processor core, the assignment of the first task to the first core being dependent on an input parameter.

Type: Application

Filed: December 29, 2006

Publication date: July 3, 2008

Inventors: Zhiyuan Li, Xinmin Tian, Wei Li, Hong Wang
Methods and apparatus for reducing memory latency in a software application

Patent number: 7328433

Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.

Type: Grant

Filed: October 2, 2003

Date of Patent: February 5, 2008

Assignee: Intel Corporation

Inventors: Xinmin Tian, Shih-wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim
Fast lock-free post-wait synchronization for exploiting parallelism on multi-core processors

Publication number: 20070234326

Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.

Type: Application

Filed: March 31, 2006

Publication date: October 4, 2007

Inventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee
METHOD, SYSTEM, AND PROGRAM OF A COMPILER TO PARALLELIZE SOURCE CODE

Publication number: 20070234276

Abstract: Provided are a method, system, and program for parallelizing source code with a compiler. Source code including source code statements is received. The source code statements are processed to determine a dependency of the statements. Multiple groups of statements are determined from the determined dependency of the statements, wherein statements in one group are dependent on one another. At least one directive is inserted in the source code, wherein each directive is associated with one group of statements. Resulting threaded code is generated including the inserted at least one directive. The group of statements to which the directive in the resulting threaded code applies are processed as a separate task. Each group of statements designated by the directive to be processed as a separate task may be processed concurrently with respect to other groups of statements.

Type: Application

Filed: March 31, 2006

Publication date: October 4, 2007

Inventors: Guilherme Ottoni, Xinmin Tian, Hong Wang, Richard Hankins, Wei Li, John Shen
Compiler-based scheduling optimization hints for user-level threads

Publication number: 20070124732

Abstract: Method, apparatus and system embodiments to schedule user-level OS-independent “shreds” without intervention of an operating system. For at least one embodiment, the shred is scheduled for execution by a scheduler routine rather than the operating system. The scheduler routine may receive compiler-generated hints from a compiler. The compiler hints may be generated by the compiler without user-provided pragmas, and may be passed to the scheduler routine via an API-like interface. The interface may include a scheduling hint data structure that is maintained by the compiler. Other embodiments are also described and claimed.

Type: Application

Filed: November 29, 2005

Publication date: May 31, 2007

Inventors: Shih-wei Lia, Ryan Rakvic, Richard Hankins, Hong Wang, Gansha Wu, Guei-Yuan Lueh, Xinmin Tian, Paul Petersen, Sanjiv Shah, Trung Diep, John Shen, Gautham Chinya
Thread-data affinity optimization using compiler

Publication number: 20070079298

Abstract: Thread-data affinity optimization can be performed by a compiler during the compiling of a computer program to be executed on a cache coherent non-uniform memory access (cc-NUMA) platform. In one embodiment, the present invention includes receiving a program to be compiled. The received program is then compiled in a first pass and executed. During execution, the compiler collects profiling data using a profiling tool. Then, in a second pass, the compiler performs thread-data affinity optimization on the program using the collected profiling data.

Type: Application

Filed: September 30, 2005

Publication date: April 5, 2007

Inventors: Xinmin Tian, Milind Girkar, David Sehr, Richard Grove, Wei Li, Hong Wang, Chris Newburn, Perry Wang, John Shen
Mechanism for instruction set based thread execution on a plurality of instruction sequencers

Publication number: 20070006231

Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.

Type: Application

Filed: June 30, 2005

Publication date: January 4, 2007

Inventors: Hong Wang, John Shen, Ed Grochowski, James Held, Bryant Bigbee, Shivnandan Kaushik, Gautham Chinya, Xiang Zou, Per Hammarlund, Xinmin Tian, Anil Aggarwal, Scott Rodgers, Prashant Sethi, Baiju Patel, Richard Hankins
Apparatus, systems, and methods for execution-driven loop splitting and load-safe code hosting

Publication number: 20060288334

Abstract: Techniques for execution-driven loop splitting and load-safe code hosting are provided. Compiled code includes statements associated with an original loop and statements associated with an alternative loop. The alternative loop reproduces the original loop except for conditional load-safe invariant expressions that appeared in the original loop and that are separated out of the alternative loop. During processing, once the conditional load-safe invariant expressions are computed and referenced for a first time within the original loop, processing dynamically switches to the alternative loop where the conditional load-safe invariant expressions are computed outside of the alternative loop and referenced from within the alternative loop.

Type: Application

Filed: June 21, 2005

Publication date: December 21, 2006

Inventors: Xinmin Tian, Milind Girkar
Mechanism to schedule threads on OS-sequestered sequencers without operating system intervention

Publication number: 20060150184

Abstract: Method, apparatus and system embodiments to schedule OS-independent “shreds” without intervention of an operating system. For at least one embodiment, the shred is scheduled for execution by a scheduler routine rather than the operating system. A scheduler routine may run on each enabled sequencer. The schedulers may retrieve shred descriptors from a queue system. The sequencer associated with the scheduler may then execute the shred described by the descriptor. Other embodiments are also described and claimed.

Type: Application

Filed: December 30, 2004

Publication date: July 6, 2006

Inventors: Richard Hankins, Hong Wang, Gautham Chinya, Trung Diep, Shivnandan Kaushik, Bryant Bigbee, John Shen, Asit Mallick, Baiju Patel, James Held, Milind Girkar, Prashant Sethi, Xinmin Tian
System, method and apparatus for dependency chain processing

Publication number: 20060070047

Abstract: Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.

Type: Application

Filed: September 28, 2004

Publication date: March 30, 2006

Inventors: Satish Narayanasamy, Hong Wang, John Shen, Roni Rosner, Yoav Almog, Naftali Schwartz, Gerolf Hoflehner, Daniel LaVery, Wei Li, Xinmin Tian, Milind Girkar, Perry Wang
Programmable event driven yield mechanism which may activate other threads

Publication number: 20050166039

Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.

Type: Application

Filed: November 5, 2004

Publication date: July 28, 2005

Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Piyush Desai
Methods and apparatus for reducing memory latency in a software application

Publication number: 20050086652

Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.

Type: Application

Filed: October 2, 2003

Publication date: April 21, 2005

Inventors: Xinmin Tian, Shih-Wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim
Methods and apparatuses for thread management of multi-threading

Publication number: 20050081207

Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having a most bottom order, determining resources allocated to one or more child threads spawned from the current thread, and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.

Type: Application

Filed: February 13, 2004

Publication date: April 14, 2005

Inventors: Gerolf Hoflehner, Shih-wei Liao, Xinmin Tian, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
Methods and apparatuses for compiler-creating helper threads for multi-threading

Publication number: 20050071438

Abstract: Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.

Type: Application

Filed: September 30, 2003

Publication date: March 31, 2005

Inventors: Shih-Wei Liao, Xinmin Tian, Gerolf Hoflehner, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen

prev 1 2 3 4 5 next