Patents by Inventor Perry Wang

Perry Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Methods and apparatuses for thread management of multi-threading

Patent number: 7398521

Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having a most bottom order, determining resources allocated to one or more child threads spawned from the current thread, and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.

Type: Grant

Filed: February 13, 2004

Date of Patent: July 8, 2008

Assignee: Intel Corporation

Inventors: Gerolf F. Hoflehner, Shih-wei Liao, Xinmin Tian, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
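
As an illustration only, the bottom-up allocation described in the abstract of patent 7398521 might be sketched as follows. The thread tree, the per-thread register counts, and the function name `allocate` are hypothetical and do not come from the patent. The sketch also assumes that "resources" means a small pool of numbered registers, and that a thread only needs to avoid the registers of its direct children.

```python
def allocate(tree, needs, num_registers):
    """Assign registers bottom-up so a thread never shares one with its children.

    tree:  hypothetical mapping of thread name -> list of child thread names
    needs: hypothetical mapping of thread name -> number of registers required
    """
    allocation = {}

    def visit(thread):
        # Children are visited first, so the most bottom threads are
        # allocated before the threads that spawn them.
        taken = set()
        for child in tree.get(thread, []):
            visit(child)
            taken |= allocation[child]
        # The current thread may use any register its children do not hold.
        free = [r for r in range(num_registers) if r not in taken]
        if len(free) < needs[thread]:
            raise ValueError(f"not enough registers for {thread}")
        allocation[thread] = set(free[:needs[thread]])

    # Roots are threads that never appear as anyone's child.
    roots = set(tree) - {c for kids in tree.values() for c in kids}
    for root in sorted(roots):
        visit(root)
    return allocation


tree = {"main": ["helper1", "helper2"], "helper1": ["helper3"]}
needs = {"main": 2, "helper1": 2, "helper2": 1, "helper3": 1}
allocation = allocate(tree, needs, num_registers=8)
```

In this sketch, sibling threads may reuse the same registers, because the abstract only describes conflicts between a thread and its own child threads.
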
User-level privilege management

Publication number: 20080163366

Abstract: In one embodiment, the present invention includes a method for receiving a request from a user-level agent for programming of a user-level privilege for at least one architectural resource of an application-managed sequencer (AMS) and programming the user-level privilege for the at least one architectural resource using an operating system-managed sequencer (OMS) coupled to the AMS. Other embodiments are described and claimed.

Type: Application

Filed: December 29, 2006

Publication date: July 3, 2008

Inventors: Gautham Chinya, Perry Wang, Hong Wang, Jamison Collins, Richard A. Hankins, Per Hammarlund, John Shen
Enabling multiple instruction stream/multiple data stream extensions on microprocessors

Publication number: 20080077909

Abstract: Embodiments described herein disclose a system for enabling emulation of a MIMD ISA extension which supports user-level sequencer management and control, and a set of privileged code executed by both operating system managed sequencers and application managed sequencers, including different sets of persistent per-CPU and per-thread data. In one embodiment, a lightweight code layer executes beneath the operating system. This code layer is invoked in response to particular monitored events, such as the need for communication between an operating system managed sequencer and an application managed sequencer. Control is transferred to this code layer, for execution of special operations, after which control returns back to originally executing code. The code layer is normally dormant and can be invoked at any time when either a user application or the operating system is executing.

Type: Application

Filed: September 27, 2006

Publication date: March 27, 2008

Inventors: Jamison Collins, Perry Wang, Bernard Lint, Koichi Yamada, Asit Mallick, Richard A. Hankins, Gautham Chinya
Methods and apparatus for reducing memory latency in a software application

Patent number: 7328433

Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.

Type: Grant

Filed: October 2, 2003

Date of Patent: February 5, 2008

Assignee: Intel Corporation

Inventors: Xinmin Tian, Shih-wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim
Apparatus to implement mesocode

Patent number: 7260705

Abstract: In one embodiment, the invention provides a method for examining information about branch instructions. A method, comprising: examining information about branch instructions that reach a write-back stage of processing within a processor, defining a plurality of streams based on the examining, wherein each stream comprises a sequence of basic blocks in which only a last block in the sequence ends in a branch instruction, the execution of which causes program flow to branch, the remaining basic blocks in the stream each ending in a branch instruction, the execution of which does not cause program flow to branch.

Type: Grant

Filed: June 26, 2003

Date of Patent: August 21, 2007

Assignee: Intel Corporation

Inventors: Hong Wang, John Shen, Perry Wang, Marsha Eng, Gerolf F. Hoflehner, Dan Lavery, Wei Li, Alejandro Ramirez, Ed Grochowski
Building inter-block streams from a dynamic execution trace for a program

Patent number: 7228528

Abstract: In one embodiment, the invention provides a method for the processing of instructions. A method which comprises analyzing a dynamic execution trace for a program; identifying at least one stream comprising a plurality of basic blocks in the dynamic execution trace; collecting metrics associated with the at least one stream; and optimizing the at least one stream based on the metrics.

Type: Grant

Filed: June 26, 2003

Date of Patent: June 5, 2007

Assignee: Intel Corporation

Inventors: Hong Wang, Marsha Eng, Perry Wang, John P. Shen, Gerolf F. Hoflehner, Daniel Lavery, Wei Li
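
A minimal sketch of the stream identification described in the abstract of patent 7228528, using the definition of a stream given in the "Apparatus to implement mesocode" abstract: a run of basic blocks in which only the last block ends in a taken branch. The trace format (pairs of block name and taken flag), the function name `build_streams`, and the choice of execution count as the collected metric are assumptions made for illustration.

```python
from collections import Counter


def build_streams(trace):
    """Split a trace of (block_id, branch_taken) pairs into streams."""
    streams, current = [], []
    for block_id, taken in trace:
        current.append(block_id)
        if taken:
            # A taken branch ends the current stream.
            streams.append(tuple(current))
            current = []
    if current:
        # Keep a trailing run that never reached a taken branch.
        streams.append(tuple(current))
    return streams


# Hypothetical dynamic execution trace.
trace = [("A", False), ("B", True), ("C", True), ("A", False), ("B", True)]
streams = build_streams(trace)

# Assumed metric: how often each stream occurs in the trace.
hot_stream, hot_count = Counter(streams).most_common(1)[0]
```

The most frequent stream would be the natural first candidate for optimization, although the patent does not say which metrics drive that choice.
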
Thread-data affinity optimization using compiler

Publication number: 20070079298

Abstract: Thread-data affinity optimization can be performed by a compiler during the compiling of a computer program to be executed on a cache coherent non-uniform memory access (cc-NUMA) platform. In one embodiment, the present invention includes receiving a program to be compiled. The received program is then compiled in a first pass and executed. During execution, the compiler collects profiling data using a profiling tool. Then, in a second pass, the compiler performs thread-data affinity optimization on the program using the collected profiling data.

Type: Application

Filed: September 30, 2005

Publication date: April 5, 2007

Inventors: Xinmin Tian, Milind Girkar, David Sehr, Richard Grove, Wei Li, Hong Wang, Chris Newburn, Perry Wang, John Shen
Quantization and compression for computation reuse

Patent number: 7069545

Abstract: Software reuse instances are found from an execution trace through a process of quantization, discovery, and synthesis. Quantization includes mapping n-dimensional vectors that correspond to instructions, live-in states, and live-out states to one dimensional symbols, and arranging the symbols into a text in program execution order. Discovery includes the identification of recurrent symbols and recurrent phrases of symbols within the text. Recurrent symbols and phrases correspond to reuse instances. Compression algorithms are applied to identify the recurrent symbols and phrases. Synthesis can include correlating the reuse instances with the binary program to identify the reuse regions within the software program. Synthesis can also include generating non-essential code and corresponding triggers for a conjugate processor.

Type: Grant

Filed: December 29, 2000

Date of Patent: June 27, 2006

Assignee: Intel Corporation

Inventors: Hong Wang, Perry Wang, Ralph Kling, Neil A. Chazin, John Shen
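
The quantization and discovery steps described in the abstract of patent 7069545 can be sketched roughly as follows. The trace contents and the function names `quantize` and `recurrent_phrases` are hypothetical. The abstract says compression algorithms are used to find recurrences; this sketch substitutes plain fixed-length phrase counting, which is simpler and not the patented method.

```python
from collections import Counter


def quantize(vectors):
    """Map each n-dimensional vector to a one-dimensional symbol (an int id)."""
    table = {}
    text = []
    for vec in vectors:
        # The first occurrence of a vector gets the next unused id.
        symbol = table.setdefault(tuple(vec), len(table))
        text.append(symbol)
    return text, table


def recurrent_phrases(text, length):
    """Return phrases of the given length that occur more than once."""
    counts = Counter(
        tuple(text[i:i + length]) for i in range(len(text) - length + 1)
    )
    return {phrase: n for phrase, n in counts.items() if n > 1}


# Hypothetical trace: (instruction, live-in, live-in, live-out) per entry,
# listed in program execution order.
trace = [
    ("add", 1, 2, 3),
    ("mul", 3, 3, 9),
    ("add", 1, 2, 3),
    ("mul", 3, 3, 9),
    ("sub", 9, 1, 8),
]
text, table = quantize(trace)
phrases = recurrent_phrases(text, 2)
```

Each recurrent phrase corresponds to a candidate reuse instance: the same instructions executed again with the same live-in and live-out values.
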
System, method and apparatus for dependency chain processing

Publication number: 20060070047

Abstract: Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.

Type: Application

Filed: September 28, 2004

Publication date: March 30, 2006

Inventors: Satish Narayanasamy, Hong Wang, John Shen, Roni Rosner, Yoav Almog, Naftali Schwartz, Gerolf Hoflehner, Daniel LaVery, Wei Li, Xinmin Tian, Milind Girkar, Perry Wang
Method and system to provide user-level multithreading

Publication number: 20050223199

Abstract: A method and system to provide user-level multithreading are disclosed. The method according to the present techniques comprises receiving programming instructions to execute one or more shared resource threads (shreds) via an instruction set architecture (ISA). One or more instruction pointers are configured via the ISA; and the one or more shreds are executed simultaneously with a microprocessor, wherein the microprocessor includes multiple instruction sequencers.

Type: Application

Filed: March 31, 2004

Publication date: October 6, 2005

Inventors: Edward Grochowski, Hong Wang, John Shen, Perry Wang, Jamison Collins, James Held, Partha Kundu, Raya Leviathan, Tin-Fook Ngai
Programmable event driven yield mechanism which may activate other threads

Publication number: 20050166039

Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.

Type: Application

Filed: November 5, 2004

Publication date: July 28, 2005

Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Piyush Desai
Mechanism to exploit synchronization overhead to improve multithreaded performance

Publication number: 20050149697

Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and an event detector to detect a long latency event associated with a synchronization object. The event detector can cause a first thread switch in response to the long latency event associated with the synchronization object. The apparatus may also include a spin detector to detect that the synchronization object is a contended synchronization object. The spin detector can cause a second thread switch in response to the detection of the contended synchronization object to enable a spin detect response.

Type: Application

Filed: March 2, 2005

Publication date: July 7, 2005

Inventors: Natalie Enright, Jamison Collins, Perry Wang, Hong Wang, Xinmin Tian, John Shen, Gad Sheaffer, Per Hammarlund
User-programmable low-overhead multithreading

Publication number: 20050125802

Abstract: A virtual multithreading hardware mechanism provides multi-threading on a single-threaded processor. Thread switches are triggered by user-defined triggers. Synchronous triggers may be defined in the form of special trigger instructions. Asynchronous triggers may be defined via special marking instructions that identify an asynchronous trigger condition. The asynchronous trigger condition may be based on a plurality of atomic processor events. Minimal context information, such as only an instruction pointer address, is maintained by the hardware upon a thread switch. In contrast to traditional simultaneous multithreading schemes, the virtual multithreading hardware provides thread switches that are transparent to an operating system and that may be performed without operating system intervention.

Type: Application

Filed: December 5, 2003

Publication date: June 9, 2005

Inventors: Perry Wang, Hong Wang, John Shen, Ashok Seshadri, Anthony Mah, William Greene, Ravi Chandran, Piyush Desai, Steve Liao
Methods and apparatus for reducing memory latency in a software application

Publication number: 20050086652

Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.

Type: Application

Filed: October 2, 2003

Publication date: April 21, 2005

Inventors: Xinmin Tian, Shih-Wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim
Methods and apparatuses for thread management of multi-threading

Publication number: 20050081207

Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having a most bottom order, determining resources allocated to one or more child threads spawned from the current thread, and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.

Type: Application

Filed: February 13, 2004

Publication date: April 14, 2005

Inventors: Gerolf Hoflehner, Shih-wei Liao, Xinmin Tian, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
Methods and apparatuses for compiler-creating helper threads for multi-threading

Publication number: 20050071438

Abstract: Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.

Type: Application

Filed: September 30, 2003

Publication date: March 31, 2005

Inventors: Shih-Wei Liao, Xinmin Tian, Gerolf Hoflehner, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
Methods and apparatuses for thread management of multi-threading

Publication number: 20050071841

Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having a most bottom order, determining resources allocated to one or more child threads spawned from the current thread, and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.

Type: Application

Filed: September 30, 2003

Publication date: March 31, 2005

Inventors: Gerolf Hoflehner, Shih-Wei Liao, Xinmin Tian, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
Method and apparatus for efficient utilization for prescient instruction prefetch

Publication number: 20050055541

Abstract: Embodiments of an apparatus, system and method enhance the efficiency of processor resource utilization during instruction prefetching via one or more speculative threads. Renamer logic and a map table are utilized to perform filtering of instructions in a speculative thread instruction stream. The map table includes a yes-a-thing bit to indicate whether the associated physical register's content reflects the value that would be computed by the main thread. A thread progress beacon table is utilized to track relative progress of a main thread and a speculative helper thread. Based upon information in the thread progress beacon table, the main thread may effect termination of a helper thread that is not likely to provide a performance benefit for the main thread.

Type: Application

Filed: September 8, 2003

Publication date: March 10, 2005

Inventors: Tor Aamodt, Hong Wang, Per Hammarlund, John Shen, Steve Liao, Perry Wang
Method and apparatus for affinity-guided speculative helper threads in chip multiprocessors

Publication number: 20050027941

Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.

Type: Application

Filed: July 31, 2003

Publication date: February 3, 2005

Inventors: Hong Wang, Perry Wang, Jeffery Brown, Per Hammarlund, George Chrysos, Doron Orenstein, Steve Liao, John Shen
Apparatus to implement mesocode

Publication number: 20040268100

Abstract: In one embodiment, the invention provides a method for examining information about branch instructions. A method, comprising: examining information about branch instructions that reach a write-back stage of processing within a processor, defining a plurality of streams based on the examining, wherein each stream comprises a sequence of basic blocks in which only a last block in the sequence ends in a branch instruction, the execution of which causes program flow to branch, the remaining basic blocks in the stream each ending in a branch instruction, the execution of which does not cause program flow to branch.

Type: Application

Filed: June 26, 2003

Publication date: December 30, 2004

Inventors: Hong Wang, John Shen, Perry Wang, Marsha Eng, Gerolf F. Hoflehner, Dan Lavery, Wei Li, Alejandro Ramirez, Ed Grochowski
