Patents by Inventor Perry Wang
Perry Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7398521
Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, an exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having the bottom-most order; determining resources allocated to one or more child threads spawned from the current thread; and allocating resources for the current thread in consideration of the resources allocated to its child threads, to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.
Type: Grant
Filed: February 13, 2004
Date of Patent: July 8, 2008
Assignee: Intel Corporation
Inventors: Gerolf F. Hoflehner, Shih-wei Liao, Xinmin Tian, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
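The bottom-up allocation this abstract describes can be illustrated with a small sketch. It assumes a hypothetical `Thread` record (a name, a register count, and the threads it spawns) and a flat register pool. Children are visited before their parent, and each parent takes only registers that none of its descendants use. Siblings may share registers in this sketch, since the abstract only concerns conflicts between a thread and its own children; this is not the patented compiler pass itself.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    # Hypothetical thread record: name, registers needed, and threads it spawns.
    name: str
    needs: int
    children: list["Thread"] = field(default_factory=list)


def allocate(root: Thread, pool: list[int]) -> dict[str, set[int]]:
    """Assign registers so that no thread shares a register with any descendant."""
    alloc: dict[str, set[int]] = {}

    def visit(t: Thread) -> set[int]:
        # Handle children first (bottom-most order), collecting every register below t.
        used_below: set[int] = set()
        for child in t.children:
            used_below |= visit(child)
        free = [r for r in pool if r not in used_below]
        if len(free) < t.needs:
            raise RuntimeError(f"not enough registers for {t.name}")
        alloc[t.name] = set(free[: t.needs])
        return used_below | alloc[t.name]

    visit(root)
    return alloc


# Usage: main spawns h1, which spawns h2.
tree = Thread("main", 2, [Thread("h1", 2, [Thread("h2", 1)])])
print(allocate(tree, list(range(8))))  # {'h2': {0}, 'h1': {1, 2}, 'main': {3, 4}}
```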
-
Publication number: 20080163366
Abstract: In one embodiment, the present invention includes a method for receiving a request from a user-level agent for programming of a user-level privilege for at least one architectural resource of an application-managed sequencer (AMS) and programming the user-level privilege for the at least one architectural resource using an operating system-managed sequencer (OMS) coupled to the AMS. Other embodiments are described and claimed.
Type: Application
Filed: December 29, 2006
Publication date: July 3, 2008
Inventors: Gautham Chinya, Perry Wang, Hong Wang, Jamison Collins, Richard A. Hankins, Per Hammarlund, John Shen
-
Publication number: 20080077909
Abstract: Embodiments described herein disclose a system for enabling emulation of a MIMD ISA extension which supports user-level sequencer management and control, and a set of privileged code executed by both operating system managed sequencers and application managed sequencers, including different sets of persistent per-CPU and per-thread data. In one embodiment, a lightweight code layer executes beneath the operating system. This code layer is invoked in response to particular monitored events, such as the need for communication between an operating system managed sequencer and an application managed sequencer. Control is transferred to this code layer for execution of special operations, after which control returns to the originally executing code. The code layer is normally dormant and can be invoked at any time while either a user application or the operating system is executing.
Type: Application
Filed: September 27, 2006
Publication date: March 27, 2008
Inventors: Jamison Collins, Perry Wang, Bernard Lint, Koichi Yamada, Asit Mallick, Richard A. Hankins, Gautham Chinya
-
Patent number: 7328433
Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or cache misses. A performance analysis tool is used to profile the software application's resource usage and to identify areas in the application experiencing performance bottlenecks. Compiler-runtime instructions are inserted into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas. Counting mechanisms are inserted into both the helper thread and the main thread to coordinate their execution and to help ensure that prefetched data is not evicted from the cache before the main thread can use it.
Type: Grant
Filed: October 2, 2003
Date of Patent: February 5, 2008
Assignee: Intel Corporation
Inventors: Xinmin Tian, Shih-wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim
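The counting idea in this abstract (keeping a helper thread close enough ahead that its prefetches are still cached when the main thread arrives) can be approximated in software. This is a minimal sketch under stated assumptions: a Python dict stands in for the cache, and a `threading.Semaphore` plays the role of the paired counters, capping the helper's lead at a hypothetical `window` parameter. It is not the compiler-generated mechanism the patent claims.

```python
import threading


def prefetch_with_helper(data: list[int], window: int) -> tuple[list[int], int]:
    """Main thread consumes `data`; a helper thread 'prefetches' each item into a shared
    dict first. A semaphore counts how far ahead the helper is and caps it at `window`."""
    cache: dict[int, int] = {}            # stand-in for the shared cache
    lock = threading.Lock()
    slots = threading.Semaphore(window)   # permits = how far the helper may run ahead
    ready = [threading.Event() for _ in data]
    done = {"helper": 0, "main": 0}
    max_ahead = 0

    def helper() -> None:
        nonlocal max_ahead
        for i, value in enumerate(data):
            slots.acquire()               # blocks once the helper is `window` items ahead
            with lock:
                cache[i] = value          # the "prefetch"
                done["helper"] = i + 1
                max_ahead = max(max_ahead, done["helper"] - done["main"])
            ready[i].set()

    worker = threading.Thread(target=helper)
    worker.start()
    consumed = []
    for i in range(len(data)):
        ready[i].wait()                   # item i is now guaranteed to be in the cache
        with lock:
            consumed.append(cache.pop(i)) # use the prefetched value, then evict it
            done["main"] = i + 1
        slots.release()                   # let the helper advance by one more item
    worker.join()
    return consumed, max_ahead


values, lead = prefetch_with_helper(list(range(100)), window=4)
print(values == list(range(100)), 1 <= lead <= 4)  # True True
```

Bounding the lead keeps at most `window` prefetched items outstanding, which is the property that stops a small cache from evicting data before the main thread uses it.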
-
Patent number: 7260705
Abstract: In one embodiment, the invention provides a method for examining information about branch instructions. The method comprises examining information about branch instructions that reach the write-back stage of processing within a processor, and defining a plurality of streams based on that examination. Each stream is a sequence of basic blocks in which only the last block ends in a branch instruction whose execution causes program flow to branch; each of the remaining basic blocks in the stream ends in a branch instruction whose execution does not cause program flow to branch.
Type: Grant
Filed: June 26, 2003
Date of Patent: August 21, 2007
Assignee: Intel Corporation
Inventors: Hong Wang, John Shen, Perry Wang, Marsha Eng, Gerolf F. Hoflehner, Dan Lavery, Wei Li, Alejandro Ramirez, Ed Grochowski
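The stream definition above reduces to a simple rule: extend the current stream through not-taken branches and close it at the first taken branch. The sketch below assumes each trace record is a hypothetical `(basic_block_id, branch_taken)` pair for a block whose terminating branch has reached write-back; it illustrates the definition only, not the hardware that collects the information.

```python
def split_streams(trace: list[tuple[str, bool]]) -> list[list[str]]:
    """trace: (basic_block_id, branch_taken) for each block, in write-back order.
    A stream grows through not-taken branches and ends at the first taken one."""
    streams: list[list[str]] = []
    current: list[str] = []
    for block, taken in trace:
        current.append(block)
        if taken:                  # taken branch: this block is the stream's last
            streams.append(current)
            current = []
    if current:                    # trailing blocks whose stream has not ended yet
        streams.append(current)
    return streams


print(split_streams([("A", False), ("B", False), ("C", True), ("D", True), ("E", False)]))
# [['A', 'B', 'C'], ['D'], ['E']]
```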
-
Patent number: 7228528
Abstract: In one embodiment, the invention provides a method for processing instructions. The method comprises analyzing a dynamic execution trace for a program; identifying at least one stream comprising a plurality of basic blocks in the dynamic execution trace; collecting metrics associated with the at least one stream; and optimizing the at least one stream based on the metrics.
Type: Grant
Filed: June 26, 2003
Date of Patent: June 5, 2007
Assignee: Intel Corporation
Inventors: Hong Wang, Marsha Eng, Perry Wang, John P. Shen, Gerolf F. Hoflehner, Daniel Lavery, Wei Li
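The "collect metrics, then optimize" step can be pictured as choosing which streams are worth optimizing. This sketch assumes streams have already been identified as tuples of block IDs and uses a hypothetical metric (occurrences multiplied by blocks per stream) with a hypothetical `coverage` cutoff; the patent does not specify these, and the optimization itself is not shown.

```python
from collections import Counter


def hot_streams(streams: list[tuple[str, ...]], coverage: float = 0.8) -> list[tuple[str, ...]]:
    """Rank streams by a simple metric (occurrences x blocks per stream) and return
    the highest-ranked streams that together cover `coverage` of the total weight."""
    counts = Counter(streams)
    weight = {s: n * len(s) for s, n in counts.items()}
    total = sum(weight.values())
    chosen: list[tuple[str, ...]] = []
    covered = 0
    for s in sorted(weight, key=weight.__getitem__, reverse=True):
        if covered >= coverage * total:
            break
        chosen.append(s)
        covered += weight[s]
    return chosen


trace_streams = [("A", "B", "C")] * 3 + [("D",)] * 2 + [("E", "F")]
print(hot_streams(trace_streams))  # [('A', 'B', 'C'), ('D',)]
```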
-
Publication number: 20070079298
Abstract: Thread-data affinity optimization can be performed by a compiler during the compiling of a computer program to be executed on a cache coherent non-uniform memory access (cc-NUMA) platform. In one embodiment, the present invention includes receiving a program to be compiled. The received program is then compiled in a first pass and executed. During execution, the compiler collects profiling data using a profiling tool. Then, in a second pass, the compiler performs thread-data affinity optimization on the program using the collected profiling data.
Type: Application
Filed: September 30, 2005
Publication date: April 5, 2007
Inventors: Xinmin Tian, Milind Girkar, David Sehr, Richard Grove, Wei Li, Hong Wang, Chris Newburn, Perry Wang, John Shen
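One way to picture how second-pass profile data might drive thread-data affinity is a page-placement decision: put each page on the NUMA node whose threads touched it most during the profiled run. The record format (`(thread_id, page)` pairs) and the "most accesses wins" policy below are assumptions for illustration, not the optimization the application claims.

```python
from collections import Counter, defaultdict


def place_pages(profile: list[tuple[int, str]], thread_node: dict[int, int]) -> dict[str, int]:
    """profile: (thread_id, page) access records gathered in the profiling pass.
    thread_node: NUMA node each thread runs on.
    Returns page -> node whose threads accessed that page most often."""
    per_page = defaultdict(Counter)
    for thread, page in profile:
        per_page[page][thread_node[thread]] += 1   # count accesses per node, per page
    return {page: nodes.most_common(1)[0][0] for page, nodes in per_page.items()}


profile = [(0, "p0"), (0, "p0"), (1, "p0"), (1, "p1"), (1, "p1"), (0, "p1")]
print(place_pages(profile, {0: 0, 1: 1}))  # {'p0': 0, 'p1': 1}
```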
-
Patent number: 7069545
Abstract: Software reuse instances are found from an execution trace through a process of quantization, discovery, and synthesis. Quantization includes mapping n-dimensional vectors that correspond to instructions, live-in states, and live-out states to one dimensional symbols, and arranging the symbols into a text in program execution order. Discovery includes the identification of recurrent symbols and recurrent phrases of symbols within the text. Recurrent symbols and phrases correspond to reuse instances. Compression algorithms are applied to identify the recurrent symbols and phrases. Synthesis can include correlating the reuse instances with the binary program to identify the reuse regions within the software program. Synthesis can also include generating non-essential code and corresponding triggers for a conjugate processor.
Type: Grant
Filed: December 29, 2000
Date of Patent: June 27, 2006
Assignee: Intel Corporation
Inventors: Hong Wang, Perry Wang, Ralph Kling, Neil A. Chazin, John Shen
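The quantization and discovery steps can be sketched in a few lines. Each `(instruction, live_in, live_out)` vector becomes an integer symbol in execution order, and repeated phrases are then found. Note that plain n-gram counting is swapped in here as a simple stand-in for the compression algorithms the abstract mentions, and the tuple layout of each trace record is a hypothetical choice.

```python
from collections import Counter


def quantize(trace: list[tuple]) -> list[int]:
    """Map each (instruction, live_in, live_out) vector to a 1-D symbol, in execution order."""
    table: dict[tuple, int] = {}
    # setdefault assigns the next unused symbol the first time a vector is seen
    return [table.setdefault(vec, len(table)) for vec in trace]


def recurrent_phrases(text: list[int], n: int) -> dict[tuple[int, ...], int]:
    """Return every length-n phrase that occurs more than once (candidate reuse instances)."""
    counts = Counter(tuple(text[i:i + n]) for i in range(len(text) - n + 1))
    return {phrase: c for phrase, c in counts.items() if c > 1}


trace = [("add", 1, 2), ("mul", 2, 4), ("add", 1, 2), ("mul", 2, 4), ("sub", 4, 0)]
text = quantize(trace)
print(text, recurrent_phrases(text, 2))  # [0, 1, 0, 1, 2] {(0, 1): 2}
```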
-
Publication number: 20060070047
Abstract: Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.
Type: Application
Filed: September 28, 2004
Publication date: March 30, 2006
Inventors: Satish Narayanasamy, Hong Wang, John Shen, Roni Rosner, Yoav Almog, Naftali Schwartz, Gerolf Hoflehner, Daniel LaVery, Wei Li, Xinmin Tian, Milind Girkar, Perry Wang
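To make "dependency chains of a trace" concrete, the sketch below groups trace instructions linked by read-after-write dependences (using union-find) and then places the chains on clusters, largest chain first, onto the least-loaded cluster. The `(dest_reg, [src_regs])` record format and the greedy placement are assumptions; the sketch does not reproduce the width-reduction splitting or issue-width adaptation the application describes.

```python
def dependency_chains(trace: list[tuple[str, list[str]]]) -> list[list[int]]:
    """trace: (dest_reg, [src_regs]) per instruction. Groups instruction indices
    connected by read-after-write dependences into chains."""
    parent = list(range(len(trace)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    last_writer: dict[str, int] = {}
    for i, (dest, srcs) in enumerate(trace):
        for reg in srcs:
            if reg in last_writer:          # RAW dependence on an earlier instruction
                parent[find(i)] = find(last_writer[reg])
        last_writer[dest] = i
    groups: dict[int, list[int]] = {}
    for i in range(len(trace)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def map_to_clusters(chains: list[list[int]], n_clusters: int) -> dict[tuple[int, ...], int]:
    """Greedy placement: each chain, largest first, goes to the least-loaded cluster."""
    load = [0] * n_clusters
    placement: dict[tuple[int, ...], int] = {}
    for chain in sorted(chains, key=len, reverse=True):
        c = load.index(min(load))
        placement[tuple(chain)] = c
        load[c] += len(chain)
    return placement


trace = [("r1", []), ("r2", ["r1"]), ("r3", []), ("r4", ["r3"]), ("r5", ["r2"])]
chains = dependency_chains(trace)
print(chains, map_to_clusters(chains, 2))  # [[0, 1, 4], [2, 3]] {(0, 1, 4): 0, (2, 3): 1}
```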
-
Publication number: 20050223199
Abstract: A method and system to provide user-level multithreading are disclosed. The method according to the present techniques comprises receiving programming instructions to execute one or more shared resource threads (shreds) via an instruction set architecture (ISA). One or more instruction pointers are configured via the ISA, and the one or more shreds are executed simultaneously on a microprocessor that includes multiple instruction sequencers.
Type: Application
Filed: March 31, 2004
Publication date: October 6, 2005
Inventors: Edward Grochowski, Hong Wang, John Shen, Perry Wang, Jamison Collins, James Held, Partha Kundu, Raya Leviathan, Tin-Fook Ngai
-
Publication number: 20050166039
Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
Type: Application
Filed: November 5, 2004
Publication date: July 28, 2005
Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Piyush Desai
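The programmable "low level of progress" condition can be modeled as an event counter with a threshold that, when crossed, transfers control to a handler. In this sketch the event (for example, a cache miss), the `threshold`, and the `window` of retired instructions over which events are counted are all hypothetical parameters chosen for illustration; real hardware would use performance-monitor counters rather than a Python callback.

```python
from typing import Callable


class ProgressMonitor:
    """Counts low-progress events per window of retired instructions and calls a
    handler (the 'yield') whenever the count reaches a programmable threshold."""

    def __init__(self, threshold: int, window: int, handler: Callable[[int], None]):
        self.threshold, self.window, self.handler = threshold, window, handler
        self.retired = 0
        self.events = 0

    def on_retire(self, caused_event: bool) -> None:
        self.retired += 1
        self.events += caused_event          # bool counts as 0 or 1
        if self.events >= self.threshold:
            self.handler(self.retired)       # transfer to the handler, then resume
            self.events = 0
        if self.retired % self.window == 0:
            self.events = 0                  # start a fresh measurement window


calls: list[int] = []
monitor = ProgressMonitor(threshold=2, window=4, handler=calls.append)
for ev in [True, False, True, False, True, False, False, True]:
    monitor.on_retire(ev)
print(calls)  # [3, 8]
```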
-
Publication number: 20050149697
Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and an event detector to detect a long latency event associated with a synchronization object. The event detector can cause a first thread switch in response to the long latency event associated with the synchronization object. The apparatus may also include a spin detector to detect that the synchronization object is a contended synchronization object. The spin detector can cause a second thread switch in response to the detection of the contended synchronization object to enable a spin detect response.
Type: Application
Filed: March 2, 2005
Publication date: July 7, 2005
Inventors: Natalie Enright, Jamison Collins, Perry Wang, Hong Wang, Xinmin Tran, John Shen, Gad Sheaffer, Per Hammarlund
-
Publication number: 20050125802
Abstract: A virtual multithreading hardware mechanism provides multi-threading on a single-threaded processor. Thread switches are triggered by user-defined triggers. Synchronous triggers may be defined in the form of special trigger instructions. Asynchronous triggers may be defined via special marking instructions that identify an asynchronous trigger condition. The asynchronous trigger condition may be based on a plurality of atomic processor events. Minimal context information, such as only an instruction pointer address, is maintained by the hardware upon a thread switch. In contrast to traditional simultaneous multithreading schemes, the virtual multithreading hardware provides thread switches that are transparent to an operating system and that may be performed without operating system intervention.
Type: Application
Filed: December 5, 2003
Publication date: June 9, 2005
Inventors: Perry Wang, Hong Wang, John Shen, Ashok Seshadri, Anthony Mah, William Greene, Ravi Chandran, Piyush Desai, Steve Liao
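A toy interpreter shows what "switch on a synchronous trigger, keep only the instruction pointer" means. Programs here are hypothetical lists of opcode strings, and the literal string `"TRIGGER"` stands in for a special trigger instruction; asynchronous triggers are not modeled. Because every thread's entire saved context is one integer in `ips`, a switch costs only that save and restore.

```python
def run_virtual_threads(programs: list[list[str]], max_steps: int = 100) -> list[tuple[int, str]]:
    """Round-robin over instruction lists, switching only at 'TRIGGER' instructions.
    The only context saved per thread is its instruction pointer."""
    ips = [0] * len(programs)                # per-thread saved context: just the IP
    current = 0
    executed: list[tuple[int, str]] = []
    for _ in range(max_steps):
        if all(ip >= len(p) for ip, p in zip(ips, programs)):
            break                            # every thread has finished
        prog = programs[current]
        if ips[current] >= len(prog):        # finished thread: move to the next one
            current = (current + 1) % len(programs)
            continue
        insn = prog[ips[current]]
        ips[current] += 1                    # saved IP points past the trigger
        if insn == "TRIGGER":
            current = (current + 1) % len(programs)   # synchronous trigger: switch
        else:
            executed.append((current, insn))
    return executed


print(run_virtual_threads([["a1", "TRIGGER", "a2"], ["b1", "b2", "TRIGGER", "b3"]]))
# [(0, 'a1'), (1, 'b1'), (1, 'b2'), (0, 'a2'), (1, 'b3')]
```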
-
Publication number: 20050086652
Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or cache misses. A performance analysis tool is used to profile the software application's resource usage and to identify areas in the application experiencing performance bottlenecks. Compiler-runtime instructions are inserted into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas. Counting mechanisms are inserted into both the helper thread and the main thread to coordinate their execution and to help ensure that prefetched data is not evicted from the cache before the main thread can use it.
Type: Application
Filed: October 2, 2003
Publication date: April 21, 2005
Inventors: Xinmin Tian, Shih-Wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim
-
Publication number: 20050081207
Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, an exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having the bottom-most order; determining resources allocated to one or more child threads spawned from the current thread; and allocating resources for the current thread in consideration of the resources allocated to its child threads, to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.
Type: Application
Filed: February 13, 2004
Publication date: April 14, 2005
Inventors: Gerolf Hoflehner, Shih-wei Liao, Xinmin Tian, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
-
Publication number: 20050071438
Abstract: Methods and apparatuses for compiler-created helper threads for multi-threading are described herein. In one embodiment, an exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread; analyzing the region for one or more helper threads with respect to the main thread; and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.
Type: Application
Filed: September 30, 2003
Publication date: March 31, 2005
Inventors: Shih-Wei Liao, Xinmin Tian, Gerolf Hoflehner, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
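A central part of building a helper thread for a delinquent load is deciding which statements it must execute: only those the load's address depends on. The sketch below computes such a backward slice over a straight-line region, assuming a hypothetical `(dest, [srcs])` record per statement. It ignores loop-carried dependences, memory dependences, and control flow, all of which a real compiler analysis would have to handle.

```python
def backward_slice(body: list[tuple[str, list[str]]], target: int) -> list[int]:
    """body: (dest, [srcs]) per statement, in program order; target: index of the
    delinquent load. Returns indices of the statements its address depends on."""
    needed = set(body[target][1])        # registers the load's address is computed from
    slice_idx = [target]
    for i in range(target - 1, -1, -1):  # walk backward through the region
        dest, srcs = body[i]
        if dest in needed:               # this statement produces a needed value
            needed.discard(dest)
            needed.update(srcs)
            slice_idx.append(i)
    return sorted(slice_idx)


body = [
    ("i", []),              # 0: loop index
    ("sum", ["sum", "x"]),  # 1: accumulation, not needed for the address
    ("p", ["base", "i"]),   # 2: p = base + i
    ("q", ["p"]),           # 3: q = load [p]
    ("x", ["q"]),           # 4: x = load [q]  (the delinquent load)
]
print(backward_slice(body, 4))  # [0, 2, 3, 4]
```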
-
Publication number: 20050071841
Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, an exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having the bottom-most order; determining resources allocated to one or more child threads spawned from the current thread; and allocating resources for the current thread in consideration of the resources allocated to its child threads, to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.
Type: Application
Filed: September 30, 2003
Publication date: March 31, 2005
Inventors: Gerolf Hoflehner, Shih-Wei Liao, Xinmin Tian, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
-
Publication number: 20050055541
Abstract: Embodiments of an apparatus, system and method enhance the efficiency of processor resource utilization during instruction prefetching via one or more speculative threads. Renamer logic and a map table are utilized to perform filtering of instructions in a speculative thread instruction stream. The map table includes a yes-a-thing bit to indicate whether the associated physical register's content reflects the value that would be computed by the main thread. A thread progress beacon table is utilized to track relative progress of a main thread and a speculative helper thread. Based upon information in the thread progress beacon table, the main thread may effect termination of a helper thread that is not likely to provide a performance benefit for the main thread.
Type: Application
Filed: September 8, 2003
Publication date: March 10, 2005
Inventors: Tor Aamodt, Hong Wang, Per Hammarlund, John Shen, Steve Liao, Perry Wang
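The thread progress beacon table can be pictured as a record of how far each thread has progressed, consulted to decide whether a helper is still useful. In this sketch, progress is a hypothetical iteration count recorded at a beacon point, and the `min_lead` rule (a helper that is not ahead of the main thread cannot prefetch in time, so it is a candidate for termination) is an assumed policy, not the one claimed.

```python
class BeaconTable:
    """Tracks the last iteration each thread reached at a beacon point."""

    def __init__(self) -> None:
        self.progress = {"main": 0, "helper": 0}

    def hit(self, thread: str, iteration: int) -> None:
        self.progress[thread] = iteration        # thread passed the beacon

    def helper_useful(self, min_lead: int = 1) -> bool:
        # A helper that is not at least `min_lead` iterations ahead cannot
        # prefetch in time, so the main thread may terminate it.
        return self.progress["helper"] - self.progress["main"] >= min_lead


table = BeaconTable()
table.hit("helper", 5)
table.hit("main", 3)
print(table.helper_useful())   # True
table.hit("main", 5)
print(table.helper_useful())   # False
```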
-
Publication number: 20050027941
Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.
Type: Application
Filed: July 31, 2003
Publication date: February 3, 2005
Inventors: Hong Wang, Perry Wang, Jeffery Brown, Per Hammarlund, George Chrysos, Doron Orenstein, Steve Liao, John Shen
-
Publication number: 20040268100
Abstract: In one embodiment, the invention provides a method for examining information about branch instructions. The method comprises examining information about branch instructions that reach the write-back stage of processing within a processor, and defining a plurality of streams based on that examination. Each stream is a sequence of basic blocks in which only the last block ends in a branch instruction whose execution causes program flow to branch; each of the remaining basic blocks in the stream ends in a branch instruction whose execution does not cause program flow to branch.
Type: Application
Filed: June 26, 2003
Publication date: December 30, 2004
Inventors: Hong Wang, John Shen, Perry Wang, Marsha Eng, Gerolf F. Hoflehner, Dan Lavery, Wei Li, Alejandro Ramirez, Ed Grochowski