Patents by Inventor Perry Wang

Perry Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20080256330
    Abstract: Compiling a source code program for a heterogeneous multi-core processor having a first instruction sequencer with a first instruction set architecture, and an accelerator coupled to the first instruction sequencer, wherein the accelerator comprises a heterogeneous resource with respect to the first instruction sequencer and has a second instruction set architecture; the source code program has specified therein a region of source code for the first instruction set architecture of the processor and a region of source code for the second instruction set architecture of the processor.
    Type: Application
    Filed: April 13, 2007
    Publication date: October 16, 2008
    Inventors: Perry Wang, Jamison Collins, Gautham Chinya, Hong Jiang, Hong Wang, Xinmin Tian, Guei-Yuan Lueh
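    Illustrative sketch: The abstract above concerns compiling one source program that contains regions for two different instruction set architectures (a host sequencer and an accelerator). The minimal Python toy below models only the region-partitioning step; the ISA tags and statements are invented for illustration and are not the patent's notation.

      # Toy model: a "program" is a list of (isa_tag, statement) pairs.
      # The "compiler" groups contiguous statements into per-ISA regions,
      # so each region can later be lowered by the matching back end.
      def partition_regions(program):
          regions = []
          for isa, stmt in program:
              if regions and regions[-1][0] == isa:
                  regions[-1][1].append(stmt)
              else:
                  regions.append((isa, [stmt]))
          return regions

      if __name__ == "__main__":
          program = [
              ("cpu_isa", "load input"),
              ("cpu_isa", "set up buffers"),
              ("accel_isa", "run kernel"),
              ("accel_isa", "reduce partial results"),
              ("cpu_isa", "write output"),
          ]
          for isa, stmts in partition_regions(program):
              print(isa, "->", stmts)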
  • Patent number: 7398521
    Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, an exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having the bottom-most order; determining resources allocated to one or more child threads spawned from the current thread; and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads, to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.
    Type: Grant
    Filed: February 13, 2004
    Date of Patent: July 8, 2008
    Assignee: Intel Corporation
    Inventors: Gerolf F. Hoflehner, Shih-wei Liao, Xinmin Tian, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
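    Illustrative sketch: The abstract above allocates resources for a thread only after the resources of its spawned child threads are known, so that parent and children never overlap. The Python toy below walks a spawn tree bottom-up and hands out disjoint register ranges; the tree, register counts, and thread names are fabricated for illustration.

      # Allocate disjoint register ranges bottom-up: every child is placed
      # before its parent, so the parent's range cannot conflict with them.
      def allocate(thread, children, need, next_free, table):
          for child in children.get(thread, []):
              next_free = allocate(child, children, need, next_free, table)
          table[thread] = (next_free, next_free + need[thread])  # [start, end)
          return next_free + need[thread]

      if __name__ == "__main__":
          children = {"main": ["helper1", "helper2"], "helper1": ["helper1a"]}
          need = {"main": 8, "helper1": 4, "helper1a": 2, "helper2": 3}
          table = {}
          allocate("main", children, need, 0, table)
          print(table)  # child threads get the lower ranges; "main" is placed last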
  • Publication number: 20080163366
    Abstract: In one embodiment, the present invention includes a method for receiving a request from a user-level agent for programming of a user-level privilege for at least one architectural resource of an application-managed sequencer (AMS) and programming the user-level privilege for the at least one architectural resource using an operating system-managed sequencer (OMS) coupled to the AMS. Other embodiments are described and claimed.
    Type: Application
    Filed: December 29, 2006
    Publication date: July 3, 2008
    Inventors: Gautham Chinya, Perry Wang, Hong Wang, Jamison Collins, Richard A. Hankins, Per Hammarlund, John Shen
  • Publication number: 20080077909
    Abstract: Embodiments described herein disclose a system for enabling emulation of a MIMD ISA extension which supports user-level sequencer management and control, and a set of privileged code executed by both operating system managed sequencers and application managed sequencers, including different sets of persistent per-CPU and per-thread data. In one embodiment, a lightweight code layer executes beneath the operating system. This code layer is invoked in response to particular monitored events, such as the need for communication between an operating system managed sequencer and an application managed sequencer. Control is transferred to this code layer, for execution of special operations, after which control returns back to originally executing code. The code layer is normally dormant and can be invoked at any time when either a user application or the operating system is executing.
    Type: Application
    Filed: September 27, 2006
    Publication date: March 27, 2008
    Inventors: Jamison Collins, Perry Wang, Bernard Lint, Koichi Yamada, Asit Mallick, Richard A. Hankins, Gautham Chinya
  • Patent number: 7328433
    Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.
    Type: Grant
    Filed: October 2, 2003
    Date of Patent: February 5, 2008
    Assignee: Intel Corporation
    Inventors: Xinmin Tian, Shih-wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim
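    Illustrative sketch: The abstract above coordinates a prefetching helper thread with the main thread through counters, so the helper stays only a bounded distance ahead and prefetched data is still in the cache when the main thread needs it. The Python toy below mimics that throttling with a shared counter; the dictionary standing in for the cache, the RUN_AHEAD distance, and the workload are all invented for illustration (a real implementation would issue hardware prefetches from compiler-generated code).

      import threading

      N = 1000
      data = list(range(N))
      cache = {}           # stands in for the hardware cache
      main_count = 0       # counter advanced by the main thread
      RUN_AHEAD = 32       # how far the helper may run ahead

      def helper():
          i = 0
          while i < N:
              if i - main_count > RUN_AHEAD:
                  continue          # throttle: wait for the main thread to catch up
              cache[i] = data[i]    # "prefetch" element i
              i += 1

      def main():
          global main_count
          total = 0
          for i in range(N):
              total += cache.get(i, data[i])   # use the prefetched value if present
              main_count = i + 1
          print("sum =", total)

      t = threading.Thread(target=helper)
      t.start()
      main()
      t.join()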
  • Patent number: 7260705
    Abstract: In one embodiment, the invention provides a method for examining information about branch instructions. The method comprises: examining information about branch instructions that reach a write-back stage of processing within a processor, and defining a plurality of streams based on the examining, wherein each stream comprises a sequence of basic blocks in which only the last block in the sequence ends in a branch instruction whose execution causes program flow to branch, the remaining basic blocks in the stream each ending in a branch instruction whose execution does not cause program flow to branch.
    Type: Grant
    Filed: June 26, 2003
    Date of Patent: August 21, 2007
    Assignee: Intel Corporation
    Inventors: Hong Wang, John Shen, Perry Wang, Marsha Eng, Gerolf F. Hoflehner, Dan Lavery, Wei Li, Alejandro Ramirez, Ed Grochowski
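    Illustrative sketch: The abstract above defines a stream as a run of basic blocks in which only the last block ends in a branch that is actually taken. The Python toy below groups a retired-branch trace into such streams; the trace format (block id, taken flag) is invented for illustration.

      # Each record is (basic_block_id, branch_taken) as observed at write-back.
      def split_streams(trace):
          streams, current = [], []
          for block, taken in trace:
              current.append(block)
              if taken:                  # a taken branch terminates the stream
                  streams.append(current)
                  current = []
          if current:
              streams.append(current)    # trailing blocks with no taken branch
          return streams

      if __name__ == "__main__":
          trace = [("B0", False), ("B1", False), ("B2", True),
                   ("B3", False), ("B4", True), ("B0", False), ("B1", True)]
          print(split_streams(trace))    # [['B0','B1','B2'], ['B3','B4'], ['B0','B1']]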
  • Patent number: 7228528
    Abstract: In one embodiment, the invention provides a method for processing instructions, the method comprising: analyzing a dynamic execution trace for a program; identifying at least one stream comprising a plurality of basic blocks in the dynamic execution trace; collecting metrics associated with the at least one stream; and optimizing the at least one stream based on the metrics.
    Type: Grant
    Filed: June 26, 2003
    Date of Patent: June 5, 2007
    Assignee: Intel Corporation
    Inventors: Hong Wang, Marsha Eng, Perry Wang, John P. Shen, Gerolf F. Hoflehner, Daniel Lavery, Wei Li
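    Illustrative sketch: The abstract above collects metrics for streams found in a dynamic execution trace and optimizes the streams based on those metrics. The Python fragment below ranks streams by a simple occurrences-times-length weight; that particular metric is a guess for illustration, not the one claimed.

      from collections import Counter

      def stream_metrics(streams):
          counts = Counter(tuple(s) for s in streams)
          # weight = occurrences * length: a crude "coverage" score per stream
          return sorted(((c * len(s), c, list(s)) for s, c in counts.items()),
                        reverse=True)

      if __name__ == "__main__":
          streams = [["B0", "B1", "B2"], ["B3", "B4"], ["B0", "B1", "B2"]]
          for weight, count, s in stream_metrics(streams):
              print(f"weight={weight} count={count} stream={s}")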
  • Publication number: 20070079298
    Abstract: Thread-data affinity optimization can be performed by a compiler during the compiling of a computer program to be executed on a cache coherent non-uniform memory access (cc-NUMA) platform. In one embodiment, the present invention includes receiving a program to be compiled. The received program is then compiled in a first pass and executed. During execution, the compiler collects profiling data using a profiling tool. Then, in a second pass, the compiler performs thread-data affinity optimization on the program using the collected profiling data.
    Type: Application
    Filed: September 30, 2005
    Publication date: April 5, 2007
    Inventors: Xinmin Tian, Milind Girkar, David Sehr, Richard Grove, Wei Li, Hong Wang, Chris Newburn, Perry Wang, John Shen
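    Illustrative sketch: The abstract above compiles the program twice: the first pass runs with profiling to see which threads touch which data, and the second pass uses that profile to improve thread-data affinity on a cc-NUMA machine. The Python toy below models only the profile-then-place idea; the access log, node assignment, and block names are fabricated for illustration.

      from collections import defaultdict

      def profile_accesses(access_log):
          """Pass 1 (profiling): count how often each thread touches each data block."""
          counts = defaultdict(lambda: defaultdict(int))
          for thread, block in access_log:
              counts[block][thread] += 1
          return counts

      def place_blocks(counts, thread_to_node):
          """Pass 2 (optimization): put each block on the NUMA node of its hottest thread."""
          return {block: thread_to_node[max(per_thread, key=per_thread.get)]
                  for block, per_thread in counts.items()}

      if __name__ == "__main__":
          access_log = [("T0", "A"), ("T0", "A"), ("T1", "A"),
                        ("T1", "B"), ("T1", "B"), ("T0", "C")]
          thread_to_node = {"T0": 0, "T1": 1}
          print(place_blocks(profile_accesses(access_log), thread_to_node))
          # {'A': 0, 'B': 1, 'C': 0}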
  • Patent number: 7069545
    Abstract: Software reuse instances are found from an execution trace through a process of quantization, discovery, and synthesis. Quantization includes mapping n-dimensional vectors that correspond to instructions, live-in states, and live-out states to one dimensional symbols, and arranging the symbols into a text in program execution order. Discovery includes the identification of recurrent symbols and recurrent phrases of symbols within the text. Recurrent symbols and phrases correspond to reuse instances. Compression algorithms are applied to identify the recurrent symbols and phrases. Synthesis can include correlating the reuse instances with the binary program to identify the reuse regions within the software program. Synthesis can also include generating non-essential code and corresponding triggers for a conjugate processor.
    Type: Grant
    Filed: December 29, 2000
    Date of Patent: June 27, 2006
    Assignee: Intel Corporation
    Inventors: Hong Wang, Perry Wang, Ralph Kling, Neil A. Chazin, John Shen
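    Illustrative sketch: The abstract above quantizes instruction/live-in/live-out vectors into one-dimensional symbols and then discovers recurrent symbols and phrases in the resulting "text". The Python toy below uses a plain n-gram count in place of the compression algorithms the abstract mentions, so it only approximates the discovery step; the records are fabricated for illustration.

      from collections import Counter

      def quantize(records):
          """Map each (instruction, live_in, live_out) record to a one-dimensional symbol."""
          table = {}
          return [table.setdefault(rec, len(table)) for rec in records]

      def recurrent_phrases(symbols, max_len=3, min_count=2):
          """Find phrases (symbol n-grams) occurring at least min_count times."""
          found = Counter()
          for n in range(1, max_len + 1):
              for i in range(len(symbols) - n + 1):
                  found[tuple(symbols[i:i + n])] += 1
          return {p: c for p, c in found.items() if c >= min_count}

      if __name__ == "__main__":
          records = [("add", 1, 2), ("ld", 3, 4), ("add", 1, 2), ("ld", 3, 4),
                     ("st", 4, 0), ("add", 1, 2), ("ld", 3, 4)]
          syms = quantize(records)
          print(syms)                      # [0, 1, 0, 1, 2, 0, 1]
          print(recurrent_phrases(syms))   # the phrase (0, 1) recurs: a reuse instance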
  • Publication number: 20060070047
    Abstract: Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.
    Type: Application
    Filed: September 28, 2004
    Publication date: March 30, 2006
    Inventors: Satish Narayanasamy, Hong Wang, John Shen, Roni Rosner, Yoav Almog, Naftali Schwartz, Gerolf Hoflehner, Daniel LaVery, Wei Li, Xinmin Tian, Milind Girkar, Perry Wang
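    Illustrative sketch: The abstract above splits a dependency chain into reduced-width chains that fit the issue width of narrow clusters. The Python toy below greedily peels linear chains off a small dependence DAG; both the DAG and the greedy policy are invented for illustration, and the mapping onto clusters is not modeled.

      def split_into_chains(deps, nodes):
          """deps maps node -> list of nodes it depends on; nodes are in topological order.
          Repeatedly follow a ready successor from an unplaced node to peel off a chain."""
          succs = {n: [] for n in nodes}
          for n, preds in deps.items():
              for p in preds:
                  succs[p].append(n)
          placed, chains = set(), []
          for start in nodes:
              if start in placed:
                  continue
              chain, n = [], start
              while n is not None:
                  chain.append(n)
                  placed.add(n)
                  ready = [s for s in succs[n] if s not in placed
                           and all(p in placed for p in deps.get(s, []))]
                  n = ready[0] if ready else None
              chains.append(chain)
          return chains

      if __name__ == "__main__":
          nodes = ["a", "b", "c", "d", "e"]
          deps = {"c": ["a", "b"], "d": ["c"], "e": ["b"]}
          print(split_into_chains(deps, nodes))   # [['a'], ['b', 'c', 'd'], ['e']]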
  • Publication number: 20050223199
    Abstract: A method and system to provide user-level multithreading are disclosed. The method according to the present techniques comprises receiving programming instructions to execute one or more shared resource threads (shreds) via an instruction set architecture (ISA). One or more instruction pointers are configured via the ISA, and the one or more shreds are executed simultaneously on a microprocessor that includes multiple instruction sequencers.
    Type: Application
    Filed: March 31, 2004
    Publication date: October 6, 2005
    Inventors: Edward Grochowski, Hong Wang, John Shen, Perry Wang, Jamison Collins, James Held, Partha Kundu, Raya Leviathan, Tin-Fook Ngai
  • Publication number: 20050166039
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
    Type: Application
    Filed: November 5, 2004
    Publication date: July 28, 2005
    Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Piyush Desai
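    Illustrative sketch: The abstract above monitors for a condition indicating a low level of progress and, when it fires, disrupts the program by transferring control to a handler (which may switch threads). The Python toy below samples a progress counter from a monitor thread and calls a placeholder handler when progress stalls; the thresholds, timings, and handler are all invented for illustration.

      import threading, time

      progress = 0
      stop = False

      def worker():
          global progress
          for i in range(50):
              time.sleep(0.05 if 20 <= i < 30 else 0.001)   # simulate a stall
              progress += 1

      def low_progress_handler():
          # in the abstract this would transfer control to a handler / switch threads
          print("low progress detected; a yield or thread switch would happen here")

      def monitor(threshold=2, period=0.05):
          """Sample the progress counter; call the handler when progress is too slow."""
          last = progress
          while not stop:
              time.sleep(period)
              if progress - last < threshold:
                  low_progress_handler()
              last = progress

      t = threading.Thread(target=worker)
      m = threading.Thread(target=monitor, daemon=True)
      t.start(); m.start()
      t.join()
      stop = True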
  • Publication number: 20050149697
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and an event detector to detect a long latency event associated with a synchronization object. The event detector can cause a first thread switch in response to the long latency event associated with the synchronization object. The apparatus may also include a spin detector to detect that the synchronization object is a contended synchronization object. The spin detector can cause a second thread switch in response to the detection of the contended synchronization object to enable a spin detect response.
    Type: Application
    Filed: March 2, 2005
    Publication date: July 7, 2005
    Inventors: Natalie Enright, Jamison Collins, Perry Wang, Hong Wang, Xinmin Tian, John Shen, Gad Sheaffer, Per Hammarlund
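    Illustrative sketch: The abstract above detects that a synchronization object is contended (a spin detector) and switches threads instead of spinning indefinitely. The Python toy below counts failed non-blocking lock acquisitions and falls back to a blocking wait as a stand-in for the thread switch; the spin limit and timings are invented for illustration.

      import threading, time

      lock = threading.Lock()

      def acquire_with_spin_detect(lk, spin_limit=1000):
          """Try the lock briefly; if it stays contended, report and block instead."""
          for _ in range(spin_limit):
              if lk.acquire(blocking=False):
                  return "acquired while spinning"
          # spin detector fired: in the abstract this would trigger a thread switch
          lk.acquire()
          return "contended: fell back to a blocking wait (stand-in for a thread switch)"

      def holder():
          with lock:
              time.sleep(0.2)        # hold the lock long enough to look contended

      t = threading.Thread(target=holder)
      t.start()
      time.sleep(0.01)               # let the holder grab the lock first
      print(acquire_with_spin_detect(lock))
      lock.release()
      t.join()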
  • Publication number: 20050125802
    Abstract: A virtual multithreading hardware mechanism provides multi-threading on a single-threaded processor. Thread switches are triggered by user-defined triggers. Synchronous triggers may be defined in the form of special trigger instructions. Asynchronous triggers may be defined via special marking instructions that identify an asynchronous trigger condition. The asynchronous trigger condition may be based on a plurality of atomic processor events. Minimal context information, such as only an instruction pointer address, is maintained by the hardware upon a thread switch. In contrast to traditional simultaneous multithreading schemes, the virtual multithreading hardware provides thread switches that are transparent to an operating system and that may be performed without operating system intervention.
    Type: Application
    Filed: December 5, 2003
    Publication date: June 9, 2005
    Inventors: Perry Wang, Hong Wang, John Shen, Ashok Seshadri, Anthony Mah, William Greene, Ravi Chandran, Piyush Desai, Steve Liao
  • Publication number: 20050086652
    Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.
    Type: Application
    Filed: October 2, 2003
    Publication date: April 21, 2005
    Inventors: Xinmin Tian, Shih-Wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim
  • Publication number: 20050081207
    Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, an exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having the bottom-most order; determining resources allocated to one or more child threads spawned from the current thread; and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads, to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.
    Type: Application
    Filed: February 13, 2004
    Publication date: April 14, 2005
    Inventors: Gerolf Hoflehner, Shih-wei Liao, Xinmin Tian, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
  • Publication number: 20050071841
    Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, an exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having the bottom-most order; determining resources allocated to one or more child threads spawned from the current thread; and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads, to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.
    Type: Application
    Filed: September 30, 2003
    Publication date: March 31, 2005
    Inventors: Gerolf Hoflehner, Shih-Wei Liao, Xinmin Tian, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
  • Publication number: 20050071438
    Abstract: Methods and apparatuses for compiler-created helper threads for multi-threading are described herein. In one embodiment, an exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads that are likely to suffer cache misses during an execution of the main thread; analyzing the region for one or more helper threads with respect to the main thread; and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.
    Type: Application
    Filed: September 30, 2003
    Publication date: March 31, 2005
    Inventors: Shih-Wei Liao, Xinmin Tian, Gerolf Hoflehner, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
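    Illustrative sketch: The abstract above first identifies a region whose loads are likely to miss the cache (delinquent loads) and then generates a helper thread for that region. The Python toy below shows only the selection step plus a very crude stand-in for slicing out the helper code; the profile numbers, thresholds, and statements are fabricated for illustration.

      def pick_delinquent_loads(profile, miss_rate=0.2, min_execs=1000):
          """profile: load_site -> (executions, cache_misses).  Return the sites whose
          miss rate is high enough to justify a helper-thread prefetch slice."""
          return [site for site, (execs, misses) in profile.items()
                  if execs >= min_execs and misses / execs >= miss_rate]

      def make_helper_slice(delinquent, region):
          """Keep statements that mention a delinquent load (a stand-in for real slicing)."""
          return [stmt for stmt in region if any(site in stmt for site in delinquent)]

      if __name__ == "__main__":
          profile = {"a[i]": (100000, 45000), "b[i]": (100000, 1200), "c[k]": (500, 400)}
          region = ["i = f(i)", "load a[i]", "load b[i]", "use a[i], b[i]"]
          hot = pick_delinquent_loads(profile)
          print("delinquent loads:", hot)                      # ['a[i]']
          print("helper slice:", make_helper_slice(hot, region))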
  • Publication number: 20050055541
    Abstract: Embodiments of an apparatus, system and method enhance the efficiency of processor resource utilization during instruction prefetching via one or more speculative threads. Renamer logic and a map table are utilized to perform filtering of instructions in a speculative thread instruction stream. The map table includes a yes-a-thing bit to indicate whether the associated physical register's content reflects the value that would be computed by the main thread. A thread progress beacon table is utilized to track relative progress of a main thread and a speculative helper thread. Based upon information in the thread progress beacon table, the main thread may effect termination of a helper thread that is not likely to provide a performance benefit for the main thread.
    Type: Application
    Filed: September 8, 2003
    Publication date: March 10, 2005
    Inventors: Tor Aamodt, Hong Wang, Per Hammarlund, John Shen, Steve Liao, Perry Wang
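    Illustrative sketch: The abstract above keeps a thread progress beacon table so the main thread can tell whether the speculative helper is still likely to help, and cancel it otherwise. The Python toy below models that check with a dictionary; the field names and distance limits are invented for illustration.

      # Beacon table: how far (in elements) each thread has progressed through a region.
      beacon = {"main": 0, "helper": 0}

      def helper_should_terminate(run_behind_limit=0, run_ahead_limit=64):
          """A helper that has fallen behind the main thread, or has run too far
          ahead of it, is unlikely to provide a benefit and should be cancelled."""
          lead = beacon["helper"] - beacon["main"]
          return lead <= run_behind_limit or lead > run_ahead_limit

      if __name__ == "__main__":
          beacon.update(main=120, helper=100)
          print(helper_should_terminate())   # True: the helper is behind, cancel it
          beacon.update(main=120, helper=150)
          print(helper_should_terminate())   # False: the helper is usefully ahead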
  • Publication number: 20050027941
    Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the main core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.
    Type: Application
    Filed: July 31, 2003
    Publication date: February 3, 2005
    Inventors: Hong Wang, Perry Wang, Jeffery Brown, Per Hammarlund, George Chrysos, Doron Orenstein, Steve Liao, John Shen
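    Illustrative sketch: The abstract above contrasts two ways of delivering helper-prefetched data in a CMP: pushing it into the main core's cache, or letting the main core pull it from the helper core's local cache on request. The Python toy below models both options with dictionaries standing in for per-core caches; the class and memory contents are invented for illustration.

      class Core:
          def __init__(self, name):
              self.name = name
              self.cache = {}

      def helper_prefetch(helper, main, addresses, memory, push=True):
          for addr in addresses:
              value = memory[addr]
              helper.cache[addr] = value        # fill the helper core's local cache
              if push:
                  main.cache[addr] = value      # push the line to the main core

      def main_load(main, helper, addr, memory):
          if addr in main.cache:
              return main.cache[addr], "hit (pushed earlier)"
          if addr in helper.cache:              # pull from the helper core on request
              main.cache[addr] = helper.cache[addr]
              return main.cache[addr], "pulled from helper core"
          return memory[addr], "miss"

      if __name__ == "__main__":
          memory = {i: i * i for i in range(8)}
          main, helper = Core("main"), Core("helper")
          helper_prefetch(helper, main, [1, 2, 3], memory, push=False)
          print(main_load(main, helper, 2, memory))   # (4, 'pulled from helper core')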