Patents by Inventor Milind Girkar

Milind Girkar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20070079298
    Abstract: Thread-data affinity optimization can be performed by a compiler during the compiling of a computer program to be executed on a cache coherent non-uniform memory access (cc-NUMA) platform. In one embodiment, the present invention includes receiving a program to be compiled. The received program is then compiled in a first pass and executed. During execution, the compiler collects profiling data using a profiling tool. Then, in a second pass, the compiler performs thread-data affinity optimization on the program using the collected profiling data.
    Type: Application
    Filed: September 30, 2005
    Publication date: April 5, 2007
    Inventors: Xinmin Tian, Milind Girkar, David Sehr, Richard Grove, Wei Li, Hong Wang, Chris Newburn, Perry Wang, John Shen
  • Publication number: 20060288334
    Abstract: Techniques for execution-driven loop splitting and load-safe code hosting are provided. Compiled code includes statements associated with an original loop and statements associated with an alternative loop. The alternative loop reproduces the original loop except for conditional load-safe invariant expressions that appeared in the original loop and that are separated out of the alternative loop. During processing, once the conditional load-safe invariant expressions are computed and referenced for a first time within the original loop, processing dynamically switches to the alternative loop where the conditional load-safe invariant expressions are computed outside of the alternative loop and referenced from within the alternative loop.
    Type: Application
    Filed: June 21, 2005
    Publication date: December 21, 2006
    Inventors: Xinmin Tian, Milind Girkar
  • Publication number: 20060150184
    Abstract: Method, apparatus and system embodiments to schedule OS-independent “shreds” without intervention of an operating system. For at least one embodiment, the shred is scheduled for execution by a scheduler routine rather than the operating system. A scheduler routine may run on each enabled sequencer. The schedulers may retrieve shred descriptors from a queue system. The sequencer associated with the scheduler may then execute the shred described by the descriptor. Other embodiments are also described and claimed.
    Type: Application
    Filed: December 30, 2004
    Publication date: July 6, 2006
    Inventors: Richard Hankins, Hong Wang, Gautham Chinya, Trung Diep, Shivnandan Kaushik, Bryant Bigbee, John Shen, Asit Mallick, Baiju Patel, James Held, Milind Girkar, Prashant Sethi, Xinmin Tian
  • Publication number: 20060070047
    Abstract: Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.
    Type: Application
    Filed: September 28, 2004
    Publication date: March 30, 2006
    Inventors: Satish Narayanasamy, Hong Wang, John Shen, Roni Rosner, Yoav Almog, Naftali Schwartz, Gerolf Hoflehner, Daniel LaVery, Wei Li, Xinmin Tian, Milind Girkar, Perry Wang
  • Patent number: 7020873
    Abstract: An apparatus and method for vectorization of detected saturation and clipping operations in serial code loops of a source program are described. In one embodiment, the method includes the analysis of source program code to identify source code utilizing conditional constructs to perform saturation/clipping operations. Once analysis is complete, identified source code is vectorized to implement identified saturation/clipping operations utilizing single instruction, multiple data (SIMD) saturation/clipping instructions. Accordingly, utilizing embodiments of the present invention, conditional statements utilized to implement saturation arithmetic, as well as clipping of data values, such as pixel values within graphics applications, are replaced with SIMD saturation arithmetic instructions, as well as clipping instructions.
    Type: Grant
    Filed: June 21, 2002
    Date of Patent: March 28, 2006
    Assignee: Intel Corporation
    Inventors: Aart J. C. Bik, Milind Girkar
  • Publication number: 20050166039
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
    Type: Application
    Filed: November 5, 2004
    Publication date: July 28, 2005
    Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Piyush Desai
  • Publication number: 20050086652
    Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.
    Type: Application
    Filed: October 2, 2003
    Publication date: April 21, 2005
    Inventors: Xinmin Tian, Shih-Wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim
  • Publication number: 20050081207
    Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having a most bottom order, determining resources allocated to one or more child threads spawned from the current thread, and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.
    Type: Application
    Filed: February 13, 2004
    Publication date: April 14, 2005
    Inventors: Gerolf Hoflehner, Shih-wei Liao, Xinmin Tian, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
  • Publication number: 20050071841
    Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having a most bottom order, determining resources allocated to one or more child threads spawned from the current thread, and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.
    Type: Application
    Filed: September 30, 2003
    Publication date: March 31, 2005
    Inventors: Gerolf Hoflehner, Shih-Wei Liao, Xinmin Tian, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
  • Publication number: 20050071438
    Abstract: Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.
    Type: Application
    Filed: September 30, 2003
    Publication date: March 31, 2005
    Inventors: Shih-Wei Liao, Xinmin Tian, Gerolf Hoflehner, Hong Wang, Daniel Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John Shen
  • Publication number: 20040163083
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
    Type: Application
    Filed: February 19, 2003
    Publication date: August 19, 2004
    Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Piyush Desai
  • Publication number: 20040154012
    Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is permitted to execute Store instructions. Store blocker logic operates to prevent data associated with a Store instruction in a helper thread from being committed to memory. Dependence blocker logic operates to prevent data associated with a Store instruction in a speculative helper thread from being bypassed to a Load instruction in a non-speculative thread.
    Type: Application
    Filed: August 1, 2003
    Publication date: August 5, 2004
    Inventors: Hong Wang, Tor Aamodt, Per Hammarlund, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Steve Shih-wei Liao
  • Publication number: 20040006667
    Abstract: An apparatus and method for implementing adjacent, single non-unit stride memory access patterns are described. In one embodiment, the method includes compiler analysis of a source program to detect vectorizable loops having serial code statements that collectively perform adjacent, non-unit stride memory access. Once a vectorizable loop containing code statements that collectively perform adjacent, non-unit stride memory access in detected, the compiler vectorizes the serial code statements of the detected loop to perform the adjacent, non-unit stride memory access utilizing SIMD instructions. As such, the compiler repeats the analysis and vectorization for each vectorizable loop within the source program code.
    Type: Application
    Filed: June 21, 2002
    Publication date: January 8, 2004
    Inventors: Aart J.C. Bik, Milind Girkar
  • Publication number: 20040001066
    Abstract: An apparatus and method for vectorization of detected saturation and clipping operations in serial code loops of a source program are described. In one embodiment, the method includes the analysis of source program code to identify source code utilizing conditional constructs to perform saturation/clipping operations. Once analysis is complete, identified source code is vectorized to implement identified saturation/clipping operations utilizing single instruction, multiple data (SIMD) saturation/clipping instructions. Accordingly, utilizing embodiments of the present invention, conditional statements utilized to implement saturation arithmetic, as well as clipping of data values, such as pixel values within graphics applications, are replaced with SIMD saturation arithmetic instructions, as well as clipping instructions.
    Type: Application
    Filed: June 21, 2002
    Publication date: January 1, 2004
    Inventors: Aart J.C. Bik, Milind Girkar
  • Patent number: 6367070
    Abstract: A method for recognizing a parallel executable region within a sequence of source code. The parallel executable region is identified by locating a loop structure within the sequence of source code. The loop structure includes a field controlled by an induction variable. Furthermore, the field is set in every iteration of the loop structure.
    Type: Grant
    Filed: January 13, 1998
    Date of Patent: April 2, 2002
    Assignee: Intel Corporation
    Inventors: Mohammad R. Haghighat, Milind Girkar
  • Patent number: 6272676
    Abstract: A method and apparatus for finding loop_level parallelism in a sequence of instructions. In one embodiment, the method includes the steps of determining if a variable which identifies a memory address of a data structure is an induction variable; and determining if execution of the sequence of instructions terminates in response to a comparison of the variable to an invariant value. If the two conditions of the present invention are found to be true, the respective sequence of instructions is a candidate to be flagged for multi-threading execution, assuming the loop of the instructions terminates.
    Type: Grant
    Filed: January 13, 1998
    Date of Patent: August 7, 2001
    Assignee: Intel Corporation
    Inventors: Mohammad R. Haghighat, Milind Girkar