Patents by Inventor Milind Girkar

Milind Girkar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190196828
    Abstract: An apparatus and method for performing signed fractional multiplication of packed data elements.
    Type: Application
    Filed: December 21, 2017
    Publication date: June 27, 2019
    Inventors: VENKATESWARA MADDURI, CARL MURRAY, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL, MILIND GIRKAR, BRET TOLL
  • Publication number: 20180095758
    Abstract: Disclosed embodiments relate to executing a vector-complex fused multiply-add Instruction. In one example, a method includes fetching an instruction, a format of the instruction including an opcode, a first source operand identifier, a second source operand identifier, and a destination operand identifier, wherein each of the identifiers is to identify a location storing a packed data comprising at least one complex number, decoding the instruction, retrieving data associated with the first and second source operand identifiers, and executing the decoded instruction to, for each packed data element position of the identified first and second source operands, cross-multiply the real and imaginary components to generate four products: a product of real components, a product of imaginary components, and two mixed products, to generate a complex result by summing the four products according to the instruction, and to store a result to the corresponding position of the identified destination operand.
    Type: Application
    Filed: October 1, 2016
    Publication date: April 5, 2018
    Inventors: Roman S. Dubtsov, Robert Valentine, Jesus Corbal, Milind Girkar, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 9910796
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: March 6, 2018
    Assignee: Intel Corporation
    Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John P. Shen, Xinmin Tian, Milind Girkar, Perry H. Wang, Piyush N. Desai
  • Publication number: 20180060258
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
    Type: Application
    Filed: November 6, 2017
    Publication date: March 1, 2018
    Inventors: HONG WANG, PER HAMMARLUND, XIANG ZOU, JOHN P. SHEN, XINMIN TIAN, MILIND GIRKAR, PERRY H. WANG, PIYUSH N. DESAI
  • Publication number: 20170206083
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
    Type: Application
    Filed: March 31, 2017
    Publication date: July 20, 2017
    Inventors: HONG WANG, PER HAMMARLUND, XIANG ZOU, JOHN P. SHEN, XINMIN TIAN, MILIND GIRKAR, PERRY H. WANG, PIYUSH N. DESAI
  • Patent number: 8868887
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
    Type: Grant
    Filed: November 5, 2004
    Date of Patent: October 21, 2014
    Assignee: Intel Corporation
    Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Piyush Desai
  • Patent number: 8719839
    Abstract: A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit) GPU, for example. The GPU may be coupled to a GPU compiler and a GPU linker/loader and the CPU may be coupled to a CPU compiler and a CPU linker/loader. The user may create a shared object in an object oriented language and the shared object may include virtual functions. The shared object may be fine grain partitioned between the heterogeneous processors. The GPU compiler may allocate the shared object to the CPU and may create a first and a second enabling path to allow the GPU to invoke virtual functions of the shared object. Thus, the shared object that may include virtual functions may be shared seamlessly between the CPU and the GPU.
    Type: Grant
    Filed: October 30, 2009
    Date of Patent: May 6, 2014
    Assignee: Intel Corporation
    Inventors: Shoumeng Yan, Xiaocheng Zhou, Ying Gao, Mohan Rajagopalan, Rajiv Deodhar, David Putzolu, Clark Nelson, Milind Girkar, Robert Geva, Tiger Chen, Sai Luo, Stephen Junkins, Bratin Saha, Ravi Narayanaswamy, Patrick Xi
  • Patent number: 8612949
    Abstract: Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.
    Type: Grant
    Filed: December 31, 2009
    Date of Patent: December 17, 2013
    Assignee: Intel Corporation
    Inventors: Shih-wei Liao, Xinmin Tian, Gerolf F. Hoflehner, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
  • Publication number: 20130219096
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
    Type: Application
    Filed: March 15, 2013
    Publication date: August 22, 2013
    Inventors: HONG WANG, PER HAMMARLUND, XIANG ZOU, JOHN P. SHEN, XINMIN TIAN, MILIND GIRKAR, PERRY H. WANG, PIYUSH N. DESAI
  • Publication number: 20130061240
    Abstract: A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit) GPU, for example. The GPU may be coupled to a GPU compiler and a GPU linker/loader and the CPU may be coupled to a CPU compiler and a CPU linker/loader. The user may create a shared object in an object oriented language and the shared object may include virtual functions. The shared object may be fine grain partitioned between the heterogeneous processors. The GPU compiler may allocate the shared object to the CPU and may create a first and a second enabling path to allow the GPU to invoke virtual functions of the shared object. Thus, the shared object that may include virtual functions may be shared seamlessly between the CPU and the GPU.
    Type: Application
    Filed: October 30, 2009
    Publication date: March 7, 2013
    Inventors: Shoumeng Yan, Xiaocheng Zhou, Ying Gao, Mohan Rajagopalan, Rajiv Deodhar, David Putzolu, Clark Nelson, Milind Girkar, Robert Geva, Tiger Chen, Sai Luo, Stephen Junkins, Bratin Saha, Ravi Narayanaswamy, Patrick Xi
  • Patent number: 8037465
    Abstract: Thread-data affinity optimization can be performed by a compiler during the compiling of a computer program to be executed on a cache coherent non-uniform memory access (cc-NUMA) platform. In one embodiment, the present invention includes receiving a program to be compiled. The received program is then compiled in a first pass and executed. During execution, the compiler collects profiling data using a profiling tool. Then, in a second pass, the compiler performs thread-data affinity optimization on the program using the collected profiling data.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: October 11, 2011
    Assignee: Intel Corporation
    Inventors: Xinmin Tian, Milind Girkar, David C. Sehr, Richard Grove, Wei Li, Hong Wang, Chris Newburn, Perry Wang, John Shen
  • Publication number: 20100281471
    Abstract: Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.
    Type: Application
    Filed: December 31, 2009
    Publication date: November 4, 2010
    Inventors: Shih-Wei Liao, Xinmin Tian, Gerolf F. Hoflehner, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
  • Patent number: 7657880
    Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is permitted to execute Store instructions. Store blocker logic operates to prevent data associated with a Store instruction in a helper thread from being committed to memory. Dependence blocker logic operates to prevent data associated with a Store instruction in a speculative helper thread from being bypassed to a Load instruction in a non-speculative thread.
    Type: Grant
    Filed: August 1, 2003
    Date of Patent: February 2, 2010
    Assignee: Intel Corporation
    Inventors: Hong Wang, Tor Aamodt, Per Hammarlund, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Steve Shih-wei Liao
  • Patent number: 7603546
    Abstract: Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 28, 2004
    Date of Patent: October 13, 2009
    Assignee: Intel Corporation
    Inventors: Satish Narayanasamy, Hong Wang, John Shen, Roni Rosner, Yoav Almog, Naftali Schwartz, Gerolf Hoflehner, Daniel LaVery, Wei Li, Xinmin Tian, Milind Girkar, Perry Wang
  • Patent number: 7571301
    Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.
    Type: Grant
    Filed: March 31, 2006
    Date of Patent: August 4, 2009
    Assignee: Intel Corporation
    Inventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee
  • Patent number: 7487502
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
    Type: Grant
    Filed: February 19, 2003
    Date of Patent: February 3, 2009
    Assignee: Intel Corporation
    Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Piyush Desai
  • Patent number: 7398521
    Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having a most bottom order, determining resources allocated to one or more child threads spawned from the current thread, and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.
    Type: Grant
    Filed: February 13, 2004
    Date of Patent: July 8, 2008
    Assignee: Intel Corporation
    Inventors: Gerolf F. Hoflehner, Shih-wei Liao, Xinmin Tian, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
  • Publication number: 20080162522
    Abstract: In some embodiments, a data structure may be received in a first processing system. The data structure may represent a plurality of instructions for a second processing system. For at least one instruction of the plurality of instructions, a determination may be made as to whether the instruction can be replaced by a compact instruction for the second processing system. A compact instruction may be generated if the instruction can be replaced by a compact instruction. In some embodiments, an instruction may be received in a processing system. A determination may be made as to whether the instruction is a compact instruction. A decompacted instruction may be generated if the instruction is a compact instruction.
    Type: Application
    Filed: December 29, 2006
    Publication date: July 3, 2008
    Inventors: Guei-Yuan Lueh, Hong Jiang, Andrew T. Riffel, Bixia Zheng, Chu-Cheow Lim, Milind Girkar, David C. Sehr, Thomas A. Piazza
  • Patent number: 7328433
    Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.
    Type: Grant
    Filed: October 2, 2003
    Date of Patent: February 5, 2008
    Assignee: Intel Corporation
    Inventors: Xinmin Tian, Shih-wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim
  • Publication number: 20070234326
    Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.
    Type: Application
    Filed: March 31, 2006
    Publication date: October 4, 2007
    Inventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee