Patents by Inventor Milind Girkar
Milind Girkar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20190196828Abstract: An apparatus and method for performing signed fractional multiplication of packed data elements.Type: ApplicationFiled: December 21, 2017Publication date: June 27, 2019Inventors: VENKATESWARA MADDURI, CARL MURRAY, ELMOUSTAPHA OULD-AHMED-VALL, MARK CHARNEY, ROBERT VALENTINE, JESUS CORBAL, MILIND GIRKAR, BRET TOLL
-
Publication number: 20180095758Abstract: Disclosed embodiments relate to executing a vector-complex fused multiply-add Instruction. In one example, a method includes fetching an instruction, a format of the instruction including an opcode, a first source operand identifier, a second source operand identifier, and a destination operand identifier, wherein each of the identifiers is to identify a location storing a packed data comprising at least one complex number, decoding the instruction, retrieving data associated with the first and second source operand identifiers, and executing the decoded instruction to, for each packed data element position of the identified first and second source operands, cross-multiply the real and imaginary components to generate four products: a product of real components, a product of imaginary components, and two mixed products, to generate a complex result by summing the four products according to the instruction, and to store a result to the corresponding position of the identified destination operand.Type: ApplicationFiled: October 1, 2016Publication date: April 5, 2018Inventors: Roman S. Dubtsov, Robert Valentine, Jesus Corbal, Milind Girkar, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 9910796Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.Type: GrantFiled: March 15, 2013Date of Patent: March 6, 2018Assignee: Intel CorporationInventors: Hong Wang, Per Hammarlund, Xiang Zou, John P. Shen, Xinmin Tian, Milind Girkar, Perry H. Wang, Piyush N. Desai
-
Publication number: 20180060258Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.Type: ApplicationFiled: November 6, 2017Publication date: March 1, 2018Inventors: HONG WANG, PER HAMMARLUND, XIANG ZOU, JOHN P. SHEN, XINMIN TIAN, MILIND GIRKAR, PERRY H. WANG, PIYUSH N. DESAI
-
Publication number: 20170206083Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.Type: ApplicationFiled: March 31, 2017Publication date: July 20, 2017Inventors: HONG WANG, PER HAMMARLUND, XIANG ZOU, JOHN P. SHEN, XINMIN TIAN, MILIND GIRKAR, PERRY H. WANG, PIYUSH N. DESAI
-
Patent number: 8868887Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.Type: GrantFiled: November 5, 2004Date of Patent: October 21, 2014Assignee: Intel CorporationInventors: Hong Wang, Per Hammarlund, Xiang Zou, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Piyush Desai
-
Patent number: 8719839Abstract: A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit) GPU, for example. The GPU may be coupled to a GPU compiler and a GPU linker/loader and the CPU may be coupled to a CPU compiler and a CPU linker/loader. The user may create a shared object in an object oriented language and the shared object may include virtual functions. The shared object may be fine grain partitioned between the heterogeneous processors. The GPU compiler may allocate the shared object to the CPU and may create a first and a second enabling path to allow the GPU to invoke virtual functions of the shared object. Thus, the shared object that may include virtual functions may be shared seamlessly between the CPU and the GPU.Type: GrantFiled: October 30, 2009Date of Patent: May 6, 2014Assignee: Intel CorporationInventors: Shoumeng Yan, Xiaocheng Zhou, Ying Gao, Mohan Rajagopalan, Rajiv Deodhar, David Putzolu, Clark Nelson, Milind Girkar, Robert Geva, Tiger Chen, Sai Luo, Stephen Junkins, Bratin Saha, Ravi Narayanaswamy, Patrick Xi
-
Patent number: 8612949Abstract: Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.Type: GrantFiled: December 31, 2009Date of Patent: December 17, 2013Assignee: Intel CorporationInventors: Shih-wei Liao, Xinmin Tian, Gerolf F. Hoflehner, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
-
Publication number: 20130219096Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.Type: ApplicationFiled: March 15, 2013Publication date: August 22, 2013Inventors: HONG WANG, PER HAMMARLUND, XIANG ZOU, JOHN P. SHEN, XINMIN TIAN, MILIND GIRKAR, PERRY H. WANG, PIYUSH N. DESAI
-
Publication number: 20130061240Abstract: A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit) GPU, for example. The GPU may be coupled to a GPU compiler and a GPU linker/loader and the CPU may be coupled to a CPU compiler and a CPU linker/loader. The user may create a shared object in an object oriented language and the shared object may include virtual functions. The shared object may be fine grain partitioned between the heterogeneous processors. The GPU compiler may allocate the shared object to the CPU and may create a first and a second enabling path to allow the GPU to invoke virtual functions of the shared object. Thus, the shared object that may include virtual functions may be shared seamlessly between the CPU and the GPU.Type: ApplicationFiled: October 30, 2009Publication date: March 7, 2013Inventors: Shoumeng Yan, Xiaocheng Zhou, Ying Gao, Mohan Rajagopalan, Rajiv Deodhar, David Putzolu, Clark Nelson, Milind Girkar, Robert Geva, Tiger Chen, Sai Luo, Stephen Junkins, Bratin Saha, Ravi Narayanaswamy, Patrick Xi
-
Patent number: 8037465Abstract: Thread-data affinity optimization can be performed by a compiler during the compiling of a computer program to be executed on a cache coherent non-uniform memory access (cc-NUMA) platform. In one embodiment, the present invention includes receiving a program to be compiled. The received program is then compiled in a first pass and executed. During execution, the compiler collects profiling data using a profiling tool. Then, in a second pass, the compiler performs thread-data affinity optimization on the program using the collected profiling data.Type: GrantFiled: September 30, 2005Date of Patent: October 11, 2011Assignee: Intel CorporationInventors: Xinmin Tian, Milind Girkar, David C. Sehr, Richard Grove, Wei Li, Hong Wang, Chris Newburn, Perry Wang, John Shen
-
Publication number: 20100281471Abstract: Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.Type: ApplicationFiled: December 31, 2009Publication date: November 4, 2010Inventors: Shih-Wei Liao, Xinmin Tian, Gerolf F. Hoflehner, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
-
Patent number: 7657880Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is permitted to execute Store instructions. Store blocker logic operates to prevent data associated with a Store instruction in a helper thread from being committed to memory. Dependence blocker logic operates to prevent data associated with a Store instruction in a speculative helper thread from being bypassed to a Load instruction in a non-speculative thread.Type: GrantFiled: August 1, 2003Date of Patent: February 2, 2010Assignee: Intel CorporationInventors: Hong Wang, Tor Aamodt, Per Hammarlund, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Steve Shih-wei Liao
-
Patent number: 7603546Abstract: Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.Type: GrantFiled: September 28, 2004Date of Patent: October 13, 2009Assignee: Intel CorporationInventors: Satish Narayanasamy, Hong Wang, John Shen, Roni Rosner, Yoav Almog, Naftali Schwartz, Gerolf Hoflehner, Daniel LaVery, Wei Li, Xinmin Tian, Milind Girkar, Perry Wang
-
Patent number: 7571301Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.Type: GrantFiled: March 31, 2006Date of Patent: August 4, 2009Assignee: Intel CorporationInventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee
-
Patent number: 7487502Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.Type: GrantFiled: February 19, 2003Date of Patent: February 3, 2009Assignee: Intel CorporationInventors: Hong Wang, Per Hammarlund, Xiang Zou, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Piyush Desai
-
Patent number: 7398521Abstract: Methods and apparatuses for thread management for multi-threading are described herein. In one embodiment, exemplary process includes selecting, during a compilation of code having one or more threads executable in a data processing system, a current thread having a most bottom order, determining resources allocated to one or more child threads spawned from the current thread, and allocating resources for the current thread in consideration of the resources allocated to the current thread's one or more child threads to avoid resource conflicts between the current thread and its one or more child threads. Other methods and apparatuses are also described.Type: GrantFiled: February 13, 2004Date of Patent: July 8, 2008Assignee: Intel CorporationInventors: Gerolf F. Hoflehner, Shih-wei Liao, Xinmin Tian, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
-
Publication number: 20080162522Abstract: In some embodiments, a data structure may be received in a first processing system. The data structure may represent a plurality of instructions for a second processing system. For at least one instruction of the plurality of instructions, a determination may be made as to whether the instruction can be replaced by a compact instruction for the second processing system. A compact instruction may be generated if the instruction can be replaced by a compact instruction. In some embodiments, an instruction may be received in a processing system. A determination may be made as to whether the instruction is a compact instruction. A decompacted instruction may be generated if the instruction is a compact instruction.Type: ApplicationFiled: December 29, 2006Publication date: July 3, 2008Inventors: Guei-Yuan Lueh, Hong Jiang, Andrew T. Riffel, Bixia Zheng, Chu-Cheow Lim, Milind Girkar, David C. Sehr, Thomas A. Piazza
-
Patent number: 7328433Abstract: Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or a cache miss. A performance analysis tool is used to profile the software application's resource usage and identifies areas in the software application experiencing performance bottlenecks. Compiler-runtime instructions are generated into the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and a counting mechanism is inserted into the main thread to coordinate the execution of the helper thread with the main thread and to help ensure the prefetched data is not removed from the cache before the main thread is able to take advantage of the prefetched data.Type: GrantFiled: October 2, 2003Date of Patent: February 5, 2008Assignee: Intel CorporationInventors: Xinmin Tian, Shih-wei Liao, Hong Wang, Milind Girkar, John Shen, Perry Wang, Grant Haab, Gerolf Hoflehner, Daniel Lavery, Hideki Saito, Sanjiv Shah, Dongkeun Kim
-
Publication number: 20070234326Abstract: A method for improving parallel processing of computer programs. DOACROSS loops and similar code are identified and parallelized using a post-wait control structure. The post-wait control structure may be implemented to include any one of a single counter to enforce an order of execution, an array to track code completion that is indexed by a modulus of a positive integer number, and/or a set of arrays to track a last code completed by a thread and a current code being executed by a thread.Type: ApplicationFiled: March 31, 2006Publication date: October 4, 2007Inventors: Arun Kejariwal, Hideki Saito, Xinmin Tian, Milind Girkar, Sanjiv Shah, Wei Li, Utpal Banerjee