Patents by Inventor Milind B. Girkar

Milind B. Girkar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9411592
    Abstract: Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for corresponding data fields. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used each iteration to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: August 9, 2016
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Brett L. Toll
  • Publication number: 20160188343
    Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for DSX comprises decoder hardware to decode a class of instructions to support data speculative execution (DSX) including an instruction to begin a DSX, end a DSX, and speculative instructions to execute during a DSX, and execution hardware to speculatively execute decoded instructions that support DSX including the speculative instructions and update speculative instruction tracking hardware.
    Type: Application
    Filed: December 24, 2014
    Publication date: June 30, 2016
    Inventors: Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES, Robert VALENTINE, Milind B. GIRKAR
  • Publication number: 20160188328
    Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode, and execution hardware to execute the decoded instruction to reset data speculative execution (DSX) tracking hardware to track speculative memory accesses, clear a DSX status indication in a DSX status register, and commit all speculatively executed stores of the DSX region and thereby end a DSX region.
    Type: Application
    Filed: December 24, 2014
    Publication date: June 30, 2016
    Inventors: Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES, Robert VALENTINE, Milind B. GIRKAR
  • Publication number: 20160188330
    Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode and an operand to store a portion of a fallback address and an operand to store a stride value, execution hardware to execute the decoded instruction to initiate a data speculative execution (DSX) region by activating DSX tracking hardware to track speculative memory accesses and detect ordering violations in the DSX region, and storing the fallback address.
    Type: Application
    Filed: December 24, 2014
    Publication date: June 30, 2016
    Inventors: Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES, Robert VALENTINE, Milind B. GIRKAR
  • Publication number: 20160188329
    Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode, and execution hardware to execute the decoded instruction to continue a data speculative execution (DSX) and to determine that a DSX loop iteration is to be committed, commit speculative stores associated with the DSX loop iteration, and start a new DSX loop iteration.
    Type: Application
    Filed: December 24, 2014
    Publication date: June 30, 2016
    Inventors: Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES, Robert VALENTINE, Milind B. GIRKAR
  • Publication number: 20160188530
    Abstract: An apparatus and method for performing a vector permute.
    Type: Application
    Filed: December 27, 2014
    Publication date: June 30, 2016
    Inventors: JESUS CORBAL SAN ADRIAN, ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MARK J. CHARNEY, MILIND B. GIRKAR, BRET L. TOLL, ROGER ESPASA, GUILLEM SOLE, JAIRO BALART, BRIAN HICKMAN
  • Publication number: 20160188333
    Abstract: An apparatus and method for mask compression. For example, one embodiment of a processor comprises: a source mask register to store a plurality of mask bits including a plurality of set bits and a plurality of bits that are not set; a destination mask register to store set bits read from the source mask register; and mask compression logic to read each of the set bits from the source mask register and to store the set bits in contiguous bit locations on one side of the destination mask register.
    Type: Application
    Filed: December 27, 2014
    Publication date: June 30, 2016
    Inventors: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL SAN ADRIAN, BRETT L. TOLL, MILIND B. GIRKAR, MARK J. CHARNEY, GUILLEM SOLE, ROGER ESPASA
  • Publication number: 20160188382
    Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode and an operand to store a portion of a fallback address, execution hardware to execute the decoded instruction to initiate a data speculative execution (DSX) region by activating DSX tracking hardware to track speculative memory accesses and detect ordering violations in the DSX region, and storing the fallback address.
    Type: Application
    Filed: December 24, 2014
    Publication date: June 30, 2016
    Inventors: Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES, Robert VALENTINE, Milind B. GIRKAR, Hideki IDO, Youfeng WU, Cheng WANG
  • Publication number: 20160188342
    Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode, and execution hardware to execute the decoded instruction inside a speculative execution (DSX) and rollback execution to a stored address and clear a DSX status indication in a DSX status register, and thereby abort the DSX.
    Type: Application
    Filed: December 24, 2014
    Publication date: June 30, 2016
    Inventors: Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES, Robert VALENTINE, Milind B. GIRKAR
  • Publication number: 20160179530
    Abstract: In several embodiments, vector extensions to an instruction set architecture include instructions to perform saturated signed and unsigned integer additions. In one embodiment, a vector signed integer add with signed saturation is provided. In one embodiment, a vector unsigned integer add with unsigned saturation is provided. In one embodiment, packed doubleword and quadword integers are supported for both signed and unsigned instructions.
    Type: Application
    Filed: December 23, 2014
    Publication date: June 23, 2016
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Bret L. Toll, Jesus Corbal, Mark J. Charney, Milind B. Girkar
  • Publication number: 20160179528
    Abstract: An apparatus and method are described for performing conflict detection operations. For example, one embodiment of a processor comprises: a first source vector register to store a first set of data elements; a second source vector register to store a second set of data elements; conflict detection logic to perform a specified comparison operation comparing each of the first set of data elements with specified data elements from the second set and generating a set of comparison results, the comparison operation to be selected from a group consisting of a greater than comparison, a less than comparison, a greater than or equal to comparison, a less than or equal to comparison, and a not equal to comparison.
    Type: Application
    Filed: December 23, 2014
    Publication date: June 23, 2016
    Inventors: CHRISTOPHER J. HUGHES, ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MILIND B. GIRKAR
  • Publication number: 20160139897
    Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes prior to executing an original loop having iterations, analyzing, via a processor, the iterations of the original loop, identifying a dependency between a first one of the iterations of the original loop and a second one of the iterations of the original loop, after identifying the dependency, vectorizing a first group of the iterations of the original loop based on the identified dependency to form a vectorization loop, and setting a dynamic adjustment value of the vectorization loop based on the identified dependency.
    Type: Application
    Filed: January 25, 2016
    Publication date: May 19, 2016
    Inventors: Nalini Vasudevan, Jayashankar Bharadwaj, Christopher J. Hughes, Milind B. Girkar, Mark J. Charney, Robert Valentine, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
  • Publication number: 20160140039
    Abstract: In one embodiment, a processor comprises: at least one core formed on a die to execute instructions; a first memory controller to interface with an in-package memory; a second memory controller to interface with a platform memory to couple to the processor; and the in-package memory located within a package of the processor, where the in-package memory is to be identified as a more distant memory with respect to the at least one core than the platform memory. Other embodiments are described and claimed.
    Type: Application
    Filed: November 14, 2014
    Publication date: May 19, 2016
    Inventors: Avinash Sodani, Robert J. Kyanko, Richard J. Greco, Andreas Kleen, Milind B. Girkar, Christopher M. Cantalupo
  • Patent number: 9323531
    Abstract: The execution of a KZBTZ finds a trailing least significant zero bit position in an first input mask and sets an output mask to have the values of the first input mask, but with all bit positions closer to the most significant bit position than the trailing least significant zero bit position in an first input mask set to zero. In some embodiments, a second input mask is used as a writemask such that bit positions of the first input mask are not considered in the trailing least significant zero bit position calculation depending upon a corresponding bit position in the second input mask.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: April 26, 2016
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mark J. Charney, Jesus Corbal, Milind B. Girkar, Elmoustapha Ould-Ahmed—Vall, Bret L. Toll, Robert Valentine
  • Publication number: 20160055004
    Abstract: An apparatus and method are described for non-speculative execution of conditional instructions. For example, one embodiment of a processor comprises: a register set including a first register to store a set of one or more condition bits; non-speculative execution logic to execute a first instruction to identify a first target instruction strand in response to a first conditional value read from the set of condition bits, the first instruction to wait until the first conditional value becomes known before causing the first target instruction strand to be fetched and executed, the non-speculative execution logic to execute a second instruction to identify an end of the first target instruction strand and responsively identify a new current instruction pointer for instructions which follow the second instruction; and out-of-order execution logic to fetch and execute the instructions which follow the second instruction prior to the execution of the second instruction.
    Type: Application
    Filed: August 21, 2014
    Publication date: February 25, 2016
    Inventors: EDWARD T. GROCHOWSKI, MILIND B. GIRKAR, VICTOR W. LEE, DMITRY M. MASLENNIKOV, ROBERT VALENTINE, SERGEY A. ROZHKOV, BORIS A. BABAYAN
  • Patent number: 9244677
    Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes setting a dynamic adjustment value of a vectorization loop; executing the vectorization loop to vectorize a loop by grouping iterations of the loop into one or more vectors; identifying a dependency between iterations of the loop as; and setting the dynamic adjustment value based on the identified dependency.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: January 26, 2016
    Assignee: Intel Corporation
    Inventors: Nalini Vasudevan, Jayashankar Bharadwaj, Christopher J. Hughes, Milind B. Girkar, Mark J Charney, Robert Valentine, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
  • Patent number: 9069605
    Abstract: Method, apparatus and system embodiments to schedule OS-independent “shreds” without intervention of an operating system. For at least one embodiment, the shred is scheduled for execution by a scheduler routine rather than the operating system. A scheduler routine may run on each enabled sequencer. The schedulers may retrieve shred descriptors from a queue system. The sequencer associated with the scheduler may then execute the shred described by the descriptor. Other embodiments are also described and claimed.
    Type: Grant
    Filed: October 30, 2013
    Date of Patent: June 30, 2015
    Assignee: Intel Corporation
    Inventors: Richard A. Hankins, Hong Wang, Gautham N. Chinya, Trung A. Diep, Shivnandan D. Kaushik, Bryant E. Bigbee, John P. Shen, Asit K. Mallick, Baiju V. Patel, James Paul Held, Milind B. Girkar, Prashant Sethi, Xinmin Tian
  • Patent number: 8996923
    Abstract: A processor includes an execution unit, a fault mask coupled to the execution unit, and a suppress mask coupled to the execution unit. The fault mask is to store a first plurality of bit values to indicate which elements of a multi-element vector have an associated fault generated in response to execution of an instruction on the element in the execution unit. The suppress mask is to store a second plurality of bit values to indicate which of the elements are to have an associated fault suppressed. The processor also includes counter logic to increment a counter in response to an indication of a first fault associated with the first element and received from the fault mask, and an indication of a first suppression associated with the first element and received from the suppress mask. Other embodiments are described as claimed.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: March 31, 2015
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Jesus Corbal, Mark J. Charney, Milind B. Girkar, Elmoustapha Ould-Ahmed-Vall, Robert Valentine
  • Patent number: 8972698
    Abstract: A processing core implemented on a semiconductor chip is described having first execution unit logic circuitry that includes first comparison circuitry to compare each element in a first input vector against every element of a second input vector. The processing core also has second execution logic circuitry that includes second comparison circuitry to compare a first input value against every data element of an input vector.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: March 3, 2015
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mark J. Charney, Yen-Kuang Chen, Jesus Corbal, Andrew T. Forsyth, Milind B. Girkar, Jonathan C. Hall, Hideki Ido, Robert Valentine, Jeffrey Wiedemeier
  • Publication number: 20150052333
    Abstract: Embodiments of systems, apparatuses, and methods for performing gather and scatter stride instruction in a computer processor are described. In some embodiments, the execution of a gather stride instruction causes a conditionally storage of strided data elements from memory into the destination register according to at least some of bit values of a writemask.
    Type: Application
    Filed: July 25, 2014
    Publication date: February 19, 2015
    Inventors: Christopher J. HUGHES, Jesus Corbal SAN ADRIAN, Roger Espasa SANS, Bret TOLL, Robert C. VALENTINE, Milind B. GIRKAR, Andrew T. FORSYTH, Edward T. GROCHOWSKI, Jonathan C. HALL