Patents by Inventor Nalini Vasudevan

Nalini Vasudevan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10402177
    Abstract: Methods and systems to convert a scalar computer program loop having loop-carried dependences into a vector computer program loop are disclosed. One such method includes, at runtime, identifying, by executing an instruction with one or more processors, a first loop iteration that cannot be executed in parallel with a second loop iteration due to a set of conflicting scalar loop operations. The first loop iteration is executed after the second loop iteration. The method also includes sectioning, by executing an instruction with one or more processors, a vector loop into vector partitions including a first vector partition. The first vector partition executes consecutive loop iterations in parallel and the consecutive loop iterations start at the second loop iteration and end before the first loop iteration.
    Type: Grant
    Filed: July 26, 2017
    Date of Patent: September 3, 2019
    Assignee: INTEL CORPORATION
    Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Albert Hartono, Sara S. Baghsorkhi
  • Patent number: 10372450
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described.
    Type: Grant
    Filed: July 11, 2017
    Date of Patent: August 6, 2019
    Assignee: Intel Corporation
    Inventors: Victor W. Lee, Daehyun Kim, Tin-Fook Ngai, Jayashankar Bharadwaj, Albert Hartono, Sara Baghsorkhi, Nalini Vasudevan
  • Patent number: 9921832
    Abstract: A vector reduction instruction with non-unit strided access pattern is received and executed by the execution circuitry of a processor. In response to the instruction, the execution circuitry performs an associative reduction operation on data elements of a first vector register. Based on values of the mask register and a current element position being processed, the execution circuitry sequentially sets one or more data elements of the first vector register to a result, which is generated by the associative reduction operation applied to both a previous data element of the first vector register and a data clement of a third vector register. The previous data element is located more than one element position away from the current element position.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: March 20, 2018
    Assignee: Intel Corporation
    Inventors: Albert Hartono, Jayashankar Bharadwaj, Nalini Vasudevan, Sara S. Baghsorkhi, Victor W. Lee, Daehyun Kim
  • Publication number: 20180067743
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described.
    Type: Application
    Filed: July 11, 2017
    Publication date: March 8, 2018
    Applicant: Intel Corporation
    Inventors: Victor W. LEE, Daehyun KIM, Tin-Fook NGAI, Jayashankar BHARADWAJ, Albert HARTONO, Sara BAGHSORKHI, Nalini VASUDEVAN
  • Patent number: 9910650
    Abstract: A computer-implemented method for managing loop code in a compiler includes using a conflict detection procedure that detects across-iteration dependency for arrays of single memory addresses to determine whether a potential across-iteration dependency exists for arrays of memory addresses for ranges of memory accessed by the loop code.
    Type: Grant
    Filed: September 25, 2014
    Date of Patent: March 6, 2018
    Assignee: Intel Corporation
    Inventors: Albert Hartono, Nalini Vasudevan, Sara S. Baghsorkhi, Cheng Wang, Youfeng Wu
  • Patent number: 9898266
    Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes prior to executing an original loop having iterations, analyzing, via a processor, the iterations of the original loop, identifying a dependency between a first one of the iterations of the original loop and a second one of the iterations of the original loop, after identifying the dependency, vectorizing a first group of the iterations of the original loop based on the identified dependency to form a vectorization loop, and setting a dynamic adjustment value of the vectorization loop based on the identified dependency.
    Type: Grant
    Filed: January 25, 2016
    Date of Patent: February 20, 2018
    Assignee: Intel Corporation
    Inventors: Nalini Vasudevan, Jayashankar Bharadwaj, Christopher J. Hughes, Milind B. Girkar, Mark J. Charney, Robert Valentine, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
  • Publication number: 20180018177
    Abstract: An apparatus and method for speculative vectorization. For example, one embodiment of a processor comprises: a queue comprising a set of locations for storing addresses associated with vectorized memory access instructions; and execution logic to execute a first vectorized memory access instruction to access the queue and to compare a new address associated with the first vectorized memory access instruction with existing addresses stored within a specified range of locations within the queue to detect whether a conflict exists, the existing addresses having been previously stored responsive to one or more prior vectorized memory access instructions.
    Type: Application
    Filed: July 18, 2017
    Publication date: January 18, 2018
    Inventors: NALINI VASUDEVAN, CHENG WANG, YOUFENG WU, ALBERT HARTONO, SARA S. BAGHSORKHI
  • Publication number: 20180004517
    Abstract: An apparatus and method for propagating conditionally evaluated values are disclosed. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.
    Type: Application
    Filed: September 18, 2017
    Publication date: January 4, 2018
    Inventors: Jayashankar BHARADWAJ, Nalini VASUDEVAN, Victor W. LEE, Daehyun KIM, Albert HARTONO, Sara S. BAGHSORKHI
  • Publication number: 20170322786
    Abstract: Methods and systems to convert a scalar computer program loop having loop-carried dependences into a vector computer program loop are disclosed. One such method includes, at runtime, identifying, by executing an instruction with one or more processors, a first loop iteration that cannot be executed in parallel with a second loop iteration due to a set of conflicting scalar loop operations. The first loop iteration is executed after the second loop iteration. The method also includes sectioning, by executing an instruction with one or more processors, a vector loop into vector partitions including a first vector partition. The first vector partition executes consecutive loop iterations in parallel and the consecutive loop iterations start at the second loop iteration and end before the first loop iteration.
    Type: Application
    Filed: July 26, 2017
    Publication date: November 9, 2017
    Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Albert Hartono, Sara S Baghsorkhi
  • Patent number: 9798541
    Abstract: An apparatus and method for propagating conditionally evaluated values are disclosed. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: October 24, 2017
    Assignee: INTEL CORPORATION
    Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
  • Patent number: 9733913
    Abstract: Methods and systems to convert a scalar computer program loop having loop-carried dependences into a vector computer program loop are disclosed. One such method includes, replacing the scalar recurrence operation in the scalar computer program loop with a first vector summing operation and a first vector recurrence operation. The first vector summing operation is to generate a first running sum and the first vector recurrence operation is to generate a first vector. In some examples, the first vector recurrence operation is based on the scalar recurrence operation. Disclosed methods also include inserting: 1) a renaming operation to rename the first vector, 2) a second vector summing operation that is to generate a second running sum; and 3) a second vector recurrence operation to generate a second vector based on the renamed first vector.
    Type: Grant
    Filed: February 8, 2016
    Date of Patent: August 15, 2017
    Assignee: Intel Corporation
    Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Albert Hartono, Sara S. Baghsorkhi
  • Patent number: 9720667
    Abstract: Technologies for automatic loop vectorization include a computing device with an optimizing compiler. During an optimization pass, the compiler identifies a loop and generates a transactional code segment including a vectorized implementation of the loop body including one or more vector memory read instructions capable of generating an exception. The compiler also generates a non-transactional fallback code segment including a scalar implementation of the loop body that is executed in response to an exception generated within the transactional code segment. The compiler may detect whether the loop contains a memory read dependent on a condition that may be updated in a previous iteration or whether the loop contains a potential data dependence between two iterations. The compiler may generate a dynamic check for an actual data dependence and an explicit transactional abort instruction to be executed when an actual data dependence exists. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 21, 2014
    Date of Patent: August 1, 2017
    Assignee: Intel Corporation
    Inventors: Sara S. Baghsorkhi, Albert Hartono, Youfeng Wu, Nalini Vasudevan, Cheng Wang
  • Patent number: 9710279
    Abstract: An apparatus and method for speculative vectorization. For example, one embodiment of a processor comprises: a queue comprising a set of locations for storing addresses associated with vectorized memory access instructions; and execution logic to execute a first vectorized memory access instruction to access the queue and to compare a new address associated with the first vectorized memory access instruction with existing addresses stored within a specified range of locations within the queue to detect whether a conflict exists, the existing addresses having been previously stored responsive to one or more prior vectorized memory access instructions.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: July 18, 2017
    Assignee: Intel Corporation
    Inventors: Nalini Vasudevan, Cheng Wang, Youfeng Wu, Albert Hartono, Sara S. Baghsorkhi
  • Patent number: 9703558
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: July 11, 2017
    Assignee: Intel Corporation
    Inventors: Victor W. Lee, Daehyun Kim, Tin-Fook Ngai, Jayashankar Bharadwaj, Albert Hartono, Sara Baghsorkhi, Nalini Vasudevan
  • Patent number: 9690582
    Abstract: A processor includes a decoder to decode an instruction, a scheduler to schedule the instruction, and an execution unit to execute the instruction. The instruction is to load a memory operation applicable to a quantity of addresses into an execution vector. The execution vector includes a plurality of vector positions for respective addressees. The instruction is further to evaluate, for a given address in the execution vector at a vector position, whether a cache indicates that a previous memory operation was performed at a higher vector position than the vector position of the given address. The instruction is also to determine, based on the evaluation whether the cache indicates that the previous memory operation was performed at a higher vector position than the vector position of the given address, whether the memory operation will cause a memory error.
    Type: Grant
    Filed: December 30, 2013
    Date of Patent: June 27, 2017
    Assignee: Intel Corporation
    Inventors: Nalini Vasudevan, Youfeng Wu, Cheng Wang, Sara Baghsorkhi, Albert Hartono
  • Publication number: 20160154638
    Abstract: Methods and systems to convert a scalar computer program loop having loop-carried dependences into a vector computer program loop are disclosed. One such method includes, replacing the scalar recurrence operation in the scalar computer program loop with a first vector summing operation and a first vector recurrence operation. The first vector summing operation is to generate a first running sum and the first vector recurrence operation is to generate a first vector. In some examples, the first vector recurrence operation is based on the scalar recurrence operation. Disclosed methods also include inserting: 1) a renaming operation to rename the first vector, 2) a second vector summing operation that is to generate a second running sum; and 3) a second vector recurrence operation to generate a second vector based on the renamed first vector.
    Type: Application
    Filed: February 8, 2016
    Publication date: June 2, 2016
    Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Albert Hartono, Sara S. Baghsorkhi
  • Publication number: 20160139897
    Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes prior to executing an original loop having iterations, analyzing, via a processor, the iterations of the original loop, identifying a dependency between a first one of the iterations of the original loop and a second one of the iterations of the original loop, after identifying the dependency, vectorizing a first group of the iterations of the original loop based on the identified dependency to form a vectorization loop, and setting a dynamic adjustment value of the vectorization loop based on the identified dependency.
    Type: Application
    Filed: January 25, 2016
    Publication date: May 19, 2016
    Inventors: Nalini Vasudevan, Jayashankar Bharadwaj, Christopher J. Hughes, Milind B. Girkar, Mark J. Charney, Robert Valentine, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
  • Publication number: 20160092285
    Abstract: A computer-implemented method for managing loop code in a compiler includes using a conflict detection procedure that detects across-iteration dependency for arrays of single memory addresses to determine whether a potential across-iteration dependency exists for arrays of memory addresses for ranges of memory accessed by the loop code.
    Type: Application
    Filed: September 25, 2014
    Publication date: March 31, 2016
    Inventors: Albert Hartono, Nalini Vasudevan, Sara S. Baghsorkhi, Cheng Wang, Youfeng Wu
  • Publication number: 20160092234
    Abstract: An apparatus and method for speculative vectorization. For example, one embodiment of a processor comprises: a queue comprising a set of locations for storing addresses associated with vectorized memory access instructions; and execution logic to execute a first vectorized memory access instruction to access the queue and to compare a new address associated with the first vectorized memory access instruction with existing addresses stored within a specified range of locations within the queue to detect whether a conflict exists, the existing addresses having been previously stored responsive to one or more prior vectorized memory access instructions.
    Type: Application
    Filed: September 26, 2014
    Publication date: March 31, 2016
    Inventors: NALINI VASUDEVAN, CHENG WANG, YOUFENG WU, ALBERT HARTONO, SARA S. BAGHSORKHI
  • Patent number: 9268626
    Abstract: An apparatus and method are described for detecting and responding to fault conditions in a processor. For example, one embodiment of a method comprises: reading each active element in succession from a first vector register, each active element specifying an address for a gather or load operation; detecting one or more fault conditions associated with one or more of the active elements; for each active element read in succession prior to a detected fault condition on an element other than the first active element, storing the data loaded from an address associated with the active element in a first output vector register; and for each active element associated with the detected fault condition and following the detected fault condition, setting a bit in an output mask register to indicate the detected fault condition.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: February 23, 2016
    Assignee: INTEL CORPORATION
    Inventors: Jayashankar Bharadwaj, Victor W. Lee, Kim Daehyun, Nalini Vasudevan, Tin-Fook Ngai, Albert Hartono, Sara S. Baghsorkhi