Patents by Inventor Nalini Vasudevan

Nalini Vasudevan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Methods and systems to vectorize scalar computer program loops having loop-carried dependences

Patent number: 10402177

Abstract: Methods and systems to convert a scalar computer program loop having loop-carried dependences into a vector computer program loop are disclosed. One such method includes, at runtime, identifying, by executing an instruction with one or more processors, a first loop iteration that cannot be executed in parallel with a second loop iteration due to a set of conflicting scalar loop operations. The first loop iteration is executed after the second loop iteration. The method also includes sectioning, by executing an instruction with one or more processors, a vector loop into vector partitions including a first vector partition. The first vector partition executes consecutive loop iterations in parallel and the consecutive loop iterations start at the second loop iteration and end before the first loop iteration.

Type: Grant

Filed: July 26, 2017

Date of Patent: September 3, 2019

Assignee: INTEL CORPORATION

Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Albert Hartono, Sara S. Baghsorkhi
Systems, apparatuses, and methods for setting an output mask in a destination writemask register from a source write mask register using an input writemask and immediate

Patent number: 10372450

Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described.

Type: Grant

Filed: July 11, 2017

Date of Patent: August 6, 2019

Assignee: Intel Corporation

Inventors: Victor W. Lee, Daehyun Kim, Tin-Fook Ngai, Jayashankar Bharadwaj, Albert Hartono, Sara Baghsorkhi, Nalini Vasudevan
Instruction to reduce elements in a vector register with strided access pattern

Patent number: 9921832

Abstract: A vector reduction instruction with non-unit strided access pattern is received and executed by the execution circuitry of a processor. In response to the instruction, the execution circuitry performs an associative reduction operation on data elements of a first vector register. Based on values of the mask register and a current element position being processed, the execution circuitry sequentially sets one or more data elements of the first vector register to a result, which is generated by the associative reduction operation applied to both a previous data element of the first vector register and a data clement of a third vector register. The previous data element is located more than one element position away from the current element position.

Type: Grant

Filed: December 28, 2012

Date of Patent: March 20, 2018

Assignee: Intel Corporation

Inventors: Albert Hartono, Jayashankar Bharadwaj, Nalini Vasudevan, Sara S. Baghsorkhi, Victor W. Lee, Daehyun Kim
SYSTEMS, APPARATUSES, AND METHODS FOR SETTING AN OUTPUT MASK IN A DESTINATION WRITEMASK REGISTER FROM A SOURCE WRITE MASK REGISTER USING AN INPUT WRITEMASK AND IMMEDIATE

Publication number: 20180067743

Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described.

Type: Application

Filed: July 11, 2017

Publication date: March 8, 2018

Applicant: Intel Corporation

Inventors: Victor W. LEE, Daehyun KIM, Tin-Fook NGAI, Jayashankar BHARADWAJ, Albert HARTONO, Sara BAGHSORKHI, Nalini VASUDEVAN
Method and apparatus for approximating detection of overlaps between memory ranges

Patent number: 9910650

Abstract: A computer-implemented method for managing loop code in a compiler includes using a conflict detection procedure that detects across-iteration dependency for arrays of single memory addresses to determine whether a potential across-iteration dependency exists for arrays of memory addresses for ranges of memory accessed by the loop code.

Type: Grant

Filed: September 25, 2014

Date of Patent: March 6, 2018

Assignee: Intel Corporation

Inventors: Albert Hartono, Nalini Vasudevan, Sara S. Baghsorkhi, Cheng Wang, Youfeng Wu
Loop vectorization methods and apparatus

Patent number: 9898266

Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes prior to executing an original loop having iterations, analyzing, via a processor, the iterations of the original loop, identifying a dependency between a first one of the iterations of the original loop and a second one of the iterations of the original loop, after identifying the dependency, vectorizing a first group of the iterations of the original loop based on the identified dependency to form a vectorization loop, and setting a dynamic adjustment value of the vectorization loop based on the identified dependency.

Type: Grant

Filed: January 25, 2016

Date of Patent: February 20, 2018

Assignee: Intel Corporation

Inventors: Nalini Vasudevan, Jayashankar Bharadwaj, Christopher J. Hughes, Milind B. Girkar, Mark J. Charney, Robert Valentine, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
METHOD AND APPARATUS FOR SPECULATIVE VECTORIZATION

Publication number: 20180018177

Abstract: An apparatus and method for speculative vectorization. For example, one embodiment of a processor comprises: a queue comprising a set of locations for storing addresses associated with vectorized memory access instructions; and execution logic to execute a first vectorized memory access instruction to access the queue and to compare a new address associated with the first vectorized memory access instruction with existing addresses stored within a specified range of locations within the queue to detect whether a conflict exists, the existing addresses having been previously stored responsive to one or more prior vectorized memory access instructions.

Type: Application

Filed: July 18, 2017

Publication date: January 18, 2018

Inventors: NALINI VASUDEVAN, CHENG WANG, YOUFENG WU, ALBERT HARTONO, SARA S. BAGHSORKHI
APPARATUS AND METHOD FOR PROPAGATING CONDITIONALLY EVALUATED VALUES IN SIMD/VECTOR EXECUTION USING AN INPUT MASK REGISTER

Publication number: 20180004517

Abstract: An apparatus and method for propagating conditionally evaluated values are disclosed. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.

Type: Application

Filed: September 18, 2017

Publication date: January 4, 2018

Inventors: Jayashankar BHARADWAJ, Nalini VASUDEVAN, Victor W. LEE, Daehyun KIM, Albert HARTONO, Sara S. BAGHSORKHI
METHODS AND SYSTEMS TO VECTORIZE SCALAR COMPUTER PROGRAM LOOPS HAVING LOOP-CARRIED DEPENDENCES

Publication number: 20170322786

Abstract: Methods and systems to convert a scalar computer program loop having loop-carried dependences into a vector computer program loop are disclosed. One such method includes, at runtime, identifying, by executing an instruction with one or more processors, a first loop iteration that cannot be executed in parallel with a second loop iteration due to a set of conflicting scalar loop operations. The first loop iteration is executed after the second loop iteration. The method also includes sectioning, by executing an instruction with one or more processors, a vector loop into vector partitions including a first vector partition. The first vector partition executes consecutive loop iterations in parallel and the consecutive loop iterations start at the second loop iteration and end before the first loop iteration.

Type: Application

Filed: July 26, 2017

Publication date: November 9, 2017

Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Albert Hartono, Sara S Baghsorkhi
Apparatus and method for propagating conditionally evaluated values in SIMD/vector execution using an input mask register

Patent number: 9798541

Abstract: An apparatus and method for propagating conditionally evaluated values are disclosed. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.

Type: Grant

Filed: December 23, 2011

Date of Patent: October 24, 2017

Assignee: INTEL CORPORATION

Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
Methods and systems to vectorize scalar computer program loops having loop-carried dependences

Patent number: 9733913

Abstract: Methods and systems to convert a scalar computer program loop having loop-carried dependences into a vector computer program loop are disclosed. One such method includes, replacing the scalar recurrence operation in the scalar computer program loop with a first vector summing operation and a first vector recurrence operation. The first vector summing operation is to generate a first running sum and the first vector recurrence operation is to generate a first vector. In some examples, the first vector recurrence operation is based on the scalar recurrence operation. Disclosed methods also include inserting: 1) a renaming operation to rename the first vector, 2) a second vector summing operation that is to generate a second running sum; and 3) a second vector recurrence operation to generate a second vector based on the renamed first vector.

Type: Grant

Filed: February 8, 2016

Date of Patent: August 15, 2017

Assignee: Intel Corporation

Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Albert Hartono, Sara S. Baghsorkhi
Automatic loop vectorization using hardware transactional memory

Patent number: 9720667

Abstract: Technologies for automatic loop vectorization include a computing device with an optimizing compiler. During an optimization pass, the compiler identifies a loop and generates a transactional code segment including a vectorized implementation of the loop body including one or more vector memory read instructions capable of generating an exception. The compiler also generates a non-transactional fallback code segment including a scalar implementation of the loop body that is executed in response to an exception generated within the transactional code segment. The compiler may detect whether the loop contains a memory read dependent on a condition that may be updated in a previous iteration or whether the loop contains a potential data dependence between two iterations. The compiler may generate a dynamic check for an actual data dependence and an explicit transactional abort instruction to be executed when an actual data dependence exists. Other embodiments are described and claimed.

Type: Grant

Filed: March 21, 2014

Date of Patent: August 1, 2017

Assignee: Intel Corporation

Inventors: Sara S. Baghsorkhi, Albert Hartono, Youfeng Wu, Nalini Vasudevan, Cheng Wang
Method and apparatus for speculative vectorization

Patent number: 9710279

Abstract: An apparatus and method for speculative vectorization. For example, one embodiment of a processor comprises: a queue comprising a set of locations for storing addresses associated with vectorized memory access instructions; and execution logic to execute a first vectorized memory access instruction to access the queue and to compare a new address associated with the first vectorized memory access instruction with existing addresses stored within a specified range of locations within the queue to detect whether a conflict exists, the existing addresses having been previously stored responsive to one or more prior vectorized memory access instructions.

Type: Grant

Filed: September 26, 2014

Date of Patent: July 18, 2017

Assignee: Intel Corporation

Inventors: Nalini Vasudevan, Cheng Wang, Youfeng Wu, Albert Hartono, Sara S. Baghsorkhi
Systems, apparatuses, and methods for setting an output mask in a destination writemask register from a source write mask register using an input writemask and immediate

Patent number: 9703558

Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described.

Type: Grant

Filed: December 23, 2011

Date of Patent: July 11, 2017

Assignee: Intel Corporation

Inventors: Victor W. Lee, Daehyun Kim, Tin-Fook Ngai, Jayashankar Bharadwaj, Albert Hartono, Sara Baghsorkhi, Nalini Vasudevan
Instruction and logic for cache-based speculative vectorization

Patent number: 9690582

Abstract: A processor includes a decoder to decode an instruction, a scheduler to schedule the instruction, and an execution unit to execute the instruction. The instruction is to load a memory operation applicable to a quantity of addresses into an execution vector. The execution vector includes a plurality of vector positions for respective addressees. The instruction is further to evaluate, for a given address in the execution vector at a vector position, whether a cache indicates that a previous memory operation was performed at a higher vector position than the vector position of the given address. The instruction is also to determine, based on the evaluation whether the cache indicates that the previous memory operation was performed at a higher vector position than the vector position of the given address, whether the memory operation will cause a memory error.

Type: Grant

Filed: December 30, 2013

Date of Patent: June 27, 2017

Assignee: Intel Corporation

Inventors: Nalini Vasudevan, Youfeng Wu, Cheng Wang, Sara Baghsorkhi, Albert Hartono
METHODS AND SYSTEMS TO VECTORIZE SCALAR COMPUTER PROGRAM LOOPS HAVING LOOP-CARRIED DEPENDENCES

Publication number: 20160154638

Abstract: Methods and systems to convert a scalar computer program loop having loop-carried dependences into a vector computer program loop are disclosed. One such method includes, replacing the scalar recurrence operation in the scalar computer program loop with a first vector summing operation and a first vector recurrence operation. The first vector summing operation is to generate a first running sum and the first vector recurrence operation is to generate a first vector. In some examples, the first vector recurrence operation is based on the scalar recurrence operation. Disclosed methods also include inserting: 1) a renaming operation to rename the first vector, 2) a second vector summing operation that is to generate a second running sum; and 3) a second vector recurrence operation to generate a second vector based on the renamed first vector.

Type: Application

Filed: February 8, 2016

Publication date: June 2, 2016

Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Albert Hartono, Sara S. Baghsorkhi
LOOP VECTORIZATION METHODS AND APPARATUS

Publication number: 20160139897

Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes prior to executing an original loop having iterations, analyzing, via a processor, the iterations of the original loop, identifying a dependency between a first one of the iterations of the original loop and a second one of the iterations of the original loop, after identifying the dependency, vectorizing a first group of the iterations of the original loop based on the identified dependency to form a vectorization loop, and setting a dynamic adjustment value of the vectorization loop based on the identified dependency.

Type: Application

Filed: January 25, 2016

Publication date: May 19, 2016

Inventors: Nalini Vasudevan, Jayashankar Bharadwaj, Christopher J. Hughes, Milind B. Girkar, Mark J. Charney, Robert Valentine, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
Method and Apparatus for Approximating Detection of Overlaps Between Memory Ranges

Publication number: 20160092285

Abstract: A computer-implemented method for managing loop code in a compiler includes using a conflict detection procedure that detects across-iteration dependency for arrays of single memory addresses to determine whether a potential across-iteration dependency exists for arrays of memory addresses for ranges of memory accessed by the loop code.

Type: Application

Filed: September 25, 2014

Publication date: March 31, 2016

Inventors: Albert Hartono, Nalini Vasudevan, Sara S. Baghsorkhi, Cheng Wang, Youfeng Wu
METHOD AND APPARATUS FOR SPECULATIVE VECTORIZATION

Publication number: 20160092234

Abstract: An apparatus and method for speculative vectorization. For example, one embodiment of a processor comprises: a queue comprising a set of locations for storing addresses associated with vectorized memory access instructions; and execution logic to execute a first vectorized memory access instruction to access the queue and to compare a new address associated with the first vectorized memory access instruction with existing addresses stored within a specified range of locations within the queue to detect whether a conflict exists, the existing addresses having been previously stored responsive to one or more prior vectorized memory access instructions.

Type: Application

Filed: September 26, 2014

Publication date: March 31, 2016

Inventors: NALINI VASUDEVAN, CHENG WANG, YOUFENG WU, ALBERT HARTONO, SARA S. BAGHSORKHI
Apparatus and method for vectorization with speculation support

Patent number: 9268626

Abstract: An apparatus and method are described for detecting and responding to fault conditions in a processor. For example, one embodiment of a method comprises: reading each active element in succession from a first vector register, each active element specifying an address for a gather or load operation; detecting one or more fault conditions associated with one or more of the active elements; for each active element read in succession prior to a detected fault condition on an element other than the first active element, storing the data loaded from an address associated with the active element in a first output vector register; and for each active element associated with the detected fault condition and following the detected fault condition, setting a bit in an output mask register to indicate the detected fault condition.

Type: Grant

Filed: December 23, 2011

Date of Patent: February 23, 2016

Assignee: INTEL CORPORATION

Inventors: Jayashankar Bharadwaj, Victor W. Lee, Kim Daehyun, Nalini Vasudevan, Tin-Fook Ngai, Albert Hartono, Sara S. Baghsorkhi

1 2 next