Patents by Inventor Sridhar Samudrala

Sridhar Samudrala has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9477441
    Abstract: Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier.
    Type: Grant
    Filed: November 23, 2015
    Date of Patent: October 25, 2016
    Assignee: Intel Corporation
    Inventors: Sridhar Samudrala, Grigorios Magklis, Marc Lupon, David R. Ditzel
  • Patent number: 9389871
    Abstract: An error handling method includes identifying a code region eligible for cumulative multiply add (CMA) optimization and translating code region instructions into interpreter code instructions, which may include translating sequences of multiply add instructions in the code region instructions into fusion code including CMA instructions. Floating point (FP) exceptions generated by the fusion code may be monitored and at least a portion of the code region instructions may be re-translated to eliminate some or all fusion code if CMA intermediate rounding exceptions exceed a threshold.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: July 12, 2016
    Assignee: Intel Corporation
    Inventors: Marc Lupon, Grigorios Magklis, Sridhar Samudrala, Raul Martinez, Kyriakos A. Stavrou, Enric Gibert Codina
  • Patent number: 9329848
    Abstract: A mechanism is described for facilitating dynamic and efficient fusion of computing instructions according to one embodiment. A method of embodiments, as described herein, includes monitoring a software program for a program region having fusion candidate instructions for a fusion operation at a computing system; evaluating whether the macro operation of the candidate instructions is valuable to the software program; and performing the fusion operation if it is evaluated to be valuable.
    Type: Grant
    Filed: March 27, 2013
    Date of Patent: May 3, 2016
    Assignee: Intel Corporation
    Inventors: Marc Lupon, Raul Martinez, Enric Gibert Codina, Kyriakos A. Stavrou, Grigorios Magklis, Sridhar Samudrala
  • Publication number: 20160077802
    Abstract: Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier.
    Type: Application
    Filed: November 23, 2015
    Publication date: March 17, 2016
    Inventors: Sridhar Samudrala, Grigorios Magklis, Marc Lupon, David R. Ditzel
  • Patent number: 9213523
    Abstract: Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: December 15, 2015
    Assignee: Intel Corporation
    Inventors: Sridhar Samudrala, Grigorios Magklis, Marc Lupon, David R. Ditzel
  • Patent number: 9141386
    Abstract: A semiconductor processor is described. The semiconductor processor includes logic circuitry to perform a logical reduction instruction. The logic circuitry has swizzle circuitry to swizzle a vector's elements so as to form a swizzle vector. The logic circuitry also has vector logic circuitry to perform a vector logic operation on said vector and said swizzle vector.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: September 22, 2015
    Assignee: Intel Corporation
    Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver
  • Patent number: 9092213
    Abstract: A method of performing vector operations on a semiconductor chip is described. The method includes performing a first vector instruction with a vector functional unit implemented on the semiconductor chip and performing a second vector instruction with the vector functional unit. The first vector instruction is a vector multiply add instruction. The second vector instruction is a vector leading zeros count instruction.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: July 28, 2015
    Assignee: Intel Corporation
    Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver, Eric W. Mahurin
  • Publication number: 20150195137
    Abstract: A virtual switch connected to at least one virtual machine of multiple virtual machines communicatively connected through an overlay network, receives a data packet, each of the virtual machines configured within a separate one of multiple virtual groups in the overlay network, the data packet comprising a packet header comprising at least one address. The virtual switch receives a virtual group identifier for the at least one address from at least one address resolution service returning the virtual group identifier and a resolved address for the at least one address, in response to an address resolution request for the at least one address. The virtual switch sends the data packet through the virtual switch to the resolved address only if the virtual group identifier is allowed according to a filtering policy applied by the virtual switch for a particular virtual group identified by the virtual group identifier.
    Type: Application
    Filed: January 6, 2014
    Publication date: July 9, 2015
    Inventors: VIVEK KASHYAP, SRIDHAR SAMUDRALA, DAVID L. STEVENS, JR.
  • Publication number: 20150026671
    Abstract: A mechanism is described for facilitating dynamic and efficient fusion of computing instructions according to one embodiment. A method of embodiments, as described herein, includes monitoring a software program for a program region having fusion candidate instructions for a fusion operation at a computing system; evaluating whether the macro operation of the candidate instructions is valuable to the software program; and performing the fusion operation if it is evaluated to be valuable.
    Type: Application
    Filed: March 27, 2013
    Publication date: January 22, 2015
    Inventors: Marc Lupon, Raul Martinez, Enric Gibert Codina, Kyriakos A. Stavrou, Grigorios Magklis, Sridhar Samudrala
  • Publication number: 20140372727
    Abstract: Vector blend and permute functionality are provided, responsive to instructions specifying: a destination vector register comprising fields to store vector elements, a first vector register, a vector element size, a second vector register, and a third operand. Indices are read from fields in the second register. Each index has a first selector portion and a second selector portion. Corresponding unmasked vector elements are stored to fields of the destination register, wherein each vector element, responsive to the respective first selector portion having a first value, is copied to an intermediate vector from a corresponding data field of the first register, and responsive to the respective first selector portion having a second value, is copied to the intermediate vector from a corresponding data field of the third operand. Then unmasked data fields of the destination are replaced by data fields in the intermediate vector indexed by the corresponding second selector portions.
    Type: Application
    Filed: December 23, 2011
    Publication date: December 18, 2014
    Applicant: INTEL CORPORATION
    Inventors: Robert Valentine, Bret L. Toll, Jesus Corbal, Jeff G. Wiedemeier, Sridhar Samudrala
  • Publication number: 20140281419
    Abstract: An error handling method includes identifying a code region eligible for cumulative multiply add (CMA) optimization and translating code region instructions into interpreter code instructions, which may include translating sequences of multiply add instructions in the code region instructions into fusion code including CMA instructions. Floating point (FP) exceptions generated by the fusion code may be monitored and at least a portion of the code region instructions may be re-translated to eliminate some or all fusion code if CMA intermediate rounding exceptions exceed a threshold.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: Intel Corporation
    Inventors: Marc Lupon, Grigorios Magklis, Sridhar Samudrala, Raul Martinez, Kyriakos A. Stavrou, Enric Gibert Codina
  • Publication number: 20140149724
    Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
    Type: Application
    Filed: January 31, 2014
    Publication date: May 29, 2014
    Inventors: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount
  • Patent number: 8667042
    Abstract: A vector functional unit implemented on a semiconductor chip to perform vector operations of dimension N is described. The vector functional unit includes N functional units. Each of the N functional units have logic circuitry to perform: a first integer multiply add instruction that presents highest ordered bits but not lowest ordered bits of a first integer multiply add calculation, and, a second integer multiply add instruction that presents lowest ordered bits but not highest ordered bits of a second integer multiply add calculation.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: March 4, 2014
    Assignee: Intel Corporation
    Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver
  • Publication number: 20140006467
    Abstract: Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Inventors: Sridhar Samudrala, Grigorios Magklis, Marc Lupon, David R. Ditzel
  • Publication number: 20130326199
    Abstract: Disclosed is an apparatus and method generally related to controlling a multimedia extension control and status register (MXCSR). A processor core may include a floating point unit (FPU) to perform arithmetic functions; and a multimedia extension control register (MXCR) to provide control bits to the FPU. Further an optimizer may be used to select a speculative multimedia extension status register (SPEC_MXSR) from a plurality of SPEC_MXSRs to update a multimedia extension status register (MXSR) based upon an instruction.
    Type: Application
    Filed: December 29, 2011
    Publication date: December 5, 2013
    Inventors: Grigorios Magklis, Josep M. Codina, Craig B. Zilles, Michael Neilly, Sridhar Samudrala, Alejandro Martinez Vicente, Polychronis Xekalakis, F. Jesus Sanchez, Marc Lupon, Georgios Tournavitis, Enric Gibert Codina, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Christos E. Kotselidis, Fernando Latorre, Pedro Lopez, Carlos Madriles Gimeno, Pedro Marcuello, Raul Martinez, Daniel Ortega, Demos Pavlou, Kyriakos A. Stavrou
  • Publication number: 20130305020
    Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
    Type: Application
    Filed: September 30, 2011
    Publication date: November 14, 2013
    Inventors: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount
  • Publication number: 20120254588
    Abstract: Embodiments of systems, apparatuses, and methods for performing a blend instruction in a computer processor are described. In some embodiments, the execution of a blend instruction causes a data element-by-element selection of data elements of first and second source operands using the corresponding bit positions of a writemask as a selector between the first and second operands and storage of the selected data elements into the destination at the corresponding position in the destination.
    Type: Application
    Filed: April 1, 2011
    Publication date: October 4, 2012
    Inventors: Jesus Corbal San Adrian, Bret L. Toll, Robert C. Valentine, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Andrew Thomas Forsyth, Elmoustapha Ould-Ahmed-Vall, Dennis R. Bradford, Lisa K. Wu
  • Publication number: 20120254592
    Abstract: Embodiments of systems, apparatuses, and methods for performing an expand and/or compress instruction in a computer processor are described. In some embodiments, the execution of an expand instruction causes the selection of elements from a source that are to be sparsely stored in a destination based on values of the writemask and store each selected data element of the source as a sparse data element into a destination location, wherein the destination locations correspond to each writemask bit position that indicates that the corresponding data element of the source is to be stored.
    Type: Application
    Filed: April 1, 2011
    Publication date: October 4, 2012
    Inventors: Jesus Corbal San Adrian, Roger Espasa Sans, Robert C. Valentine, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Andrew Thomas Forsyth, Victor W. Lee
  • Publication number: 20120078992
    Abstract: A vector functional unit implemented on a semiconductor chip to perform vector operations of dimension N is described. The vector functional unit includes N functional units. Each of the N functional units have logic circuitry to perform: a first integer multiply add instruction that presents highest ordered bits but not lowest ordered bits of a first integer multiply add calculation, and, a second integer multiply add instruction that presents lowest ordered bits but not highest ordered bits of a second integer multiply add calculation.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 29, 2012
    Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver
  • Publication number: 20120079253
    Abstract: A method of performing vector operations on a semiconductor chip is described. The method includes performing a first vector instruction with a vector functional unit implemented on the semiconductor chip and performing a second vector instruction with the vector functional unit. The first vector instruction is a vector multiply add instruction. The second vector instruction is a vector leading zeros count instruction.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 29, 2012
    Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver, Eric W. Mahurin