Patents by Inventor Bret L. Toll

Bret L. Toll has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9990202
    Abstract: A processor includes a first mode where the processor is not to use packed data operation masking, and a second mode where the processor is to use packed data operation masking. A decode unit to decode an unmasked packed data instruction for a given packed data operation in the first mode, and to decode a masked packed data instruction for a masked version of the given packed data operation in the second mode. The instructions have a same instruction length. The masked instruction has bit(s) to specify a mask. Execution unit(s) are coupled with the decode unit. The execution unit(s), in response to the decode unit decoding the unmasked instruction in the first mode, to perform the given packed data operation. The execution unit(s), in response to the decode unit decoding the masked instruction in the second mode, to perform the masked version of the given packed data operation.
    Type: Grant
    Filed: June 28, 2013
    Date of Patent: June 5, 2018
    Assignee: Intel Corporation
    Inventors: Bret L. Toll, Ronak Singhal, Buford M. Guy, Mishali Naik
  • Publication number: 20180150301
    Abstract: According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.
    Type: Application
    Filed: May 20, 2013
    Publication date: May 31, 2018
    Inventors: Christopher J. Hughes, Yen-Kuang (Y.K.) Chen, Mayank Bomb, Jason W. Brandt, Mark J. Buxton, Mark J. Charney, Srinivas Chennupaty, Jesus Corbal, Martin G. Dixon, Milind B. Girkar, Jonathan C. Hall, Hideki (Saito) Ido, Peter Lachner, Gilbert Neiger, Chris J. Newburn, Rajesh S. Parthasarathy, Bret L. Toll, Robert Valentine, Jeffrey G. Wiedemeier
  • Patent number: 9983873
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor mask bit compression in response to a single mask bit compression instruction that includes a source writemask register operand, a destination writemask register operand, and an opcode are described.
    Type: Grant
    Filed: May 31, 2016
    Date of Patent: May 29, 2018
    Assignee: Intel Corporation
    Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
  • Publication number: 20180136936
    Abstract: A method in one aspect may include receiving a multiply instruction. The multiply instruction may indicate a first source operand and a second source operand. A product of the first and second source operands may be stored in one or more destination operands indicated by the multiply instruction. Execution of the multiply instruction may complete without writing a carry flag. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.
    Type: Application
    Filed: December 27, 2017
    Publication date: May 17, 2018
    Applicant: Intel Corporation
    Inventors: Vinodh Gopal, James D. Guilford, Wajdi K. Feghali, Erdinc Ozturk, Gilbert M. Wolrich, Martin G. Dixon, Mark C. Davis, Sean P. Mirkes, Alexandre J. Farcy, Bret L. Toll, Maxim Loktyukhin
  • Publication number: 20180129506
    Abstract: According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.
    Type: Application
    Filed: January 4, 2018
    Publication date: May 10, 2018
    Inventors: Christopher J. Hughes, Yen-Kuang (Y.K.) Chen, Mayank Bomb, Jason W. Brandt, Mark J. Buxton, Mark J. Charney, Srinivas Chennupaty, Jesus Corbal, Martin G. Dixon, Milind B. Girkar, Jonathan C. Hall, Hideki (Saito) Ido, Peter Lachner, Gilbert Neiger, Chris J. Newburn, Rajesh S. Parthasarathy, Bret L. Toll, Robert Valentine, Jeffrey G. Wiedemeier
  • Patent number: 9946540
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Grant
    Filed: May 22, 2017
    Date of Patent: April 17, 2018
    Assignee: INTEL CORPORATION
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 9940131
    Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: April 10, 2018
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D Guilford, Gilbert M Wolrich, Wajdi K Feghali, Erdinc Ozturk, Martin G Dixon, Sean Mirkes, Bret L Toll, Maxim Loktyukhin, Mark C Davis, Alexandre J Farcy
  • Patent number: 9940130
    Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: April 10, 2018
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D Guilford, Gilbert M Wolrich, Wajdi K Feghali, Erdinc Ozturk, Martin G Dixon, Sean Mirkes, Bret L Toll, Maxim Loktyukhin, Mark C Davis, Alexandre J Farcy
  • Publication number: 20180081689
    Abstract: An apparatus is described that includes instruction execution circuitry to execute first, second, third, and fourth instructions, the first and second instructions select a first group of input vector elements from one of multiple first non-overlapping sections of respective first and second input vectors. Each of the multiple first non-overlapping sections have a same bit width as the first group. Both the third and fourth instructions select a second group of input vector elements from one of multiple second non-overlapping sections of respective third and fourth input vectors. The second group has a second bit width that is larger than the first bit width. Each of multiple second non-overlapping sections have a same bit width as the second group. The apparatus includes masking layer circuitry to mask the first and second groups at a first granularity and second granularity.
    Type: Application
    Filed: November 10, 2017
    Publication date: March 22, 2018
    Inventors: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL, BRET L. TOLL, MARK J. CHARNEY, ZEEV SPERBER, AMIT GRADSTEIN
  • Publication number: 20180074825
    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.
    Type: Application
    Filed: November 10, 2017
    Publication date: March 15, 2018
    Inventors: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL, BRET L. TOLL, MARK J. CHARNEY, ZEEV SPERBER, AMIT GRADSTEIN
  • Publication number: 20180074822
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Application
    Filed: November 9, 2017
    Publication date: March 15, 2018
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 9916160
    Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: March 13, 2018
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K Feghali, Erdinc Ozturk, Martin G Dixon, Sean Mirkes, Bret L Toll, Maxim Loktyukhin, Mark C Davis, Alexandre J Farcy
  • Publication number: 20180067742
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Application
    Filed: November 9, 2017
    Publication date: March 8, 2018
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Publication number: 20180060078
    Abstract: A heterogeneous processor architecture and a method of booting a heterogeneous processor is described. A processor according to one embodiment comprises: a set of large physical processor cores; a set of small physical processor cores having relatively lower performance processing capabilities and relatively lower power usage relative to the large physical processor cores; and a package unit, to enable a bootstrap processor. The bootstrap processor initializes the homogeneous physical processor cores, while the heterogeneous processor presents the appearance of a homogeneous processor to a system firmware interface.
    Type: Application
    Filed: August 8, 2017
    Publication date: March 1, 2018
    Inventors: Eliezer Weissmann, Rinat Rappoport, Michael Mishaeli, Hisham Shafi, Oron Lenz, Jason W. Brandt, Stephen A. Fischer, Bret L. Toll, Inder M. Sodhi, Alon Naveh, Ganapati N. Srinivasa, Ashish V, Choubal, Scott D. Hahn, David A. Koufaty, Russel J. Fenger, Gaurav Khanna, Eugene Gorbatov, Mishali Naik, Andrew J. Herdrich, Abirami Prabhakaran, Sanjeev S. Sahagirdar, Paul Brett, Paolo Narvaez, Andrew D. Henroid, Dheeraj R. Subbareddy
  • Publication number: 20180018170
    Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
    Type: Application
    Filed: February 13, 2017
    Publication date: January 18, 2018
    Inventors: Robert Valentine, Doron Orenstein, Bret L. Toll
  • Patent number: 9864602
    Abstract: A method of an aspect includes receiving a masked packed rotate instruction. The instruction indicates a first source packed data including a plurality of packed data elements, a packed data operation mask having a plurality of mask elements, at least one rotation amount, and a destination storage location. A result packed data is stored in the destination storage location in response to the instruction. The result packed data includes result data elements that each correspond to a different one of the mask elements in a corresponding relative position. Result data elements that are not masked out by the corresponding mask element include one of the data elements of the first source packed data in a corresponding position that has been rotated. Result data elements that are masked out by the corresponding mask element include a masked out value. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 30, 2011
    Date of Patent: January 9, 2018
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal San Andrian, Suleyman Sair, Bret L. Toll, Zeev Sperber, Amit Gradstein, Asaf Rubenstein
  • Publication number: 20180004523
    Abstract: A processor of an aspect includes a decode unit to decode an instruction. The instruction is to explicitly specify a first architectural register and is to implicitly indicate at least a second architectural register. The second architectural register is implicitly to be at a higher register number than the first architectural register. The processor also includes an architectural register replacement unit coupled with the decode unit. The architectural register replacement unit is to replace the first architectural register with a third architectural register, and is to replace the second architectural register with a fourth architectural register. The third architectural register is to be at a lower register number than the first architectural register. The fourth architectural register is to be at a lower register number than the second architectural register. Other processors are also disclosed, as are methods and systems.
    Type: Application
    Filed: July 1, 2016
    Publication date: January 4, 2018
    Applicant: Intel Corporation
    Inventors: Mark J. Charney, Robert Valentine, Milind B. Girkar, Ashish Jha, Bret L. Toll, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal San Adrian, Jason W. Brandt
  • Publication number: 20170357510
    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.
    Type: Application
    Filed: August 3, 2017
    Publication date: December 14, 2017
    Inventors: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL SAN ADRIAN, BRET L. TOLL, MARK J. CHARNEY, ZEEV SPERBER, AMIT GRADSTEIN
  • Publication number: 20170329606
    Abstract: Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting.
    Type: Application
    Filed: May 30, 2017
    Publication date: November 16, 2017
    Inventors: Christopher J. HUGHES, Mark J. CHARNEY, Jesus CORBAL, Milind B. GIRKAR, Elmoustapha OULD-AHMED-VALL, Bret L. TOLL, Robert VALENTINE
  • Publication number: 20170329605
    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.
    Type: Application
    Filed: August 3, 2017
    Publication date: November 16, 2017
    Inventors: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL SAN ADRIAN, BRET L. TOLL, MARK J. CHARNEY, ZEEV SPERBER, AMIT GRADSTEIN