Patents by Inventor Bret L. Toll

Bret L. Toll has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190339972
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, tile diagonal support is described. For example, a processor is detailed having decode circuitry to decode an instruction having fields for an opcode, a source operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to write the identified source operand to each element along a main diagonal of the identified destination matrix operand.
    Type: Application
    Filed: July 1, 2017
    Publication date: November 7, 2019
    Applicant: Intel Corporation
    Inventors: Robert VALENTINE, Dan BAUM, Zeev SPERBER, Jesus CORBAL, Elmoustapha OULD-AHMED-VALL, Bret L. TOLL, Mark J. CHARNEY, Alexander HEINECKE
  • Patent number: 10459728
    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.
    Type: Grant
    Filed: November 10, 2017
    Date of Patent: October 29, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 10445092
    Abstract: A processor for performing a vector permute comprises: a source vector register to store a plurality of source data elements; a destination vector register to store a plurality of destination data elements; a control vector register to store a plurality of control data elements, each control data element corresponding to one of the destination data elements and including an N bit value indicating whether a source data element is to be copied to the corresponding destination data element; vector permute logic to compare the N bit value of each control data element to an N bit portion of an immediate to determine whether to copy a source data element to the corresponding destination data element, wherein if the N bit values match, then the vector permute logic is to identify a source data element using an index value included in the control data element.
    Type: Grant
    Filed: December 27, 2014
    Date of Patent: October 15, 2019
    Assignee: Intel Corporation
    Inventors: Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark J. Charney, Milind B. Girkar, Bret L. Toll, Roger Espasa, Guillem Sole, Jairo Balart, Brian Hickman
  • Publication number: 20190310846
    Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
    Type: Application
    Filed: October 9, 2018
    Publication date: October 10, 2019
    Inventors: Robert Valentine, Doron Orenstein, Bret L. Toll
  • Patent number: 10430193
    Abstract: A processor includes a first mode where the processor is not to use packed data operation masking, and a second mode where the processor is to use packed data operation masking. A decode unit to decode an unmasked packed data instruction for a given packed data operation in the first mode, and to decode a masked packed data instruction for a masked version of the given packed data operation in the second mode. The instructions have a same instruction length. The masked instruction has bit(s) to specify a mask. Execution unit(s) are coupled with the decode unit. The execution unit(s), in response to the decode unit decoding the unmasked instruction in the first mode, to perform the given packed data operation. The execution unit(s), in response to the decode unit decoding the masked instruction in the second mode, to perform the masked version of the given packed data operation.
    Type: Grant
    Filed: June 1, 2018
    Date of Patent: October 1, 2019
    Assignee: Intel Corporation
    Inventors: Bret L. Toll, Buford M. Guy, Ronak Singhal, Mishali Naik
  • Patent number: 10372455
    Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.
    Type: Grant
    Filed: December 12, 2014
    Date of Patent: August 6, 2019
    Assignee: Intel Corporation
    Inventors: Maxim Loktyukhin, Eric W Mahurin, Bret L Toll, Martin G Dixon, Sean P Mirkes, David L Kreitzer, Elmoustapha Ould-Ahmed-Vall, Vinodh Gopal
  • Patent number: 10372449
    Abstract: A method of an aspect includes receiving a packed data operation mask concatenation instruction. The packed data operation mask concatenation instruction indicates a first source having a first packed data operation mask, indicates a second source having a second packed data operation mask, and indicates a destination. A result is stored in the destination in response to the packed data operation mask concatenation instruction. The result includes the first packed data operation mask concatenated with the second packed data operation mask. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: February 27, 2017
    Date of Patent: August 6, 2019
    Assignee: Intel Corporation
    Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal San Andrian, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
  • Publication number: 20190227800
    Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
    Type: Application
    Filed: February 28, 2019
    Publication date: July 25, 2019
    Inventors: Robert C. VALENTINE, Jesus Corbal SAN ADRIAN, Roger Espasa SANS, Robert D. CAVIN, Bret L. TOLL, Santiago Galan DURAN, Jeffrey G. WIEDEMEIER, Sridhar SAMUDRALA, Milind Baburao GIRKAR, Edward Thomas GROCHOWSKI, Jonathan Cannon HALL, Dennis R. BRADFORD, Elmoustapha OULD-AHMED-VALL, James C. ABEL, Mark CHARNEY, Seth ABRAHAM, Suleyman SAIR, Andrew Thomas FORSYTH, Lisa WU, Charles YOUNT
  • Patent number: 10324718
    Abstract: A method of an aspect includes receiving a masked packed rotate instruction. The instruction indicates a first source packed data including a plurality of packed data elements, a packed data operation mask having a plurality of mask elements, at least one rotation amount, and a destination storage location. A result packed data is stored in the destination storage location in response to the instruction. The result packed data includes result data elements that each correspond to a different one of the mask elements in a corresponding relative position. Result data elements that are not masked out by the corresponding mask element include one of the data elements of the first source packed data in a corresponding position that has been rotated. Result data elements that are masked out by the corresponding mask element include a masked out value. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: January 8, 2018
    Date of Patent: June 18, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal San Andrian, Suleyman Sair, Bret L. Toll, Zeev Sperber, Amit Gradstein, Asaf Rubinstein
  • Patent number: 10282296
    Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.
    Type: Grant
    Filed: December 12, 2016
    Date of Patent: May 7, 2019
    Assignee: Intel Corporation
    Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, David Papworth, James D. Allen
  • Patent number: 10261879
    Abstract: Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction.
    Type: Grant
    Filed: December 24, 2015
    Date of Patent: April 16, 2019
    Assignee: Intel Corporation
    Inventors: Ravi Rajwar, Bret L. Toll, Konrad K. Lai, Matthew C. Merten, Martin G. Dixon
  • Publication number: 20190108029
    Abstract: Embodiments of systems, apparatuses, and methods for performing a blend instruction in a computer processor are described. In some embodiments, the execution of a blend instruction causes a data element-by-element selection of data elements of first and second source operands using the corresponding bit positions of a writemask as a selector between the first and second operands and storage of the selected data elements into the destination at the corresponding position in the destination.
    Type: Application
    Filed: September 27, 2018
    Publication date: April 11, 2019
    Inventors: Jesus CORBAL SAN ADRIAN, Bret L. TOLL, Robert C. VALENTINE, Jeffrey G. WIEDEMEIER, Sridhar SAMUDRALA, Milind Baburao GIRKAR, Andrew Thomas FORSYTH, Elmoustapha OULD-AHMED-VALL, Dennis R. BRADFORD, Lisa K. WU
  • Publication number: 20190108030
    Abstract: Embodiments of systems, apparatuses, and methods for performing a blend instruction in a computer processor are described. In some embodiments, the execution of a blend instruction causes a data element-by-element selection of data elements of first and second source operands using the corresponding bit positions of a writemask as a selector between the first and second operands and storage of the selected data elements into the destination at the corresponding position in the destination.
    Type: Application
    Filed: September 27, 2018
    Publication date: April 11, 2019
    Inventors: Jesus CORBAL SAN ADRIAN, Bret L. TOLL, Robert C. VALENTINE, Jeffrey G. WIEDEMEIER, Sridhar SAMUDRALA, Milind Baburao GIRKAR, Andrew Thomas FORSYTH, Elmoustapha OULD-AHMED-VALL, Dennis R. BRADFORD, Lisa K. WU
  • Patent number: 10255072
    Abstract: A processor of an aspect includes a decode unit to decode an instruction. The instruction is to explicitly specify a first architectural register and is to implicitly indicate at least a second architectural register. The second architectural register is implicitly to be at a higher register number than the first architectural register. The processor also includes an architectural register replacement unit coupled with the decode unit. The architectural register replacement unit is to replace the first architectural register with a third architectural register, and is to replace the second architectural register with a fourth architectural register. The third architectural register is to be at a lower register number than the first architectural register. The fourth architectural register is to be at a lower register number than the second architectural register. Other processors are also disclosed, as are methods and systems.
    Type: Grant
    Filed: July 1, 2016
    Date of Patent: April 9, 2019
    Assignee: Intel Corporation
    Inventors: Mark J. Charney, Robert Valentine, Milind B. Girkar, Ashish Jha, Bret L. Toll, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal San Adrian, Jason W. Brandt
  • Patent number: 10248524
    Abstract: Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction.
    Type: Grant
    Filed: December 24, 2015
    Date of Patent: April 2, 2019
    Assignee: Intel Corporation
    Inventors: Ravi Rajwar, Bret L. Toll, Konrad K. Lai, Matthew C. Merten, Martin G. Dixon
  • Publication number: 20190095643
    Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure.
    Type: Application
    Filed: September 25, 2018
    Publication date: March 28, 2019
    Inventors: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL, BRET L. TOLL, MARK J. CHARNEY
  • Patent number: 10241792
    Abstract: A processor core that includes a hardware decode unit and an execution engine unit. The hardware decode unit to decode a vector frequency expand instruction, wherein the vector frequency compress instruction includes a source operand and a destination operand, wherein the source operand specifies a source vector register that includes one or more pairs of a value and run length that are to be expanded into a run of that value based on the run length. The execution engine unit to execute the decoded vector frequency expand instruction which causes, a set of one or more source data elements in the source vector register to be expanded into a set of destination data elements comprising more elements than the set of source data elements and including at least one run of identical values which were run length encoded in the source vector register.
    Type: Grant
    Filed: December 30, 2011
    Date of Patent: March 26, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles Yount, Bret L. Toll
  • Patent number: 10228941
    Abstract: A processor of an aspect includes a set of registers capable of storing packed data. An execution unit is coupled with the set of registers. The execution unit is to access the set of registers in at least two different ways in response to instructions. The at least two different ways include a first way in which the set of registers are to represent a plurality of N-bit registers. The at least two different ways also include a second way in which the set of registers are to represent a single register of at least 2N-bits. In one aspect, the at least 2N-bits is to be at least 256-bits.
    Type: Grant
    Filed: June 28, 2013
    Date of Patent: March 12, 2019
    Assignee: Intel Corporation
    Inventors: Bret L. Toll, Ronak Singhal, Buford M. Guy, Mishali Naik
  • Patent number: 10223227
    Abstract: Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction.
    Type: Grant
    Filed: December 24, 2015
    Date of Patent: March 5, 2019
    Assignee: Intel Corporation
    Inventors: Ravi Rajwar, Bret L. Toll, Konrad K. Lai, Matthew C. Merten, Martin G. Dixon
  • Patent number: 10210065
    Abstract: Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction.
    Type: Grant
    Filed: December 24, 2015
    Date of Patent: February 19, 2019
    Assignee: Intel Corporation
    Inventors: Ravi Rajwar, Bret L. Toll, Konrad K. Lai, Matthew C. Merten, Martin G. Dixon