Floating Point Or Vector Patents (Class 712/222)
  • Patent number: 11188337
    Abstract: Micro-architecture designs and methods are provided. A computer processing architecture may include an instruction cache for storing producer instructions, a half-instruction cache for storing half instructions, and eager shelves for storing a result of a first producer instruction. The computer processing architecture may fetch the first producer instruction and a first half instruction; send the first half instruction to the eager shelves; based on execution of the first producer instruction, send a second half instruction to the eager shelves; assemble the first producer instruction in the eager shelves based on the first half instruction and the second half instruction; and dispatch the first producer instruction for execution.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: November 30, 2021
    Assignees: The Florida State University Research Foundation, Inc., Michigan Technological University
    Inventors: David Whalley, Soner Onder
  • Patent number: 11175890
    Abstract: Examples of techniques for hexadecimal exponent alignment for a binary floating point unit (BFU) of a computer processor are described herein. An aspect includes receiving, by the BFU, a first operand comprising a first fraction and a first exponent, and a second operand comprising a second fraction and a second exponent. Another aspect includes, based on the first operand and the second operand being in a first floating point format, multiplying each of the first exponent and the second exponent by a factor corresponding to a number of bits in a digit in the first floating point format.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: November 16, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kerstin Claudia Schelm, Petra Leber, Nicol Hofmann, Michael Klein
  • Patent number: 11169809
    Abstract: Method and apparatus for converting scatter control elements to gather control elements used to permute vector data elements is described herein. One embodiment of a method includes decoding an instruction having a field for a source vector operand storing a plurality of data elements, wherein each of the data element includes a set bit and a plurality of unset bits. Each of the set bits is set at a unique bit offset within the respective data element. The method further includes executing the decoded instruction by generating, for each bit offset across the plurality of data elements in the source vector operand, a count of unset bits between a first data element having a bit set at a current bit offset and a second data element comprising a least significant bit (LSB). A set of control elements is generated based on the count of unset bits generated for each bit offset.
    Type: Grant
    Filed: March 31, 2017
    Date of Patent: November 9, 2021
    Assignee: Intel Corporation
    Inventor: Mikhail Plotnikov
  • Patent number: 11137912
    Abstract: Provided herein may be a memory controller and a method of operating the same. The memory controller may include a command processor configured to generate a flush command in response to a flush request input from an external host and to assign a slot number corresponding to the flush command; a sequence generator configured to determine flush data to be stored in response to the flush command, and to generate a write sequence in which the flush data is to be stored based on a size of the flush data and an assigned device sequence of the plurality of memory devices; and a memory operation controller configured to control the plurality of memory devices to store the flush data in the plurality of memory devices.
    Type: Grant
    Filed: July 31, 2020
    Date of Patent: October 5, 2021
    Assignee: SK hynix Inc.
    Inventors: Sung Kwan Hong, Yeong Sik Yi
  • Patent number: 11138151
    Abstract: Some embodiments provide a non-transitory machine-readable medium that stores a program. The program determines a scale value based on a plurality of floating point values. The program further scales the plurality of floating point values based on the scale value. The program also converts the plurality of floating point values to a plurality of integer values. The program further determines an integer encoding scheme from a plurality of integer encoding schemes. The program also encodes the plurality of integer values based on the determined integer encoding scheme.
    Type: Grant
    Filed: August 30, 2018
    Date of Patent: October 5, 2021
    Assignee: SAP SE
    Inventor: Martin Rupp
  • Patent number: 11126691
    Abstract: An apparatus is provided that receives a scalar start value, an adjust amount and wrapping control information, and includes vector generating circuitry for generating a vector comprising a plurality of elements such that a value of a first element is dependent on the scalar start value, and values of the plurality of elements follow a regularly progressing sequence that is constrained to wrap as required to ensure that each value is within bounds determined from the wrapping control information. The adjust amount is used to determine a difference between values of adjacent elements in the regularly progressing sequence. The vector generating circuitry has first adder circuitry for generating a plurality of first candidate values for the plurality of elements, assuming absence of a wrapping condition, and second adder circuitry for generating a plurality of second candidate values for the plurality of elements, assume presence of a wrapping condition.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: September 21, 2021
    Assignee: Arm Limited
    Inventor: Jack William Derek Andrew
  • Patent number: 11115015
    Abstract: A circuit component has an address determined from a voltage level applied to a single electrical contact of the circuit component. The circuit component is configured to be assigned one of at least three unique addresses and to select from among the at least three unique addresses based on the voltage level.
    Type: Grant
    Filed: November 25, 2019
    Date of Patent: September 7, 2021
    Assignee: SKYWORKS SOLUTIONS, INC.
    Inventors: Bo Zhou, Guillaume Alexandre Blin
  • Patent number: 11080046
    Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
    Type: Grant
    Filed: February 5, 2021
    Date of Patent: August 3, 2021
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Patent number: 11042375
    Abstract: An apparatus and method of operating the apparatus are provided for performing a count operation. Instruction decoder circuitry is responsive to a count instruction specifying an input data item to generate control signals to control the data processing circuitry to perform a count operation. The count operation determines a count value indicative of a number of input elements of a subset of elements in the specified input data item which have a value which matches a reference value in a reference element in a reference data item. A plurality of count operations may be performed to determine a count data item corresponding to the input data item. A register scatter storage instruction, a gather index generation instruction, and respective apparatuses responsive to them, as well as simulator implementations, are also provided.
    Type: Grant
    Filed: August 1, 2017
    Date of Patent: June 22, 2021
    Assignee: ARM Limited
    Inventors: Mbou Eyole, Jesse Garrett Beu, Alejandro Martinez Vicente, Timothy Hayes
  • Patent number: 11042370
    Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
    Type: Grant
    Filed: April 19, 2018
    Date of Patent: June 22, 2021
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. Macpherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen
  • Patent number: 11036511
    Abstract: An apparatus has a processing pipeline, and first and second register files. A temporary-register-using instruction is supported which controls the pipeline to perform an operation using a temporary variable derived from an operand stored in the first register file. In response to the instruction, when a predetermined condition is not satisfied, the pipeline processes at least one register move micro-operation to transfer data from the at least one source register of the first register file to at least one newly allocated temporary register of the second register file. When the condition is satisfied, the operation can be performed using a temporary variable already stored in the temporary register of the second register file used by an earlier temporary-register-using instruction specifying the same source register for determining the temporary variable, in the absence of an intervening instruction for rewriting the source register.
    Type: Grant
    Filed: July 29, 2019
    Date of Patent: June 15, 2021
    Assignee: Arm Limited
    Inventors: Xiaoyang Shen, Damien Robin Martin, Cédric Denis Robert Airaud, Luca Nassi, François Donati
  • Patent number: 10996957
    Abstract: A system and corresponding method map instructions in an out-of-order (OoO) processor. The system comprises a mapper, integer snapshot circuitry, and floating-point (FP) snapshot circuitry. The mapper maps instructions by mapping integer and FP architectural registers (ARs) of the instructions to integer and FP physical registers of the OoO processor, respectively. The mapper records, via at least one present FP indicator, presence of FP ARs used as destinations in the instructions. The mapper copies, periodically, the integer mapper state to the integer snapshot circuitry and copies, intermittently, based on the at least one FP present indicator, the FP mapper state to the FP snapshot circuitry. Copies of the integer and FP mapper state in the integer and FP snapshot circuitry, respectively, improve performance for instruction unwinding caused, for example, by an exception, branch/jump mispredict, etc. By copying the FP mapper state, intermittently, power efficiency of the OoO processor is improved.
    Type: Grant
    Filed: June 20, 2019
    Date of Patent: May 4, 2021
    Assignee: MARVELL ASIA PTE, LTD.
    Inventor: David A. Carlson
  • Patent number: 10970072
    Abstract: Disclosed embodiments relate to transposing vectors while loading from memory. In one example, a processor includes a register file, a memory interface, fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode, a destination vector register, and a source vector having N groups of elements, N being a positive integer, the opcode to indicate the processor is to fetch the source vector, generate write data comprising one or more N-tuples, each N-tuple comprising corresponding elements from each of the N groups of elements, and write the write data to the destination vector register, and execution circuitry to execute the decoded instruction as per the opcode, the execution circuitry has a shuffle pipeline disposed between the memory and the register file, the shuffle pipeline to fetch, decode, and execute further instances of the instruction at one instruction per clock cycle.
    Type: Grant
    Filed: December 21, 2018
    Date of Patent: April 6, 2021
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Evangelos Georganas, Christopher J. Hughes, Raanan Sade, Robert Valentine
  • Patent number: 10963247
    Abstract: A method to classify source data in a processor in response to a vector floating-point classification instruction includes specifying, in respective fields of the vector floating-point classification instruction, a source register containing the source data and a destination register to store classification indications for the source data. The source register includes a plurality of lanes that each contains a floating-point value and the destination register includes a plurality of lanes corresponding to the lanes of the source register. The method further includes executing the vector floating-point classification instruction by, for each lane in the source register, classifying the floating-point value in the lane to identify a type of the floating-point value, and storing a value indicative of the identified type in the corresponding lane of the destination register.
    Type: Grant
    Filed: May 24, 2019
    Date of Patent: March 30, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Joseph Zbiciak, Brett L. Huber, Duc Bui
  • Patent number: 10963246
    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
    Type: Grant
    Filed: November 9, 2018
    Date of Patent: March 30, 2021
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Patent number: 10942879
    Abstract: A first operation identifier is assigned to a current operation directed to a memory component, the first operation identifier having a first entry in a first data structure that associates the first operation identifier with a first buffer identifier. It is determined whether the current operation collides with a prior operation assigned a second operation identifier, the second operation identifier having a second entry in the first data structure that associates the second operation identifier with a second buffer identifier. A latest flag is updated to indicate that the first entry is a latest operation directed to an address (1) in response to determining that the current operation collides with the prior operation and that the current and prior operations are read operations, or (2) in response to determining to determining that the current operation does not collide with a prior operation.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: March 9, 2021
    Assignee: MICRON TECHNOLOGY, INC.
    Inventors: Lyle E. Adams, Mark Ish, Pushpa Seetamraju, Karl D. Schuh, Dan Tupy
  • Patent number: 10761970
    Abstract: The invention is notably directed to a computer-implemented method for performing safety check operations. The method comprises steps that are implemented while executing a computer program, which is instrumented with safety check operations. As a result, this computer program forms a sequence of ordered instructions. Such instructions comprise safety check operation instructions, in addition to generic execution instructions and system inputs. System inputs allow the executing program to interact with an operating system, which manages resources for the computer program to execute. A series of instructions are identified while executing the computer program. Namely, a first instruction is identified in the sequence, as one of the safety check operation instructions, in view of its subsequent execution. After having identified the first instruction, a second instruction is identified in the sequence. The second instruction is identified as one of the generic computer program instructions.
    Type: Grant
    Filed: October 20, 2017
    Date of Patent: September 1, 2020
    Assignee: International Business Machines Corporation
    Inventors: Anil Kurmus, Matthias Neugschwandtner, Alessandro Sorniotti
  • Patent number: 10761728
    Abstract: Provided herein may be a memory controller and a method of operating the same. The memory controller may include a command processor configured to generate a flush command in response to a flush request input from an external host and to assign a slot number corresponding to the flush command; a sequence generator configured to determine flush data to be stored in response to the flush command, and to generate a write sequence in which the flush data is to be stored based on a size of the flush data and an assigned device sequence of the plurality of memory devices; and a memory operation controller configured to control the plurality of memory devices to store the flush data in the plurality of memory devices.
    Type: Grant
    Filed: December 10, 2018
    Date of Patent: September 1, 2020
    Assignee: SK hynix Inc.
    Inventors: Sung Kwan Hong, Yeong Sik Yi
  • Patent number: 10740067
    Abstract: Setting or updating of floating point controls is managed. Floating point controls include controls used for floating point operations, such as rounding mode and/or other controls. Further, floating point controls include status associated with floating point operations, such as floating point exceptions and/or others. The management of the floating point controls includes efficiently updating the controls, while reducing costs associated therewith.
    Type: Grant
    Filed: June 23, 2017
    Date of Patent: August 11, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 10726329
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
    Type: Grant
    Filed: April 17, 2018
    Date of Patent: July 28, 2020
    Assignee: Cerebras Systems Inc.
    Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Gary R. Lauterbach, Michael Edwin James
  • Patent number: 10705842
    Abstract: Methods and apparatuses relating to high-performance authenticated encryption are described.
    Type: Grant
    Filed: April 2, 2018
    Date of Patent: July 7, 2020
    Assignee: INTEL CORPORATION
    Inventors: Vikram Suresh, Sanu Mathew, Sudhir Satpathy, Vinodh Gopal
  • Patent number: 10672100
    Abstract: An apparatus determines a second pixel range of an uncorrected image necessary to generate a first pixel range having pixels in a preset range of a corrected image, including a cache unit determining the second pixel range and reading and holding the second pixel range from memory before executing correction. Correspondences indicating positions of the uncorrected image corresponding to positions of pixels of the corrected image, respectively, are preset. The cache unit specifies a position of the uncorrected image corresponding to a pixel of one of four corners of a rectangular third pixel range including the first pixel range based on the correspondence, specifies pixel ranges of the uncorrected image necessary for pixel value generation, respectively, at the four corners of the third pixel range based on the specified position, and determines a pixel range including a convex set including the specified pixel ranges as the second pixel range.
    Type: Grant
    Filed: October 12, 2016
    Date of Patent: June 2, 2020
    Assignee: Hitachi Automotive Systems, Ltd.
    Inventors: Yusuke Uchida, Tetsuya Yamada, Shigeru Matsuo, Manabu Sasamoto
  • Patent number: 10592243
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer constructed like a cache. The stream buffer cache includes plural cache lines, each includes tag bits, at least one valid bit and data bits. Cache lines are allocated to store newly fetched stream data. Cache lines are deallocated upon consumption of the data by a central processing unit core functional unit. Instructions preferably include operand fields with a first subset of codings corresponding to registers, a stream read only operand coding and a stream read and advance operand coding.
    Type: Grant
    Filed: September 10, 2018
    Date of Patent: March 17, 2020
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
  • Patent number: 10565283
    Abstract: A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four consecutive non-negative integers in numerical order. In an aspect, the instruction does not indicate a source packed data operand having a plurality of packed data elements in an architecturally-visible storage location. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: February 18, 2020
    Assignee: Intel Corporation
    Inventors: Seth Abraham, Robert Valentine, Elmoustapha Ould-Ahmed-Vall, Zeev Sperber, Amit Gradstein
  • Patent number: 10567249
    Abstract: A method and system are described. The method and system include determining a grouping characteristic for a plurality of nodes and a corresponding plurality of links. The nodes and the links correspond to components of a network and are associated with network performance information. The grouping characteristic includes at least one of partitionability into pages and a hop distance. The method and system also include generating a graphical visualization based on the grouping characteristic, the nodes and the links.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: February 18, 2020
    Assignee: ThousandEyes, Inc.
    Inventors: John Moeses Ercia Bauan, Sunil Bandla, Ricardo V. Oliveira
  • Patent number: 10521383
    Abstract: A first operation identifier is assigned to a first operation directed to a memory component, the first operation identifier having an entry in a first data structure that associates the first operation identifier with a first plurality of buffer identifiers. It is determined whether the first operation collides with a prior operation assigned a second operation identifier, the second operation identifier having an entry in the first data structure that associates the second operation identifier with a second plurality of buffer identifiers. It is determined whether the first operation is a read or a write operation. In response to determining that the first operation collides with the prior operation and that the first operation is a read operation, the first plurality of buffer identifiers are updated with a buffer identifier included in the second plurality of buffer identifiers.
    Type: Grant
    Filed: December 17, 2018
    Date of Patent: December 31, 2019
    Assignee: MICRON TECHNOLOGY, INC.
    Inventors: Lyle E. Adams, Mark Ish, Pushpa Seetamraju, Karl D. Schuh, Dan Tupy
  • Patent number: 10467185
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Grant
    Filed: April 24, 2017
    Date of Patent: November 5, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Suleyman Sair
  • Patent number: 10360036
    Abstract: A computer processing system is provided. The computer processing system includes a processor configured to crack a Move-To-FPSCR instruction into two internal instructions. A first one of the two internal instructions executes out-of-order to update a control field and a second one of the two internal instructions executes in-order to compute a trap decision.
    Type: Grant
    Filed: July 12, 2017
    Date of Patent: July 23, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brian J. D. Barrick, Maarten J. Boersma, Niels Fricke, Michael J. Genden
  • Patent number: 10339057
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: July 2, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
  • Patent number: 10318433
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: June 11, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
  • Patent number: 10296334
    Abstract: Apparatus, method, and system for performing a vector bit gather are describe herein. One embodiment of a processor includes: a first vector register storing one or more source data elements, a second vector register storing one or more control elements, and a vector bit gather logic. Each of the control elements includes a plurality of bit fields, each of which is associated with a plurality of corresponding bit positions in a destination vector register and is to identify a bit from the one or more corresponding source data element to be copied to each of the plurality of corresponding bit positions. The vector bit shuffle logic is to read the bit fields from the second vector register and, for each bit field, to identify a bit from the source data elements and responsively copy it to each of the plurality of corresponding bit positions in the destination vector register.
    Type: Grant
    Filed: December 27, 2014
    Date of Patent: May 21, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal San Adrian, Mark J. Charney, Guillem Sole, Roger Espasa
  • Patent number: 10291406
    Abstract: Embodiments include a method of adding first and second binary numbers having C bits and divided into D words to provide a third binary number in E successive adding operations, C, D and E being plural positive integers, the method comprising: a first group of D adding operations adding together respective words of the first and second binary numbers to provide D sum and carry outputs ranging from a least significant to a most significant sum and carry output; one or more subsequent groups of adding operations adding together sum and carry outputs from an immediately preceding group of adding operations, a final group of the one or more subsequent groups resulting in the third binary number consisting of the sum outputs from the final group and a carry from the most significant carry output of the final group, wherein E is less than D.
    Type: Grant
    Filed: June 7, 2017
    Date of Patent: May 14, 2019
    Assignee: NXP B.V.
    Inventors: Marinus van Splunter, Artur Tadeusz Burchard
  • Patent number: 10275216
    Abstract: A method of an aspect includes receiving a floating point scaling instruction. The floating point scaling instruction indicates a first source including one or more floating point data elements, a second source including one or more corresponding floating point data elements, and a destination. A result is stored in the destination in response to the floating point scaling instruction. The result includes one or more corresponding result floating point data elements each including a corresponding floating point data element of the second source multiplied by a base of the one or more floating point data elements of the first source raised to a power of an integer representative of the corresponding floating point data element of the first source. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: April 30, 2019
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Amit Gradstein, Robert Valentine, Simon Rubanovich, Benny Eitan
  • Patent number: 10228909
    Abstract: A method of an aspect includes receiving a floating point scaling instruction. The floating point scaling instruction indicates a first source including one or more floating point data elements, a second source including one or more corresponding floating point data elements, and a destination. A result is stored in the destination in response to the floating point scaling instruction. The result includes one or more corresponding result floating point data elements each including a corresponding floating point data element of the second source multiplied by a base of the one or more floating point data elements of the first source raised to a power of an integer representative of the corresponding floating point data element of the first source. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: March 12, 2019
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Amit Gradstein, Robert Valentine, Simon Rubanovich, Benny Eitan
  • Patent number: 10223113
    Abstract: A processor of an aspect includes a decode unit to decode an instruction indicating a first source packed data operand including at least four data elements, a source mask including at least four mask elements, and a destination storage location. An execution unit, in response to the instruction, stores a result packed data operand having a series of at least two unmasked result data elements. Each of the unmasked result data elements stores a value of a different one of at least two consecutive data elements of the first source packed data operand in a relative order. All masked result elements, which are between a nearest corresponding pair of unmasked result data elements, have a same value as an unmasked result data element of the corresponding pair, which is closest to a first end of the result packed data operand. The masked result data elements correspond to masked mask elements.
    Type: Grant
    Filed: March 27, 2014
    Date of Patent: March 5, 2019
    Assignee: Intel Corporation
    Inventor: Mikhail Plotnikov
  • Patent number: 10209989
    Abstract: A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the array of data elements stored in a source operand into a result in a destination operand using a reduction operator. Each of the source operand and the destination operand is one of the vector registers. Responsive to the vector reduction instruction, the execution circuitry applies the reduction operator to two of the data elements in each lane, and shifts one or more remaining data elements when there is at least one of the data elements remaining in each lane.
    Type: Grant
    Filed: March 7, 2017
    Date of Patent: February 19, 2019
    Assignee: Intel Corporation
    Inventors: Paul Caprioli, Abhay S. Kanhere, Jeffrey J. Cook, Muawya M. Al-Otoom
  • Patent number: 10114650
    Abstract: Techniques are disclosed relating to handling dependencies between instructions. In one embodiment, an apparatus includes decode circuitry and dependency circuitry. In this embodiment, the decode circuitry is configured to receive an instruction that specifies a destination location and determine a first storage region that includes the destination location. In this embodiment, the storage region is one of a plurality of different storage regions accessible by instructions processed by the apparatus. In this embodiment, the dependency circuitry is configured to stall the instruction until one or more older instructions that specify source locations in the first storage region have read their source locations. The disclosed techniques may be described as “pessimistic” dependency handling, which may, in some instances, maintain performance while limiting complexity, power consumption, and area of dependency logic.
    Type: Grant
    Filed: February 23, 2015
    Date of Patent: October 30, 2018
    Assignee: Apple Inc.
    Inventors: Robert D. Kenney, Liang-Kai Wang
  • Patent number: 10089076
    Abstract: A method of an aspect includes receiving a floating point scaling instruction. The floating point scaling instruction indicates a first source including one or more floating point data elements, a second source including one or more corresponding floating point data elements, and a destination. A result is stored in the destination in response to the floating point scaling instruction. The result includes one or more corresponding result floating point data elements each including a corresponding floating point data element of the second source multiplied by a base of the one or more floating point data elements of the first source raised to a power of an integer representative of the corresponding floating point data element of the first source. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: March 15, 2018
    Date of Patent: October 2, 2018
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Amit Gradstein, Robert Valentine, Simon Rubanovich, Benny Eitan
  • Patent number: 10078593
    Abstract: A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality of processor cores, wherein at least one of a number of the processor cores, a size of each of the plurality of caches, or a size of each of the plurality of memories is configured for performing a reverse-time-migration (RTM) computation.
    Type: Grant
    Filed: October 26, 2012
    Date of Patent: September 18, 2018
    Assignee: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA
    Inventors: John Shalf, David Donofrio, Leonid Oliker, Jens Kruger, Samuel Williams
  • Patent number: 10061560
    Abstract: A method and system are disclosed for executing a machine instruction in a central processing unit. The method comprises the steps of obtaining a perform floating-point operation instruction; obtaining a test bit; and determining a value of the test bit. If the test bit has a first value, (a) a specified floating-point operation function is performed, and (b) a condition code is set to a value determined by said specified function. If the test bit has a second value, (c) a check is made to determine if said specified function is valid and installed on the machine, (d) if said specified function is valid and installed on the machine, the condition code is set to one code value, and (e) if said specified function is either not valid or not installed on the machine, the condition code is set to a second code value.
    Type: Grant
    Filed: April 25, 2016
    Date of Patent: August 28, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michel H. T. Hack, Ronald M. Smith, Sr.
  • Patent number: 9996346
    Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle.
    Type: Grant
    Filed: June 28, 2017
    Date of Patent: June 12, 2018
    Assignee: International Business Machines Corporation
    Inventors: Eric M. Schwarz, Ronald M. Smith, Sr.
  • Patent number: 9921832
    Abstract: A vector reduction instruction with non-unit strided access pattern is received and executed by the execution circuitry of a processor. In response to the instruction, the execution circuitry performs an associative reduction operation on data elements of a first vector register. Based on values of the mask register and a current element position being processed, the execution circuitry sequentially sets one or more data elements of the first vector register to a result, which is generated by the associative reduction operation applied to both a previous data element of the first vector register and a data clement of a third vector register. The previous data element is located more than one element position away from the current element position.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: March 20, 2018
    Assignee: Intel Corporation
    Inventors: Albert Hartono, Jayashankar Bharadwaj, Nalini Vasudevan, Sara S. Baghsorkhi, Victor W. Lee, Daehyun Kim
  • Patent number: 9804850
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Grant
    Filed: June 21, 2016
    Date of Patent: October 31, 2017
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
  • Patent number: 9733935
    Abstract: A method of processing an instruction is described that includes fetching and decoding the instruction. The instruction has separate destination address, first operand source address and second operand source address components. The first operand source address identifies a location of a first mask pattern in mask register space. The second operand source address identifies a location of a second mask pattern in the mask register space. The method further includes fetching the first mask pattern from the mask register space; fetching the second mask pattern from the mask register space; merging the first and second mask patterns into a merged mask pattern; and, storing the merged mask pattern at a storage location identified by the destination address.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: August 15, 2017
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Andrew T. Forsyth, Roger Espasa, Manel Fernandez, Thomas D. Fletcher
  • Patent number: 9734224
    Abstract: A synchronization infrastructure that synchronizes data stored between components in a cloud infrastructure system is described. A first component in the cloud infrastructure system may store subscription information related to a subscription order which may in turn be utilized by a second component in the cloud infrastructure system to orchestrate the provisioning of services and resources for the order placed by the customer. The synchronization architecture utilizes transactionally consistent checkpoints that describe the state of the data stored in the components to synchronize the data between these components.
    Type: Grant
    Filed: March 20, 2015
    Date of Patent: August 15, 2017
    Assignee: Oracle International Corporation
    Inventors: Ramkrishna Chatterjee, Ramesh Vasudevan, Anjani Kalyan Prathipati, Gopalan Arun
  • Patent number: 9710270
    Abstract: A method for programmably controlling an exception includes performing, by a processor, a step of executing a control specification instruction for exception control specification that indicates whether an exception is enabled or not and setting a control specification value for the exception in a register and a step of executing a control execution instruction for exception control execution that indicates whether the exception is to be raised or not, determining whether the control specification value set in the register is a value for enabling the exception, and, when the control specification value is the value for enabling the exception, raising the exception. The method further includes performing a step of not raising the exception when the control specification value set in the register is not the value for enabling the exception.
    Type: Grant
    Filed: December 20, 2011
    Date of Patent: July 18, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Noriaki Asamoto
  • Patent number: 9672034
    Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: June 6, 2017
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
  • Patent number: 9619233
    Abstract: A computer architecture allows for simplified exception handling by restarting the program after exceptions at the beginning of idempotent regions, the idempotent regions allowing re-execution without the need for restoring complex state information from checkpoints. Recovery from mis-speculation may be provided by a similar mechanism but using smaller idempotent regions reflecting a more frequent occurrence of mis-speculation. A compiler generating different idempotent regions for speculation and exception handling is also disclosed.
    Type: Grant
    Filed: February 19, 2016
    Date of Patent: April 11, 2017
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Jaikrishnan Menon, Marc Asher De Kruijf, Karthikeyan Sankaralingam
  • Patent number: 9613350
    Abstract: A payment reader includes a contactless interface for communicating with a contactless device. The payment reader has a processor that executes instructions stored in memory, and the instructions include instructions for a plurality of firmware modules including a message dispatcher module and a plurality of functional modules. The functional modules generate messages and the message dispatcher module stores the messages in a queued data structure such as a stack or a queue. The messages are provided to the functional modules from the queued data structure. Some of the messages are timed messages that are returned to the queued data structure.
    Type: Grant
    Filed: February 24, 2016
    Date of Patent: April 4, 2017
    Inventor: Kshitiz Vadera
  • Patent number: 9600280
    Abstract: A hazard check instruction has operands that specify addresses of vector elements to be read by first and second vector memory operations. The hazard check instruction outputs a dependency vector identifying, for each element position of the first vector corresponding to the first vector memory operation, which element position of the second vector that the element of the first vector depends on (if any). In an embodiment, at least one of the vector memory operations has addresses specified using a scalar address in the operands (and a vector attribute associated with the vector). In an embodiment, the operands may include predicates for one or both of the vector memory operations, indicating which vector elements are active. The dependency vector may be qualified by the predicates, indicating dependencies only for active elements.
    Type: Grant
    Filed: September 24, 2013
    Date of Patent: March 21, 2017
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion