Arithmetic Operation Instruction Processing Patents (Class 712/221)
  • Patent number: 11507531
    Abstract: Examples described herein include systems and methods which include an apparatus comprising a plurality of configurable logic units and a plurality of switches, with each switch being coupled to at least one configurable logic unit of the plurality of configurable logic units. The apparatus further includes an instruction register configured to provide respective switch instructions of a plurality of switch instructions to each switch based on a computation to be implemented among the plurality of configurable logic units. For example, the switch instructions may include allocating the plurality of configurable logic units to perform the computation and activating an input of the switch and an output of the switch to couple at least a first configurable logic unit and a second configurable logic unit. In various embodiments, configurable logic units can include arithmetic logic units (ALUs), bit manipulation units (BMUs), and multiplier-accumulator units (MACs).
    Type: Grant
    Filed: February 25, 2021
    Date of Patent: November 22, 2022
    Assignee: MICRON TECHNOLOGY, INC.
    Inventors: Fa-Long Luo, Tamara Schmitz, Jeremy Chritz, Jaime Cummins
  • Patent number: 11468541
    Abstract: Embodiments described herein provide a graphics processor that can perform a variety of mixed and multiple precision instructions and operations. One embodiment provides a streaming multiprocessor that can concurrently execute multiple thread groups, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to execute multiple threads for each of multiple instructions. The streaming multiprocessor can perform concurrent integer and floating-point operations and includes a mixed precision core to perform operations at multiple or mixed precisions and dynamic ranges.
    Type: Grant
    Filed: April 14, 2022
    Date of Patent: October 11, 2022
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
  • Patent number: 11416261
    Abstract: Methods, systems and apparatuses for graph streaming processing are disclosed. One method includes loading, by a group load register, a subset of an input tensor from a data cache, wherein the group load register provides the subset of the input tensor to all of a plurality of processors, loading, by a plurality of weight data registers, a plurality of weights of a weight tensor, wherein each of the weight data registers provides a weight to a single one of the plurality of processors, and performing, by the plurality of processors, a SOMAC (Sum-Of-Multiply-Accumulate) instruction, including simultaneously determining, by each of the plurality of processors, an instruction size of the SOMAC instruction, wherein the instruction size indicates a number of iterations that the SOMAC instruction is to be executed and is equal to a number of outputs within a subset of a plurality of output tensors.
    Type: Grant
    Filed: July 15, 2020
    Date of Patent: August 16, 2022
    Assignee: Blaize, Inc.
    Inventors: Satyaki Koneru, Kamaraj Thangam, Sruthikesh Surineni
  • Patent number: 11294671
    Abstract: Disclosed embodiments relate to systems and methods for performing duplicate detection instructions on two-dimensional (2D) data. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode and locations of a source matrix comprising M×N elements and a destination, the opcode to indicate execution circuitry is to use a plurality of comparators to discover duplicates in the source matrix, and store indications of locations of discovered duplicates in the destination. The execution circuitry is to execute the decoded instruction as per the opcode.
    Type: Grant
    Filed: December 26, 2018
    Date of Patent: April 5, 2022
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Michael Espig, Dan Baum, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall
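    Note: The following is a minimal software sketch of the duplicate-detection idea in the abstract above, not the patented circuit. It assumes that a "duplicate" is any element equal to an element that precedes it in row-major order, and that the destination holds a Boolean mask of the same M×N shape; the function name `find_duplicates` is hypothetical.

```python
def find_duplicates(matrix):
    """Return an M x N mask; True marks an element equal to an earlier one.

    Assumption: "earlier" means earlier in row-major order. The abstract
    does not define the exact encoding of the stored indications.
    """
    if not matrix:
        return []
    n_cols = len(matrix[0])
    flat = [value for row in matrix for value in row]
    # Hardware would use many comparators at once; here each element is
    # compared against every element that precedes it.
    flags = [any(flat[j] == flat[i] for j in range(i)) for i in range(len(flat))]
    return [flags[r * n_cols:(r + 1) * n_cols] for r in range(len(matrix))]


mask = find_duplicates([[1, 2, 3], [2, 1, 4]])
```

    In this example the second row repeats the values 2 and 1 from the first row, so only those two positions are flagged.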
  • Patent number: 11288066
    Abstract: Techniques for performing matrix multiplication in a data processing apparatus are disclosed, comprising apparatuses, matrix multiply instructions, methods of operating the apparatuses, and virtual machine implementations. Registers, each register for storing at least four data elements, are referenced by a matrix multiply instruction and in response to the matrix multiply instruction a matrix multiply operation is carried out. First and second matrices of data elements are extracted from first and second source registers, and plural dot product operations, acting on respective rows of the first matrix and respective columns of the second matrix are performed to generate a square matrix of result data elements, which is applied to a destination register. A higher computation density for a given number of register operands is achieved with respect to vector-by-element techniques.
    Type: Grant
    Filed: June 8, 2018
    Date of Patent: March 29, 2022
    Assignee: Arm Limited
    Inventors: David Hennah Mansell, Rune Holm, Ian Michael Caulfield, Jelena Milanovic
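    Note: As a rough illustration of the operation described in the abstract above, the sketch below treats each four-element source register as a 2×2 matrix and forms the result from row-by-column dot products. The row-major layout, the 2×2 size, and the name `matmul_from_registers` are assumptions made for the example; the abstract only requires at least four data elements per register.

```python
def matmul_from_registers(src1, src2):
    """Multiply two 2x2 matrices packed row-major into 4-element registers."""
    n = 2  # assumed: 4 elements per register form a 2x2 matrix
    if len(src1) != n * n or len(src2) != n * n:
        raise ValueError("each register must hold exactly 4 elements")
    a = [src1[i * n:(i + 1) * n] for i in range(n)]
    b = [src2[i * n:(i + 1) * n] for i in range(n)]
    dest = []
    for row in a:
        for col in zip(*b):  # zip(*b) iterates over the columns of b
            dest.append(sum(x * y for x, y in zip(row, col)))
    return dest  # square result matrix, packed row-major


result = matmul_from_registers([1, 2, 3, 4], [5, 6, 7, 8])
```

    Two register operands yield four dot products of length two, which is the higher computation density per operand that the abstract contrasts with vector-by-element techniques.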
  • Patent number: 11227071
    Abstract: A method and an apparatus for hardware security to countermeasure side-channel attacks are provided. The method or apparatus may introduce at least one redundant or partial redundant computation having a similar power dissipation profile or an electromagnetic emission profile when compared to that of a genuine operation for cryptographic devices, and/or to reorder the iterations of operations in a different sequence. The redundant or partial redundant computation may be performed by using a different password key and/or a different raw data (e.g., plaintext). The presence of the redundant or partial redundant computation would make side-channel attacks difficult in the sense that genuine or redundant/partial redundant operations are difficult to be clearly identified, hence serving as a countermeasure for hardware security.
    Type: Grant
    Filed: March 19, 2018
    Date of Patent: January 18, 2022
    Assignee: Nanyang Technological University
    Inventors: Kwen Siong Chong, Bah Hwee Gwee, Ali Akbar Pammu
  • Patent number: 11221982
    Abstract: A multilayer butterfly network is shown that is operable to transform and align a plurality of fields from an input to an output data stream. Many transformations are possible with such a network which may include separate control of each multiplexer. This invention supports a limited set of multiplexer control signals, which enables a similarly limited set of data transformations. This limited capability is offset by the reduced complexity of the multiplexer control circuits. This invention uses precalculated inputs and simple combinatorial logic to generate control signals for the butterfly network. Controls are independent for each layer and therefore are dependent only on the input and output patterns. Controls for the layers can be calculated in parallel.
    Type: Grant
    Filed: August 20, 2019
    Date of Patent: January 11, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Dheera Balasubramanian, Joseph Zbiciak, Sureshkumar Govindaraj
  • Patent number: 11175891
    Abstract: Disclosed embodiments relate to performing floating-point addition with selected rounding. In one example, a processor includes circuitry to decode and execute an instruction specifying locations of first and second floating-point (FP) sources, and an opcode indicating the processor is to: bring the FP sources into alignment by shifting a mantissa of the smaller source FP operand to the right by a difference between their exponents, generating rounding controls based on any bits that escape; simultaneously generate a sum of the FP sources and of the FP sources plus one, the sums having a fuzzy-Jbit format having an additional Jbit into which a carry-out, if any, is placed; select one of the sums based on the rounding controls; and generate a result comprising a mantissa-wide number of most-significant bits of the selected sum, starting with the most significant non-zero Jbit.
    Type: Grant
    Filed: March 30, 2019
    Date of Patent: November 16, 2021
    Assignee: Intel Corporation
    Inventors: Simon Rubanovich, Amit Gradstein, Zeev Sperber, Mrinmay Dutta
  • Patent number: 11175926
    Abstract: Providing exception stack management using stack panic fault exceptions in processor-based devices is disclosed. In this regard, a processor device defines a “stack panic fault exception” that may be raised upon execution of an exception handler store operation attempting to write state data into an exception stack, and provides a dedicated plurality of stack panic fault exception state registers in which stack panic fault exception state data may be saved. Upon detecting a first exception, the processor device transfers program control to an exception handler for the first exception. If a second exception occurs upon execution of a store operation in the exception handler, the processor device determines that the second exception should be handled as a stack panic fault exception, saves the stack panic fault exception state data in the stack panic fault exception state registers, and transfers program control to a stack panic fault exception handler.
    Type: Grant
    Filed: April 8, 2020
    Date of Patent: November 16, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Thomas Andrew Sartorius, Michael Scott McIlvaine, James Norris Dieffenderfer, Aaron S. Giles
  • Patent number: 11138686
    Abstract: Embodiments described herein provide a graphics processor that can perform a variety of mixed and multiple precision instructions and operations. One embodiment provides a streaming multiprocessor that can concurrently execute multiple thread groups, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to execute multiple threads for each of multiple instructions. The streaming multiprocessor can perform concurrent integer and floating-point operations and includes a mixed precision core to perform operations at multiple precisions.
    Type: Grant
    Filed: June 19, 2019
    Date of Patent: October 5, 2021
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
  • Patent number: 11126549
    Abstract: In an example, a method includes identifying, using at least one processor, data portions of a plurality of distinct data objects stored in at least one memory which are to be processed using the same logical operation. The method may further include identifying a representation of an operand stored in at least one memory, the operand being to provide the logical operation and providing a logical engine with the operand. The data portions may be stored in a plurality of input data buffers, wherein each of the input data buffers comprises a data portion of a different data object. The logical operation may be carried out on each of the data portions using the logical engine, and the outputs for each data portion may be stored in a plurality of output data buffers, wherein each of the outputs comprises data derived from a different data object.
    Type: Grant
    Filed: March 31, 2016
    Date of Patent: September 21, 2021
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Naveen Muralimanohar, Ali Shafiee Ardestani
  • Patent number: 11119818
    Abstract: Contextual awareness associated with resources can be employed to facilitate controlling access to resources of a system, including function blocks. A resource manager component (RMC) can pre-load a defined number of respective versions of configuration parameter data associated with respective applications in each resource. With regard to each application, the RMC can associate a context value, unique for each application, with the respective versions of configuration parameter data associated with that application. When a current application is being changed to a next application, the RMC can write the context value associated with the next application to a context select component (CSC). Each resource can read the context value in the CSC, identify and retrieve the version of configuration parameter data associated with the next application based on the context value, and configure the function block based on the version of configuration parameter data.
    Type: Grant
    Filed: December 31, 2019
    Date of Patent: September 14, 2021
    Assignee: GE Aviation Systems, LLC
    Inventors: Melanie Sue-Hanson Graffy, Colin Holmwood, Jon Marc Diekema
  • Patent number: 11099868
    Abstract: A system and method are provided for translating a guest instruction of a guest architecture into at least one host instruction of a host architecture. The method comprises providing multiple representation states, each representation state providing a representation in the host architecture for at least one item of state from the guest architecture. A current representation state is then determined from amongst the multiple representation states, and the guest instruction is translated into at least one host instruction in dependence on the current representation state. Through the use of multiple representation states, it has been found that the efficiency of the code translation can be significantly increased, thereby giving rise to performance and energy consumption benefits.
    Type: Grant
    Filed: March 4, 2016
    Date of Patent: August 24, 2021
    Assignee: ARM LIMITED
    Inventor: Edmund Thomas Grimley-Evans
  • Patent number: 11093277
    Abstract: Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
    Type: Grant
    Filed: June 26, 2020
    Date of Patent: August 17, 2021
    Assignee: Intel Corporation
    Inventors: Rajesh M. Sankaran, Gilbert Neiger, Narayan Ranganathan, Stephen R. Van Doren, Joseph Nuzman, Niall D. McDonnell, Michael A. O'Hanlon, Lokpraveen B. Mosur, Tracy Garrett Drysdale, Eriko Nurvitadhi, Asit K. Mishra, Ganesh Venkatesh, Deborah T. Marr, Nicholas P. Carter, Jonathan D. Pearce, Edward T. Grochowski, Richard J. Greco, Robert Valentine, Jesus Corbal, Thomas D. Fletcher, Dennis R. Bradford, Dwight P. Manley, Mark J. Charney, Jeffrey J. Cook, Paul Caprioli, Koichi Yamada, Kent D. Glossop, David B. Sheffield
  • Patent number: 11010323
    Abstract: An apparatus in various embodiments is for use in a local area network and includes a discernment logic circuit and logic circuitry. The discernment logic circuit discerns whether a requested communications transaction received over the management communications bus from another of the logic nodes involves a first type of transaction or a second type of transaction, the second type of transaction having a plurality of commands associated with the requested communications transaction to convey respectively different parts of the requested communications transaction including an address part and a data part. The logic circuitry disables, in response to a reset of an address pointer in the one of the plurality of logic nodes and the requested communications transaction being the second type of transaction, the address pointer to mitigate a likelihood that the requested communications transaction is performed via the communication protocol while the address pointer for the second type of transaction is erroneous.
    Type: Grant
    Filed: June 28, 2019
    Date of Patent: May 18, 2021
    Assignee: NXP B.V.
    Inventor: Gerrit Willem den Besten
  • Patent number: 11010276
    Abstract: A method, computer program product, and system performing a method that includes a processor defining a code fingerprint by obtaining parameters describing at least one of an event type or an event. The code fingerprint includes a first sequence. The processor loads the code fingerprint into a register accessible to the processor. Concurrent with executing a program, the processor obtains the code fingerprint from the register and identifies the code fingerprint in the program by comparing a second sequence in the program to the first sequence. Based on identifying the code fingerprint in the program, the processor alerts a runtime environment where the program is executing.
    Type: Grant
    Filed: October 2, 2019
    Date of Patent: May 18, 2021
    Assignee: International Business Machines Corporation
    Inventors: Giles R. Frazier, Michael K. Gschwind, Christian Jacobi, Chung-Lung K. Shum
  • Patent number: 10984500
    Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; bank address and control circuitry coupled to control inputs of the plurality of memory banks, the multiplexer circuitry, and the first plurality of registers; output control circuitry coupled to control inputs of the second plurality of registers; and a control state machine coupled to the bank address and control circuitry and the output control circuitry.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: April 20, 2021
    Assignee: XILINX, INC.
    Inventors: Ashish Sirasao, Elliott Delaye, Aaron Ng, Ehsan Ghasemi
  • Patent number: 10984027
    Abstract: Disclosed techniques can generate content object summaries. Content of a content object can be parsed into a set of word groups. For each word group, at least one topic to which the word group pertains can be identified and it can be determined, via a user model, at least one weight of the plurality of weights corresponding to the topic(s). For each word group, a score can be determined for the word group based on the weight(s). A subset of the set of word groups can be selected based on the scores for the word group. A summary of the content object can be generated that includes the subset but that does not include one or more other word groups in the set of word groups that are not in the subset. At least part of the summary of the content object can be output.
    Type: Grant
    Filed: November 11, 2016
    Date of Patent: April 20, 2021
    Assignee: SRI International
    Inventors: Girish Acharya, John Niekrasz, John Byrnes, Chih-Hung Yeh
  • Patent number: 10956356
    Abstract: A computer system for performing control of an electronic control unit (ECU) has a processor for executing computer-executable instructions and a memory for maintaining the computer-executable instructions. When executed by the processor, the computer-executable instructions perform the following functions: configuring a communication controller while operating in a secure mode; transitioning to an unsecure mode; executing a program in the unsecure mode that utilizes the communication controller; and in response to detecting a clock off request while a transmit buffer of the communication controller is not empty, inhibiting the clock off request until the transmit buffer is empty.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: March 23, 2021
    Assignee: Robert Bosch GmbH
    Inventors: Sekar Kulandaivel, Shalabh Jain, Jorge Guajardo Merchan
  • Patent number: 10891231
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.
    Type: Grant
    Filed: July 1, 2019
    Date of Patent: January 12, 2021
    Assignee: Texas Instruments Incorporated
    Inventor: Joseph Zbiciak
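    Note: The nested-loop address generation described in the abstract above can be sketched in software as follows. This is an illustration only: it assumes each "loop dimension" acts as an address stride for its loop, that lists are ordered innermost loop first, and the name `stream_addresses` is hypothetical. The bit-level layout of the stream template is not modeled.

```python
def stream_addresses(base, loop_counts, loop_dims):
    """Yield element addresses for nested loops (index 0 is the innermost loop)."""
    if len(loop_counts) != len(loop_dims):
        raise ValueError("one loop dimension is required per loop count")

    def walk(level, addr):
        if level < 0:
            yield addr  # all loops resolved: emit one element address
            return
        for i in range(loop_counts[level]):
            # Assumed meaning of "loop dimension": stride applied per iteration.
            yield from walk(level - 1, addr + i * loop_dims[level])

    yield from walk(len(loop_counts) - 1, base)


# Two nested loops: 4 consecutive elements, repeated for 2 rows 16 apart.
addrs = list(stream_addresses(base=0, loop_counts=[4, 2], loop_dims=[1, 16]))
```

    Supporting more loops means more count/dimension pairs, which is the trade-off the format definition field is said to control when the template has a fixed number of bits.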
  • Patent number: 10860315
    Abstract: Embodiments of systems, apparatuses, and methods for broadcast arithmetic in a processor are described.
    Type: Grant
    Filed: September 24, 2018
    Date of Patent: December 8, 2020
    Assignee: Intel Corporation
    Inventors: Rama Kishan V. Malladi, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10846087
    Abstract: Embodiments of systems, apparatuses, and methods for instruction execution are described. In some embodiments, an instruction has fields for a first and a second source operand, and a destination operand. When executed, the instruction causes an arithmetic operation on broadcasted packed data elements of the first source operand and storage of results of each arithmetic operation in the destination operand, wherein the packed data elements of the first source operand to be broadcast are dictated by values of packed data elements stored in a second source operand, wherein the arithmetic operation is defined by the instruction.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: November 24, 2020
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Jesus Corbal, Robert Valentine
  • Patent number: 10804906
    Abstract: Adaptive clocking schemes for synchronized on-chip functional blocks are provided. The clocking schemes enable synchronous clocking which can be adapted according to changes in signal path propagation delay due to temperature, process, and voltage variations, for example. In embodiments, the clocking schemes allow for the capacity utilization of a logic path to be increased.
    Type: Grant
    Filed: June 25, 2018
    Date of Patent: October 13, 2020
    Assignee: AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITED
    Inventors: Paul Penzes, Mark Fullerton
  • Patent number: 10803243
    Abstract: A non-transitory computer-readable recording medium stores therein a data generation program that causes a computer to execute a program including: arranging a first morpheme in an order of a position of the first morpheme in text data by referring to an index generated by the text data, in which positions of a plurality of morphemes included in the text data are associated with each of the morphemes; and referring to relationship information indicating a relationship between morphemes, and when the first morpheme is a specific type having a relationship with a second morpheme, arranging the second morpheme in an order of a position of the second morpheme in the text data by referring to the index.
    Type: Grant
    Filed: March 13, 2019
    Date of Patent: October 13, 2020
    Assignee: FUJITSU LIMITED
    Inventors: Masahiro Kataoka, Takahiro Okubo, Ryo Matsumura
  • Patent number: 10776110
    Abstract: An apparatus and method for performing efficient, adaptable tensor operations.
    Type: Grant
    Filed: September 29, 2018
    Date of Patent: September 15, 2020
    Assignee: Intel Corporation
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr, Abhijit Davare, Asit Mishra, Steven Burns, Desmond Kirkpatrick, Andrey Ayupov, Anton Alexandrovich Sorokin, Eriko Nurvitadhi
  • Patent number: 10607567
    Abstract: An environment map, such as a cube map, can be obtained for a scene that is appropriate for the current lighting state. A grayscale image representation is generated that represents physical objects visible in the scene. The grayscale representation is provided to a device for rendering AR content. A color lookup table (LUT) is generated for coloring the grayscale image representation. The color LUT can be appropriate for the current lighting conditions of the scene. As the lighting state changes, such as over the course of a day, different color LUTs can be sent to the device for purposes of updating the environment map. The grayscale image representation, once colored, can serve as an environment map for purposes of creating reflection effects on AR content to be rendered with respect to a live view of the scene.
    Type: Grant
    Filed: March 16, 2018
    Date of Patent: March 31, 2020
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Richard Schritter, Sidharth Moudgil, Pratik Patel
  • Patent number: 10579338
    Abstract: An apparatus and method are provided for processing input operand values. The apparatus has a set of vector data storage elements, each vector data storage element providing a plurality of sections for storing data values. A plurality of lanes are considered to be provided within the set of storage elements, where each lane comprises a corresponding section from each vector data storage element. Processing circuitry is arranged to perform an arithmetic operation on an input operand value comprising a plurality of portions, by performing an independent arithmetic operation on each of the plurality of portions, in order to produce a result value comprising a plurality of result portions. Storage circuitry is arranged to store the result value within a selected lane of the plurality of lanes, such that each result portion is stored in a different vector data storage element within the corresponding section for the selected lane.
    Type: Grant
    Filed: December 6, 2017
    Date of Patent: March 3, 2020
    Assignee: ARM Limited
    Inventors: Christopher Neal Hinds, Neil Burgess, David Raymond Lutz
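    Note: The lane-based storage layout in the abstract above can be pictured with a small model. The sketch assumes three vector registers of four sections each and uses "add 1" as a stand-in for the independent per-portion arithmetic operation; both choices, and the name `store_in_lane`, are hypothetical.

```python
def store_in_lane(registers, lane, result_portions):
    """Store each result portion in a different register, all in the same lane."""
    if len(result_portions) > len(registers):
        raise ValueError("not enough vector registers for all result portions")
    for register, portion in zip(registers, result_portions):
        register[lane] = portion


# Three vector registers, each with four sections (lanes 0..3).
registers = [[0] * 4 for _ in range(3)]

operand_portions = [10, 20, 30]
# Independent operation on each portion (stand-in operation: add 1).
result_portions = [p + 1 for p in operand_portions]

store_in_lane(registers, lane=2, result_portions=result_portions)
```

    After the call, lane 2 of the three registers holds the three result portions, and the other lanes remain free for other operand values.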
  • Patent number: 10460416
    Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; and control circuitry configured to generate addresses for the plurality of memory banks, control the multiplexer circuitry to select among outputs of the plurality of memory banks, control the first plurality of registers to store outputs of the second plurality of multiplexers, and control the second plurality of registers to store outputs of the first plurality of registers.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: October 29, 2019
    Assignee: XILINX, INC.
    Inventors: Ashish Sirasao, Elliott Delaye, Aaron Ng, Ehsan Ghasemi
  • Patent number: 10423413
    Abstract: A method of loading and duplicating scalar data from a source into a destination register. The data may be duplicated in byte, half word, word or double word parts, according to a duplication pattern.
    Type: Grant
    Filed: July 9, 2014
    Date of Patent: September 24, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Timothy David Anderson, Duc Quang Bui, Peter Richard Dent
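    Note: A minimal software model of the load-and-duplicate operation in the abstract above is sketched here. It assumes the simplest duplication pattern, in which a single part of the chosen size fills the whole destination, and a default destination width of 64 bytes; the width and the name `load_and_duplicate` are assumptions, not details from the patent.

```python
def load_and_duplicate(source: bytes, part_size: int, register_size: int = 64) -> bytes:
    """Replicate the first `part_size` bytes of `source` across a register.

    part_size: 1 (byte), 2 (half word), 4 (word) or 8 (double word).
    register_size: destination width in bytes (assumed value, not from the patent).
    """
    if part_size not in (1, 2, 4, 8):
        raise ValueError("part_size must be 1, 2, 4 or 8 bytes")
    if register_size % part_size:
        raise ValueError("register_size must be a multiple of part_size")
    part = source[:part_size]
    if len(part) < part_size:
        raise ValueError("source is shorter than the requested part")
    return part * (register_size // part_size)


dest = load_and_duplicate(b"\xab\xcd\xef\x01", part_size=2, register_size=8)
```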
  • Patent number: 10402198
    Abstract: A signal processing device comprising at least one control unit arranged to receive at least one pack-insert instruction, decode the received at least one pack-insert instruction, and output at least one pack-insert control signal in accordance with the received pack-insert instruction. The signal processing device further comprising at least one pack-insert component arranged to receive at least a first data block to be inserted into a sequence of data blocks to be output to at least one destination register, receive a plurality of further data blocks to be packed within the sequence of data blocks to be output to the at least one destination register, arrange the at least first data block and the plurality of further data blocks into a sequence of data blocks based at least partly on the at least one pack-insert control signal, and output the sequence of data blocks.
    Type: Grant
    Filed: June 18, 2013
    Date of Patent: September 3, 2019
    Assignee: NXP USA, Inc.
    Inventors: Avi Gal, Fabrice Aidan, Noam Eshel-Goldman, Roy Glasner, Dmitry Lachover, Itay Peled
  • Patent number: 10379860
    Abstract: A condition code can depend upon a numerical output of a floating point operation for a processing pipeline. A classification can be determined for the floating point operation of a received instruction. In response to the classification and using condition determination logic, a value can be calculated for the condition code by inferring from data that is available from the processing pipeline before the numerical output is available. The value for the condition code can be provided to branch decision logic of the processing pipeline.
    Type: Grant
    Filed: May 3, 2017
    Date of Patent: August 13, 2019
    Assignee: International Business Machines Corporation
    Inventors: Steven R. Carlough, Son T. Dao, Petra Leber, Silvia M. Mueller
  • Patent number: 10379859
    Abstract: A condition code can depend upon a numerical output of a floating point operation for a processing pipeline. A classification can be determined for the floating point operation of a received instruction. In response to the classification and using condition determination logic, a value can be calculated for the condition code by inferring from data that is available from the processing pipeline before the numerical output is available. The value for the condition code can be provided to branch decision logic of the processing pipeline.
    Type: Grant
    Filed: May 3, 2017
    Date of Patent: August 13, 2019
    Assignee: International Business Machines Corporation
    Inventors: Steven R. Carlough, Son T. Dao, Petra Leber, Silvia M. Mueller
  • Patent number: 10372451
    Abstract: A sequence alignment method that may be performed by a vector processor may include loading a sequence that is an instance of vector data including a plurality of elements, dividing the sequence into two groups, aligning respective elements of the groups to generate a sequence of sorted elements according to a single instruction multiple data mode, and iteratively performing an alignment operation based on a determination that each group in the sequence of sorted elements includes more than one element of the plurality of elements. Each iteration may include dividing each group to form new groups and aligning respective elements of each pair of adjacent new groups to generate a new sequence of sorted elements. The new sequence of a current iteration of the alignment operation may be transmitted as a data output, based on a determination that each new group does not include more than one element.
    Type: Grant
    Filed: November 3, 2017
    Date of Patent: August 6, 2019
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hyun Pil Kim, Hyun Woo Sim, Seong Woo Ahn
  • Patent number: 10331830
    Abstract: Techniques for logic gate simulation. Program instructions may be executable by a processor to select logic gates from a netlist that specifies a gate-level representation of a digital circuit. Each logic gate may be assigned to a corresponding element position of a single-instruction, multiple-data (SIMD) shuffle or population count instruction, and at least two logic gates may specify different logic functions. Simulation-executable instructions including the SIMD shuffle or population count instruction may be generated. When executed, the simulation-executable instructions simulate the functionality of the selected logic gates. More particularly, execution of the SIMD shuffle or population count instruction may concurrently simulate operation of at least two logic gates that specify different logic functions.
    Type: Grant
    Filed: June 13, 2016
    Date of Patent: June 25, 2019
    Assignee: Apple Inc.
    Inventor: Alex S. Teiche
  • Patent number: 10296342
    Abstract: Systems, methods, and apparatuses for executing an instruction are described. For example, an instruction includes at least an opcode, a field for a packed data source operand, and a field for a packed data destination operand. When executed, the instruction causes for each data element position of the source operand, add to a value stored in that data element position all values stored in preceding data element positions of the packed data source operand and store a result of the addition into a corresponding data element position of the packed data destination operand.
    Type: Grant
    Filed: July 2, 2016
    Date of Patent: May 21, 2019
    Assignee: Intel Corporation
    Inventors: William M. Brown, Elmoustapha Ould-Ahmed-Vall
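As an illustration only, the per-element behavior described in the abstract of 10296342 amounts to an inclusive prefix sum across the element positions of a packed operand. The sketch below models it in scalar Python; the function name `packed_prefix_sum` and the use of a list to stand in for a packed register are assumptions made for the example, not terms from the patent.

```python
from itertools import accumulate


def packed_prefix_sum(src):
    """Model of the described operation: each destination element is the
    source element at that position plus all preceding source elements."""
    # A Python list stands in for the packed data source operand (assumption).
    return list(accumulate(src))


print(packed_prefix_sum([1, 2, 3, 4]))  # [1, 3, 6, 10]
```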
  • Patent number: 10282169
    Abstract: Techniques are disclosed relating to floating-point operations with down-conversion. In some embodiments, a floating-point unit is configured to perform fused multiply-addition operations based on first and second different instruction types. In some embodiments, the first instruction type specifies result in the first floating-point format and the second instruction type specifies fused multiply addition of input operands in the first floating-point format to generate a result in a second, lower-precision floating-point format. For example, the first format may be a 32-bit format and the second format may be a 16-bit format. In some embodiments, the floating-point unit includes rounding circuitry, exponent circuitry, and/or increment circuitry configured to generate signals for the second instruction type in the same pipeline stage as for the first instruction type. In some embodiments, disclosed techniques may reduce the number of pipeline stages included in the floating-point circuitry.
    Type: Grant
    Filed: April 6, 2016
    Date of Patent: May 7, 2019
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir, Yu Sun, Nicolas X. Pena, Xiao-Long Wu, Christopher A. Burns
  • Patent number: 10162640
    Abstract: A processor is described having a functional unit within an instruction execution pipeline. The functional unit having circuitry to determine whether substantive data from a larger source data size will fit within a smaller data size that the substantive data is to flow to.
    Type: Grant
    Filed: August 16, 2016
    Date of Patent: December 25, 2018
    Assignee: Intel Corporation
    Inventors: Martin G. Dixon, Baiju V. Patel, Rajeev Gopalakrishna
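One way to picture the check in the abstract of 10162640 is a test of whether a value held in a wider field survives narrowing to a smaller one. The sketch below assumes two's-complement signed integers; the helper name `fits_in` and its parameters are hypothetical, and the patented circuitry may apply a different criterion.

```python
def fits_in(value, src_bits, dst_bits):
    """Return True if a signed src_bits-wide value is representable in dst_bits.

    Assumes two's-complement encoding; `value` is the raw bit pattern,
    so 0 <= value < 2**src_bits.
    """
    if not 0 <= value < (1 << src_bits):
        raise ValueError("value does not fit the source width")
    if dst_bits >= src_bits:
        return True
    # The value fits when every bit above the destination's sign bit is a
    # copy of that sign bit, i.e. the upper bits are all 0s or all 1s.
    upper = value >> (dst_bits - 1)
    all_ones = (1 << (src_bits - dst_bits + 1)) - 1
    return upper == 0 or upper == all_ones
```

For example, under these assumptions `fits_in(0x7F, 32, 8)` is true, while `fits_in(0x80, 32, 8)` is false because 128 is outside the signed 8-bit range.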
  • Patent number: 10157064
    Abstract: A method of managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices. An event is detected indicating that either resource requirement or resource availability for a subsequent instruction of an instruction stream will not be met by the instruction execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution.
    Type: Grant
    Filed: February 27, 2017
    Date of Patent: December 18, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 10146535
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Grant
    Filed: October 20, 2016
    Date of Patent: December 4, 2018
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
  • Patent number: 10102032
    Abstract: Embodiments relate to facilitating quick and graceful transitions for massively parallel computing applications. A computer-implemented method for facilitating termination of a plurality of threads of a process is provided. The method maintains information about open communications between one or more of the threads of the process and one or more of other processes. In response to receiving a command to terminate one or more of the threads of the process, the method completes the open communications on behalf of the threads after terminating the threads.
    Type: Grant
    Filed: May 29, 2014
    Date of Patent: October 16, 2018
    Assignee: Raytheon Company
    Inventors: Benjamin M. Howe, Jacob L. Sanders
  • Patent number: 10091092
    Abstract: This invention provides systems and methods to make communication networks more resilient, stealthier and robust. This invention discloses systems and methods wherein either a communications user equipment (UE) with multiple types of wireless links, potentially operating in different frequency bands, or an apparatus which performs communications routing functions, changes the communications routing in pseudo-random manner.
    Type: Grant
    Filed: February 3, 2017
    Date of Patent: October 2, 2018
    Assignee: The United States of America as represented by the Secretary of the Air Force
    Inventor: Amjad Soomro
  • Patent number: 10078512
    Abstract: A microprocessor includes FMA execution logic that determines whether to accumulate an accumulator operand C to the partial products of multiplier and multiplicand operands A and B in the partial product adder or in a second accumulation stage. The logic calculates an exponent delta of Aexp+Bexp−Cexp and determines the number of leading zeroes in C, if C is denormal. The microprocessor accumulates C with the partial products of A and B when the accumulation of C to the product of A and B could result in mass cancellation, when ExpDelta is greater than or equal to −K (where K is related to a width of a datapath in the partial product adder), and when a C is denormal and its number of leading zeroes plus K exceeds −ExpDelta. The strategic use of resources in the partial product adder and second accumulation stage reduces latency.
    Type: Grant
    Filed: October 3, 2016
    Date of Patent: September 18, 2018
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventor: Thomas Elmer
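The selection rule in the abstract of 10078512 can be sketched as a predicate, reading the three "when" clauses as alternative conditions and taking ExpDelta = Aexp + Bexp - Cexp. That reading, the function and parameter names, and the treatment of mass cancellation as a flag supplied by the caller are assumptions for illustration; the abstract does not state how cancellation is detected or how K is derived.

```python
def accumulate_in_partial_product_adder(a_exp, b_exp, c_exp, k,
                                        may_mass_cancel=False,
                                        c_is_denormal=False,
                                        c_leading_zeros=0):
    """Return True if C would be accumulated with the partial products of
    A and B, False if it would be left to the second accumulation stage."""
    exp_delta = a_exp + b_exp - c_exp
    # Case 1: possible mass cancellation (flag is a hypothetical input).
    if may_mass_cancel:
        return True
    # Case 2: ExpDelta is greater than or equal to -K.
    if exp_delta >= -k:
        return True
    # Case 3: C is denormal and its leading zeroes plus K exceed -ExpDelta.
    if c_is_denormal and c_leading_zeros + k > -exp_delta:
        return True
    return False
```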
  • Patent number: 10067742
    Abstract: Arithmetic operation circuits and a verification circuit are formed by loading configuration information into a configuration memory in an FPGA. Arithmetic operation circuits have the same arithmetic operation function, but are different from each other in combination of the circuit blocks. The arithmetic operation circuits are formed by combining the circuit blocks to make the maximum use of the DSP block, while the arithmetic operation circuit is formed by combining the circuit blocks other than DSP block. The arithmetic operation circuits each are configured to use a block RAM as the data hold memory, while the arithmetic operation circuit is configured to use a distributed RAM as the data hold memory. Each of the arithmetic operation circuits receives the input data, and outputs arithmetic operation result data (V1 to V3). A verification circuit compares the arithmetic operation result data to verify whether errors occur.
    Type: Grant
    Filed: February 24, 2016
    Date of Patent: September 4, 2018
    Assignee: Control System Laboratory Ltd.
    Inventor: Kenichi Morimoto
  • Patent number: 10037804
    Abstract: Examples disclosed herein relate to programming a first conductance of a first resistive memory device based on a first target value. The first conductance of the first resistive memory device is measured to determine a deviation of the first resistive memory device from the first target value. A second target value of a second resistive memory device is adjusted based on the deviation, and a second conductance of the second resistive memory device is programmed based on the adjusted second target value.
    Type: Grant
    Filed: January 27, 2017
    Date of Patent: July 31, 2018
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Brent Buchanan, Le Zheng, John Paul Strachan
  • Patent number: 9910896
    Abstract: In an embodiment, a method comprises processing an input data stream as the data stream is streamed and producing a derived stream therefrom; storing the input data stream in an input archive; suspending processing of the input data stream; subsequent to suspending processing, resuming processing of the input data stream, wherein resuming comprises: storing newly received data in the input data stream in a buffer, as the input data stream is streamed; determining a first timestamp; determining a second timestamp; searching the input archive to find a data item that matches the first timestamp of the last processed data item; processing data in the input archive having timestamps that are greater than the first timestamp until arriving at data with a third timestamp that is greater than the second timestamp; processing the input data stream from the buffer; continuing processing the input data stream as the input stream is streamed.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: March 6, 2018
    Assignee: CISCO TECHNOLOGY, INC.
    Inventors: Sailesh Krishnamurthy, Chris Metz, Rex E. Fernando, Jisu Bhattacharya
  • Patent number: 9851972
    Abstract: A method is described that includes fetching an instruction. The method further includes decoding the instruction. The instruction specifies an operation, a first operand and a second operand. The method further includes fetching the first and second operands of the instruction. The first and second operands are each composed of a plurality of larger chunks having constituent elements. The method further includes performing the operation specified by the instruction including generating a resultant composed of a plurality of larger chunks having constituent elements. The generating of the resultant includes selecting for each element in the resultant a contiguous group of bits from a same positioned chunk of the first operand as the chunk of the element in the resultant, the contiguous group of bits being identified by a same positioned element of the second operand as the element in the resultant.
    Type: Grant
    Filed: January 23, 2017
    Date of Patent: December 26, 2017
    Assignee: Intel Corporation
    Inventors: Tal Uliel, Robert Valentine
  • Patent number: 9830154
    Abstract: Techniques and mechanisms for programming an accelerator device to enable performance of a data processing algorithm. In an embodiment, an accelerator of a computer platform is programmed based on programming information received from a host processor of the computer platform. In another embodiment, programming of the accelerator is to enable data driven execution of an instruction by a data stream processing engine of the accelerator.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: November 28, 2017
    Assignee: Intel Corporation
    Inventor: Vladimir Ivanov
  • Patent number: 9690586
    Abstract: A method of managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices provides instruction processing flexibility. An event is detected indicating that either resource requirement or resource availability for a subsequent instruction of an instruction stream will not be met by the instruction execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution.
    Type: Grant
    Filed: June 12, 2014
    Date of Patent: June 27, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 9672043
    Abstract: Techniques for managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices provide flexibility in execution of program instructions by a processor core. An event is detected indicating that either resource requirement or resource availability will not be met by the execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution.
    Type: Grant
    Filed: May 12, 2014
    Date of Patent: June 6, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 9658856
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: May 23, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes