Arithmetic Operation Instruction Processing Patents (Class 712/221)
  • Patent number: 10331830
    Abstract: Techniques for logic gate simulation. Program instructions may be executable by a processor to select logic gates from a netlist that specifies a gate-level representation of a digital circuit. Each logic gate may be assigned to a corresponding element position of a single-instruction, multiple-data (SIMD) shuffle or population count instruction, and at least two logic gates may specify different logic functions. Simulation-executable instructions including the SIMD shuffle or population count instruction may be generated. When executed, the simulation-executable instructions simulate the functionality of the selected logic gates. More particularly, execution of the SIMD shuffle or population count instruction may concurrently simulate operation of at least two logic gates that specify different logic functions.
    Type: Grant
    Filed: June 13, 2016
    Date of Patent: June 25, 2019
    Assignee: Apple Inc.
    Inventor: Alex S. Teiche
  • Patent number: 10296342
    Abstract: Systems, methods, and apparatuses for executing an instruction are described. For example, an instruction includes at least an opcode, a field for a packed data source operand, and a field for a packed data destination operand. When executed, the instruction causes for each data element position of the source operand, add to a value stored in that data element position all values stored in preceding data element positions of the packed data source operand and store a result of the addition into a corresponding data element position of the packed data destination operand.
    Type: Grant
    Filed: July 2, 2016
    Date of Patent: May 21, 2019
    Assignee: Intel Corporation
    Inventors: William M. Brown, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10282169
    Abstract: Techniques are disclosed relating to floating-point operations with down-conversion. In some embodiments, a floating-point unit is configured to perform fused multiply-addition operations based on first and second different instruction types. In some embodiments, the first instruction type specifies result in the first floating-point format and the second instruction type specifies fused multiply addition of input operands in the first floating-point format to generate a result in a second, lower-precision floating-point format. For example, the first format may be a 32-bit format and the second format may be a 16-bit format. In some embodiments, the floating-point unit includes rounding circuitry, exponent circuitry, and/or increment circuitry configured to generate signals for the second instruction type in the same pipeline stage as for the first instruction type. In some embodiments, disclosed techniques may reduce the number of pipeline stages included in the floating-point circuitry.
    Type: Grant
    Filed: April 6, 2016
    Date of Patent: May 7, 2019
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir, Yu Sun, Nicolas X. Pena, Xiao-Long Wu, Christopher A. Burns
  • Patent number: 10162640
    Abstract: A processor is described having a functional unit within an instruction execution pipeline. The functional unit having circuitry to determine whether substantive data from a larger source data size will fit within a smaller data size that the substantive data is to flow to.
    Type: Grant
    Filed: August 16, 2016
    Date of Patent: December 25, 2018
    Assignee: Intel Corporation
    Inventors: Martin G. Dixon, Baiju V. Patel, Rajeev Gopalakrishna
  • Patent number: 10157064
    Abstract: A method of managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices. An event is detected indicating that either resource requirement or resource availability for a subsequent instruction of an instruction stream will not be met by the instruction execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution.
    Type: Grant
    Filed: February 27, 2017
    Date of Patent: December 18, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 10146535
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Grant
    Filed: October 20, 2016
    Date of Patent: December 4, 2018
    Assignee: Intel Corporatoin
    Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
  • Patent number: 10102032
    Abstract: Embodiments relate to facilitating quick and graceful transitions for massively parallel computing applications. A computer-implemented method for facilitating termination of a plurality of threads of a process is provided. The method maintains information about open communications between one or more of the threads of the process and one or more of other processes. In response to receiving a command to terminate one or more of the threads of the process, the method completes the open communications on behalf of the threads after terminating the threads.
    Type: Grant
    Filed: May 29, 2014
    Date of Patent: October 16, 2018
    Assignee: Raytheon Company
    Inventors: Benjamin M. Howe, Jacob L. Sanders
  • Patent number: 10091092
    Abstract: This invention provides systems and methods to make communication networks more resilient, stealthier and robust. This invention discloses systems and methods wherein either a communications user equipment (UE) with multiple types of wireless links, potentially operating in different frequency bands, or an apparatus which performs communications routing functions, changes the communications routing in pseudo-random manner.
    Type: Grant
    Filed: February 3, 2017
    Date of Patent: October 2, 2018
    Assignee: The United States of America as represented by the Secretary of the Air Force
    Inventor: Amjad Soomro
  • Patent number: 10078512
    Abstract: A microprocessor includes FMA execution logic that determines whether to accumulate an accumulator operand C to the partial products of multiplier and multiplicand operands A and B in the partial product adder or in a second accumulation stage. The logic calculates an exponent delta of Aexp+Bexp?Cexp and determines the number of leading zeroes in C, if C is denormal. The microprocessor accumulates C with the partial products of A and B when the accumulation of C to the product of A and B could result in mass cancellation, when ExpDelta is greater than or equal to ?K (where K is related to a width of a datapath in the partial product adder), and when a C is denormal and its number of leading zeroes plus K exceeds ?ExpDelta. The strategic use of resources in the partial product adder and second accumulation stage reduces latency.
    Type: Grant
    Filed: October 3, 2016
    Date of Patent: September 18, 2018
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventor: Thomas Elmer
  • Patent number: 10067742
    Abstract: Arithmetic operation circuits and a verification circuit are formed by loading configuration information into a configuration memory in an FPGA. Arithmetic operation circuits have the same arithmetic operation function, but are different from each other in combination of the circuit blocks. The arithmetic operation circuits are formed by combining the circuit blocks to make the maximum use of the DSP block, while the arithmetic operation circuit is formed by combining the circuit blocks other than DSP block. The arithmetic operation circuits each are configured to use a block RAM as the data hold memory, while the arithmetic operation circuit is configured to use a distributed RAM as the data hold memory. Each of the arithmetic operation circuits receives the input data, and outputs arithmetic operation result data (V1 to V3). A verification circuit compares the arithmetic operation result data to verify whether errors occur.
    Type: Grant
    Filed: February 24, 2016
    Date of Patent: September 4, 2018
    Assignee: Control System Laboratory Ltd.
    Inventor: Kenichi Morimoto
  • Patent number: 10037804
    Abstract: Examples disclosed herein relate to programming a first conductance of a first resistive memory device based on a first target value. The first conductance of the first resistive memory device is measured to determine a deviation of the first resistive memory device from the first target value. A second target value of a second resistive memory device is adjusted based on the deviation, and a second conductance of the second resistive memory device is programmed based on the adjusted second target value.
    Type: Grant
    Filed: January 27, 2017
    Date of Patent: July 31, 2018
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Brent Buchanan, Le Zheng, John Paul Strachan
  • Patent number: 9910896
    Abstract: In an embodiment, a method comprises processing an input data stream as the data stream is streamed and producing a derived stream therefrom; storing the input data stream in an input archive; suspending processing of the input data stream; subsequent to suspending processing, resuming processing of the input data stream, wherein resuming comprises: storing newly received data in the input data stream in a buffer, as the input data stream is streamed; determining a first timestamp; determining a second timestamp; searching the input archive to find a data item that matches the first timestamp of the last processed data item; processing data in the input archive having timestamps that are greater than the first timestamp until arriving at data with a third timestamp that is greater than the second timestamp; processing the input data stream from the buffer; continuing processing the input data stream as the input stream is streamed.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: March 6, 2018
    Assignee: CISCO TECHNOLOGY, INC.
    Inventors: Sailesh Krishnamurthy, Chris Metz, Rex E. Fernando, Jisu Bhattacharya
  • Patent number: 9851972
    Abstract: A method is described that includes fetching an instruction. The method further includes decoding the instruction. The instruction specifies an operation, a first operand and a second operand. The method further includes fetching the first and second operands of the instruction. The first and second operands are each composed of a plurality of larger chunks having constituent elements. The method further includes performing the operation specified by the instruction including generating a resultant composed of a plurality of larger chunks having constituent elements. The generating of the resultant includes selecting for each element in the resultant a contiguous group of bits from a same positioned chunk of the first operand as the chunk of the element in the resultant, the contiguous group of bits being identified by a same positioned element of the second operand as the element in the resultant.
    Type: Grant
    Filed: January 23, 2017
    Date of Patent: December 26, 2017
    Assignee: Intel Corporation
    Inventors: Tal Uliel, Robert Valentine
  • Patent number: 9830154
    Abstract: Techniques and mechanisms for programming an accelerator device to enable performance of a data processing algorithm. In an embodiment, an accelerator of a computer platform is programmed based on programming information received from a host processor of the computer platform. In another embodiment, programming of the accelerator is to enable data driven execution of an instruction by a data stream processing engine of the accelerator.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: November 28, 2017
    Assignee: Intel Corporation
    Inventor: Vladimir Ivanov
  • Patent number: 9690586
    Abstract: A method of managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices provides instruction processing flexibility. An event is detected indicating that either resource requirement or resource availability for a subsequent instruction of an instruction stream will not be met by the instruction execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution.
    Type: Grant
    Filed: June 12, 2014
    Date of Patent: June 27, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 9672043
    Abstract: Techniques for managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices provide flexibility in execution of program instructions by a processor core. An event is detected indicating that either resource requirement or resource availability will not be met by the execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution.
    Type: Grant
    Filed: May 12, 2014
    Date of Patent: June 6, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 9658856
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: May 23, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Patent number: 9645826
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 18, 2015
    Date of Patent: May 9, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Patent number: 9632792
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: April 25, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Patent number: 9626192
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 18, 2015
    Date of Patent: April 18, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Patent number: 9626193
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: April 18, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Patent number: 9612842
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: April 4, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Patent number: 9575765
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 18, 2015
    Date of Patent: February 21, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Patent number: 9563429
    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.
    Type: Grant
    Filed: December 18, 2015
    Date of Patent: February 7, 2017
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
  • Patent number: 9552209
    Abstract: A method is described that includes fetching an instruction. The method further includes decoding the instruction. The instruction specifies an operation, a first operand and a second operand. The method further includes fetching the first and second operands of the instruction. The first and second operands are each composed of a plurality of larger chunks having constituent elements. The method further includes performing the operation specified by the instruction including generating a resultant composed of a plurality of larger chunks having constituent elements. The generating of the resultant includes selecting for each element in the resultant a contiguous group of bits from a same positioned chunk of the first operand as the chunk of the element in the resultant, the contiguous group of bits being identified by a same positioned element of the second operand as the element in the resultant.
    Type: Grant
    Filed: December 27, 2013
    Date of Patent: January 24, 2017
    Assignee: Intel Corporation
    Inventors: Tal Uliel, Robert Valentine
  • Patent number: 9529653
    Abstract: Processor register protection management is disclosed. In embodiments, a method of processor register protection management can include determining a sensitive logical register for executable code generated by a compiler, generating an error-correction table identifying the sensitive logical register, and storing the error-correction table in a memory accessible by a processor. The processor can be configured to generate a duplicate register of the sensitive logical register identified by the error-correction table.
    Type: Grant
    Filed: October 9, 2014
    Date of Patent: December 27, 2016
    Assignee: International Business Machines Corporation
    Inventors: Pradip Bose, Chen-Yong Cher, Meeta S. Gupta
  • Patent number: 9513916
    Abstract: A computer-implemented method includes determining that two or more instructions of an instruction stream are eligible for optimization, where the two or more instructions include a memory load instruction and a data processing instruction to process data based on the memory load instruction. The method includes merging, by a processor, the two or more instructions into a single optimized internal instruction and executing the single optimized internal instruction to perform a memory load function and a data processing function corresponding to the memory load instruction and the data processing instruction.
    Type: Grant
    Filed: March 8, 2013
    Date of Patent: December 6, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 9513918
    Abstract: An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking not implemented for a particular data element, then selecting data elements from a first source operand and a second source operand based on index values stored in destination operand to be copied to data element positions within the destination operand, wherein any one of the data elements from either the first source operand and the second source operand may be copied to any one of the data element positions within the destination operand; and if masking is implemented for a particular data element of the destination operand, then performing a designated masking operation with respect to that particular data element.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: December 6, 2016
    Assignee: INTEL CORPORATION
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mostafa Hagog, Jesus Corbal, Bret L Toll, Mark J Charney, Tal Uliel, Zeev Sperber, Amit Gradstein
  • Patent number: 9513915
    Abstract: A computer system for optimizing instructions includes a processor including an instruction execution unit configured to execute instructions and an instruction optimization unit configured to optimize instructions and memory to store machine instructions to be executed by the instruction execution unit. The computer system is configured to perform a method including analyzing machine instructions from among a stream of instructions to be executed by the instruction execution unit, the machine instructions including a memory load instruction and a data processing instruction to perform a data processing function based on the memory load instruction, identifying the machine instructions as being eligible for optimization, merging the machine instructions into a single optimized internal instruction, and executing the single optimized internal instruction to perform a memory load function and a data processing function corresponding to the memory load instruction and the data processing instruction.
    Type: Grant
    Filed: March 28, 2012
    Date of Patent: December 6, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 9514790
    Abstract: A data transmission circuit may include data line groups and pass sections arranged among the data line groups to allow the data line groups to form one line. The data transmission circuit may include an input/output unit configured to be coupled to the data line groups and to process write data to be transmitted to the data line groups or read data transmitted from the data line groups. The data transmission circuit may include a pass control unit configured to selectively enable the pass sections in response to an address for specifying a target data line group of the data line groups.
    Type: Grant
    Filed: May 13, 2015
    Date of Patent: December 6, 2016
    Assignee: SK HYNIX INC.
    Inventor: Sang Oh Lim
  • Patent number: 9495162
    Abstract: An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking not implemented for a particular data element, then selecting data elements from the destination operand and a second source operand based on index values stored in a first source operand to be copied to data element positions within the destination operand, wherein any one of the data elements from either the destination operand and the second source operand may be copied to any one of the data element positions within the destination operand; if masking is implemented for a particular data element of the destination operand, then performing a designated masking operation with respect to that particular data element.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: November 15, 2016
    Assignee: INTEL CORPORATION
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mostafa Hagog, Jesus Corbal, Tal Uliel, Zeev Sperber, Amit Gradstein
  • Patent number: 9465611
    Abstract: Methods and systems for executing SIMD instructions that efficiently implement new SIMD instructions and conventional existing SIMD MAC-type instructions, while avoiding replication of functions in order to keep the size of the logic circuit size to as low a level as can reasonably be achieved. An instruction unit executes Single Instruction Multiple Data instructions, including instructions acting on operands representing complex numbers. The instruction unit includes functional blocks that are commonly utilized to execute a plurality of the instructions, wherein the plurality of instructions utilize various individual functional blocks in various combinations with one another. The plurality of instructions is optionally executed in a pipeline fashion.
    Type: Grant
    Filed: October 4, 2004
    Date of Patent: October 11, 2016
    Assignee: Broadcom Corporation
    Inventors: Mark Taunton, Andrew Jon Dawson
  • Patent number: 9424038
    Abstract: A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.
    Type: Grant
    Filed: December 10, 2012
    Date of Patent: August 23, 2016
    Assignee: NVIDIA Corporation
    Inventors: Gregory Diamos, Mojtaba Mehrara
  • Patent number: 9411586
    Abstract: A method is described that includes performing the following with a single instruction: receiving a first input operand V; receiving a second input operand S; calculating V?S; determining if V?S is positive or negative; and, providing as a resultant: V if V?S is negative; V?S if V?S is positive.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: August 9, 2016
    Assignee: Intel Corporation
    Inventors: Thomas R. Craver, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 9389861
    Abstract: Embodiments of systems, apparatuses, and methods for performing a range mapping instruction in a computer processor are described. In some embodiments, the execution of a range mapping instruction maps a data element having a source data range to a destination data element having a destination data range and storage of the of the destination data element.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: July 12, 2016
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Ray Carver
  • Patent number: 9389854
    Abstract: An apparatus includes memory storing an instruction that identifies a first register, a second register, and a third register. Upon execution of the instruction by a processor, a vector addition operation is performed by the processor to add first values from the first register to second values from the second register. A vector subtraction operation is also performed upon execution of the instruction to subtract the second value from third values from the third register. A vector compare operation is also performed upon execution of the instruction to compare results of the vector addition operation to results of the vector subtraction operation.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: July 12, 2016
    Assignee: Qualcomm Incorporated
    Inventor: Nico De Laurentiis
  • Patent number: 9298464
    Abstract: A computer-implemented method includes determining that two or more instructions of an instruction stream are eligible for optimization. Eligibility is based on a first instruction specifying a first target register and a second instruction specifying the first target register as a source register and a target register. The method includes merging the two or more machine instructions into a single optimized internal instruction that is configured to perform first and second functions of two or more machine instructions employing operands specified by the two or more machine instructions. The single optimized internal instruction specifies the first target register only as a single target register and the single optimized internal instruction specifies the first and second functions to be performed. The method includes executing the single optimized internal instruction to perform the first and second functions of the two or more instructions.
    Type: Grant
    Filed: March 8, 2013
    Date of Patent: March 29, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 9292291
    Abstract: A computer system for optimizing instructions is configured to identify two or more machine instructions as being eligible for optimization, to merge the two or more machine instructions into a single optimized internal instruction that is configured to perform functions of the two or more machine instructions, and to execute the single optimized internal instruction to perform the functions of the two or more machine instructions. Being eligible includes determining that the two or more machine instructions include a first instruction specifying a first target register and a second instruction specifying the first target register as a source register and a target register. The second instruction is a next sequential instruction of the first instruction in program order, wherein the first instruction specifies a first function to be performed, and the second instruction specifies a second function to be performed.
    Type: Grant
    Filed: March 28, 2012
    Date of Patent: March 22, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 9274792
    Abstract: A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.
    Type: Grant
    Filed: December 10, 2012
    Date of Patent: March 1, 2016
    Assignee: NVIDIA Corporation
    Inventors: Gregory Diamos, Mojtaba Mehrara
  • Patent number: 9241164
    Abstract: In a high-speed mode, a software processing unit notifies a hardware processing unit of settings information about output pictures before the hardware processing unit starts to encode an input picture, and the hardware processing unit performs continuous encoding for the output pictures, based on the settings information notified of by the software processing unit, without a notification signifying a completion for every picture, and upon completion of encoding for all of a specified number of the output pictures, sends an interrupt notification signifying a completion of encoding to the software processing unit.
    Type: Grant
    Filed: May 22, 2014
    Date of Patent: January 19, 2016
    Assignee: MegaChips Corporation
    Inventor: Kazuhiro Saito
  • Patent number: 9195466
    Abstract: Fusing conditional write instructions having opposite conditions in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first conditional write instruction writing a first value to a target register based on evaluating a first condition is detected by an instruction processing circuit. The circuit also detects a second conditional write instruction writing a second value to the target register based on evaluating a second condition that is a logical opposite of the first condition. Either the first condition or the second condition is selected as a fused instruction condition, and corresponding values are selected as if-true and if-false values. A fused instruction is generated for selectively writing the if-true value to the target register if the fused instruction condition evaluates to true, and selectively writing the if-false value to the target register if the fused instruction condition evaluates to false.
    Type: Grant
    Filed: November 14, 2012
    Date of Patent: November 24, 2015
    Assignee: QUALCOMM Incorporated
    Inventors: Melinda J. Brown, James Norris Dieffenderfer, Michael Scott McIlvaine, Brian Michael Stempel, Rodney Wayne Smith, Jeffery M. Schottmiller, Andrew S. Irwin, Michael William Morrow
  • Patent number: 9183614
    Abstract: Systems, processors and methods are disclosed for organizing processing datapaths to perform operations in parallel while executing a single program. Each datapath executes the same sequence of instructions, using a novel instruction sequencing method. Each datapath is implemented through a processor having a data memory partitioned into identical regions. A master processor fetches instructions and conveys them to the datapath processors. All processors are connected serially by an instruction pipeline, such that instructions are executed in parallel datapaths, with execution in each datapath offset in time by one clock cycle from execution in adjacent datapaths. The system includes an interconnection network that enables full sharing of data in both horizontal and vertical dimensions, with the effect of coupling any datapath to the memory of any other datapath without adding processing cycles in common usage.
    Type: Grant
    Filed: September 4, 2012
    Date of Patent: November 10, 2015
    Assignee: Mireplica Technology, LLC
    Inventor: William M. Johnson
  • Patent number: 9171161
    Abstract: A trusted device having virtualized registers provides an extensible amount of storage for hash values and other information stored within a trusted device. The trusted device includes a buffer to which registers are virtualized to and from external storage, by encrypting the register values using a private device key. The registers may be platform control registers (PCRs) or other storage of the trusted device, which may be a trusted platform module (TPM). The registers are accessed in accordance with a register number. When the externally stored values are retrieved, they are decrypted and placed in the buffer. The buffer may implement a cache mechanism, such as a most recently used algorithm, so that encryption/decryption and fetch overhead is reduced. A register shadowing technique may be employed at boot time, to ensure that the trusted device is not compromised by tampering with the externally stored virtualized registers.
    Type: Grant
    Filed: November 9, 2006
    Date of Patent: October 27, 2015
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Arun P. Anbalagan, Pruthvi P. Nataraj, Bipin Tomar
  • Patent number: 9164762
    Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
    Type: Grant
    Filed: July 22, 2013
    Date of Patent: October 20, 2015
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D. Gulilford, Gilbert M. Wolrich, Waidi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
  • Patent number: 9135006
    Abstract: In accordance with the teachings described herein, systems and methods are provided for advanced execution of branch instructions in a microprocessor pipeline. In one embodiment, a branch instruction of an assembly language program code is executed that includes (i) a condition operand, (ii) a branch destination operand, and (iii) a program count operand. It is determined whether a current program count matches a stored program count operand. After determining that a condition was met when the branch instruction was executed, and in response to determining that the current program count matches the stored program count operand, a destination instruction specified by the stored branch destination operand is fetched.
    Type: Grant
    Filed: September 5, 2012
    Date of Patent: September 15, 2015
    Assignee: MARVELL INTERNATIONAL LTD.
    Inventors: Li Sha, Ching-Han Tsai, Chi-Kuang Chen, Tzun-Wei Lee
  • Patent number: 9128701
    Abstract: An ISA-defined instruction includes an immediate field having a first and second portions specifying first and second values, which instructs the microprocessor to perform an operation using a constant value as one of its source operands. The constant value is the first value rotated/shifted by a number of bits based on the second value. An instruction translator translates the instruction into one or more microinstructions. An execution pipeline executes the microinstructions generated by the instruction translator. The instruction translator, rather than the execution pipeline, generates the constant value for the execution pipeline as a source operand of at least one of the microinstructions for execution by the execution pipeline.
    Type: Grant
    Filed: March 9, 2012
    Date of Patent: September 8, 2015
    Assignee: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
  • Patent number: 9110713
    Abstract: Systems and methods for implementing a floating point fused multiply and accumulate with scaling (FMASc) operation. A floating point unit receives input multiplier, multiplicand, addend, and scaling factor operands. A multiplier block is configured to multiply mantissas of the multiplier and multiplicand to generate an intermediate product. Alignment logic is configured to pre-align the addend with the intermediate product based on the scaling factor and exponents of the addend, multiplier, and multiplicand, and accumulation logic is configured to add or subtract a mantissa of the pre-aligned addend with the intermediate product to obtain a result of the floating point unit. Normalization and rounding are performed on the result, avoiding rounding during intermediate stages.
    Type: Grant
    Filed: August 30, 2012
    Date of Patent: August 18, 2015
    Assignee: QUALCOMM Incorporated
    Inventor: Liang-Kai Wang
  • Publication number: 20150149746
    Abstract: An arithmetic processing device promotes transmission efficiency between a processor and a memory. The arithmetic processing device has an arithmetic processing unit which issues an instruction accompanying with data which is sent to the memory, a judgment unit which judges whether or not a redundancy degree of the data which is accompanied with the instruction is more than a predetermined value, a compression unit which judges whether or not compress the data based on an waiting time and a compression time when the redundancy degree of the data is more than the predetermined value, and compress the data when judging that performs the compression, and an instruction arbitration unit which transfers the instruction accompanying with the compressed data to the memory when the compression unit performs the compression and transfers the instruction accompanying with the non-compressed data to the memory when the compression unit does not perform the compression.
    Type: Application
    Filed: November 10, 2014
    Publication date: May 28, 2015
    Inventors: Makoto SUGA, AKIO TOKOYODA, Koji HOSOE, Masatoshi Aihara, Yuta Toyoda
  • Patent number: 9043581
    Abstract: A processing apparatus includes an execution unit which performs computation on two operand inputs each being selectable between read data from a register and an immediate value. The processing apparatus also includes another execution unit which performs computation on two operand inputs, one of which is selectable between read data from a register and an immediate value, and the other of which is an immediate value. A control unit determines, based on a received instruction specifying a computation on two operands, whether each of the two operands specifies read data from a register or an immediate value. Depending on the determination result, the control unit causes one of the execution units to execute the computation specified by the received instruction.
    Type: Grant
    Filed: November 14, 2011
    Date of Patent: May 26, 2015
    Assignee: FUJITSU LIMITED
    Inventor: Masaki Ukai
  • Publication number: 20150134936
    Abstract: New instruction definitions for a packet add (PADD) operation and for a single instruction multiple add (SMAD) operation are disclosed. In addition, a new dedicated PADD logic device that performs the PADD operation in about one to two processor clock cycles is disclosed. Also, a new dedicated SMAD logic device that performs a single instruction multiple data add (SMAD) operation in about one to two clock cycles is disclosed.
    Type: Application
    Filed: January 20, 2015
    Publication date: May 14, 2015
    Inventors: Corey GEE, Bapiraju VINNAKOTA, Saleem MOHAMMADALI, Carl A. ALBEROLA