Arithmetic Operation Instruction Processing Patents (Class 712/221)
-
Patent number: 10162640Abstract: A processor is described having a functional unit within an instruction execution pipeline. The functional unit having circuitry to determine whether substantive data from a larger source data size will fit within a smaller data size that the substantive data is to flow to.Type: GrantFiled: August 16, 2016Date of Patent: December 25, 2018Assignee: Intel CorporationInventors: Martin G. Dixon, Baiju V. Patel, Rajeev Gopalakrishna
-
Patent number: 10157064Abstract: A method of managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices. An event is detected indicating that either resource requirement or resource availability for a subsequent instruction of an instruction stream will not be met by the instruction execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution.Type: GrantFiled: February 27, 2017Date of Patent: December 18, 2018Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
-
Patent number: 10146535Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.Type: GrantFiled: October 20, 2016Date of Patent: December 4, 2018Assignee: Intel CorporatoinInventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
-
Patent number: 10102032Abstract: Embodiments relate to facilitating quick and graceful transitions for massively parallel computing applications. A computer-implemented method for facilitating termination of a plurality of threads of a process is provided. The method maintains information about open communications between one or more of the threads of the process and one or more of other processes. In response to receiving a command to terminate one or more of the threads of the process, the method completes the open communications on behalf of the threads after terminating the threads.Type: GrantFiled: May 29, 2014Date of Patent: October 16, 2018Assignee: Raytheon CompanyInventors: Benjamin M. Howe, Jacob L. Sanders
-
Patent number: 10091092Abstract: This invention provides systems and methods to make communication networks more resilient, stealthier and robust. This invention discloses systems and methods wherein either a communications user equipment (UE) with multiple types of wireless links, potentially operating in different frequency bands, or an apparatus which performs communications routing functions, changes the communications routing in pseudo-random manner.Type: GrantFiled: February 3, 2017Date of Patent: October 2, 2018Assignee: The United States of America as represented by the Secretary of the Air ForceInventor: Amjad Soomro
-
Patent number: 10078512Abstract: A microprocessor includes FMA execution logic that determines whether to accumulate an accumulator operand C to the partial products of multiplier and multiplicand operands A and B in the partial product adder or in a second accumulation stage. The logic calculates an exponent delta of Aexp+Bexp?Cexp and determines the number of leading zeroes in C, if C is denormal. The microprocessor accumulates C with the partial products of A and B when the accumulation of C to the product of A and B could result in mass cancellation, when ExpDelta is greater than or equal to ?K (where K is related to a width of a datapath in the partial product adder), and when a C is denormal and its number of leading zeroes plus K exceeds ?ExpDelta. The strategic use of resources in the partial product adder and second accumulation stage reduces latency.Type: GrantFiled: October 3, 2016Date of Patent: September 18, 2018Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.Inventor: Thomas Elmer
-
Patent number: 10067742Abstract: Arithmetic operation circuits and a verification circuit are formed by loading configuration information into a configuration memory in an FPGA. Arithmetic operation circuits have the same arithmetic operation function, but are different from each other in combination of the circuit blocks. The arithmetic operation circuits are formed by combining the circuit blocks to make the maximum use of the DSP block, while the arithmetic operation circuit is formed by combining the circuit blocks other than DSP block. The arithmetic operation circuits each are configured to use a block RAM as the data hold memory, while the arithmetic operation circuit is configured to use a distributed RAM as the data hold memory. Each of the arithmetic operation circuits receives the input data, and outputs arithmetic operation result data (V1 to V3). A verification circuit compares the arithmetic operation result data to verify whether errors occur.Type: GrantFiled: February 24, 2016Date of Patent: September 4, 2018Assignee: Control System Laboratory Ltd.Inventor: Kenichi Morimoto
-
Patent number: 10037804Abstract: Examples disclosed herein relate to programming a first conductance of a first resistive memory device based on a first target value. The first conductance of the first resistive memory device is measured to determine a deviation of the first resistive memory device from the first target value. A second target value of a second resistive memory device is adjusted based on the deviation, and a second conductance of the second resistive memory device is programmed based on the adjusted second target value.Type: GrantFiled: January 27, 2017Date of Patent: July 31, 2018Assignee: Hewlett Packard Enterprise Development LPInventors: Brent Buchanan, Le Zheng, John Paul Strachan
-
Patent number: 9910896Abstract: In an embodiment, a method comprises processing an input data stream as the data stream is streamed and producing a derived stream therefrom; storing the input data stream in an input archive; suspending processing of the input data stream; subsequent to suspending processing, resuming processing of the input data stream, wherein resuming comprises: storing newly received data in the input data stream in a buffer, as the input data stream is streamed; determining a first timestamp; determining a second timestamp; searching the input archive to find a data item that matches the first timestamp of the last processed data item; processing data in the input archive having timestamps that are greater than the first timestamp until arriving at data with a third timestamp that is greater than the second timestamp; processing the input data stream from the buffer; continuing processing the input data stream as the input stream is streamed.Type: GrantFiled: March 15, 2013Date of Patent: March 6, 2018Assignee: CISCO TECHNOLOGY, INC.Inventors: Sailesh Krishnamurthy, Chris Metz, Rex E. Fernando, Jisu Bhattacharya
-
Patent number: 9851972Abstract: A method is described that includes fetching an instruction. The method further includes decoding the instruction. The instruction specifies an operation, a first operand and a second operand. The method further includes fetching the first and second operands of the instruction. The first and second operands are each composed of a plurality of larger chunks having constituent elements. The method further includes performing the operation specified by the instruction including generating a resultant composed of a plurality of larger chunks having constituent elements. The generating of the resultant includes selecting for each element in the resultant a contiguous group of bits from a same positioned chunk of the first operand as the chunk of the element in the resultant, the contiguous group of bits being identified by a same positioned element of the second operand as the element in the resultant.Type: GrantFiled: January 23, 2017Date of Patent: December 26, 2017Assignee: Intel CorporationInventors: Tal Uliel, Robert Valentine
-
Patent number: 9830154Abstract: Techniques and mechanisms for programming an accelerator device to enable performance of a data processing algorithm. In an embodiment, an accelerator of a computer platform is programmed based on programming information received from a host processor of the computer platform. In another embodiment, programming of the accelerator is to enable data driven execution of an instruction by a data stream processing engine of the accelerator.Type: GrantFiled: December 29, 2011Date of Patent: November 28, 2017Assignee: Intel CorporationInventor: Vladimir Ivanov
-
Patent number: 9690586Abstract: A method of managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices provides instruction processing flexibility. An event is detected indicating that either resource requirement or resource availability for a subsequent instruction of an instruction stream will not be met by the instruction execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution.Type: GrantFiled: June 12, 2014Date of Patent: June 27, 2017Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
-
Patent number: 9672043Abstract: Techniques for managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices provide flexibility in execution of program instructions by a processor core. An event is detected indicating that either resource requirement or resource availability will not be met by the execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution.Type: GrantFiled: May 12, 2014Date of Patent: June 6, 2017Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
-
Patent number: 9658856Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.Type: GrantFiled: December 21, 2015Date of Patent: May 23, 2017Assignee: Intel CorporationInventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
-
Patent number: 9645826Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.Type: GrantFiled: December 18, 2015Date of Patent: May 9, 2017Assignee: Intel CorporationInventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
-
Patent number: 9632792Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.Type: GrantFiled: December 21, 2015Date of Patent: April 25, 2017Assignee: Intel CorporationInventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
-
Patent number: 9626193Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.Type: GrantFiled: December 21, 2015Date of Patent: April 18, 2017Assignee: Intel CorporationInventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
-
Patent number: 9626192Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.Type: GrantFiled: December 18, 2015Date of Patent: April 18, 2017Assignee: Intel CorporationInventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
-
Patent number: 9612842Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.Type: GrantFiled: December 21, 2015Date of Patent: April 4, 2017Assignee: Intel CorporationInventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
-
Patent number: 9575765Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.Type: GrantFiled: December 18, 2015Date of Patent: February 21, 2017Assignee: Intel CorporationInventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
-
Patent number: 9563429Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location.Type: GrantFiled: December 18, 2015Date of Patent: February 7, 2017Assignee: Intel CorporationInventors: Andrew T. Forsyth, Brian J. Hickmann, Jonathan C. Hall, Christopher J. Hughes
-
Patent number: 9552209Abstract: A method is described that includes fetching an instruction. The method further includes decoding the instruction. The instruction specifies an operation, a first operand and a second operand. The method further includes fetching the first and second operands of the instruction. The first and second operands are each composed of a plurality of larger chunks having constituent elements. The method further includes performing the operation specified by the instruction including generating a resultant composed of a plurality of larger chunks having constituent elements. The generating of the resultant includes selecting for each element in the resultant a contiguous group of bits from a same positioned chunk of the first operand as the chunk of the element in the resultant, the contiguous group of bits being identified by a same positioned element of the second operand as the element in the resultant.Type: GrantFiled: December 27, 2013Date of Patent: January 24, 2017Assignee: Intel CorporationInventors: Tal Uliel, Robert Valentine
-
Patent number: 9529653Abstract: Processor register protection management is disclosed. In embodiments, a method of processor register protection management can include determining a sensitive logical register for executable code generated by a compiler, generating an error-correction table identifying the sensitive logical register, and storing the error-correction table in a memory accessible by a processor. The processor can be configured to generate a duplicate register of the sensitive logical register identified by the error-correction table.Type: GrantFiled: October 9, 2014Date of Patent: December 27, 2016Assignee: International Business Machines CorporationInventors: Pradip Bose, Chen-Yong Cher, Meeta S. Gupta
-
Patent number: 9513918Abstract: An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking not implemented for a particular data element, then selecting data elements from a first source operand and a second source operand based on index values stored in destination operand to be copied to data element positions within the destination operand, wherein any one of the data elements from either the first source operand and the second source operand may be copied to any one of the data element positions within the destination operand; and if masking is implemented for a particular data element of the destination operand, then performing a designated masking operation with respect to that particular data element.Type: GrantFiled: December 22, 2011Date of Patent: December 6, 2016Assignee: INTEL CORPORATIONInventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mostafa Hagog, Jesus Corbal, Bret L Toll, Mark J Charney, Tal Uliel, Zeev Sperber, Amit Gradstein
-
Patent number: 9513916Abstract: A computer-implemented method includes determining that two or more instructions of an instruction stream are eligible for optimization, where the two or more instructions include a memory load instruction and a data processing instruction to process data based on the memory load instruction. The method includes merging, by a processor, the two or more instructions into a single optimized internal instruction and executing the single optimized internal instruction to perform a memory load function and a data processing function corresponding to the memory load instruction and the data processing instruction.Type: GrantFiled: March 8, 2013Date of Patent: December 6, 2016Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 9514790Abstract: A data transmission circuit may include data line groups and pass sections arranged among the data line groups to allow the data line groups to form one line. The data transmission circuit may include an input/output unit configured to be coupled to the data line groups and to process write data to be transmitted to the data line groups or read data transmitted from the data line groups. The data transmission circuit may include a pass control unit configured to selectively enable the pass sections in response to an address for specifying a target data line group of the data line groups.Type: GrantFiled: May 13, 2015Date of Patent: December 6, 2016Assignee: SK HYNIX INC.Inventor: Sang Oh Lim
-
Patent number: 9513915Abstract: A computer system for optimizing instructions includes a processor including an instruction execution unit configured to execute instructions and an instruction optimization unit configured to optimize instructions and memory to store machine instructions to be executed by the instruction execution unit. The computer system is configured to perform a method including analyzing machine instructions from among a stream of instructions to be executed by the instruction execution unit, the machine instructions including a memory load instruction and a data processing instruction to perform a data processing function based on the memory load instruction, identifying the machine instructions as being eligible for optimization, merging the machine instructions into a single optimized internal instruction, and executing the single optimized internal instruction to perform a memory load function and a data processing function corresponding to the memory load instruction and the data processing instruction.Type: GrantFiled: March 28, 2012Date of Patent: December 6, 2016Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 9495162Abstract: An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking not implemented for a particular data element, then selecting data elements from the destination operand and a second source operand based on index values stored in a first source operand to be copied to data element positions within the destination operand, wherein any one of the data elements from either the destination operand and the second source operand may be copied to any one of the data element positions within the destination operand; if masking is implemented for a particular data element of the destination operand, then performing a designated masking operation with respect to that particular data element.Type: GrantFiled: December 23, 2011Date of Patent: November 15, 2016Assignee: INTEL CORPORATIONInventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mostafa Hagog, Jesus Corbal, Tal Uliel, Zeev Sperber, Amit Gradstein
-
Patent number: 9465611Abstract: Methods and systems for executing SIMD instructions that efficiently implement new SIMD instructions and conventional existing SIMD MAC-type instructions, while avoiding replication of functions in order to keep the size of the logic circuit size to as low a level as can reasonably be achieved. An instruction unit executes Single Instruction Multiple Data instructions, including instructions acting on operands representing complex numbers. The instruction unit includes functional blocks that are commonly utilized to execute a plurality of the instructions, wherein the plurality of instructions utilize various individual functional blocks in various combinations with one another. The plurality of instructions is optionally executed in a pipeline fashion.Type: GrantFiled: October 4, 2004Date of Patent: October 11, 2016Assignee: Broadcom CorporationInventors: Mark Taunton, Andrew Jon Dawson
-
Patent number: 9424038Abstract: A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.Type: GrantFiled: December 10, 2012Date of Patent: August 23, 2016Assignee: NVIDIA CorporationInventors: Gregory Diamos, Mojtaba Mehrara
-
Patent number: 9411586Abstract: A method is described that includes performing the following with a single instruction: receiving a first input operand V; receiving a second input operand S; calculating V?S; determining if V?S is positive or negative; and, providing as a resultant: V if V?S is negative; V?S if V?S is positive.Type: GrantFiled: December 23, 2011Date of Patent: August 9, 2016Assignee: Intel CorporationInventors: Thomas R. Craver, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 9389854Abstract: An apparatus includes memory storing an instruction that identifies a first register, a second register, and a third register. Upon execution of the instruction by a processor, a vector addition operation is performed by the processor to add first values from the first register to second values from the second register. A vector subtraction operation is also performed upon execution of the instruction to subtract the second value from third values from the third register. A vector compare operation is also performed upon execution of the instruction to compare results of the vector addition operation to results of the vector subtraction operation.Type: GrantFiled: March 15, 2013Date of Patent: July 12, 2016Assignee: Qualcomm IncorporatedInventor: Nico De Laurentiis
-
Patent number: 9389861Abstract: Embodiments of systems, apparatuses, and methods for performing a range mapping instruction in a computer processor are described. In some embodiments, the execution of a range mapping instruction maps a data element having a source data range to a destination data element having a destination data range and storage of the of the destination data element.Type: GrantFiled: December 22, 2011Date of Patent: July 12, 2016Assignee: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Thomas Ray Carver
-
Patent number: 9298464Abstract: A computer-implemented method includes determining that two or more instructions of an instruction stream are eligible for optimization. Eligibility is based on a first instruction specifying a first target register and a second instruction specifying the first target register as a source register and a target register. The method includes merging the two or more machine instructions into a single optimized internal instruction that is configured to perform first and second functions of two or more machine instructions employing operands specified by the two or more machine instructions. The single optimized internal instruction specifies the first target register only as a single target register and the single optimized internal instruction specifies the first and second functions to be performed. The method includes executing the single optimized internal instruction to perform the first and second functions of the two or more instructions.Type: GrantFiled: March 8, 2013Date of Patent: March 29, 2016Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 9292291Abstract: A computer system for optimizing instructions is configured to identify two or more machine instructions as being eligible for optimization, to merge the two or more machine instructions into a single optimized internal instruction that is configured to perform functions of the two or more machine instructions, and to execute the single optimized internal instruction to perform the functions of the two or more machine instructions. Being eligible includes determining that the two or more machine instructions include a first instruction specifying a first target register and a second instruction specifying the first target register as a source register and a target register. The second instruction is a next sequential instruction of the first instruction in program order, wherein the first instruction specifies a first function to be performed, and the second instruction specifies a second function to be performed.Type: GrantFiled: March 28, 2012Date of Patent: March 22, 2016Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 9274792Abstract: A compiler-controlled technique for scheduling threads to execute different regions of a program. A compiler analyzes program code to determine a control flow graph for the program code. The control flow graph contains regions and directed edges between regions. The regions have associated execution priorities. The directed edges indicate the direction of program control flow. Each region has a thread frontier which contains one or more regions. The compiler inserts one or more update predicate mask variable instructions at the end of a region. The compiler also inserts one or more conditional branch instructions at the end of the region. The conditional branch instructions are arranged in order of execution priority of the regions in the thread frontier of the region, to enforce execution priority of the regions at runtime.Type: GrantFiled: December 10, 2012Date of Patent: March 1, 2016Assignee: NVIDIA CorporationInventors: Gregory Diamos, Mojtaba Mehrara
-
Patent number: 9241164Abstract: In a high-speed mode, a software processing unit notifies a hardware processing unit of settings information about output pictures before the hardware processing unit starts to encode an input picture, and the hardware processing unit performs continuous encoding for the output pictures, based on the settings information notified of by the software processing unit, without a notification signifying a completion for every picture, and upon completion of encoding for all of a specified number of the output pictures, sends an interrupt notification signifying a completion of encoding to the software processing unit.Type: GrantFiled: May 22, 2014Date of Patent: January 19, 2016Assignee: MegaChips CorporationInventor: Kazuhiro Saito
-
Patent number: 9195466Abstract: Fusing conditional write instructions having opposite conditions in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first conditional write instruction writing a first value to a target register based on evaluating a first condition is detected by an instruction processing circuit. The circuit also detects a second conditional write instruction writing a second value to the target register based on evaluating a second condition that is a logical opposite of the first condition. Either the first condition or the second condition is selected as a fused instruction condition, and corresponding values are selected as if-true and if-false values. A fused instruction is generated for selectively writing the if-true value to the target register if the fused instruction condition evaluates to true, and selectively writing the if-false value to the target register if the fused instruction condition evaluates to false.Type: GrantFiled: November 14, 2012Date of Patent: November 24, 2015Assignee: QUALCOMM IncorporatedInventors: Melinda J. Brown, James Norris Dieffenderfer, Michael Scott McIlvaine, Brian Michael Stempel, Rodney Wayne Smith, Jeffery M. Schottmiller, Andrew S. Irwin, Michael William Morrow
-
Patent number: 9183614Abstract: Systems, processors and methods are disclosed for organizing processing datapaths to perform operations in parallel while executing a single program. Each datapath executes the same sequence of instructions, using a novel instruction sequencing method. Each datapath is implemented through a processor having a data memory partitioned into identical regions. A master processor fetches instructions and conveys them to the datapath processors. All processors are connected serially by an instruction pipeline, such that instructions are executed in parallel datapaths, with execution in each datapath offset in time by one clock cycle from execution in adjacent datapaths. The system includes an interconnection network that enables full sharing of data in both horizontal and vertical dimensions, with the effect of coupling any datapath to the memory of any other datapath without adding processing cycles in common usage.Type: GrantFiled: September 4, 2012Date of Patent: November 10, 2015Assignee: Mireplica Technology, LLCInventor: William M. Johnson
-
Patent number: 9171161Abstract: A trusted device having virtualized registers provides an extensible amount of storage for hash values and other information stored within a trusted device. The trusted device includes a buffer to which registers are virtualized to and from external storage, by encrypting the register values using a private device key. The registers may be platform control registers (PCRs) or other storage of the trusted device, which may be a trusted platform module (TPM). The registers are accessed in accordance with a register number. When the externally stored values are retrieved, they are decrypted and placed in the buffer. The buffer may implement a cache mechanism, such as a most recently used algorithm, so that encryption/decryption and fetch overhead is reduced. A register shadowing technique may be employed at boot time, to ensure that the trusted device is not compromised by tampering with the externally stored virtualized registers.Type: GrantFiled: November 9, 2006Date of Patent: October 27, 2015Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Arun P. Anbalagan, Pruthvi P. Nataraj, Bipin Tomar
-
Patent number: 9164762Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.Type: GrantFiled: July 22, 2013Date of Patent: October 20, 2015Assignee: Intel CorporationInventors: Vinodh Gopal, James D. Gulilford, Gilbert M. Wolrich, Waidi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
-
Patent number: 9135006Abstract: In accordance with the teachings described herein, systems and methods are provided for advanced execution of branch instructions in a microprocessor pipeline. In one embodiment, a branch instruction of an assembly language program code is executed that includes (i) a condition operand, (ii) a branch destination operand, and (iii) a program count operand. It is determined whether a current program count matches a stored program count operand. After determining that a condition was met when the branch instruction was executed, and in response to determining that the current program count matches the stored program count operand, a destination instruction specified by the stored branch destination operand is fetched.Type: GrantFiled: September 5, 2012Date of Patent: September 15, 2015Assignee: MARVELL INTERNATIONAL LTD.Inventors: Li Sha, Ching-Han Tsai, Chi-Kuang Chen, Tzun-Wei Lee
-
Patent number: 9128701Abstract: An ISA-defined instruction includes an immediate field having a first and second portions specifying first and second values, which instructs the microprocessor to perform an operation using a constant value as one of its source operands. The constant value is the first value rotated/shifted by a number of bits based on the second value. An instruction translator translates the instruction into one or more microinstructions. An execution pipeline executes the microinstructions generated by the instruction translator. The instruction translator, rather than the execution pipeline, generates the constant value for the execution pipeline as a source operand of at least one of the microinstructions for execution by the execution pipeline.Type: GrantFiled: March 9, 2012Date of Patent: September 8, 2015Assignee: VIA TECHNOLOGIES, INC.Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
-
Patent number: 9110713Abstract: Systems and methods for implementing a floating point fused multiply and accumulate with scaling (FMASc) operation. A floating point unit receives input multiplier, multiplicand, addend, and scaling factor operands. A multiplier block is configured to multiply mantissas of the multiplier and multiplicand to generate an intermediate product. Alignment logic is configured to pre-align the addend with the intermediate product based on the scaling factor and exponents of the addend, multiplier, and multiplicand, and accumulation logic is configured to add or subtract a mantissa of the pre-aligned addend with the intermediate product to obtain a result of the floating point unit. Normalization and rounding are performed on the result, avoiding rounding during intermediate stages.Type: GrantFiled: August 30, 2012Date of Patent: August 18, 2015Assignee: QUALCOMM IncorporatedInventor: Liang-Kai Wang
-
Publication number: 20150149746Abstract: An arithmetic processing device promotes transmission efficiency between a processor and a memory. The arithmetic processing device has an arithmetic processing unit which issues an instruction accompanying with data which is sent to the memory, a judgment unit which judges whether or not a redundancy degree of the data which is accompanied with the instruction is more than a predetermined value, a compression unit which judges whether or not compress the data based on an waiting time and a compression time when the redundancy degree of the data is more than the predetermined value, and compress the data when judging that performs the compression, and an instruction arbitration unit which transfers the instruction accompanying with the compressed data to the memory when the compression unit performs the compression and transfers the instruction accompanying with the non-compressed data to the memory when the compression unit does not perform the compression.Type: ApplicationFiled: November 10, 2014Publication date: May 28, 2015Inventors: Makoto SUGA, AKIO TOKOYODA, Koji HOSOE, Masatoshi Aihara, Yuta Toyoda
-
Patent number: 9043581Abstract: A processing apparatus includes an execution unit which performs computation on two operand inputs each being selectable between read data from a register and an immediate value. The processing apparatus also includes another execution unit which performs computation on two operand inputs, one of which is selectable between read data from a register and an immediate value, and the other of which is an immediate value. A control unit determines, based on a received instruction specifying a computation on two operands, whether each of the two operands specifies read data from a register or an immediate value. Depending on the determination result, the control unit causes one of the execution units to execute the computation specified by the received instruction.Type: GrantFiled: November 14, 2011Date of Patent: May 26, 2015Assignee: FUJITSU LIMITEDInventor: Masaki Ukai
-
Publication number: 20150134936Abstract: New instruction definitions for a packet add (PADD) operation and for a single instruction multiple add (SMAD) operation are disclosed. In addition, a new dedicated PADD logic device that performs the PADD operation in about one to two processor clock cycles is disclosed. Also, a new dedicated SMAD logic device that performs a single instruction multiple data add (SMAD) operation in about one to two clock cycles is disclosed.Type: ApplicationFiled: January 20, 2015Publication date: May 14, 2015Inventors: Corey GEE, Bapiraju VINNAKOTA, Saleem MOHAMMADALI, Carl A. ALBEROLA
-
Publication number: 20150121042Abstract: According to an embodiment, an arithmetic device includes an arithmetic processing unit, an address generating unit, and a control unit. The arithmetic processing unit performs a plurality of arithmetic processing used in an encryption method. Based on an upper bit of the address of the first piece of data and based on an offset which is a value corresponding to a counter value and which is based on the address of the first piece of data, the address generating unit generates addresses of the memory device. The control unit controls the arithmetic processing unit in such a way that the arithmetic processing is done in a sequence determined in the encryption method, and that specifies an update of the counter value at a timing of modifying the type of data and at a timing of modifying data.Type: ApplicationFiled: January 5, 2015Publication date: April 30, 2015Applicant: Kabushiki Kaisha ToshibaInventor: Hideo SHIMIZU
-
Publication number: 20150121043Abstract: Computers and methods for performing mathematical functions are disclosed. An embodiment of a computer includes an operations level and a driver level. The operations level performs mathematical operations. The driver level includes a first lookup table and a second lookup table, wherein the first lookup table includes first data for calculating at least one mathematical function using a first level of accuracy. The second lookup table includes second data for calculating the at least one mathematical function using a second level of accuracy, wherein the first level of accuracy is greater than the second level of accuracy. A driver executes either the first data or the second data depending on a selected level of accuracy.Type: ApplicationFiled: October 30, 2013Publication date: April 30, 2015Applicant: Texas Instruments IncorporatedInventors: Kyong Ho Lee, Seok-Jun Lee, Manish Goel
-
Patent number: 9015452Abstract: A digital signal processor (DSP) includes an instruction fetch unit, an instruction decode unit, a register set and a plurality of work units in communication with the instruction decode unit. A first embodiment calculates two divisions on packed numerators and packed denominators. The DSP work units calculate indexes into a 1/d look-up table and make a final sign correction. A second embodiment calculates an approximation of a vector magnitude of a complex number x+jy. The approximation is based upon ?(x2+y2)??*max(|x|, |y|)+?*min(|x|, |y|). The DSP work units calculate the absolute values, find the maxima and minima, and form the packed results of two vector magnitude calculations.Type: GrantFiled: February 18, 2010Date of Patent: April 21, 2015Assignee: Texas Instruments IncorporatedInventor: Udayan Dasgupta