Floating Point Or Vector Patents (Class 712/222)
  • Patent number: 10228909
    Abstract: A method of an aspect includes receiving a floating point scaling instruction. The floating point scaling instruction indicates a first source including one or more floating point data elements, a second source including one or more corresponding floating point data elements, and a destination. A result is stored in the destination in response to the floating point scaling instruction. The result includes one or more corresponding result floating point data elements each including a corresponding floating point data element of the second source multiplied by a base of the one or more floating point data elements of the first source raised to a power of an integer representative of the corresponding floating point data element of the first source. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: March 12, 2019
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Amit Gradstein, Robert Valentine, Simon Rubanovich, Benny Eitan
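The scaling operation in this abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes a base of 2 (binary floating point) and takes the "integer representative" of each first-source element to be its floor. The function name `scale_elements` is hypothetical.

```python
import math


def scale_elements(first_source, second_source):
    """Return second_source[i] * 2**floor(first_source[i]) for each element pair.

    Assumptions (not stated in the abstract): the base is 2 and the integer
    representative of a first-source element is its floor.
    """
    if len(first_source) != len(second_source):
        raise ValueError("sources must have the same number of elements")
    return [
        # ldexp(x, n) computes x * 2**n by adjusting the exponent of x
        math.ldexp(x, math.floor(power))
        for power, x in zip(first_source, second_source)
    ]


print(scale_elements([3.7, -1.2, 0.0], [1.5, 8.0, 5.0]))  # [12.0, 2.0, 5.0]
```

The sketch does not handle NaN or infinite exponents; `math.floor` raises on those inputs, whereas a hardware instruction would define special-case results.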
  • Patent number: 10223113
    Abstract: A processor of an aspect includes a decode unit to decode an instruction indicating a first source packed data operand including at least four data elements, a source mask including at least four mask elements, and a destination storage location. An execution unit, in response to the instruction, stores a result packed data operand having a series of at least two unmasked result data elements. Each of the unmasked result data elements stores a value of a different one of at least two consecutive data elements of the first source packed data operand in a relative order. All masked result elements, which are between a nearest corresponding pair of unmasked result data elements, have a same value as an unmasked result data element of the corresponding pair, which is closest to a first end of the result packed data operand. The masked result data elements correspond to masked mask elements.
    Type: Grant
    Filed: March 27, 2014
    Date of Patent: March 5, 2019
    Assignee: Intel Corporation
    Inventor: Mikhail Plotnikov
  • Patent number: 10209989
    Abstract: A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the array of data elements stored in a source operand into a result in a destination operand using a reduction operator. Each of the source operand and the destination operand is one of the vector registers. Responsive to the vector reduction instruction, the execution circuitry applies the reduction operator to two of the data elements in each lane, and shifts one or more remaining data elements when there is at least one of the data elements remaining in each lane.
    Type: Grant
    Filed: March 7, 2017
    Date of Patent: February 19, 2019
    Assignee: Intel Corporation
    Inventors: Paul Caprioli, Abhay S. Kanhere, Jeffrey J. Cook, Muawya M. Al-Otoom
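The per-lane reduce-and-shift behavior described in this abstract can be modeled roughly as follows. This is a sketch under assumptions: each lane is a non-empty Python list, one step combines the first two elements of a lane, and the remaining elements shift down by one position. The names `reduce_step` and `reduce_lanes` are hypothetical.

```python
from operator import add


def reduce_step(lane, op):
    """Combine the first two elements of a lane and shift the rest down one position."""
    if len(lane) < 2:
        return list(lane)
    return [op(lane[0], lane[1])] + list(lane[2:])


def reduce_lanes(register, op=add):
    """Apply reduce_step to every lane until each lane holds a single element."""
    lanes = [list(lane) for lane in register]
    while any(len(lane) > 1 for lane in lanes):
        lanes = [reduce_step(lane, op) for lane in lanes]
    return [lane[0] for lane in lanes]


register = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two lanes, four elements each
print(reduce_lanes(register))  # [10, 100]
```

Each call to `reduce_step` plays the role of one execution of the instruction, so a lane of four elements needs three steps.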
  • Patent number: 10114650
    Abstract: Techniques are disclosed relating to handling dependencies between instructions. In one embodiment, an apparatus includes decode circuitry and dependency circuitry. In this embodiment, the decode circuitry is configured to receive an instruction that specifies a destination location and determine a first storage region that includes the destination location. In this embodiment, the storage region is one of a plurality of different storage regions accessible by instructions processed by the apparatus. In this embodiment, the dependency circuitry is configured to stall the instruction until one or more older instructions that specify source locations in the first storage region have read their source locations. The disclosed techniques may be described as “pessimistic” dependency handling, which may, in some instances, maintain performance while limiting complexity, power consumption, and area of dependency logic.
    Type: Grant
    Filed: February 23, 2015
    Date of Patent: October 30, 2018
    Assignee: Apple Inc.
    Inventors: Robert D. Kenney, Liang-Kai Wang
  • Patent number: 10089076
    Abstract: A method of an aspect includes receiving a floating point scaling instruction. The floating point scaling instruction indicates a first source including one or more floating point data elements, a second source including one or more corresponding floating point data elements, and a destination. A result is stored in the destination in response to the floating point scaling instruction. The result includes one or more corresponding result floating point data elements each including a corresponding floating point data element of the second source multiplied by a base of the one or more floating point data elements of the first source raised to a power of an integer representative of the corresponding floating point data element of the first source. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: March 15, 2018
    Date of Patent: October 2, 2018
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Amit Gradstein, Robert Valentine, Simon Rubanovich, Benny Eitan
  • Patent number: 10078593
    Abstract: A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality of processor cores, wherein at least one of a number of the processor cores, a size of each of the plurality of caches, or a size of each of the plurality of memories is configured for performing a reverse-time-migration (RTM) computation.
    Type: Grant
    Filed: October 26, 2012
    Date of Patent: September 18, 2018
    Assignee: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA
    Inventors: John Shalf, David Donofrio, Leonid Oliker, Jens Kruger, Samuel Williams
  • Patent number: 10061560
    Abstract: A method and system are disclosed for executing a machine instruction in a central processing unit. The method comprises the steps of obtaining a perform floating-point operation instruction; obtaining a test bit; and determining a value of the test bit. If the test bit has a first value, (a) a specified floating-point operation function is performed, and (b) a condition code is set to a value determined by said specified function. If the test bit has a second value, (c) a check is made to determine if said specified function is valid and installed on the machine, (d) if said specified function is valid and installed on the machine, the condition code is set to one code value, and (e) if said specified function is either not valid or not installed on the machine, the condition code is set to a second code value.
    Type: Grant
    Filed: April 25, 2016
    Date of Patent: August 28, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michel H. T. Hack, Ronald M. Smith, Sr.
  • Patent number: 9996346
    Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle.
    Type: Grant
    Filed: June 28, 2017
    Date of Patent: June 12, 2018
    Assignee: International Business Machines Corporation
    Inventors: Eric M. Schwarz, Ronald M. Smith, Sr.
  • Patent number: 9921832
    Abstract: A vector reduction instruction with non-unit strided access pattern is received and executed by the execution circuitry of a processor. In response to the instruction, the execution circuitry performs an associative reduction operation on data elements of a first vector register. Based on values of the mask register and a current element position being processed, the execution circuitry sequentially sets one or more data elements of the first vector register to a result, which is generated by the associative reduction operation applied to both a previous data element of the first vector register and a data element of a third vector register. The previous data element is located more than one element position away from the current element position.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: March 20, 2018
    Assignee: Intel Corporation
    Inventors: Albert Hartono, Jayashankar Bharadwaj, Nalini Vasudevan, Sara S. Baghsorkhi, Victor W. Lee, Daehyun Kim
  • Patent number: 9804850
    Abstract: Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
    Type: Grant
    Filed: June 21, 2016
    Date of Patent: October 31, 2017
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Mikhail Plotnikov, Andrey Naraikin, Robert Valentine
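The per-field leading zero count in this abstract can be illustrated with a short sketch. It assumes 32-bit unsigned data fields, a width the abstract does not specify, and the names `leading_zero_count` and `vector_lzcnt` are hypothetical. Generating permute controls from the counts is not shown.

```python
def leading_zero_count(value, width=32):
    """Count the contiguous most significant zero bits of an unsigned field."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the field width")
    # bit_length() is the position of the highest set bit (0 for value 0)
    return width - value.bit_length()


def vector_lzcnt(fields, width=32):
    """Apply leading_zero_count to every data field of a source register."""
    return [leading_zero_count(v, width) for v in fields]


print(vector_lzcnt([0, 1, 0x80000000, 0x00FF0000]))  # [32, 31, 0, 8]
```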
  • Patent number: 9734224
    Abstract: A synchronization infrastructure that synchronizes data stored between components in a cloud infrastructure system is described. A first component in the cloud infrastructure system may store subscription information related to a subscription order which may in turn be utilized by a second component in the cloud infrastructure system to orchestrate the provisioning of services and resources for the order placed by the customer. The synchronization architecture utilizes transactionally consistent checkpoints that describe the state of the data stored in the components to synchronize the data between these components.
    Type: Grant
    Filed: March 20, 2015
    Date of Patent: August 15, 2017
    Assignee: Oracle International Corporation
    Inventors: Ramkrishna Chatterjee, Ramesh Vasudevan, Anjani Kalyan Prathipati, Gopalan Arun
  • Patent number: 9733935
    Abstract: A method of processing an instruction is described that includes fetching and decoding the instruction. The instruction has separate destination address, first operand source address and second operand source address components. The first operand source address identifies a location of a first mask pattern in mask register space. The second operand source address identifies a location of a second mask pattern in the mask register space. The method further includes fetching the first mask pattern from the mask register space; fetching the second mask pattern from the mask register space; merging the first and second mask patterns into a merged mask pattern; and, storing the merged mask pattern at a storage location identified by the destination address.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: August 15, 2017
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Andrew T. Forsyth, Roger Espasa, Manel Fernandez, Thomas D. Fletcher
  • Patent number: 9710270
    Abstract: A method for programmably controlling an exception includes performing, by a processor, a step of executing a control specification instruction for exception control specification that indicates whether an exception is enabled or not and setting a control specification value for the exception in a register and a step of executing a control execution instruction for exception control execution that indicates whether the exception is to be raised or not, determining whether the control specification value set in the register is a value for enabling the exception, and, when the control specification value is the value for enabling the exception, raising the exception. The method further includes performing a step of not raising the exception when the control specification value set in the register is not the value for enabling the exception.
    Type: Grant
    Filed: December 20, 2011
    Date of Patent: July 18, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Noriaki Asamoto
  • Patent number: 9672034
    Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: June 6, 2017
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
  • Patent number: 9619233
    Abstract: A computer architecture allows for simplified exception handling by restarting the program after exceptions at the beginning of idempotent regions, the idempotent regions allowing re-execution without the need for restoring complex state information from checkpoints. Recovery from mis-speculation may be provided by a similar mechanism but using smaller idempotent regions reflecting a more frequent occurrence of mis-speculation. A compiler generating different idempotent regions for speculation and exception handling is also disclosed.
    Type: Grant
    Filed: February 19, 2016
    Date of Patent: April 11, 2017
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Jaikrishnan Menon, Marc Asher De Kruijf, Karthikeyan Sankaralingam
  • Patent number: 9613350
    Abstract: A payment reader includes a contactless interface for communicating with a contactless device. The payment reader has a processor that executes instructions stored in memory, and the instructions include instructions for a plurality of firmware modules including a message dispatcher module and a plurality of functional modules. The functional modules generate messages and the message dispatcher module stores the messages in a queued data structure such as a stack or a queue. The messages are provided to the functional modules from the queued data structure. Some of the messages are timed messages that are returned to the queued data structure.
    Type: Grant
    Filed: February 24, 2016
    Date of Patent: April 4, 2017
    Inventor: Kshitiz Vadera
  • Patent number: 9600280
    Abstract: A hazard check instruction has operands that specify addresses of vector elements to be read by first and second vector memory operations. The hazard check instruction outputs a dependency vector identifying, for each element position of the first vector corresponding to the first vector memory operation, which element position of the second vector that the element of the first vector depends on (if any). In an embodiment, at least one of the vector memory operations has addresses specified using a scalar address in the operands (and a vector attribute associated with the vector). In an embodiment, the operands may include predicates for one or both of the vector memory operations, indicating which vector elements are active. The dependency vector may be qualified by the predicates, indicating dependencies only for active elements.
    Type: Grant
    Filed: September 24, 2013
    Date of Patent: March 21, 2017
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 9588952
    Abstract: Reconstituting an attribute associated with data. Data in a tabular form may be received. The data is analyzed for a field that is likely to be determined by a formula. Responsive to identifying the field likely to be determined by the formula, an indication of the field and the formula with the data are stored in a repository. The indication of the field and the formula with the data from the repository may be retrieved to facilitate incorporating the data in an application with the formula for the field integrated into the application.
    Type: Grant
    Filed: June 22, 2015
    Date of Patent: March 7, 2017
    Assignee: International Business Machines Corporation
    Inventors: Al Chakra, Liam Harpur, John Rice
  • Patent number: 9588766
    Abstract: A vector reduction instruction is executed by a processor to provide efficient reduction operations on an array of data elements. The processor includes vector registers. Each vector register is divided into a plurality of lanes, and each lane stores the same number of data elements. The processor also includes execution circuitry that receives the vector reduction instruction to reduce the array of data elements stored in a source operand into a result in a destination operand using a reduction operator. Each of the source operand and the destination operand is one of the vector registers. Responsive to the vector reduction instruction, the execution circuitry applies the reduction operator to two of the data elements in each lane, and shifts one or more remaining data elements when there is at least one of the data elements remaining in each lane.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: March 7, 2017
    Assignee: Intel Corporation
    Inventors: Paul Caprioli, Abhay S. Kanhere, Jeffrey J. Cook, Muawya M. Al-Otoom
  • Patent number: 9569188
    Abstract: Compiling source code to reduce run-time execution of vector element reverse operations, includes: identifying, by a compiler, a first loop nested within a second loop in a computer program; identifying, by the compiler, a vector element reverse operation within the first loop; moving, by the compiler, the vector element reverse operation from the first loop to the second loop.
    Type: Grant
    Filed: July 25, 2016
    Date of Patent: February 14, 2017
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, William J. Schmidt
  • Patent number: 9569190
    Abstract: Compiling source code to reduce run-time execution of vector element reverse operations, includes: identifying, by a compiler, a first loop nested within a second loop in a computer program; identifying, by the compiler, a vector element reverse operation within the first loop; moving, by the compiler, the vector element reverse operation from the first loop to the second loop.
    Type: Grant
    Filed: August 4, 2015
    Date of Patent: February 14, 2017
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, William J. Schmidt
  • Patent number: 9501277
    Abstract: A first operation of comparison of the first initial operand with the second initial operand uses at least one comparison operator in such a way as to obtain a first final result word. A second operation of comparison of the second initial operand with the first initial operand uses the at least one comparison operator in such a way as to obtain a second final result word. Another operation checks the values of the bits of the two final result words in relation to a part at least of r combinations of reference values taken from possible combinations of values of these two final result words. These reference combinations represent a valid result of comparison of the two operands including an equality, a relationship of inferiority and a relationship of superiority between the two operands.
    Type: Grant
    Filed: June 12, 2014
    Date of Patent: November 22, 2016
    Assignee: STMICROELECTRONICS (ROUSSET) SAS
    Inventors: Pierre Guillemin, Yannick Teglia
  • Patent number: 9495159
    Abstract: In response to detecting one or more conditions are met, a checkpoint of a current state of a thread may be created. One or more incomplete instructions may be moved from a first level of a re-order buffer to a second level of the re-order buffer. Each incomplete instruction may be currently executing or awaiting execution.
    Type: Grant
    Filed: September 27, 2013
    Date of Patent: November 15, 2016
    Assignee: Intel Corporation
    Inventors: Mark J. Dechene, Srikanth T. Srinivasan, Matthew C. Merten, Tong Li, Christine E. Wang
  • Patent number: 9495153
    Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: November 15, 2016
    Assignee: Intel Corporation
    Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
  • Patent number: 9489344
    Abstract: A data processor of a processing system, such as a graphics processing system, converts an input data value into an output data value by approximating a function which maps input values to output values. The data processor approximates the function using first and second predetermined ranges of values which are quantized into plural corresponding pairs of range sections, a predetermined gradient for each pair of range sections, and predetermined section end values for each pair of range sections. By using these predetermined parameters, the approximation of the function can be implemented efficiently by the data processor of the processing system.
    Type: Grant
    Filed: June 27, 2013
    Date of Patent: November 8, 2016
    Assignee: ARM LIMITED
    Inventors: Jorn Nystad, Sean Tristram Ellis
  • Patent number: 9448797
    Abstract: Restricted instructions are prohibited from execution within a transaction. There are classes of instructions that are restricted regardless of type of transaction: constrained or nonconstrained. There are instructions only restricted in constrained transactions, and there are instructions that are selectively restricted for given transactions based on controls specified on instructions used to initiate the transactions.
    Type: Grant
    Filed: March 4, 2013
    Date of Patent: September 20, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Christian Jacobi, Timothy J. Slegel
  • Patent number: 9448796
    Abstract: Restricted instructions are prohibited from execution within a transaction. There are classes of instructions that are restricted regardless of type of transaction: constrained or nonconstrained. There are instructions only restricted in constrained transactions, and there are instructions that are selectively restricted for given transactions based on controls specified on instructions used to initiate the transactions.
    Type: Grant
    Filed: June 15, 2012
    Date of Patent: September 20, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Christian Jacobi, Timothy J. Slegel
  • Patent number: 9355061
    Abstract: A data processing apparatus and method are provided for executing a vector scan instruction. The data processing apparatus comprises a vector register store configured to store vector operands, and processing circuitry configured to perform operations on vector operands retrieved from said vector register store. Further, control circuitry is configured to control the processing circuitry to perform the operations required by one or more instructions, said one or more instructions including a vector scan instruction specifying a vector operand comprising N vector elements and defining a scan operation to be performed on a sequence of vector elements within the vector operand.
    Type: Grant
    Filed: January 28, 2014
    Date of Patent: May 31, 2016
    Assignee: ARM Limited
    Inventors: Matthias Lothar Boettcher, Mbou Eyole-Monono, Giacomo Gabrielli
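A scan over a vector operand can be sketched as below. The abstract does not say which scan variant is used, so this example assumes an inclusive prefix scan over all N elements with a caller-supplied binary operator. The name `vector_scan` is hypothetical.

```python
from itertools import accumulate
from operator import add


def vector_scan(elements, op=add):
    """Inclusive scan: output[i] combines elements[0..i] with op."""
    return list(accumulate(elements, op))


print(vector_scan([1, 2, 3, 4]))          # [1, 3, 6, 10]
print(vector_scan([3, 1, 4, 1, 5], max))  # [3, 3, 4, 4, 5]
```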
  • Patent number: 9329868
    Abstract: Embodiments relate to reducing a number of read ports for register pairs. An aspect includes executing an instruction. The instruction identifies a pair of registers as containing a wide operand which spans the pair of registers. It is determined if a pairing indicator associated with the pair of registers has a first value or a second value. The first value indicates that the wide operand is stored in a wide register, and the second value indicates that the wide operand is not stored in the wide register. Based on the pairing indicator having the first value, the wide operand is read from the wide register. Based on the pairing indicator having the second value, the wide operand is read from the pair of registers. An operation is performed using the wide operand.
    Type: Grant
    Filed: October 22, 2014
    Date of Patent: May 3, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind
  • Patent number: 9323529
    Abstract: Embodiments relate to reducing a number of read ports for register pairs. An aspect includes executing an instruction. The instruction identifies a pair of registers as containing a wide operand which spans the pair of registers. It is determined if a pairing indicator associated with the pair of registers has a first value or a second value. The first value indicates that the wide operand is stored in a wide register, and the second value indicates that the wide operand is not stored in the wide register. Based on the pairing indicator having the first value, the wide operand is read from the wide register. Based on the pairing indicator having the second value, the wide operand is read from the pair of registers. An operation is performed using the wide operand.
    Type: Grant
    Filed: July 18, 2012
    Date of Patent: April 26, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind
  • Patent number: 9323532
    Abstract: Embodiments relate to reducing a number of read ports for register pairs. An aspect includes executing an instruction. The instruction identifies a pair of registers as containing a wide operand which spans the pair of registers. The executing of the instruction includes determining whether a pairing indicator associated with the pair of registers has a first value, a second value or a third value. Based on the pairing indicator having the first value, the wide operand is read from the wide register. Based on the pairing indicator having the second value the wide operand is read from the pair of registers. Based on the pairing indicator having the third value, the wide operand is speculatively read from a predetermined register. The predetermined register consists of the wide register or the pair of registers.
    Type: Grant
    Filed: July 18, 2012
    Date of Patent: April 26, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind
  • Patent number: 9317251
    Abstract: A method for correcting a shift error in a fused multiply add operation. The method comprises adjusting a normalized floating-point number before performing a shift error correction to produce an adjusted normalized floating-point number, and correcting a shift error in the adjusted normalized floating-point number. The correcting the shift error comprises shifting a mantissa of the adjusted normalized floating-point number in one direction. A fused multiply add module comprising a normalizer module, a compensation logic, and a rounder. The normalizer module is operable to normalize a floating-point number to produce a normalized floating-point number. The floating-point number is normalized based upon an estimated quantity of leading zeros. The compensation logic is operable to manage a correction of a shift error in the normalized floating-point number. The rounder is operable to correct the shift error with a mantissa shift in only one direction.
    Type: Grant
    Filed: December 31, 2012
    Date of Patent: April 19, 2016
    Assignee: NVIDIA CORPORATION
    Inventors: Charles Tsen, Adam Dreyer
  • Patent number: 9298497
    Abstract: A computer architecture allows for simplified exception handling by restarting the program after exceptions at the beginning of idempotent regions, the idempotent regions allowing re-execution without the need for restoring complex state information from checkpoints. Recovery from mis-speculation may be provided by a similar mechanism but using smaller idempotent regions reflecting a more frequent occurrence of mis-speculation. A compiler generating different idempotent regions for speculation and exception handling is also disclosed.
    Type: Grant
    Filed: July 13, 2012
    Date of Patent: March 29, 2016
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Jaikrishnan Menon, Marc Asher De Kruijf, Karthikeyan Sankaralingam
  • Patent number: 9275014
    Abstract: Vector processing engines (VPEs) having programmable data path configurations for providing multi-mode Radix-2X butterfly vector processing circuits. Related vector processors, systems, and methods are also disclosed. The VPEs disclosed herein include a plurality of vector processing stages each having vector processing blocks that have programmable data path configurations for performing Radix-2X butterfly vector operations to perform Fast Fourier Transform (FFT) vector processing operations efficiently. The data path configurations of the vector processing blocks can be programmed to provide different types of Radix-2X butterfly vector operations as well as other arithmetic logic vector operations.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: March 1, 2016
    Assignee: QUALCOMM Incorporated
    Inventor: Raheel Khan
  • Patent number: 9268527
    Abstract: A floating point value can represent a number or something that is not a number (NaN). A floating point value that is a NaN can have a data field that stores information, such as a propagation count that indicates the number of times the NaN value has been propagated through instructions. A NaN evaluation instruction can determine whether one or more operands is a NaN operand of a particular type, and if so can generate a result that is a NaN of a different type. An exception can be generated based upon the NaN of the different type being provided as a resultant.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: February 23, 2016
    Assignee: FREESCALE SEMICONDUCTOR, INC.
    Inventor: William C. Moyer
  • Patent number: 9239702
    Abstract: A programmable data processing apparatus having a bit-plane extraction operation is described, for extracting data from a value of, for example, 32 bits containing 4 bytes, 1a to 1d. Each byte 1a to 1d comprises 8 bits, (a0-a7, b0-b7, c0-c7 and d0-d7, respectively). The bit-plane extraction operation retrieves one bit from each of these bytes, for example the second bit (a1, b1, c1, d1), which is specified by an argument. The operation involves concatenating these bits (a1, b1, c1, d1) and returning a result value 5. Depending on the particular data processing application, the result value may be bit-reversed to provide a result value 7 (for example, if a bit-reversal is required to deal with endianness, or other reasons). The bit-plane extraction operation can be used as a pre-processing operation in data processing operations such as “sum-of-absolute-differences” in the processing of video data.
    Type: Grant
    Filed: June 8, 2005
    Date of Patent: January 19, 2016
    Assignee: Intel Corporation
    Inventor: Antonius Adrianus Maria Van Wel
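    The bit-plane extraction described in the abstract above can be sketched in software as follows. This is only an illustration, not the patented hardware: the function name `bit_plane_extract`, the little-endian byte numbering, and the choice that byte i feeds result bit i are all assumptions made for the example.

```python
def bit_plane_extract(value: int, bit: int, num_bytes: int = 4,
                      reverse: bool = False) -> int:
    """Gather bit `bit` of every byte in `value` into one small integer.

    Assumption (not from the abstract): byte 0 is the least significant
    byte, and byte i contributes result bit i.
    """
    if not 0 <= bit < 8:
        raise ValueError("bit must be in 0..7")
    result = 0
    for i in range(num_bytes):
        byte = (value >> (8 * i)) & 0xFF       # isolate byte i
        result |= ((byte >> bit) & 1) << i     # place its selected bit
    if reverse:
        # Optional bit reversal of the num_bytes-wide result.
        result = int(format(result, f"0{num_bytes}b")[::-1], 2)
    return result


# Bytes (least significant first) are 0x00, 0x02, 0x00, 0x02;
# bit 1 is set only in bytes 1 and 3.
print(bin(bit_plane_extract(0x02000200, 1)))                # 0b1010
print(bin(bit_plane_extract(0x02000200, 1, reverse=True)))  # 0b101
```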
  • Patent number: 9223543
    Abstract: An arithmetic unit which includes: a data supply section which supplies floating-point type object data to which a sign is to be added and condition data which includes a condition under which the sign is added; a sign data generating section which extracts the condition included in the condition data and generates sign data for adding the sign to the object data on the basis of the extracted condition; and an integer arithmetic operation section which performs an integer arithmetic operation while treating the object data as integer type data so as to add the sign to the object data on the basis of the sign data and the object data.
    Type: Grant
    Filed: January 4, 2010
    Date of Patent: December 29, 2015
    Assignee: Sony Corporation
    Inventors: Yukihiko Mogi, Masato Kamata, Yuki Kawaguchi
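    The idea of adding a sign by treating floating-point data as integer data can be illustrated with a short sketch. It assumes IEEE 754 single precision and assumes that "adding the sign" means XOR-ing the sign bit when a condition holds; the helper names and the boolean `negate` condition are hypothetical and do not come from the abstract.

```python
import struct

SIGN_MASK = 0x80000000  # sign bit of an IEEE 754 single-precision value


def float_to_bits(x: float) -> int:
    """Reinterpret a float (rounded to 32 bits) as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as a float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def apply_sign(obj: float, negate: bool) -> float:
    """Flip the sign of `obj` with an integer XOR when `negate` is true."""
    sign_data = SIGN_MASK if negate else 0       # generated from the condition
    return bits_to_float(float_to_bits(obj) ^ sign_data)


print(apply_sign(1.5, True))    # -1.5
print(apply_sign(1.5, False))   # 1.5
```

    The XOR never touches the exponent or mantissa fields, so no floating-point arithmetic unit is involved in the sign change.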
  • Patent number: 9223544
    Abstract: A method, device and system for representing numbers in a computer including storing a floating-point number M in a computer memory; representing the floating-point number M as an interval with lower and upper bounds A and B when it is accessed by using at least two floating-point numbers in the memory; and then representing M as an interval with lower and upper bounds A and B when it is used in a calculation by using at least three floating-point numbers in the memory. Calculations are performed using the interval and when the data is written back to the memory it may be stored as an interval if the size of the interval is significant, i.e. larger than a first threshold value. A warning regarding the suspect accuracy of any data stored as an interval may be issued if the interval is too large, i.e. larger than a second threshold value.
    Type: Grant
    Filed: September 7, 2012
    Date of Patent: December 29, 2015
    Assignee: Intel Corporation
    Inventors: Helia Naeimi, Ralph Nathan, Shih-Lien L. Lu, John L. Gustafson
  • Patent number: 9223575
    Abstract: The present application relates to the field of processors and in particular to the carrying out of arithmetic operations. Many of the computations performed by processors consist of a large number of simple operations. As a result, a multiplication operation may take a significant number of clock cycles to complete. The present application provides a processor having a trivial operand register, which is used in the carrying out of arithmetic or storage operations for data values stored in a data store.
    Type: Grant
    Filed: March 16, 2008
    Date of Patent: December 29, 2015
    Assignee: LINEAR ALGEBRA TECHNOLOGIES LIMITED
    Inventor: David Moloney
  • Patent number: 9213523
    Abstract: Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: December 15, 2015
    Assignee: Intel Corporation
    Inventors: Sridhar Samudrala, Grigorios Magklis, Marc Lupon, David R. Ditzel
  • Patent number: 9208093
    Abstract: Techniques are generally described for a multi-core processor with a plurality of processor cores. At least one cache is accessible to at least two of the plurality of processor cores. The multi-core processor can be configured for separately allocating a memory space within the cache to the individual processor cores accessing the cache.
    Type: Grant
    Filed: April 21, 2009
    Date of Patent: December 8, 2015
    Assignee: Empire Technology Development LLC
    Inventors: Thomas Martin Conte, Andrew Wolfe
  • Patent number: 9146706
    Abstract: A controlled-precision Iterative Arithmetic Logic Unit (IALU) included in a processor produces sub-precision results, i.e. results having a bit precision less than full precision. In one embodiment, the controlled-precision IALU comprises an arithmetic logic circuit and a precision control circuit. The arithmetic logic circuit is configured to iteratively process operands of a first bit precision to obtain a result. The precision control circuit is configured to end the iterative operand processing when the result achieves a programmed second bit precision less than the first bit precision. In one embodiment, the precision control circuit causes the arithmetic logic circuit to end the iterative operand processing in response to an indicator received by the control circuit. The controlled-precision IALU further comprises rounding logic configured to round the sub-precision result.
    Type: Grant
    Filed: May 5, 2006
    Date of Patent: September 29, 2015
    Assignee: QUALCOMM Incorporated
    Inventor: Kenneth Alan Dockser
  • Patent number: 9146901
    Abstract: A processing apparatus is provided with processing circuitry 6, 8 and decoder circuitry 10 responsive to a received argument reduction instruction FREDUCE4, FDOT3R to generate control signals 16 for controlling the processing circuitry 6, 8. The action of the argument reduction instruction is to subject each component of an input vector to a scaling which adds or subtracts an exponent shift value C to the exponent of the input vector component. The exponent shift value C is selected such that a sum of this exponent shift value C with the maximum exponent value B of any of the input vector components lies within a range between a first predetermined value and a second predetermined value. A consequence of execution of this argument reduction instruction is that the result vector when subject to a dot-product operation will be resistant to floating point underflows or overflows.
    Type: Grant
    Filed: August 26, 2011
    Date of Patent: September 29, 2015
    Assignee: ARM Limited
    Inventor: Jorn Nystad
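    A software analogue of the exponent-shift scaling in the abstract above is sketched below. It is a simplification under stated assumptions: the abstract requires B + C to fall within a range between two predetermined values, while this sketch pins B + C to a single hypothetical `target_exp`. The function names are invented for the example, and infinities and NaNs are not handled.

```python
import math


def reduce_arguments(vec, target_exp=0):
    """Scale each component by 2**c so the largest exponent equals target_exp.

    Returns the scaled vector and the shift c that was applied.
    """
    exps = [math.frexp(x)[1] for x in vec if x != 0.0]
    if not exps:
        return list(vec), 0          # all zeros: nothing to scale
    b = max(exps)                    # maximum exponent B of any component
    c = target_exp - b               # exponent shift C, so B + C == target_exp
    return [math.ldexp(x, c) for x in vec], c


def safe_norm(vec):
    """Euclidean norm computed via a dot product of the scaled vector."""
    scaled, c = reduce_arguments(vec)
    dot = sum(x * x for x in scaled)          # largest |x| is in [0.5, 1)
    return math.ldexp(math.sqrt(dot), -c)     # undo the scaling


v = [1e200, 3e200, -2e200]
print(sum(x * x for x in v))   # the unscaled dot product overflows to inf
print(safe_norm(v))            # finite, close to sqrt(14) * 1e200
```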
  • Publication number: 20150121044
    Abstract: A first floating-point operation unit receives first and second variables and performs a first operation generating a first output. A first rounding unit receives and rounds the first output to generate a second output if a control bit is in a first state. A second floating-point operation unit receives a third variable and either the first output or the second output and performs a second operation on the third variable and either the first output or the second output, to generate a third output. The second floating-point operation unit receives and operates on the first output if the control bit is in the first state, or the second output if the control bit is in the second state. A second rounding unit receives and rounds the third output.
    Type: Application
    Filed: December 29, 2014
    Publication date: April 30, 2015
    Inventor: David Yiu-Man Lau
  • Publication number: 20150121043
    Abstract: Computers and methods for performing mathematical functions are disclosed. An embodiment of a computer includes an operations level and a driver level. The operations level performs mathematical operations. The driver level includes a first lookup table and a second lookup table, wherein the first lookup table includes first data for calculating at least one mathematical function using a first level of accuracy. The second lookup table includes second data for calculating the at least one mathematical function using a second level of accuracy, wherein the first level of accuracy is greater than the second level of accuracy. A driver executes either the first data or the second data depending on a selected level of accuracy.
    Type: Application
    Filed: October 30, 2013
    Publication date: April 30, 2015
    Applicant: Texas Instruments Incorporated
    Inventors: Kyong Ho Lee, Seok-Jun Lee, Manish Goel
  • Patent number: 9015452
    Abstract: A digital signal processor (DSP) includes an instruction fetch unit, an instruction decode unit, a register set and a plurality of work units in communication with the instruction decode unit. A first embodiment calculates two divisions on packed numerators and packed denominators. The DSP work units calculate indexes into a 1/d look-up table and make a final sign correction. A second embodiment calculates an approximation of a vector magnitude of a complex number x+jy. The approximation is based upon √(x² + y²) ≈ α*max(|x|, |y|) + β*min(|x|, |y|). The DSP work units calculate the absolute values, find the maxima and minima, and form the packed results of two vector magnitude calculations.
    Type: Grant
    Filed: February 18, 2010
    Date of Patent: April 21, 2015
    Assignee: Texas Instruments Incorporated
    Inventor: Udayan Dasgupta
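    The max/min approximation of the vector magnitude mentioned in the abstract above can be tried out with a few lines of code. The abstract gives no numeric coefficients, so the two constants below are an assumption: they are a commonly quoted pair for this kind of approximation that keeps the relative error under roughly 4%. The function name is likewise hypothetical.

```python
import math

# Assumed coefficients (not taken from the patent).
ALPHA = 0.96043387
BETA = 0.39782473


def approx_magnitude(x: float, y: float) -> float:
    """Approximate sqrt(x*x + y*y) without a square root or a multiply by x, y."""
    ax, ay = abs(x), abs(y)
    return ALPHA * max(ax, ay) + BETA * min(ax, ay)


for x, y in [(3.0, 4.0), (1.0, 0.0), (1.0, 1.0), (-5.0, 12.0)]:
    exact = math.hypot(x, y)
    approx = approx_magnitude(x, y)
    print(f"({x}, {y}): exact={exact:.4f} approx={approx:.4f} "
          f"rel_err={abs(approx - exact) / exact:.4%}")
```

    Only absolute values, one comparison, and two constant multiplications are needed, which is what makes the form attractive for packed DSP work units.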
  • Patent number: 9015453
    Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: April 21, 2015
    Assignee: Intel Corporation
    Inventors: Alexander D. Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
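    The element ordering produced by the unpack operation in the abstract above amounts to interleaving the two sources. A minimal sketch, using Python lists to stand in for registers and a hypothetical `unpack_interleave` function with an assumed `count` parameter, looks like this:

```python
def unpack_interleave(src1, src2, count):
    """Interleave the first `count` elements of src1 and src2.

    The result order is src1[0], src2[0], src1[1], src2[1], ...
    """
    dest = []
    for a, b in zip(src1[:count], src2[:count]):
        dest.extend((a, b))
    return dest


# First and third elements come from src1, second and fourth from src2.
print(unpack_interleave([1, 3, 5, 7], [2, 4, 6, 8], 2))  # [1, 2, 3, 4]
```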
  • Publication number: 20150095623
    Abstract: A processor including a decode unit to receive a vector indexed load plus arithmetic and/or logical (A/L) operation plus store instruction. The instruction is to indicate a source packed memory indices operand that is to have a plurality of packed memory indices. The instruction is also to indicate a source packed data operand that is to have a plurality of packed data elements. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the instruction, is to load a plurality of data elements from memory locations corresponding to the plurality of packed memory indices, perform A/L operations on the plurality of packed data elements of the source packed data operand and the loaded plurality of data elements, and store a plurality of result data elements in the memory locations corresponding to the plurality of packed memory indices.
    Type: Application
    Filed: September 27, 2013
    Publication date: April 2, 2015
    Inventors: Igor Ermolaev, Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Gautam B. Doshi, Rama Kishan V. Malladi, Prasenjit Chakraborty
  • Publication number: 20150095624
    Abstract: A Vector Floating Point Test Data Class Immediate instruction is provided that determines whether one or more elements of a vector specified in the instruction are of one or more selected classes and signs. If a vector element is of a selected class and sign, an element in an operand of the instruction corresponding to the vector element is set to a first defined value, and if the vector element is not of the selected class and sign, the operand element corresponding to the vector element is set to a second defined value.
    Type: Application
    Filed: December 5, 2014
    Publication date: April 2, 2015
    Inventors: Jonathan D. Bradbury, Eric M. Schwarz
  • Publication number: 20150089206
    Abstract: Machine instructions, referred to herein as a long Convert from Zoned instruction (CDZT) and extended Convert from Zoned instruction (CXZT), are provided that read EBCDIC or ASCII data from memory, convert it to the appropriate decimal floating point format, and write it to a target floating point register or floating point register pair. Further, machine instructions, referred to herein as a long Convert to Zoned instruction (CZDT) and extended Convert to Zoned instruction (CZXT), are provided that convert a decimal floating point (DFP) operand in a source floating point register or floating point register pair to EBCDIC or ASCII data and store it to a target memory location.
    Type: Application
    Filed: December 4, 2014
    Publication date: March 26, 2015
    Inventors: Steven R. Carlough, Reid T. Copeland, Charles W. Gainey, JR., Marcel Mitran, Eric M. Schwarz, Timothy J. Slegel