Floating Point Or Vector Patents (Class 712/222)
  • Publication number: 20140052968
    Abstract: A method of processing an instruction is described that includes fetching and decoding the instruction. The instruction has separate destination address, first operand source address and second operand source address components. The first operand source address identifies a location of a first mask pattern in mask register space. The second operand source address identifies a location of a second mask pattern in the mask register space. The method further includes fetching the first mask pattern from the mask register space; fetching the second mask pattern from the mask register space; merging the first and second mask patterns into a merged mask pattern; and, storing the merged mask pattern at a storage location identified by the destination address.
    Type: Application
    Filed: December 23, 2011
    Publication date: February 20, 2014
    Applicant: Intel Corporation
    Inventors: Jesus Corbal, Andrew T. Forsyth, Roger Espasa, Manel Fernandez, Thomas D. Fletcher
  • Patent number: 8650383
    Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving an input vector and optionally receiving a predicate vector as inputs. The processor then executes the vector instruction, which causes the processor to determine a key element position in the input vector and generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element of the result vector, the processor sets each element of the result vector to the right of the key element to a first predetermined value and sets each element of the result vector at or to the left of the key element to a second predetermined value. The processor then sets one or more processor status flags based on the values in the result vector.
    Type: Grant
    Filed: December 23, 2010
    Date of Patent: February 11, 2014
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Patent number: 8649508
    Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: February 11, 2014
    Assignee: Tata Consultancy Services Ltd.
    Inventor: Natarajan Vijayarangan
  • Publication number: 20140040603
    Abstract: Embodiments relate to vector processing in an active memory device. An aspect includes a method for vector processing in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Based on the iteration count, execution of the sub-instructions in parallel is repeated for multiple iterations by the processing element. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.
    Type: Application
    Filed: August 3, 2012
    Publication date: February 6, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair, Daniel A. Prener
  • Patent number: 8639914
    Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: January 28, 2014
    Assignee: Intel Corporation
    Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
  • Publication number: 20140019728
    Abstract: A data processing apparatus includes a register bank having a plurality of registers for storing vectors being processed; a pipelined processor for processing the stream of vector instructions; the pipelined processor comprising circuitry configured to detect data dependencies for the vectors processed by the stream of vector instructions and stored in the plurality of registers and to determine constraints on timing of execution for the vector instructions such that no register data hazards arise. Register data hazards arise where two accesses to a same register, at least one of said accesses being a write, occur in an order different to an order of said instruction stream such that an access occurring later in said instruction stream starts before an access occurring earlier in said instruction stream has completed. The pipelined processor includes data element hazard determination circuitry.
    Type: Application
    Filed: July 11, 2012
    Publication date: January 16, 2014
    Inventor: Alastair David REID
  • Publication number: 20140006755
    Abstract: An apparatus is described having an instruction execution pipeline that has a vector functional unit to support a vector multiply add instruction. The vector multiply add instruction to multiply respective K bit elements of two vectors and accumulate a portion of each of their respective products with another respective input operand in an X bit accumulator, where X is greater than K.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Inventors: Shay Gueron, Vlad Krasnov, Robert Valentine, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Publication number: 20140006756
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor a data element shuffle and an operation on the shuffled data elements in response to a single data element shuffle and an operation instruction that includes a destination vector register operand, a first and second source vector register operands, an immediate value, and an opcode are described.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Inventors: Igor Ermolaev, Elmoustapha Ould-Ahmed-Vall, Brett Toll, Andrey Naraikin, Jesus Corbal
  • Publication number: 20130339678
    Abstract: A method is described that includes reading a first read mask from a first register. The method also includes reading a first vector operand from a second register or memory location. The method also includes applying the read mask against the first vector operand to produce a set of elements for operation. The method also includes performing an operation of the set elements. The method also includes creating an output vector by producing multiple instances of the operation's result. The method also includes reading a first write mask from a third register, the first write mask being different than the first read mask. The method also includes applying the write mask against the output vector to create a resultant vector. The method also includes writing the resultant vector to a destination register.
    Type: Application
    Filed: December 23, 2011
    Publication date: December 19, 2013
    Inventors: Mikhail Plotnikov, Andrey Naraikan, Elmoustapha Ould-Ahmed-vall, Robert Valentine, Bret L. Toll, Jesus Corbal
  • Publication number: 20130332707
    Abstract: A processing apparatus may be configured to include logic to generate a first set of vectors based on a first integer and a second set of vectors based on a second integer, logic to calculate sub products by multiplying the first set of vectors to the second set of vectors, logic to split each sub product into a first half and a second half and logic to generate a final result by adding together all first and second halves at respective digit positions.
    Type: Application
    Filed: June 7, 2012
    Publication date: December 12, 2013
    Applicant: INTEL CORPORATION
    Inventors: Shay GUERON, Vlad KRASNOV
  • Publication number: 20130326199
    Abstract: Disclosed is an apparatus and method generally related to controlling a multimedia extension control and status register (MXCSR). A processor core may include a floating point unit (FPU) to perform arithmetic functions; and a multimedia extension control register (MXCR) to provide control bits to the FPU. Further an optimizer may be used to select a speculative multimedia extension status register (SPEC_MXSR) from a plurality of SPEC_MXSRs to update a multimedia extension status register (MXSR) based upon an instruction.
    Type: Application
    Filed: December 29, 2011
    Publication date: December 5, 2013
    Inventors: Grigorios Magklis, Josep M. Codina, Craig B. Zilles, Michael Neilly, Sridhar Samudrala, Alejandro Martinez Vicente, Polychronis Xekalakis, F. Jesus Sanchez, Marc Lupon, Georgios Tournavitis, Enric Gibert Codina, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Christos E. Kotselidis, Fernando Latorre, Pedro Lopez, Carlos Madriles Gimeno, Pedro Marcuello, Raul Martinez, Daniel Ortega, Demos Pavlou, Kyriakos A. Stavrou
  • Publication number: 20130305020
    Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
    Type: Application
    Filed: September 30, 2011
    Publication date: November 14, 2013
    Inventors: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount
  • Patent number: 8583904
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a negation operation dependent upon the input vector and the control vector.
    Type: Grant
    Filed: September 27, 2012
    Date of Patent: November 12, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20130290685
    Abstract: A method of an aspect includes receiving a floating point rounding instruction. The floating point rounding instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point that each of the one or more floating point data elements are to be rounded to, and indicates a destination storage location. A result is stored in the destination storage location in response to the floating point rounding instruction. The result includes one or more rounded result floating point data elements. Each of the one or more rounded result floating point data elements includes one of the floating point data elements of the source, in a corresponding position, which has been rounded to the indicated number of fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Application
    Filed: December 22, 2011
    Publication date: October 31, 2013
    Inventors: Jesus Corbal San Adrian, Cristina S. Anderson, Robert Valentine, Bret Toll, Amit Gradstein, Simon Rubanovich, Benny Eitan
  • Publication number: 20130290686
    Abstract: An integrated circuit device comprises at least one instruction processing module arranged to perform branch predication. The at least one instruction processing module comprises at least one predicate calculation module arranged to receive as an input at least one result vector for a predicate function and at least one conditional parameter value therefor and output a predicate result value from the at least one result vector based at least partly on the at least one received conditional parameter value.
    Type: Application
    Filed: January 21, 2011
    Publication date: October 31, 2013
    Applicant: Freescale Semiconductor, Inc.
    Inventors: Yuval Peled, Itzhak Barak, Idan Rozenberg, Doron Schupper, Lev Vaskevich
  • Publication number: 20130275730
    Abstract: An apparatus is described that includes instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction select a first group of input vector elements from one of multiple first non overlapping sections of respective first and second input vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction select a second group of input vector elements from one of multiple second non overlapping sections of respective third and fourth input vectors. The second group has a second bit width that is larger than the first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.
    Type: Application
    Filed: December 23, 2011
    Publication date: October 17, 2013
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Publication number: 20130275731
    Abstract: An apparatus is described having a semiconductor chip that has an instruction execution pipeline. The instruction execution pipeline has an execution unit with logic circuitry to perform the following for an instruction: accept input vector elements representing real and imaginary parts of a plurality of complex numbers; and, present the complex conjugates of the complex numbers.
    Type: Application
    Filed: December 22, 2011
    Publication date: October 17, 2013
    Inventors: Suleyman Sair, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 8555037
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a minima or maxima operation on another input vector dependent upon the input vector and the control vector.
    Type: Grant
    Filed: September 24, 2012
    Date of Patent: October 8, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20130262837
    Abstract: A processor includes an execution unit to execute instructions, where each operand of each executed instruction has one or more elements of an element size and at least one operand of the instruction corresponds to a register of a register size. The processor further includes a counter configured to count a number of instructions that have been executed by the execution unit associated with a particular combination of register size and element size.
    Type: Application
    Filed: March 29, 2012
    Publication date: October 3, 2013
    Applicant: INTEL CORPORATION
    Inventors: Laura A. Knauth, Matthew C. Merten, Ronak Singhal, Hugh M. Caffey
  • Patent number: 8549265
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a shift operation on another input vector dependent upon the input vector and the control vector.
    Type: Grant
    Filed: September 24, 2012
    Date of Patent: October 1, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20130246757
    Abstract: Processing of character data is facilitated. A Find Element Equal instruction is provided that compares data of multiple vectors for equality and provides an indication of equality, if equality exists. An index associated with the equal element is stored in a target vector register. Further, the same instruction, the Find Element Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.
    Type: Application
    Filed: March 3, 2013
    Publication date: September 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Eric M. Schwarz, Timothy J. Slegel
  • Publication number: 20130246758
    Abstract: Processing of character data is facilitated. A Vector String Range Compare instruction is provided that compares each element of a vector with a range of values based on a set of controls to determine if there is a match. An index associated with the matched element or a mask representing the matched element is stored in a target vector register. Further, the same instruction, the Vector String Range Compare instruction, also searches a selected vector for null elements, also referred to as zero elements.
    Type: Application
    Filed: March 3, 2013
    Publication date: September 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Eric M. Schwarz, Timothy J. Slegel
  • Publication number: 20130246759
    Abstract: Processing of character data is facilitated. A Find Element Not Equal instruction is provided that compares data of multiple vectors for inequality and provides an indication of inequality, if inequality exists. An index associated with the unequal element is stored in a target vector register. Further, the same instruction, the Find Element Not Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.
    Type: Application
    Filed: March 4, 2013
    Publication date: September 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
  • Patent number: 8539205
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a product or quotient operation on another input vector dependent upon the input vector and the control vector.
    Type: Grant
    Filed: September 24, 2012
    Date of Patent: September 17, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20130238880
    Abstract: An operation processing device for executing a plurality of operations for aligned data by one vector instruction includes a first mask storage unit and a second mask storage unit. The first mask storage unit stores first mask data to designate each of the plurality of operations a true or false operation, and the second mask storage unit stores second mask data to designate a number to be true continuously, in the plurality of operations.
    Type: Application
    Filed: January 14, 2013
    Publication date: September 12, 2013
    Inventor: Masahiko TOICHI
  • Patent number: 8527742
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a sum or difference operation on another input vector dependent upon the input vector and the control vector.
    Type: Grant
    Filed: September 11, 2012
    Date of Patent: September 3, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20130219153
    Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicate that first portion of bits in a subsequent portion of the destination register.
    Type: Application
    Filed: March 15, 2013
    Publication date: August 22, 2013
    Inventor: Patrice Roussel
  • Patent number: 8504805
    Abstract: Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes).
    Type: Grant
    Filed: April 22, 2009
    Date of Patent: August 6, 2013
    Assignee: Oracle America, Inc.
    Inventors: Robert T. Golla, Paul J. Jordan, Jama I. Barreh, Matthew B. Smittle, Yuan C. Chou, Jared C. Smolens
  • Patent number: 8504806
    Abstract: The described embodiments include a processor that executes a ValueCheck instruction. In the described embodiments, the processor receives an input vector and a predicate vector, each including N elements. The processor then executes a ValueCheck instruction, which causes the processor to generate a result vector. When generating the result vector, for each element in a set of elements in the input vector for which a corresponding element of the predicate vector is active, the processor determines if at least one of the elements in the set of elements precedes the element in the input vector and contains a different value than the element in the input vector. If so, the processor writes an identifier for a closest preceding active element that contains the different value into a corresponding element of a result vector. Otherwise, the processor writes a zero in the corresponding element of the result vector.
    Type: Grant
    Filed: May 31, 2012
    Date of Patent: August 6, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20130191619
    Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle.
    Type: Application
    Filed: January 23, 2013
    Publication date: July 25, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: International Business Machines Corporation
  • Patent number: 8495343
    Abstract: A microprocessor includes a plurality of execution units configured to receive instructions and operands thereof and to execute the instructions. An instruction scheduler issues the instructions to the execution units and selects sources of the instruction operands. At least one of the execution units detects one of the operands of one of the instructions is a denormal operand, generates an indication that the instruction needs to be replayed in response to detecting the denormal operand, and provides the denormal operand to the instruction scheduler in response to detecting the denormal operand, rather than normalizing the denormal operand. The instruction scheduler normalizes the denormal operand, in response to the indication, and causes the normalized operand, rather than the denormal operand, to be provided to the execution unit when the instruction is replayed.
    Type: Grant
    Filed: June 4, 2010
    Date of Patent: July 23, 2013
    Assignee: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Gerard M. Col, Timothy A. Elliott, Rodney E. Hooker, Terry Parks
  • Patent number: 8495342
    Abstract: A processor having multiple cores coordinates functions performed on the cores to automatically, dynamically and repeatedly reconfigure the cores for optimal performance based on characteristics of currently executing software. A core running a thread detects a multi-core characteristic of the thread and assigns one or more other cores to the thread to dynamically combine the cores into what functionally amounts to a common core for more efficient execution of the thread.
    Type: Grant
    Filed: December 16, 2008
    Date of Patent: July 23, 2013
    Assignee: International Business Machines Corporation
    Inventors: Louis B. Capps, Jr., Michael J. Shapiro, Robert H. Bell, Jr., Thomas E. Cook, William E. Burky
  • Patent number: 8495602
    Abstract: The present disclosure includes a shader compiler system and method. In an embodiment, a shader compiler includes a decoder to translate an instruction having a vector representation to a unified instruction representation. The shader compiler also includes an encoder to translate an instruction having a unified instruction representation to a processor executable instruction.
    Type: Grant
    Filed: September 28, 2007
    Date of Patent: July 23, 2013
    Assignee: QUALCOMM Incorporated
    Inventors: Lin Chen, Guofang Jiao, Chihong Zhang, Junhong Sun
  • Publication number: 20130179661
    Abstract: In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed.
    Type: Application
    Filed: March 4, 2013
    Publication date: July 11, 2013
    Inventor: Eric Sprangle
  • Patent number: 8484266
    Abstract: An embedded control system capable of ensuring precision in arithmetic with data in the floating-point format and also avoiding a shortage of the storage area of a memory is provided. According to an embedded control system in the present invention, when discrete data in the floating-point format is stored in a read-only memory, the discrete data in the floating-point format is converted into data in a significand-reduced floating-point format before being stored. Here, a significand-reduced floating-point number is a number obtained by deleting low-order bits of the significand of a floating-point number. Further, an interpolation search is performed using discrete data, the discrete data in the significand-reduced floating-point format stored in the read-only memory is brought back to the discrete data in the floating-point format before an interpolation search being performed.
    Type: Grant
    Filed: February 19, 2009
    Date of Patent: July 9, 2013
    Assignee: Hitachi, Ltd.
    Inventors: Shinya Fujimoto, Keiichiro Ohkawa
  • Patent number: 8484443
    Abstract: The described embodiments include RunningMAC1P and RunningMAC2P instructions. In the described embodiments, a processor receives a first input vector, a second input vector, a third input vector, and a control vector. Upon executing a RunningMAC1P or a RunningMAC2P instruction, the processor sets a base value equal to a value from an element at a key element position in the first input vector. Next, the processor generates the result vector by, for each element of the result vector to the right of the key element position, setting the element in the result vector equal to a sum of the base value and a result of multiplying a value in each relevant element of the second input vector by a value in a corresponding element of the third input vector, from an element at the key element position to and including a predetermined element in the second input vector.
    Type: Grant
    Filed: May 3, 2012
    Date of Patent: July 9, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20130173891
    Abstract: Machine instructions, referred to herein as a long Convert from Zoned instruction (CDZT) and extended Convert from Zoned instruction (CXZT), are provided that read EBCDIC or ASCII data from memory, convert it to the appropriate decimal floating point format, and write it to a target floating point register or floating point register pair. Further, machine instructions, referred to herein as a long Convert to Zoned instruction (CZDT) and extended Convert to Zoned instruction (CZXT), are provided that convert a decimal floating point (DFP) operand in a source floating point register or floating point register pair to EBCDIC or ASCII data and store it to a target memory location.
    Type: Application
    Filed: December 29, 2011
    Publication date: July 4, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Steven R. Carlough, Reid T. Copeland, Charles W. Gainey, JR., Marcel Mitran, Eric M. Schwarz, Timothy J. Slegel
  • Patent number: 8478969
    Abstract: In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: July 2, 2013
    Assignee: Intel Corporation
    Inventor: Eric S. Sprangle
  • Publication number: 20130159682
    Abstract: A method for operating a decimal-floating point (DFP) processor. The method includes identifying a first op-code requiring read access to a first plurality of DFP operands in a vector register of the DFP processor; granting read access from a first port of the vector register to a first execution unit of the DFP processor selected to execute the first op-code; initializing a read pointer of the first port; reading out, from the first port and based on the read pointer, a first DFP operand of the plurality of DFP operands in response to a read request from the first execution unit; and adjusting the read pointer of the first port in response to reading out the first DFP operand.
    Type: Application
    Filed: December 19, 2012
    Publication date: June 20, 2013
    Applicant: SILMINDS, LLC.
    Inventor: SILMINDS, LLC.
  • Publication number: 20130159681
    Abstract: Verifying speculative multithreading in an application executing in a computing system, including: executing one or more test instructions serially thereby producing a serial result, including insuring that all data dependencies among the test instructions are satisfied; executing the test instructions speculatively in a plurality of threads thereby producing a speculative result; and determining whether a speculative multithreading error exists including: comparing the serial result to the speculative result and, if the serial result does not match the speculative result, determining that a speculative multithreading error exists.
    Type: Application
    Filed: December 10, 2012
    Publication date: June 20, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: International Business Machines Corporation
  • Publication number: 20130159680
    Abstract: Methods, systems, and computer program products for the performance of arithmetic operations on large numbers. The addition of large numbers may be parallelized by adding corresponding sections of the numbers in parallel. The multiplication of large numbers may be accomplished by applying a multiplier to a multiplicand after the latter is divided into sections, where the multiplication of the sections is performed in parallel. Products for each section are saved in high and low order vectors, which may then be aligned and added. The comparison of two large numbers may be performed by comparing the numbers, section by section, in parallel. In an embodiment, these processes may be performed in a graphics processing unit (GPU) having multiple cores. In an embodiment, such a GPU may be integrated into a larger die that also incorporates one or more conventional central processing unit (CPU) cores.
    Type: Application
    Filed: December 19, 2011
    Publication date: June 20, 2013
    Inventors: Wei-yu Chen, Guei-yuan Lueh, Kaiyu Chen, Xiaozhu Kang
  • Patent number: 8464031
    Abstract: During operation, a processor generates a result vector. In particular, the processor records a value from an element at a key element position in an input vector into a base value. Next, for each active element in the result vector to the right of the key element position, the processor generates a result vector by setting the element in the result vector equal to a result of performing a unary operation on the base value a number of times equal to a number of relevant elements. The number of relevant elements is determined from the key element position to and including a predetermined element in the result vector, where the predetermined element in the result vector may be one of: a first element to the left of the element in the result vector; or the element in the result vector.
    Type: Grant
    Filed: April 26, 2012
    Date of Patent: June 11, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8458444
    Abstract: Techniques for handling dependency conditions, including evil twin conditions, are disclosed herein. An instruction may designate a source register comprising two portions. The source register may be a double-precision register and its two portions may be single-precision portions, each specified as destinations by two other single-precision instructions. Execution of these two single-precision instructions, especially on a register renaming machine, may result in the appropriate values for the two portions of the source register being stored in different physical locations, which can complicate execution of an instruction stream. In response to detecting a potential dependency, one or more instructions may be inserted in an instruction stream to enable the appropriate values to be stored within one physical double precision register, eliminating an actual or potential evil twin dependency.
    Type: Grant
    Filed: April 22, 2009
    Date of Patent: June 4, 2013
    Assignee: Oracle America, Inc.
    Inventors: Yuan C. Chou, Jared C. Smolens, Jeffrey S. Brooks
  • Patent number: 8447956
    Abstract: The described embodiments provide a processor for generating a result vector with subtracted or mathematically divided values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector, and optionally receives a predicate vector. The processor then records a value from an element at a key element position in the second input vector into a base value. Next, the processor generates a result vector.
    Type: Grant
    Filed: July 22, 2011
    Date of Patent: May 21, 2013
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20130103932
    Abstract: A multi-addressable register file is addressed by a plurality of types of instructions, including scalar, vector and vector-scalar extension instructions. It may be determined that data is to be translated from one format to another format. If so determined, a convert machine instruction is executed that obtains a single precision datum in a first representation in a first format from a first register; converts the single precision datum of the first representation in the first format to a converted single precision datum of a second representation in a second format; and places the converted single precision datum in a second register.
    Type: Application
    Filed: December 17, 2012
    Publication date: April 25, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: INTERNATIONAL BUSINESS MACHINES CORP
  • Patent number: 8417921
    Abstract: The described embodiments provide a processor for generating a result vector that contains results from a comparison operation. During operation, the processor receives a first input vector, a second input vector, and a control vector. When subsequently generating a result vector, the processor first captures a base value from a key element position in the first input vector. For selected elements in the result vector, processor compares the base value and values from relevant elements to the left of a corresponding element in the second input vector, and writes the result into the element in the result vector. In the described embodiments, the key element position and the relevant elements can be defined by the control vector and an optional predicate vector.
    Type: Grant
    Filed: August 31, 2010
    Date of Patent: April 9, 2013
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20130086367
    Abstract: Operand liveness state information is maintained during context switches for current architected operands of executing programs the current operand state information indicating whether corresponding current operands are any one of enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA) for disabling current architected operands, wherein a current operand is accessed by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module.
    Type: Application
    Filed: October 3, 2011
    Publication date: April 4, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Publication number: 20130073836
    Abstract: Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.).
    Type: Application
    Filed: September 16, 2011
    Publication date: March 21, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Brett Olsson, Valentina Salapura
  • Publication number: 20130073837
    Abstract: A function's purity may be estimated by comparing a new input vector to previously analyzed input vectors. When a new input vector is within a confidence boundary, the new input vector may be treated as a known vector, even when that vector has not been evaluated. The input vector may reflect the input parameters passed to a function, and the function may be analyzed to determine whether to memoize with the input vector. The function may be a function that behaves as a pure function in some circumstances and with some input vectors, but not with others. By memoizing the function when possible, the function may be executed much faster, thereby improving performance.
    Type: Application
    Filed: November 8, 2012
    Publication date: March 21, 2013
    Applicant: CONCURIX CORPORATION
    Inventor: Concurix Corporation
  • Publication number: 20130067204
    Abstract: Methods and apparatus relating to instructions with floating point control override are described. In an embodiment, floating point operation settings indicated by a floating point control register may be overridden on a per instruction basis. Other embodiments are also described.
    Type: Application
    Filed: November 6, 2012
    Publication date: March 14, 2013
    Inventors: Cristina S. Anderson, Simon Rubanovich, Benny Eitan