Floating Point Or Vector Patents (Class 712/222)
-
Publication number: 20140052968
Abstract: A method of processing an instruction is described that includes fetching and decoding the instruction. The instruction has separate destination address, first operand source address and second operand source address components. The first operand source address identifies a location of a first mask pattern in mask register space. The second operand source address identifies a location of a second mask pattern in the mask register space. The method further includes fetching the first mask pattern from the mask register space; fetching the second mask pattern from the mask register space; merging the first and second mask patterns into a merged mask pattern; and, storing the merged mask pattern at a storage location identified by the destination address.
Type: Application
Filed: December 23, 2011
Publication date: February 20, 2014
Applicant: Intel Corporation
Inventors: Jesus Corbal, Andrew T. Forsyth, Roger Espasa, Manel Fernandez, Thomas D. Fletcher
-
Patent number: 8650383
Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving an input vector and optionally receiving a predicate vector as inputs. The processor then executes the vector instruction, which causes the processor to determine a key element position in the input vector and generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element of the result vector, the processor sets each element of the result vector to the right of the key element to a first predetermined value and sets each element of the result vector at or to the left of the key element to a second predetermined value. The processor then sets one or more processor status flags based on the values in the result vector.
Type: Grant
Filed: December 23, 2010
Date of Patent: February 11, 2014
Assignee: Apple Inc.
Inventors: Jeffry E. Gonion, Keith E. Diefendorff
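
The split around a key element described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: the function names, the choice of index 0 as the leftmost element, passing the key position in as a parameter (the abstract does not say how it is determined), leaving inactive elements as `None`, and the "any element set" flag are all hypothetical.

```python
def key_element_split(length, key, predicate=None, right_value=0, left_value=1):
    """Build a result vector split at the key element position.

    Assumptions for illustration: index 0 is the leftmost element, the key
    position is supplied by the caller, and inactive elements stay None.
    """
    result = [None] * length
    for i in range(length):
        if predicate is not None and not predicate[i]:
            continue  # inactive element: leave it untouched
        # Right of the key gets the first value; at or left of it, the second.
        result[i] = right_value if i > key else left_value
    return result


def any_set(result):
    """Hypothetical stand-in for a status flag: true if any element is non-zero."""
    return any(bool(v) for v in result if v is not None)
```

For example, `key_element_split(6, 2)` gives three elements holding the second value followed by three holding the first.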
-
Patent number: 8649508
Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.
Type: Grant
Filed: September 29, 2008
Date of Patent: February 11, 2014
Assignee: Tata Consultancy Services Ltd.
Inventor: Natarajan Vijayarangan
-
Publication number: 20140040603
Abstract: Embodiments relate to vector processing in an active memory device. An aspect includes a method for vector processing in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Based on the iteration count, execution of the sub-instructions in parallel is repeated for multiple iterations by the processing element. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions.
Type: Application
Filed: August 3, 2012
Publication date: February 6, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Ravi Nair, Daniel A. Prener
-
Patent number: 8639914
Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
Type: Grant
Filed: December 29, 2012
Date of Patent: January 28, 2014
Assignee: Intel Corporation
Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
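
The element ordering in this abstract amounts to interleaving the two sources. A minimal sketch, assuming each source register is modeled as a Python list and using the hypothetical name `unpack_interleave`:

```python
def unpack_interleave(src1, src2):
    """Interleave two packed sources element by element.

    With src1 = [first, third] and src2 = [second, fourth], the destination
    holds first, second, third, fourth in adjacent positions.
    """
    dst = []
    for a, b in zip(src1, src2):
        dst.extend((a, b))
    return dst
```

Calling `unpack_interleave(["e1", "e3"], ["e2", "e4"])` yields the four elements in the adjacency order the abstract describes.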
-
Publication number: 20140019728
Abstract: A data processing apparatus includes a register bank having a plurality of registers for storing vectors being processed; a pipelined processor for processing the stream of vector instructions; the pipelined processor comprising circuitry configured to detect data dependencies for the vectors processed by the stream of vector instructions and stored in the plurality of registers and to determine constraints on timing of execution for the vector instructions such that no register data hazards arise. Register data hazards arise where two accesses to a same register, at least one of said accesses being a write, occur in an order different to an order of said instruction stream such that an access occurring later in said instruction stream starts before an access occurring earlier in said instruction stream has completed. The pipelined processor includes data element hazard determination circuitry.
Type: Application
Filed: July 11, 2012
Publication date: January 16, 2014
Inventor: Alastair David REID
-
Publication number: 20140006755
Abstract: An apparatus is described having an instruction execution pipeline that has a vector functional unit to support a vector multiply add instruction. The vector multiply add instruction to multiply respective K bit elements of two vectors and accumulate a portion of each of their respective products with another respective input operand in an X bit accumulator, where X is greater than K.
Type: Application
Filed: June 29, 2012
Publication date: January 2, 2014
Inventors: Shay Gueron, Vlad Krasnov, Robert Valentine, Zeev Sperber, Amit Gradstein, Simon Rubanovich
-
Publication number: 20140006756
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor a data element shuffle and an operation on the shuffled data elements in response to a single data element shuffle and an operation instruction that includes a destination vector register operand, a first and second source vector register operands, an immediate value, and an opcode are described.
Type: Application
Filed: June 29, 2012
Publication date: January 2, 2014
Inventors: Igor Ermolaev, Elmoustapha Ould-Ahmed-Vall, Brett Toll, Andrey Naraikin, Jesus Corbal
-
Publication number: 20130339678
Abstract: A method is described that includes reading a first read mask from a first register. The method also includes reading a first vector operand from a second register or memory location. The method also includes applying the read mask against the first vector operand to produce a set of elements for operation. The method also includes performing an operation of the set elements. The method also includes creating an output vector by producing multiple instances of the operation's result. The method also includes reading a first write mask from a third register, the first write mask being different than the first read mask. The method also includes applying the write mask against the output vector to create a resultant vector. The method also includes writing the resultant vector to a destination register.
Type: Application
Filed: December 23, 2011
Publication date: December 19, 2013
Inventors: Mikhail Plotnikov, Andrey Naraikan, Elmoustapha Ould-Ahmed-vall, Robert Valentine, Bret L. Toll, Jesus Corbal
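
The sequence of steps in this abstract (read mask, operation, replicated result, write mask) can be illustrated as below. This is a sketch under assumptions, not the method as claimed: the function name is hypothetical, the operation defaults to a sum, and elements masked off by the write mask are assumed to keep the destination's previous contents, which the abstract does not specify.

```python
def masked_reduce_broadcast(vector, read_mask, write_mask, dest, op=sum):
    """Select with a read mask, reduce, replicate, then store with a write mask."""
    # Apply the read mask to pick the set of elements to operate on.
    selected = [x for x, m in zip(vector, read_mask) if m]
    # Perform the operation on the selected elements (a sum by default).
    value = op(selected)
    # Produce multiple instances of the result as the output vector.
    output = [value] * len(vector)
    # Apply the write mask; masked-off positions keep the old destination
    # value (an assumption made for this illustration).
    return [o if m else d for o, m, d in zip(output, write_mask, dest)]
```

With `vector=[1, 2, 3, 4]`, `read_mask=[1, 0, 1, 0]`, `write_mask=[0, 1, 1, 0]` and `dest=[9, 9, 9, 9]`, the selected elements are 1 and 3, so the two written positions receive 4.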
-
Publication number: 20130332707
Abstract: A processing apparatus may be configured to include logic to generate a first set of vectors based on a first integer and a second set of vectors based on a second integer, logic to calculate sub products by multiplying the first set of vectors to the second set of vectors, logic to split each sub product into a first half and a second half and logic to generate a final result by adding together all first and second halves at respective digit positions.
Type: Application
Filed: June 7, 2012
Publication date: December 12, 2013
Applicant: INTEL CORPORATION
Inventors: Shay GUERON, Vlad KRASNOV
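
The idea of splitting sub products into halves and summing them by digit position can be sketched with ordinary schoolbook multiplication. The sketch below is an assumption-laden illustration rather than the claimed apparatus: the 16-bit digit width, the names `to_digits` and `multiply`, and the scalar loops standing in for vector lanes are all hypothetical.

```python
DIGIT_BITS = 16                      # assumed digit width for illustration
DIGIT_MASK = (1 << DIGIT_BITS) - 1


def to_digits(n):
    """Split a non-negative integer into little-endian base-2**16 digits."""
    digits = []
    while True:
        digits.append(n & DIGIT_MASK)
        n >>= DIGIT_BITS
        if n == 0:
            return digits


def multiply(a, b):
    """Multiply two non-negative integers digit by digit."""
    da, db = to_digits(a), to_digits(b)
    columns = [0] * (len(da) + len(db))
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            sub = x * y                               # one sub product
            columns[i + j] += sub & DIGIT_MASK        # first (low) half
            columns[i + j + 1] += sub >> DIGIT_BITS   # second (high) half
    # Resolve carries between digit positions to form the final result.
    result, carry = 0, 0
    for pos, col in enumerate(columns):
        total = col + carry
        result |= (total & DIGIT_MASK) << (pos * DIGIT_BITS)
        carry = total >> DIGIT_BITS
    return result
```

Because each half fits in one digit, the column sums stay small and carry propagation can be deferred to a single final pass, which is what makes this layout attractive for vector hardware.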
-
Publication number: 20130326199
Abstract: Disclosed is an apparatus and method generally related to controlling a multimedia extension control and status register (MXCSR). A processor core may include a floating point unit (FPU) to perform arithmetic functions; and a multimedia extension control register (MXCR) to provide control bits to the FPU. Further an optimizer may be used to select a speculative multimedia extension status register (SPEC_MXSR) from a plurality of SPEC_MXSRs to update a multimedia extension status register (MXSR) based upon an instruction.
Type: Application
Filed: December 29, 2011
Publication date: December 5, 2013
Inventors: Grigorios Magklis, Josep M. Codina, Craig B. Zilles, Michael Neilly, Sridhar Samudrala, Alejandro Martinez Vicente, Polychronis Xekalakis, F. Jesus Sanchez, Marc Lupon, Georgios Tournavitis, Enric Gibert Codina, Crispin Gomez Requena, Antonio Gonzalez, Mirem Hyuseinova, Christos E. Kotselidis, Fernando Latorre, Pedro Lopez, Carlos Madriles Gimeno, Pedro Marcuello, Raul Martinez, Daniel Ortega, Demos Pavlou, Kyriakos A. Stavrou
-
Publication number: 20130305020
Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
Type: Application
Filed: September 30, 2011
Publication date: November 14, 2013
Inventors: Robert C. Valentine, Jesus Corbal San Adrian, Roger Espasa Sans, Robert D. Cavin, Bret L. Toll, Santiago Galan Duran, Jeffrey G. Wiedemeier, Sridhar Samudrala, Milind Baburao Girkar, Edward Thomas Grochowski, Jonathan Cannon Hall, Dennis R. Bradford, Elmoustapha Ould-Ahmed-Vall, James C. Abel, Mark Charney, Seth Abraham, Suleyman Sair, Andrew Thomas Forsyth, Lisa Wu, Charles Yount
-
Patent number: 8583904
Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a negation operation dependent upon the input vector and the control vector.
Type: Grant
Filed: September 27, 2012
Date of Patent: November 12, 2013
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion
-
Publication number: 20130290685
Abstract: A method of an aspect includes receiving a floating point rounding instruction. The floating point rounding instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point that each of the one or more floating point data elements are to be rounded to, and indicates a destination storage location. A result is stored in the destination storage location in response to the floating point rounding instruction. The result includes one or more rounded result floating point data elements. Each of the one or more rounded result floating point data elements includes one of the floating point data elements of the source, in a corresponding position, which has been rounded to the indicated number of fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
Type: Application
Filed: December 22, 2011
Publication date: October 31, 2013
Inventors: Jesus Corbal San Adrian, Cristina S. Anderson, Robert Valentine, Bret Toll, Amit Gradstein, Simon Rubanovich, Benny Eitan
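
Rounding to a given number of fraction bits can be expressed as scaling by a power of two, rounding to an integer, and scaling back. The following is a minimal sketch for finite inputs; the function names are hypothetical, and the use of Python's `round` (round half to even) is an assumption, since the abstract does not name a rounding mode.

```python
import math


def round_to_fraction_bits(x, fraction_bits):
    """Round a finite float to the given number of binary fraction bits."""
    scaled = math.ldexp(x, fraction_bits)      # x * 2**fraction_bits
    # round() uses round-half-to-even; this choice is an assumption.
    return math.ldexp(round(scaled), -fraction_bits)


def round_vector(elements, fraction_bits):
    """Apply the rounding to every element, keeping positions unchanged."""
    return [round_to_fraction_bits(x, fraction_bits) for x in elements]
```

For instance, rounding 1.2345 to two fraction bits gives 1.25, the nearest multiple of 0.25.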
-
Publication number: 20130290686
Abstract: An integrated circuit device comprises at least one instruction processing module arranged to perform branch predication. The at least one instruction processing module comprises at least one predicate calculation module arranged to receive as an input at least one result vector for a predicate function and at least one conditional parameter value therefor and output a predicate result value from the at least one result vector based at least partly on the at least one received conditional parameter value.
Type: Application
Filed: January 21, 2011
Publication date: October 31, 2013
Applicant: Freescale Semiconductor, Inc.
Inventors: Yuval Peled, Itzhak Barak, Idan Rozenberg, Doron Schupper, Lev Vaskevich
-
Publication number: 20130275730
Abstract: An apparatus is described that includes instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction select a first group of input vector elements from one of multiple first non overlapping sections of respective first and second input vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction select a second group of input vector elements from one of multiple second non overlapping sections of respective third and fourth input vectors. The second group has a second bit width that is larger than the first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.
Type: Application
Filed: December 23, 2011
Publication date: October 17, 2013
Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
-
Publication number: 20130275731
Abstract: An apparatus is described having a semiconductor chip that has an instruction execution pipeline. The instruction execution pipeline has an execution unit with logic circuitry to perform the following for an instruction: accept input vector elements representing real and imaginary parts of a plurality of complex numbers; and, present the complex conjugates of the complex numbers.
Type: Application
Filed: December 22, 2011
Publication date: October 17, 2013
Inventors: Suleyman Sair, Elmoustapha Ould-Ahmed-Vall
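
Producing complex conjugates from a vector of real and imaginary parts only requires flipping the sign of each imaginary part. A small sketch, assuming an interleaved layout (real, imaginary, real, imaginary, ...) that the abstract does not specify, with a hypothetical function name:

```python
def conjugate_interleaved(elements):
    """Negate the imaginary parts of an interleaved [re, im, re, im, ...] vector."""
    out = list(elements)
    for i in range(1, len(out), 2):   # odd positions hold imaginary parts
        out[i] = -out[i]
    return out
```

The input `[1.0, 2.0, 3.0, -4.0]` represents 1+2i and 3-4i, and the output represents 1-2i and 3+4i.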
-
Patent number: 8555037
Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a minima or maxima operation on another input vector dependent upon the input vector and the control vector.
Type: Grant
Filed: September 24, 2012
Date of Patent: October 8, 2013
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion
-
Publication number: 20130262837
Abstract: A processor includes an execution unit to execute instructions, where each operand of each executed instruction has one or more elements of an element size and at least one operand of the instruction corresponds to a register of a register size. The processor further includes a counter configured to count a number of instructions that have been executed by the execution unit associated with a particular combination of register size and element size.
Type: Application
Filed: March 29, 2012
Publication date: October 3, 2013
Applicant: INTEL CORPORATION
Inventors: Laura A. Knauth, Matthew C. Merten, Ronak Singhal, Hugh M. Caffey
-
Patent number: 8549265
Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a shift operation on another input vector dependent upon the input vector and the control vector.
Type: Grant
Filed: September 24, 2012
Date of Patent: October 1, 2013
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion
-
Publication number: 20130246757
Abstract: Processing of character data is facilitated. A Find Element Equal instruction is provided that compares data of multiple vectors for equality and provides an indication of equality, if equality exists. An index associated with the equal element is stored in a target vector register. Further, the same instruction, the Find Element Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.
Type: Application
Filed: March 3, 2013
Publication date: September 19, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Eric M. Schwarz, Timothy J. Slegel
-
Publication number: 20130246758
Abstract: Processing of character data is facilitated. A Vector String Range Compare instruction is provided that compares each element of a vector with a range of values based on a set of controls to determine if there is a match. An index associated with the matched element or a mask representing the matched element is stored in a target vector register. Further, the same instruction, the Vector String Range Compare instruction, also searches a selected vector for null elements, also referred to as zero elements.
Type: Application
Filed: March 3, 2013
Publication date: September 19, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan D. Bradbury, Eric M. Schwarz, Timothy J. Slegel
-
Publication number: 20130246759
Abstract: Processing of character data is facilitated. A Find Element Not Equal instruction is provided that compares data of multiple vectors for inequality and provides an indication of inequality, if inequality exists. An index associated with the unequal element is stored in a target vector register. Further, the same instruction, the Find Element Not Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.
Type: Application
Filed: March 4, 2013
Publication date: September 19, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
-
Patent number: 8539205
Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a product or quotient operation on another input vector dependent upon the input vector and the control vector.
Type: Grant
Filed: September 24, 2012
Date of Patent: September 17, 2013
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion
-
Publication number: 20130238880
Abstract: An operation processing device for executing a plurality of operations for aligned data by one vector instruction includes a first mask storage unit and a second mask storage unit. The first mask storage unit stores first mask data to designate each of the plurality of operations a true or false operation, and the second mask storage unit stores second mask data to designate a number to be true continuously, in the plurality of operations.
Type: Application
Filed: January 14, 2013
Publication date: September 12, 2013
Inventor: Masahiko TOICHI
-
Patent number: 8527742
Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a sum or difference operation on another input vector dependent upon the input vector and the control vector.
Type: Grant
Filed: September 11, 2012
Date of Patent: September 3, 2013
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion
-
Publication number: 20130219153
Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicate that first portion of bits in a subsequent portion of the destination register.
Type: Application
Filed: March 15, 2013
Publication date: August 22, 2013
Inventor: Patrice Roussel
-
Patent number: 8504805
Abstract: Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes).
Type: Grant
Filed: April 22, 2009
Date of Patent: August 6, 2013
Assignee: Oracle America, Inc.
Inventors: Robert T. Golla, Paul J. Jordan, Jama I. Barreh, Matthew B. Smittle, Yuan C. Chou, Jared C. Smolens
-
Patent number: 8504806
Abstract: The described embodiments include a processor that executes a ValueCheck instruction. In the described embodiments, the processor receives an input vector and a predicate vector, each including N elements. The processor then executes a ValueCheck instruction, which causes the processor to generate a result vector. When generating the result vector, for each element in a set of elements in the input vector for which a corresponding element of the predicate vector is active, the processor determines if at least one of the elements in the set of elements precedes the element in the input vector and contains a different value than the element in the input vector. If so, the processor writes an identifier for a closest preceding active element that contains the different value into a corresponding element of a result vector. Otherwise, the processor writes a zero in the corresponding element of the result vector.
Type: Grant
Filed: May 31, 2012
Date of Patent: August 6, 2013
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion
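
The per-element search this abstract describes can be modeled with a backward scan. The sketch below is illustrative only: the function name is hypothetical, the "identifier" is assumed to be a 1-based element position (so that zero can mean "no such element"), and inactive elements are assumed to receive zero.

```python
def value_check(values, predicate):
    """For each active element, find the closest preceding active element
    holding a different value and record its 1-based position, else 0."""
    result = [0] * len(values)
    for i, (v, active) in enumerate(zip(values, predicate)):
        if not active:
            continue  # inactive elements are assumed to stay zero
        # Scan backwards so the first hit is the closest preceding element.
        for j in range(i - 1, -1, -1):
            if predicate[j] and values[j] != v:
                result[i] = j + 1  # 1-based identifier (an assumption)
                break
    return result
```

For `values=[5, 5, 7, 7, 5]` with every element active, the two 7s both point back at position 2 (the last 5 before them), and the final 5 points at position 4.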
-
Publication number: 20130191619
Abstract: A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle.
Type: Application
Filed: January 23, 2013
Publication date: July 25, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: International Business Machines Corporation
-
Patent number: 8495343
Abstract: A microprocessor includes a plurality of execution units configured to receive instructions and operands thereof and to execute the instructions. An instruction scheduler issues the instructions to the execution units and selects sources of the instruction operands. At least one of the execution units detects one of the operands of one of the instructions is a denormal operand, generates an indication that the instruction needs to be replayed in response to detecting the denormal operand, and provides the denormal operand to the instruction scheduler in response to detecting the denormal operand, rather than normalizing the denormal operand. The instruction scheduler normalizes the denormal operand, in response to the indication, and causes the normalized operand, rather than the denormal operand, to be provided to the execution unit when the instruction is replayed.
Type: Grant
Filed: June 4, 2010
Date of Patent: July 23, 2013
Assignee: VIA Technologies, Inc.
Inventors: G. Glenn Henry, Gerard M. Col, Timothy A. Elliott, Rodney E. Hooker, Terry Parks
-
Patent number: 8495342
Abstract: A processor having multiple cores coordinates functions performed on the cores to automatically, dynamically and repeatedly reconfigure the cores for optimal performance based on characteristics of currently executing software. A core running a thread detects a multi-core characteristic of the thread and assigns one or more other cores to the thread to dynamically combine the cores into what functionally amounts to a common core for more efficient execution of the thread.
Type: Grant
Filed: December 16, 2008
Date of Patent: July 23, 2013
Assignee: International Business Machines Corporation
Inventors: Louis B. Capps, Jr., Michael J. Shapiro, Robert H. Bell, Jr., Thomas E. Cook, William E. Burky
-
Patent number: 8495602
Abstract: The present disclosure includes a shader compiler system and method. In an embodiment, a shader compiler includes a decoder to translate an instruction having a vector representation to a unified instruction representation. The shader compiler also includes an encoder to translate an instruction having a unified instruction representation to a processor executable instruction.
Type: Grant
Filed: September 28, 2007
Date of Patent: July 23, 2013
Assignee: QUALCOMM Incorporated
Inventors: Lin Chen, Guofang Jiao, Chihong Zhang, Junhong Sun
-
Publication number: 20130179661
Abstract: In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed.
Type: Application
Filed: March 4, 2013
Publication date: July 11, 2013
Inventor: Eric Sprangle
-
Patent number: 8484266
Abstract: An embedded control system capable of ensuring precision in arithmetic with data in the floating-point format and also avoiding a shortage of the storage area of a memory is provided. According to an embedded control system in the present invention, when discrete data in the floating-point format is stored in a read-only memory, the discrete data in the floating-point format is converted into data in a significand-reduced floating-point format before being stored. Here, a significand-reduced floating-point number is a number obtained by deleting low-order bits of the significand of a floating-point number. Further, an interpolation search is performed using discrete data, the discrete data in the significand-reduced floating-point format stored in the read-only memory is brought back to the discrete data in the floating-point format before an interpolation search being performed.
Type: Grant
Filed: February 19, 2009
Date of Patent: July 9, 2013
Assignee: Hitachi, Ltd.
Inventors: Shinya Fujimoto, Keiichiro Ohkawa
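
Deleting low-order significand bits before storage and restoring the full-width format before use can be sketched as follows. This is an illustration under assumptions, not the patented system: it uses IEEE 754 single precision, drops the low 16 significand bits (the abstract gives no specific count), and the function names are hypothetical.

```python
import struct

DROPPED_BITS = 16  # assumed number of low-order significand bits to delete


def reduce_significand(x):
    """Encode x as float32 and keep only the upper bits for storage."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> DROPPED_BITS          # sign, exponent, top 7 fraction bits


def restore(reduced):
    """Bring a stored value back to float32 by zero-filling the deleted bits."""
    bits = reduced << DROPPED_BITS
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Values whose dropped bits are already zero, such as 1.5, survive the round trip exactly; others are truncated toward zero, for example 3.14159 is restored as 3.140625. Arithmetic such as interpolation would then run on the restored full-width values.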
-
Patent number: 8484443
Abstract: The described embodiments include RunningMAC1P and RunningMAC2P instructions. In the described embodiments, a processor receives a first input vector, a second input vector, a third input vector, and a control vector. Upon executing a RunningMAC1P or a RunningMAC2P instruction, the processor sets a base value equal to a value from an element at a key element position in the first input vector. Next, the processor generates the result vector by, for each element of the result vector to the right of the key element position, setting the element in the result vector equal to a sum of the base value and a result of multiplying a value in each relevant element of the second input vector by a value in a corresponding element of the third input vector, from an element at the key element position to and including a predetermined element in the second input vector.
Type: Grant
Filed: May 3, 2012
Date of Patent: July 9, 2013
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion
-
Publication number: 20130173891
Abstract: Machine instructions, referred to herein as a long Convert from Zoned instruction (CDZT) and extended Convert from Zoned instruction (CXZT), are provided that read EBCDIC or ASCII data from memory, convert it to the appropriate decimal floating point format, and write it to a target floating point register or floating point register pair. Further, machine instructions, referred to herein as a long Convert to Zoned instruction (CZDT) and extended Convert to Zoned instruction (CZXT), are provided that convert a decimal floating point (DFP) operand in a source floating point register or floating point register pair to EBCDIC or ASCII data and store it to a target memory location.
Type: Application
Filed: December 29, 2011
Publication date: July 4, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Steven R. Carlough, Reid T. Copeland, Charles W. Gainey, JR., Marcel Mitran, Eric M. Schwarz, Timothy J. Slegel
-
Patent number: 8478969
Abstract: In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed.
Type: Grant
Filed: September 24, 2010
Date of Patent: July 2, 2013
Assignee: Intel Corporation
Inventor: Eric S. Sprangle
-
Publication number: 20130159682
Abstract: A method for operating a decimal-floating point (DFP) processor. The method includes identifying a first op-code requiring read access to a first plurality of DFP operands in a vector register of the DFP processor; granting read access from a first port of the vector register to a first execution unit of the DFP processor selected to execute the first op-code; initializing a read pointer of the first port; reading out, from the first port and based on the read pointer, a first DFP operand of the plurality of DFP operands in response to a read request from the first execution unit; and adjusting the read pointer of the first port in response to reading out the first DFP operand.
Type: Application
Filed: December 19, 2012
Publication date: June 20, 2013
Applicant: SILMINDS, LLC.
Inventor: SILMINDS, LLC.
-
Publication number: 20130159681
Abstract: Verifying speculative multithreading in an application executing in a computing system, including: executing one or more test instructions serially thereby producing a serial result, including insuring that all data dependencies among the test instructions are satisfied; executing the test instructions speculatively in a plurality of threads thereby producing a speculative result; and determining whether a speculative multithreading error exists including: comparing the serial result to the speculative result and, if the serial result does not match the speculative result, determining that a speculative multithreading error exists.
Type: Application
Filed: December 10, 2012
Publication date: June 20, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: International Business Machines Corporation
-
Publication number: 20130159680Abstract: Methods, systems, and computer program products for the performance of arithmetic operations on large numbers. The addition of large numbers may be parallelized by adding corresponding sections of the numbers in parallel. The multiplication of large numbers may be accomplished by applying a multiplier to a multiplicand after the latter is divided into sections, where the multiplication of the sections is performed in parallel. Products for each section are saved in high and low order vectors, which may then be aligned and added. The comparison of two large numbers may be performed by comparing the numbers, section by section, in parallel. In an embodiment, these processes may be performed in a graphics processing unit (GPU) having multiple cores. In an embodiment, such a GPU may be integrated into a larger die that also incorporates one or more conventional central processing unit (CPU) cores.Type: ApplicationFiled: December 19, 2011Publication date: June 20, 2013Inventors: Wei-yu Chen, Guei-yuan Lueh, Kaiyu Chen, Xiaozhu Kang
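The sectioned addition described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the application's implementation: the 32-bit section width, the least-significant-first layout, and the names `to_sections`, `from_sections`, and `add_sections` are all hypothetical, and the per-section sums that the abstract performs in parallel are computed here with an ordinary list comprehension.

```python
SECTION_BITS = 32  # assumed section width; the abstract does not fix one
MASK = (1 << SECTION_BITS) - 1

def to_sections(n, count):
    """Split a non-negative integer into `count` sections, least significant first."""
    return [(n >> (SECTION_BITS * i)) & MASK for i in range(count)]

def from_sections(sections):
    """Reassemble an integer from its sections."""
    return sum(s << (SECTION_BITS * i) for i, s in enumerate(sections))

def add_sections(a, b):
    """Add two equal-length section lists."""
    # Step 1: per-section sums are independent, so they could run in parallel.
    partial = [x + y for x, y in zip(a, b)]
    # Step 2: propagate carries from low-order to high-order sections.
    result, carry = [], 0
    for p in partial:
        total = p + carry
        result.append(total & MASK)
        carry = total >> SECTION_BITS
    result.append(carry)  # final carry becomes an extra high-order section
    return result
```

A call such as `from_sections(add_sections(to_sections(x, 4), to_sections(y, 4)))` should equal `x + y` for any `x` and `y` below `2**128`.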
-
Patent number: 8464031Abstract: During operation, a processor generates a result vector. In particular, the processor records a value from an element at a key element position in an input vector into a base value. Next, for each active element in the result vector to the right of the key element position, the processor generates a result vector by setting the element in the result vector equal to a result of performing a unary operation on the base value a number of times equal to a number of relevant elements. The number of relevant elements is determined from the key element position to and including a predetermined element in the result vector, where the predetermined element in the result vector may be one of: a first element to the left of the element in the result vector; or the element in the result vector.Type: GrantFiled: April 26, 2012Date of Patent: June 11, 2013Assignee: Apple Inc.Inventor: Jeffry E. Gonion
-
Patent number: 8458444Abstract: Techniques for handling dependency conditions, including evil twin conditions, are disclosed herein. An instruction may designate a source register comprising two portions. The source register may be a double-precision register and its two portions may be single-precision portions, each specified as destinations by two other single-precision instructions. Execution of these two single-precision instructions, especially on a register renaming machine, may result in the appropriate values for the two portions of the source register being stored in different physical locations, which can complicate execution of an instruction stream. In response to detecting a potential dependency, one or more instructions may be inserted in an instruction stream to enable the appropriate values to be stored within one physical double precision register, eliminating an actual or potential evil twin dependency.Type: GrantFiled: April 22, 2009Date of Patent: June 4, 2013Assignee: Oracle America, Inc.Inventors: Yuan C. Chou, Jared C. Smolens, Jeffrey S. Brooks
-
Patent number: 8447956Abstract: The described embodiments provide a processor for generating a result vector with subtracted or mathematically divided values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector, and optionally receives a predicate vector. The processor then records a value from an element at a key element position in the second input vector into a base value. Next, the processor generates a result vector.Type: GrantFiled: July 22, 2011Date of Patent: May 21, 2013Assignee: Apple Inc.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Publication number: 20130103932Abstract: A multi-addressable register file is addressed by a plurality of types of instructions, including scalar, vector and vector-scalar extension instructions. It may be determined that data is to be translated from one format to another format. If so determined, a convert machine instruction is executed that obtains a single precision datum in a first representation in a first format from a first register; converts the single precision datum of the first representation in the first format to a converted single precision datum of a second representation in a second format; and places the converted single precision datum in a second register.Type: ApplicationFiled: December 17, 2012Publication date: April 25, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: INTERNATIONAL BUSINESS MACHINES CORP
-
Patent number: 8417921Abstract: The described embodiments provide a processor for generating a result vector that contains results from a comparison operation. During operation, the processor receives a first input vector, a second input vector, and a control vector. When subsequently generating a result vector, the processor first captures a base value from a key element position in the first input vector. For selected elements in the result vector, the processor compares the base value and values from relevant elements to the left of a corresponding element in the second input vector, and writes the result into the element in the result vector. In the described embodiments, the key element position and the relevant elements can be defined by the control vector and an optional predicate vector.Type: GrantFiled: August 31, 2010Date of Patent: April 9, 2013Assignee: Apple Inc.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Publication number: 20130086367Abstract: Operand liveness state information is maintained during context switches for current architected operands of executing programs, the current operand state information indicating whether corresponding current operands are any one of enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA) for disabling current architected operands, wherein a current operand is accessed by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module.Type: ApplicationFiled: October 3, 2011Publication date: April 4, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael K. Gschwind, Valentina Salapura
-
Publication number: 20130073836Abstract: Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.).Type: ApplicationFiled: September 16, 2011Publication date: March 21, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael K. Gschwind, Brett Olsson, Valentina Salapura
-
Publication number: 20130073837Abstract: A function's purity may be estimated by comparing a new input vector to previously analyzed input vectors. When a new input vector is within a confidence boundary, the new input vector may be treated as a known vector, even when that vector has not been evaluated. The input vector may reflect the input parameters passed to a function, and the function may be analyzed to determine whether to memoize with the input vector. The function may be a function that behaves as a pure function in some circumstances and with some input vectors, but not with others. By memoizing the function when possible, the function may be executed much faster, thereby improving performance.Type: ApplicationFiled: November 8, 2012Publication date: March 21, 2013Applicant: CONCURIX CORPORATIONInventor: Concurix Corporation
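The idea of memoizing only when an input vector falls within a confidence boundary of previously analyzed vectors can be sketched as below. This is a rough illustration, not the application's method: it assumes the boundary is a Euclidean radius around vectors already observed to behave purely, and `PurityEstimator`, `record_pure`, `is_probably_pure`, and `memoize_if_pure` are hypothetical names.

```python
import math

class PurityEstimator:
    """Tracks input vectors for which a function was observed to behave purely."""

    def __init__(self, radius):
        self.radius = radius  # assumed confidence boundary (Euclidean distance)
        self.known_pure = []

    def record_pure(self, vec):
        self.known_pure.append(tuple(vec))

    def is_probably_pure(self, vec):
        # A new vector is treated as known if it lies near an analyzed one.
        return any(math.dist(vec, k) <= self.radius for k in self.known_pure)

def memoize_if_pure(fn, estimator):
    """Cache results only for inputs the estimator considers pure."""
    cache = {}

    def wrapper(*args):
        if not estimator.is_probably_pure(args):
            return fn(*args)  # outside the boundary: always re-execute
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]

    return wrapper
```

Inputs outside the boundary are always re-executed, which keeps the wrapper safe for functions that are pure only for some input vectors.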
-
Publication number: 20130067204Abstract: Methods and apparatus relating to instructions with floating point control override are described. In an embodiment, floating point operation settings indicated by a floating point control register may be overridden on a per instruction basis. Other embodiments are also described.Type: ApplicationFiled: November 6, 2012Publication date: March 14, 2013Inventors: Cristina S. Anderson, Simon Rubanovich, Benny Eitan
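A software analogy for per-instruction override of floating point settings, using Python's `decimal` module rather than a hardware control register: the ambient context plays the role of the control register, and an optional argument overrides its rounding mode for a single operation. The function name `divide` and its `rounding` parameter are hypothetical, and this sketch does not represent the mechanism the application describes.

```python
from decimal import Decimal, localcontext, ROUND_FLOOR, ROUND_HALF_EVEN

def divide(a, b, rounding=None):
    """Divide a by b, optionally overriding the ambient rounding mode
    for this one operation only."""
    with localcontext() as ctx:  # works on a copy of the current context
        if rounding is not None:
            ctx.rounding = rounding
        return a / b
    # On exit the ambient context, including its rounding mode, is restored.

def demo():
    """Show default and overridden rounding at 4 digits of precision."""
    with localcontext() as ambient:
        ambient.prec = 4
        ambient.rounding = ROUND_HALF_EVEN
        default = divide(Decimal(2), Decimal(3))
        overridden = divide(Decimal(2), Decimal(3), rounding=ROUND_FLOOR)
        unchanged = ambient.rounding == ROUND_HALF_EVEN
    return default, overridden, unchanged
```

The override applies to one call; later operations fall back to the ambient setting.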