Vector Processor Operation Patents (Class 712/7)
  • Publication number: 20140189292
    Abstract: An apparatus is described having a functional unit of an instruction execution pipeline. The functional unit has a plurality of compare-and-exchange circuits coupled to network circuitry to implement a vector sorting tree for a vector sorting instruction. Each of the compare-and-exchange circuits has a respective comparison circuit that compares a pair of inputs. Each of the compare-and-exchange circuits has a same sided first output for presenting a higher of the two inputs and a same sided second output for presenting a lower of the two inputs, said comparison circuit to also support said functional unit's execution of a prefix min and/or prefix add instruction.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventors: Robert M. IOFFE, Nicholas C. GALOPPO VON BORRIES
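    A minimal sketch of how compare-and-exchange elements can be wired into a sorting network. The odd-even transposition topology and the names `compare_exchange` and `sort_descending` are assumptions chosen for illustration; the abstract does not specify the layout of the sorting tree.

```python
def compare_exchange(a, b):
    """Return (higher, lower): the first output carries the larger input."""
    return (a, b) if a >= b else (b, a)


def sort_descending(values):
    """Sort with an odd-even transposition network of compare-and-exchange steps.

    Assumed topology for illustration only; n stages are enough for n inputs.
    """
    v = list(values)
    n = len(v)
    for stage in range(n):
        start = stage % 2  # alternate between even and odd neighbor pairs
        for i in range(start, n - 1, 2):
            v[i], v[i + 1] = compare_exchange(v[i], v[i + 1])
    return v
```

    Every pair within a stage is independent, which is what would let hardware evaluate a whole stage in parallel.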
  • Publication number: 20140189291
    Abstract: A method is described that includes performing an image integral calculation by creating a second vector and creating a third vector. The second vector is created by executing a first instruction that adds alternating elements of a first vector to respective neighboring elements of the first vector and presents resulting summations into said second vector. The first instruction also passes through the respective neighboring elements to said second vector. The third vector is created by executing a second instruction that adds elements of one side of the second vector to an element of another side of the second vector and passes through the another side of the second vector.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventors: Liu YANG, Bin Robin WANG
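    One possible reading of the two steps, sketched for a four-element vector, where the pair of steps yields a running (prefix) sum. The function names and the choice of which elements are summed versus passed through are assumptions made for illustration.

```python
def add_alternating(v):
    """Step 1 (assumed): odd-indexed slots receive v[i-1] + v[i];
    even-indexed neighbors pass through unchanged."""
    out = list(v)
    for i in range(1, len(v), 2):
        out[i] = v[i - 1] + v[i]
    return out


def add_across_halves(v):
    """Step 2 (assumed): add the last element of the lower half to each
    element of the upper half; the lower half passes through."""
    half = len(v) // 2
    carry = v[half - 1]
    return v[:half] + [x + carry for x in v[half:]]


def prefix_sum4(v):
    """Combine both steps into a running sum of a four-element vector."""
    if len(v) != 4:
        raise ValueError("this sketch only handles four elements")
    return add_across_halves(add_alternating(v))
```

    For example, `[1, 2, 3, 4]` becomes `[1, 3, 3, 7]` after the first step and `[1, 3, 6, 10]` after the second.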
  • Publication number: 20140189294
    Abstract: Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventors: Matt WALSH, Elmoustapha OULD-AHMED-VALL, Robert VALENTINE, Bret TOLL
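    A rough sketch of a mask-to-vector broadcast with zero extension. The parameter names and widths are hypothetical and do not correspond to any particular instruction encoding.

```python
def broadcast_mask(mask_value, mask_bits, element_bits, num_elements):
    """Zero-extend a mask register value and copy it into every element.

    All parameters are illustrative assumptions.
    """
    if mask_bits > element_bits:
        raise ValueError("mask does not fit in a destination element")
    # Keeping only the low mask_bits and leaving the upper bits clear
    # is what zero extension amounts to for an unsigned value.
    zero_extended = mask_value & ((1 << mask_bits) - 1)
    return [zero_extended] * num_elements
```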
  • Publication number: 20140189293
    Abstract: A processor is described having an instruction execution pipeline having a functional unit to execute an instruction that compares vector elements against an input value. Each of the vector elements and the input value has a first respective section identifying a location within data and a second respective section having a byte sequence of the data. The functional unit has comparison circuitry to compare respective byte sequences of the input vector elements against the input value's byte sequence to identify a number of matching bytes for each comparison. The functional unit also has difference circuitry to determine respective distances between the input vector elements' byte sequences and the input value's byte sequence within the data.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventors: Vinodh GOPAL, James GUILFORD
  • Publication number: 20140181466
    Abstract: An apparatus includes a decode unit to decode a permute instruction and a vector conflict instruction. A vector execution unit is coupled with the decode unit and includes a fully-connected interconnect. The fully-connected interconnect has at least four inputs to receive at least four corresponding data elements of at least one source vector. The fully-connected interconnect has at least four outputs. Each of the at least four inputs is coupled with each of the at least four outputs. The execution unit also includes a permute instruction execution logic coupled with the at least four outputs and operable to store a first vector result in response to the permute instruction. The execution unit also includes a vector conflict instruction execution logic coupled with the at least four outputs and operable to store a second vector result in a destination storage location in response to the vector conflict instruction.
    Type: Application
    Filed: December 29, 2011
    Publication date: June 26, 2014
    Inventors: Andrew Thomas Forsyth, Dennis R. Bradford
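    For orientation, a sketch of what a permute and a conflict-detection operation compute in software terms. The conflict semantics shown (a bitmask of earlier elements holding the same value) follow the commonly documented behavior of conflict-detection instructions and are an assumption here, not taken from the abstract.

```python
def permute(source, indices):
    """Each output slot may read from any input slot (full connectivity)."""
    return [source[i] for i in indices]


def vector_conflict(source):
    """For element i, set bit j when an earlier element j holds the same value.

    Assumed semantics for illustration.
    """
    result = []
    for i, value in enumerate(source):
        bits = 0
        for j in range(i):
            if source[j] == value:
                bits |= 1 << j
        result.append(bits)
    return result
```

    Both operations need every output position to see every input position, which fits the abstract's point about sharing one fully-connected interconnect.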
  • Patent number: 8750285
    Abstract: Embodiments describe a system and/or method for efficient classification of network packets. According to an aspect a method includes describing a packet as a feature vector and mapping the feature vector to a feature space. The method can further include defining a feature prism, classifying the packet relative to the feature prism, and determining if the feature vector matches the feature prism. If the feature vector matches the feature prism the packet is passed to a data recipient, if not, the packet is blocked. Another embodiment is an apparatus that includes an identification component that defines at least one feature of a packet and a classification component that classifies the packet based at least in part upon the at least one defined feature.
    Type: Grant
    Filed: September 26, 2011
    Date of Patent: June 10, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Michael Paddon, Gregory Gordon Rose, Philip Michael Hawkes
  • Patent number: 8745360
    Abstract: Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating one or more predicate values based on actual dependencies, where a given predicate value indicates data elements that may be safely evaluated in parallel, and where the given actual dependency occurs when the pair of conditions matches one or more criteria. Then, the processor executes the instructions for generating the one or more predicate values.
    Type: Grant
    Filed: September 24, 2008
    Date of Patent: June 3, 2014
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20140129803
    Abstract: Methods, systems and computer program products for resolving multiple magnitudes assigned to a target vector are disclosed. A target vector that includes one or more target vector dimensions is received. One of the target vector dimensions is processed to determine a total number of magnitudes assigned to the processed target vector dimension. Also, a source vector that includes one or more source vector dimensions is received. The received source vector is processed to determine a total number of features associated with the source vector. When it is detected that the total number of magnitudes assigned to the processed target vector dimension exceeds one, one of the assigned magnitudes is selected based on one of the determined features associated with the source vector.
    Type: Application
    Filed: January 14, 2014
    Publication date: May 8, 2014
    Applicant: A-Life Medical, LLC.
    Inventors: Daniel T. Heinze, Mark L. Morsch
  • Patent number: 8719549
    Abstract: To provide a device to reconfigure multi-level logic networks, which enable logic modification and reconfiguration of a multi-level logic network with small circuit area and low-power dissipation in a simple manner. For example, in the case of reconfiguring a multi-level logic network following logic modification for deleting an output vector F(b) of an objective logic function F(X) corresponding to an input vector b, unmodified pq elements are selected one by one from the nearest pq element EG to an output side. At this time, among output values of pq elements closer to an input side than selected pq elements, output values corresponding to the input vector, which equal an output value corresponding to any input variable X other than the input vector b are considered modified and thus not selected. Then, a selected output value corresponding to the input vector b is rewritten to an “invalid value”.
    Type: Grant
    Filed: March 2, 2007
    Date of Patent: May 6, 2014
    Assignee: Kyushu Institute of Technology
    Inventor: Tsutomu Sasao
  • Publication number: 20140122832
    Abstract: Generally, this disclosure provides technologies for generating and executing partially vectorized code that may include backward dependencies within a loop body of the code to be vectorized. The method may include identifying backward dependencies within a loop body of the code; selecting one or more ranges of iterations within the loop body, wherein the selected ranges exclude the identified backward dependencies; and vectorizing the selected ranges. The system may include a vector processor configured to provide predicated vector instruction execution, loop iteration range enabling, and dynamic loop dependence checking.
    Type: Application
    Filed: October 25, 2012
    Publication date: May 1, 2014
    Inventors: Tin-Fook Ngai, Chunxiao Lin, Yingzhe Shen, Chao Zhang
  • Patent number: 8707012
    Abstract: In one embodiment, the present invention includes an apparatus having a register file to store vector data, an address generator coupled to the register file to generate addresses for a vector memory operation, and a controller to generate an output slice from one or more slices each including multiple addresses, where the output slice includes addresses each corresponding to a separately addressable portion of a memory. Other embodiments are described and claimed.
    Type: Grant
    Filed: October 12, 2012
    Date of Patent: April 22, 2014
    Assignee: Intel Corporation
    Inventors: Roger Espasa, Joel Emer, Geoff Lowney, Roger Gramunt, Santiago Galan, Toni Juan, Jesus Corbal, Federico Ardanaz, Isaac Hernandez
  • Publication number: 20140075153
    Abstract: Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.
    Type: Application
    Filed: November 14, 2013
    Publication date: March 13, 2014
    Applicant: International Business Machines Corporation
    Inventors: Maarten J. Boersma, Markus Kaltenbach, Christophe J. Layer, Jens Leenstra, Silvia M. Mueller
  • Publication number: 20140068226
    Abstract: In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed.
    Type: Application
    Filed: March 12, 2013
    Publication date: March 6, 2014
    Inventors: MIKHAIL SMELYANSKIY, VICTOR LEE, CHRISTOPHER HUGHES, DAEHYUN KIM, YEN-KUANG CHEN, CHANGKYU KIM, JATIN CHHUGANI, ANTHONY D. NGUYEN, SANJEEV KUMAR
  • Patent number: 8667042
    Abstract: A vector functional unit implemented on a semiconductor chip to perform vector operations of dimension N is described. The vector functional unit includes N functional units. Each of the N functional units has logic circuitry to perform: a first integer multiply add instruction that presents highest ordered bits but not lowest ordered bits of a first integer multiply add calculation, and, a second integer multiply add instruction that presents lowest ordered bits but not highest ordered bits of a second integer multiply add calculation.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: March 4, 2014
    Assignee: Intel Corporation
    Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver
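    A small sketch of splitting an integer multiply-add into a high-half and a low-half result. Unsigned operands, the default 64-bit width, and the function names are assumptions for illustration.

```python
def multiply_add_low(a, b, c, width=64):
    """Return the lowest `width` bits of a*b + c (unsigned, assumed)."""
    mask = (1 << width) - 1
    return (a * b + c) & mask


def multiply_add_high(a, b, c, width=64):
    """Return the next `width` bits above the low half of a*b + c."""
    mask = (1 << width) - 1
    return ((a * b + c) >> width) & mask
```

    With `width=8`, `200 * 200 + 5 = 40005 = 0x9C45`, so the low result is `0x45` and the high result is `0x9C`; together they reconstruct the full value.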
  • Publication number: 20140059323
    Abstract: Systems and methods of data extraction in a vector processor are disclosed. In a particular embodiment a method of data extraction in a vector processor includes copying at least one data element to a source register of a permutation network. The method includes reordering multiple data elements of the source register, populating a destination register of the permutation network with the reordered data elements, and copying the reordered data elements from the destination register to a memory.
    Type: Application
    Filed: August 23, 2012
    Publication date: February 27, 2014
    Applicant: Qualcomm Incorporated
    Inventors: Jose Fridman, Ajay Anant Ingle, Deepak Mathew, Marc M. Hoffman, Michael John Lopez
  • Publication number: 20140052959
    Abstract: A method is provided for reducing the data set used in creating an optimization algorithm, thus to permit the use of microprocessors, that in turn permits embedding the optimization algorithm at the point of performance, in which a subset of data points in a performance window is used to derive a vector that is utilized to create an initial optimization algorithm.
    Type: Application
    Filed: September 16, 2010
    Publication date: February 20, 2014
    Inventor: Ronald E. Wagner
  • Patent number: 8650383
    Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving an input vector and optionally receiving a predicate vector as inputs. The processor then executes the vector instruction, which causes the processor to determine a key element position in the input vector and generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element of the result vector, the processor sets each element of the result vector to the right of the key element to a first predetermined value and sets each element of the result vector at or to the left of the key element to a second predetermined value. The processor then sets one or more processor status flags based on the values in the result vector.
    Type: Grant
    Filed: December 23, 2010
    Date of Patent: February 11, 2014
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
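    A loose sketch of the result-vector rule described above. Taking the key position as a parameter, treating "right" as higher index, and using `None` for elements left untouched by an inactive predicate are all assumptions; the status-flag update is omitted.

```python
def split_at_key(length, key, right_value, left_value, predicate=None):
    """Fill a result vector around a key position.

    Elements right of `key` (higher index, assumed) get `right_value`;
    elements at or left of `key` get `left_value`. Elements whose
    predicate bit is inactive are left as None in this sketch.
    """
    result = [None] * length
    for i in range(length):
        if predicate is not None and not predicate[i]:
            continue
        result[i] = right_value if i > key else left_value
    return result
```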
  • Patent number: 8649508
    Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: February 11, 2014
    Assignee: Tata Consultancy Services Ltd.
    Inventor: Natarajan Vijayarangan
  • Publication number: 20140032877
    Abstract: A method is described that includes performing the following with a single instruction: receiving a first input operand V; receiving a second input operand S; calculating V − S; determining if V − S is positive or negative; and, providing as a resultant: V if V − S is negative; V − S if V − S is positive.
    Type: Application
    Filed: December 23, 2011
    Publication date: January 30, 2014
    Inventors: Thomas R. Craver, Elmoustapha Ould-Ahmed-Vall
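    A minimal sketch of the operation, assuming the operator between V and S is subtraction and treating a zero difference as non-negative (the abstract does not say which branch zero takes).

```python
def conditional_subtract(v, s):
    """Return v when v - s is negative, otherwise v - s.

    A zero difference is returned as 0 here (assumption).
    """
    diff = v - s
    return v if diff < 0 else diff


def conditional_subtract_vector(vs, ss):
    """Apply the same rule element by element to two equal-length vectors."""
    return [conditional_subtract(v, s) for v, s in zip(vs, ss)]
```

    For inputs with `0 <= v < 2 * s`, this is the same as reducing `v` modulo `s` in a single step.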
  • Publication number: 20130339661
    Abstract: A processor core including a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing the set of source data elements with a set of zeros and then counts the trailing zeros in the mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result and the set of source data elements is shifted to the right. If more valid data elements are in the set of source data elements this is repeated until all valid data is processed.
    Type: Application
    Filed: December 30, 2011
    Publication date: December 19, 2013
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
  • Publication number: 20130332701
    Abstract: An apparatus and method are described for selecting elements to be used in a vector computation. For example, a method according to one embodiment includes the following operations: specifying whether to identify the first, last or next after last active element of an input mask register using an immediate value; identifying the first, last or next after last active element in the input mask register according to the immediate value; reading a value from an input vector register corresponding to the identified first, last or next after last active element in the input mask register; and writing the value to an output vector register.
    Type: Application
    Filed: December 23, 2011
    Publication date: December 12, 2013
    Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
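    The selection step can be sketched as follows. The mode constants stand in for the immediate value and are hypothetical encodings; returning `None` when no element qualifies is also an assumption, since the abstract does not cover that case.

```python
# Hypothetical encodings of the immediate operand.
FIRST, LAST, NEXT_AFTER_LAST = 0, 1, 2


def select_element(mask, vector, mode):
    """Pick the value at the first, last, or next-after-last active mask position."""
    active = [i for i, bit in enumerate(mask) if bit]
    if not active:
        return None  # no active element (behavior assumed)
    if mode == FIRST:
        index = active[0]
    elif mode == LAST:
        index = active[-1]
    elif mode == NEXT_AFTER_LAST:
        index = active[-1] + 1
    else:
        raise ValueError("unknown mode")
    if index >= len(vector):
        return None  # next-after-last ran past the vector (behavior assumed)
    return vector[index]
```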
  • Patent number: 8601236
    Abstract: A processor core comprises one or more vector units operable to change between a fine-grained vector mode having a shorter maximum vector length and a coarse-grained vector mode having a longer maximum vector length. Changing vector modes comprises halting all instruction stream execution in the core, flushing one or more registers in a register space, reconfiguring one or more vector registers in the register space, and restarting instruction execution in the core.
    Type: Grant
    Filed: February 29, 2012
    Date of Patent: December 3, 2013
    Assignee: Cray Inc.
    Inventors: Gregory J. Faanes, Eric P. Lundberg, Abdulla Bataineh, Timothy J. Johnson, Michael Parker, James Robert Kohn, Steven L. Scott, Robert Alverson
  • Publication number: 20130297908
    Abstract: A processing architecture uses stationary operands and opcodes common on a plurality of processors. Only data moves through the processors. The same opcode and operand is used by each processor assigned to operate, for example, on one row of pixels, one row of numbers, or one row of points in space.
    Type: Application
    Filed: December 30, 2011
    Publication date: November 7, 2013
    Inventor: Scott A. Krig
  • Publication number: 20130290672
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Application
    Filed: December 23, 2011
    Publication date: October 31, 2013
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Suleyman Sair
  • Patent number: 8560815
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a Boolean operation on another input vector dependent upon the input vector and the control vector.
    Type: Grant
    Filed: September 27, 2012
    Date of Patent: October 15, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8554421
    Abstract: A vehicle having a dynamics control system, the vehicle comprising: a first set comprising multiple adjustable sub-systems that affect the performance of the vehicle's powertrain; a second set comprising multiple adjustable sub-systems that affect the vehicle's handling; a dynamics user interface including a first input device and a second input device; and a dynamics controller coupled to the user interface and configured to adjust the operation of the sub-systems of the first set in dependence on the first input device and to adjust the operation of the sub-systems of the second set in dependence on the second input device.
    Type: Grant
    Filed: September 8, 2010
    Date of Patent: October 8, 2013
    Assignee: McLaren Automotive Limited
    Inventors: Antony Sheriff, Mark Vinnels, Richard Felton
  • Patent number: 8555037
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a minima or maxima operation on another input vector dependent upon the input vector and the control vector.
    Type: Grant
    Filed: September 24, 2012
    Date of Patent: October 8, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8549265
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a shift operation on another input vector dependent upon the input vector and the control vector.
    Type: Grant
    Filed: September 24, 2012
    Date of Patent: October 1, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8539205
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a product or quotient operation on another input vector dependent upon the input vector and the control vector.
    Type: Grant
    Filed: September 24, 2012
    Date of Patent: September 17, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8532288
    Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: September 10, 2013
    Assignee: International Business Machines Corporation
    Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh
  • Patent number: 8527742
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a sum or difference operation on another input vector dependent upon the input vector and the control vector.
    Type: Grant
    Filed: September 11, 2012
    Date of Patent: September 3, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8504806
    Abstract: The described embodiments include a processor that executes a ValueCheck instruction. In the described embodiments, the processor receives an input vector and a predicate vector, each including N elements. The processor then executes a ValueCheck instruction, which causes the processor to generate a result vector. When generating the result vector, for each element in a set of elements in the input vector for which a corresponding element of the predicate vector is active, the processor determines if at least one of the elements in the set of elements precedes the element in the input vector and contains a different value than the element in the input vector. If so, the processor writes an identifier for a closest preceding active element that contains the different value into a corresponding element of a result vector. Otherwise, the processor writes a zero in the corresponding element of the result vector.
    Type: Grant
    Filed: May 31, 2012
    Date of Patent: August 6, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
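    A scalar sketch of the per-element rule described for ValueCheck. Using a 1-based position as the "identifier" (so that 0 can mean "none found") is an assumption made for illustration.

```python
def value_check(values, predicate):
    """For each active element, record the closest preceding active element
    that holds a different value, or 0 when there is none.

    Identifiers are 1-based positions in this sketch (assumption).
    """
    result = [0] * len(values)
    for i in range(len(values)):
        if not predicate[i]:
            continue
        # Walk backwards to find the nearest active element that differs.
        for j in range(i - 1, -1, -1):
            if predicate[j] and values[j] != values[i]:
                result[i] = j + 1
                break
    return result
```

    With `values = [4, 4, 7, 7, 4]` and every element active, the result is `[0, 0, 2, 2, 4]`.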
  • Patent number: 8495346
    Abstract: A processor. The processor includes a first register for storing a first packed data, a decoder, and a functional unit. The decoder has a control signal input. The control signal input is for receiving a first control signal and a second control signal. The first control signal is for indicating a pack operation. The second control signal is for indicating an unpack operation. The functional unit is coupled to the decoder and the register. The functional unit is for performing the pack operation and the unpack operation using the first packed data. The processor also supports a move operation.
    Type: Grant
    Filed: April 11, 2012
    Date of Patent: July 23, 2013
    Assignee: Intel Corporation
    Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
  • Patent number: 8495342
    Abstract: A processor having multiple cores coordinates functions performed on the cores to automatically, dynamically and repeatedly reconfigure the cores for optimal performance based on characteristics of currently executing software. A core running a thread detects a multi-core characteristic of the thread and assigns one or more other cores to the thread to dynamically combine the cores into what functionally amounts to a common core for more efficient execution of the thread.
    Type: Grant
    Filed: December 16, 2008
    Date of Patent: July 23, 2013
    Assignee: International Business Machines Corporation
    Inventors: Louis B. Capps, Jr., Michael J. Shapiro, Robert H. Bell, Jr., Thomas E. Cook, William E. Burky
  • Publication number: 20130185540
    Abstract: A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core includes a program memory interface through which the scalar processor retrieves instructions from a program memory. The instructions include scalar instructions executable by the scalar processor and vector instructions executable by the vector coprocessor core. The vector coprocessor core includes a plurality of execution units and a vector command buffer. The vector command buffer is configured to decode vector instructions passed by the scalar processor core, to determine whether vector instructions defining an instruction loop have been decoded, and to initiate execution of the instruction loop by one or more of the execution units based on a determination that all of the vector instructions of the instruction loop have been decoded.
    Type: Application
    Filed: July 13, 2012
    Publication date: July 18, 2013
    Applicant: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Ching-Yu HUNG, Shinri INAMORI, Jagadeesh SANKARAN, Peter CHANG
  • Publication number: 20130159667
    Abstract: A computer has a memory adapted to store a first plurality of instructions encoded with a first vector size and a second plurality of instructions encoded with a second vector size. An execution unit executes the first plurality of instructions and the second plurality of instructions by processing vector units in a uniform manner regardless of vector size.
    Type: Application
    Filed: December 16, 2011
    Publication date: June 20, 2013
    Applicant: MIPS TECHNOLOGIES, INC.
    Inventor: Ilie Garbacea
  • Publication number: 20130159666
    Abstract: Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.
    Type: Application
    Filed: December 14, 2011
    Publication date: June 20, 2013
    Applicant: International Business Machines Corporation
    Inventors: Maarten J. Boersma, Markus Kaltenbach, Christophe J. Layer, Jens Leenstra, Silvia M. Mueller
  • Publication number: 20130159668
    Abstract: A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions.
    Type: Application
    Filed: December 20, 2011
    Publication date: June 20, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 8464031
    Abstract: During operation, a processor generates a result vector. In particular, the processor records a value from an element at a key element position in an input vector into a base value. Next, for each active element in the result vector to the right of the key element position, the processor generates a result vector by setting the element in the result vector equal to a result of performing a unary operation on the base value a number of times equal to a number of relevant elements. The number of relevant elements is determined from the key element position to and including a predetermined element in the result vector, where the predetermined element in the result vector may be one of: a first element to the left of the element in the result vector; or the element in the result vector.
    Type: Grant
    Filed: April 26, 2012
    Date of Patent: June 11, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
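The operation in this abstract can be read as a loop over the elements to the right of the key position. The sketch below is one possible interpretation, not the patented hardware: the function name, the `relevant` and `active` boolean masks, the `include_self` flag that selects between the two "predetermined element" variants, and the choice to copy inactive elements from the input are all assumptions.

```python
def propagate_unary(input_vec, key, relevant, op, include_self=True, active=None):
    """Apply `op` to the base value once per relevant element.

    relevant[j] marks elements that count toward the repetition total
    (assumed to be a boolean mask; the abstract does not define it).
    """
    n = len(input_vec)
    if active is None:
        active = [True] * n
    base = input_vec[key]          # value recorded at the key position
    result = list(input_vec)       # unwritten elements keep input values (assumption)
    for i in range(key + 1, n):    # only elements to the right of the key
        if not active[i]:
            continue
        # Count up to the element itself, or to the first element on its left.
        last = i if include_self else i - 1
        count = sum(1 for j in range(key, last + 1) if relevant[j])
        value = base
        for _ in range(count):
            value = op(value)
        result[i] = value
    return result


increment = lambda x: x + 1
mask = [True, False, True, True]
print(propagate_unary([10, 0, 0, 0], 0, mask, increment))
print(propagate_unary([10, 0, 0, 0], 0, mask, increment, include_self=False))
```

With an increment as the unary operation, the two variants differ only in whether an element's own mask bit contributes to its count.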
  • Patent number: 8464026
    Abstract: A CPU may select a variable from a variable set as a dependent variable. The variable set may be part of the data structure that includes a plurality of vector values, a vector value associated with a variable set of n number of variables, and each variable of the variable set having a variable value. The number of dependent variable steps for the dependent variable may be determined. The number of the vector values in a dependent variable step is determined as being the number of independent variables. A function is mapped to a plurality of thread processors, and each thread processor is assigned for the function to be performed on each one of the independent variables for each of the dependent variable steps.
    Type: Grant
    Filed: February 17, 2010
    Date of Patent: June 11, 2013
    Assignee: International Business Machines Corporation
    Inventors: Rajesh Ramkrishna Bordawekar, Ravishankar Rao
  • Publication number: 20130132707
    Abstract: A system, method, and computer program product are provided for assigning elements of a matrix to processing threads. In use, a matrix is received to be processed by a parallel processing architecture. Such parallel processing architecture includes a plurality of processors each capable of processing a plurality of threads. Elements of the matrix are assigned to each of the threads for processing, utilizing an algorithm that increases a contiguousness of the elements being processed by each thread.
    Type: Application
    Filed: January 15, 2013
    Publication date: May 23, 2013
    Applicant: NVIDIA CORPORATION
    Inventor: NVIDIA CORPORATION
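The abstract does not say which assignment algorithm is used, so the following sketch only illustrates what "contiguousness" per thread can mean. It compares two generic schemes over a flattened matrix: a cyclic (round-robin) assignment and a blocked assignment. The function names and the run-counting metric are assumptions chosen for illustration.

```python
def cyclic_assignment(num_elements, num_threads):
    """Thread t gets elements t, t + num_threads, t + 2 * num_threads, ..."""
    return [list(range(t, num_elements, num_threads)) for t in range(num_threads)]


def blocked_assignment(num_elements, num_threads):
    """Each thread gets one consecutive block; leftovers go to the first threads."""
    base, extra = divmod(num_elements, num_threads)
    out, start = [], 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)
        out.append(list(range(start, start + size)))
        start += size
    return out


def contiguous_runs(indices):
    """Number of maximal runs of consecutive indices (fewer means more contiguous)."""
    runs, prev = 0, None
    for i in indices:
        if prev is None or i != prev + 1:
            runs += 1
        prev = i
    return runs


# A 2 x 4 matrix flattened in row-major order has 8 elements.
for scheme in (cyclic_assignment, blocked_assignment):
    per_thread = scheme(8, 2)
    print(scheme.__name__, per_thread, [contiguous_runs(p) for p in per_thread])
```

Under this metric the blocked scheme gives each thread a single run, while the cyclic scheme gives each thread one run per element.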
  • Patent number: 8417921
    Abstract: The described embodiments provide a processor for generating a result vector that contains results from a comparison operation. During operation, the processor receives a first input vector, a second input vector, and a control vector. When subsequently generating a result vector, the processor first captures a base value from a key element position in the first input vector. For selected elements in the result vector, the processor compares the base value and values from relevant elements to the left of a corresponding element in the second input vector, and writes the result into the element in the result vector. In the described embodiments, the key element position and the relevant elements can be defined by the control vector and an optional predicate vector.
    Type: Grant
    Filed: August 31, 2010
    Date of Patent: April 9, 2013
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20130086566
    Abstract: A medium, method, and apparatus are disclosed for eliding superfluous function invocations in a vector-processing environment. A compiler receives program code comprising a width-contingent invocation of a function. The compiler creates a width-specific executable version of the program code by determining a vector width of a target computer system and omitting the function from the width-specific executable if the vector width meets one or more criteria. For example, the compiler may omit the function call if the vector width is greater than a minimum size.
    Type: Application
    Filed: September 29, 2011
    Publication date: April 4, 2013
    Inventors: Benedict R. Gaster, Lee W. Howes
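The width-specific elision in this abstract can be sketched as a filtering pass. This is a toy model rather than the compiler described in the application: the program is represented as hypothetical `(name, width_contingent)` pairs, and the single criterion used is the abstract's own example, that the vector width is greater than a minimum size.

```python
def specialize(program, vector_width, min_width):
    """Return the call sequence for one target, dropping superfluous calls.

    program: list of (name, width_contingent) pairs (hypothetical encoding).
    A width-contingent call is omitted when vector_width > min_width.
    """
    return [
        name
        for name, width_contingent in program
        if not (width_contingent and vector_width > min_width)
    ]


program = [("load", False), ("barrier", True), ("store", False)]
print(specialize(program, vector_width=64, min_width=32))
print(specialize(program, vector_width=16, min_width=32))
```

The same source program therefore yields different executables for the wide and narrow targets, with the contingent call present only in the narrow one.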
  • Patent number: 8412914
    Abstract: A method for aggregating a program loop in a Macroscalar architecture includes identifying one or more instructions of the program loop having a branch instruction that causes the program loop to branch dependent upon a predicate condition after a memory write operation. The method also includes modifying at least one of the one or more instructions to cause a processor executing the one or more instructions to branch after the memory write operation executed as a vector block for iterations prior to and including an iteration during which the predicate condition is satisfied.
    Type: Grant
    Filed: November 17, 2011
    Date of Patent: April 2, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20130080737
    Abstract: A vector data access unit includes data access ordering circuitry, for issuing data access requests indicated by the elements to the data store, and configured in response to receipt of at least two decoded vector data access instructions, and one of the instructions being a write instruction. Data accesses are performed in the instructed order to determine an element indicating the next data access for each of said vector data access instructions. One of the next data accesses is selected to be issued to the data store in dependence upon an order in which the at least two vector data instructions were received. The position of the elements indicates the next data accesses relative to each other within their respective plurality of elements. A numerical position of the element indicating the next data access within the plurality of elements of an earlier instruction is less than a predetermined value.
    Type: Application
    Filed: September 28, 2011
    Publication date: March 28, 2013
    Applicant: ARM Limited
    Inventor: Alastair David Reid
  • Publication number: 20130067196
    Abstract: A method of operating a computer processor includes storing at least one machine level vector instruction in a memory and replacing a plurality of machine level scalar instructions in a computer program with the at least one machine level vector instruction during execution of the computer program based on execution addresses associated with the plurality of machine level scalar instructions and/or instruction opcodes associated with the plurality of machine level scalar instructions.
    Type: Application
    Filed: September 13, 2011
    Publication date: March 14, 2013
    Applicant: QUALCOMM Incorporated
    Inventors: Gerald Paul Michalak, Charles Dave Estes
  • Publication number: 20130044117
    Abstract: Frequently accessed state data used in a multithreaded graphics processing architecture is cached within a vector register file of a processing unit to optimize accesses to the state data and minimize memory bus utilization associated therewith. A processing unit may include a fixed point execution unit as well as a vector floating point execution unit, and a vector register file utilized by the vector floating point execution unit may be used to cache state data used by the fixed point execution unit and transferred as needed into the general purpose registers accessible by the fixed point execution unit, thereby reducing the need to repeatedly retrieve and write back the state data from and to an L1 or lower level cache accessed by the fixed point execution unit.
    Type: Application
    Filed: August 18, 2011
    Publication date: February 21, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 8375196
    Abstract: A data processing apparatus includes a vector register bank having a plurality of vector registers, each register including a plurality of storage cells, each cell storing a data element. A vector processing unit is provided for executing a sequence of vector instructions. The processing unit is arranged to issue a set rearrangement enable signal to the vector register bank. The write interface of the vector register bank is modified to provide not only a first input for receiving the data elements generated by the vector processing unit during normal execution, but also has a second input coupled via a data rearrangement path to the matrix of storage cells via which the data elements currently stored in the matrix of storage cells are provided to the write interface in a rearranged form representing the arrangement of data elements that would be obtained by performance of the predetermined rearrangement operation.
    Type: Grant
    Filed: January 19, 2010
    Date of Patent: February 12, 2013
    Assignee: ARM Limited
    Inventors: Andreas Björklund, Erik Persson, Ola Hugosson
  • Publication number: 20130036293
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a minima or maxima operation on another input vector dependent upon the input vector and the control vector.
    Type: Application
    Filed: September 24, 2012
    Publication date: February 7, 2013
    Applicant: Apple Inc.
    Inventor: Apple Inc.
  • Publication number: 20130024655
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a fixed-value addition operation dependent upon the input vector and the control vector.
    Type: Application
    Filed: September 27, 2012
    Publication date: January 24, 2013
    Applicant: Apple Inc.
    Inventor: Apple Inc.