Vector Processor Operation Patents (Class 712/7)

Functional Unit Having Tree Structure To Support Vector Sorting Algorithm and Other Algorithms

Publication number: 20140189292

Abstract: An apparatus is described having a functional unit of an instruction execution pipeline. The functional unit has a plurality of compare-and-exchange circuits coupled to network circuitry to implement a vector sorting tree for a vector sorting instruction. Each of the compare-and-exchange circuits has a respective comparison circuit that compares a pair of inputs. Each of the compare-and-exchange circuits has a same sided first output for presenting a higher of the two inputs and a same sided second output for presenting a lower of the two inputs, said comparison circuit to also support said functional unit's execution of a prefix min and/or prefix add instruction.

Type: Application

Filed: December 28, 2012

Publication date: July 3, 2014

Inventors: Robert M. IOFFE, Nicholas C. GALOPPO VON BORRIES
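
Editor's sketch: the abstract above describes compare-and-exchange circuits that each present the higher of two inputs on one output and the lower on the other, wired into a sorting network. The abstract does not give the network topology, so the Python sketch below substitutes a plain odd-even transposition network as a stand-in. The function names `compare_exchange` and `sort_network` are hypothetical and do not come from the application.

```python
def compare_exchange(a, b):
    """Model one compare-and-exchange circuit: return (higher, lower)."""
    return (a, b) if a >= b else (b, a)


def sort_network(vec):
    """Sort ascending with an odd-even transposition network.

    ASSUMPTION: this topology is a stand-in; the patent's tree layout
    is not specified in the abstract.
    """
    v = list(vec)
    n = len(v)
    for stage in range(n):
        # Even stages pair (0,1), (2,3), ...; odd stages pair (1,2), (3,4), ...
        start = stage % 2
        for i in range(start, n - 1, 2):
            higher, lower = compare_exchange(v[i], v[i + 1])
            v[i], v[i + 1] = lower, higher
    return v


print(sort_network([3, 1, 4, 1, 5, 9, 2, 6]))
```

Every stage uses the same fixed pairing of lanes regardless of the data, which is what makes this kind of network a natural fit for a hardware functional unit.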
Method And Apparatus For Integral Image Computation Instructions

Publication number: 20140189291

Abstract: A method is described that performs an image integral calculation by creating a second vector and creating a third vector. The second vector is created by executing a first instruction that adds alternating elements of a first vector to respective neighboring elements of the first vector and presents resulting summations into said second vector. The first instruction also passes through the respective neighboring elements to said second vector. The third vector is created by executing a second instruction that adds elements of one side of the second vector to an element of another side of the second vector and passes through the another side of the second vector.

Type: Application

Filed: December 28, 2012

Publication date: July 3, 2014

Inventors: Liu YANG, Bin Robin WANG
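
Editor's sketch: the two instructions in the abstract above can be read as two steps of a running (prefix) sum, which is the row-wise building block of an integral image. The Python sketch below works through a four-element vector. Which lanes count as "alternating" and which element of the other side is added are assumptions, and the names `add_neighbors` and `add_across_halves` are hypothetical.

```python
def add_neighbors(v):
    """First step: add each even-indexed element into its right neighbor.

    ASSUMPTION: even lanes pass through unchanged and odd lanes receive
    the pairwise sum.
    """
    out = list(v)
    for i in range(1, len(v), 2):
        out[i] = v[i - 1] + v[i]
    return out


def add_across_halves(v):
    """Second step: add the last element of the lower half into every
    element of the upper half, and pass the lower half through.

    ASSUMPTION: "one side" is the upper half, "another side" the lower half.
    """
    half = len(v) // 2
    carry = v[half - 1]
    return v[:half] + [x + carry for x in v[half:]]


def prefix_sum4(v):
    """Running sum of a 4-element vector using the two steps above."""
    return add_across_halves(add_neighbors(v))


print(prefix_sum4([1, 2, 3, 4]))
```

For the input `[1, 2, 3, 4]` the intermediate vector is `[1, 3, 3, 7]` and the final vector is `[1, 3, 6, 10]`.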
SYSTEMS, APPARATUSES, AND METHODS FOR DETERMINING DATA ELEMENT EQUALITY OR SEQUENTIALITY

Publication number: 20140189294

Abstract: Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting.

Type: Application

Filed: December 28, 2012

Publication date: July 3, 2014

Inventors: Matt WALSH, Elmoustapha OULD-AHMED-VALL, Robert VALENTINE, Bret TOLL
Instructions for Sliding Window Encoding Algorithms

Publication number: 20140189293

Abstract: A processor is described having an instruction execution pipeline having a functional unit to execute an instruction that compares vector elements against an input value. Each of the vector elements and the input value have a first respective section identifying a location within data and a second respective section having a byte sequence of the data. The functional unit has comparison circuitry to compare respective byte sequences of the input vector elements against the input value's byte sequence to identify a number of matching bytes for each comparison. The functional unit also has difference circuitry to determine respective distances between the input vector's elements' byte sequences and the input value's byte sequence within the data.

Type: Application

Filed: December 28, 2012

Publication date: July 3, 2014

Inventors: Vinodh GOPAL, James GUILFORD
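
Editor's sketch: the abstract above pairs each byte sequence with its location in the data and, for every vector element, reports how many bytes match the input value and how far apart the two locations are. The Python sketch below models that behavior. The `Entry` type, the function names, and the choice to count only leading matching bytes are assumptions for illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    position: int  # location of the byte sequence within the data
    data: bytes    # the byte sequence itself


def match_length(a: bytes, b: bytes) -> int:
    """Count leading bytes that are equal (ASSUMPTION: prefix match)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def compare_against_input(elements, value):
    """Return one (match_length, distance) pair per vector element."""
    return [
        (match_length(e.data, value.data), value.position - e.position)
        for e in elements
    ]


candidates = [Entry(0, b"abcd"), Entry(4, b"abxx"), Entry(8, b"zzzz")]
current = Entry(12, b"abcz")
print(compare_against_input(candidates, current))
```

A sliding-window encoder would then pick the candidate with the longest match and emit its length and distance.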
PROCESSORS HAVING FULLY-CONNECTED INTERCONNECTS SHARED BY VECTOR CONFLICT INSTRUCTIONS AND PERMUTE INSTRUCTIONS

Publication number: 20140181466

Abstract: An apparatus includes a decode unit to decode a permute instruction and a vector conflict instruction. A vector execution unit is coupled with the decode unit and includes a fully-connected interconnect. The fully-connected interconnect has at least four inputs to receive at least four corresponding data elements of at least one source vector. The fully-connected interconnect has at least four outputs. Each of the at least four inputs is coupled with each of the at least four outputs. The execution unit also includes a permute instruction execution logic coupled with the at least four outputs and operable to store a first vector result in response to the permute instruction. The execution unit also includes a vector conflict instruction execution logic coupled with the at least four outputs and operable to store a second vector result in a destination storage location in response to the vector conflict instruction.

Type: Application

Filed: December 29, 2011

Publication date: June 26, 2014

Inventors: Andrew Thomas Forsyth, Dennis R. Bradford
Efficient classification of network packets

Patent number: 8750285

Abstract: Embodiments describe a system and/or method for efficient classification of network packets. According to an aspect, a method includes describing a packet as a feature vector and mapping the feature vector to a feature space. The method can further include defining a feature prism, classifying the packet relative to the feature prism, and determining if the feature vector matches the feature prism. If the feature vector matches the feature prism, the packet is passed to a data recipient; if not, the packet is blocked. Another embodiment is an apparatus that includes an identification component that defines at least one feature of a packet and a classification component that classifies the packet based at least in part upon the at least one defined feature.

Type: Grant

Filed: September 26, 2011

Date of Patent: June 10, 2014

Assignee: QUALCOMM Incorporated

Inventors: Michael Paddon, Gregory Gordon Rose, Philip Michael Hawkes
Generating predicate values based on conditional data dependency in vector processors

Patent number: 8745360

Abstract: Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating one or more predicate values based on actual dependencies, where a given predicate value indicates data elements that may be safely evaluated in parallel, and where the given actual dependency occurs when the pair of conditions matches one or more criteria. Then, the processor executes the instructions for generating the one or more predicate values.

Type: Grant

Filed: September 24, 2008

Date of Patent: June 3, 2014

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
MULTI-MAGNITUDINAL VECTORS WITH RESOLUTION BASED ON SOURCE VECTOR FEATURES

Publication number: 20140129803

Abstract: Methods, systems and computer program products for resolving multiple magnitudes assigned to a target vector are disclosed. A target vector that includes one or more target vector dimensions is received. One of the target vector dimensions is processed to determine a total number of magnitudes assigned to the processed target vector dimension. Also, a source vector that includes one or more source vector dimensions is received. The received source vector is processed to determine a total number of features associated with the source vector. When it is detected that the total number of magnitudes assigned to the processed target vector dimension exceeds one, one of the assigned magnitudes is selected based on one of the determined features associated with the source vector.

Type: Application

Filed: January 14, 2014

Publication date: May 8, 2014

Applicant: A-Life Medical, LLC.

Inventors: Daniel T. Heinze, Mark L. Morsch
Device to reconfigure multi-level logic networks, method to reconfigure multi-level logic networks, device to modify logic networks, and reconfigurable multi-level logic network

Patent number: 8719549

Abstract: To provide a device to reconfigure multi-level logic networks, which enable logic modification and reconfiguration of a multi-level logic network with small circuit area and low-power dissipation in a simple manner. For example, in the case of reconfiguring a multi-level logic network following logic modification for deleting an output vector F(b) of an objective logic function F(X) corresponding to an input vector b, unmodified pq elements are selected one by one from the nearest pq element EG to an output side. At this time, among output values of pq elements closer to an input side than selected pq elements, output values corresponding to the input vector, which equal an output value corresponding to any input variable X other than the input vector b are considered modified and thus not selected. Then, a selected output value corresponding to the input vector b is rewritten to an “invalid value”.

Type: Grant

Filed: March 2, 2007

Date of Patent: May 6, 2014

Assignee: Kyushu Institute of Technology

Inventor: Tsutomu Sasao
PARTIAL VECTORIZATION COMPILATION SYSTEM

Publication number: 20140122832

Abstract: Generally, this disclosure provides technologies for generating and executing partially vectorized code that may include backward dependencies within a loop body of the code to be vectorized. The method may include identifying backward dependencies within a loop body of the code; selecting one or more ranges of iterations within the loop body, wherein the selected ranges exclude the identified backward dependencies; and vectorizing the selected ranges. The system may include a vector processor configured to provide predicated vector instruction execution, loop iteration range enabling, and dynamic loop dependence checking.

Type: Application

Filed: October 25, 2012

Publication date: May 1, 2014

Inventors: Tin-Fook Ngai, Chunxiao Lin, Yingzhe Shen, Chao Zhang
Implementing vector memory operations

Patent number: 8707012

Abstract: In one embodiment, the present invention includes an apparatus having a register file to store vector data, an address generator coupled to the register file to generate addresses for a vector memory operation, and a controller to generate an output slice from one or more slices each including multiple addresses, where the output slice includes addresses each corresponding to a separately addressable portion of a memory. Other embodiments are described and claimed.

Type: Grant

Filed: October 12, 2012

Date of Patent: April 22, 2014

Assignee: Intel Corporation

Inventors: Roger Espasa, Joel Emer, Geoff Lowney, Roger Gramunt, Santiago Galan, Toni Juan, Jesus Corbal, Federico Ardanaz, Isaac Hernandez
REDUCING ISSUE-TO-ISSUE LATENCY BY REVERSING PROCESSING ORDER IN HALF-PUMPED SIMD EXECUTION UNITS

Publication number: 20140075153

Abstract: Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment, a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit, and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.

Type: Application

Filed: November 14, 2013

Publication date: March 13, 2014

Applicant: International Business Machines Corporation

Inventors: Maarten J. Boersma, Markus Kaltenbach, Christophe J. Layer, Jens Leenstra, Silvia M. Mueller
VECTOR INSTRUCTIONS TO ENABLE EFFICIENT SYNCHRONIZATION AND PARALLEL REDUCTION OPERATIONS

Publication number: 20140068226

Abstract: In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed.

Type: Application

Filed: March 12, 2013

Publication date: March 6, 2014

Inventors: MIKHAIL SMELYANSKIY, VICTOR LEE, CHRISTOPHER HUGHES, DAEHYUN KIM, YEN-KUANG CHEN, CHANGKYU KIM, JATIN CHHUGANI, ANTHONY D. NGUYEN, SANJEEV KUMAR
Functional unit for vector integer multiply add instruction

Patent number: 8667042

Abstract: A vector functional unit implemented on a semiconductor chip to perform vector operations of dimension N is described. The vector functional unit includes N functional units. Each of the N functional units has logic circuitry to perform: a first integer multiply add instruction that presents highest ordered bits but not lowest ordered bits of a first integer multiply add calculation, and a second integer multiply add instruction that presents lowest ordered bits but not highest ordered bits of a second integer multiply add calculation.

Type: Grant

Filed: September 24, 2010

Date of Patent: March 4, 2014

Assignee: Intel Corporation

Inventors: Jeff Wiedemeier, Sridhar Samudrala, Roger Golliver
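
Editor's sketch: the abstract above describes a pair of integer multiply-add instructions, one presenting the highest-ordered bits of `a * b + c` and the other the lowest-ordered bits. The Python sketch below models one lane of each. The 32-bit lane width, the unsigned interpretation, and the function names are assumptions; the abstract does not state them.

```python
def multiply_add_full(a, b, c, bits=32):
    """Compute a * b + c on unsigned lanes of the given width.

    ASSUMPTION: operands are unsigned and `bits` wide.
    """
    mask = (1 << bits) - 1
    return (a & mask) * (b & mask) + (c & mask)


def multiply_add_high(a, b, c, bits=32):
    """Present only the upper `bits` bits of the multiply-add result."""
    return (multiply_add_full(a, b, c, bits) >> bits) & ((1 << bits) - 1)


def multiply_add_low(a, b, c, bits=32):
    """Present only the lower `bits` bits of the multiply-add result."""
    return multiply_add_full(a, b, c, bits) & ((1 << bits) - 1)


def vector_multiply_add(op, va, vb, vc):
    """Apply one of the lane operations across N lanes."""
    return [op(a, b, c) for a, b, c in zip(va, vb, vc)]


print(multiply_add_high(0xFFFFFFFF, 2, 3), multiply_add_low(0xFFFFFFFF, 2, 3))
```

Splitting the result across two instructions lets software reassemble a double-width product from two single-width destination registers.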
SYSTEMS AND METHODS OF DATA EXTRACTION IN A VECTOR PROCESSOR

Publication number: 20140059323

Abstract: Systems and methods of data extraction in a vector processor are disclosed. In a particular embodiment a method of data extraction in a vector processor includes copying at least one data element to a source register of a permutation network. The method includes reordering multiple data elements of the source register, populating a destination register of the permutation network with the reordered data elements, and copying the reordered data elements from the destination register to a memory.

Type: Application

Filed: August 23, 2012

Publication date: February 27, 2014

Applicant: Qualcomm Incorporated

Inventors: Jose Fridman, Ajay Anant Ingle, Deepak Mathew, Marc M. Hoffman, Michael John Lopez
Experimental engineering optimization algorithm at point of performance

Publication number: 20140052959

Abstract: A method is provided for reducing the data set used in creating an optimization algorithm, thus to permit the use of microprocessors, that in turn permits embedding the optimization algorithm at the point of performance, in which a subset of data points in a performance window is used to derive a vector that is utilized to create an initial optimization algorithm.

Type: Application

Filed: September 16, 2010

Publication date: February 20, 2014

Inventor: Ronald E. Wagner
Vector processing with predicate vector for setting element values based on key element position by executing remaining instruction

Patent number: 8650383

Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving an input vector and optionally receiving a predicate vector as inputs. The processor then executes the vector instruction, which causes the processor to determine a key element position in the input vector and generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element of the result vector, the processor sets each element of the result vector to the right of the key element to a first predetermined value and sets each element of the result vector at or to the left of the key element to a second predetermined value. The processor then sets one or more processor status flags based on the values in the result vector.

Type: Grant

Filed: December 23, 2010

Date of Patent: February 11, 2014

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
System and method for implementing elliptic curve scalar multiplication in cryptography

Patent number: 8649508

Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.

Type: Grant

Filed: September 29, 2008

Date of Patent: February 11, 2014

Assignee: Tata Consultancy Services Ltd.

Inventor: Natarajan Vijayarangan
APPARATUS AND METHOD FOR AN INSTRUCTION THAT DETERMINES WHETHER A VALUE IS WITHIN A RANGE

Publication number: 20140032877

Abstract: A method is described that includes performing the following with a single instruction: receiving a first input operand V; receiving a second input operand S; calculating V−S; determining if V−S is positive or negative; and, providing as a resultant: V if V−S is negative; V−S if V−S is positive.

Type: Application

Filed: December 23, 2011

Publication date: January 30, 2014

Inventors: Thomas R. Craver, Elmoustapha Ould-Ahmed-Vall
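
Editor's sketch: the single instruction in the abstract above returns V unchanged when V−S is negative and V−S otherwise, which amounts to one step of a conditional range reduction. The Python sketch below shows that behavior for scalars and for vectors. The function names are hypothetical, and treating V−S equal to zero as the non-negative case is an assumption, since the abstract only mentions positive and negative.

```python
def range_reduce(v, s):
    """Return v if v - s is negative, otherwise v - s.

    ASSUMPTION: a difference of exactly zero is treated as non-negative.
    """
    diff = v - s
    return v if diff < 0 else diff


def vector_range_reduce(values, s):
    """Apply the same operation to every element of a vector."""
    return [range_reduce(v, s) for v in values]


print(vector_range_reduce([5, 10, 8, 15], 8))
```

When every input is already below `2 * s`, the result is the input reduced modulo `s`, which is one plausible use for such an instruction.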
EFFICIENT ZERO-BASED DECOMPRESSION

Publication number: 20130339661

Abstract: A processor core including a hardware decode unit to decode vector instructions for decompressing a run length encoded (RLE) set of source data elements and an execution unit to execute the decoded instructions. The execution unit generates a first mask by comparing the set of source data elements with a set of zeros and then counts the trailing zeros in the mask. A second mask is made based on the count of trailing zeros. The execution unit then copies the set of source data elements to a buffer using the second mask and then reads the number of RLE zeros from the set of source data elements. The buffer is shifted and copied to a result and the set of source data elements is shifted to the right. If more valid data elements are in the set of source data elements, this is repeated until all valid data is processed.

Type: Application

Filed: December 30, 2011

Publication date: December 19, 2013

Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Kshitij A. Doshi, Charles R. Yount, Bret L. Toll
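
Editor's sketch: the abstract above describes copying literal elements up to a zero marker and then reading a count of zeros to expand. The scalar Python sketch below shows that expansion without the mask and shift machinery. The encoding, in which a 0 element is followed by the number of zeros it stands for, is an assumption, and the name `decompress_zero_rle` is hypothetical.

```python
def decompress_zero_rle(src):
    """Expand zero runs in a run-length-encoded sequence.

    ASSUMED FORMAT: nonzero elements are literals; a 0 element is a
    marker followed by one element holding the length of the zero run.
    """
    out = []
    i = 0
    while i < len(src):
        if src[i] != 0:
            # Literal element: copy it through unchanged.
            out.append(src[i])
            i += 1
        else:
            if i + 1 >= len(src):
                raise ValueError("zero marker without a run length")
            # Zero marker: emit as many zeros as the next element says.
            out.extend([0] * src[i + 1])
            i += 2
    return out


print(decompress_zero_rle([7, 3, 0, 4, 9]))
```

The vector version in the abstract handles a whole run of literals per iteration by masking, whereas this sketch advances one element at a time.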
APPARATUS AND METHOD FOR SELECTING ELEMENTS OF A VECTOR COMPUTATION

Publication number: 20130332701

Abstract: An apparatus and method are described for selecting elements to be used in a vector computation. For example, a method according to one embodiment includes the following operations: specifying whether to identify the first, last or next after last active element of an input mask register using an immediate value; identifying the first, last or next after last active element in the input mask register according to the immediate value; reading a value from an input vector register corresponding to the identified first, last or next after last active element in the input mask register; and writing the value to an output vector register.

Type: Application

Filed: December 23, 2011

Publication date: December 12, 2013

Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
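
Editor's sketch: the method in the abstract above uses an immediate value to choose the first, last, or next-after-last active element of a mask, then reads the matching element of an input vector. The Python sketch below models that selection. The string modes standing in for the immediate, the `None` result when no element qualifies, and the function name are assumptions.

```python
def select_element(mask, vec, mode):
    """Return the vector element chosen by the mask and the mode.

    ASSUMPTIONS: mode is one of "first", "last", "next_after_last";
    None is returned when no element can be selected.
    """
    active = [i for i, bit in enumerate(mask) if bit]
    if not active:
        return None
    if mode == "first":
        index = active[0]
    elif mode == "last":
        index = active[-1]
    elif mode == "next_after_last":
        index = active[-1] + 1
    else:
        raise ValueError(f"unknown mode: {mode}")
    # The position after the last active element may fall off the vector.
    return vec[index] if index < len(vec) else None


mask = [0, 1, 1, 0, 1, 0]
vec = [10, 11, 12, 13, 14, 15]
print(select_element(mask, vec, "first"),
      select_element(mask, vec, "last"),
      select_element(mask, vec, "next_after_last"))
```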
Configurable vector length computer processor

Patent number: 8601236

Abstract: A processor core comprises one or more vector units operable to change between a fine-grained vector mode having a shorter maximum vector length and a coarse-grained vector mode having a longer maximum vector length. Changing vector modes comprises halting all instruction stream execution in the core, flushing one or more registers in a register space, reconfiguring one or more vector registers in the register space, and restarting instruction execution in the core.

Type: Grant

Filed: February 29, 2012

Date of Patent: December 3, 2013

Assignee: Cray Inc.

Inventors: Gregory J. Faanes, Eric P. Lundberg, Abdulla Bataineh, Timothy J. Johnson, Michael Parker, James Robert Kohn, Steven L. Scott, Robert Alverson
Decomposing Operations in More than One Dimension into One Dimensional Point Operations

Publication number: 20130297908

Abstract: A processing architecture uses stationary operands and opcodes common on a plurality of processors. Only data moves through the processors. The same opcode and operand is used by each processor assigned to operate, for example, on one row of pixels, one row of numbers, or one row of points in space.

Type: Application

Filed: December 30, 2011

Publication date: November 7, 2013

Inventor: Scott A. Krig
APPARATUS AND METHOD OF MASK PERMUTE INSTRUCTIONS

Publication number: 20130290672

Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.

Type: Application

Filed: December 23, 2011

Publication date: October 31, 2013

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Suleyman Sair
Processing vectors using wrapping boolean instructions in the macroscalar architecture

Patent number: 8560815

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a Boolean operation on another input vector dependent upon the input vector and the control vector.

Type: Grant

Filed: September 27, 2012

Date of Patent: October 15, 2013

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
Dynamics control

Patent number: 8554421

Abstract: A vehicle having a dynamics control system, the vehicle comprising: a first set comprising multiple adjustable sub-systems that affect the performance of the vehicle's powertrain; a second set comprising multiple adjustable sub-systems that affect the vehicle's handling; a dynamics user interface including a first input device and a second input device; and a dynamics controller coupled to the user interface and configured to adjust the operation of the sub-systems of the first set in dependence on the first input device and to adjust the operation of the sub-systems of the second set in dependence on the second input device.

Type: Grant

Filed: September 8, 2010

Date of Patent: October 8, 2013

Assignee: McLaren Automotive Limited

Inventors: Antony Sheriff, Mark Vinnels, Richard Felton
Processing vectors using wrapping minima and maxima instructions in the macroscalar architecture

Patent number: 8555037

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a minima or maxima operation on another input vector dependent upon the input vector and the control vector.

Type: Grant

Filed: September 24, 2012

Date of Patent: October 8, 2013

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
Processing vectors using wrapping shift instructions in the macroscalar architecture

Patent number: 8549265

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a shift operation on another input vector dependent upon the input vector and the control vector.

Type: Grant

Filed: September 24, 2012

Date of Patent: October 1, 2013

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
Processing vectors using wrapping multiply and divide instructions in the macroscalar architecture

Patent number: 8539205

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a product or quotient operation on another input vector dependent upon the input vector and the control vector.

Type: Grant

Filed: September 24, 2012

Date of Patent: September 17, 2013

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
Selectively isolating processor elements into subsets of processor elements

Patent number: 8532288

Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.

Type: Grant

Filed: December 1, 2006

Date of Patent: September 10, 2013

Assignee: International Business Machines Corporation

Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh
Processing vectors using wrapping add and subtract instructions in the macroscalar architecture

Patent number: 8527742

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a sum or difference operation on another input vector dependent upon the input vector and the control vector.

Type: Grant

Filed: September 11, 2012

Date of Patent: September 3, 2013

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
Instruction for comparing active vector elements to preceding active elements to determine value differences

Patent number: 8504806

Abstract: The described embodiments include a processor that executes a ValueCheck instruction. In the described embodiments, the processor receives an input vector and a predicate vector, each including N elements. The processor then executes a ValueCheck instruction, which causes the processor to generate a result vector. When generating the result vector, for each element in a set of elements in the input vector for which a corresponding element of the predicate vector is active, the processor determines if at least one of the elements in the set of elements precedes the element in the input vector and contains a different value than the element in the input vector. If so, the processor writes an identifier for a closest preceding active element that contains the different value into a corresponding element of a result vector. Otherwise, the processor writes a zero in the corresponding element of the result vector.

Type: Grant

Filed: May 31, 2012

Date of Patent: August 6, 2013

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
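
Editor's sketch: the ValueCheck behavior in the abstract above can be modeled as a scan backward from each active element for the closest preceding active element holding a different value. The Python sketch below follows that description. Using 1-based positions as the "identifier" (so that 0 can mean "none found") and leaving inactive result elements at 0 are assumptions, and the function name is hypothetical.

```python
def value_check(values, predicate):
    """Model of the ValueCheck result vector.

    ASSUMPTIONS: identifiers are 1-based element positions; result
    elements for inactive inputs are left at 0.
    """
    result = [0] * len(values)
    for i, active in enumerate(predicate):
        if not active:
            continue
        # Walk backward to the closest preceding active element whose
        # value differs from the current element.
        for j in range(i - 1, -1, -1):
            if predicate[j] and values[j] != values[i]:
                result[i] = j + 1
                break
    return result


print(value_check([5, 5, 7, 7, 5], [1, 1, 1, 1, 1]))
```

For the input shown, the result is `[0, 0, 2, 2, 4]`: the first two elements have no differing predecessor, the two 7s point back at position 2, and the final 5 points at position 4.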
Processor executing pack and unpack instructions

Patent number: 8495346

Abstract: A processor. The processor includes a first register for storing a first packed data, a decoder, and a functional unit. The decoder has a control signal input. The control signal input is for receiving a first control signal and a second control signal. The first control signal is for indicating a pack operation. The second control signal is for indicating an unpack operation. The functional unit is coupled to the decoder and the register. The functional unit is for performing the pack operation and the unpack operation using the first packed data. The processor also supports a move operation.

Type: Grant

Filed: April 11, 2012

Date of Patent: July 23, 2013

Assignee: Intel Corporation

Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
Configuring plural cores to perform an instruction having a multi-core characteristic

Patent number: 8495342

Abstract: A processor having multiple cores coordinates functions performed on the cores to automatically, dynamically and repeatedly reconfigure the cores for optimal performance based on characteristics of currently executing software. A core running a thread detects a multi-core characteristic of the thread and assigns one or more other cores to the thread to dynamically combine the cores into what functionally amounts to a common core for more efficient execution of the thread.

Type: Grant

Filed: December 16, 2008

Date of Patent: July 23, 2013

Assignee: International Business Machines Corporation

Inventors: Louis B. Capps, Jr., Michael J. Shapiro, Robert H. Bell, Jr., Thomas E. Cook, William E. Burky
PROCESSOR WITH MULTI-LEVEL LOOPING VECTOR COPROCESSOR

Publication number: 20130185540

Abstract: A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core includes a program memory interface through which the scalar processor retrieves instructions from a program memory. The instructions include scalar instructions executable by the scalar processor and vector instructions executable by the vector coprocessor core. The vector coprocessor core includes a plurality of execution units and a vector command buffer. The vector command buffer is configured to decode vector instructions passed by the scalar processor core, to determine whether vector instructions defining an instruction loop have been decoded, and to initiate execution of the instruction loop by one or more of the execution units based on a determination that all of the vector instructions of the instruction loop have been decoded.

Type: Application

Filed: July 13, 2012

Publication date: July 18, 2013

Applicant: TEXAS INSTRUMENTS INCORPORATED

Inventors: Ching-Yu HUNG, Shinri INAMORI, Jagadeesh SANKARAN, Peter CHANG
Vector Size Agnostic Single Instruction Multiple Data (SIMD) Processor Architecture

Publication number: 20130159667

Abstract: A computer has a memory adapted to store a first plurality of instructions encoded with a first vector size and a second plurality of instructions encoded with a second vector size. An execution unit executes the first plurality of instructions and the second plurality of instructions by processing vector units in a uniform manner regardless of vector size.

Type: Application

Filed: December 16, 2011

Publication date: June 20, 2013

Applicant: MIPS TECHNOLOGIES, INC.

Inventor: Ilie Garbacea
REDUCING ISSUE-TO-ISSUE LATENCY BY REVERSING PROCESSING ORDER IN HALF-PUMPED SIMD EXECUTION UNITS

Publication number: 20130159666

Abstract: Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment, a processor functional unit is provided comprising a frontend unit, an execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between an output and an input of the execution core unit, and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.

Type: Application

Filed: December 14, 2011

Publication date: June 20, 2013

Applicant: International Business Machines Corporation

Inventors: Maarten J. Boersma, Markus Kaltenbach, Christophe J. Layer, Jens Leenstra, Silvia M. Mueller
PREDECODE LOGIC FOR AUTOVECTORIZING SCALAR INSTRUCTIONS IN AN INSTRUCTION BUFFER

Publication number: 20130159668

Abstract: A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions.

Type: Application

Filed: December 20, 2011

Publication date: June 20, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
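
As a rough illustration of the substitution described in the abstract for publication 20130159668, the Python sketch below scans a toy instruction buffer and fuses runs of independent scalar adds into a single vector add. The tuple encoding, the `VADD` mnemonic, the lane count of 4, and the helper names `fusable` and `predecode` are all assumptions made for illustration; the application does not specify them.

```python
LANES = 4  # assumed vector width; not taken from the application


def fusable(group):
    """True if the group is all scalar ADDs with no dependency between them."""
    if any(op != "ADD" for op, *_ in group):
        return False
    written = set()
    for _, dst, src_a, src_b in group:
        # A later instruction must not read or rewrite a register that an
        # earlier instruction in the group writes; otherwise executing the
        # adds as one vector operation would change the result.
        if dst in written or src_a in written or src_b in written:
            return False
        written.add(dst)
    return True


def predecode(buffer):
    """Replace each run of LANES fusable scalar ADDs with one VADD."""
    out = []
    i = 0
    while i < len(buffer):
        group = buffer[i:i + LANES]
        if len(group) == LANES and fusable(group):
            _, dsts, srcs_a, srcs_b = zip(*group)
            out.append(("VADD", dsts, srcs_a, srcs_b))
            i += LANES
        else:
            out.append(buffer[i])
            i += 1
    return out
```

In this toy model, four adds on disjoint registers collapse into one `VADD` whose operands are tuples of the original registers, while a chain in which each add consumes the previous result is left untouched.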
Running unary operation instructions for processing vectors

Patent number: 8464031

Abstract: During operation, a processor generates a result vector. In particular, the processor records a value from an element at a key element position in an input vector into a base value. Next, for each active element in the result vector to the right of the key element position, the processor generates a result vector by setting the element in the result vector equal to a result of performing a unary operation on the base value a number of times equal to a number of relevant elements. The number of relevant elements is determined from the key element position to and including a predetermined element in the result vector, where the predetermined element in the result vector may be one of: a first element to the left of the element in the result vector; or the element in the result vector.

Type: Grant

Filed: April 26, 2012

Date of Patent: June 11, 2013

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
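
The running unary operation in the abstract for patent 8464031 can be sketched in Python under a few stated assumptions: a predicate list marks which elements are active, "relevant elements" are taken to be the active ones, and every element the abstract does not mention (inactive elements and those at or left of the key position) simply keeps its input value. The function name `running_unary` and the `include_self` flag, which selects between the two "predetermined element" choices, are hypothetical.

```python
from typing import Callable, List


def running_unary(
    vec: List[int],
    pred: List[bool],
    key: int,
    op: Callable[[int], int],
    include_self: bool,
) -> List[int]:
    base = vec[key]          # base value recorded from the key element
    result = list(vec)       # assumption: untouched elements keep the input
    for j in range(key + 1, len(vec)):
        if not pred[j]:
            continue         # only active elements right of the key are set
        # Count runs from the key position up to element j itself, or up to
        # the first element to its left, depending on the variant.
        last = j if include_self else j - 1
        count = sum(pred[key:last + 1])
        value = base
        for _ in range(count):
            value = op(value)
        result[j] = value
    return result
```

With an increment as the unary operation, the sketch produces a sequence that counts up from the base value once per active element, skipping inactive positions.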
Method and apparatus for computing massive spatio-temporal correlations using a hybrid CPU-GPU approach

Patent number: 8464026

Abstract: A CPU may select a variable from a variable set as a dependent variable. The variable set may be part of the data structure that includes a plurality of vector values, a vector value associated with a variable set of n number of variables, and each variable of the variable set having a variable value. The number of dependent variable steps for the dependent variable may be determined. The number of the vector values in a dependent variable step is determined as being the number of independent variables. A function is mapped to a plurality of thread processors, and each thread processor is assigned for the function to be performed on each one of the independent variables for each of the dependent variable steps.

Type: Grant

Filed: February 17, 2010

Date of Patent: June 11, 2013

Assignee: International Business Machines Corporation

Inventors: Rajesh Ramkrishna Bordawekar, Ravishankar Rao
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR ASSIGNING ELEMENTS OF A MATRIX TO PROCESSING THREADS WITH INCREASED CONTIGUOUSNESS

Publication number: 20130132707

Abstract: A system, method, and computer program product are provided for assigning elements of a matrix to processing threads. In use, a matrix is received to be processed by a parallel processing architecture. Such parallel processing architecture includes a plurality of processors each capable of processing a plurality of threads. Elements of the matrix are assigned to each of the threads for processing, utilizing an algorithm that increases a contiguousness of the elements being processed by each thread.

Type: Application

Filed: January 15, 2013

Publication date: May 23, 2013

Applicant: NVIDIA CORPORATION

Inventor: NVIDIA CORPORATION
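
The abstract for publication 20130132707 does not say which assignment algorithm is used. As one possible reading, the Python sketch below contrasts a blocked partition, where each thread receives a single contiguous run of flattened matrix indices, with a round-robin partition, where a thread's elements are strided. Both function names are hypothetical and the blocked scheme is an assumption, not the claimed method.

```python
def assign_blocked(n_elements, n_threads):
    """Give each thread one contiguous run of flattened matrix indices."""
    base, extra = divmod(n_elements, n_threads)
    assignment = []
    start = 0
    for t in range(n_threads):
        # The first `extra` threads take one additional element each.
        size = base + (1 if t < extra else 0)
        assignment.append(list(range(start, start + size)))
        start += size
    return assignment


def assign_cyclic(n_elements, n_threads):
    """Round-robin baseline: each thread's elements are n_threads apart."""
    return [list(range(t, n_elements, n_threads)) for t in range(n_threads)]
```

For ten elements and three threads, the blocked version yields runs such as indices 0 to 3 for the first thread, whereas the cyclic version gives that thread indices 0, 3, 6, and 9.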
Running-min and running-max instructions for processing vectors using a base value from a key element of an input vector

Patent number: 8417921

Abstract: The described embodiments provide a processor for generating a result vector that contains results from a comparison operation. During operation, the processor receives a first input vector, a second input vector, and a control vector. When subsequently generating a result vector, the processor first captures a base value from a key element position in the first input vector. For selected elements in the result vector, the processor compares the base value and values from relevant elements to the left of a corresponding element in the second input vector, and writes the result into the element in the result vector. In the described embodiments, the key element position and the relevant elements can be defined by the control vector and an optional predicate vector.

Type: Grant

Filed: August 31, 2010

Date of Patent: April 9, 2013

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
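
A minimal Python sketch of the running-min behavior in the abstract for patent 8417921 follows. It assumes the key position is passed in directly rather than derived from a control vector, that a predicate list marks the selected and relevant elements, and that unselected elements keep the value from the first input vector. The name `running_min` is hypothetical; a running max would use `max` in place of `min`.

```python
def running_min(a, b, pred, key):
    base = a[key]        # base value captured from the key element of `a`
    result = list(a)     # assumption: unselected elements keep a's value
    running = base
    for j in range(key, len(a)):
        if pred[j]:
            # Write the minimum of the base and the active elements of `b`
            # strictly to the left of j, then fold in b[j] for later elements.
            result[j] = running
            running = min(running, b[j])
    return result
```

Each selected element therefore sees only the values of `b` to its left, which is why the element at the key position receives the base value itself.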
VECTOR WIDTH-AWARE SYNCHRONIZATION-ELISION FOR VECTOR PROCESSORS

Publication number: 20130086566

Abstract: A medium, method, and apparatus are disclosed for eliding superfluous function invocations in a vector-processing environment. A compiler receives program code comprising a width-contingent invocation of a function. The compiler creates a width-specific executable version of the program code by determining a vector width of a target computer system and omitting the function from the width-specific executable if the vector width meets one or more criteria. For example, the compiler may omit the function call if the vector width is greater than a minimum size.

Type: Application

Filed: September 29, 2011

Publication date: April 4, 2013

Inventors: Benedict R. Gaster, Lee W. Howes
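
The width-specific specialization described in the abstract for publication 20130086566 can be illustrated with a toy compiler pass in Python. The program representation (a list of `(name, width_contingent)` pairs), the `barrier` operation name, and the `specialize` function are assumptions for illustration. The elision criterion follows the example in the abstract: the call is omitted when the target vector width is greater than a minimum size.

```python
def specialize(program, vector_width, min_width):
    """Return a width-specific copy of `program`.

    `program` is a list of (name, width_contingent) pairs. Width-contingent
    invocations are dropped when the target's vector width exceeds
    `min_width`; everything else is kept in order.
    """
    elide = vector_width > min_width
    return [
        (name, contingent)
        for name, contingent in program
        if not (contingent and elide)
    ]
```

On a target whose vector width exceeds the threshold, the width-contingent `barrier` entry disappears from the specialized program; on a narrower target the program is returned unchanged.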
Macroscalar processor architecture

Patent number: 8412914

Abstract: A method for aggregating a program loop in a Macroscalar architecture includes identifying one or more instructions of the program loop having a branch instruction that causes the program loop to branch dependent upon a predicate condition after a memory write operation. The method also includes modifying at least one of the one or more instructions to cause a processor executing the one or more instructions to branch after the memory write operation executed as a vector block for iterations prior to and including an iteration during which the predicate condition is satisfied.

Type: Grant

Filed: November 17, 2011

Date of Patent: April 2, 2013

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
Interleaving data accesses issued in response to vector access instructions

Publication number: 20130080737

Abstract: A vector data access unit includes data access ordering circuitry, for issuing data access requests indicated by the elements to the data store, and configured in response to receipt of at least two decoded vector data access instructions, and one of the instructions being a write instruction. Data accesses are performed in the instructed order to determine an element indicating the next data access for each of said vector data access instructions. One of the next data accesses is selected to be issued to the data store in dependence upon an order in which the at least two vector data instructions were received. The position of the elements indicates the next data accesses relative to each other within their respective plurality of elements. A numerical position of the element indicating the next data access within the plurality of elements of an earlier instruction is less than a predetermined value.

Type: Application

Filed: September 28, 2011

Publication date: March 28, 2013

Applicant: ARM Limited

Inventor: Alastair David Reid
VECTORIZATION OF MACHINE LEVEL SCALAR INSTRUCTIONS IN A COMPUTER PROGRAM DURING EXECUTION OF THE COMPUTER PROGRAM

Publication number: 20130067196

Abstract: A method of operating a computer processor includes storing at least one machine level vector instruction in a memory and replacing a plurality of machine level scalar instructions in a computer program with the at least one machine level vector instruction during execution of the computer program based on execution addresses associated with the plurality of machine level scalar instructions and/or instruction opcodes associated with the plurality of machine level scalar instructions.

Type: Application

Filed: September 13, 2011

Publication date: March 14, 2013

Applicant: QUALCOMM Incorporated

Inventors: Gerald Paul Michalak, Charles Dave Estes
VECTOR REGISTER FILE CACHING OF CONTEXT DATA STRUCTURE FOR MAINTAINING STATE DATA IN A MULTITHREADED IMAGE PROCESSING PIPELINE

Publication number: 20130044117

Abstract: Frequently accessed state data used in a multithreaded graphics processing architecture is cached within a vector register file of a processing unit to optimize accesses to the state data and minimize memory bus utilization associated therewith. A processing unit may include a fixed point execution unit as well as a vector floating point execution unit, and a vector register file utilized by the vector floating point execution unit may be used to cache state data used by the fixed point execution unit and transferred as needed into the general purpose registers accessible by the fixed point execution unit, thereby reducing the need to repeatedly retrieve and write back the state data from and to an L1 or lower level cache accessed by the fixed point execution unit.

Type: Application

Filed: August 18, 2011

Publication date: February 21, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
Vector processor with vector register file configured as matrix of data cells each selecting input from generated vector data or data from other cell via predetermined rearrangement path

Patent number: 8375196

Abstract: A data processing apparatus includes a vector register bank having a plurality of vector registers, each register including a plurality of storage cells, each cell storing a data element. A vector processing unit is provided for executing a sequence of vector instructions. The processing unit is arranged to issue a set rearrangement enable signal to the vector register bank. The write interface of the vector register bank is modified to provide not only a first input for receiving the data elements generated by the vector processing unit during normal execution, but also has a second input coupled via a data rearrangement path to the matrix of storage cells via which the data elements currently stored in the matrix of storage cells are provided to the write interface in a rearranged form representing the arrangement of data elements that would be obtained by performance of the predetermined rearrangement operation.

Type: Grant

Filed: January 19, 2010

Date of Patent: February 12, 2013

Assignee: ARM Limited

Inventors: Andreas Björklund, Erik Persson, Ola Hugosson
PROCESSING VECTORS USING WRAPPING MINIMA AND MAXIMA INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE

Publication number: 20130036293

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a minima or maxima operation on another input vector dependent upon the input vector and the control vector.

Type: Application

Filed: September 24, 2012

Publication date: February 7, 2013

Applicant: Apple Inc.

Inventor: Apple Inc.
PROCESSING VECTORS USING WRAPPING INCREMENT AND DECREMENT INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE

Publication number: 20130024655

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a fixed-value addition operation dependent upon the input vector and the control vector.

Type: Application

Filed: September 27, 2012

Publication date: January 24, 2013

Applicant: APPLE INC.

Inventor: APPLE INC.
