Vector Processor Operation Patents (Class 712/7)
-
Patent number: 11531549Abstract: A system and corresponding method map instructions in an out-of-order (OoO) processor. The system comprises a mapper, integer snapshot circuitry, and floating-point (FP) snapshot circuitry. The mapper maps instructions by mapping integer and FP architectural registers (ARs) of the instructions to integer and FP physical registers of the OoO processor, respectively. The mapper records, via at least one present FP indicator, presence of FP ARs used as destinations in the instructions. The mapper copies, periodically, the integer mapper state to the integer snapshot circuitry and copies, intermittently, based on the at least one FP present indicator, the FP mapper state to the FP snapshot circuitry. Copies of the integer and FP mapper state in the integer and FP snapshot circuitry, respectively, improve performance for instruction unwinding caused, for example, by an exception, branch/jump mispredict, etc. By copying the FP mapper state, intermittently, power efficiency of the OoO processor is improved.Type: GrantFiled: March 31, 2021Date of Patent: December 20, 2022Assignee: Marvell Asia Pte, Ltd.Inventor: David A. Carlson
-
Patent number: 11455171Abstract: A fast and frugal item-state tracking scoreboard circuit is disclosed. The scoreboard maintains per-item partial states across multiple memory circuits, enabling multiple lookups per clock cycle and multiple state updates per clock cycle. In an embodiment a scoreboard is used to schedule instructions in an out-of-order processor. Each clock cycle the scoreboard indicates the busy state of an instruction's registers and may update the busy state of the destination registers of issuing instructions and completing instructions. Applications include register tracking, function-unit tracking, and cache-line state tracking, in embodiments including processor cores (including superscalar, superpipelined, and multithreaded processors), accelerators, memory systems, and networks. In an embodiment, a register-busy scoreboard circuit is implemented using FPGA LUT RAM memory.Type: GrantFiled: May 29, 2020Date of Patent: September 27, 2022Assignee: Gray Research LLCInventor: Jan Stephen Gray
-
Patent number: 11442713Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve loop optimization with predictable recurring memory reads (PRMRs). An example apparatus includes memory, and first processor circuitry to execute first instructions to at least identify one or more optimizations to convert a first loop into a second loop based on converting PRMRs of the first loop into loop-invariant PRMRs, the converting of the PRMRs in response to a quantity of the PRMRs satisfying a threshold, the second loop to execute in a single iteration corresponding to a quantity of iterations of the first loop, determine one or more optimization parameters based on the one or more optimizations, and compile second instructions based on the first processor circuitry processing the first loop based on the one or more optimization parameters associated with the one or more optimizations, the second instructions to be executed by the first or second processor circuitry.Type: GrantFiled: October 19, 2020Date of Patent: September 13, 2022Assignee: Intel CorporationInventors: Diego Luis Caballero de Gea, Hideki Ido, Eric N. Garcia
-
Patent number: 11367160Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.Type: GrantFiled: August 2, 2018Date of Patent: June 21, 2022Assignee: NVIDIA CORPORATIONInventors: Rajballav Dash, Gregory Palmer, Gentaro Hirota, Lacky Shah, Jack Choquette, Emmett Kilgariff, Sriharsha Niverty, Milton Lei, Shirish Gadre, Omkar Paranjape, Lei Yang, Rouslan Dimitrov
-
Patent number: 11354126Abstract: Data processing apparatus comprises vector processing circuitry to selectively apply vector processing operations defined by vector processing instructions to generate one or more data elements of a data vector comprising a plurality of data elements at respective data element positions of the data vector, according to the state of respective predicate flags associated with the positions of the data vector; and generator circuitry to generate instruction sample data indicative of processing activities of the vector processing circuitry for selected ones of the vector processing instructions, instruction sample data indicating at least the state of the predicate flags at execution of the selected vector processing instructions.Type: GrantFiled: February 15, 2019Date of Patent: June 7, 2022Assignee: Arm LimitedInventors: Michael John Williams, Nigel John Stephens
-
Patent number: 11219426Abstract: A method and system for determining an irradiation dose. The method for determining the irradiation dose includes determining a pixel having a biological feature in a region of interest in a radiotherapy simulated locating image by using a retrospective label; extracting a local radiomics feature based on the pixel having the biological feature, in which the local radiomics feature includes a grayscale histogram intensity, a tumor shape feature, a textural feature, a Laplacian of Gaussian filtering feature, and a wavelet feature; acquiring the local radiomics features to be measured; identifying a positive region having the local radiomics features to be measured based on the local radiomics features; performing three-dimensional reconstruction for the peripheral boundary of the positive region to determine a three-dimensional image.Type: GrantFiled: July 18, 2018Date of Patent: January 11, 2022Assignees: Tumor Hospital of Shandong First Medical University (Shandong Cancer Hospital and InstituteInventors: Jian Zhu, Zhen Hou, Zhenjiang Li, Haining Yu, Tong Bai, Yong Yin, Baosheng Li
-
Patent number: 11141127Abstract: A method and system for determining an irradiation dose. The method for determining the irradiation dose includes determining a pixel having a biological feature in a region of interest in a radiotherapy simulated locating image by using a retrospective label; extracting a local radiomics feature based on the pixel having the biological feature, in which the local radiomics feature includes a grayscale histogram intensity, a tumor shape feature, a textural feature, a Laplacian of Gaussian filtering feature, and a wavelet feature; acquiring the local radiomics features to be measured; identifying a positive region having the local radiomics features to be measured based on the local radiomics features; performing three-dimensional reconstruction for the peripheral boundary of the positive region to determine a three-dimensional image.Type: GrantFiled: July 18, 2018Date of Patent: October 12, 2021Assignees: Tumor Hospital of Shandong First Medical UniversityInventors: Jian Zhu, Zhen Hou, Zhenjiang Li, Haining Yu, Tong Bai, Yong Yin, Baosheng Li
-
Patent number: 11126430Abstract: A vector processor includes a grouping memory functional unit coupled to grouping memory having multiple bins. The vector processor also includes a bitformatting functional unit that performs bit-level data arrangements using any suitable technique or network, such as a Benes network. The vector processor receives and reads an input vector of data that includes portions (e.g., bits) of multiple data streams, and writes each portion corresponding to a respective data stream to a respective bin in parallel using the bitformatting functional unit to align the data. The vector processor also or alternatively receives and reads multiple outgoing data streams, writes portions of the data streams in respective bins of the grouping memory, and intersperses the portions in an outgoing vector of data in parallel, using the bitformatting functional unit to align the data.Type: GrantFiled: December 27, 2019Date of Patent: September 21, 2021Assignee: Intel CorporationInventors: Parakalan Venkataraghavan, Thomas W. Smith, Silpa Naidu Chirumavilla, Ravi Shekhar
-
Patent number: 11042378Abstract: Data processing apparatus comprises processing circuitry to selectively apply a vector processing operation to data items at positions within data vectors according to the states of a set of respective predicate flags associated with the positions, the data vectors having a data vector processing order, each data vector comprising a plurality of data items having a data item order, the processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry; wherein the instruction decoder circuitry is responsive to a propagation instruction to control the instruction processing circuitry to derive a set of predicate flags applicable to a current data vector in dependence upon a set of predicate flags applicable to a preceding data vector in the data vector processing order, wherein when one or more last-most predicate flags of the set applicable to the preceding data vector are inacType: GrantFiled: July 28, 2016Date of Patent: June 22, 2021Assignee: ARM LimitedInventors: Nigel John Stephens, Mbou Eyole, Alejandro Martinez Vicente
-
Patent number: 11003449Abstract: A swizzle pattern generator is provided to reduce an overhead due to execution of a swizzle instruction in vector processing. The swizzle pattern generator is configured to provide swizzle patterns with respect to data sets of at least one vector register or vector processing unit. The swizzle pattern generator may be reconfigurable to generate various swizzle patterns for different vector operations.Type: GrantFiled: January 24, 2019Date of Patent: May 11, 2021Assignee: Samsung Electronics Co., Ltd.Inventors: Moo-Kyoung Chung, Woong Seo, Ho-Young Kim, Soo-Jung Ryu, Dong-Hoon Yoo, Jin-Seok Lee, Yeon-Gon Cho, Chang-Moo Kim, Seung-Hun Jin
-
Patent number: 10990396Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.Type: GrantFiled: September 27, 2018Date of Patent: April 27, 2021Assignee: Intel CorporationInventors: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
-
Patent number: 10942847Abstract: Technologies for efficiently performing scatter-gather operations include a device with circuitry configured to associate, with a template identifier, a set of non-contiguous memory locations of a memory having a cross point architecture. The circuitry is additionally configured to access, in response to a request that identifies the non-contiguous memory locations by the template identifier, the memory locations.Type: GrantFiled: December 18, 2018Date of Patent: March 9, 2021Assignee: Intel CorporationInventors: Jawad B. Khan, Richard Coulson
-
Patent number: 10936315Abstract: In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.Type: GrantFiled: December 31, 2018Date of Patent: March 2, 2021Assignee: Texas Instruments IncorporatedInventors: Duc Quang Bui, Joseph Raymond Michael Zbiciak
-
Patent number: 10929133Abstract: Systems, methods, and apparatuses relating to element sorting of vectors are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction; and an execution unit to execute the decoded instruction to: provide storage for a comparison matrix to store a comparison value for each element of an input vector compared against the other elements of the input vector, perform a comparison operation on elements of the input vector corresponding to storage of comparison values above a main diagonal of the comparison matrix, perform a different operation on elements of the input vector corresponding to storage of comparison values below the main diagonal of the comparison matrix, and store results of the comparison operation and the different operation in the comparison matrix.Type: GrantFiled: January 16, 2019Date of Patent: February 23, 2021Assignee: Intel CorporationInventors: Mikhail Plotnikov, Igor Ermolaev
-
Patent number: 10896042Abstract: Examples of the present disclosure provide apparatuses and methods for determining a vector population count in a memory. An example method comprises determining, using sensing circuitry, a vector population count of a number of fixed length elements of a vector stored in a memory array.Type: GrantFiled: December 3, 2018Date of Patent: January 19, 2021Assignee: Micron Technology, Inc.Inventor: Sanjay Tiwari
-
Patent number: 10877925Abstract: A vector processor with a vector first and multi-lane configuration. A vector operation for a vector processor can include a single vector or multiple vectors as input. Multiple lanes for the input can be used to accelerate the operation in parallel. And, a vector first configuration can enhance the multiple lanes by reducing the number of elements accessed in the lanes to perform the operation in parallel.Type: GrantFiled: March 18, 2019Date of Patent: December 29, 2020Assignee: Micron Technology, Inc.Inventor: Steven Jeffrey Wallach
-
Patent number: 10853043Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve loop optimization with predictable recurring memory reads (PRMRs). An example apparatus includes an optimizer including an optimization scenario manager to generate an optimization plan associated with a loop and corresponding optimization parameters, the optimization plan including a set of one or more optimizations, an optimization scenario analyzer to identify the optimization plan as a candidate optimization plan when a quantity of PRMRs included in the loop is greater than a threshold, and a parameter calculator to determine the optimization parameters based on the candidate optimization plan, and a code generator to generate instructions to be executed by a processor, the instructions based on processing the loop with the one or more optimizations included in the candidate optimization plan.Type: GrantFiled: September 11, 2018Date of Patent: December 1, 2020Assignee: INTEL CORPORATIONInventors: Diego Luis Caballero de Gea, Hideki Ido, Eric N. Garcia
-
Patent number: 10817291Abstract: Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA.Type: GrantFiled: March 30, 2019Date of Patent: October 27, 2020Assignee: Intel CorporationInventors: Jesus Corbal, Rohan Sharma, Simon Steely, Jr., Chinmay Ashok, Kent D. Glossop, Dennis Bradford, Paul Caprioli, Louise Huot, Kermin ChoFleming, Barry Tannenbaum
-
Patent number: 10684860Abstract: This invention provides a high performance processor system and a method based on a common general purpose unit, it may be configured into a variety of different processor architectures; before the processor executes instructions, the instruction is filled into the instruction read buffer, which is directly accessed by the processor core, then instruction read buffer actively provides instructions to processor core to execute, achieving a high cache hit rate.Type: GrantFiled: July 6, 2018Date of Patent: June 16, 2020Assignee: SHANGHAI XINHAO MICROELECTRONICS CO. LTD.Inventor: Kenneth Chenghao Lin
-
Patent number: 10671583Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors that may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, a number of vectorized instructions in the vectorized approach, a number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, a size of a cache, and a projected size of a hash table.Type: GrantFiled: August 24, 2017Date of Patent: June 2, 2020Assignee: Oracle International CorporationInventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
-
Patent number: 10572409Abstract: A memory arrangement can store a matrix of matrix data elements specified as index-value pairs that indicate row and column indices and associated values. First split-and-merge circuitry is coupled between the memory arrangement and a first set of FIFO buffers for reading the matrix data elements from the memory arrangement and putting the matrix data elements in the first set of FIFO buffers based on column indices. A pairing circuit is configured to read vector data elements, pair the vector data elements with the matrix data elements, and put the paired matrix and vector data elements in a second set of FIFO buffers based on column indices. Second split-and-merge circuitry is configured to read paired matrix and vector data elements from the second set of FIFO buffers and put the paired matrix and vector data elements in a third set of FIFO buffers based on row indices.Type: GrantFiled: May 10, 2018Date of Patent: February 25, 2020Assignee: XILINX, INC.Inventors: Jindrich Zejda, Ling Liu, Yifei Zhou, Ashish Sirasao
-
Patent number: 10564964Abstract: Systems and methods are provided for executing an instruction. The method may include loading a first vector into a first location, the first vector including a plurality of first data elements and loading a second vector into a second location, the second vector including a plurality of second data elements. The method may further include comparing the plurality of first data elements of the first vector to the plurality of data elements of the second vector and performing one or more operations on the plurality of first and second data elements based on at least one vector cross-compare instruction. The one or more operations include counting a number of data elements of the plurality of first and second data elements that satisfy at least one condition, counting a number of times specified values occur in the plurality of first and second data elements, and generating sequence counts for duplicated values.Type: GrantFiled: August 23, 2016Date of Patent: February 18, 2020Assignee: International Business Machines CorporationInventors: Jeffrey H. Derby, Robert K. Montoye, Dheeraj Sreedhar
-
Patent number: 10459843Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. An element duplication unit optionally duplicates data element an instruction specified number of times. A vector masking unit limits data elements received from the element duplication unit to least significant bits within an instruction specified vector length. If the vector length is less than a stream head register size, the vector masking unit stores all 0's in excess lanes of the stream head register (group duplication disabled) or stores duplicate copies of the least significant bits in excess lanes of the stream head register.Type: GrantFiled: December 30, 2016Date of Patent: October 29, 2019Assignee: TEXAS INSTRUMENTS INCORPORATEDInventor: Joseph Zbiciak
-
Patent number: 10394891Abstract: A novel distributed graph database is provided that is designed for efficient graph data storage and processing on modern computing architectures. In particular a single node graph database and a runtime & communication layer allows for composing a distributed graph database from multiple single node instances.Type: GrantFiled: August 5, 2016Date of Patent: August 27, 2019Assignee: International Business Machines CorporationInventors: Chun-Fu Chen, Jason L. Crawford, Ching-Yung Lin, Jie Lu, Mark R. Nutter, Toyotaro Suzumura, Ilie G. Tanase, Danny L. Yeh
-
Patent number: 10235398Abstract: An object of the present invention is to efficiently perform a data load process or a data store process between a memory and a storage unit in a processor. The processor includes: a plurality of storage units associated with a plurality of data elements included in a data set; and a control unit that reads the plurality of data elements stored in adjacent storage areas from a memory, in which a plurality of the data sets is stored, collectively for respective data sets, sorts the respective read data elements to a storage unit corresponding to the data element among the plurality of storage units, and writes the data elements to the respective data sets.Type: GrantFiled: April 23, 2015Date of Patent: March 19, 2019Assignee: Renesas Electronics CorporationInventor: Masayuki Kimura
-
Patent number: 10026458Abstract: Memories and methods for performing an atomic memory operation are disclosed, including a memory having a memory store, operation logic, and a command decoder. Operation logic can be configured to receive data and perform operations thereon in accordance with internal control signals. A command decoder can be configured to receive command packets having at least a memory command portion in which a memory command is provided and data configuration portion in which configuration information related to data associated with a command packet is provided. The command decoder is further configured to generate a command control signal based at least in part on the memory command and further configured to generate control signal based at least in part on the configuration information.Type: GrantFiled: October 21, 2010Date of Patent: July 17, 2018Assignee: Micron Technology, Inc.Inventor: David Resnick
-
Patent number: 9766858Abstract: A data processing system supports vector operands with components representing different bit significance portions of an integer number. Processing circuitry performs a processing operation specified by a program instruction in dependence upon a number of components comprising the vector as specified by metadata for the vector.Type: GrantFiled: December 24, 2014Date of Patent: September 19, 2017Assignee: ARM LimitedInventors: David Raymond Lutz, Neil Burgess, Christopher Neal Hinds
-
Patent number: 9696994Abstract: A data processing apparatus includes a comparison unit configured to perform an element comparison process performing a comparison of a first data element at a first index in the first vector with a second data element at a second index in the second vector. A hazard vector generation unit is configured to populate a hazard vector at an index determined by the first index with a value determined by the second index. The comparison unit performs the element comparison process by iteratively comparing data elements of the first vector with each element of a subset of the second vector. It then determines the subset of the second vector as those data elements at indices in the second vector which are less than a current index of the first vector and which are greater than previously determined values of the second index for which the match condition was true.Type: GrantFiled: December 23, 2011Date of Patent: July 4, 2017Assignee: ARM LIMITEDInventor: Alastair David Reid
-
Patent number: 9411842Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors that may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, a number of vectorized instructions in the vectorized approach, a number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, a size of a cache, and a projected size of a hash table.Type: GrantFiled: August 1, 2013Date of Patent: August 9, 2016Assignee: Oracle International CorporationInventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
-
Patent number: 9335997Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an operand vector, a selection vector, and a control vector are disclosed. The executed instructions may also cause the processor to perform a wrapping rotate previous operation dependent upon the input vectors.Type: GrantFiled: September 28, 2012Date of Patent: May 10, 2016Assignee: Apple Inc.Inventor: Jeffry E. Gonion
-
Patent number: 9218182Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor a data element shuffle and an operation on the shuffled data elements in response to a single data element shuffle and an operation instruction that includes a destination vector register operand, a first and second source vector register operands, an immediate value, and an opcode are described.Type: GrantFiled: June 29, 2012Date of Patent: December 22, 2015Assignee: Intel CorporationInventors: Igor Ermolaev, Elmoustapha Ould-Ahmed-Vall, Bret Toll, Jesus Corbal, Andrey Naraikin
-
Patent number: 9210480Abstract: A video processing system includes a video encoder that encodes a video stream into an independent video layer stream and a first dependent video layer stream based on a motion vector data or grayscale and color data.Type: GrantFiled: December 20, 2007Date of Patent: December 8, 2015Assignee: BROADCOM CORPORATIONInventors: Stephen E. Gordon, Sherman Chen, Michael Dove, David Rosmann, Thomas J. Quigley, Jeyhan Karaoguz
-
Patent number: 9164767Abstract: In a vector processing device, a data dependence detecting unit detects a data dependence relation between a preceding instruction and a succeeding instruction which are inputted from an instruction buffer, and an instruction issuance control unit controls issuance of an instruction based on a detection result thereof. When there is a data dependence relation between the preceding instruction and the succeeding instruction, the instruction issuance control unit generates a new instruction equivalent to processing related to a vector register including the data dependence relation with the succeeding instruction in processing executed by the preceding instruction and issues the new instruction between the preceding instruction and the succeeding instruction, and thereby a data hazard can be avoided between the preceding instruction and the succeeding instruction without making a stall occur.Type: GrantFiled: December 13, 2012Date of Patent: October 20, 2015Assignee: SOCIONEXT INC.Inventor: Takashi Nishikawa
-
Patent number: 9160338Abstract: A method for implementing an adaptive interface between at least one FPGA with at least one FPGA application and at least one I/O module, which are designed as the corresponding sender side or receiver side, for connection to the FPGA, whereby a serial interface is formed between the at least one FPGA and the at least one I/O module, comprising the steps of configuring a maximum number of registers to be transmitted for each FPGA application, configuring a shared, fixed register width for all registers, setting an enable signal on the sender side for the registers to be transmitted out of the maximum number of registers to be transmitted, transmitting the enable signal from the sender side to the receiver side, and transmitting the registers, for which the enable signal is set, from the sender side to the receiver side.Type: GrantFiled: May 12, 2014Date of Patent: October 13, 2015Assignee: dSPACE digital signal processing and control engineering GmbHInventors: Dirk Hasse, Robert Polnau
-
Patent number: 9092257Abstract: A method, circuit arrangement, and program product for executing instructions including denormal values for one or more operands in a vector execution unit. A denormal value operand may be prenormalized by a first processing lane of the vector execution unit upon detecting the denormal value. The prenormalized value and any other operands of the instruction may be communicated to a dot product adder of the vector execution unit. The dot product adder performs at least a portion of the floating point operation with the prenormalized value and any other operands of the instruction.Type: GrantFiled: March 11, 2013Date of Patent: July 28, 2015Assignee: International Business Machines CorporationInventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
-
Patent number: 9092256Abstract: A method, circuit arrangement, and program product for executing instructions including denormal values for one or more operands in a vector execution unit. A denormal value operand may be prenormalized by a first processing lane of the vector execution unit upon detecting the denormal value. The prenormalized value and any other operands of the instruction may be communicated to a dot product adder of the vector execution unit. The dot product adder performs at least a portion of the floating point operation with the prenormalized value and any other operands of the instruction.Type: GrantFiled: December 6, 2012Date of Patent: July 28, 2015Assignee: International Business Machines CorporationInventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
-
Patent number: 9047094Abstract: According to embodiments of the invention, there is disclosed a computer processor architecture; and in particular a computer processor, a method of operating the same, and a computer program product that makes use of an instruction set for the computer. In one embodiment according to the invention, there is provided a computer processor, the processor comprising: a decode unit for decoding instruction packets fetched from a memory holding a sequence of instruction packets; and first and second processing channels, each channel comprising a plurality of functional units, wherein the first processing channel is capable of performing control operations and comprises a control register file having a relatively narrower bit width, and the second processing channel is capable of performing data processing operations at least one input of which is a vector and comprises a data register file having a relatively wider bit width.Type: GrantFiled: March 31, 2004Date of Patent: June 2, 2015Assignee: Icera Inc.Inventor: Simon Knowles
-
Publication number: 20150149744Abstract: A data processing apparatus and method are provided for processing execution threads, where each execution thread specifies at least one instruction. The data processing apparatus has a vector processing unit providing a plurality M of lanes of parallel processing, within each lane the vector processing unit being configured to perform a processing operation on a data element input to that lane for each of one or more input operands. A vector instruction is received that is specified by a group of the execution threads, that vector instruction identifying an associated processing operation and also providing an indication of the data elements of each input operand that are to be subjected to that associated processing operation. Vector merge circuitry then determines, based on that information, a required number of lanes of parallel processing for performing the associated processing operation.Type: ApplicationFiled: October 2, 2014Publication date: May 28, 2015Inventor: Ronny PEDERSEN
-
Publication number: 20150143078Abstract: Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision filter vector processing operations with reduced sample re-fetching and power consumption are disclosed. Related vector processor systems and methods are also disclosed. The VPEs are configured to provide filter vector processing operations. To minimize re-fetching of input vector data samples from memory to reduce power consumption, a tapped-delay line(s) is included in the data flow paths between a vector data file and execution units in the VPE. The tapped-delay line(s) is configured to receive and provide input vector data sample sets to execution units for performing filter vector processing operations. The tapped-delay line(s) is also configured to shift the input vector data sample set for filter delay taps and provide the shifted input vector data sample set to execution units, so the shifted input vector data sample set does not have to be re-fetched during filter vector processing operations.Type: ApplicationFiled: November 15, 2013Publication date: May 21, 2015Applicant: QUALCOMM IncorporatedInventors: Raheel Khan, Fahad Ali Mujahid, Afshin Shiravi
-
Publication number: 20150143077Abstract: Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.Type: ApplicationFiled: November 15, 2013Publication date: May 21, 2015Applicant: QUALCOMM IncorporatedInventor: Raheel Khan
-
Publication number: 20150143080Abstract: Elements from a second operand are added together one-by-one to obtain a first result. The adding includes performing one or more end around carry add operations. The first result is placed in an element of a first operand of the instruction. After each addition of an element, a carry out of a chosen position of the sum, if any, is added to a selected position in an element of the first operand.Type: ApplicationFiled: December 9, 2014Publication date: May 21, 2015Inventors: Jonathan D. Bradbury, Eric M. Schwarz
-
Publication number: 20150143076Abstract: Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.Type: ApplicationFiled: November 15, 2013Publication date: May 21, 2015Applicant: QUALCOMM IncorporatedInventor: Raheel Khan
-
Publication number: 20150143079Abstract: Vector processing engines (VPEs) employing a tapped-delay line(s) for providing precision correlation/covariance vector processing operations with reduced sample re-fetching and/or power consumption are disclosed. The VPEs disclosed herein are configured to provide correlation/covariance vector processing operations, such as code division multiple access (CDMA) correlation/covariance vector processing operations as a non-limiting example. A tapped-delay line(s) is included in the data flow paths between memory and execution units in the VPE. The tapped-delay line (s) is configured to receive and provide an input vector data sample set to execution units for performing correlation/covariance vector processing operations.Type: ApplicationFiled: November 15, 2013Publication date: May 21, 2015Applicant: QUALCOMM IncorporatedInventors: Raheel Khan, Fahad Ali Mujahid, Afshin Shiravi
-
Publication number: 20150121036Abstract: A data processing system 2 includes a single instruction multiple data register file 12 and single instruction multiple processing circuitry 14. The single instruction multiple data processing circuitry 14 supports execution of cryptographic processing instructions for performing parts of a hash algorithm. The operands are stored within the single instruction multiple data register file 12. The cryptographic support instructions do not follow normal lane-based processing and generate output operands in which the different portions of the output operand depend upon multiple different elements within the input operand.Type: ApplicationFiled: December 30, 2014Publication date: April 30, 2015Inventors: Matthew James HORSNELL, Richard Roy GRISENTHWAITE, Stuart David BILES, Daniel KERSHAW
-
Publication number: 20150113246Abstract: Instructions and logic provide conversions between a mask register and a general purpose register or memory. Some embodiments, responsive to an instruction specifying: a destination operand, a mask length corresponding to a number of mask data fields, and a source operand; values are read from data fields in the source operand, corresponding to the specified mask length, and stored to corresponding data fields in the destination operand specified by the instruction, wherein one of the source or the destination operands is a mask register. Values indicative of masked vector elements may be stored to any data fields in the destination operand other than the number of data fields corresponding to the specified mask length. For some embodiments, the other one of the source or the destination operands may be a general purpose register or a memory location.Type: ApplicationFiled: November 25, 2011Publication date: April 23, 2015Applicant: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Robert Valentine, Bret L. Toll, Mark J. Charney
-
Patent number: 9009447Abstract: A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found.Type: GrantFiled: July 18, 2011Date of Patent: April 14, 2015Assignee: Oracle International CorporationInventor: Darryl J. Gove
-
Publication number: 20150100755Abstract: A data processing apparatus and a method of controlling performance of speculative vector operations are provided. The apparatus comprises processing circuitry for performing a sequence of speculative vector operations on vector operands, each vector operand comprising a plurality of vector elements, and speculation control circuitry for maintaining a speculation width indication indicating the number of vector elements of each vector operand to be subjected to the speculative vector operations. The speculation width indication is set to an initial value prior to performance of the sequence of speculative vector operations. The processing circuitry generates progress indications during performance of the sequence of speculative vector operations, and the speculation control circuitry detects, with reference to the progress indications and speculation reduction criteria, presence of a speculation reduction condition.Type: ApplicationFiled: August 18, 2014Publication date: April 9, 2015Inventors: Alastair David REID, Daniel KERSHAW
-
Patent number: 8996845Abstract: A vector compare-and-exchange operation is performed by: decoding by a decoder in a processing device, a single instruction specifying a vector compare-and-exchange operation for a plurality of data elements between a first storage location, a second storage location, and a third storage location; issuing the single instruction for execution by an execution unit in the processing device; and responsive to the execution of the single instruction, comparing data elements from the first storage location to corresponding data elements in the second storage location; and responsive to determining a match exists, replacing the data elements from the first storage location with corresponding data elements from the third storage location.Type: GrantFiled: December 22, 2009Date of Patent: March 31, 2015Assignee: Intel CorporationInventors: Ravi Rajwar, Andrew T. Forsyth
-
Publication number: 20150089190Abstract: In an embodiment, a processor includes a register attribute tracker configured to track one or more attributes corresponding to registers. The register attribute tracker may track the attributes associated with the registers when those registers are used as output registers of instructions that explicitly define the attributes and, if the register attribute tracker has a tracked attribute associated with an input register of an instruction that does not explicitly define the attribute, the register attribute tracker may annotate the instruction with an attribute and/or associate an attribute with the output register of the instruction in the register attribute tracker.Type: ApplicationFiled: September 24, 2013Publication date: March 26, 2015Applicant: Apple Inc.Inventor: Jeffry E. Gonion
-
Publication number: 20150089191Abstract: In an embodiment, a processor includes an issue circuit configured to issue instruction operations for execution. The issue circuit may be configured to monitor the source operands of the instruction operations, and to issue instruction operations for which the source operands (including predicate operands, as appropriate) are resolved. Additionally, the issue circuit may be configured to detect a null predicate that indicates that none of the vector elements will be modified by a corresponding instruction operation. The issue circuit may be configured to issue the corresponding instruction operation with the null predicate even if other source operands are not yet resolved.Type: ApplicationFiled: September 24, 2013Publication date: March 26, 2015Applicant: Apple Inc.Inventor: Jeffry E. Gonion