Masking Patents (Class 712/224)
  • Patent number: 11354124
    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.
    Type: Grant
    Filed: August 3, 2017
    Date of Patent: June 7, 2022
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 11347502
    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.
    Type: Grant
    Filed: March 31, 2017
    Date of Patent: May 31, 2022
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 11275583
    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.
    Type: Grant
    Filed: August 3, 2017
    Date of Patent: March 15, 2022
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 11048481
    Abstract: A method of assembling software includes: enabling a user to code a Computation Function (CF) (S101); determining whether the user is done coding the CFs (S102); enabling a user to code a Part Function (PF) using one or more of the available CFs (S103); determining when the user is done creating all the PFs (S104); and enabling a user to create software using one or more of the available PFs.
    Type: Grant
    Filed: March 19, 2020
    Date of Patent: June 29, 2021
    Assignee: ELEMENT SOFTWARE, INC.
    Inventor: Yi Young
  • Patent number: 10943006
    Abstract: A computer-implemented method, non-transitory, computer-readable medium, and computer-implemented system are provided for data transmission in a trusted execution environment (TEE) system. The method is executed by a first thread in multiple threads on a TEE side. The method includes obtaining first data; obtaining a TEE side thread lock; calling a predetermined function by using the first data as an input parameter to switch to a non-TEE side; obtaining a write offset address and a read offset address respectively by reading a first address and a second address; determining whether a quantity of bytes of the first data is less than or equal to a quantity of writable bytes; if so, writing the first data into third addresses starting from the write offset address; updating the write offset address in the first address; returning to the TEE side; and releasing the TEE side thread lock.
    Type: Grant
    Filed: February 7, 2020
    Date of Patent: March 9, 2021
    Assignee: Advanced New Technologies Co., Ltd.
    Inventors: Qi Liu, Boran Zhao, Ying Yan, Changzheng Wei
  • Patent number: 10699015
    Abstract: A computer-implemented method, non-transitory, computer-readable medium, and computer-implemented system are provided for data transmission in a trusted execution environment (TEE) system. The method can be executed by a thread on a TEE side of the TEE system. The method includes obtaining first data; calling a predetermined function using the first data as an input parameter to switch to a non-TEE side; obtaining a write offset address by reading a first address; obtaining a read offset address by reading a second address; determining whether a quantity of bytes of the first data is less than or equal to a quantity of writable bytes; if so, writing the first data into third addresses starting from the write offset address; updating the write offset address in the first address; and returning to the TEE side.
    Type: Grant
    Filed: February 7, 2020
    Date of Patent: June 30, 2020
    Assignee: Alibaba Group Holding Limited
    Inventors: Qi Liu, Boran Zhao, Ying Yan, Changzheng Wei
  • Patent number: 10691454
    Abstract: Single Instruction, Multiple Data (SIMD) technologies are described. A processor can store a first bitmap and generate a second bitmap with each cell identifying a mask bit. The mask bit is set when 1) a corresponding cell in a first bitmap is not in conflict with other elements in the first bitmap or 2) a corresponding cell is in conflict with one or more other cells in the first bitmap and is a last cell in a sequential order of the first bitmap that conflicts with the one or more other cells, wherein a position of each cell in the second bitmap maps to a same position of the corresponding cell in the first bitmap. The processor can store the second bitmap as a mask for a scatter operation to avoid lane conflicts.
    Type: Grant
    Filed: January 16, 2019
    Date of Patent: June 23, 2020
    Assignee: Intel Corporation
    Inventors: Jun Jin, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10678540
    Abstract: An apparatus and method are provided for efficiently performing arithmetic operations that include at least a multiplication operation. The apparatus comprises processing circuitry to perform data processing operations, and instruction decode circuitry responsive to program instructions to generate control signals to control the processing circuitry to perform the data processing operations. In response to an arithmetic operation with shift instruction specifying performance of an arithmetic operation comprising at least a multiplication operation, and having a field which provides a programmable shift indication, the instruction decode circuitry is configured to control the processing circuitry to perform the arithmetic operation during which an intermediate value is produced, and to select a target portion of the intermediate value based on an output window determined from the programmable shift indication.
    Type: Grant
    Filed: May 8, 2018
    Date of Patent: June 9, 2020
    Assignee: Arm Limited
    Inventors: Jacob Eapen, Mbou Eyole, Neil Burgess
  • Patent number: 10599401
    Abstract: A method of assembling software includes: enabling a user to code a Computation Function (CF) (S101); determining whether the user is done coding the CFs (S102); enabling a user to code a Part Function (PF) using one or more of the available CFs (S103); determining when the user is done creating all the PFs (S104); and enabling a user to create software using one or more of the available PFs.
    Type: Grant
    Filed: January 15, 2018
    Date of Patent: March 24, 2020
    Inventor: Yi Young
  • Patent number: 10521271
    Abstract: In an example, an apparatus comprises a plurality of execution units comprising at least a first type of execution unit and a second type of execution unit and logic, at least partially including hardware logic, to analyze a workload and assign the workload to one of the first type of execution unit or the second type of execution unit. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: April 1, 2017
    Date of Patent: December 31, 2019
    Assignee: INTEL CORPORATION
    Inventors: Abhishek R Appu, Altug Koker, Balaji Vembu, Joydeep Ray, Kamal Sinha, Prasoonkumar Surti, Kiran C. Veernapu, Subramaniam Maiyuran, Sanjeev S. Jahagirdar, Eric J. Asperheim, Guei-Yuan Lueh, David Puffer, Wenyin Fu, Nikos Kaburlasos, Bhushan M. Borole, Josh B. Mastronarde, Linda L. Hurd, Travis T. Schluessler, Tomasz Janczak, Abhishek Venkatesh, Kai Xiao, Slawomir Grajewski
  • Patent number: 10372455
    Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.
    Type: Grant
    Filed: December 12, 2014
    Date of Patent: August 6, 2019
    Assignee: Intel Corporation
    Inventors: Maxim Loktyukhin, Eric W Mahurin, Bret L Toll, Martin G Dixon, Sean P Mirkes, David L Kreitzer, Elmoustapha Ould-Ahmed-Vall, Vinodh Gopal
  • Patent number: 10360039
    Abstract: A mechanism for predicated execution of instructions within a parallel processor executing multiple threads or data lanes is disclosed. Each thread or data lane executing within the parallel processor is associated with a predicate register that stores a set of 1-bit predicates. Each of these predicates can be set using different types of predicate-setting instructions, where each predicate setting instruction specifies one or more source operands, at least one operation to be performed on the source operands, and one or more destination predicates for storing the result of the operation. An instruction can be guarded by a predicate that may influence whether the instruction is executed for a particular thread or data lane or how the instruction is executed for a particular thread or data lane.
    Type: Grant
    Filed: September 27, 2010
    Date of Patent: July 23, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Richard Craig Johnson, John R. Nickolls, Robert Steven Glanville
  • Patent number: 10223119
    Abstract: A processor of an aspect includes a decode unit to decode an instruction that indicates a first source packed data operand including a first plurality of data elements, a source mask including a plurality of mask elements, and a destination storage location. An execution unit, in response to the instruction, stores a result packed data operand. The result packed data operand has at least two unmasked result data elements corresponding to unmasked mask elements of the source mask. Each of the unmasked result data elements has a value of a corresponding data element of the first source packed data operand in a same relative position. All masked result data elements, between each nearest pair of unmasked result data elements, have a same value as an unmasked result data element of the pair closest to a first end of the result packed data operand.
    Type: Grant
    Filed: March 28, 2014
    Date of Patent: March 5, 2019
    Assignee: Intel Corporation
    Inventor: Mikhail Plotnikov
  • Patent number: 10185562
    Abstract: Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can generate a first bitmap comprising a plurality of bits, where the plurality of bits includes a first bit that represents a first memory location. The processor core can determine that the value of the first bit is equal to the value of a second bit in the first bitmap. The processor core can determine the location of the second bit in relation to the first bit in the first bitmap. The processor core can generate a second bitmap including a third bit indicating that the first bit is the last bit in the first bitmap with the same value as the second bit.
    Type: Grant
    Filed: December 24, 2015
    Date of Patent: January 22, 2019
    Assignee: Intel Corporation
    Inventors: Jun Jin, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10042694
    Abstract: A method for validating control blocks in memory includes monitoring for operations configured to obtain storage space in memory. The method examines the storage space that has been obtained to identify control blocks stored in the storage space. These control blocks are then analyzed to determine whether the control blocks are valid. In certain embodiments, this may be accomplished by comparing the content of the control blocks to information in a validation table that indicates possible values and ranges of values for fields in the control blocks. If a control block is valid, the method records a date and time when the control block was validated. If a control block is not valid, the method generates a message indicating that the control block is not valid. A corresponding system and computer program product are also disclosed.
    Type: Grant
    Filed: September 6, 2016
    Date of Patent: August 7, 2018
    Assignee: International Business Machines Corporation
    Inventors: Philip R. Chauvet, Franklin E. McCune, David C. Reed, Esteban Rios
  • Patent number: 9990202
    Abstract: A processor includes a first mode where the processor is not to use packed data operation masking, and a second mode where the processor is to use packed data operation masking. A decode unit to decode an unmasked packed data instruction for a given packed data operation in the first mode, and to decode a masked packed data instruction for a masked version of the given packed data operation in the second mode. The instructions have a same instruction length. The masked instruction has bit(s) to specify a mask. Execution unit(s) are coupled with the decode unit. The execution unit(s), in response to the decode unit decoding the unmasked instruction in the first mode, to perform the given packed data operation. The execution unit(s), in response to the decode unit decoding the masked instruction in the second mode, to perform the masked version of the given packed data operation.
    Type: Grant
    Filed: June 28, 2013
    Date of Patent: June 5, 2018
    Assignee: Intel Corporation
    Inventors: Bret L. Toll, Ronak Singhal, Buford M. Guy, Mishali Naik
  • Patent number: 9940131
    Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: April 10, 2018
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D Guilford, Gilbert M Wolrich, Wajdi K Feghali, Erdinc Ozturk, Martin G Dixon, Sean Mirkes, Bret L Toll, Maxim Loktyukhin, Mark C Davis, Alexandre J Farcy
  • Patent number: 9940130
    Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: April 10, 2018
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D Guilford, Gilbert M Wolrich, Wajdi K Feghali, Erdinc Ozturk, Martin G Dixon, Sean Mirkes, Bret L Toll, Maxim Loktyukhin, Mark C Davis, Alexandre J Farcy
  • Patent number: 9935727
    Abstract: One embodiment provides an apparatus for coupling between a trunk passive optical network (PON) and a leaf PON. The apparatus includes a trunk-side optical transceiver coupled to the trunk PON, a leaf-side optical transceiver coupled to the leaf PON, and an integrated circuit chip that includes an optical network unit (ONU) media access control (MAC) module, an optical line terminal (OLT) MAC module, and an on-chip memory.
    Type: Grant
    Filed: January 11, 2017
    Date of Patent: April 3, 2018
    Assignee: TIBIT COMMUNICATIONS, INC.
    Inventor: Edward W. Boyd
  • Patent number: 9916160
    Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: March 13, 2018
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K Feghali, Erdinc Ozturk, Martin G Dixon, Sean Mirkes, Bret L Toll, Maxim Loktyukhin, Mark C Davis, Alexandre J Farcy
  • Patent number: 9904545
    Abstract: According to one general aspect, an apparatus may include a monolithic shifter configured to receive a plurality of bytes of data, and, for each byte of data, a number of bits to shift the respective byte of data, wherein the number of bits for each byte of data need not be the same as for any other byte of data. The monolithic shifter may be configured to shift each byte of data by the respective number of bits. The apparatus may include a mask generator configured to compute a mask for each byte of data, wherein each mask indicates which bits, if any, are to be prevented from being polluted by a neighboring shifted byte of data. The apparatus may include a masking circuit configured to combine the shifted byte of data with a respective mask to create an unpolluted shifted byte of data.
    Type: Grant
    Filed: September 16, 2015
    Date of Patent: February 27, 2018
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Eric C. Quinnell
  • Patent number: 9564904
    Abstract: A method of dividing a clock signal by an input signal of N bits with M most significant bits is described herein. The method includes dividing the clock signal by the most significant bits of the input signal 2N-M?1 times out of 2N-M divisions of the clock signal, using a divider. The clock signal is divided by a sum of the most significant bits and the least significant bits one time out of 2N-M divisions of the clock signal, using the divider. The clock signal is also divided by 2N-M, 2N-M times, using the divider.
    Type: Grant
    Filed: April 21, 2015
    Date of Patent: February 7, 2017
    Assignee: STMICROELECTRONICS INTERNATIONAL N.V.
    Inventors: Jeet Narayan Tiwari, Nitin Gupta
  • Patent number: 9396056
    Abstract: In some disclosed embodiments instruction execution logic provides conditional memory fault assist suppression. Some embodiments of processors comprise a decode stage to decode one or more instruction specifying: a set of memory operations, one or more register, and one or more memory address. One or more execution units, responsive to the one or more decoded instruction, generate said one or more memory address for the set of memory operations. Instruction execution logic records one or more fault suppress bits to indicate whether one or more portion of the set of memory operations are masked. Fault generation logic is suppressed from considering a memory fault corresponding to a faulting one of the set of memory operations when said faulting one of the set of memory operations corresponds to a portion of the set of memory operations that is indicated as masked by said one or more fault suppress bits.
    Type: Grant
    Filed: March 15, 2014
    Date of Patent: July 19, 2016
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Offer Levy, Michael Mishaeli, Gal Ofir
  • Patent number: 9154801
    Abstract: A method and apparatus for encoding video data. At least a portion of a two dimensional array of transform coefficients representing a portion of video data is re-ordered to a one dimensional array of data by diagonally scanning the portion in at least two scan lines. Each scan line directed in a single common diagonal direction. Syntax elements representing at least a portion of the one dimensional array of data are then coded and transmitted.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: October 6, 2015
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Vivienne Sze, Madhukar Budagavi
  • Patent number: 9135004
    Abstract: A rotate then operate instruction having a T bit is fetched and executed wherein a first operand in a first register is rotated by an amount and a Boolean operation is performed on a selected portion of the rotated first operand and a second operand in of a second register. If the T bit is ‘0’ the selected portion of the result of the Boolean operation is inserted into corresponding bits of a second operand of a second register. If the T bit is ‘1’, in addition to the inserted bits, the bits other than the selected portion of the rotated first operand are saved in the second register.
    Type: Grant
    Filed: September 12, 2014
    Date of Patent: September 15, 2015
    Assignee: International Business Machines Corporation
    Inventors: Dan F. Greiner, Timothy J. Slegel, Joachim von Buttlar
  • Patent number: 9124644
    Abstract: An egress packet modifier includes a script parser and a pipeline of processing stages. Rather than performing egress modifications using a processor that fetches and decodes and executes instructions in a classic processor fashion, and rather than storing a packet in memory and reading it out and modifying it and writing it back, the packet modifier pipeline processes the packet by passing parts of the packet through the pipeline. A processor identifies particular egress modifications to be performed by placing a script code at the beginning of the packet. The script parser then uses the code to identify a specific script of opcodes, where each opcode defines a modification. As a part passes through a stage, the stage can carry out the modification of such an opcode. As realized using current semiconductor fabrication process, the packet modifier can modify 200M packets/second at a sustained rate of up to 100 gigabits/second.
    Type: Grant
    Filed: July 14, 2013
    Date of Patent: September 1, 2015
    Assignee: NETRONOME SYSTEMS, INC.
    Inventors: Chirag P. Patel, Gavin J. Stark
  • Patent number: 9002122
    Abstract: A codec includes an encoder having a quantization level generator that defines a quantization level specific to a block of values (e.g., transform coefficients), a quantizer that quantizes the block of transform coefficients according to the block-specific quantization level, a run-length encoder, and an entropy encoder. The quantization level is defined to result in at least a predetermined number (k) of quantized coefficients having a predetermined value. The amount of data compression by the encoder is proportional to (k). The codec also includes a decoder having entropy and run-length decoding sections whose throughputs are proportional to (k). The decoder takes advantage of this increased throughput by further decoding coefficients in parallel using a plurality of decoding channels. Methods for encoding and decoding data are also disclosed. The invention is well-suited to quantization, entropy, and/or run-length-based codecs, such as JPEG.
    Type: Grant
    Filed: July 19, 2012
    Date of Patent: April 7, 2015
    Assignee: OmniVision Technologies, Inc.
    Inventor: Xuanming Du
  • Patent number: 8990816
    Abstract: Techniques for simulating exclusive use of a processor core amongst multiple logical partitions (LPARs) include providing hardware thread-dependent status information in response to access requests by the LPARs that is reflective of exclusive use of the processor by the LPAR accessing the hardware thread-dependent information. The information returned in response to the access requests is transformed if the requestor is a program executing at a privilege level lower than the hypervisor privilege level, so that each logical partition views the processor as though it has exclusive use of the processor. The techniques may be implemented by a logical circuit block within the processor core that transforms the hardware thread-specific information to a logical representation of the hardware thread-specific information or the transformation may be performed by program instructions of an interrupt handler that traps access to the physical register containing the information.
    Type: Grant
    Filed: April 20, 2012
    Date of Patent: March 24, 2015
    Assignee: International Business Machines Corporation
    Inventors: Giles R. Frazier, Bruce Mealy, Naresh Nayar
  • Patent number: 8959316
    Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a vector instruction that optionally receives a predicate vector (which has N elements) as an input. The processor then executes the vector instruction. In the described embodiments, executing the vector instruction causes the processor to generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element of the result vector, the processor determines element positions for which a fault was masked during a prior operation. The processor then updates elements in the result vector to identify a leftmost element for which a fault was masked.
    Type: Grant
    Filed: October 19, 2010
    Date of Patent: February 17, 2015
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Patent number: 8954941
    Abstract: Method of generating respective instruction compaction schemes for subsets of instructions to be processed by a programmable processor, comprising the steps of a) receiving at least one input code sample representative for software to be executed on the programmable processor, the input code comprising a plurality of instructions defining a first set of instructions (S1), b) initializing a set of removed instructions as empty (S3), c) determining the most compact representation of the first set of instructions (S4) d) comparing the size of said most compact representation with a threshold value (S5), e) carrying out steps e1 to e3 if the size is larger than said threshold value, e1) determining which instruction of the first set of instructions has a highest coding cost (S6), e2) removing said instruction having the highest coding cost from the first set of instructions and (S7), e3) adding said instruction to the set of removed instructions (S8), f) repeating steps b-f, wherein the first set of instructions
    Type: Grant
    Filed: September 3, 2010
    Date of Patent: February 10, 2015
    Assignee: Intel Corporation
    Inventors: Hendrik Tjeerd Joannes Zwartenkot, Alexander Augusteijn, Yuanging Guo, Jürgen Von Oerthel, Jeroen Anton Johan Leijten, Erwan Yann Maurice Le Thenaff
  • Patent number: 8924694
    Abstract: A programmable processor configured to perform one or more packet modifications through execution of one or more commands. A pipelined processor core comprises a first stage configured to selectively shift and mask data in each of a plurality of categories in response to one or more decoded commands, and combine the selectively shifted and masked data in each of the categories. The pipelined processor core further comprises a second stage configured to selectively perform one or more operations on the combined data from the first stage and other data responsive to the one or more decoded commands. In one implementation, the processor is implemented as an application specific integrated circuit (ASIC).
    Type: Grant
    Filed: April 11, 2012
    Date of Patent: December 30, 2014
    Assignee: Extreme Networks, Inc.
    Inventors: David K. Parker, Erik R. Swenson, Christopher J. Young
  • Publication number: 20140365751
    Abstract: A data processing apparatus has at least one processing pipeline having first, second and third pipeline stages. The first pipeline stage detects whether a stream of instructions to be processed includes a predetermined instruction sequence comprising first and second instructions for performing first and second operand generation operations, where the second operand generation operation is dependent on an outcome of the first. In response to detecting this instruction sequence, the first pipeline stage generates a modified stream of instructions in which at least the second instruction is replaced with a third instruction for performing a combined operand generation operation having the same effect as the first and second operand generation operations. As the third instruction can be scheduled independently of the first instruction, processing performance of the pipeline can be improved.
    Type: Application
    Filed: May 9, 2014
    Publication date: December 11, 2014
    Applicant: ARM LIMITED
    Inventors: Ian Michael CAULFIELD, Max BATLEY, Peter Richard GREENHALGH
  • Patent number: 8909905
    Abstract: A method and a device having a plurality of bit operations capability, the device includes: a first and a second registers and an instruction fetch circuit, and an arithmetic logic unit adapted to: calculate, during a first clock cycle, a position value representative of a position, within a first information vector, of a first bit of information that has a first value; and to multiply the position value by a multiplication factor to provide a first result and to alter the value of the first bit to a second value to provide an updated information vector, during the first clock cycle.
    Type: Grant
    Filed: August 18, 2006
    Date of Patent: December 9, 2014
    Assignee: Freescale Semiconductor, Inc.
    Inventors: Eran Glickman, Evgeni Ginzburg, Noam Sheffer
  • Publication number: 20140281396
    Abstract: An instruction processing apparatus of an aspect includes a plurality of operation mask registers. The apparatus also includes a decode unit to receive an operation mask consolidation instruction. The operation mask consolidation instruction is to indicate a source operation mask register, of the plurality of operation mask registers, and a destination storage location. The source operation mask register is to include a source operation mask that is to include a plurality of masked elements that are to be disposed within a plurality of unmasked elements. An execution unit is coupled with the decode unit. The execution unit, in response to the operation mask consolidation instruction, is to store a consolidated operation mask in the destination storage location. The consolidated operation mask is to include the unmasked elements from the source operation mask consolidated together. Other apparatus, methods, systems, and instructions are also disclosed.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Inventor: Ashish Jha
  • Patent number: 8838943
    Abstract: A rotate then operate instruction having a T bit is fetched and executed wherein a first operand in a first register is rotated by an amount and a Boolean operation is performed on a selected portion of the rotated first operand and a second operand in of a second register. If the T bit is ‘0’ the selected portion of the result of the Boolean operation is inserted into corresponding bits of a second operand of a second register. If the T bit is ‘1’, in addition to the inserted bits, the bits other than the selected portion of the rotated first operand are saved in the second register.
    Type: Grant
    Filed: July 21, 2010
    Date of Patent: September 16, 2014
    Assignee: International Business Machines Corporation
    Inventors: Dan F. Greiner, Timothy J. Slegel, Joachim von Buttlar
  • Patent number: 8819399
    Abstract: Some embodiments provide a system that executes a native code module. During operation, the system obtains the native code module. Next, the system loads the native code module into a secure runtime environment. Finally, the system safely executes the native code module in the secure runtime environment by using a set of software fault isolation (SFI) mechanisms that use predicated store instructions and predicated control flow instructions, wherein each predicated instruction from the predicated store instructions and the predicated control flow instructions is executed if a mask condition associated with the predicated instruction is met.
    Type: Grant
    Filed: November 20, 2009
    Date of Patent: August 26, 2014
    Assignee: Google Inc.
    Inventors: Robert Muth, Karl Schmipf, David C. Sehr, Clifford L. Biffle
  • Patent number: 8782380
    Abstract: A processor and a method for privilege escalation in a processor are provided. The method may comprise fetching an instruction from a fetch address, where the instruction requires the processor to be in supervisor mode for execution, and determining whether the fetch address is within a predetermined address range. The instruction is filtered through an instruction mask and then it is determined whether the instruction, after being filtered through the mask, equals the value in an instruction value compare register. The processor privilege is raised to supervisor mode for execution of the instruction in response to the fetch address being within the predetermined address range and the filtered instruction equaling the value in the instruction value compare register, wherein the processor privilege is raised to supervisor mode without use of an interrupt. The processor privilege returns to its previous level after execution of the instruction.
    Type: Grant
    Filed: December 14, 2010
    Date of Patent: July 15, 2014
    Assignee: International Business Machines Corporation
    Inventors: Anthony J. Bybell, Anup Wadia
  • Publication number: 20140189323
    Abstract: An apparatus and method for propagating conditionally evaluated values. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register.
    Type: Application
    Filed: December 23, 2011
    Publication date: July 3, 2014
    Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
  • Patent number: 8762690
    Abstract: The described embodiments provide a processor for generating a result vector with incremented or decremented values from an input vector. During operation, the processor receives an input vector and a control vector. The processor then copies a value contained in a selected element of the input vector. The processor next generates the result vector, which involves writing an incremented or decremented value to the result vector, depending on the value of the control vector and the embodiment. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Grant
    Filed: June 30, 2009
    Date of Patent: June 24, 2014
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
  • Patent number: 8732440
    Abstract: A method for generating a digital signal pattern at M outputs involves retrieving an instruction from memory comprising a first set of bits identifying a first group of N outputs that includes fewer than all of the M outputs, and a second set of N bits each corresponding to a respective output included in the identified first group of N outputs. For each of the M outputs that is included in the identified first group of N outputs, the signal at the output is toggled if the one of the N bits corresponding to that output is in a first state and is kept in the same state if the one of the N bits corresponding to that output is in a second state. For each of the M outputs that is not included in the identified first group of N outputs, the signal at that output is kept in the same state.
    Type: Grant
    Filed: December 3, 2007
    Date of Patent: May 20, 2014
    Assignee: Analog Devices, Inc.
    Inventors: Christopher Jacobs, Andreas D. Olofsson, Paul Kettle
  • Patent number: 8725989
    Abstract: In one embodiment, a processor can perform a function call from a main program to a function that is to operate on at least one vector-type operand, in which only scalar values are passed to the function, and input values to the function including the at least one vector-type operand are to be renamed from virtual registers identified in the function to physical registers of a vector register file, and output values from the function including the at least one vector-type operand are to be renamed from virtual registers identified in the function to physical registers of the vector register file. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 9, 2010
    Date of Patent: May 13, 2014
    Assignee: Intel Corporation
    Inventor: Tomasz Madajczak
  • Patent number: 8719827
    Abstract: A processor for sequentially executing a plurality of programs using a plurality of register value groups stored in a memory that correspond one-to-one with the programs.
    Type: Grant
    Filed: July 11, 2011
    Date of Patent: May 6, 2014
    Assignee: Panasonic Corporation
    Inventors: Kazushi Kurata, Tetsuya Tanaka, Nobuo Higaki, Kunihiko Hayashi, Hiroshi Kadota, Tokuzo Kiyohara, Kozo Kimura, Hideshi Nishida, Kazuya Furukawa, Shigeki Fujii, Toshio Sugimura
  • Publication number: 20130290687
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Application
    Filed: December 23, 2011
    Publication date: October 31, 2013
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 8549251
    Abstract: In some embodiments, an apparatus includes a register having a first portion and a second portion. The first portion of the register has multiple bits and the second portion of the register has multiple bits. Each bit from the multiple bits of the first portion of the register is associated with a bit from the multiple bits of the second portion of the register such that a bit from the multiple bits of the first portion of the register is set for its associated bit from the multiple bits of the second portion of the register to be written.
    Type: Grant
    Filed: December 15, 2010
    Date of Patent: October 1, 2013
    Assignee: Juniper Networks, Inc.
    Inventors: Murali Vemula, Sathish Shenoy
  • Publication number: 20130246760
    Abstract: A data processor comprising a plurality of registers, and instruction execution circuitry having an associated instruction set, wherein the instruction set includes an instruction specifying at least a mask operand, a register operand and an immediate value operand, and the instruction execution circuitry, in response to an instance of the instruction, determines a Boolean value based on the mask operand and sets a respective one of a plurality of registers specified by the register operand of the instance to a value of the immediate value operand if the Boolean value is true. The instruction execution circuitry, in response to the instance of the instruction, may set the respective one of the plurality of registers specified by the register operand of the instance to zero if the Boolean value is false.
    Type: Application
    Filed: March 11, 2013
    Publication date: September 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
  • Patent number: 8392666
    Abstract: An apparatus detects a load-store collision within a microprocessor between a load operation and an older store operation each of which accesses data in the same cache line. Load and store byte masks specify which bytes contain the data specified by the load and store operation within a word of the cache line in which the load and data begins, respectively. Load and store word masks specify which words contain the data specified by the load and store operations within the cache line, respectively. Combinatorial logic uses the load and store byte masks to detect the load-store collision if the data specified by the load and store operations begin in the same cache line word, and uses the load and store word masks to detect the load-store collision if the data specified by the load and store operations do not begin in the same cache line word.
    Type: Grant
    Filed: October 20, 2009
    Date of Patent: March 5, 2013
    Assignee: VIA Technologies, Inc.
    Inventors: Rodney E. Hooker, Colin Eddy
  • Patent number: 8370608
    Abstract: The described embodiments provide a processor for generating a result vector with copied or propagated values from an input vector. During operation, the processor receives at least one input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain copied propagated values from the input vector(s), depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Grant
    Filed: June 30, 2009
    Date of Patent: February 5, 2013
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Patent number: 8364938
    Abstract: In the described embodiments, a processor captures a value from an element at a key element position in a second input vector into a base value. The processor then generates a result vector by, if the predicate vector is received, for each element in the result vector to the right of the key element position for which a corresponding element in the predicate vector is active, otherwise, for each element in the result vector to the right of the key element position, setting the element in the result vector equal to a result from an associative Boolean operation or a multiplication operation for which the inputs are the base value and a value in each relevant element of a first input vector from an element at the key element position to and including a predetermined element in the first input vector.
    Type: Grant
    Filed: August 14, 2009
    Date of Patent: January 29, 2013
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
  • Patent number: 8356164
    Abstract: The described embodiments provide a processor for generating a result vector with shifted values from an input vector. During operation, the processor receives an input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain shifted values or propagated values from the input vector, depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Grant
    Filed: June 30, 2009
    Date of Patent: January 15, 2013
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Patent number: 8356162
    Abstract: An execution unit supports data dependent conditional write instructions that write data to a target only when a particular condition is met. In one implementation, a data dependent conditional write instruction identifies a condition as well as data to be tested against that condition. The data is tested against that condition, and the result of the test is used to selectively enable or disable a write to a target associated with the data dependent conditional write instruction. Then, a write is attempted while the write to the target is enabled or disabled such that the write will update the contents of the target only when the write is selectively enabled as a result of the test. By doing so, dependencies are typically avoided, as is use of an architected condition register that might otherwise introduce branch prediction mispredict penalties, enabling improved performance with z-buffer test and similar types of algorithms.
    Type: Grant
    Filed: March 18, 2008
    Date of Patent: January 15, 2013
    Assignee: International Business Machines Corporation
    Inventors: Adam James Muff, Matthew Ray Tubbs