Of Multiple Instructions Simultaneously Patents (Class 712/206)
  • Patent number: 11960893
    Abstract: A method, programming product, and/or system for prefetching instructions includes an instruction prefetch table that has a plurality of entries, each entry for storing a first portion of an indirect branch instruction address and a target address, wherein the indirect branch instruction has multiple target addresses and the instruction prefetch table is accessed by an index obtained by hashing a second portion of bits of the indirect branch instruction address with an information vector of the indirect branch instruction. A further embodiment includes a first prefetch table for uni-target branch instructions and a second prefetch table for multi-target branch instructions. In operation it is determined whether a branch instruction hits in one of the multiple prefetch tables; a target address for the branch instruction is read from the respective prefetch table in which the branch instruction hit; and the branch instruction is prefetched to an instruction cache.
    Type: Grant
    Filed: December 29, 2021
    Date of Patent: April 16, 2024
    Assignee: International Business Machines Corporation
    Inventors: Naga P. Gorti, Mohit Karve
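Illustrative sketch (not from the patent): a minimal Python model of the dual prefetch-table lookup described in the abstract above. The XOR hash, the 64-entry direct-mapped tables, and the ICache class are assumptions added for illustration.

```python
# Behavioral sketch (assumptions: XOR hash, direct-mapped tables, 64 entries each).
TABLE_SIZE = 64

uni_target_table = {}    # index -> (tag, target) for single-target branches
multi_target_table = {}  # index -> (tag, target) for indirect multi-target branches

def table_index(branch_addr, info_vector):
    """Hash a portion of the branch address with an information vector."""
    return ((branch_addr >> 2) ^ info_vector) % TABLE_SIZE

def tag_bits(branch_addr):
    """A first portion of the branch address, stored in the entry as a tag."""
    return branch_addr >> 8

def lookup_and_prefetch(branch_addr, info_vector, icache):
    """Return the prefetch target if the branch hits in either table."""
    idx = table_index(branch_addr, info_vector)
    for table in (multi_target_table, uni_target_table):
        entry = table.get(idx)
        if entry and entry[0] == tag_bits(branch_addr):
            tag, target = entry
            icache.prefetch(target)      # prefetch the target line into the I-cache
            return target
    return None                          # miss in both tables: no prefetch

class ICache:
    def prefetch(self, addr):
        print(f"prefetching instruction line at {addr:#x}")

# Usage: train the multi-target table for one branch, then look it up.
branch, history = 0x40_1234, 0b1011
multi_target_table[table_index(branch, history)] = (tag_bits(branch), 0x40_9000)
lookup_and_prefetch(branch, history, ICache())
```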
  • Patent number: 11960567
    Abstract: A method for performing a fundamental computational primitive in a device is provided, where the device includes a processor and a matrix multiplication accelerator (MMA). The method includes configuring a streaming engine in the device to stream data for the fundamental computational primitive from memory, configuring the MMA to format the data, and executing the fundamental computational primitive by the device.
    Type: Grant
    Filed: July 4, 2021
    Date of Patent: April 16, 2024
    Assignee: Texas Instruments Incorporated
    Inventors: Arthur John Redfern, Timothy David Anderson, Kai Chirca, Chenchi Luo, Zhenhua Yu
  • Patent number: 11860820
    Abstract: Processing data through a storage system in a data pipeline including receiving, by the storage system, a dataset from a collector on a data producer, wherein the dataset is disaggregated from metadata for the dataset by the collector; storing the dataset on the storage system; receiving, by the storage system from a data indexer, a request for data from the dataset, wherein the request for the data comprises the metadata gathered by the collector on the data producer; servicing, by the storage system, the request for the data by locating the data using the metadata gathered by the collector on the data producer and received in the request for the data; and receiving, from the data indexer, indexed data indexed using the metadata gathered by the collector on the data producer.
    Type: Grant
    Filed: April 3, 2019
    Date of Patent: January 2, 2024
    Assignee: PURE STORAGE, INC.
    Inventors: Ivan Jibaja, Curtis Pullen, Stefan Dorsett, Srinivas Chellappa, Prashant Jaikumar
  • Patent number: 11620153
    Abstract: Instruction interrupt suppression for an overflow condition. An instruction is executed, and a determination is made that an overflow condition occurred. Based on a per-instruction overflow interrupt indicator being set to a defined value, interrupt processing for the overflow condition is performed, and based on the per-instruction overflow interrupt indicator being set to another defined value, the interrupt processing for the overflow condition is bypassed.
    Type: Grant
    Filed: February 4, 2019
    Date of Patent: April 4, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Cedric Lichtenau, Jonathan D. Bradbury, Reid Copeland, Petra Leber
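Illustrative sketch (not from the patent): a minimal model of per-instruction overflow-interrupt suppression, assuming a 32-bit wrap-around add as the overflowing operation and a boolean per-instruction indicator.

```python
# Behavioral sketch: per-instruction overflow-interrupt suppression (assumed 32-bit add).
MASK32 = 0xFFFF_FFFF

class OverflowInterrupt(Exception):
    pass

def execute_add(a, b, overflow_interrupt_enabled):
    """Execute a 32-bit add; take the interrupt only if this instruction's indicator enables it."""
    full = a + b
    result = full & MASK32
    overflowed = full != result           # the overflow condition occurred
    if overflowed and overflow_interrupt_enabled:
        raise OverflowInterrupt()         # perform interrupt processing
    return result                         # otherwise bypass interrupt processing

# Same overflowing add, two different per-instruction indicator values.
print(hex(execute_add(0xFFFF_FFFF, 1, overflow_interrupt_enabled=False)))  # 0x0, no trap
try:
    execute_add(0xFFFF_FFFF, 1, overflow_interrupt_enabled=True)
except OverflowInterrupt:
    print("interrupt taken")
```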
  • Patent number: 11562219
    Abstract: An integrated circuit chip apparatus and a processing method performed by an integrated circuit chip apparatus are disclosed. The disclosed integrated circuit chip apparatus and processing method are used for executing a multiplication operation, a convolution operation, or a training operation of a neural network. The present technical solution has the advantages of a reduced computational cost and low power consumption.
    Type: Grant
    Filed: September 2, 2020
    Date of Patent: January 24, 2023
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Shaoli Liu, Xinkai Song, Bingrui Wang, Yao Zhang, Shuai Hu
  • Patent number: 11562216
    Abstract: Provided are an integrated circuit chip apparatus and a related product, the integrated circuit chip apparatus being used for executing a multiplication operation, a convolution operation or a training operation of a neural network. The present technical solution has the advantages of a small amount of calculation and low power consumption.
    Type: Grant
    Filed: December 19, 2019
    Date of Patent: January 24, 2023
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Shaoli Liu, Xinkai Song, Bingrui Wang, Yao Zhang, Shuai Hu
  • Patent number: 11403254
    Abstract: A methodology for populating multiple instruction words is provided. The methodology includes: creating a dependency graph of instruction nodes, each instruction node including at least one instruction operation; first assigning a first instruction node to a first instruction word; identifying a dependent instruction node that is directly dependent upon a result of the first instruction node; first determining whether the dependent instruction node requires any input from two or more sources that are outside of a predefined physical range of each other, the range being smaller than the full extent of the data path; and second assigning, in response to satisfaction of at least one predetermined criterion including a negative result of the first determining, the dependent instruction node to the first instruction word.
    Type: Grant
    Filed: August 14, 2019
    Date of Patent: August 2, 2022
    Assignee: TACHYUM LTD.
    Inventor: Radoslav Danilak
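Illustrative sketch (not from the patent): a minimal model of the packing decision above, assuming integer "lane" positions stand in for the physical locations of sources and RANGE stands in for the predefined physical range.

```python
# Sketch of the instruction-word packing decision (lane positions and RANGE are assumptions).
RANGE = 4   # sources farther apart than this cannot feed one instruction-word slot

def can_share_word(dependent_sources):
    """True if all sources lie within the predefined physical range of each other."""
    return max(dependent_sources) - min(dependent_sources) <= RANGE

def assign(first_node, dependent_node, words):
    """Assign first_node to a word; co-locate dependent_node when the range test passes."""
    words.append([first_node["name"]])                 # first assigning step
    if can_share_word(dependent_node["source_lanes"]): # first determining step
        words[-1].append(dependent_node["name"])       # second assigning: same word
    else:
        words.append([dependent_node["name"]])         # otherwise start a new word
    return words

words = assign({"name": "mul r3"},
               {"name": "add r5", "source_lanes": [3, 5]},  # lanes 3 and 5: within range
               [])
print(words)   # [['mul r3', 'add r5']]
```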
  • Patent number: 11360808
    Abstract: A mechanism is described for facilitating intelligent thread scheduling at autonomous machines. A method of embodiments, as described herein, includes detecting dependency information relating to a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor. The method may further include generating a tree of thread groups based on the dependency information, where each thread group includes multiple threads, and scheduling one or more of the thread groups associated with a similar dependency to avoid dependency conflicts.
    Type: Grant
    Filed: April 9, 2017
    Date of Patent: June 14, 2022
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Rajkishore Barik, Eriko Nurvitadhi, Nicolas Galoppo Von Borries, Tsung-Han Lin, Sanjeev Jahagirdar, Vasanth Ranganathan
  • Patent number: 11307858
    Abstract: A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache preload operation is performed on a cache in the memory using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.
    Type: Grant
    Filed: March 24, 2020
    Date of Patent: April 19, 2022
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Joseph Raymond Michael Zbiciak, Timothy David Anderson, Jonathan (Son) Hung Tran, Kai Chirca, Daniel Wu, Abhijeet Ashok Chachad, David M. Thompson
  • Patent number: 11237952
    Abstract: The present disclosure provides a mutation test manager configured to initialize multiple computing threads configuring a computing host to perform parallel computation; mutate class files within context of each computing thread; recompile mutated class files independently in each respective computing thread to generate heterogeneous mutants; and execute pending unit tests against heterogeneous mutants independently in each respective computing thread. Consequently, the mutation testing process is decoupled from computational bottlenecks which would result from linear, sequential generation, compilation, and testing of each mutation, especially in the context of JVM® programming languages configured to generate class-rich object code.
    Type: Grant
    Filed: April 7, 2021
    Date of Patent: February 1, 2022
    Assignee: State Farm Mutual Automobile Insurance Company
    Inventors: Andrew L Pearson, Nate Shepherd
  • Patent number: 11182164
    Abstract: Support for instruction fusion is provided. An indication whether an instruction is a paired instruction is received from an instruction decoder. Based on the indication, one dispatch slot or a paired dispatch slot is allocated in the instruction dispatcher queue. A mapper converts logical addresses of sources and targets of the instruction to physical addresses. Either one issue slot or a paired issue slot is allocated in an issue queue based on the indication from the instruction decoder. The instruction execution environment is loaded into the issue queue and issued to an execution unit.
    Type: Grant
    Filed: July 23, 2020
    Date of Patent: November 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Brian D. Barrick, John B. Griswell, Jr., Dung Q. Nguyen, Brian W. Thompto
  • Patent number: 11171881
    Abstract: A device configured to receive a data set and instructions for processing the data set from a network device. The device is further configured to parse the data set into a plurality of data segments to be processed, and generate a plurality of instruction segments from the received instructions. The device is further configured to assign each instruction segment to a resource unit, and to generate control information with instructions for combining processed data segments from the resource units. The device is further configured to receive processed data segments from the resource units, to generate the processed data set, and to output the processed data set to the network device.
    Type: Grant
    Filed: January 28, 2021
    Date of Patent: November 9, 2021
    Assignee: Bank of America Corporation
    Inventors: Manu J. Kurian, Sasidhar Purushothaman, Rajesh Narayanan
  • Patent number: 11150906
    Abstract: An apparatus and method for increasing performance in a processor or other instruction execution device while minimizing energy consumption. A processor includes a first execution pipeline and a second execution pipeline. The first execution pipeline includes a first decode unit and a first execution control unit coupled to the first decode unit. The first execution control unit is configured to control execution of all instructions executable by the processor. The second execution pipeline includes a second decode unit, and a second execution control unit coupled to the second decode unit. The second execution control unit is configured to control execution of a subset of the instructions executable via the first execution control unit.
    Type: Grant
    Filed: October 7, 2019
    Date of Patent: October 19, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Christian Wiencke, Shrey Bhatia
  • Patent number: 11144324
    Abstract: Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: October 12, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Matthew T. Sobel, Joshua James Lindner, Neil N. Marketkar, Kai Troester, Emil Talpes, Ashok Tirupathy Venkatachar
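Illustrative sketch (not from the patent): a minimal model of retire-queue entry compression. The compression conditions used here, adjacency in program order and a two-op-per-entry limit, are assumptions standing in for the patent's unspecified conditions.

```python
# Sketch of compressing multiple instruction operations into one retire-queue entry.
OPS_PER_ENTRY = 2

class RetireQueue:
    def __init__(self):
        self.entries = []          # each entry holds one or more ops

    def allocate(self, op):
        if self.entries:
            last = self.entries[-1]
            can_compress = (len(last) < OPS_PER_ENTRY and
                            op["seq"] == last[-1]["seq"] + 1)  # consecutive in program order
            if can_compress:
                last.append(op)    # compress: share the existing retire-queue entry
                return
        self.entries.append([op])  # otherwise consume a fresh entry

rq = RetireQueue()
for seq, name in enumerate(["add", "load", "mul", "store"]):
    rq.allocate({"seq": seq, "name": name})
print(len(rq.entries), "entries for 4 ops")   # 2 entries for 4 ops
```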
  • Patent number: 11144353
    Abstract: Techniques for use in a microprocessor core for soft watermarking in thread shared resources implemented through thread mediation. A thread is removed from a thread mediation decision involving multiple threads competing or requesting to use a shared resource at a current clock cycle based on a number of entries in the shared resource that the thread is estimated to have allocated to it at the current clock cycle. By removing the thread from the thread mediation decision, the thread is stalled from allocating additional entries in the shared resource.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: October 12, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Kai Troester
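Illustrative sketch (not from the patent): a minimal model of soft watermarking through thread mediation. The watermark value and the deterministic pick among the remaining requesters are assumptions.

```python
# Sketch of watermark-based thread mediation for a shared resource.
WATERMARK = 8   # soft cap on entries a thread may hold in the shared resource

def mediate(requesting_threads, estimated_entries):
    """Pick a thread to allocate this cycle, stalling threads over the watermark."""
    eligible = [t for t in requesting_threads
                if estimated_entries.get(t, 0) < WATERMARK]   # remove over-limit threads
    return min(eligible) if eligible else None                # simple deterministic pick

estimates = {"T0": 12, "T1": 3}          # T0 is estimated to hold too many entries
print(mediate(["T0", "T1"], estimates))  # T1 wins; T0 is stalled from allocating
```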
  • Patent number: 11080194
    Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.
    Type: Grant
    Filed: December 27, 2018
    Date of Patent: August 3, 2021
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
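Illustrative sketch (not from the patent): a minimal model of pointer-load detection and prefetch. The dictionary-based tracking structures and plain-integer addresses are assumptions; a real implementation would bound and tag these structures.

```python
# Sketch of identifying pointer loads and prefetching the data they point to.
recent_loads = {}        # loaded value -> PC of the load that produced it
pointer_load_pcs = set() # PCs identified as pointer loads

def prefetch(memory, address):
    print(f"prefetching data at {address:#x} -> {memory.get(address)}")

def observe_load(pc, address, value, memory):
    # Detection: if this load's address equals data produced by an earlier load,
    # that earlier load was a pointer load.
    if address in recent_loads:
        pointer_load_pcs.add(recent_loads[address])
    recent_loads[value] = pc

    # Prefetch side: if this PC is a known pointer load, its data is itself an
    # address, so prefetch the memory location it points to.
    if pc in pointer_load_pcs:
        prefetch(memory, value)
    return value

mem = {0x1000: 0x2000, 0x2000: 42}
observe_load(pc=0x400, address=0x1000, value=mem[0x1000], memory=mem)  # loads a pointer
observe_load(pc=0x404, address=0x2000, value=mem[0x2000], memory=mem)  # marks 0x400 as a pointer load
observe_load(pc=0x400, address=0x1000, value=mem[0x1000], memory=mem)  # now triggers a prefetch
```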
  • Patent number: 10901744
    Abstract: Aspects of the invention include buffered instruction dispatching to an issue queue. A non-limiting example includes dispatching from a dispatch unit of a processor a first group of instructions selected from a first plurality of instructions to a first issue queue partition of the processor in a first cycle. A second group of instructions selected from the first plurality of instructions is passed to an issue queue buffer of the processor in the first cycle. The second group of instructions is passed from the issue queue buffer to the first issue queue partition in a second cycle. A third group of instructions selected from a second plurality of instructions is dispatched to a second issue queue partition in the second cycle.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: January 26, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Mohit S. Karve, Joel A. Silberman, Balaram Sinharoy
  • Patent number: 10846097
    Abstract: The present disclosure includes a mispredict recovery apparatus, which may comprise an instruction execution unit, a branch predictor, and a misprediction recovery unit (MRU). The MRU may provide discrete cycle predictions after a misprediction redirect from the instruction execution unit. The MRU may include a branch confidence filter to generate prediction confidence information for predicted branches. The MRU may include a tag content-addressable memory (CAM). The tag CAM may store frequently mispredicting low-confidence branches, probe the misprediction redirect, and obtain the prediction confidence information from the branch confidence filter. The MRU may include a mispredict recovery buffer (MRB) to store an alternate path for frequently mispredicting low-confidence branches present in the tag CAM without storing the instructions themselves. Also disclosed is a method for recovering from mispredicts associated with the instruction fetch pipeline.
    Type: Grant
    Filed: February 20, 2019
    Date of Patent: November 24, 2020
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Reshma C. Jumani, Fuzhou Zou, Monika Tkaczyk, Eric C. Quinnell
  • Patent number: 10698691
    Abstract: Disclosed are a method and a processing device directed to determining global branch history for branch prediction. The method includes shifting first bits of a branch signature into a current global branch history and performing a bitwise exclusive-or (XOR) function on second bits of the branch signature and shifted bits of the current global branch history. In this way, the current global branch history is updated. The processing device implements the method using a shift logic configured to store and shift bits representing a current global branch history, a register configured to store the current global branch history, decision circuitry configured to determine whether or not a branch is taken, and XOR gates.
    Type: Grant
    Filed: August 30, 2016
    Date of Patent: June 30, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Steven R. Havlir
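Illustrative sketch (not from the patent): the shift-then-XOR history update described above. The split of the branch signature into shifted-in bits and XORed bits follows the abstract; the bit widths are assumptions.

```python
# Sketch of the global branch history update (widths are assumptions).
HISTORY_BITS = 16
SHIFT_IN_BITS = 2    # first bits of the signature are shifted into the history

def update_ghist(ghist, branch_signature):
    """Shift signature bits into the history, then XOR the remaining bits into the shifted value."""
    first_bits = branch_signature & ((1 << SHIFT_IN_BITS) - 1)
    second_bits = branch_signature >> SHIFT_IN_BITS
    shifted = ((ghist << SHIFT_IN_BITS) | first_bits) & ((1 << HISTORY_BITS) - 1)
    return shifted ^ second_bits    # bitwise XOR folds the remaining signature bits in

ghist = 0
for signature in (0b1011, 0b0110, 0b1111):
    ghist = update_ghist(ghist, signature)
print(f"{ghist:0{HISTORY_BITS}b}")
```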
  • Patent number: 10630682
    Abstract: A network protocol provides mutual authentication of network-connected devices that are parties to a communication channel in environments where the amount of memory and processing power available to the network-connected devices is constrained. When a new device is added to a network, the device contacts a registration service and provides authentication information that proves the authenticity of the device. After verifying the authenticity of the device, the registration service generates a token that can be used to by the device to authenticate with other network entities, and provides the token to the device. The registration service publishes the token using a directory service. When the device connects to another network entity, the device provides the token to the other network entity, and the other network entity authenticates the device by verifying the token using the directory service.
    Type: Grant
    Filed: November 23, 2016
    Date of Patent: April 21, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Ramkishore Bhattacharyya, Amit Mhatre, Ashutosh Thakur, Atulya S. Beheray, Rameez Loladia
  • Patent number: 10606602
    Abstract: An electronic apparatus is provided for obtaining compiling data used in an external processor including a function unit including a plurality of input ports. The electronic apparatus includes a storage configured to store a plurality of instructions, and a processor configured to schedule each of the plurality of instructions in a plurality of cycles, assign a plurality of input data corresponding to the plurality of instructions to the plurality of input ports in a corresponding cycle, and if an unassigned input port among the plurality of input ports is present in a first cycle, assign a part of input data corresponding to an instruction scheduled in a second cycle after the first cycle to the unassigned input port in the first cycle, and obtain the compiling data by assigning remaining data of the input data corresponding the instruction to one of the plurality of input ports in the second cycle.
    Type: Grant
    Filed: July 20, 2017
    Date of Patent: March 31, 2020
    Assignee: Samsung Electronics Co., Ltd
    Inventors: Yeon-bok Lee, Myung-sun Kim, Shin-gyu Kim
  • Patent number: 10514927
    Abstract: A processor includes logic to execute an instruction stream out-of-order. The instruction stream is divided into a plurality of strands, and the instructions within the strands are ordered by program order (PO). The processor further includes logic to identify an oldest undispatched instruction in the instruction stream and record its associated PO as an executed instruction pointer, identify a most recently committed store instruction in the instruction stream and record its associated PO as a store commitment pointer, determine a search pointer with PO less than the executed instruction pointer, identify a first set of store instructions in a store buffer with PO less than the search pointer and eligible for commitment, evaluate whether the first set of store instructions is larger than a number of read ports of the store buffer, and adjust the search pointer.
    Type: Grant
    Filed: March 27, 2014
    Date of Patent: December 24, 2019
    Assignee: Intel Corporation
    Inventors: Anton Lechanka, Andrey Efimov, Sergey Y. Shishlov, Andrey Kluchnikov, Kamil Garifullin, Igor Burovenko, Boris A. Babayan
  • Patent number: 10514925
    Abstract: Systems, apparatuses, and methods for managing dependencies between instruction operations when speculatively issuing load instruction operations. A processor may maintain dependency vectors for sources of instruction operations dispatched to the scheduler. The dependency vector may include a column for each cycle of the load recovery window and a row for each load execution pipeline. When a load speculatively issues, any instruction operation which is dependent on the load may have a bit set in the earliest bit position of its dependency vector to indicate the dependency. The bit may shift in the dependency vector toward the cancel bit position during each clock cycle as the load executes. If the load does not produce its data at the expected latency, an instruction operation may be canceled if there is a bit in the cancel bit position of the dependency vector row corresponding to the execution pipeline of the load.
    Type: Grant
    Filed: January 28, 2016
    Date of Patent: December 24, 2019
    Assignee: Apple Inc.
    Inventor: Sean M. Reynolds
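Illustrative sketch (not from the patent): a minimal model of the dependency vector above. The window length, the number of load pipelines, and the convention that column 0 is the cancel position are assumptions.

```python
# Sketch of a per-source dependency vector for speculatively issued loads.
WINDOW = 3          # cycles in the load recovery window
PIPELINES = 2       # load execution pipelines

def new_vector():
    # One row per load pipeline, one column per cycle; column 0 is the cancel position.
    return [[0] * WINDOW for _ in range(PIPELINES)]

def mark_dependency(vec, load_pipe):
    vec[load_pipe][WINDOW - 1] = 1      # set the earliest bit position for that pipeline

def advance_cycle(vec):
    for row in vec:                     # each cycle the bit shifts toward the cancel slot
        row.pop(0)
        row.append(0)

def must_cancel(vec, missed_pipe):
    return vec[missed_pipe][0] == 1     # bit reached the cancel position for that pipe

dep = new_vector()
mark_dependency(dep, load_pipe=1)       # op depends on a load issued on pipeline 1
for _ in range(WINDOW - 1):
    advance_cycle(dep)
print(must_cancel(dep, missed_pipe=1))  # True: the load missed, so the dependent op is canceled
```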
  • Patent number: 10445017
    Abstract: A memory system includes a memory device including a plurality of command registers; and a memory controller configured to determine whether an empty command register exists among the plurality of command registers, and transmit a new command to the memory device, when an empty command register exists, wherein, when the new command is transmitted from the memory controller, the memory device stores the transmitted new command in the empty command register.
    Type: Grant
    Filed: December 22, 2016
    Date of Patent: October 15, 2019
    Assignee: SK hynix Inc.
    Inventor: Beom Ju Shin
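Illustrative sketch (not from the patent): the controller/device handshake described above, assuming four command registers and a simple object model for the device.

```python
# Sketch of transmitting a command only when an empty command register exists.
NUM_COMMAND_REGISTERS = 4

class MemoryDevice:
    def __init__(self):
        self.command_registers = [None] * NUM_COMMAND_REGISTERS

    def empty_register(self):
        """Index of an empty command register, or None if all are occupied."""
        for i, slot in enumerate(self.command_registers):
            if slot is None:
                return i
        return None

    def store_command(self, command):
        self.command_registers[self.empty_register()] = command

class MemoryController:
    def __init__(self, device):
        self.device = device

    def transmit(self, command):
        """Send a new command only when the device has an empty register."""
        if self.device.empty_register() is None:
            return False              # hold the command until a register frees up
        self.device.store_command(command)
        return True

controller = MemoryController(MemoryDevice())
print(controller.transmit({"op": "READ", "addr": 0x100}))   # True: stored in register 0
```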
  • Patent number: 10430342
    Abstract: An apparatus includes a buffer configured to store a plurality of instructions previously fetched from a memory, wherein each instruction of the plurality of instructions may be included in a respective thread of a plurality of threads. The apparatus also includes control circuitry configured to select a given thread of the plurality of threads dependent upon a number of instructions in the buffer that are included in the given thread. The control circuitry is also configured to fetch a respective instruction corresponding to the given thread from the memory, and to store the respective instruction in the buffer.
    Type: Grant
    Filed: November 18, 2015
    Date of Patent: October 1, 2019
    Assignee: Oracle International Corporation
    Inventors: Yuan Chou, Gideon Levinsky, Manish Shah, Robert Golla, Matthew Smittle
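Illustrative sketch (not from the patent): one reading of "select a given thread dependent upon a number of instructions in the buffer" is to fetch for the thread with the fewest buffered instructions; that policy, the tie-break, and the buffer model are assumptions.

```python
# Sketch of fetch-thread selection based on per-thread buffer occupancy.
from collections import Counter

def select_thread_to_fetch(buffer_entries, active_threads):
    """Pick the active thread with the fewest instructions already in the buffer."""
    counts = Counter(entry["thread"] for entry in buffer_entries)
    return min(active_threads, key=lambda t: counts.get(t, 0))

buffer = [{"thread": 0, "insn": "add"}, {"thread": 0, "insn": "ld"},
          {"thread": 1, "insn": "mul"}]
thread = select_thread_to_fetch(buffer, active_threads=[0, 1, 2])
print(f"fetch next instruction for thread {thread}")   # thread 2 (nothing buffered yet)
```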
  • Patent number: 10140138
    Abstract: Methods for supporting wide and efficient front-end operation with guest architecture emulation are disclosed. As a part of a method for supporting wide and efficient front-end operation, upon receiving a request to fetch a first far taken branch instruction, a cache line that includes the first far taken branch instruction, a next cache line and a cache line located at the target of the first far taken branch instruction are read. Based on information that is accessed from a data table, the cache line and either the next cache line or the cache line located at the target are fetched in a single cycle.
    Type: Grant
    Filed: March 17, 2014
    Date of Patent: November 27, 2018
    Assignee: Intel Corporation
    Inventors: Mohammad Abdallah, Ankur Groen, Erika Gunadi, Mandeep Singh, Ravishankar Rao
  • Patent number: 10127558
    Abstract: Systems and methods for automating an invoice approval process are described herein. Rules are created which are evaluated against a set of attributes. A rules engine is automatically invoked upon receipt of a document in an electronic invoice presentment and payment system. The rules engine determines which rules are applicable to documents received and processed in the system, and applies those applicable rules in a pre-defined sequence.
    Type: Grant
    Filed: March 12, 2010
    Date of Patent: November 13, 2018
    Assignee: Altisource S.à r.l.
    Inventors: Russell G. Bulman, Suresh Kumar, Sanket Karjagi, Ritwik Bose, Rajesh Kumar, Biswajit Nayak, Vikram Kamath, Bhavana Sumathi
  • Patent number: 10095518
    Abstract: Instruction queue circuitry maintains an instruction queue to store fetched instructions. Instruction decode circuitry decodes instructions dispatched from the queue. The instruction decode circuitry allocates processor resource(s) for use in execution of the decoded instruction. Detection circuitry detects, for an instruction to be dispatched from a given instruction queue, a prediction indicating whether sufficient processor resources are predicted to be available for allocation to that instruction by the instruction decode circuitry. Dispatch circuitry dispatches an instruction from the queue to the instruction decode circuitry and allows deletion of the dispatched instruction from that instruction queue when the prediction indicates that sufficient processor resources are predicted to be available for allocation to that instruction by the instruction decode circuitry.
    Type: Grant
    Filed: November 16, 2015
    Date of Patent: October 9, 2018
    Assignee: ARM Limited
    Inventors: Andrew James Antony Lees, Ian Michael Caulfield, Peter Richard Greenhalgh
  • Patent number: 10025527
    Abstract: Hardware structures for check pointing a main shift register one or more times which include a circular buffer used to store the data elements most recently shifted onto the main shift register which has an extra data position for each check point and an extra data position for each restorable point in time; an update history shift register which has a data position for each check point which is used to store information indicating whether the circular buffer was updated in a particular clock cycle; a pointer that identifies a subset of the data positions of the circular buffer as active data positions; and check point generation logic that derives each check point by selecting a subset of the active data positions based on the information stored in the update history shift register.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: July 17, 2018
    Assignee: MIPS Tech, LLC
    Inventors: Philip Day, Julian Bailey
  • Patent number: 9753751
    Abstract: Processing data includes: receiving units of work that each include one or more work elements, and processing a first unit of work using a first compiled dataflow graph (160) loaded into a data processing system (100) in response to receiving the first unit of work. The processing includes: analysis to determine a characteristic of the first unit of work; identifying one or more compiled dataflow graphs from graphs stored in a data storage system (107) that include at least some that were compiled for processing a unit of work having the determined characteristic; loading one of the identified compiled dataflow graphs into the data processing system (100) as the first compiled dataflow graph (160); and generating one or more output work elements from at least one work element in the first unit of work.
    Type: Grant
    Filed: October 22, 2014
    Date of Patent: September 5, 2017
    Assignee: Ab Initio Technology LLC
    Inventors: Matthew Darcy Atterbury, H. Mark Bromley, Wayne Mesard, Arkadi Popov, Stephen Schmidt, Craig W. Stanfill, Joseph Skeffington Wholey
  • Patent number: 9733912
    Abstract: Disclosed here are methods, systems, paradigms and structures for optimizing intermediate representation (IR) of a script code for fast path execution. A fast path is typically a path that handles the most commonly occurring tasks more efficiently than the less commonly occurring ones, which are handled by slow paths. The less commonly occurring tasks may include uncommon cases, error handling, and other anomalies. The IR includes checkpoints which evaluate to two possible values resulting in either a fast path or slow path execution. The IR is optimized for fast path execution by regenerating a checkpoint as a labeled checkpoint. The code in the portion of the IR following the checkpoint is optimized assuming the checkpoint evaluates to a value resulting in fast path. The code for handling situations where the checkpoint evaluates to a value resulting in slow path is transferred to a portion of the IR identified by the label.
    Type: Grant
    Filed: January 27, 2016
    Date of Patent: August 15, 2017
    Assignee: Facebook, Inc.
    Inventors: Ali-Reza Adl-Tabatabai, Guilherme de Lima Ottoni, Michael Paleczny
  • Patent number: 9690620
    Abstract: Methods and architecture for dynamic polymorphic heterogeneous multi-core processor operation are provided. The method for dynamic heterogeneous polymorphic processing includes the steps of receiving a processing task comprising a plurality of serial threads. The method is performed in a processor including a plurality of processing cores, each of the plurality of processing cores being assigned to one of a plurality of core clusters and each of the plurality of core clusters capable of dynamically forming a coalition comprising two or more of its processing cores. The method further includes determining whether each of the plurality of serial threads requires more than one processing core, and sending a go-into-coalition-mode-now instruction to ones of the plurality of core clusters for handling ones of the plurality of serial threads that require more than one processing core.
    Type: Grant
    Filed: December 3, 2012
    Date of Patent: June 27, 2017
    Assignee: National University of Singapore
    Inventors: Tulika Mitra, Mihai Pricopi
  • Patent number: 9672037
    Abstract: A processor and method for fusing together an arithmetic instruction and a branch instruction. The processor includes an instruction fetch unit configured to fetch instructions. The processor may also include an instruction decode unit that may be configured to decode the fetched instructions into micro-operations for execution by an execution unit. The decode unit may be configured to detect an occurrence of an arithmetic instruction followed by a branch instruction in program order, wherein the branch instruction, upon execution, changes a program flow of control dependent upon a result of execution of the arithmetic instruction. In addition, the processor may further be configured to fuse together the arithmetic instruction and the branch instruction such that a single micro-operation is formed. The single micro-operation includes execution information based upon both the arithmetic instruction and the branch instruction.
    Type: Grant
    Filed: January 23, 2013
    Date of Patent: June 6, 2017
    Assignee: Apple Inc.
    Inventors: Conrado Blasco-Allue, Sandeep Gupta
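Illustrative sketch (not from the patent): a minimal model of the decode-time fusion check, pairing a flag-setting arithmetic instruction with the conditional branch that consumes its result. The opcode sets and the dict-based instruction encoding are assumptions.

```python
# Sketch of fusing an arithmetic instruction with a following branch into one micro-op.
ARITHMETIC_OPS = {"add", "sub", "cmp"}
BRANCH_OPS = {"beq", "bne", "blt"}

def decode(instructions):
    """Emit micro-ops, fusing an arithmetic op with the branch that follows it in program order."""
    uops, i = [], 0
    while i < len(instructions):
        cur = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if nxt and cur["op"] in ARITHMETIC_OPS and nxt["op"] in BRANCH_OPS:
            # Single micro-op carrying execution information from both instructions.
            uops.append({"op": f"{cur['op']}+{nxt['op']}",
                         "srcs": cur["srcs"], "target": nxt["target"]})
            i += 2
        else:
            uops.append(dict(cur))
            i += 1
    return uops

program = [{"op": "cmp", "srcs": ["r1", "r2"]},
           {"op": "beq", "target": "loop_exit"}]
print(decode(program))   # one fused micro-op instead of two
```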
  • Patent number: 9558089
    Abstract: The disclosed embodiments provide a system that facilitates testing of an insecure computing environment. During operation, the system obtains a real data set comprising a set of data strings. Next, the system determines a set of frequency distributions associated with the set of data strings. The system then generates a test data set from the real data set, wherein the test data set comprises a set of random data strings that conforms to the set of frequency distributions. Finally, the system tests the insecure computing environment using the test data set.
    Type: Grant
    Filed: November 12, 2014
    Date of Patent: January 31, 2017
    Assignee: INTUIT INC.
    Inventor: Colin R. Dillard
  • Patent number: 9454467
    Abstract: A method of mining test coverage data includes: at a device having one or more processors and memory: sequentially processing each of a plurality of coverage data files that is generated by executing the program using a respective test input of a plurality of test inputs, where the processing of each current coverage data file extracts respective execution counter data from the current coverage data file; after processing each current coverage data file, determining whether the respective execution counter data extracted from the current coverage data file includes a predetermined change relative to the respective execution counter data extracted from previously processed coverage data files; and in response to detecting the predetermined change for the current coverage data file, including the respective test input used to generate the current coverage data file in a test input collection for testing the program.
    Type: Grant
    Filed: August 12, 2014
    Date of Patent: September 27, 2016
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventor: Yunjia Wu
  • Patent number: 9424045
    Abstract: An apparatus and method include execution circuitry including a wide operand execution unit configured to allow up to N bits of operand data to be processed during execution of a single instruction. Decoder circuitry decodes and generates, for each instruction, at least one control data block identifying an operation to be performed by the execution circuitry and at least two re-combineable control data blocks for the instruction. Issue queue control circuitry then allocates a slot in the issue queue for each of the at least two data blocks and up to M bits of associated operand data, and marks those allocated slots to identify that they contain re-combineable control data blocks. The issue queue control circuitry issues the combined block to said wide operand execution unit along with the operand data contained in each of the allocated slots for said at least two control data blocks.
    Type: Grant
    Filed: January 29, 2013
    Date of Patent: August 23, 2016
    Assignee: ARM Limited
    Inventors: Cedric Denis Robert Airaud, Luca Scalabrino, Frederic Jean Denis Arsanto, Guillaume Schon, Frederic Claude Marie Piry, Albin Pierick Tonnerre
  • Patent number: 9354874
    Abstract: Producer-consumer instructions, comprising a first instruction and a second instruction in program order, are fetched requiring in-order execution; the second instruction is then modified by the processor so that the first instruction and second instruction can be completed out-of-order, the modification comprising either extending an immediate field of the second instruction using immediate field information of the first instruction or providing a source location of the first instruction as an additional source location to the source locations of the second instruction.
    Type: Grant
    Filed: October 3, 2011
    Date of Patent: May 31, 2016
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 9354892
    Abstract: Methods, media, and computing systems are provided. The method includes, the media are configured for, and the computing system includes a processor with control logic for allocating memory for storing a plurality of local register states for work items to be executed in single instruction multiple data hardware and for repacking wavefronts that include work items associated with a program instruction responsive to a conditional statement. The repacking is configured to create repacked wavefronts that include at least one of a wavefront containing work items that all pass the conditional statement and a wavefront containing work items that all fail the conditional statement.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: May 31, 2016
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Timothy G. Rogers, Bradford M. Beckmann, James M. O'Connor
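Illustrative sketch (not from the patent): a minimal model of repacking wavefronts at a conditional so that each new wavefront is either all-pass or all-fail. The SIMD width and the integer work-item model are assumptions.

```python
# Sketch of wavefront repacking around a conditional statement.
SIMD_WIDTH = 4

def repack(wavefronts, condition):
    """Regroup work items so each new wavefront contains only passing or only failing items."""
    passed = [w for wf in wavefronts for w in wf if condition(w)]
    failed = [w for wf in wavefronts for w in wf if not condition(w)]
    chunk = lambda items: [items[i:i + SIMD_WIDTH] for i in range(0, len(items), SIMD_WIDTH)]
    return chunk(passed), chunk(failed)

wavefronts = [[0, 1, 2, 3], [4, 5, 6, 7]]          # work-item ids
pass_wfs, fail_wfs = repack(wavefronts, condition=lambda w: w % 2 == 0)
print(pass_wfs, fail_wfs)   # [[0, 2, 4, 6]] [[1, 3, 5, 7]]
```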
  • Patent number: 9286072
    Abstract: Two computer machine instructions are fetched for execution, but replaced by a single optimized instruction to be executed, wherein a temporary register used by the two instructions is identified as a last-use register, where a last-use register has a value that is not to be accessed by later instructions, whereby the two computer machine instructions are replaced by a single optimized internal instruction for execution, the single optimized instruction not including the last-use register.
    Type: Grant
    Filed: October 3, 2011
    Date of Patent: March 15, 2016
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 9250916
    Abstract: Embodiments include a method for chaining data in an exposed-pipeline processing element. The method includes separating a multiple instruction word into a first sub-instruction and a second sub-instruction, receiving the first sub-instruction and the second sub-instruction in the exposed-pipeline processing element. The method also includes issuing the first sub-instruction at a first time, issuing the second sub-instruction at a second time different than the first time, the second time being offset to account for a dependency of the second sub-instruction on a first result from the first sub-instruction, the first pipeline performing the first sub-instruction at a first clock cycle and communicating the first result from performing the first sub-instruction to a chaining bus coupled to the first pipeline and a second pipeline, the communicating at a second clock cycle subsequent to the first clock cycle that corresponds to a total number of latch pipeline stages in the first pipeline.
    Type: Grant
    Filed: March 12, 2013
    Date of Patent: February 2, 2016
    Assignee: International Business Machines Corporation
    Inventors: Thomas W. Fox, Bruce M. Fleischer, Hans M. Jacobson, Ravi Nair
  • Patent number: 9118705
    Abstract: A device for detecting network traffic content is provided. The device includes a memory configured for storing one or more signatures, each of the one or more signatures associated with content desired to be detected, and defined by one or more predicates. The device also includes a processor configured to receive data associated with network traffic content, execute one or more instructions based on the one or more signatures and the data, and determine whether the network traffic content matches the content desired to be detected.
    Type: Grant
    Filed: March 12, 2013
    Date of Patent: August 25, 2015
    Assignee: Fortinet, Inc.
    Inventor: Michael Xie
  • Publication number: 20150100761
    Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) a first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) a second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and a machine-readable medium storing such an instruction are also disclosed.
    Type: Application
    Filed: December 12, 2014
    Publication date: April 9, 2015
    Applicant: INTEL CORPORATION
    Inventors: Maxim Loktyukhin, Eric W Mahurin, Bret L Toll, Martin G Dixon, Sean P Mirkes, David L Kreitzer, Elmoustapha Ould-Ahmed-Vall, Vinodh Gopal
  • Publication number: 20150100760
    Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) a first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) a second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and a machine-readable medium storing such an instruction are also disclosed.
    Type: Application
    Filed: December 12, 2014
    Publication date: April 9, 2015
    Applicant: INTEL CORPORATION
    Inventors: Maxim Loktyukhin, Eric W Mahurin, Bret L Toll, Martin G Dixon, Sean P Mirkes, David L Kreitzer, Elmoustapha Ould-Ahmed-Vall, Vinodh Gopal
  • Publication number: 20150095615
    Abstract: A method for forwarding data from the store instructions to a corresponding load instruction in an out of order processor. The method includes accessing an incoming sequence of instructions, and of said sequence of instructions, splitting store instructions into a store address instruction and a store data instruction, wherein the store address performs address calculation and fetch, and wherein the store data performs a load of register contents to a memory address. The method further includes, of said sequence of instructions, splitting load instructions into a load address instruction and a load data instruction, wherein the load address performs address calculation and fetch, and wherein the load data performs a load of memory address contents into a register, and reordering the store address and load address instructions earlier and further away from the LD/SD in the instruction sequence to enable earlier dispatch and execution of the loads and the stores.
    Type: Application
    Filed: December 11, 2014
    Publication date: April 2, 2015
    Inventors: Mohammad A. ABDALLAH, Gregory A. WOODS
  • Patent number: 8977815
    Abstract: A processing pipeline 6, 8, 10, 12 is provided with a main query stage 20 and a fetch stage 22. A buffer 24 stores program instructions which have missed within a cache memory 14. Query generation circuitry within the main query stage 20 and within a buffer query stage 26 serve to concurrently generate a main query request and a buffer query request sent to the cache memory 14. The cache memory returns a main query response and a buffer query response. Arbitration circuitry 28 controls multiplexers 30, 32 and 34 to direct the program instruction at the main query stage 20, and the program instruction stored within the buffer 24 and the buffer query stage 26 to pass either to the fetch stage 22 or to the buffer 24. The multiplexer 30 can also select a new instruction to be passed to the main query stage 20.
    Type: Grant
    Filed: November 29, 2010
    Date of Patent: March 10, 2015
    Assignee: ARM Limited
    Inventors: Frode Heggelund, Rune Holm, Andreas Due Engh-Halstvedt, Edvard Feilding
  • Patent number: 8972689
    Abstract: A storage processor identifies latency of memory drives for different numbers of concurrent storage operations. The identified latency is used to identify debt limits for the number of concurrent storage operations issued to the memory drives. The storage processor may issue additional storage operations to the memory devices when the number of storage operations is within the debt limit. Storage operations may be deferred when the number of storage operations is outside the debt limit.
    Type: Grant
    Filed: February 2, 2011
    Date of Patent: March 3, 2015
    Assignee: Violin Memory, Inc.
    Inventor: Erik de la Iglesia
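Illustrative sketch (not from the patent): a minimal model of deriving a debt limit from measured per-depth latencies and deferring operations beyond it. The latency figures and the latency budget are made-up numbers.

```python
# Sketch of debt-limited issue of concurrent storage operations.
LATENCY_BUDGET_US = 500

def debt_limit(measured_latency_by_depth):
    """Largest number of concurrent operations whose measured latency stays within budget."""
    limit = 0
    for depth, latency in sorted(measured_latency_by_depth.items()):
        if latency <= LATENCY_BUDGET_US:
            limit = depth
    return limit

def issue_or_defer(outstanding, limit, op, deferred, drive):
    if outstanding < limit:
        drive.append(op)           # within the debt limit: issue now
        return outstanding + 1
    deferred.append(op)            # outside the limit: defer the operation
    return outstanding

measured = {1: 120, 2: 180, 4: 320, 8: 900}    # microseconds at each queue depth
limit = debt_limit(measured)                   # -> 4 concurrent operations
drive, deferred, outstanding = [], [], 4
outstanding = issue_or_defer(outstanding, limit, "read A", deferred, drive)
print(limit, drive, deferred)                  # 4 [] ['read A']
```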
  • Publication number: 20150039859
    Abstract: A method for accelerating code optimization in a microprocessor. The method includes fetching an incoming macroinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The optimized microinstruction sequence is output to a microprocessor pipeline for execution. A copy of the optimized microinstruction sequence is stored into a sequence cache for subsequent use upon a subsequent hit on the optimized microinstruction sequence.
    Type: Application
    Filed: November 22, 2011
    Publication date: February 5, 2015
    Inventor: Mohammad Abdallah
  • Publication number: 20150032997
    Abstract: Tracking global history vector in high performance out of order superscalar processors, in one aspect, may comprise providing a shift register storing global history vector that stores branch predictions and outcomes. A counter is maintained to determine a number of bits to shift the shift register to recover branch history. In another aspect, the global history vector may be implemented with a circular buffer structure. Youngest and oldest pointers to the circular buffer are maintained and used in recovery.
    Type: Application
    Filed: July 23, 2013
    Publication date: January 29, 2015
    Applicant: International Business Machines Corporation
    Inventors: Richard J. Eickemeyer, Tejas Karkhanis, Brian R. Konigsburg, David S. Levitan, Douglas R. G. Logan, Jose E. Moreira, Mauricio J. Serrano
  • Publication number: 20150019840
    Abstract: This invention implements a range of interesting technologies into a single block. Each DSP CPU has a streaming engine. The streaming engines include: a SE to L2 interface that can request 512 bits/cycle from L2; a loose binding between SE and L2 interface, to allow a single stream to peak at 1024 bits/cycle; one-way coherence where the SE sees all earlier writes cached in system, but not writes that occur after stream opens; full protection against single-bit data errors within its internal storage via single-bit parity with semi-automatic restart on parity error.
    Type: Application
    Filed: July 15, 2014
    Publication date: January 15, 2015
    Inventors: Timothy D. Anderson, Joseph Zbiciak, Duc Quang Bui, Abhijeet A. Chachad, Kai Chirca, Naveen Bhoria, Matthew D. Pierson, Daniel Wu
  • Publication number: 20140317383
    Abstract: Provided are an instruction compression apparatus and method for a very long instruction word (VLIW) processor, and an instruction fetching apparatus and method. The instruction compression apparatus includes: an indicator generator configured to generate an indicator code that indicates an issue width of an instruction bundle to be executed in the VLIW processor, and a number of No-Operation (NOP) instruction bundles following the instruction bundle; an instruction compressor configured to compress the instruction bundle by removing at least one of NOP instructions from the instruction bundle and the NOP instruction bundles following the instruction bundle; and an instruction converter configured to include the generated indicator code in the compressed instruction bundle.
    Type: Application
    Filed: April 22, 2014
    Publication date: October 23, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jae-Un PARK, Suk-jin KIM
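Illustrative sketch (not from the patent): a minimal model of the VLIW compression above, where each retained bundle carries an indicator encoding its issue width and the number of following NOP bundles. The 4-slot bundle format and the tuple indicator layout are assumptions.

```python
# Sketch of indicator-based VLIW instruction compression.
NOP_BUNDLE = ["nop"] * 4

def compress(bundles):
    """Attach an indicator to each real bundle and drop NOP slots and NOP bundles."""
    out, i = [], 0
    while i < len(bundles):
        bundle = [slot for slot in bundles[i] if slot != "nop"]   # strip NOP instructions
        trailing_nops = 0
        while i + 1 + trailing_nops < len(bundles) and \
              bundles[i + 1 + trailing_nops] == NOP_BUNDLE:
            trailing_nops += 1                                    # count following NOP bundles
        out.append({"indicator": (len(bundle), trailing_nops), "slots": bundle})
        i += 1 + trailing_nops
    return out

program = [["add", "mul", "nop", "nop"], NOP_BUNDLE, NOP_BUNDLE,
           ["ld", "nop", "nop", "nop"]]
print(compress(program))
# [{'indicator': (2, 2), 'slots': ['add', 'mul']}, {'indicator': (1, 0), 'slots': ['ld']}]
```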