Instruction Decoding (e.g., By Microinstruction, Start Address Generator, Hardwired) Patents (Class 712/208)
-
Patent number: 12132821
Abstract: A processor includes a decode unit to decode an SM3 two round state word update instruction. The instruction is to indicate one or more source packed data operands. The source packed data operand(s) are to have eight 32-bit state words Aj, Bj, Cj, Dj, Ej, Fj, Gj, and Hj that are to correspond to a round (j) of an SM3 hash algorithm. The source packed data operand(s) are also to have a set of messages sufficient to evaluate two rounds of the SM3 hash algorithm. An execution unit coupled with the decode unit is operable, in response to the instruction, to store one or more result packed data operands, in one or more destination storage locations. The result packed data operand(s) are to have at least four two-round updated 32-bit state words Aj+2, Bj+2, Ej+2, and Fj+2, which are to correspond to a round (j+2) of the SM3 hash algorithm.
Type: Grant
Filed: September 20, 2021
Date of Patent: October 29, 2024
Assignee: Intel Corporation
Inventors: Shay Gueron, Vlad Krasnov
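The two-round update described in this abstract can be sketched in software. The following is a minimal Python sketch, assuming the round function and constants of the published SM3 specification; the function names (`sm3_round`, `sm3_two_rounds`, `derive_rest`) are hypothetical, and the packed-operand layout of the actual instruction is not modeled. It also illustrates why only four updated words need to be produced: the other four are rotations of words already held in the round-j state.

```python
MASK32 = 0xFFFFFFFF

def rotl(x, n):
    """Rotate a 32-bit word left by n bits (n taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def p0(x):
    """SM3 permutation P0."""
    return x ^ rotl(x, 9) ^ rotl(x, 17)

def sm3_round(state, j, w_j, w_j4):
    """One SM3 compression round; needs message words W[j] and W[j+4]."""
    a, b, c, d, e, f, g, h = state
    t = 0x79CC4519 if j < 16 else 0x7A879D8A
    if j < 16:
        ff = a ^ b ^ c
        gg = e ^ f ^ g
    else:
        ff = (a & b) | (a & c) | (b & c)
        gg = ((e & f) | (~e & g)) & MASK32
    a12 = rotl(a, 12)
    ss1 = rotl((a12 + e + rotl(t, j)) & MASK32, 7)
    ss2 = ss1 ^ a12
    tt1 = (ff + d + ss2 + (w_j ^ w_j4)) & MASK32
    tt2 = (gg + h + ss1 + w_j) & MASK32
    return (tt1, a, rotl(b, 9), c, p0(tt2), e, rotl(f, 19), g)

def sm3_two_rounds(state, j, w):
    """Return (A, B, E, F) for round j+2.

    w = (W[j], W[j+1], W[j+4], W[j+5]), the message words two rounds need.
    """
    s1 = sm3_round(state, j, w[0], w[2])
    s2 = sm3_round(s1, j + 1, w[1], w[3])
    return s2[0], s2[1], s2[4], s2[5]

def derive_rest(state):
    """C, D, G, H for round j+2 are just rotations of round-j words."""
    a, b, _, _, e, f, _, _ = state
    return rotl(a, 9), rotl(b, 9), rotl(e, 19), rotl(f, 19)
```

Under these assumptions, a caller would combine the four words returned by `sm3_two_rounds` with the four from `derive_rest` to obtain the complete round j+2 state.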
-
Patent number: 12118642
Abstract: This application provides a graphics rendering method and apparatus. A server starts an application, obtains a rendering instruction sent by the application, and sends the rendering instruction to an electronic device. The electronic device performs graphics rendering according to the rendering instruction, to display an image related to the application. According to the technical solutions provided in this application, the electronic device, instead of the server, can perform graphics rendering according to the rendering instruction, thereby improving picture quality and user experience.
Type: Grant
Filed: November 1, 2021
Date of Patent: October 15, 2024
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Pan Zhang, Liang Li, Wei Huang
-
Patent number: 12086596
Abstract: Techniques are described for an instruction for a conditional rotate and XOR operation in a single instruction and triple input bitwise logical operations in a single instruction in an instruction set of a computing system.
Type: Grant
Filed: February 6, 2023
Date of Patent: September 10, 2024
Assignee: Intel Corporation
Inventors: Christoph Dobraunig, Santosh Ghosh, Manoj Sastry
-
Patent number: 12067465
Abstract: A machine learning network is implemented by executing a computer program of instructions on a machine learning accelerator (MLA) comprising a plurality of interconnected storage elements (SEs) and processing elements (PEs). The instructions are partitioned into blocks, which are retrieved from off-chip memory. The block includes a set of deterministic instructions to be executed by on-chip storage elements and/or processing elements according to a static schedule. The block also includes the number of non-deterministic instructions to be executed prior to executing the set of deterministic instructions in this block. These non-deterministic instructions may be instructions for storage elements to retrieve data from off-chip memory and are contained in one or more prior blocks. The execution of these non-deterministic instructions is counted, for example through the use of tokens.
Type: Grant
Filed: December 17, 2020
Date of Patent: August 20, 2024
Assignee: SiMa Technologies, Inc.
Inventor: Subba Rao Venkata Kalari
-
Patent number: 12026515
Abstract: A data processing apparatus includes detection circuitry that detects a parent instruction and a child instruction from a stream of instructions. The parent instruction references a destination register that is referenced as a source register by the child instruction. Adjustment circuitry then adjusts the child instruction to produce an adjusted child instruction whose behaviour is logically equivalent to a behaviour of executing the parent instruction followed by the child instruction.
Type: Grant
Filed: October 4, 2022
Date of Patent: July 2, 2024
Assignee: Arm Limited
Inventors: William Elton Burky, Nicholas Andrew Plante, Alexander Cole Shulyak, Joshua David Knebel, Yasuo Ishii
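The parent/child adjustment in this abstract can be illustrated with a toy model. The sketch below is in Python and assumes a hypothetical two-opcode ISA (`movi` loads an immediate, `addi` adds an immediate to a register); neither the opcodes nor the helper names come from the patent. It detects a pair in which the child reads the parent's destination register and rewrites the child so that it no longer depends on the parent's result.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Insn:
    op: str              # toy opcodes: "movi" (dst = imm), "addi" (dst = src + imm)
    dst: str
    src: Optional[str]
    imm: int

def adjust_child(parent: Insn, child: Insn) -> Optional[Insn]:
    """Return an adjusted child, or None when the pair does not match."""
    is_pair = (parent.op == "movi" and child.op == "addi"
               and child.src == parent.dst)
    if not is_pair:
        return None
    # The child's result is now computed without reading the parent's register.
    return Insn("movi", child.dst, None, (parent.imm + child.imm) & 0xFFFFFFFF)

def run(program):
    """Tiny interpreter used only to compare behaviour before and after."""
    regs = {}
    for insn in program:
        if insn.op == "movi":
            regs[insn.dst] = insn.imm & 0xFFFFFFFF
        elif insn.op == "addi":
            regs[insn.dst] = (regs[insn.src] + insn.imm) & 0xFFFFFFFF
    return regs
```

In this sketch the parent still executes, so its destination register keeps its architectural value; only the data dependency between the two instructions is removed.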
-
Patent number: 11907716
Abstract: The present disclosure provides a method for vector reading and writing, a vector-register system, a device, and a medium. When a vector-writing instruction is obtained, a vector-register controller converts the to-be-written vector address space into a to-be-written vector-register-file bit address; for a nonstandard vector, a nonstandard-vector converting unit first converts the nonstandard vector into a to-be-written nonstandard vector, and the write is then performed, so that vector data of any format can be saved. When a vector-reading instruction is obtained, the vector-register controller converts the to-be-read vector address space into a to-be-read vector-register-file bit address according to the to-be-read width and the to-be-read length, and the read is then performed, so that vector data of any format can be read.
Type: Grant
Filed: April 28, 2022
Date of Patent: February 20, 2024
Assignee: INSPUR SUZHOU INTELLIGENT TECHNOLOGY CO., LTD.
Inventors: Lingjun Kong, Zhaochun Pang, Qi Song
-
Patent number: 11783169
Abstract: Systems, apparatus, and methods for thread-based scheduling within a multicore processor. Neural networking uses a network of connected nodes (aka neurons) to loosely model the neuro-biological functionality found in the human brain. Various embodiments of the present disclosure use thread dependency graph analysis to decouple scheduling across many distributed cores. Rather than using thread dependency graphs to generate a sequential ordering for a centralized scheduler, the individual thread dependencies define a count value for each thread at compile-time. Threads and their thread dependency count are distributed to each core at run-time. Thereafter, each core can dynamically determine which threads to execute based on fulfilled thread dependencies without requiring a centralized scheduler.
Type: Grant
Filed: January 2, 2023
Date of Patent: October 10, 2023
Assignee: Femtosense, Inc.
Inventors: Sam Brian Fok, Alexander Smith Neckar
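The dependency-count idea in this abstract can be sketched as follows. This Python sketch is a single-process simulation, not a multicore implementation: it assumes each thread's count is simply the number of threads it waits on, and the names `compile_counts` and `run_by_counts` are hypothetical. A thread becomes runnable once its count reaches zero, so no precomputed global ordering is needed.

```python
from collections import deque

def compile_counts(deps):
    """'Compile-time' step: deps maps each thread to the set of threads it waits on.

    Returns (counts, successors), where counts[t] is the number of unmet
    dependencies of t and successors[t] lists the threads waiting on t.
    """
    counts = {t: len(d) for t, d in deps.items()}
    successors = {t: [] for t in deps}
    for t, d in deps.items():
        for p in d:
            successors[p].append(t)
    return counts, successors

def run_by_counts(deps):
    """'Run-time' step: run any thread whose count is zero, then decrement successors."""
    counts, successors = compile_counts(deps)
    ready = deque(t for t, c in counts.items() if c == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)               # stand-in for executing the thread on some core
        for s in successors[t]:
            counts[s] -= 1            # one dependency of s is now fulfilled
            if counts[s] == 0:
                ready.append(s)
    return order
```

In a real multicore setting the decrement would have to be an atomic or message-based update; that detail is outside this sketch.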
-
Patent number: 11681786
Abstract: Briefly, example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more processing devices to develop compilers and microcode for generation of runtime images for secure execution according to an instruction set architecture (ISA) on a computing device. For example, a co-development of a paired compiler and microcode may obscure how such a paired compiler and microcode are to express program instructions into binary runtime image.
Type: Grant
Filed: December 7, 2020
Date of Patent: June 20, 2023
Assignee: Arm Limited
Inventor: Andrew Neil Sloss
-
Patent number: 11669273
Abstract: A device includes a scoreboard and a processor. The scoreboard includes scoreboard entries configured to store information regarding one or more uncompleted memory access operations. The scoreboard also includes a dependency matrix configured to store dependency information corresponding to the scoreboard entries. The processor is configured to retrieve a first memory access instruction that indicates a first address range of a first memory access operation, and to add an indication of the first memory access instruction to a first scoreboard entry. The processor is further configured to, based on determining that the first address range at least partially overlaps a second address range associated with a second scoreboard entry that corresponds to a second memory access instruction, set an element of the dependency matrix to have a has-dependency value indicating a dependency of the first scoreboard entry on the second scoreboard entry.
Type: Grant
Filed: February 3, 2021
Date of Patent: June 6, 2023
Assignee: Qualcomm Incorporated
Inventors: Eric Wayne Mahurin, Hitesh Kumar Gupta, Ahmad Radaideh
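The scoreboard and dependency matrix in this abstract can be modeled in a few lines. The Python sketch below is a software illustration under stated assumptions: address ranges are half-open `(start, end)` pairs, `dep[i][j]` being true means entry `i` depends on entry `j`, and the class and method names are hypothetical rather than taken from the patent.

```python
class Scoreboard:
    def __init__(self, size):
        self.entries = [None] * size                    # (start, end) or None if free
        self.dep = [[False] * size for _ in range(size)]

    def add(self, start, end):
        """Allocate an entry and record dependencies on overlapping older entries."""
        slot = self.entries.index(None)                 # raises ValueError when full
        for other, rng in enumerate(self.entries):
            # Two half-open ranges overlap when each starts before the other ends.
            if rng is not None and start < rng[1] and rng[0] < end:
                self.dep[slot][other] = True
        self.entries[slot] = (start, end)
        return slot

    def complete(self, slot):
        """Free an entry and clear every dependency that pointed at it."""
        self.entries[slot] = None
        for row in self.dep:
            row[slot] = False
        self.dep[slot] = [False] * len(self.entries)

    def is_ready(self, slot):
        """An access may proceed once it depends on no uncompleted entry."""
        return not any(self.dep[slot])
```

A matrix is convenient here because clearing a completed entry is a single column clear, and readiness is a single row check.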
-
Patent number: 11537523
Abstract: Implementations of the disclosure provide systems and methods for receiving, by a processing device, a request for an application image. A sequence of commands associated with the application image and a value of a parameter associated with the sequence of commands is received. Responsive to determining that the sequence of commands has been previously executed with the value of the parameter, the processing device retrieves, from a cache, a result of executing the sequence with the value of the parameter. The application image is built using the result of executing the sequence.
Type: Grant
Filed: July 31, 2019
Date of Patent: December 27, 2022
Assignee: Red Hat, Inc.
Inventor: Boaz Shuster
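The cache lookup described in this abstract amounts to keying a stored result on both the command sequence and the parameter value. A minimal Python sketch, assuming a plain dictionary as the cache and a SHA-256 digest as the key; the names `cache_key` and `build_step` and the `execute` callback are hypothetical and do not reflect any particular image-build tool.

```python
import hashlib

def cache_key(commands, param_value):
    """Derive a key from the command sequence and the parameter value."""
    h = hashlib.sha256()
    for cmd in commands:
        h.update(cmd.encode("utf-8") + b"\0")   # separator keeps command boundaries distinct
    h.update(repr(param_value).encode("utf-8"))
    return h.hexdigest()

def build_step(cache, commands, param_value, execute):
    """Return (result, was_cached); run `execute` only on a cache miss."""
    key = cache_key(commands, param_value)
    if key in cache:
        return cache[key], True
    result = execute(commands, param_value)
    cache[key] = result
    return result, False
```

Including the parameter value in the key is what prevents a result built with one value from being reused for a build that passes a different one.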
-
Patent number: 11526352
Abstract: Hardware processors and methods for extended microcode patching through on-die and off-die secure storage are described. In one embodiment, the additional storage resources used for storing micro-operations are section(s) of a cache that are unused at runtime and/or unused by a configuration of a processor. For example, the additional storage resources may be a section of a cache that is used to store context information from a core when the core is transitioned to a power state that shuts off voltage to the core. Non-limiting examples of such sections are one or more sections for storage of context information for a transition of a thread to idle or off, storage of context information for a transition of a core for a multiple core processor to idle or off, or storage of coherency information for a transition of a cache coherency circuit (e.g., cache box (CBo)) to idle or off.
Type: Grant
Filed: July 17, 2020
Date of Patent: December 13, 2022
Assignee: Intel Corporation
Inventor: Sergiu D. Ghetie
-
Patent number: 11507378
Abstract: In one example, an integrated circuit comprises: a memory configured to store a first mapping between a first opcode and first control information and a second mapping between the first opcode and second control information; a processing engine configured to perform processing operations based on the control information; and a controller configured to: at a first time, provide the first opcode to the memory to, based on the first mapping stored in the memory, fetch the first control information for the processing engine, to enable the processing engine to perform a first processing operation based on the first control information; and at a second time, provide the first opcode to the memory to, based on the second mapping stored in the memory, fetch the second control information for the processing engine, to enable the processing engine to perform a second processing operation based on the second control information.
Type: Grant
Filed: March 1, 2021
Date of Patent: November 22, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Ron Diamant, Sundeep Amirineni, Mohammad El-Shabani, Sagar Sonar, Kenneth Wayne Patton
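The reprogrammable opcode mapping in this abstract can be pictured as a small lookup memory between the controller and the processing engine. The Python sketch below is an illustration only: the `ControlStore` class, the `engine` function, and the control-information fields are all hypothetical. It shows the same opcode producing different operations depending on which mapping is stored at the time of the fetch.

```python
class ControlStore:
    """Memory holding the current opcode -> control-information mapping."""

    def __init__(self):
        self._map = {}

    def program(self, opcode, control):
        """Store (or replace) the mapping for an opcode."""
        self._map[opcode] = control

    def fetch(self, opcode):
        """Look up the control information currently mapped to an opcode."""
        return self._map[opcode]

def engine(control, x, y):
    """Toy processing engine whose behaviour is selected by control information."""
    if control["alu"] == "add":
        return x + y
    if control["alu"] == "max":
        return max(x, y)
    raise ValueError("unknown control information")
```

For example, after `store.program(0x01, {"alu": "add"})` the opcode `0x01` performs an addition; reprogramming it with `{"alu": "max"}` makes the same opcode select a maximum instead.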
-
Patent number: 11463698
Abstract: A method of encoding image data, including: frequency-transforming input image data to generate an array of frequency-transformed input image coefficients by a matrix-multiplication process, according to a maximum dynamic range of the transformed data and using transform matrices having a data precision; and selecting the maximum dynamic range and/or the data precision of the transform matrices according to the bit depth of the input image data.
Type: Grant
Filed: September 13, 2018
Date of Patent: October 4, 2022
Assignee: Sony Corporation
Inventors: David Berry, James Alexander Gamei, Nicholas Ian Saunders, Karl James Sharman
-
Patent number: 11429385
Abstract: Hardware processors and methods for extended microcode patching through on-die and off-die secure storage are described. In one embodiment, the additional storage resources used for storing micro-operations are section(s) of a cache that are unused at runtime and/or unused by a configuration of a processor. For example, the additional storage resources may be a section of a cache that is used to store context information from a core when the core is transitioned to a power state that shuts off voltage to the core. Non-limiting examples of such sections are one or more sections for: storage of context information for a transition of a thread to idle or off, storage of context information for a transition of a core for a multiple core processor to idle or off, or storage of coherency information for a transition of a cache coherency circuit (e.g., cache box (CBo)) to idle or off.
Type: Grant
Filed: December 29, 2018
Date of Patent: August 30, 2022
Assignee: Intel Corporation
Inventor: Sergiu D. Ghetie
-
Patent number: 11422821
Abstract: A system and method for efficiently handling instruction execution ordering. In various embodiments, a processor includes multiple execution lanes, each executing instructions of a particular type, which are not executed by one or more of the other execution lanes. The instruction queue includes one queue for each particular execution lane. Control logic identifies a current youngest age used in allocated entries of the multiple queues, and determines a starting age based on the identified current youngest age and the number of instructions to be issued. Beginning with the determined starting age, ages (in program order) are assigned to a group of instructions being allocated in the multiple queues. Ages of entries in the multiple queues are updated for instructions not being issued based on the number of instructions being issued. Instructions being issued have age differences between them below a threshold.
Type: Grant
Filed: September 4, 2018
Date of Patent: August 23, 2022
Assignee: Apple Inc.
Inventors: James N. Hardage, Jr., Christopher M. Tsay, Mahesh K. Reddy
-
Patent number: 11397583
Abstract: In one embodiment, a system includes a memory and a processor core. The processor core includes functional units and an instruction decode unit configured to determine whether an execute packet of instructions received by the processing core includes a first instruction that is designated for execution by a first functional unit of the functional units and a second instruction that is a condition code extension instruction that includes a plurality of sets of condition code bits, wherein each set of condition code bits corresponds to a different one of the functional units, and wherein the sets of condition code bits include a first set of condition code bits that corresponds to the first functional unit. When the execute packet includes the first and second instructions, the first functional unit is configured to execute the first instruction conditionally based upon the first set of condition code bits in the second instruction.
Type: Grant
Filed: September 3, 2019
Date of Patent: July 26, 2022
Assignee: Texas Instruments Incorporated
Inventors: Timothy David Anderson, Duc Quang Bui, Joseph Raymond Michael Zbiciak
-
Patent number: 11379229
Abstract: An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule matrix operations responsive to a matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, wherein a lane comprises an arithmetic logic unit to multiply a block of a first matrix with a block of a second matrix to generate a product and to accumulate the product with a block of a third matrix, and wherein the matrix blocks are to be stored in registers within the lane; and broadcast circuitry to broadcast one or more invariant matrix blocks to at least one of different registers within the lane and different registers across different lanes.
Type: Grant
Filed: August 7, 2020
Date of Patent: July 5, 2022
Assignee: INTEL CORPORATION
Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Debbie Marr, Abhijit Davare, Asit Mishra, Steven Burns, Desmond A. Kirkpatrick, Andrey Ayupov, Anton Alexandrovich Sorokin, Eriko Nurvitadhi
-
Patent number: 11347652
Abstract: The present disclosure relates to devices and methods for using a banked memory structure with accelerators. The devices and methods may segment and isolate dataflows in datapath and memory of the accelerator. The devices and methods may provide each data channel with its own register memory bank. The devices and methods may use a memory address decoder to place the local variables in the proper memory bank.
Type: Grant
Filed: November 13, 2020
Date of Patent: May 31, 2022
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Stephen Sangho Youn, Steven Karl Reinhardt, Hui Geng
-
Patent number: 11327779
Abstract: Techniques for facilitating parallelized configuration of multiple virtual machines. The techniques include duplicating commands received from an administrator and controlling the multiple virtual machines with those commands in a parallel manner. Different types of commands are treated differently. More specifically, commands for controlling software executing in the virtual machines are replicated and sent to each virtual machine. By contrast, commands for managing virtual machines themselves are provided to virtualization software like a hypervisor to be executed. Duplication of the commands for controlling software executing in the virtual machines is performed by an input/output multiplexer, which also has the function of combining display output from each of the virtual machines. More specifically, the input/output multiplexer displays a common display output to the administrator, where the common display output is the screen that is shown on each of the virtual machines.
Type: Grant
Filed: June 26, 2015
Date of Patent: May 10, 2022
Assignee: VMWARE, INC.
Inventors: Jinto Antony, Sudhish P. T., Madhusudhanan Gangadharan
-
Patent number: 11314515
Abstract: Embodiments described herein provide for an instruction and associated logic to enable vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
Type: Grant
Filed: December 23, 2019
Date of Patent: April 26, 2022
Assignee: Intel Corporation
Inventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
-
Patent number: 11294685
Abstract: Methods and systems for creating a sequence of fused instructions. An instruction stream is obtained, and a window of instructions from the instruction stream is examined and one or more groups of instructions that satisfy one or more fusion rules are identified. One or more of the groups of instructions that satisfy the one or more fusion rules are fused and a maximal length data dependence chain in the instruction stream is analyzed by analyzing every node in a dependence graph in a selected window of instructions. Fusion of an instruction group is prevented based on the maximal length data dependence chain.
Type: Grant
Filed: June 4, 2019
Date of Patent: April 5, 2022
Assignee: International Business Machines Corporation
Inventors: Jessica Hui-Chun Tseng, Manoj Kumar, Kattamuri Ekanadham, Jose E. Moreira, Pratap C. Pattnaik
-
Patent number: 11256509
Abstract: Embodiments of the present invention include methods, systems, and computer program products for implementing instruction fusion after register rename. A computer-implemented method includes receiving, by a processor, a plurality of instructions at an instruction pipeline. The processor can further perform a register rename within the instruction pipeline in response to the received plurality of instructions. The processor can further determine that two or more of the plurality of instructions can be fused after the register rename. The processor can further fuse the two or more instructions that can be fused based on the determination to create one or more fused instructions. The processor can further perform an execution stage within the instruction pipeline to execute the plurality of instructions, including the one or more fused instructions.
Type: Grant
Filed: December 7, 2017
Date of Patent: February 22, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Joel A. Silberman, Balaram Sinharoy
-
Patent number: 11228696
Abstract: To reduce power consumption in an image pickup apparatus that captures a plurality of pieces of image data. An image pickup apparatus includes a signal processing unit and a control unit. The signal processing unit executes, in accordance with a predetermined control signal, either compound-eye processing for synthesizing a plurality of pieces of image data by carrying out signal processing on each of the plurality of pieces of image data or monocular processing for carrying out the signal processing on any one of the plurality of pieces of image data. The control unit supplies the predetermined control signal to the signal processing unit and causes one of the compound-eye processing and the monocular processing to be switched to the other one of the compound-eye processing and the monocular processing, on a basis of a result of a comparison between a measured predetermined physical amount and a predetermined threshold value.
Type: Grant
Filed: August 7, 2018
Date of Patent: January 18, 2022
Assignee: Sony Semiconductor Solutions Corporation
Inventor: Satoshi Sugiyama
-
Patent number: 11216278
Abstract: A computer-implemented method for multi-thread processing, the method including: compiling a first plurality of threads using a corresponding first register set for each thread in the first plurality of threads, to obtain a first plurality of corresponding machine instruction codes; and fusing the first plurality of machine instruction codes using first instructions in an instruction set supported by a processing core, to obtain machine instruction code of a fused thread, the machine instruction code of the fused thread including thread portions corresponding to each thread of the first plurality of threads, in which the first instructions include load effective address instructions and control transfer instructions, in which the load effective address instructions and the control transfer instructions are compiled using a second register set, and in which jump operations between thread portions are implemented by the control transfer instructions inserted into the machine instruction code of the fused thread.
Type: Grant
Filed: March 2, 2020
Date of Patent: January 4, 2022
Assignee: Advanced New Technologies Co., Ltd.
Inventors: Ling Ma, Wei Zhou, Changhua He
-
Patent number: 11210073
Abstract: Translating text encodings of machine learning models to executable code, the method comprising: receiving a text encoding of a machine learning model; generating, based on the text encoding of the machine learning model, compilable code encoding the machine learning model; and generating, based on the compilable code, executable code encoding the machine learning model.
Type: Grant
Filed: July 29, 2020
Date of Patent: December 28, 2021
Assignee: SPARKCOGNITION, INC.
Inventor: Jarred Capellman
-
Patent number: 11132599
Abstract: Processors and methods for neural network processing are provided. A method in a processor including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the MVU, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes decoding instructions including a first type of instruction for processing by only the MVU and a second type of instruction for processing by only one of the multifunction units. The method includes mapping a first instruction for processing by the matrix vector unit or to any one of the first multifunction unit, the second multifunction unit, or the third multifunction unit depending on whether the first instruction is the first type of instruction or the second type of instruction.
Type: Grant
Filed: June 29, 2017
Date of Patent: September 28, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Eric S. Chung, Douglas C. Burger, Jeremy Fowers
-
Patent number: 11132633
Abstract: The disclosed embodiments illustrate a method and a system for controlling KPI parameters of a transportation system. The method includes extracting historical commuting characteristics of one or more commuters, from a database server over a communication network. The method further includes generating a predictive model based on the extracted historical commuting characteristics. The method further includes generating a service schedule of one or more transportation services of the transportation system. The service schedule of the one or more transportation services may be generated by use of the generated predictive model, based on defined criteria of the transportation system. The method further includes controlling a KPI parameter of the transportation system to attain a desired value of the KPI parameter, based on the generated service schedule, when the one or more transportation services are deployed at one or more time stamps.
Type: Grant
Filed: December 16, 2016
Date of Patent: September 28, 2021
Assignee: Conduent Business Services, LLC
Inventors: Theja Tulabandhula, Asim Anand, Deeksha Sinha
-
Patent number: 11128443
Abstract: A processor includes a decode unit to decode an SM3 two round state word update instruction. The instruction is to indicate one or more source packed data operands. The source packed data operand(s) are to have eight 32-bit state words Aj, Bj, Cj, Dj, Ej, Fj, Gj, and Hj that are to correspond to a round (j) of an SM3 hash algorithm. The source packed data operand(s) are also to have a set of messages sufficient to evaluate two rounds of the SM3 hash algorithm. An execution unit coupled with the decode unit is operable, in response to the instruction, to store one or more result packed data operands, in one or more destination storage locations. The result packed data operand(s) are to have at least four two-round updated 32-bit state words Aj+2, Bj+2, Ej+2, and Fj+2, which are to correspond to a round (j+2) of the SM3 hash algorithm.
Type: Grant
Filed: November 6, 2020
Date of Patent: September 21, 2021
Assignee: Intel Corporation
Inventors: Shay Gueron, Vlad Krasnov
-
Speculative instruction wakeup to tolerate draining delay of memory ordering violation check buffers
Patent number: 11113065
Abstract: A technique for speculatively executing load-dependent instructions includes detecting that a memory ordering consistency queue is full for a completed load instruction. The technique also includes storing data loaded by the completed load instruction into a storage location for storing data when the memory ordering consistency queue is full. The technique further includes speculatively executing instructions that are dependent on the completed load instruction. The technique also includes in response to a slot becoming available in the memory ordering consistency queue, replaying the load instruction. The technique further includes in response to receiving loaded data for the replayed load instruction, testing for a data mis-speculation by comparing the loaded data for the replayed load instruction with the data loaded by the completed load instruction that is stored in the storage location.
Type: Grant
Filed: October 31, 2019
Date of Patent: September 7, 2021
Assignee: Advanced Micro Devices, Inc.
Inventors: John Kalamatianos, Susumu Mashimo, Krishnan V. Ramani, Scott Thomas Bingham
-
Patent number: 11113053
Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.
Type: Grant
Filed: September 23, 2019
Date of Patent: September 7, 2021
Assignee: Intel Corporation
Inventors: Asit K. Mishra, Edward T. Grochowski, Jonathan D. Pearce, Deborah T. Marr, Ehud Cohen, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal San Adrian, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Milind B. Girkar
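The result mask described in this abstract has a simple scalar reference model. The Python sketch below assumes packed operands are represented as equal-length lists and mask elements as 0 or 1; the function names are hypothetical, and the register encoding of a real instruction is not modeled.

```python
def any_equal_mask(src, other):
    """Mask element i is 1 when src[i] equals any element of `other`, else 0."""
    return [1 if x in other else 0 for x in src]

def any_equal_masks(src1, src2):
    """Produce one mask per source operand, each in the same element order as its source."""
    return any_equal_mask(src1, src2), any_equal_mask(src2, src1)
```

For instance, with `src1 = [1, 2, 3, 4]` and `src2 = [4, 4, 9, 1]`, the mask for `src1` is `[1, 0, 0, 1]` because only 1 and 4 occur in `src2`.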
-
Patent number: 11080813
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core to perform a mixed precision multi-dimensional matrix multiply and accumulate operation on 8-bit and/or 32-bit signed or unsigned integer elements.
Type: Grant
Filed: September 26, 2019
Date of Patent: August 3, 2021
Assignee: Intel Corporation
Inventors: Abhishek R. Appu, Altug Koker, Linda L. Hurd, Dukhwan Kim, Mike B. Macpherson, John C. Weast, Feng Chen, Farshad Akhbari, Narayan Srinivasa, Nadathur Rajagopalan Satish, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman
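One common reading of a mixed precision integer multiply and accumulate is narrow inputs with a wider accumulator. The Python sketch below assumes signed 8-bit input elements and a signed 32-bit accumulator with wraparound; that split, and the names `to_int32` and `matmul_acc_i8_i32`, are assumptions for illustration and are not stated in the abstract.

```python
def to_int32(x):
    """Wrap an arbitrary Python integer to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def matmul_acc_i8_i32(a, b, c):
    """Compute a @ b + c with int8 inputs (a, b) and int32 accumulators (c)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    for matrix in (a, b):
        for row in matrix:
            for v in row:
                if not -128 <= v <= 127:
                    raise ValueError("input element does not fit in signed 8 bits")
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = c[i][j]
            for k in range(inner):
                acc += a[i][k] * b[k][j]    # each product fits easily in 32 bits
            out[i][j] = to_int32(acc)
    return out
```

Keeping the accumulator wider than the inputs is what lets many small products be summed without overflowing after only a few terms.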
-
Patent number: 11080811
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core to perform a mixed precision multi-dimensional matrix multiply and accumulate operation on 16-bit and/or 32-bit floating-point elements.
Type: Grant
Filed: June 19, 2019
Date of Patent: August 3, 2021
Assignee: Intel Corporation
Inventors: Abhishek R. Appu, Altug Koker, Linda L. Hurd, Dukhwan Kim, Mike B. Macpherson, John C. Weast, Feng Chen, Farshad Akhbari, Narayan Srinivasa, Nadathur Rajagopalan Satish, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman
-
Patent number: 11075746
Abstract: A processor includes a decode unit to decode an SM3 two round state word update instruction. The instruction is to indicate one or more source packed data operands. The source packed data operand(s) are to have eight 32-bit state words Aj, Bj, Cj, Dj, Ej, Fj, Gj, and Hj that are to correspond to a round (j) of an SM3 hash algorithm. The source packed data operand(s) are also to have a set of messages sufficient to evaluate two rounds of the SM3 hash algorithm. An execution unit coupled with the decode unit is operable, in response to the instruction, to store one or more result packed data operands, in one or more destination storage locations. The result packed data operand(s) are to have at least four two-round updated 32-bit state words Aj+2, Bj+2, Ej+2, and Fj+2, which are to correspond to a round (j+2) of the SM3 hash algorithm.
Type: Grant
Filed: April 13, 2020
Date of Patent: July 27, 2021
Assignee: Intel Corporation
Inventors: Shay Gueron, Vlad Krasnov
-
Patent number: 11061682
Abstract: The invention relates to a method for processing instructions out-of-order on a processor comprising an arrangement of execution units. The inventive method comprises looking up operand sources in a Register Positioning Table and setting operand input references of the instruction to be issued accordingly, checking for an Execution Unit (EXU) available for receiving a new instruction, and issuing the instruction to the available Execution Unit and entering a reference of the result register addressed by the instruction to be issued to the Execution Unit into the Register Positioning Table (RPT).
Type: Grant
Filed: December 13, 2015
Date of Patent: July 13, 2021
Inventor: Martin Vorbach
-
Patent number: 11062065
Abstract: A matching method for multiple reaction chambers includes selecting at least one factor, setting an adjustment coefficient for the factor corresponding to each of the reaction chambers, and obtaining an input value of the factor to enter into each reaction chamber based on the target value of the factor and the adjustment coefficient corresponding to each of the reaction chambers. The processing factor has a target value and a real value corresponding to each of the reaction chambers. The adjustment coefficient is based on the real value and the target value of the factor being within a preset accuracy range when the corresponding chamber operates a process.
Type: Grant
Filed: May 26, 2020
Date of Patent: July 13, 2021
Assignee: BEIJING NAURA MICROELECTRONICS EQUIPMENT CO., LTD.
Inventors: Jihong Zhang, Jinsheng Fu
-
Patent number: 11054890
Abstract: Techniques to control power and processing among a plurality of asymmetric processing elements are disclosed. In one embodiment, one or more asymmetric processing elements are power managed to migrate processes or threads among a plurality of processing elements according to the performance and power needs of the system.
Type: Grant
Filed: July 31, 2013
Date of Patent: July 6, 2021
Assignee: Intel Corporation
Inventors: Herbert Hum, Eric Sprangle, Doug Carmean, Rajesh Kumar
-
Patent number: 11055094
Abstract: Disclosed embodiments relate to improved heterogeneous CPUID spoofing for remote processors. In one example, a system includes multiple processors, including a first processor including configuration circuitry to enable remote processor identification (ID) spoofing; fetch circuitry to fetch an instruction; decode circuitry to decode the instruction having fields to specify an opcode and a context, the opcode indicating execution circuitry is to: when remote processor ID spoofing is enabled, access a processor ID spoofing data structure storing processor ID information for each of the plurality of processors, and report processor ID information for a processor identified by the context; and, when remote processor ID spoofing is not enabled, report processor ID information for the first processor; and execution circuitry to execute the instruction as per the opcode.
Type: Grant
Filed: June 26, 2019
Date of Patent: July 6, 2021
Assignee: Intel Corporation
Inventors: Toby Opferman, Russell C. Arnold, Vedvyas Shanbhogue, Michael W. Chynoweth
-
Patent number: 11051047
Abstract: A better compromise between encoding complexity and achievable rate distortion ratio, and/or a better rate distortion ratio, is achieved by using multitree sub-divisioning not only in order to subdivide a continuous area, namely the sample array, into leaf regions, but using the intermediate regions also to share coding parameters among the corresponding collocated leaf blocks. By this measure, coding procedures performed locally in tiles (leaf regions) may be associated with coding parameters individually without having to, however, explicitly transmit the whole coding parameters for each leaf region separately. Rather, similarities may effectively be exploited by using the multitree subdivision.
Type: Grant
Filed: October 9, 2020
Date of Patent: June 29, 2021
Assignee: GE Video Compression, LLC
Inventors: Philipp Helle, Detlev Marpe, Simon Oudin, Thomas Wiegand
-
Patent number: 11036500
Abstract: Processing circuitry performs processing operations specified by program instructions. An instruction decoder decodes an atomic-add-with-carry instruction AADDC to control the processing circuitry to perform an atomic operation of an add of an addend operand value and a data value stored in a memory to generate a result value stored in the memory and a carry value indicative of whether or not the add generated a carry out. The atomic-add-with-carry instructions may be used within systems which accumulate a local sum value prior to a data value being returned into a local cache memory at which time the local sum value is added to the return data value. The atomic-add-with-carry instructions may also be used in embodiments comprising a coalescing tree of respective processing apparatus where the carry out values generated from local sums produced at each node are returned early to higher nodes within the hierarchy thereby releasing them to commence other processing.
Type: Grant
Filed: October 23, 2019
Date of Patent: June 15, 2021
Assignee: Arm Limited
Inventor: Andreas Due Engh-Halstvedt
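The semantics of an atomic add that also reports a carry out can be modelled in software as follows. This is a sketch under stated assumptions: the `Memory` class, its dictionary-backed cells, and the use of a lock to stand in for hardware atomicity are all hypothetical and do not describe the AADDC implementation.

```python
import threading

class Memory:
    """Toy fixed-width memory whose add operation reports a carry out."""

    def __init__(self, width=32):
        self.width = width
        self.cells = {}                  # address -> stored value
        self._lock = threading.Lock()    # stands in for hardware atomicity

    def atomic_add_with_carry(self, addr, addend):
        """Add `addend` to the cell at `addr`; return the carry out (0 or 1)."""
        mask = (1 << self.width) - 1
        with self._lock:
            total = self.cells.get(addr, 0) + (addend & mask)
            self.cells[addr] = total & mask   # result value stays in memory
            return total >> self.width        # 1 only if the add overflowed
```

Because the caller receives the carry immediately, a node higher in a coalescing tree could propagate it without waiting to read the stored sum back, which is the use case the abstract mentions.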
-
Patent number: 11005503
Abstract: Memory controllers, decoders and methods execute a hybrid decoding scheme. An initial iteration of decoding of a codeword is performed using a bit-flipping (BF) decoder or a min-sum (MS) decoder depending on whether or not an unsatisfied check (USC) count of the codeword is less than a threshold. For this initial iteration, the BF decoder is used when the USC count is less than the threshold, and the MS decoder when the USC count is greater than or equal to the threshold. When decoding of the codeword is initially performed with the BF decoder, decoding continues with the BF decoder until a first set of conditions is satisfied or the codeword is successfully decoded. When decoding of the codeword is performed with the MS decoder, decoding continues with the MS decoder until a second set of conditions is satisfied.
Type: Grant
Filed: March 8, 2019
Date of Patent: May 11, 2021
Assignee: SK hynix Inc.
Inventors: Naveen Kumar, Aman Bhatia, Abhiram Prabhakar, Chenrong Xiong, Fan Zhang
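The initial decoder choice described in this abstract reduces to counting unsatisfied parity checks and comparing the count against a threshold. The sketch below shows only that selection step; the function names, the list-of-lists parity-check matrix, and the string labels are assumptions for illustration, and the BF and MS decoders themselves are not modelled.

```python
def usc_count(parity_check, word):
    """Count the parity checks (rows of the matrix) that `word` fails."""
    return sum(
        1
        for row in parity_check
        if sum(h & bit for h, bit in zip(row, word)) % 2
    )

def pick_initial_decoder(parity_check, word, threshold):
    """Return "BF" when the USC count is below the threshold, else "MS"."""
    return "BF" if usc_count(parity_check, word) < threshold else "MS"
```

A low USC count suggests few bit errors, so the cheaper bit-flipping decoder is tried first; a count at or above the threshold goes straight to min-sum.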
-
Patent number: 10977185
Abstract: Initializing a data structure for use in predicting table of contents (TOC) pointer values. A request to load a module is obtained. Based on the loaded module, a pointer value for a reference data structure is determined. The pointer value is stored in a reference data structure tracking structure, and used to access a variable value for a variable of the module.
Type: Grant
Filed: November 17, 2017
Date of Patent: April 13, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 10977361
Abstract: Systems and methods for controlling privileged operations. The system and method may comprise the steps of: providing a kernel module having a kernel authorization subsystem, the kernel module being loadable to a client computer system and configured to intercept file operations, wherein the kernel authorization subsystem may manage authorization of the one or more file operations; registering a listener for the kernel authorization subsystem; monitoring the file operations for a file access, and calling the registered listener by the kernel authorization subsystem when the kernel authorization subsystem detects the file access; calling a privileged daemon by the kernel module, when identifying the file access; and checking a policy, by the privileged daemon, and determining, based on the policy, whether at least one applied rule is applicable. If the at least one applied rule is applicable, the privileged daemon may initialize a launcher module, which may launch the target application.
Type: Grant
Filed: May 16, 2017
Date of Patent: April 13, 2021
Inventor: Andrey Kolishchak
-
Patent number: 10936316
Abstract: Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using an instruction decoder that decodes instructions having variable numbers of target operands. In one example of the disclosed technology, a block-based processor core includes an instruction decoder configured to decode target operands for an instruction in an instruction block, the instruction being encoded to allow for a variable number of target operands and a control unit configured to send data for at least one of the decoded target operands for an operation performed by the at least one of the cores. In some examples, the instruction indicates target instructions with a vector encoding. In other examples, a variable length format allows for the indication of one or more targets.
Type: Grant
Filed: February 2, 2016
Date of Patent: March 2, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Douglas C. Burger, Aaron L. Smith
-
Patent number: 10931301
Abstract: A code decompression engine reads compressed code from a memory containing a series of code parts and a dictionary part. The code parts each have a bit indicating compressed or uncompressed. When the code part is compressed, it has a value indicating the number of segments, followed by the segments, followed by an index into the dictionary part. The decompressed instruction is the dictionary value specified by the index, which is modified by the segments. Each segment describes the modification to the dictionary part specified by the index by a mask type, a mask offset, and a mask.
Type: Grant
Filed: December 16, 2019
Date of Patent: February 23, 2021
Assignee: Redpine Signals, Inc.
Inventors: Subba Reddy Kallam, Sriram Mudulodu
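The dictionary-plus-segments scheme in this abstract can be illustrated with already-parsed code parts. The abstract does not give field widths or say how a mask modifies the dictionary value, so the sketch makes assumptions throughout: the mask type selects a mask width through the hypothetical `MASK_WIDTH` table, the mask is XORed into the dictionary word at the given bit offset, and instructions are 32 bits wide.

```python
from dataclasses import dataclass

# Assumption: mask type 0 is a 4-bit mask, mask type 1 is an 8-bit mask.
MASK_WIDTH = {0: 4, 1: 8}

@dataclass
class Segment:
    mask_type: int   # selects the mask width (assumed meaning)
    offset: int      # bit position where the mask is applied
    mask: int        # bits to XOR into the dictionary word (assumed operation)

def decompress(part, dictionary):
    """Expand one parsed code part into a 32-bit instruction word."""
    if not part["compressed"]:
        return part["word"]               # uncompressed parts carry the word
    word = dictionary[part["index"]]      # start from the dictionary entry
    for seg in part["segments"]:
        width = MASK_WIDTH[seg.mask_type]
        word ^= (seg.mask & ((1 << width) - 1)) << seg.offset
    return word & 0xFFFFFFFF
```

Under these assumptions, an instruction that differs from a dictionary entry in only a few bit fields is stored as an index plus a short list of segments rather than as a full word.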
-
Patent number: 10917269
Abstract: An electric system comprising a communication link between a signal transmitting end and a signal receiving end, wherein, at the signal transmitting end, a number of data bits are integrated into a low frequency signal to form an integrated signal. Each data bit is transmitted as part of a symbol. Each symbol comprises a predefined number of bits encoding at least one data bit, the state of some of the bits of each symbol being dependent on the state of the low frequency symbol.
Type: Grant
Filed: January 17, 2017
Date of Patent: February 9, 2021
Assignee: VACON OY
Inventors: Petri Ylirinne, Trygve Björkgren, Stefan Strandberg
-
Patent number: 10909259
Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure.
Type: Grant
Filed: September 25, 2018
Date of Patent: February 2, 2021
Assignee: INTEL CORPORATION
Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney
-
Patent number: 10884930
Abstract: A Set Table of Contents (TOC) Register instruction. An instruction to provide a pointer to a reference data structure, such as a TOC, is obtained by a processor and executed. The executing includes determining a value for the pointer to the reference data structure, and storing the value in a location (e.g., a register) specified by the instruction.
Type: Grant
Filed: November 27, 2017
Date of Patent: January 5, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 10884736
Abstract: An approach is described for a method and apparatus for a low energy programmable vector processing unit for use in processing such as, for example, neural network backend processing. According to some embodiments, this approach provides a pooling/vector processing unit for performing backend processing that implements a single issue multiple data (SIMD) datapath that performs various backend processing functions using only a single instruction. For instance, the present approach provides an apparatus and method for execution of operations in parallel using a single issued instruction to a plurality of processing cells. In some embodiments, there are multiple groups of processing cells for performing different operations, e.g., pooling, permute, sigmoid/tanh, and element wise operations.
Type: Grant
Filed: March 15, 2019
Date of Patent: January 5, 2021
Assignee: Cadence Design Systems, Inc.
Inventor: Aamir Alam Farooqui
-
Patent number: 10880580
Abstract: A better compromise between encoding complexity and achievable rate distortion ratio, and/or a better rate distortion ratio, is achieved by using multitree sub-divisioning not only in order to subdivide a continuous area, namely the sample array, into leaf regions, but using the intermediate regions also to share coding parameters among the corresponding collocated leaf blocks. By this measure, coding procedures performed locally in tiles (leaf regions) may be associated with coding parameters individually without having to, however, explicitly transmit the whole coding parameters for each leaf region separately. Rather, similarities may effectively be exploited by using the multitree subdivision.
Type: Grant
Filed: October 25, 2019
Date of Patent: December 29, 2020
Assignee: GE VIDEO COMPRESSION, LLC
Inventors: Philipp Helle, Detlev Marpe, Simon Oudin, Thomas Wiegand
-
Patent number: 10880581
Abstract: A better compromise between encoding complexity and achievable rate distortion ratio, and/or a better rate distortion ratio, is achieved by using multitree sub-divisioning not only in order to subdivide a continuous area, namely the sample array, into leaf regions, but using the intermediate regions also to share coding parameters among the corresponding collocated leaf blocks. By this measure, coding procedures performed locally in tiles (leaf regions) may be associated with coding parameters individually without having to, however, explicitly transmit the whole coding parameters for each leaf region separately. Rather, similarities may effectively be exploited by using the multitree subdivision.
Type: Grant
Filed: February 27, 2020
Date of Patent: December 29, 2020
Assignee: GE VIDEO COMPRESSION, LLC
Inventors: Philipp Helle, Detlev Marpe, Simon Oudin, Thomas Wiegand