Patents by Inventor Youfeng Wu

Youfeng Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9996356
    Abstract: Apparatus and method for detecting and recovering from incorrect memory dependence speculation in an out-of-order processor are described herein. For example, one embodiment of a method comprises: executing a first load instruction; detecting when the first load instruction experiences a bad store-to-load forwarding event during execution; tracking the occurrences of bad store-to-load forwarding event experienced by the first load instruction during execution; controlling enablement of an S-bit in the first load instruction based on the tracked occurrences; generating a plurality of load operations responsive to an enabled S-bit in first load instruction, wherein execution of the plurality of load operations produces a result equivalent to that from the execution of the first load instruction.
    Type: Grant
    Filed: December 26, 2015
    Date of Patent: June 12, 2018
    Assignee: Intel Corporation
    Inventors: Vineeth Mekkat, Oleg Margulis, Jason M. Agron, Ethan Schuchman, Sebastian Winkel, Youfeng Wu, Gisle Dankel
  • Patent number: 9971703
    Abstract: Technologies for persistent memory pointer access include a computing device having a persistent memory including one or more nonvolatile regions. The computing device may load a persistent memory pointer having a static region identifier, a segment identifier, and an offset from the persistent memory. The computing device may map the static region identifier to a dynamic region identifier and determine a virtual memory address of the persistent memory pointer target based on the dynamic region identifier, the segment identifier, and the offset. The computing device may load an in-storage representation of a persistent-export pointer from the persistent memory, map the in-storage representation to a runtime representation, and determine a target address of a persistent external data object based on the runtime representation. The computing device may include a compiler to generate output code including persistent memory pointer and/or persistent-export pointer accesses.
    Type: Grant
    Filed: August 9, 2017
    Date of Patent: May 15, 2018
    Assignee: Intel Corporation
    Inventors: Marcelo S. Cintra, Cheng Wang, Youfeng Wu, Alexandre Xavier DuChateau
  • Patent number: 9940229
    Abstract: Technologies for persistent memory programming include a computing device having a persistent memory including one or more nonvolatile regions. The computing device may assign a virtual memory address of a target location in persistent memory to a persistent memory pointer using persistent pointer strategy, and may dereference the pointer using the same strategy. Persistent pointer strategies include off-holder, ID-in-value, optimistic rectification, and pessimistic rectification. The computing device may log changes to persistent memory during the execution of a data consistency section, and commit changes to the persistent memory when the last data consistency section ends. Data consistency sections may be grouped by log group identifier. Using type metadata stored in the nonvolatile region, the computing device may identify the type of a root object within the nonvolatile region and then recursively identify the type of all objects referenced by the root object. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 25, 2014
    Date of Patent: April 10, 2018
    Assignee: Intel Corporation
    Inventors: Xipeng Shen, Youfeng Wu, Cheng Wang, Hyunchul Park, Hongbo Rong
  • Publication number: 20180074827
    Abstract: A processor for redundant stores includes a front end including circuitry to decode instructions from an instruction stream, a data cache unit including circuitry to cache data for the processor, a binary translator, and a memory execution unit. The binary translator includes circuitry to identify a first region of the instruction stream including a redundant store, mark a first starting instruction of the first region with a protection designator, mark a first ending instruction of the first region with a clear designator, and store an amended instruction stream with the markings. The memory execution unit includes circuitry to track the first redundant store based on the protection designator and the clear designator to eliminate the first redundant store.
    Type: Application
    Filed: September 14, 2016
    Publication date: March 15, 2018
    Inventors: Vineeth Mekkat, Youfeng Wu, Sebastian Winkel, Oleg Margulis
  • Patent number: 9910650
    Abstract: A computer-implemented method for managing loop code in a compiler includes using a conflict detection procedure that detects across-iteration dependency for arrays of single memory addresses to determine whether a potential across-iteration dependency exists for arrays of memory addresses for ranges of memory accessed by the loop code.
    Type: Grant
    Filed: September 25, 2014
    Date of Patent: March 6, 2018
    Assignee: Intel Corporation
    Inventors: Albert Hartono, Nalini Vasudevan, Sara S. Baghsorkhi, Cheng Wang, Youfeng Wu
  • Publication number: 20180060049
    Abstract: Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
    Type: Application
    Filed: June 6, 2017
    Publication date: March 1, 2018
    Inventors: DAVID J. SAGER, RUCHIRA SASANKA, RON GABOR, SHLOMO RAIKIN, JOSEPH NUZMAN, LEEOR PELED, JASON A. DOMER, HO-SEOP KIM, YOUFENG WU, KOICHI YAMADA, TIN-FOOK NGAI, HOWARD H. CHEN, JAYARAM BOBBA, JEFFREY J. COOK, OMAR M. SHAIKH, SURESH SRINIVAS
  • Publication number: 20180018177
    Abstract: An apparatus and method for speculative vectorization. For example, one embodiment of a processor comprises: a queue comprising a set of locations for storing addresses associated with vectorized memory access instructions; and execution logic to execute a first vectorized memory access instruction to access the queue and to compare a new address associated with the first vectorized memory access instruction with existing addresses stored within a specified range of locations within the queue to detect whether a conflict exists, the existing addresses having been previously stored responsive to one or more prior vectorized memory access instructions.
    Type: Application
    Filed: July 18, 2017
    Publication date: January 18, 2018
    Inventors: NALINI VASUDEVAN, CHENG WANG, YOUFENG WU, ALBERT HARTONO, SARA S. BAGHSORKHI
  • Publication number: 20180004524
    Abstract: In an example, there is disclosed an apparatus, including a binary translator (BT) including circuitry to: analyze a code block; determine that an architectural register mapped to a physical register in the physical register file is available for early reclamation; and insert a reclamation hint into the code block. There is also disclosed a processor to reclaim the physical register based at least in part on the reclamation hint.
    Type: Application
    Filed: July 2, 2016
    Publication date: January 4, 2018
    Applicant: Intel Corporation
    Inventors: Vineeth Mekkat, Janghaeng Lee, Youfeng Wu
  • Publication number: 20170351516
    Abstract: A processor includes a front end including circuitry to decode instructions from an instruction stream, a data cache unit including circuitry to cache data for the processor, and a binary translator. The binary translator includes circuitry to identify a redundant store in the instruction stream, mark the start and end of a region of the instruction stream with the redundant store, remove the redundant store, and store an amended instruction stream with the redundant store removed.
    Type: Application
    Filed: June 7, 2016
    Publication date: December 7, 2017
    Inventors: Vineeth Mekkat, Oleg Margulis, Ching-Tsun Chou, Youfeng Wu
  • Patent number: 9836316
    Abstract: Technologies for performing flexible code acceleration on a computing device includes initializing an accelerator virtual device on the computing device. The computing device allocates memory-mapped input and output (I/O) for the accelerator virtual device and also allocates an accelerator virtual device context for a code to be accelerated. The computing device accesses a bytecode of the code to be accelerated and determines whether the bytecode is an operating system-dependent bytecode. If not, the computing device performs hardware acceleration of the bytecode via the memory-mapped I/O using an internal binary translation module. However, if the bytecode is operating system-dependent, the computing device performs software acceleration of the bytecode.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: December 5, 2017
    Assignee: Intel Corporation
    Inventors: Cheng Wang, Youfeng Wu
  • Patent number: 9830196
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to manage concurrent predicate expressions. An example method discloses inserting a first condition hook into a first thread, the first condition hook associated with a first condition, inserting a second condition hook into a second thread, the second condition hook associated with a second condition, preventing the second thread from executing until the first condition is satisfied, and identifying a concurrency violation when the second condition is satisfied.
    Type: Grant
    Filed: August 24, 2015
    Date of Patent: November 28, 2017
    Assignee: INTEL CORPORATION
    Inventors: Justin E. Gottschlich, Cristiano Ligieri Pereira, Gilles Pokam, Youfeng Wu
  • Publication number: 20170337137
    Abstract: Technologies for persistent memory pointer access include a computing device having a persistent memory including one or more nonvolatile regions. The computing device may load a persistent memory pointer having a static region identifier, a segment identifier, and an offset from the persistent memory. The computing device may map the static region identifier to a dynamic region identifier and determine a virtual memory address of the persistent memory pointer target based on the dynamic region identifier, the segment identifier, and the offset. The computing device may load an in-storage representation of a persistent-export pointer from the persistent memory, map the in-storage representation to a runtime representation, and determine a target address of a persistent external data object based on the runtime representation. The computing device may include a compiler to generate output code including persistent memory pointer and/or persistent-export pointer accesses.
    Type: Application
    Filed: August 9, 2017
    Publication date: November 23, 2017
    Inventors: Marcelo S. Cintra, Cheng Wang, Youfeng Wu, Alexandre Xavier DuChateau
  • Patent number: 9817644
    Abstract: An apparatus and method is described herein for conditionally committing and/or speculative checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. And the conditional commit enables efficient execution of the dynamic optimization code, while attempting to prevent transactions from running out of hardware resources. While the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. And processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions.
    Type: Grant
    Filed: September 28, 2015
    Date of Patent: November 14, 2017
    Assignee: Intel Corporation
    Inventors: Mauricio Breternitz, Jr., Youfeng Wu, Cheng Wang, Edson Borin, Shiliang Hu, Craig B. Zilles
  • Patent number: 9767037
    Abstract: Technologies for persistent memory pointer access include a computing device having a persistent memory including one or more nonvolatile regions. The computing device may load a persistent memory pointer having a static region identifier, a segment identifier, and an offset from the persistent memory. The computing device may map the static region identifier to a dynamic region identifier and determine a virtual memory address of the persistent memory pointer target based on the dynamic region identifier, the segment identifier, and the offset. The computing device may load an in-storage representation of a persistent-export pointer from the persistent memory, map the in-storage representation to a runtime representation, and determine a target address of a persistent external data object based on the runtime representation. The computing device may include a compiler to generate output code including persistent memory pointer and/or persistent-export pointer accesses.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: September 19, 2017
    Assignee: Intel Corporation
    Inventors: Marcelo S. Cintra, Cheng Wang, Youfeng Wu, Alexandre Xavier DuChateau
  • Patent number: 9720667
    Abstract: Technologies for automatic loop vectorization include a computing device with an optimizing compiler. During an optimization pass, the compiler identifies a loop and generates a transactional code segment including a vectorized implementation of the loop body including one or more vector memory read instructions capable of generating an exception. The compiler also generates a non-transactional fallback code segment including a scalar implementation of the loop body that is executed in response to an exception generated within the transactional code segment. The compiler may detect whether the loop contains a memory read dependent on a condition that may be updated in a previous iteration or whether the loop contains a potential data dependence between two iterations. The compiler may generate a dynamic check for an actual data dependence and an explicit transactional abort instruction to be executed when an actual data dependence exists. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 21, 2014
    Date of Patent: August 1, 2017
    Assignee: Intel Corporation
    Inventors: Sara S. Baghsorkhi, Albert Hartono, Youfeng Wu, Nalini Vasudevan, Cheng Wang
  • Patent number: 9715388
    Abstract: Logic and instruction to monitor loop trip count are disclosed. Loop trip count information of a loop may be stored in a dedicated hardware buffer. Average loop trip count of the loop may be calculated based on the stored loop trip count information. Based on the average trip count, loop optimizations may be removed from the loop. The stored loop trip count information may include an identifier identifying the loop, a total loop trip count of the loop, and an exit count of the loop.
    Type: Grant
    Filed: March 30, 2012
    Date of Patent: July 25, 2017
    Assignee: Intel Corporation
    Inventors: Jaewoong Chung, Hyunchul Park, Hongbo Rong, Cheng Wang, Youfeng Wu
  • Patent number: 9715376
    Abstract: A method and apparatus for optimizing parallelized single threaded programs is herein described. Code regions, such as dependency chains, are replicated utilizing any known method, such as dynamic code replication. A flow network associated with a replicated code region is built and a minimum cut algorithm is applied to determine duplicated nodes, which may include a single instruction or a group of instructions, to be removed. The dependency of removed nodes is fulfilled with inserted communication to ensure proper data consistency of the original single-threaded program. As a result, both performance and power consumption is optimized for parallel code sections through removal of expensive workload nodes and replacement with communication between other replicated code regions to be executed in parallel.
    Type: Grant
    Filed: December 29, 2008
    Date of Patent: July 25, 2017
    Assignee: Intel Corporation
    Inventors: Cheng Wang, Youfeng Wu
  • Patent number: 9710280
    Abstract: In one embodiment, the present invention includes a processor having a core to execute instructions. This core can include various structures and logic that enable instructions of different atomic regions to be executed in an overlapping manner. To this end, the core can include a register file having registers to store data for use in execution of the instructions, and multiple shadow register files each to store a register checkpoint on initiation of a given atomic region. In this way, overlapping execution of atomic regions identified by a programmer or compiler can occur. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 30, 2011
    Date of Patent: July 18, 2017
    Assignee: Intel Corporation
    Inventors: Jaewoong Chung, Cheng Wang, Youfeng Wu
  • Patent number: 9710391
    Abstract: Various embodiments of the invention concern methods and apparatuses for power and time efficient load handling. A compiler may identify producer loads, consumer reuse loads, consumer forwarded loads, and producer/consumer hybrid loads. Based on this identification, performance of the load may be efficiently directed to a load value buffer, store buffer, data cache, or elsewhere. Consequently, accesses to cache are reduced, through direct loading from load value buffers and store buffers, thereby efficiently processing the loads.
    Type: Grant
    Filed: May 3, 2013
    Date of Patent: July 18, 2017
    Assignee: Intel Corporation
    Inventors: Wei Liu, Youfeng Wu, Christopher Wilkerson, Herbert Hum
  • Patent number: 9710279
    Abstract: An apparatus and method for speculative vectorization. For example, one embodiment of a processor comprises: a queue comprising a set of locations for storing addresses associated with vectorized memory access instructions; and execution logic to execute a first vectorized memory access instruction to access the queue and to compare a new address associated with the first vectorized memory access instruction with existing addresses stored within a specified range of locations within the queue to detect whether a conflict exists, the existing addresses having been previously stored responsive to one or more prior vectorized memory access instructions.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: July 18, 2017
    Assignee: Intel Corporation
    Inventors: Nalini Vasudevan, Cheng Wang, Youfeng Wu, Albert Hartono, Sara S. Baghsorkhi