Patents by Inventor Youfeng Wu

Youfeng Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9697040
    Abstract: A system is disclosed that includes a processor and a dynamic random access memory (DRAM). The processor includes a hybrid transactional memory (HyTM) that includes hardware transactional memory (HTM), and a program debugger to replay a program that includes an HTM instruction and that has been executed has been executed using the HyTM. The program debugger includes a software emulator that is to replay the HTM instruction by emulation of the HTM. Other embodiments are disclosed and claimed.
    Type: Grant
    Filed: March 26, 2014
    Date of Patent: July 4, 2017
    Assignee: Intel Corporation
    Inventors: Justin E. Gottschlich, Gilles A. Pokam, Shiliang Hu, Rolf Kassa, Youfeng Wu, Irina Calciu
  • Publication number: 20170185404
    Abstract: Embodiments of apparatus and methods for detecting and recovering from incorrect memory dependence speculation in an out-of-order processor are described herein. For example, one embodiment of a method comprises: executing a first load instruction; detecting when the first load instruction experiences a bad store-to-load forwarding event during execution; tracking the occurrences of bad store-to-load forwarding event experienced by the first load instruction during execution; controlling enablement of an S-bit in the first load instruction based on the tracked occurrences; generating a plurality of load operations responsive to an enabled S-bit in first load instruction, wherein execution of the plurality of load operations produces a result equivalent to that from the execution of the first load instruction.
    Type: Application
    Filed: December 26, 2015
    Publication date: June 29, 2017
    Inventors: Vineeth Mekkat, Oleg Margulis, Jason M. Agron, Ethan Schuchman, Sebastian Winkel, Youfeng Wu, Gisle Dankel
  • Patent number: 9690552
    Abstract: Technologies for generating composable library functions include a first computing device that includes a library compiler configured to compile a composable library and second computing device that includes an application compiler configured to compose library functions of the composable library based on a plurality of abstractions written at different levels of abstractions. For example, the abstractions may include an algorithm abstraction at a high level, a blocked-algorithm abstraction at medium level, and a region-based code abstraction at a low level. Other embodiments are described and claimed herein.
    Type: Grant
    Filed: December 27, 2014
    Date of Patent: June 27, 2017
    Assignee: Intel Corporation
    Inventors: Hongbo Rong, Peng Tu, Tatiana Shpeisman, Hai Liu, Todd A. Anderson, Youfeng Wu, Paul M. Petersen, Victor W. Lee, P. G. Lowney, Arch D. Robison, Cheng Wang
  • Patent number: 9690582
    Abstract: A processor includes a decoder to decode an instruction, a scheduler to schedule the instruction, and an execution unit to execute the instruction. The instruction is to load a memory operation applicable to a quantity of addresses into an execution vector. The execution vector includes a plurality of vector positions for respective addressees. The instruction is further to evaluate, for a given address in the execution vector at a vector position, whether a cache indicates that a previous memory operation was performed at a higher vector position than the vector position of the given address. The instruction is also to determine, based on the evaluation whether the cache indicates that the previous memory operation was performed at a higher vector position than the vector position of the given address, whether the memory operation will cause a memory error.
    Type: Grant
    Filed: December 30, 2013
    Date of Patent: June 27, 2017
    Assignee: Intel Corporation
    Inventors: Nalini Vasudevan, Youfeng Wu, Cheng Wang, Sara Baghsorkhi, Albert Hartono
  • Patent number: 9672019
    Abstract: Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
    Type: Grant
    Filed: December 25, 2010
    Date of Patent: June 6, 2017
    Assignee: Intel Corporation
    Inventors: David J. Sager, Ruchira Sasanka, Ron Gabor, Shlomo Raikin, Joseph Nuzman, Leeor Peled, Jason A. Domer, Ho-Seop Kim, Youfeng Wu, Koichi Yamada, Tin-Fook Ngai, Howard H. Chen, Jayaram Bobba, Jeffery J. Cook, Omar M. Shaikh, Suresh Srinivas
  • Patent number: 9588814
    Abstract: The present disclosure is directed to fast approximate conflict detection. A device may comprise, for example, a memory, a processor and a fast conflict detection module (FCDM) to cause the processor to perform fast conflict detection. The FCDM may cause the processor to read a first and second vector from memory, and to then generate summaries based on the first and second vectors. The summaries may be, for example, shortened versions of write and read addresses in the first and second vectors. The FCDM may then cause the processor to distribute the summaries into first and second summary vectors, and may then determine potential conflicts between the first and second vectors by comparing the first and second summary vectors. The summaries may be distributed into the first and second summary vectors in a manner allowing all of the summaries to be compared to each other in one vector comparison transaction.
    Type: Grant
    Filed: December 24, 2014
    Date of Patent: March 7, 2017
    Assignee: Intel Corporation
    Inventors: Sara S. Baghsorkhi, Albert Hartono, Youfeng Wu, Cheng Wang
  • Patent number: 9542211
    Abstract: In an embodiment, a processor includes at least one core and a dynamic language accelerator to execute a bytecode responsive to a memory mapped input/output (MMIO) operation on a file descriptor associated with the dynamic language accelerator. The processor may block execution of native code while the dynamic language accelerator executes the bytecode. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 26, 2014
    Date of Patent: January 10, 2017
    Assignee: Intel Corporation
    Inventors: Cheng Wang, Youfeng Wu, Hongbo Rong, Hyunchul Park
  • Publication number: 20160378679
    Abstract: Technologies for persistent memory pointer access include a computing device having a persistent memory including one or more nonvolatile regions. The computing device may load a persistent memory pointer having a static region identifier, a segment identifier, and an offset from the persistent memory. The computing device may map the static region identifier to a dynamic region identifier and determine a virtual memory address of the persistent memory pointer target based on the dynamic region identifier, the segment identifier, and the offset. The computing device may load an in-storage representation of a persistent-export pointer from the persistent memory, map the in-storage representation to a runtime representation, and determine a target address of a persistent external data object based on the runtime representation. The computing device may include a compiler to generate output code including persistent memory pointer and/or persistent-export pointer accesses.
    Type: Application
    Filed: June 26, 2015
    Publication date: December 29, 2016
    Inventors: Marcelo S. Cintra, Cheng Wang, Youfeng Wu, Alexandre Xavier DuChateau
  • Patent number: 9519467
    Abstract: A method and apparatus for efficient and consistent validation/conflict detection in a Software Transactional Memory (STM) system is herein described. A version check barrier is inserted after a load to compare versions of loaded values before and after the load. In addition, a global timestamp (GTS) is utilized to track a latest committed transaction. Each transaction is associated with a local timestamp (LTS) initialized to the GTS value at the start of a transaction. As a transaction commits it updates the GTS to a new value and sets versions of modified locations to the new value. Pending transactions compare versions determined in read barriers to their LTS. If the version is greater than their LTS indicating another transaction has committed after the pending transaction started and initialized the LTS, then the pending transaction validates its read set to maintain efficient and consistent transactional execution.
    Type: Grant
    Filed: September 27, 2011
    Date of Patent: December 13, 2016
    Assignee: Intel Corporation
    Inventors: Cheng Wang, Youfeng Wu, Wei-Yu Chen, Bratin Saha, Ali Reza Adl-Tabatabai
  • Patent number: 9501135
    Abstract: Dynamically switching cores on a heterogeneous multi-core processing system may be performed by executing program code on a first processing core. Power up of a second processing core may be signaled. A first performance metric of the first processing core executing the program code may be collected. When the first performance metric is better than a previously determined core performance metric, power down of the second processing core may be signaled and execution of the program code may be continued on the first processing core. When the first performance metric is not better than the previously determined core performance metric, execution of the program code may be switched from the first processing core to the second processing core.
    Type: Grant
    Filed: January 31, 2014
    Date of Patent: November 22, 2016
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Shiliang Hu, Edson Borin, Cheng Wang
  • Patent number: 9495168
    Abstract: In an embodiment, a system includes a processor including one or more cores and a plurality of alias registers to store memory range information associated with a plurality of operations of a loop. The memory range information references one or more memory locations within a memory. The system also includes register assignment means for assigning each of the alias registers to a corresponding operation of the loop, where the assignments are made according to a rotation schedule, and one of the alias registers is assigned to a first operation in a first iteration of the loop and to a second operation in a subsequent iteration of the loop. The system also includes the memory coupled to the processor. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 30, 2013
    Date of Patent: November 15, 2016
    Assignee: Intel Corporation
    Inventors: Hongbo Rong, Cheng Wang, Hyunchul Park, Youfeng Wu
  • Patent number: 9405547
    Abstract: A system may comprise an optimizer/scheduler to schedule on a set of instructions, compute a data dependence, a checking constraint and/or an anti-checking constraint for the set of scheduled instructions, and allocate alias registers for the set of scheduled instructions based on the data dependence, the checking constraint and/or the anti-checking constraint. In one embodiment, the optimizer is to release unused registers to reduce the alias registers used to protect the scheduled instructions. The optimizer is further to insert a dummy instruction after a fused instruction to break cycles in the checking and anti-checking constraints.
    Type: Grant
    Filed: April 7, 2011
    Date of Patent: August 2, 2016
    Assignee: INTEL CORPORATION
    Inventors: Cheng Wang, Youfeng Wu
  • Publication number: 20160188392
    Abstract: The present disclosure is directed to fast approximate conflict detection. A device may comprise, for example, a memory, a processor and a fast conflict detection module (FCDM) to cause the processor to perform fast conflict detection. The FCDM may cause the processor to read a first and second vector from memory, and to then generate summaries based on the first and second vectors. The summaries may be, for example, shortened versions of write and read addresses in the first and second vectors. The FCDM may then cause the processor to distribute the summaries into first and second summary vectors, and may then determine potential conflicts between the first and second vectors by comparing the first and second summary vectors. The summaries may be distributed into the first and second summary vectors in a manner allowing all of the summaries to be compared to each other in one vector comparison transaction.
    Type: Application
    Filed: December 24, 2014
    Publication date: June 30, 2016
    Inventors: SARA S. BAGHSORKHI, ALBERT HARTONO, YOUFENG WU, CHENG WANG
  • Publication number: 20160188382
    Abstract: Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode and an operand to store a portion of a fallback address, execution hardware to execute the decoded instruction to initiate a data speculative execution (DSX) region by activating DSX tracking hardware to track speculative memory accesses and detect ordering violations in the DSX region, and storing the fallback address.
    Type: Application
    Filed: December 24, 2014
    Publication date: June 30, 2016
    Inventors: Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES, Robert VALENTINE, Milind B. GIRKAR, Hideki IDO, Youfeng WU, Cheng WANG
  • Publication number: 20160188305
    Abstract: Technologies for generating composable library functions include a first computing device that includes a library compiler configured to compile a composable library and second computing device that includes an application compiler configured to compose library functions of the composable library based on a plurality of abstractions written at different levels of abstractions. For example, the abstractions may include an algorithm abstraction at a high level, a blocked-algorithm abstraction at medium level, and a region-based code abstraction at a low level. Other embodiments are described and claimed herein.
    Type: Application
    Filed: December 27, 2014
    Publication date: June 30, 2016
    Inventors: Hongbo Rong, Peng Tu, Tatiana Shpeisman, Hai Liu, Todd A. Anderson, Youfeng Wu, Arthur N. Glew, Paul M. PetersEn, Victor W. Lee, P.G. Lowney, Arch D. Robinson, Cheng Wang
  • Publication number: 20160179586
    Abstract: Embodiments described herein utilize restricted transactional memory (RTM) instructions to implement speculative compile time optimizations that will be automatically rolled back by hardware in the event of a missed speculation. In one embodiment, a lightweight version of RTM for speculative compiler optimization is described to provide lower operational overhead in comparison to conventional RTM implementations used when performing SLE.
    Type: Application
    Filed: December 17, 2014
    Publication date: June 23, 2016
    Inventors: Cheng Wang, Youfeng Wu, Sara S. Baghsorkhi, Albert Hartono, Robert Valentine
  • Publication number: 20160179550
    Abstract: In one embodiment vector conflict detection instructions are disclosed to perform dynamic memory conflict detection within a vectorized iterative scalar operation. The instructions may be performed by a vector processor to generate a partition vector identifying groups of conflict free iterations. The partition vector may be used to generate a write mask for subsequent vector operations.
    Type: Application
    Filed: December 23, 2014
    Publication date: June 23, 2016
    Inventors: CHENG WANG, ALBERT HARTONO, SARA S. BAGHSORKHI, YOUFENG WU
  • Patent number: 9354882
    Abstract: Example methods and apparatus to manage partial commit-checkpoints are disclosed. A disclosed example method includes identifying a commit instruction associated with a region of instructions executed by a processor, identifying candidate instructions from the region of instructions, and generating a processor partial commit-checkpoint to save a current state of the processor, the checkpoint based on calculated register values associated with live instructions, and including instruction reference addresses to link the candidate instructions.
    Type: Grant
    Filed: September 30, 2013
    Date of Patent: May 31, 2016
    Assignee: Intel Corporation
    Inventors: Edson Borin, Youfeng Wu
  • Patent number: 9342303
    Abstract: A system and method to enhance execution of architected instructions in a processor uses auxiliary code to optimize execution of base microcode. An execution context of the architected instructions may be profiled to detect potential optimizations, resulting in generation and storage of auxiliary microcode. When the architected instructions are decoded to base microcode for execution, the base microcode may be enhanced or modified using retrieved auxiliary code.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: May 17, 2016
    Assignee: Intel Corporation
    Inventors: James E. Smith, Denis M. Khartikov, Shiliang Hu, Youfeng Wu
  • Publication number: 20160116965
    Abstract: Dynamically switching cores on a heterogeneous multi-core processing system may be performed by executing program code on a first processing core. Power up of a second processing core may be signaled. A first performance metric of the first processing core executing the program code may be collected. When the first performance metric is better than a previously determined core performance metric, power down of the second processing core may be signaled and execution of the program code may be continued on the first processing core. When the first performance metric is not better than the previously determined core performance metric, execution of the program code may be switched from the first processing core to the second processing core.
    Type: Application
    Filed: January 2, 2016
    Publication date: April 28, 2016
    Inventors: Youfeng Wu, Shiliang Hu, Edson Borin, Cheng Wang