Patents by Inventor Tsung-Yuan Tai

Tsung-Yuan Tai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230114263
    Abstract: Systems, apparatuses and methods may provide for technology that uses centralized hardware to detect a local allocation request associated with a local thread, detect a remote allocation request associated with a remote thread, wherein the remote allocation request bypasses a remote operating system, and process the local allocation request and the remote allocation request with respect to a central heap, wherein the central heap is shared by the local thread and the remote thread. The local allocation request and the remote allocation request may include one or more of a first request to allocate a memory block of a specified size, a second request to allocate multiple memory blocks of a same size, a third request to resize a previously allocated memory block, or a fourth request to deallocate the previously allocated memory block.
    Type: Application
    Filed: December 13, 2022
    Publication date: April 13, 2023
    Inventors: Ren Wang, Poonam Shidlyali, Tsung-Yuan Tai
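    As a rough illustration of the four request kinds named in this entry's abstract, here is a minimal C sketch (all names hypothetical) of a single dispatch path over a shared heap. The standard C heap stands in for the patent's hardware-managed central heap; local and remote threads would both funnel requests through this one entry point.

    ```c
    #include <stdlib.h>

    /* Hypothetical request kinds mirroring the four cases in the abstract. */
    enum alloc_req_kind {
        ALLOC_BLOCK,    /* allocate a memory block of a specified size      */
        ALLOC_ARRAY,    /* allocate multiple memory blocks of a same size   */
        RESIZE_BLOCK,   /* resize a previously allocated memory block       */
        FREE_BLOCK      /* deallocate the previously allocated memory block */
    };

    struct alloc_req {
        enum alloc_req_kind kind;
        void  *ptr;     /* existing block for RESIZE_BLOCK / FREE_BLOCK */
        size_t count;   /* number of blocks for ALLOC_ARRAY             */
        size_t size;    /* block size in bytes                          */
    };

    /* Process one request against the shared heap, whichever thread
     * (local or remote) submitted it. */
    static void *central_heap_process(struct alloc_req *req)
    {
        switch (req->kind) {
        case ALLOC_BLOCK:  return malloc(req->size);
        case ALLOC_ARRAY:  return calloc(req->count, req->size);
        case RESIZE_BLOCK: return realloc(req->ptr, req->size);
        case FREE_BLOCK:   free(req->ptr); return NULL;
        }
        return NULL;
    }
    ```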
  • Patent number: 11601531
    Abstract: One embodiment provides a network system. The network system includes an application layer to execute one or more networking applications to generate or receive data packets having flow identification (ID) information; and a packet processing layer having profiling circuitry to generate a sketch table indicative of packet flow count data. The sketch table has a plurality of buckets, each bucket including a first section with a plurality of data fields, each data field of the first section to store flow ID and packet count data, and each bucket also having a second section with a plurality of data fields, each data field of the second section to store packet count data.
    Type: Grant
    Filed: December 3, 2019
    Date of Patent: March 7, 2023
    Assignee: Intel Corporation
    Inventors: Ren Wang, Yipeng Wang, Tsung-Yuan Tai
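    The two-section bucket layout in this abstract can be pictured with a small C sketch. The field counts, hash choices, and update policy below are assumptions for illustration; the patent only specifies exact flow-ID-plus-count fields alongside count-only fields.

    ```c
    #include <stdint.h>

    #define NUM_BUCKETS  1024
    #define NUM_TRACKED  4   /* first section: flow ID + count fields */
    #define NUM_COUNTERS 4   /* second section: count-only fields     */

    struct bucket {
        struct { uint32_t flow_id; uint32_t count; } tracked[NUM_TRACKED];
        uint32_t counters[NUM_COUNTERS];   /* shared, no flow ID attached */
    };

    static struct bucket sketch[NUM_BUCKETS];

    /* Count one packet: update an exact per-flow field if the flow already
     * occupies (or can claim) one; otherwise fall back to a count-only
     * field picked by a second (here: trivial) hash. */
    static void sketch_count(uint32_t flow_id)
    {
        struct bucket *b = &sketch[flow_id % NUM_BUCKETS];
        for (int i = 0; i < NUM_TRACKED; i++) {
            if (b->tracked[i].flow_id == flow_id || b->tracked[i].count == 0) {
                b->tracked[i].flow_id = flow_id;
                b->tracked[i].count++;
                return;
            }
        }
        b->counters[(flow_id >> 16) % NUM_COUNTERS]++;
    }
    ```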
  • Patent number: 11500825
    Abstract: Techniques and apparatus for dynamic data access mode processes are described. In one embodiment, for example, an apparatus may include a processor, at least one memory coupled to the processor, the at least one memory comprising an indication of a database and instructions, the instructions, when executed by the processor, to cause the processor to determine a database utilization value for a database, perform a comparison of the database utilization value to at least one utilization threshold, and set an active data access mode to one of a low-utilization data access mode or a high-utilization data access mode based on the comparison. Other embodiments are described.
    Type: Grant
    Filed: August 20, 2018
    Date of Patent: November 15, 2022
    Assignee: Intel Corporation
    Inventors: Ren Wang, Bruce Richardson, Tsung-Yuan Tai, Yipeng Wang, Pablo De Lara Guarch
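    A minimal sketch of the mode-selection step described in this entry, assuming a single utilization threshold expressed as a fraction of capacity (the patent allows for more than one threshold):

    ```c
    #include <stdio.h>

    enum access_mode { MODE_LOW_UTILIZATION, MODE_HIGH_UTILIZATION };

    /* Hypothetical threshold: fraction of capacity in use. */
    static const double UTIL_THRESHOLD = 0.75;

    /* Compare the measured utilization value against the threshold and
     * pick the active data access mode, as in the abstract. */
    static enum access_mode select_mode(double utilization)
    {
        return utilization > UTIL_THRESHOLD ? MODE_HIGH_UTILIZATION
                                            : MODE_LOW_UTILIZATION;
    }

    int main(void)
    {
        printf("util 0.50 -> %s\n",
               select_mode(0.50) == MODE_LOW_UTILIZATION ? "low" : "high");
        printf("util 0.90 -> %s\n",
               select_mode(0.90) == MODE_LOW_UTILIZATION ? "low" : "high");
        return 0;
    }
    ```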
  • Publication number: 20220326999
    Abstract: Apparatuses, methods, and systems for dynamic resource allocation based on quality-of-service prediction are disclosed. In embodiments, an apparatus includes quality-of-service prediction circuitry and a resource controller. The quality-of-service prediction circuitry is to make quality-of-service predictions using a model based at least in part on at least one performance counter measurement and at least one quality-of-service measurement. The resource controller is to allocate one or more shared resources based on the quality-of-service predictions and architectural performance counter measurements.
    Type: Application
    Filed: June 29, 2022
    Publication date: October 13, 2022
    Inventors: Drew Penney, Bin Li, Tsung-Yuan Tai, Anna Drewek-Ossowicka, Rameshkumar Illikkal, Andrew J. Herdrich, Jaroslaw Sydir
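    As a sketch only: the abstract above does not specify the model class, so the fragment below substitutes a fixed-weight linear regressor over performance counter readings, plus a toy controller that grants one more unit of a shared resource (say, a cache way) when predicted quality of service falls short of a target.

    ```c
    #include <stddef.h>

    #define NUM_COUNTERS 3

    /* Hypothetical fixed weights; the patent's model could equally be a
     * trained regressor or neural network over the same inputs. */
    static const double weights[NUM_COUNTERS] = { 0.8, -0.3, 0.1 };
    static const double bias = 100.0;

    /* Predict quality of service from performance counter readings. */
    static double predict_qos(const double counters[NUM_COUNTERS])
    {
        double qos = bias;
        for (size_t i = 0; i < NUM_COUNTERS; i++)
            qos += weights[i] * counters[i];
        return qos;
    }

    /* Toy controller: grant one more unit of a shared resource while
     * the prediction falls short of the target. */
    static int allocate_shared_resource(double predicted_qos, double target,
                                        int current_units, int max_units)
    {
        if (predicted_qos < target && current_units < max_units)
            return current_units + 1;
        return current_units;
    }
    ```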
  • Publication number: 20210406147
    Abstract: An apparatus and method for closed loop dynamic resource allocation.
    Type: Application
    Filed: June 27, 2020
    Publication date: December 30, 2021
    Inventors: Bin Li, Ren Wang, Kshitij Arun Doshi, Francesc Guim Bernat, Yipeng Wang, Ravishankar Iyer, Andrew Herdrich, Tsung-Yuan Tai, Zhu Zhou, Rasika Subramanian
  • Publication number: 20210110269
    Abstract: Neural network dense layer sparsification and matrix compression are disclosed. An example of an apparatus includes one or more processors; a memory to store data for processing, including data for processing of a deep neural network (DNN) including one or more layers, each layer including a plurality of neurons, the one or more processors to perform one or both of sparsification of one or more layers of the DNN, including selecting a subset of the plurality of neurons of a first layer of the DNN for activation based at least in part on locality sensitive hashing of inputs to the first layer; or compression of a weight or activation matrix of one or more layers of the DNN, including detection of sparsity patterns in a matrix of the first layer of the DNN based at least in part on locality sensitive hashing of patterns in the matrix.
    Type: Application
    Filed: December 21, 2020
    Publication date: April 15, 2021
    Applicant: Intel Corporation
    Inventors: Sameh Gobriel, Jesmin Jahan Tithi, Tsung-Yuan Tai
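    The locality-sensitive-hashing step in this entry can be sketched with signed random projections (SimHash); whether the patent uses this particular LSH family is an assumption. Neurons whose weight-vector signature collides with the input's signature are activated; the rest are skipped, which is the sparsification the abstract describes.

    ```c
    #include <stdint.h>
    #include <stdlib.h>

    #define DIM      8   /* toy input dimension       */
    #define SIG_BITS 4   /* hyperplanes per signature */

    static double planes[SIG_BITS][DIM];   /* random hyperplanes */

    /* Fill the hyperplanes with random values; done once at startup. */
    static void lsh_init(void)
    {
        for (int b = 0; b < SIG_BITS; b++)
            for (int d = 0; d < DIM; d++)
                planes[b][d] = (double)rand() / RAND_MAX - 0.5;
    }

    /* Signed-random-projection signature: one bit per hyperplane, set
     * when the projection of v onto that hyperplane is positive. */
    static uint32_t lsh_signature(const double v[DIM])
    {
        uint32_t sig = 0;
        for (int b = 0; b < SIG_BITS; b++) {
            double dot = 0.0;
            for (int d = 0; d < DIM; d++)
                dot += planes[b][d] * v[d];
            if (dot > 0.0)
                sig |= 1u << b;
        }
        return sig;
    }

    /* Activate only neurons whose weight signature collides with the
     * input's signature; the rest of the layer stays inactive. */
    static int select_active(const double input[DIM],
                             const double (*weights)[DIM], int n_neurons,
                             int *active_idx)
    {
        uint32_t in_sig = lsh_signature(input);
        int n_active = 0;
        for (int n = 0; n < n_neurons; n++)
            if (lsh_signature(weights[n]) == in_sig)
                active_idx[n_active++] = n;
        return n_active;
    }
    ```

    Recomputing every neuron's signature per input would defeat the purpose; in practice the weight signatures would be precomputed into hash buckets so that only the input is hashed per query.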
  • Patent number: 10789176
    Abstract: Technologies for least recently used (LRU) cache replacement include a computing device with a processor with vector instruction support. The computing device retrieves a bucket of an associative cache from memory that includes multiple entries arranged from front to back. The bucket may be a 256-bit array including eight 32-bit entries. For lookups, a matching entry is located at a position in the bucket. The computing device executes a vector permutation processor instruction that moves the matching entry to the front of the bucket while preserving the order of other entries of the bucket. For insertion, an inserted entry is written at the back of the bucket. The computing device executes a vector permutation processor instruction that moves the inserted entry to the front of the bucket while preserving the order of other entries. The permuted bucket is stored to the memory. Other embodiments are described and claimed.
    Type: Grant
    Filed: August 9, 2018
    Date of Patent: September 29, 2020
    Assignee: Intel Corporation
    Inventors: Ren Wang, Yipeng Wang, Tsung-Yuan Tai, Cristian Florin Dumitrescu, Xiangyang Guo
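    This entry maps naturally onto the AVX2 cross-lane permute. A minimal sketch of the move-to-front step over an eight-entry, 256-bit bucket using `_mm256_permutevar8x32_epi32`; building the index vector on the fly (rather than from a precomputed table keyed by position) is a simplification.

    ```c
    #include <immintrin.h>   /* AVX2; compile with -mavx2 */
    #include <stdint.h>
    #include <stdio.h>

    /* Move the entry at `pos` to the front of an eight-entry bucket with
     * one cross-lane permute, preserving the order of the other entries. */
    static __m256i lru_move_to_front(__m256i bucket, int pos)
    {
        /* Result lane 0 takes source lane `pos`; lanes 1..pos shift back
         * by one; lanes pos+1..7 are untouched. */
        uint32_t idx[8];
        idx[0] = (uint32_t)pos;
        for (int i = 1; i <= pos; i++) idx[i] = (uint32_t)(i - 1);
        for (int i = pos + 1; i < 8; i++) idx[i] = (uint32_t)i;
        return _mm256_permutevar8x32_epi32(
            bucket, _mm256_loadu_si256((const __m256i *)idx));
    }

    int main(void)
    {
        uint32_t entries[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
        __m256i bucket = _mm256_loadu_si256((const __m256i *)entries);
        bucket = lru_move_to_front(bucket, 5);    /* lookup hit on 15 */
        _mm256_storeu_si256((__m256i *)entries, bucket);
        for (int i = 0; i < 8; i++)
            printf("%u ", entries[i]);            /* 15 10 11 12 13 14 16 17 */
        printf("\n");
        return 0;
    }
    ```

    Since `pos` only takes values 0 through 7, the eight possible index vectors could be precomputed once rather than rebuilt per lookup.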
  • Patent number: 10719442
    Abstract: An apparatus and method for prioritizing transactional memory regions. For example, one embodiment of a processor comprises: a plurality of cores to execute threads comprising sequences of instructions, at least some of the instructions specifying a transactional memory region; a cache of each core to store a plurality of cache lines; transactional memory circuitry of each core to manage execution of the transactional memory (TM) regions based on priorities associated with each of the TM regions; and wherein the transactional memory circuitry, upon detecting a conflict between a first TM region having a first priority value and a second TM region having a second priority value, is to determine which of the first TM region or the second TM region is permitted to continue executing and which is to be aborted based, at least in part, on the first and second priority values.
    Type: Grant
    Filed: September 10, 2018
    Date of Patent: July 21, 2020
    Assignee: Intel Corporation
    Inventors: Ren Wang, Raanan Sade, Yipeng Wang, Tsung-Yuan Tai, Sameh Gobriel
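    A sketch of the priority-based conflict resolution described above, assuming only that a higher value means higher priority and that ties go to the first region; in the patent this policy lives in per-core transactional memory circuitry.

    ```c
    #include <stdio.h>

    struct tm_region { int id; int priority; };

    /* On a conflict, the higher-priority region continues and the other
     * aborts; ties go to the first region. */
    static const struct tm_region *
    tm_resolve_conflict(const struct tm_region *a, const struct tm_region *b,
                        const struct tm_region **aborted)
    {
        if (b->priority > a->priority) { *aborted = a; return b; }
        *aborted = b;
        return a;
    }

    int main(void)
    {
        struct tm_region r1 = { 1, 5 }, r2 = { 2, 9 };
        const struct tm_region *loser;
        const struct tm_region *winner = tm_resolve_conflict(&r1, &r2, &loser);
        printf("region %d continues, region %d aborts\n", winner->id, loser->id);
        return 0;
    }
    ```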
  • Patent number: 10623311
    Abstract: Technologies for distributed table lookup via a distributed router include an ingress computing node, an intermediate computing node, and an egress computing node. Each computing node of the distributed router includes a forwarding table to store a different set of network routing entries obtained from a routing table of the distributed router. The ingress computing node generates a hash key based on the destination address included in a received network packet. The hash key identifies the intermediate computing node of the distributed router that stores the forwarding table that includes a network routing entry corresponding to the destination address. The ingress computing node forwards the received network packet to the intermediate computing node for routing. The intermediate computing node receives the forwarded network packet, determines a destination address of the network packet, and determines the egress computing node for transmission of the network packet from the distributed router.
    Type: Grant
    Filed: September 27, 2017
    Date of Patent: April 14, 2020
    Assignee: Intel Corporation
    Inventors: Sameh Gobriel, Ren Wang, Christian Maciocco, Tsung-Yuan Tai
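    The routing idea in this entry reduces to hashing the destination address to pick the node that owns the relevant forwarding-table partition. FNV-1a below is a stand-in; the patent does not name a hash function.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_NODES 16   /* computing nodes in the distributed router */

    /* FNV-1a, standing in for whatever hash the router actually uses. */
    static uint32_t fnv1a(const void *data, size_t len)
    {
        const uint8_t *p = data;
        uint32_t h = 2166136261u;
        while (len--) { h ^= *p++; h *= 16777619u; }
        return h;
    }

    /* The ingress node does no route lookup itself: the hash of the
     * destination address names the intermediate node whose forwarding
     * table partition holds the matching entry. */
    static unsigned owner_node(uint32_t dest_addr)
    {
        return fnv1a(&dest_addr, sizeof dest_addr) % NUM_NODES;
    }

    int main(void)
    {
        uint32_t dest = 0x0a000001u;   /* 10.0.0.1 */
        printf("forward packet for %#x via node %u\n", dest, owner_node(dest));
        return 0;
    }
    ```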
  • Publication number: 20200106867
    Abstract: One embodiment provides a network system. The network system includes an application layer to execute one or more networking applications to generate or receive data packets having flow identification (ID) information; and a packet processing layer having profiling circuitry to generate a sketch table indicative of packet flow count data. The sketch table has a plurality of buckets, each bucket including a first section with a plurality of data fields, each data field of the first section to store flow ID and packet count data, and each bucket also having a second section with a plurality of data fields, each data field of the second section to store packet count data.
    Type: Application
    Filed: December 3, 2019
    Publication date: April 2, 2020
    Applicant: Intel Corporation
    Inventors: Ren Wang, Yipeng Wang, Tsung-Yuan Tai
  • Publication number: 20200104259
    Abstract: A snapshot prefetcher performs snapshot prefetching to improve the performance of snapshot read operations. An apparatus embodiment includes snapshot read tracking circuitry to track snapshot read requests made by a first processor core to read a plurality of cache lines, and to detect a snapshot read access stream based on the tracked snapshot read requests. Snapshot prefetch issuing circuitry of the apparatus is to issue, based on the detected snapshot read access stream, one or more snapshot prefetch requests, including a first snapshot prefetch request to prefetch data from a first cache line stored in, and owned exclusively by, a first storage location outside the first processor core. The snapshot prefetch issuing circuitry is further to store the prefetched data in a second storage location within the first processor core, wherein after the prefetch, exclusive ownership of the first cache line is to remain with the first storage location.
    Type: Application
    Filed: September 28, 2018
    Publication date: April 2, 2020
    Inventors: Ren Wang, Lawrence C. Stewart, Binh Pham, Andrew Herdrich, Venkata Krishnan, Anil Vasudevan, Joseph Nuzman, Tsung-Yuan Tai
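    A software sketch of the stream-detection half of this entry, assuming a simple constant-stride detector with a confirmation count; the actual prefetch (reading a line without taking ownership from its exclusive owner) is hardware behavior and is stubbed out here.

    ```c
    #include <stdint.h>

    #define CONFIRM_HITS   3   /* repeats of a stride before it is a stream */
    #define PREFETCH_AHEAD 4   /* lines to prefetch once confirmed          */

    struct snapshot_tracker {
        uint64_t last_addr;
        int64_t  stride;
        int      hits;
    };

    /* Stub: the real snapshot prefetch reads the line while leaving
     * exclusive ownership with the remote storage location. */
    static void issue_snapshot_prefetch(uint64_t addr) { (void)addr; }

    /* Feed each snapshot read into the tracker; once the same nonzero
     * stride repeats CONFIRM_HITS times, prefetch ahead along the stream. */
    static void track_snapshot_read(struct snapshot_tracker *t, uint64_t addr)
    {
        int64_t stride = (int64_t)(addr - t->last_addr);
        if (stride != 0 && stride == t->stride) {
            if (++t->hits >= CONFIRM_HITS)
                for (int i = 1; i <= PREFETCH_AHEAD; i++)
                    issue_snapshot_prefetch(addr + (uint64_t)(stride * i));
        } else {
            t->stride = stride;
            t->hits = 0;
        }
        t->last_addr = addr;
    }
    ```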
  • Publication number: 20200081835
    Abstract: An apparatus and method for prioritizing transactional memory regions. For example, one embodiment of a processor comprises: a plurality of cores to execute threads comprising sequences of instructions, at least some of the instructions specifying a transactional memory region; a cache of each core to store a plurality of cache lines; transactional memory circuitry of each core to manage execution of the transactional memory (TM) regions based on priorities associated with each of the TM regions; and wherein the transactional memory circuitry, upon detecting a conflict between a first TM region having a first priority value and a second TM region having a second priority value, is to determine which of the first TM region or the second TM region is permitted to continue executing and which is to be aborted based, at least in part, on the first and second priority values.
    Type: Application
    Filed: September 10, 2018
    Publication date: March 12, 2020
    Inventors: Ren Wang, Raanan Sade, Yipeng Wang, Tsung-Yuan Tai, Sameh Gobriel
  • Patent number: 10567510
    Abstract: Various embodiments are generally directed to techniques for improving the efficiency of exchanging packets between pairs of VMs within a communications server. An apparatus may include a processor component; a network interface to couple the processor component to a network; a virtual switch to analyze contents of at least one packet of a set of packets to be exchanged between endpoint devices through the network and the communications server, and to route the set of packets through one or more virtual servers of multiple virtual servers based on the contents; and a transfer component of a first virtual server of the multiple virtual servers to determine whether to route the set of packets to the virtual switch or to transfer the set of packets to a second virtual server of the multiple virtual servers in a manner that bypasses the virtual switch based on a routing rule.
    Type: Grant
    Filed: October 2, 2017
    Date of Patent: February 18, 2020
    Assignee: Intel Corporation
    Inventors: Mesut A. Ergin, Jr-Shian Tsai, Janet Tseng, Ren Wang, Jun Nakajima, Tsung-Yuan Tai
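    The bypass decision in this entry can be pictured as a rule-table lookup. The rule fields below are hypothetical; the abstract only says a routing rule governs whether a packet set skips the virtual switch.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical rule: traffic between two co-resident virtual servers
     * may be handed over directly, skipping the virtual switch. */
    struct routing_rule {
        uint32_t src_vm;
        uint32_t dst_vm;
        bool     allow_bypass;
    };

    static bool may_bypass(const struct routing_rule *rules, size_t n,
                           uint32_t src_vm, uint32_t dst_vm)
    {
        for (size_t i = 0; i < n; i++)
            if (rules[i].src_vm == src_vm && rules[i].dst_vm == dst_vm)
                return rules[i].allow_bypass;
        return false;   /* no matching rule: route through the virtual switch */
    }
    ```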
  • Patent number: 10356012
    Abstract: Various embodiments are generally directed to techniques for improving the efficiency of exchanging packets among multiple VMs within a communications server, and between the communications server and other devices in a communications system. An apparatus may include a virtual switch to analyze contents of at least one packet of a set of packets to be exchanged between endpoint devices through a network, and to correlate the contents to a pathway to extend through one or more of the VMs that are each configured as virtual servers of multiple virtual servers; and an interface control component to select at least one virtual network interface of each of the one or more virtual servers along the pathway to operate in a polling mode, and to select a virtual network interface of at least one virtual server of the multiple virtual servers not along the pathway to operate in a non-polling mode.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: July 16, 2019
    Assignee: Intel Corporation
    Inventors: Alexander W. Min, Tsung-Yuan Tai, Ren Wang, Mesut A. Ergin, Jr-Shian Tsai
  • Patent number: 10216668
    Abstract: Technologies for a distributed hardware queue manager include a compute device having a processor. The processor includes two or more hardware queue managers as well as two or more processor cores. Each processor core can enqueue data to or dequeue data from a hardware queue manager. Each hardware queue manager can be configured to contain several queue data structures. In some embodiments, the queues are addressed by the processor cores using virtual queue addresses, which are translated into physical queue addresses for accessing the corresponding hardware queue manager. The virtual queues can be moved from one physical queue in one hardware queue manager to a different physical queue in a different hardware queue manager without changing the virtual address of the virtual queue.
    Type: Grant
    Filed: March 31, 2016
    Date of Patent: February 26, 2019
    Assignee: Intel Corporation
    Inventors: Ren Wang, Yipeng Wang, Jr-Shian Tsai, Andrew Herdrich, Tsung-Yuan Tai, Niall McDonnell, Stephen Van Doren, David Sonnier, Debra Bernstein, Hugh Wilkinson, Narender Vangati, Stephen Miller, Gage Eads, Andrew Cunningham, Jonathan Kenny, Bruce Richardson, William Burroughs, Joseph Hasting, An Yan, James Clee, Te Ma, Jerry Pirog, Jamison Whitesell
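    The virtual-to-physical queue translation is the interesting part of this entry: the mapping table is the only state that changes when a virtual queue migrates between hardware queue managers. A minimal sketch, with table sizes and field widths chosen arbitrarily:

    ```c
    #include <stdint.h>

    #define NUM_VIRTUAL_QUEUES 256

    /* A virtual queue address resolves to a (queue manager, physical
     * queue) pair; cores only ever see the virtual address. */
    struct phys_queue { uint8_t manager; uint8_t queue; };

    static struct phys_queue vq_table[NUM_VIRTUAL_QUEUES];

    static struct phys_queue translate(uint8_t vq) { return vq_table[vq]; }

    /* Migrate a virtual queue to a different hardware queue manager by
     * rewriting only the mapping; the virtual address never changes. */
    static void migrate(uint8_t vq, uint8_t new_manager, uint8_t new_queue)
    {
        vq_table[vq].manager = new_manager;
        vq_table[vq].queue   = new_queue;
    }
    ```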
  • Publication number: 20190042304
    Abstract: Methods, apparatus, systems, and software for architectures and mechanisms to accelerate tuple-space search with integrated GPUs (Graphics Processing Units). One of the architectures employs GPU-side lookup table sorting, under which local and global hit count histograms are maintained for work groups, and sub-tables containing rules for tuple matching are re-sorted based on the relative hit rates of the different sub-tables. Under a second architecture, two levels of parallelism are implemented: packet-level parallelism and lookup-table parallelism. Under a third architecture, dynamic two-level parallel processing with pre-screening is implemented. Adaptive decision-making mechanisms are also disclosed to select which architecture is optimal in view of multiple considerations, including application preferences, offered throughput, and available GPU resources.
    Type: Application
    Filed: December 3, 2017
    Publication date: February 7, 2019
    Applicant: Intel Corporation
    Inventors: Ren Wang, Janet Tseng, Jr-Shian Tsai, Tsung-Yuan Tai
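    The lookup-table sorting in this entry reduces to reordering sub-tables by observed hit counts so the hottest one is probed first. A host-side C sketch using `qsort` (the patent maintains the histograms per work group on the GPU):

    ```c
    #include <stdint.h>
    #include <stdlib.h>

    struct sub_table {
        int      id;     /* which set of tuple-matching rules this holds */
        uint64_t hits;   /* entry from the global hit-count histogram    */
    };

    /* Descending order by hit count. */
    static int by_hits_desc(const void *a, const void *b)
    {
        uint64_t ha = ((const struct sub_table *)a)->hits;
        uint64_t hb = ((const struct sub_table *)b)->hits;
        return (ha < hb) - (ha > hb);
    }

    /* Re-sort the probe order so lookups try the sub-tables most likely
     * to match first, cutting the average probes per packet. */
    static void resort_sub_tables(struct sub_table *tables, size_t n)
    {
        qsort(tables, n, sizeof *tables, by_hits_desc);
    }
    ```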
  • Publication number: 20190042602
    Abstract: Techniques and apparatus for dynamic data access mode processes are described. In one embodiment, for example, an apparatus may include a processor, at least one memory coupled to the processor, the at least one memory comprising an indication of a database and instructions, the instructions, when executed by the processor, to cause the processor to determine a database utilization value for a database, perform a comparison of the database utilization value to at least one utilization threshold, and set an active data access mode to one of a low-utilization data access mode or a high-utilization data access mode based on the comparison. Other embodiments are described.
    Type: Application
    Filed: August 20, 2018
    Publication date: February 7, 2019
    Inventors: Ren Wang, Bruce Richardson, Tsung-Yuan Tai, Yipeng Wang, Pablo De Lara Guarch
  • Publication number: 20190044869
    Abstract: Technologies for classifying network flows using adaptive virtual routing include a network appliance with one or more processors. The network appliance is configured to identify a set of candidate classification algorithms from a plurality of classification algorithm designs to perform a flow classification operation and deploy each of the candidate classification algorithms to a processor. Additionally, the network appliance is configured to monitor a performance level of each of the deployed candidate classification algorithms and identify the deployed candidate classification algorithm with the highest performance level. The network appliance is further configured to deploy the identified candidate classification algorithm with the highest performance level on each of the one or more processors that are configured to perform the flow classification operation. Other embodiments are described herein.
    Type: Application
    Filed: August 17, 2018
    Publication date: February 7, 2019
    Inventors: Yipeng Wang, Ren Wang, Janet Tseng, Jr-Shian Tsai, Tsung-Yuan Tai
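    The selection step in this entry is an argmax over measured performance. A sketch, assuming throughput in packets per second is the performance level being compared:

    ```c
    #include <stddef.h>
    #include <stdint.h>

    struct candidate {
        int      algorithm_id;   /* index into the design library        */
        uint64_t pkts_per_sec;   /* throughput measured during the trial */
    };

    /* After trialing each candidate on its own processor, pick the one
     * with the highest measured performance level for full deployment. */
    static int best_candidate(const struct candidate *c, size_t n)
    {
        size_t best = 0;
        for (size_t i = 1; i < n; i++)
            if (c[i].pkts_per_sec > c[best].pkts_per_sec)
                best = i;
        return c[best].algorithm_id;
    }
    ```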
  • Publication number: 20190042471
    Abstract: Technologies for least recently used (LRU) cache replacement include a computing device with a processor with vector instruction support. The computing device retrieves a bucket of an associative cache from memory that includes multiple entries arranged from front to back. The bucket may be a 256-bit array including eight 32-bit entries. For lookups, a matching entry is located at a position in the bucket. The computing device executes a vector permutation processor instruction that moves the matching entry to the front of the bucket while preserving the order of other entries of the bucket. For insertion, an inserted entry is written at the back of the bucket. The computing device executes a vector permutation processor instruction that moves the inserted entry to the front of the bucket while preserving the order of other entries. The permuted bucket is stored to the memory. Other embodiments are described and claimed.
    Type: Application
    Filed: August 9, 2018
    Publication date: February 7, 2019
    Inventors: Ren Wang, Yipeng Wang, Tsung-Yuan Tai, Cristian Florin Dumitrescu, Xiangyang Guo
  • Publication number: 20180270309
    Abstract: Various embodiments are generally directed to techniques for improving the efficiency of exchanging packets between pairs of VMs within a communications server. An apparatus may include a processor component; a network interface to couple the processor component to a network; a virtual switch to analyze contents of at least one packet of a set of packets to be exchanged between endpoint devices through the network and the communications server, and to route the set of packets through one or more virtual servers of multiple virtual servers based on the contents; and a transfer component of a first virtual server of the multiple virtual servers to determine whether to route the set of packets to the virtual switch or to transfer the set of packets to a second virtual server of the multiple virtual servers in a manner that bypasses the virtual switch based on a routing rule.
    Type: Application
    Filed: October 2, 2017
    Publication date: September 20, 2018
    Applicant: INTEL CORPORATION
    Inventors: Mesut A. Ergin, Jr-Shian Tsai, Janet Tseng, Ren Wang, Jun Nakajima, Tsung-Yuan Tai