Patents by Inventor Guansong Zhang
Guansong Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9218186
Abstract: A computer-implemented method for creating a threaded package of computer executable instructions from software compiler generated code includes allocating, through a computer processor, the computer executable instructions into a plurality of stacks, differentiating between different types of computer executable instructions for each computer executable instruction allocated to each stack of the plurality of stacks, creating switch points for each stack of the plurality of stacks based upon the differentiating, and inserting the switch points within each stack of the plurality of stacks.
Type: Grant
Filed: September 1, 2011
Date of Patent: December 22, 2015
Assignee: International Business Machines Corporation
Inventors: Raul E. Silvera, Guansong Zhang, Yue Zhao
-
Patent number: 9112625
Abstract: A method and apparatus for emulating stream clock signal in asynchronous data transmission. The inventive subject matter proposes a system consisting of a transmitter module, a receiver module, and a link or network in between. A scheme to generate the emulated stream clock across a wide frequency range is also proposed with the property of controllable deviation from the original stream frequency to meet jitter requirement and fast frequency convergence (minimal number of converging steps). The scheme includes an optional first step to derive a frequency estimation of the stream clock and a second step of continuous adjusting the emulated clock frequency to keep the average frequency equals that of the original stream clock.
Type: Grant
Filed: June 21, 2010
Date of Patent: August 18, 2015
Inventors: Guansong Zhang, Tsung-Yi Yang, Cathy Zhang
-
Publication number: 20150110125
Abstract: Methods and systems for a more efficient transmission of network traffic are provided. According to one embodiment, payload data originated by a user process running on a host processor of the computer system is fetched by an interface of the computer system by performing direct virtual memory addressing of a user memory space of a system memory of the computer system on behalf of a network processor of the computer system. The direct virtual memory addressing maps a physical address of the payload data to a virtual address. The payload data is segmented by the network processor across one or more packets.
Type: Application
Filed: December 12, 2014
Publication date: April 23, 2015
Applicant: Fortinet, Inc.
Inventors: Xu Zhou, David Chen, Lin Huang, Guansong Zhang
-
Patent number: 8964785
Abstract: Methods and systems for a more efficient transmission of network traffic are provided. According to one embodiment, a user process of a host processor requests a network driver to store payload data within a system memory. The network driver stores (i) payload buffers each containing therein at least a subset of the payload data and (ii) buffer descriptors each containing therein information indicative of a starting address of a corresponding payload buffer within a user memory space. A network processor transmits onto a network the payload data within multiple transport layer protocol packets by (i) causing a network interface to retrieve the payload data from the payload buffers by performing direct virtual memory addressing of the user memory space using the buffer descriptors and information contained within a translation data structure stored within the system memory; and (ii) segmenting the payload data across the transport layer protocol packets.
Type: Grant
Filed: March 29, 2013
Date of Patent: February 24, 2015
Assignee: Fortinet, Inc.
Inventors: Xu Zhou, David Chen, Lin Huang, Guansong Zhang
-
Patent number: 8527962
Abstract: A method for promotion of a child procedure in a software application for a heterogeneous architecture, wherein the heterogeneous architecture comprises a first architecture type and a second architecture type, comprises inserting a parameter representing a parallel frame pointer to a parent procedure of the child procedure into the child procedure; and modifying a reference in the child procedure to a stack variable of the parent procedure to include an indirect access to the parent procedure via the parallel frame pointer.
Type: Grant
Filed: March 10, 2009
Date of Patent: September 3, 2013
Assignee: International Business Machines Corporation
Inventors: Raul Silvera, Ettore Tiotto, Guansong Zhang
-
Patent number: 8411702
Abstract: Methods and systems for a more efficient transmission of network traffic are provided. According to one embodiment, a method is provided for performing transport layer protocol segmentation offloading. Multiple buffer descriptors are stored in a system memory of a network device. The buffer descriptors contain information indicative of a starting address of a payload buffer stored in a user memory space of the system memory. The payload buffers contain payload data originated by a user process running on a host processor of the network device. The payload data is retrieved from the payload buffers on behalf of a network processor of the network device without copying the payload data from the user memory space to a kernel memory space of the system memory by performing direct virtual memory addressing of the user memory space. Finally, the payload data is segmented across one or more transport layer protocol packets.
Type: Grant
Filed: April 28, 2011
Date of Patent: April 2, 2013
Assignee: Fortinet, Inc.
Inventors: Xu Zhou, David Chen, Lin Huang, Guansong Zhang
-
Publication number: 20130061000
Abstract: A computer-implemented method for creating a threaded package of computer executable instructions from software compiler generated code includes allocating, through a computer processor, the computer executable instructions into a plurality of stacks, differentiating between different types of computer executable instructions for each computer executable instruction allocated to each stack of the plurality of stacks, creating switch points for each stack of the plurality of stacks based upon the differentiating, and inserting the switch points within each stack of the plurality of stacks.
Type: Application
Filed: September 1, 2011
Publication date: March 7, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Raul E. Silvera, Guansong Zhang, Yue Zhao
-
Patent number: 8375375
Abstract: A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. Provided is a use of a max{0,N} variable for loop iterations in case of no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is the loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. Then a removal of the max operator afterwards through a copy propagation pass of the IBM compiler is provided. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallel the outermost loop is provided.
Type: Grant
Filed: January 21, 2009
Date of Patent: February 12, 2013
Assignee: International Business Machines Corporation
Inventors: Zhixing Ren, Raul Esteban Silvera, Guansong Zhang
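To make the role of max{0,N} in the abstract above concrete, here is a minimal sketch of induction variable substitution on a two-level loop nest. The function names, the loop shape, and the variable `k` are assumptions chosen for illustration; they are not taken from the patent, and the later step that removes the max operator through copy propagation is not modeled.

```python
def serial_nest(n, m, k0=0):
    """Original form: k is carried from one iteration to the next."""
    k = k0
    seen = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k += 1  # loop-carried dependence on k
            seen.append((i, j, k))
    return seen, k


def substituted_nest(n, m, k0=0):
    """After substitution: k is a closed-form function of (i, j).

    The inner trip count is written as max(0, m) so the formula stays
    correct when the inner loop runs zero times (m <= 0).
    """
    inner_trips = max(0, m)
    seen = []
    for i in range(1, n + 1):  # iterations no longer depend on each other
        for j in range(1, m + 1):
            k = k0 + (i - 1) * inner_trips + j
            seen.append((i, j, k))
    final_k = k0 + max(0, n) * inner_trips
    return seen, final_k
```

Without the max, the final value would be computed as `k0 + n * m`, which is wrong whenever either bound is negative and the corresponding loop never executes.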
-
Patent number: 8341615
Abstract: Embodiments of the present invention address deficiencies of the art in respect to loop parallelization for a target architecture implementing a shared memory model and provide a novel and non-obvious method, system and computer program product for SIMD code generation for parallel loops using versioning and scheduling. In an embodiment of the invention, within a code compilation data processing system a parallel SIMD loop code generation method can include identifying a loop in a representation of source code as a parallel loop candidate, either through a user directive or through auto-parallelization.
Type: Grant
Filed: July 11, 2008
Date of Patent: December 25, 2012
Assignee: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Raul E. Silvera, Amy K. Wang, Guansong Zhang
-
Patent number: 8104030
Abstract: A computer implemented method, computer usable program code, and a system for parallelizing a loop. A parameter that will be used to limit parallelization of the loop is identified to limit parallelization of the loop. The parameter specifies a minimum number of loop iterations that a thread should execute. The parameter can be adjusted based on a parallel performance factor. A parallel performance factor is a factor that influences the performance of parallel code. A number of threads from a plurality of threads is selected for processing iterations of the loop based on the parameter. The number of threads is selected prior to execution of the first iteration of the loop.
Type: Grant
Filed: December 21, 2005
Date of Patent: January 24, 2012
Assignee: International Business Machines Corporation
Inventors: Raul Esteban Silvera, Priya Unnikrishnan, Guansong Zhang
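One way to picture the selection step described in this abstract is the sketch below. The function name `select_thread_count` and its parameters are hypothetical, and the rule shown (integer division capped by the available threads) is only one plausible policy, not the patented method; adjusting the parameter from a performance factor is not modeled.

```python
def select_thread_count(total_iterations, available_threads, min_iters_per_thread):
    """Choose a thread count before the first loop iteration runs.

    Every selected thread is guaranteed at least `min_iters_per_thread`
    iterations; if the loop is too small for that, it runs on one thread.
    """
    if available_threads < 1 or min_iters_per_thread < 1:
        raise ValueError("thread count and minimum iterations must be positive")
    if total_iterations <= 0:
        return 1  # zero-trip loop: nothing to divide up
    usable = total_iterations // min_iters_per_thread
    return max(1, min(available_threads, usable))
```

Under these assumptions, a 250-iteration loop with a minimum of 100 iterations per thread would use 2 of 8 available threads, while a 50-iteration loop would stay serial.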
-
Publication number: 20110311011
Abstract: A method and apparatus for emulating stream clock signal in asynchronous data transmission. The inventive subject matter proposes a system consisting of a transmitter module, a receiver module, and a link or network in between. A scheme to generate the emulated stream clock across a wide frequency range is also proposed with the property of controllable deviation from the original stream frequency to meet jitter requirement and fast frequency convergence (minimal number of converging steps). The scheme includes an optional first step to derive a frequency estimation of the stream clock and a second step of continuous adjusting the emulated clock frequency to keep the average frequency equals that of the original stream clock.
Type: Application
Filed: June 21, 2010
Publication date: December 22, 2011
Inventors: Guansong Zhang, Tsung-Yi Yang, Cathy Zhang
-
Publication number: 20110200057
Abstract: Methods and systems for a more efficient transmission of network traffic are provided. According to one embodiment, a method is provided for performing transport layer protocol segmentation offloading. Multiple buffer descriptors are stored in a system memory of a network device. The buffer descriptors contain information indicative of a starting address of a payload buffer stored in a user memory space of the system memory. The payload buffers contain payload data originated by a user process running on a host processor of the network device. The payload data is retrieved from the payload buffers on behalf of a network processor of the network device without copying the payload data from the user memory space to a kernel memory space of the system memory by performing direct virtual memory addressing of the user memory space. Finally, the payload data is segmented across one or more transport layer protocol packets.
Type: Application
Filed: April 28, 2011
Publication date: August 18, 2011
Applicant: FORTINET, INC.
Inventors: Xu Zhou, David Chen, Lin Huang, Guansong Zhang
-
Patent number: 7944946
Abstract: Methods and systems for a more efficient transmission of network traffic are provided. According to one embodiment, a method is provided for performing segmentation offloading, such as TCP segmentation offloading (TSO). An interface performs direct virtual memory addressing of a user memory space of a system memory on behalf of a network processor to fetch payload data originated by a user process running on a host processor. Then, the network processor segments the payload data across one or more packets.
Type: Grant
Filed: October 21, 2008
Date of Patent: May 17, 2011
Assignee: Fortinet, Inc.
Inventors: Xu Zhou, David Chen, Lin Huang, Guansong Zhang
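The segmentation step named in this abstract can be illustrated in software with the short sketch below. It only shows how a payload is split across packets; the function name `segment_payload` and the `mss` (maximum segment size) parameter are assumptions for illustration, and the direct virtual memory addressing performed by the interface hardware is not modeled at all.

```python
def segment_payload(payload: bytes, mss: int):
    """Split a payload into (offset, chunk) pairs of at most `mss` bytes.

    The offset stands in for the relative sequence number each packet
    would carry so the receiver can put the chunks back in order.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [
        (offset, payload[offset:offset + mss])
        for offset in range(0, len(payload), mss)
    ]
```

For example, `segment_payload(b"abcdefghij", 4)` yields three chunks at offsets 0, 4, and 8, the last one shorter than the others.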
-
Patent number: 7877739
Abstract: A computer-implemented method for determining whether an array within a loop can be privatized for that loop is presented. The method calculates the array sections that require first or last privatization and copies only those sections, reducing the privatization overhead of the known solutions.
Type: Grant
Filed: October 9, 2006
Date of Patent: January 25, 2011
Assignee: International Business Machines Corporation
Inventors: Roch G. Archambault, Erik P. Charlebois, Guansong Zhang
-
Publication number: 20100235811
Abstract: A method for promotion of a child procedure in a software application for a heterogeneous architecture, wherein the heterogeneous architecture comprises a first architecture type and a second architecture type, comprises inserting a parameter representing a parallel frame pointer to a parent procedure of the child procedure into the child procedure; and modifying a reference in the child procedure to a stack variable of the parent procedure to include an indirect access to the parent procedure via the parallel frame pointer.
Type: Application
Filed: March 10, 2009
Publication date: September 16, 2010
Applicant: International Business Machines Corporation
Inventors: Raul Silvera, Ettore Tiotto, Guansong Zhang
-
Open multi-processing reduction implementation in cell broadband engine (CBE) single source compiler
Patent number: 7689977
Abstract: The present disclosure is directed to a method for providing an OpenMP reduction implementation. The method may comprise creating an aggregate of at least one reduction variable in a parallel region or a work-sharing construct; defining a pointer variable, the pointer variable pointing to a dynamic array of the aggregate; creating an initialization routine, an outlined routine and a reduction accumulation routine; replacing the parallel region or the work-sharing construct with a runtime routine, the runtime routine taking a plurality of arguments including an address of the initialization routine, an address of the outlined routine, an address of the reduction accumulation routine, an address of the pointer variable, and a size of the aggregate; and executing the runtime routine when the at least one reduction variable is in the parallel region or the work-sharing construct.
Type: Grant
Filed: April 15, 2009
Date of Patent: March 30, 2010
Assignee: International Business Machines Corporation
Inventors: Guansong Zhang, Shimin Cui, Ettore Tiotto
-
Publication number: 20100011339
Abstract: Embodiments of the present invention address deficiencies of the art in respect to loop parallelization for a target architecture implementing a shared memory model and provide a novel and non-obvious method, system and computer program product for SIMD code generation for parallel loops using versioning and scheduling. In an embodiment of the invention, within a code compilation data processing system a parallel SIMD loop code generation method can include identifying a loop in a representation of source code as a parallel loop candidate, either through a user directive or through auto-parallelization.
Type: Application
Filed: July 11, 2008
Publication date: January 14, 2010
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Alexandre E. Eichenberger, Raul E. Silvera, Amy K. Wang, Guansong Zhang
-
Publication number: 20090307363
Abstract: Methods and systems are provided for network protocol reassembly acceleration. According to one embodiment, an incoming packet is received at a network interface. Payload data from the packet is written by a memory interface to a physical page within a system memory on behalf of the network interface based on a sequence number associated with the incoming packet and by obtaining a physical address from a virtual memory map corresponding to an incoming session with which the packet is associated. After the physical page is full, the physical page is made accessible to a user process being executed by a processor associated with the system memory by remapping the physical page through a paging table used by the user process.
Type: Application
Filed: October 22, 2008
Publication date: December 10, 2009
Inventors: Xu Zhou, David Chen, Lin Huang, Guansong Zhang
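The placement rule in this abstract (write each payload at a position derived from its sequence number, and hand a page over once it is full) can be sketched in software as follows. The class `SessionReassembler`, its fields, and the tiny default page size are hypothetical. The sketch assumes no retransmitted or overlapping segments and no sequence number wraparound, and it uses a plain list where the described system remaps pages through the user process's paging table.

```python
class SessionReassembler:
    """Places out-of-order payloads into fixed-size pages for one session."""

    def __init__(self, initial_seq, page_size=16):
        self.initial_seq = initial_seq
        self.page_size = page_size
        self.pages = {}      # page index -> bytearray holding the page
        self.filled = {}     # page index -> number of bytes written so far
        self.delivered = []  # indices of full pages, in completion order

    def receive(self, seq, payload):
        """Write `payload` at the byte offset implied by sequence number `seq`."""
        offset = seq - self.initial_seq
        for i, byte in enumerate(payload):
            index, pos = divmod(offset + i, self.page_size)
            page = self.pages.setdefault(index, bytearray(self.page_size))
            page[pos] = byte
            self.filled[index] = self.filled.get(index, 0) + 1
            if self.filled[index] == self.page_size:
                # Stand-in for making the page visible to the user process.
                self.delivered.append(index)
```

Because each byte lands at its final position on arrival, a segment that arrives early simply completes its page before earlier pages are complete.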
-
Publication number: 20090304029
Abstract: Methods and systems for a more efficient transmission of network traffic are provided. According to one embodiment, a method is provided for performing segmentation offloading, such as TCP segmentation offloading (TSO). An interface performs direct virtual memory addressing of a user memory space of a system memory on behalf of a network processor to fetch payload data originated by a user process running on a host processor. Then, the network processor segments the payload data across one or more packets.
Type: Application
Filed: October 21, 2008
Publication date: December 10, 2009
Inventors: Xu Zhou, David Chen, Lin Huang, Guansong Zhang
-
Patent number: 7581222
Abstract: The present invention provides an approach for barrier synchronization. The barrier has a first array of elements with each element of the first array having an associated process, and a second array of elements with each element of the second array having an associated process. Prior to use, the values or states of the elements in each array may be initialized. As each process finishes its phase and arrives at the barrier, it may update the value or state of its associated element in the first array. Each process may then proceed to spin at its associated element in the second array, waiting for that element to switch. When the values or states of the elements of the first array reach a predetermined value or state, an instruction is sent to all of the elements in the second array to switch their values or states, allowing all processes to leave.
Type: Grant
Filed: November 20, 2003
Date of Patent: August 25, 2009
Assignee: International Business Machines Corporation
Inventors: Robert James Blainey, Guansong Zhang
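The two-array scheme in this abstract can be sketched with Python threads as shown below. The class `TwoArrayBarrier`, the choice of thread 0 as the one that watches the first array and switches the second, and the helper `run_phases` are all assumptions made for illustration; the abstract does not say which agent issues the switch. Python lists stand in for the per-process flag arrays, and the sketch relies on CPython's atomic list element updates rather than on real memory fences.

```python
import threading
import time


class TwoArrayBarrier:
    """Reusable barrier built from an arrival array and a release array."""

    def __init__(self, n):
        self.n = n
        self.arrived = [False] * n  # first array: one arrival flag per thread
        self.release = [False] * n  # second array: one release flag per thread

    def wait(self, tid):
        # The release flag cannot change until this thread has arrived,
        # so reading it here gives the value it will switch away from.
        expected = not self.release[tid]
        self.arrived[tid] = True
        if tid == 0:
            # Assumption: thread 0 checks the first array for the
            # predetermined state (all flags set).
            while not all(self.arrived):
                time.sleep(0)  # yield to other threads while spinning
            for i in range(self.n):
                self.arrived[i] = False  # reset before anyone can re-enter
            for i in range(self.n):
                self.release[i] = expected  # switch every release flag
        else:
            while self.release[tid] != expected:
                time.sleep(0)  # spin only on this thread's own element


def run_phases(num_threads, num_phases):
    """Run workers through several phases and return the order of events."""
    barrier = TwoArrayBarrier(num_threads)
    log = []

    def worker(tid):
        for phase in range(num_phases):
            log.append((phase, tid))  # the "work" of this phase
            barrier.wait(tid)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Giving each thread its own element to spin on is the point of the second array: on real hardware it avoids every waiter polling the same shared location.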