NETWORK SERVER SYSTEMS, ARCHITECTURES, COMPONENTS AND RELATED METHODS

A server system can include a switching tier configured to receive network packets and enable network connections between a plurality of servers, and a middle tier comprising first servers of the plurality of servers. Each first server can include at least one host processor, at least one network interface device, and at least one hardware accelerator module physically mounted in the first server. Each hardware accelerator module can include at least one field programmable gate array (FPGA) device coupled to receive network packet data from the switching tier over a first data path, and coupled to the at least one host processor by a second data path, each hardware accelerator module configurable to execute network packet data processing tasks independent from the at least one host processor of the server.

Description
PRIORITY CLAIMS

This application is a continuation of U.S. patent application Ser. No. 13/900,318 filed May 22, 2013, which claims the benefit of U.S. Provisional Patent Application Nos. 61/650,373 filed May 22, 2012, 61/753,892 filed on Jan. 17, 2013, 61/753,895 filed on Jan. 17, 2013, 61/753,899 filed on Jan. 17, 2013, 61/753,901 filed on Jan. 17, 2013, 61/753,903 filed on Jan. 17, 2013, 61/753,904 filed on Jan. 17, 2013, 61/753,906 filed on Jan. 17, 2013, 61/753,907 filed on Jan. 17, 2013, 61/753,910 filed on Jan. 17, 2013, and is a continuation of U.S. patent application Ser. No. 15/283,287 filed Sep. 30, 2016, which is a continuation of International Application no. PCT/US2015/023730, filed Mar. 31, 2015, which claims the benefit of U.S. Provisional Patent Application No. 61/973,205 filed Mar. 31, 2014, and a continuation of International Application no. PCT/US2015/023746, filed Mar. 31, 2015, which claims the benefit of U.S. Provisional Patent Application Nos. 61/973,207 filed Mar. 31, 2014 and 61/976,471 filed Apr. 7, 2014. The contents of all of these applications are incorporated by reference herein.

TECHNICAL FIELD

The present disclosure relates generally to network server systems, and more particularly to systems having servers with hardware accelerator components that can operate independently of server host processors, thus forming a hardware acceleration plane.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a server system according to an embodiment.

FIG. 2 is a block diagram of a hardware accelerated server that can be included in embodiments.

FIG. 3 is a block diagram of a hardware accelerator module that can be included in embodiments.

FIG. 4 is a block diagram of a server system according to another embodiment.

FIG. 5 is a block diagram of a hardware accelerated server that can be included in embodiments.

FIG. 6 is a diagram of a server system according to embodiments.

FIG. 7 is a diagram of a server system according to embodiments.

FIG. 8 is a diagram showing one particular hardware accelerator module that can be included in embodiments.

FIG. 9 is a diagram showing one particular hardware accelerated server that can be included in embodiments.

DETAILED DESCRIPTION

Embodiments disclosed herein include server systems having servers equipped with hardware accelerator modules. Hardware accelerator modules can form a mid-plane and accelerate the processing of network packet data independent of any host processors on the servers. Network packet processing can include, but is not limited to, classifying packets, encrypting packets and/or decrypting packets. Hardware accelerator modules can be attached to a bus in a server, and can include one or more programmable logic devices, such as field programmable gate array (FPGA) devices.

Embodiments can also include a server system having servers interconnected to one another by network connections, where each server includes a host processor, a network interface device, and a hardware accelerator module. One or more hardware accelerator modules can be mounted in each server, and can include one or more programmable logic devices (e.g., FPGAs). The hardware accelerator modules can form a hardware acceleration plane for processing network packet data independent of the host processors. Further, network packet data can be transmitted between hardware acceleration modules independent of the host processors.

FIG. 1 shows a server system 100 according to an embodiment. A server system 100 can include servers equipped with hardware accelerator modules that can process network packet data received by the system 100. A server system 100 can be organized into groups of servers 126-0/1. In some embodiments, groups of servers 126-0/1 can be a physical organization of servers, such as racks in which the server components are mounted. However, in other embodiments such a grouping can be a logical grouping.

A server group 126-0 can include a switching tier 102, a mid-tier 104, and one or more server tiers 110. A switching tier 102 can provide network connections between various components of the system 100. In a particular embodiment, a switching tier 102 can be formed by a top-of-rack (TOR) switch device.

A mid-tier 104 can be formed by a number of hardware accelerator modules, which are described in more detail below. In some embodiments, a mid-tier 104 can be conceptualized, architecturally, as being placed near a top-of-rack. A mid-tier 104 can perform any number of packet processing tasks, as will also be described in more detail below. A server tier 110 can include server components (apart from the hardware accelerator modules), including host processors.

A system 100 can include various data communication paths for interconnecting the various tiers 102/104/110. Such communication paths can include intra-group switch/server connections 131-0, which can provide connections between a switching tier 102 and server tier(s) 110 of the same group 126-0; inter-group switch/server connections 131-1, which can provide connections between a switching tier 102 and server tier of different groups 126-0/1; intra-group switch/module connections 133-0, which can provide connections between a switching tier 102 and hardware accelerator modules of the same group 126-0; inter-group switch/module connections 133-1, which can provide connections between a switching tier 102 and hardware accelerator modules of different groups 126-0/1; intra-group module/server connections 135-0, which can provide connections between hardware accelerator modules and server components of a same group 126-0; and inter-group module/server connections 135-1, which can provide connections between hardware accelerator modules and server components of different groups 126-0/1.

FIG. 2 is a diagram of a server that can be included in embodiments, including the embodiment shown in FIG. 1. A server 206 can include one or more host processors 214 and one or more hardware accelerator modules 208. A server 206 can receive network packet data over first data path 209 from a network data packet source 212, which can be a TOR switch in the embodiment shown.

A hardware accelerator module 208 can be connected to a host processor 214 by a second data path 211. In some embodiments, a second data path 211 can include a bus formed on the server 206. In particular embodiments, second data path 211 can be a memory mapped bus.

A hardware accelerator module 208 can enable network data processing tasks to be completely offloaded for execution by the hardware accelerator module 208. In this way, a hardware accelerator module 208 can receive and process network packet data independent of host processor 214.
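
By way of illustration only, and not limitation, the offloading described above can be modeled in software as in the following Python sketch. The module itself can be implemented in programmable logic rather than software, and every name used below (e.g., AcceleratorModule, HardwareAcceleratedServer, process_packet) is a hypothetical stand-in rather than part of any embodiment.

    # Illustrative software model only; the actual module can be an FPGA, not Python.
    # All class and function names here are hypothetical.

    class HostProcessor:
        def __init__(self):
            self.packets_seen = 0          # counts packets the host actually touches

        def handle(self, packet):
            self.packets_seen += 1
            return ("host-processed", packet)

    class AcceleratorModule:
        """Models a hardware accelerator module that fully offloads a task."""
        def process_packet(self, packet):
            # e.g., classification: tag the packet without any host involvement
            return ("accelerated", packet)

    class HardwareAcceleratedServer:
        def __init__(self):
            self.host = HostProcessor()
            self.accelerator = AcceleratorModule()

        def receive(self, packet, offloadable=True):
            # Offloadable tasks never reach the host processor (the first data
            # path terminates at the accelerator); others use the second data path.
            if offloadable:
                return self.accelerator.process_packet(packet)
            return self.host.handle(packet)

    server = HardwareAcceleratedServer()
    print(server.receive({"id": 1}))       # handled entirely by the accelerator
    print(server.host.packets_seen)        # 0: the host processor was never involved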

FIG. 3 is a block diagram of a hardware accelerator module 308 that can be included in any of the embodiments shown herein. A hardware accelerator module 308 can include one or more programmable logic devices 316 that can be connected to random access memory (RAM) 318. In particular embodiments, a programmable logic device 316 can be a field programmable gate array (FPGA) device (e.g., an FPGA integrated circuit (IC)), and RAM 318 can include one or more dynamic RAM (DRAM) ICs. An FPGA 316 and RAM 318 can be in separate IC packages, or can be integrated in the same IC package.

Programmable logic device 316 can receive network packet data over a first connection 309. Programmable logic device 316 can be connected to RAM 318 by a bus 320, which in particular embodiments can be a memory mapped bus. In some embodiments, a programmable logic device 316 can be connected to another device by a third connection. Such another device could include another programmable logic device or processor, as but two of many possible examples.
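
As a purely illustrative sketch of memory mapped access, the following Python example uses an anonymous memory mapping to stand in for RAM reachable over a memory mapped bus. The buffer size, offsets, and layout below are assumptions made only for this sketch and do not describe any actual register map.

    # Toy illustration of memory-mapped access; offsets and layout are assumptions.
    import mmap
    import struct

    BUF_SIZE = 4096                  # hypothetical size of the mapped window
    PACKET_LEN_OFFSET = 0            # assumed offset holding a packet length
    PACKET_DATA_OFFSET = 4           # assumed offset where packet bytes begin

    # An anonymous mapping stands in for RAM reachable over a memory-mapped bus.
    window = mmap.mmap(-1, BUF_SIZE)

    def write_packet(data: bytes) -> None:
        """Model of a producer (e.g., programmable logic) writing into mapped RAM."""
        window.seek(PACKET_LEN_OFFSET)
        window.write(struct.pack("<I", len(data)))
        window.seek(PACKET_DATA_OFFSET)
        window.write(data)

    def read_packet() -> bytes:
        """Model of a consumer reading the same packet back through the mapping."""
        window.seek(PACKET_LEN_OFFSET)
        (length,) = struct.unpack("<I", window.read(4))
        window.seek(PACKET_DATA_OFFSET)
        return window.read(length)

    write_packet(b"\x45\x00\x00\x28")    # a few example header bytes
    print(read_packet())                 # b'E\x00\x00('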

FIG. 4 is a block diagram of a server system 400 according to another embodiment. A system 400 can be one implementation of that shown in FIG. 1. A system 400 can include multiple racks (one shown as 426) each connected through respective TOR switches 402. TOR switches 402 can communicate with each other through an aggregation layer 430. Aggregation layer 430 may include several switches and routers and can act as the interface between an external network and the server racks 426.

Server racks 426 can each include a number of servers. All or some of the servers in each rack 426 can be hardware accelerated servers (one shown as 406). Each hardware accelerated server 406 can include one or more network interfaces 424, one or more host processors 414, and one or more hardware accelerator modules 408, according to any of the embodiments described herein, or equivalents.

FIG. 5 is a block diagram of a hardware accelerated server 506 that can be included in embodiments. A server 506 can include a network interface 524, one or more hardware accelerator modules 508, and one or more host processors 514.

Network interface 524 can receive network packet data from a network or another computer or virtual machine. In the very particular embodiment shown, a network interface 524 can include a network interface card (NIC). Network interface 524 can be connected to a host processor 514 and hardware accelerator module 508 by one or more buses 527. In some embodiments, bus(es) 527 can include a peripheral component interconnect (PCI) type bus. In very particular embodiments, a network interface 524 can be a PCI and/or PCI express (PCIe) NIC device connected with a host motherboard via a PCI or PCIe bus (included in 527).

A host processor 514 can be any suitable processor device. In particular embodiments, a host processor 514 can include processors with “brawny” cores, such as x86 based processors, as but one example.

A hardware accelerator module 508 can be connected to bus(es) 527 of server 506. In particular embodiments, hardware accelerator module 508 can be a circuit board that inserts into a bus socket on a main board of a server 506. As shown in FIG. 5, a hardware accelerator module 508 can include one or more FPGAs 526. FPGA(s) 526 can include circuits capable of receiving network packet data from bus(es) 527, and can process network packet data in any of various ways described herein. FPGA(s) 526 can also include circuits, or be connected to circuits, which can access data stored in buffer memories of the hardware accelerator module 508.

In some embodiments, hardware accelerator module 508 can serve as part of a switch fabric. In such embodiments, hardware accelerator modules can include managed output queues. Session flows queued in each such queue can be sent out through an output port to a downstream network element of the system in which the server is employed.

FIG. 6 is a diagram showing a server system 600 according to another embodiment. A server system 600 can include a network packet data source 630, a mid-plane formed from hardware accelerator modules, hereinafter referred to as a hardware acceleration plane 604, and a plane formed by host processors, hereinafter referred to as a host processor plane 634. A network packet data source 630 can be a network, including the Internet, and/or can include an aggregation layer, like that shown as 430 in FIG. 4.

It is understood that hardware acceleration plane 604 and host processor plane 634 can be a logical representation of system resources. In particular, components of the same server can form different planes of the system. As but one particular example, a system 600 can include hardware accelerated servers (one shown as 606) that include one or more hardware acceleration modules 608-0 and one or more host processors 614-0. Such hardware accelerated servers can take the form of any of those shown herein, or equivalents.

FIG. 6 shows two of various possible network data processing paths (631, 632) that can be executed in a system 600. It is understood that such processing paths (631, 632) are provided by way of example, and should not be construed as limiting. Processing path 631 can include processing by two hardware accelerator modules 608-1/2. In some embodiments, such processing can be independent of any host processor (i.e., independent of host processor plane 634).

In contrast, processing path 632 can include processing by a hardware accelerator module 608-3 and a host processor 614-1. It is understood that hardware accelerator module 608-3 and host processor 614-1 can be in the same server (i.e., a same hardware accelerated server), or can be in different servers (e.g., hardware accelerator module 608-3 is in one hardware accelerated server, while host processor 614-1 is in a different server, which may or may not be a hardware accelerated server).

FIG. 7 is a diagram of a system 700 according to another embodiment. In a particular embodiment, system 700 can be one implementation of that shown in FIG. 6. A system 700 can provide a mid-plane switch architecture. One or more server units 706-0/1 can be equipped with hardware accelerator modules 708-0/1, and thus can be considered hardware accelerated servers. Each hardware accelerator module 708-0/1 can act as a virtual switch 736-0/1 that is capable of receiving and forwarding packets. All the virtual switches 736-0/1 can be connected to each other, which can form a hardware acceleration plane 704.
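
The following Python sketch is a simplified, non-limiting software model of virtual switches forwarding packets to one another to form an acceleration plane. The switch names, addresses, and forwarding-table layout are assumptions made only for illustration.

    # Simplified software model of virtual switches forming an acceleration plane.
    # Switch names, addresses, and the forwarding-table layout are assumptions.

    class VirtualSwitch:
        def __init__(self, name):
            self.name = name
            self.table = {}        # destination address -> next-hop virtual switch
            self.delivered = []    # packets that terminated at this switch

        def add_route(self, destination, next_hop):
            self.table[destination] = next_hop     # None means deliver locally

        def forward(self, packet):
            next_hop = self.table.get(packet["dst"])
            if next_hop is None:
                self.delivered.append(packet)      # local delivery
            else:
                next_hop.forward(packet)           # hand off across the plane

    # Two hardware accelerated servers, each acting as a virtual switch.
    vs_a = VirtualSwitch("accelerator-A")
    vs_b = VirtualSwitch("accelerator-B")
    vs_a.add_route("10.0.0.2", vs_b)               # traffic for .2 goes via B
    vs_b.add_route("10.0.0.2", None)               # B delivers .2 locally

    vs_a.forward({"dst": "10.0.0.2", "payload": b"hello"})
    print(vs_b.delivered)                          # the packet crossed the plane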

In some embodiments, ingress packets can be examined and classified by the hardware accelerator modules 708-0/1. Hardware accelerator modules 708-0/1 can be capable of processing a relatively large number of packets. Accordingly, in some embodiments, a system 700 can include TOR switches (not shown) configured in conventional tree-like topologies, which can forward packets based on MAC address. Hardware accelerator modules 708-0/1 can perform deep packet inspection and classify packets with much more granularity before they are forwarded to other locations.

In certain embodiments, the role of layer 2 TOR switches can be limited to forwarding packets to hardware accelerator modules 708-0/1 such that essentially all the packet processing can be handled by the hardware accelerator modules 708-0/1. In such embodiments, progressively more server units can be equipped with hardware accelerator modules 708-0/1 to scale the packet handling capabilities instead of upgrading the TOR switches (which can be more costly).

While embodiments herein show hardware accelerator modules having particular components, such arrangements should not be construed as limiting. Based on the descriptions herein, a person skilled in the relevant art will recognize that other hardware components are within the spirit and scope of the embodiments described herein.

FIG. 8 is a diagram of a hardware accelerator module 808 according to one particular embodiment. A hardware accelerator module 808 can include a printed circuit board 838 having a physical interface 840. Physical interface 840 can enable hardware accelerator module 808 to be inserted into a slot on a server board. Mounted on the hardware accelerator module 808 can be circuit components 826, which can include programmable logic devices, such as FPGA devices. In addition or alternatively, circuit components 826 can include any of: memory, including both volatile and nonvolatile memory; a programmable switch (e.g., network switch); and/or one or more processor cores.

In addition, hardware accelerator module 808 can include one or more network I/Fs 824. A network I/F 824 can enable a physical connection to a network. In some embodiments, this can include a wired network connection compatible with IEEE 802 and related standards. However, in other embodiments, a network I/F 824 can be any other suitable wired connection and/or a wireless connection.

Referring now to FIG. 9, a hardware accelerated server 906, according to one particular embodiment, is shown in a block diagram. A hardware accelerated server 906 can include a network I/F 924, a bus system 927, a host processor 914, and a hardware accelerator module 908. A network I/F 924 can receive packet or other I/O data from an external source. In some embodiments, network I/F 924 can include physical or virtual functions to receive a packet or other I/O data from a network or another computer or virtual machine. A network I/F 924 can include, but is not limited to, PCI and/or PCIe devices connecting with a server motherboard via a PCI or PCIe bus (e.g., 927-0). Examples of network I/Fs 924 can include, but are not limited to, a NIC, a host bus adapter, a converged network adapter, or an ATM network interface.

In some embodiments, a hardware accelerated server 906 can employ an abstraction scheme that allows multiple logical entities to access the same network I/F 924. In such an arrangement, a network I/F 924 can be virtualized to provide for multiple virtual devices, each of which can perform some of the functions of a physical network I/F. Such IO virtualization can redirect network packet traffic to different addresses of the hardware accelerated server 906.
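
A non-limiting Python sketch of such an abstraction is shown below: one physical interface exposes several virtual functions, each with its own receive queue. The steering rule (by destination port) and the virtual function identifiers are assumptions made only to illustrate redirecting traffic to different addresses.

    # Toy model of I/O virtualization: one physical interface exposes several
    # virtual functions, each with its own receive queue (its own "address").
    # The steering rule (by destination port) is an assumption for illustration.
    from collections import defaultdict

    class VirtualizedNIC:
        def __init__(self):
            self.vf_queues = defaultdict(list)     # virtual function id -> queue
            self.port_to_vf = {}                   # steering rule table

        def add_rule(self, dst_port, vf_id):
            self.port_to_vf[dst_port] = vf_id

        def receive(self, packet):
            # Steer the packet to the queue of the matching virtual function;
            # unmatched traffic falls back to virtual function 0.
            vf = self.port_to_vf.get(packet["dst_port"], 0)
            self.vf_queues[vf].append(packet)

    nic = VirtualizedNIC()
    nic.add_rule(443, vf_id=1)                     # TLS traffic to VF 1
    nic.add_rule(53, vf_id=2)                      # DNS traffic to VF 2
    for p in ({"dst_port": 443}, {"dst_port": 53}, {"dst_port": 80}):
        nic.receive(p)
    print({vf: len(q) for vf, q in nic.vf_queues.items()})   # {1: 1, 2: 1, 0: 1}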

In the very particular embodiment shown, a network I/F 924 can include a NIC having an input buffer 924a and, in some embodiments, an I/O virtualization function 924b. While a network I/F 924 can be configured to trigger host processor interrupts in response to incoming packets, in some embodiments, such interrupts can be disabled, thereby reducing processing overhead for a host processor 914.

In some embodiments, a hardware accelerated server 906 can also include an I/O management unit 940 which can translate virtual addresses to corresponding physical addresses of the server 906. This can enable data to be transferred between various components of the hardware accelerated server 906.
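
By way of illustration only, the following Python sketch models an I/O management unit translating virtual addresses to physical addresses at page granularity; the page size and the example mapping are assumptions made only for this sketch.

    # Minimal model of an I/O management unit translating virtual addresses to
    # physical addresses at page granularity; page size and mappings are assumptions.

    PAGE_SIZE = 4096

    class IOManagementUnit:
        def __init__(self):
            self.page_table = {}               # virtual page number -> physical page number

        def map_page(self, virt_page, phys_page):
            self.page_table[virt_page] = phys_page

        def translate(self, virt_addr):
            page, offset = divmod(virt_addr, PAGE_SIZE)
            if page not in self.page_table:
                raise ValueError(f"unmapped virtual address {virt_addr:#x}")
            return self.page_table[page] * PAGE_SIZE + offset

    iommu = IOManagementUnit()
    iommu.map_page(virt_page=0x10, phys_page=0x2A)
    print(hex(iommu.translate(0x10_123)))      # 0x2a123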

A host processor 914 can perform certain processing tasks on network packet data; however, as noted herein, other network packet data processing tasks can be performed by hardware accelerator module 908 independent of host processor 914. In some embodiments, a host processor 914 can be a “brawny core” type processor (e.g., an x86 or any other processor capable of handling “heavy touch” computational operations).

A hardware accelerator module 908 can interface with a server bus 927-1 via a standard module connection. A server bus 927-1 can be any suitable bus, including a PCI type bus, but other embodiments can include a memory bus. A hardware accelerator module 908 can be implemented with one or more FPGAs 926-0/1. In the embodiments of FIG. 9, hardware accelerator module 908 can include FPGA(s) 926-0/1 in which can be formed any of the following: a host bus interface 942, an arbiter 944, a scheduler circuit 948, a classifier circuit 950, and/or processing circuits 952.

A host bus interface 942 can be connected to server bus 927-1, and can be capable of block data transfers over server bus 927-1. Packets can be queued in a memory 918. Memory 918 can be any suitable memory, including volatile and/or nonvolatile memory devices, separate from and/or integrated with FPGA(s) 926-0/1.

An arbiter 944 can provide access to resources (e.g., processing circuits 952) on the hardware accelerator module 908 to one or more requestors. If multiple requestors request access, an arbiter 944 can determine which requestor becomes the accessor and then pass data from the accessor to the resource, and the resource can begin executing processing on the data. After the data has been transferred to a resource, and the resource has completed execution, an arbiter 944 can transfer control to a different requestor, and this cycle can repeat for all available requestors. In the embodiment of FIG. 9, arbiter 944 can notify other portions of hardware accelerator module 908 of incoming data. Arbiter 944 can input and output data via data ingress path 946-0 and data egress path 946-1.
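
The description above does not require any particular arbitration policy; as one non-limiting possibility, the following Python sketch models a round-robin arbiter that grants a shared processing resource to one requestor at a time and then moves on, repeating the cycle over all requestors. The requestor and resource names are illustrative assumptions.

    # Round-robin arbiter sketch; round robin is an assumption chosen for illustration.

    class RoundRobinArbiter:
        def __init__(self, requestors):
            self.requestors = list(requestors)     # stable order of requestors
            self.next_index = 0

        def grant(self, requests):
            """Pick one requestor among those currently requesting (or None)."""
            n = len(self.requestors)
            for step in range(n):
                candidate = self.requestors[(self.next_index + step) % n]
                if candidate in requests:
                    self.next_index = (self.next_index + step + 1) % n
                    return candidate
            return None

    def process(resource_name, requestor, data):
        # Stand-in for the downstream processing resource executing on the data.
        return f"{resource_name} processed {data!r} for {requestor}"

    arbiter = RoundRobinArbiter(["ingress", "egress", "host"])
    pending = {"ingress": b"pkt-1", "host": b"pkt-2"}
    while pending:
        winner = arbiter.grant(set(pending))
        print(process("processing-circuit", winner, pending.pop(winner)))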

In some embodiments, a scheduler circuit 948 can perform traffic management on incoming packets by categorizing them according to flow using session metadata. Packets from a certain source, relating to a certain traffic class, pertaining to a specific application, or flowing to a certain socket, are referred to as part of a session flow and can be classified using session metadata. In some embodiments, such classification can be performed by classifier circuit 950. Packets can be queued for output in memory (e.g., 918) based on session priority.
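
As a non-limiting illustration of such classification, the following Python sketch groups packets into session flows using the packet 5-tuple as session metadata and queues each flow separately; the choice of the 5-tuple as the session metadata is an assumption made only for this sketch.

    # Sketch of a classifier that groups packets into session flows using the
    # packet 5-tuple as session metadata; the specific key is an assumption.
    from collections import defaultdict, namedtuple

    FlowKey = namedtuple("FlowKey", "proto src_ip src_port dst_ip dst_port")

    def classify(packet):
        """Derive the session-flow key for a packet."""
        return FlowKey(packet["proto"], packet["src_ip"], packet["src_port"],
                       packet["dst_ip"], packet["dst_port"])

    flow_queues = defaultdict(list)                # one queue per session flow

    def enqueue(packet):
        flow_queues[classify(packet)].append(packet)

    enqueue({"proto": 6, "src_ip": "10.0.0.1", "src_port": 40000,
             "dst_ip": "10.0.0.9", "dst_port": 443, "payload": b"a"})
    enqueue({"proto": 6, "src_ip": "10.0.0.1", "src_port": 40000,
             "dst_ip": "10.0.0.9", "dst_port": 443, "payload": b"b"})
    enqueue({"proto": 17, "src_ip": "10.0.0.3", "src_port": 5353,
             "dst_ip": "10.0.0.9", "dst_port": 53, "payload": b"c"})

    # Two packets share a flow (and keep their order); the third is a separate flow.
    for key, queue in flow_queues.items():
        print(key.proto, key.dst_port, [p["payload"] for p in queue])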

In particular embodiments, a scheduler circuit 948 can allocate a priority to each of many output queues (e.g., in 918) and carry out reordering of incoming packets to maintain persistence of session flows in these queues. A scheduler circuit 948 can be configured to control the scheduling of each of these persistent sessions in processing circuits 952. Packets of a particular session flow can belong to a particular queue. A scheduler circuit 948 can control the prioritization of these queues such that they are arbitrated for handling by a processing resource (e.g., processing circuits 952) located downstream. Processing circuits 952 can be configured to allocate execution resources to a particular queue. Embodiments contemplate multiple sessions running on processing circuits 952, with portions of processing circuits 952 each handling data from a particular session flow resident in a queue established by the scheduler circuit 948, to tightly integrate the scheduler circuit 948 and its downstream resources (e.g., 952). This can bring about persistence of session information across the traffic management and scheduling circuit 948 and processing circuits 952.

Processing circuits 952 can be capable of processing packet data. In particular embodiments, processing circuits 952 can be capable of handling packets of different application or transport sessions. According to some embodiments, processing circuits 952 can provide dedicated computing resources for handling, processing and/or terminating session flows. Processing circuits 952 can include any suitable circuits of the FPGA(s) 926-0/1. However, in some embodiments, processing circuits 952 can include processors, including CPU type processors. In particular embodiments, processing circuits 952 can include low power processors capable of executing general purpose instructions, including but not limited to: ARM, ARC, Tensilica, MIPS, StrongARM or any other suitable processor that serves the functions described herein.
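
The following Python sketch is a non-limiting software model of such scheduling: each per-session output queue is assigned a priority, the most urgent non-empty queue is dispatched to a downstream processing resource, and packet order within each flow is preserved. Strict priority is assumed here only for illustration; other scheduling disciplines can be used.

    # Strict-priority scheduler sketch over per-session output queues; strict
    # priority is an assumption, other scheduling disciplines are possible.
    from collections import deque

    class SchedulerCircuitModel:
        def __init__(self):
            self.queues = {}                   # flow id -> deque of packets
            self.priority = {}                 # flow id -> smaller = more urgent

        def add_flow(self, flow_id, priority):
            self.queues[flow_id] = deque()
            self.priority[flow_id] = priority

        def enqueue(self, flow_id, packet):
            self.queues[flow_id].append(packet)    # order within a flow is preserved

        def dispatch(self):
            """Hand the next packet of the most urgent non-empty flow downstream."""
            ready = [f for f, q in self.queues.items() if q]
            if not ready:
                return None
            flow = min(ready, key=lambda f: self.priority[f])
            return flow, self.queues[flow].popleft()

    sched = SchedulerCircuitModel()
    sched.add_flow("voice", priority=0)
    sched.add_flow("bulk", priority=2)
    sched.enqueue("bulk", "b1")
    sched.enqueue("voice", "v1")
    sched.enqueue("bulk", "b2")
    while (item := sched.dispatch()) is not None:
        print(item)                            # ('voice', 'v1'), ('bulk', 'b1'), ('bulk', 'b2')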

In operation, a hardware accelerated server 906 can receive network data packets from an external network. Based on their classification, the packets can be destined for a host processor 914 or processing circuits 952 on hardware accelerator module 908. The network data packets can have certain characteristics, including transport protocol number, source and destination port numbers, source and destination IP addresses, for example. In some embodiments, the network data packets can further have metadata that helps in their classification and/or management.

In some embodiments, any of multiple devices of the hardware accelerated server 906 can be used to redirect traffic to specific addresses. Such network data packets can be transferred to addresses where they can be handled by one or more processing circuits (e.g., 952). In particular embodiments, such transfers can be to physical addresses, thus logical entities can be removed from the processing, and a host processor 914 can be free from such packet handling. Accordingly, embodiments can be conceptualized as providing a “black box” to which specific network data can be fed for processing.

In some embodiments, session metadata can serve as the criteria by which packets are prioritized and scheduled and as such, incoming packets can be reordered based on their session metadata. This reordering of packets can occur in one or more buffers (e.g., 918) and can modify the traffic shape of these flows. The scheduling discipline chosen for this prioritization, or traffic management, can affect the traffic shape of flows and micro-flows through delay (buffering), bursting of traffic (buffering and bursting), smoothing of traffic (buffering and rate-limiting flows), dropping traffic (choosing data to discard so as to avoid exhausting the buffer), delay jitter (temporally shifting cells of a flow by different amounts) and by not admitting a connection (e.g., cannot simultaneously guarantee existing service level agreements (SLAs) with an additional flow's SLA).
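
By way of example and not limitation, the following Python sketch shows a token bucket shaper that illustrates two of the actions named above, smoothing (rate-limiting a flow) and dropping non-conforming traffic; the rate, burst size, and arrival times are illustrative assumptions.

    # Token-bucket sketch illustrating two of the shaping actions named above:
    # smoothing (rate-limiting a flow) and dropping (when the bucket is empty).
    # Rates, burst sizes and timestamps are illustrative assumptions.

    class TokenBucket:
        def __init__(self, rate_tokens_per_s, burst):
            self.rate = rate_tokens_per_s
            self.burst = burst
            self.tokens = burst
            self.last = 0.0

        def allow(self, now, cost=1):
            # Refill proportionally to elapsed time, capped at the burst size.
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True                    # packet conforms: forward it
            return False                       # non-conforming: drop (or buffer)

    bucket = TokenBucket(rate_tokens_per_s=2, burst=2)
    arrivals = [0.0, 0.1, 0.2, 1.5, 1.6, 1.7]  # a burst, then a calmer period
    print([bucket.allow(t) for t in arrivals]) # [True, True, False, True, True, False]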

In some embodiments, a hardware accelerator module 908 can serve as part of a switch fabric, and provide traffic management with output queues (e.g., in 918), the access to which is arbitrated by a scheduler circuit 948. Such output queues can be managed using a scheduling discipline that provides traffic management for incoming flows. The session flows queued in each of these queues can be sent out through an output port to a downstream network element.

It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

It is also understood that the embodiments of the invention may be practiced in the absence of an element and/or step not specifically disclosed. That is, an inventive feature of the invention may be elimination of an element.

Accordingly, while the various aspects of the particular embodiments set forth herein have been described in detail, the present invention could be subject to various changes, substitutions, and alterations without departing from the spirit and scope of the invention.

Claims

1. A server system, comprising:

a switching tier configured to receive network packets and enable network connections between a plurality of servers; and
a middle tier comprising first servers of the plurality of servers, each first server including at least one host processor, at least one network interface device, and at least one hardware accelerator module physically mounted in the first server, each hardware accelerator module including at least one field programmable gate array (FPGA) device coupled to receive network packet data from the switching tier over a first data path, and coupled to the at least one host processor by a second data path, each hardware accelerator module configurable to execute network packet data processing tasks independent from the at least one host processor of the server.

2. The server system of claim 1, wherein the at least one FPGA device is configured to classify network packets incoming to the at least one hardware accelerator module.

3. The server system of claim 2, wherein the at least one FPGA device classifying network packets includes any selected from the group of identifying: logical connections of the network packets, networks for the network packets, sessions of the network packets, and application level data for the network packets.

4. The server system of claim 1, wherein the at least one FPGA device is configured to encrypt network packets incoming to the at least one hardware accelerator module or decrypt network packets outgoing from the at least one hardware accelerator module.

5. The server system of claim 1, wherein the at least one FPGA is configured as a virtual switch, receiving network packets and forwarding the network packet data to a destination based on a network address.

6. The server system of claim 1, wherein each hardware accelerator module further includes module memory accessible by circuits of the FPGA device.

7. The server system of claim 6, wherein the module memory is formed in the FPGA device.

8. The server system of claim 6, wherein the module memory includes volatile or nonvolatile memory.

9. The server system of claim 1, wherein the at least one FPGA device is configured to completely offload a predetermined network packet data processing task from the at least one host processor.

10. The server system of claim 1, wherein the network interface device comprises a network interface card (NIC).

11. The server system of claim 1, wherein:

the network interface device is physically connected to the at least one host processor by a first bus; and
the FPGA device is connected to the at least one host processor by a second bus different from the first bus.

12. The server system of claim 1, wherein the at least one network interface device is located on the hardware accelerator module.

13. The server system of claim 1, wherein:

the first servers are mounted in a rack; and
the switching tier comprises a top-of-rack (TOR) switch mounted in at least one of the racks of the servers.

14. The server system of claim 1, further including a lower tier comprising a plurality of second servers of the plurality of servers, the second servers not including any of the hardware accelerator modules.

15. A server system, comprising:

a plurality of first type servers interconnected by network connections, each first type server including at least one host processor, at least one network interface device, and at least one hardware accelerator module physically mounted in the first type server comprising at least one field programmable gate array (FPGA) device; wherein
the hardware accelerator modules of the first type servers form a hardware acceleration plane configured to enable processing of network packet data by the hardware accelerator modules independent of the host processors, and transmission of network packet data between the hardware accelerator modules independent of the host processors.

16. The server system of claim 15, wherein the hardware acceleration plane is a midplane disposed between a source for the network packet data and the host processors.

17. The server system of claim 15, wherein each first type server includes at least a first bus connection between the at least one host processor and the at least one hardware accelerator module.

18. The server system of claim 17, wherein the FPGA device of each at least one hardware accelerator module includes a bus interface directly connected to the first bus connection.

19. The server system of claim 15, wherein each hardware accelerator module receives network packet data at its FPGA device.

20. The server system of claim 15, wherein each at least one hardware accelerator module is further configured to transmit network packet data to at least one of the host processors.

21. The server system of claim 20, wherein each at least one hardware accelerator module is further configured to transmit network packet data to the at least one host processor of its first type server.

22. The server system of claim 15, wherein the plurality of first type servers are arranged into groups, each group including a switching device to provide network connections between at least the first type servers of the group.

23. The server system of claim 22, wherein:

each group is physically organized in its own rack; and
the switching device for each group includes a top-of-rack switch.
Patent History
Publication number: 20170237672
Type: Application
Filed: Dec 30, 2016
Publication Date: Aug 17, 2017
Inventor: Parin Bhadrik Dalal (Milpitas, CA)
Application Number: 15/396,318
Classifications
International Classification: H04L 12/851 (20060101); H04L 12/931 (20060101);