Efficient command delivery and data transfer
In an example embodiment, a method of delivering a command from an initiator device also transfers data identified by the command to a target device. The data is transferred between the initiator device and the target device according to a selected maximum payload size. The method includes determining whether or not the size of the data associated with the command is greater than the selected maximum payload size. If the size of the data associated with the command is not greater than the selected maximum payload size, then a block is transferred to or from the target device which includes the command and all of the data associated with the command. If the size of the data associated with the command is greater than the selected maximum payload size, then a block is transferred to or from the target device which includes the command, an amount of data associated with the command equal to the selected maximum payload size and an indication that not all of the data associated with the command was included in the transferred block.
1. Field of the Invention
This invention relates generally to methods and apparatus for transferring commands and associated data blocks. In particular, the present invention relates to methods and apparatus for efficiently transferring commands and their associated data between various devices in a network or in a server architecture.
2. Description of the Related Art
The latency incurred when transferring data can greatly diminish the performance of networks and server architectures since the transferring and the transferee input/output (I/O) devices are usually unable to engage in other operations until the data transfer is complete. This latency is longer and even more complicated in networks and server architectures than in other computer systems because there is so much competition for network and server resources including system memory, processor(s) and multiple I/O devices. This can be quite disadvantageous in networks and server architectures where a large number of data blocks are frequently transferred between the processor, memory and several different I/O devices and/or the data blocks are of widely different sizes. Indeed, the lack of efficiency in transferring data blocks may have a larger effect on overall performance than the speed or other performance characteristics of the elements in the network or server architecture. It also may be that the buses and/or I/O adaptor cards connecting I/O devices to the processor are the bottleneck and the performance of these I/O subsystem components needs to be improved.
Conventional servers typically have multiple adaptor cards, each of which usually supports multiple I/O devices. A server may have a significant number of I/O devices configured in a load/store configuration such as shown in
More particularly, in the example of
In this load/store configuration, taking a write command, for example, suppose the processor P wishes to write a block of data within the hard disk HD. First, as shown in
A similar procedure occurs when the processor P reads a block of data from the hard disk HD, i.e., the adapter card A would store the block of data within a block B within the system memory SM, then pass an indication to the processor P that the read process from the hard disk HD has been finished, whereupon the processor P can access the block B within the system memory SM to obtain the data. Such a conventional procedure (illustrated generally in
The present invention is directed to the delivery of commands and transfer of data associated with the commands. A method of delivering a command from an initiator device also transfers data identified by the command to a target device. The data is transferred between the initiator device and the target device according to a selected maximum payload size. The method includes determining whether or not the size of the data associated with the command is greater than the selected maximum payload size. If the size of the data associated with the command is not greater than the selected maximum payload size, then a block is transferred to or from the target device which includes the command and all of the data associated with the command. If the size of the data associated with the command is greater than the selected maximum payload size, then a block is transferred to or from the target device which includes the command, an amount of data associated with the command equal to the selected maximum payload size and an indication that not all of the data associated with the command was included in the transferred block.
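The decision described above can be illustrated with a minimal sketch. It is not the implementation of any particular embodiment: the `Block` structure, the `more_data` flag name, and the `build_first_block` function are hypothetical names chosen for illustration, and the command is represented as an opaque string.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical transfer block: a command plus some or all of its data."""
    command: str      # opaque command descriptor (illustrative representation)
    payload: bytes    # data carried in the same block as the command
    more_data: bool   # indication that not all of the data was included


def build_first_block(command: str, data: bytes, max_payload: int) -> Block:
    """Build the block that carries the command.

    If the data fits within max_payload, the block carries all of it.
    Otherwise the block carries exactly max_payload bytes and sets the
    indication that data remains to be transferred.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if len(data) <= max_payload:
        return Block(command, data, more_data=False)
    return Block(command, data[:max_payload], more_data=True)
```

With a maximum payload size of 8 bytes, for instance, a 4-byte write would travel entirely in the command block, while a 10-byte write would carry its first 8 bytes and set the indication.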
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and a better understanding of the present invention will become apparent from the following detailed description of example embodiments and the claims when read in connection with the accompanying drawings, all forming a part of the disclosure of the invention. While the foregoing and following written and illustrated disclosure focuses on disclosing example embodiments of the invention, it should be clearly understood that the same is by way of illustration and example only and is not to be taken by way of limitation.
An example embodiment of the present invention seeks to decrease the inefficiencies of the transfer of data to input/output devices in a server architecture, such as what occurs when an I/O data block is transferred to or from a mass storage device such as a hard disk. In particular, PCI compliant I/O adapters cannot accomplish data transfers without the multiple steps discussed above. Computer systems generally have a processor, associated system memory, an input/output (I/O) device, and at least one bus, such as a PCI bus, connecting these components. A server is a type of computer system having an architecture or otherwise designed to be able to support multiple I/O devices and to transfer data with other computer systems at high speed. (Due to recent advances in the performance and flexibility of computer systems, many modern computers are servers under this definition.) Although many servers currently utilize PCI buses, the example embodiment of the invention sets forth a data transfer where the transferee device has remote direct memory access (RDMA) to virtual addresses, thus enabling protected, target-managed data transfer.
The example embodiment attempts to reduce the latency when an element of the host server, such as one of the processors, attempts to write a data block to the hard disk drive either for the execution of instructions or to store the data block in system memory and to optimize the coordination of the transfer of I/O data blocks. For a disk drive, the data block is the unit of addressing and data transfer. If the value of one byte is to be updated on a disk, then the data transfer would include a block of data (512 bytes, for example) that contains the byte of interest. The byte value in the copy of the block held in memory would be updated, and then that block would be transferred from memory to the drive, overwriting the old block stored on the disk. However, the method according to the example embodiment is not limited in its application to disk drives or storage devices. In particular, the method according to the example embodiment may be useful for transferring data among computers and other devices on a network since data latency is critical in such environments. The data may be transferred in blocks of different sizes depending upon, for example, the target device, the transfer protocol (such as, for example, ethernet packets), etc.
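The single-byte update described above can be sketched as follows. This is an illustration only: the disk is modeled as an in-memory `bytearray`, and the function name `update_byte` is hypothetical. The 512-byte block size is the example value given in the text.

```python
from typing import Tuple

BLOCK_SIZE = 512  # bytes; the example block size mentioned in the text


def update_byte(disk: bytearray, offset: int, value: int,
                block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Update one byte by rewriting the whole block that contains it.

    Returns (block index, number of bytes transferred back to the disk).
    """
    if not 0 <= offset < len(disk):
        raise IndexError("offset outside the disk")
    index = offset // block_size
    start = index * block_size
    # Copy of the containing block held in memory.
    block = bytearray(disk[start:start + block_size])
    # Update the byte of interest in the in-memory copy.
    block[offset - start] = value
    # Transfer the whole block back, overwriting the old block on the disk.
    disk[start:start + len(block)] = block
    return index, len(block)
```

Changing a single byte therefore costs a full block transfer, which is why the block, not the byte, is the natural unit when sizing a transfer.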
One example application of the invention is in a processor or chipset incorporated in the input/output control circuit of a server device to operate in conjunction with a processor, such as the Intel Pentium II Xeon™ or Pentium III Xeon™ processor. However, such an embodiment is but one possible example of the invention which may, of course, be applied in any computer having a processor and an input/output device and indeed in any server architecture where an improvement in writing and reading data blocks to or from an I/O device is desired for whatever reason.
One possible application of the invention is in a server architecture with the switched fabric configuration shown in
The channel adapter CA of the I/O unit, in turn, is connected to a switching fabric SF, which may contain many different switches SW and redundant paths throughout the fabric, such that a plurality of messages can be traveling through the switching fabric at any given time. Accordingly, when the processor P issues a write command, for example, the processor P now simply passes the command to the channel adaptor CA, which injects it into the switched fabric SF, such that the processor P does not have to wait for processing of the command and locking of the system bus, but can instead go on to perform other processing operations until the processing is completed.
According to the present invention, the channel is any means of transferring data, including but not limited to virtual channels, used to transfer data between two endpoints. While the example embodiment is an NGIO implementation and thus supports the channel definition provided in the specification identified above, the present invention is not so limited. In accordance with the implementation in the NGIO specification, once injected into the switched fabric SF, the write command travels through the switches and eventually arrives at a second channel adapter CA where it can be given to an I/O adaptor card A where it is subsequently written to the hard disk HD or to a network interface where it is subsequently transferred to another computer device on a connected network (not shown). Accordingly, the inherent delays in deciphering the command and writing of the data as required by the I/O adaptor card A are not experienced by the processor P which is on the other side of the switching fabric, and can continue processing. As shown in
Turning now to
Accordingly,
Of course, the parameter of primary importance is the selection of the amount of data, called the maximum payload size, that can be transferred in one of the transfer blocks. If the allocation of memory in a computer device were inconsequential in terms of cost, power consumption, etc., then of course an extremely large amount of memory could be provided. Since that is not the case, one important point of the present invention is that an advantageous memory size/latency tradeoff is made by proper selection of the maximum payload size for a target device and data transfer. In the two transfer size graphs shown in
Accordingly, a further advantageous arrangement is shown in the channel example shown in
In
There are different possible points in the server architecture to implement the delivery method. The first possible implementation is at a somewhat centralized (but not shared) location. This implementation takes advantage of the fact that I/O adaptor cards are a standard component of input/output subsystems and generally don't include any specialized circuitry or software for effectuating the described method of transferring data blocks. In the context of this application, they can be considered “dumb” I/O cards. An example implementation of the invention uses such dumb I/O cards because they are standardized and less expensive than non-standard I/O cards, and performs the method elsewhere in the network configuration. The method may support a different block size B for each I/O device. In such case, it then looks at the block size for the respective I/O device and the data transfer is carried out per I/O device based on the size of the data blocks generally transferred for operation of that particular I/O device. Also, for example, one I/O device may be a CD-RW disk drive and the other may be a high speed communications interface (e.g., an asynchronous transfer mode (ATM) interface). Preferably, the method is implemented as firmware or software, although it may be accelerated with hardware support.
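As a rough sketch of the per-device handling described above, the delivery logic might consult a table of block sizes and divide each transfer accordingly. The device names, the table `BLOCK_SIZE_BY_DEVICE`, the byte values in it, and the function `split_for_device` are all assumptions made for illustration; none of them comes from the disclosure.

```python
from typing import Dict, List

# Hypothetical per-device block sizes B, in bytes (illustrative values only).
BLOCK_SIZE_BY_DEVICE: Dict[str, int] = {
    "cdrw0": 2048,  # assumed value for a CD-RW drive
    "atm0": 48,     # assumed value for an ATM interface
}


def split_for_device(device: str, data: bytes,
                     sizes: Dict[str, int] = BLOCK_SIZE_BY_DEVICE) -> List[bytes]:
    """Split data into blocks of the size configured for the given device."""
    block_size = sizes[device]  # look up the block size for this I/O device
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

Because the lookup lives outside the adaptor cards, the cards themselves can remain standard components while each device still receives transfers sized for it.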
A set system length parameter is tuned to the target device or some basic unit. Preferably, a basic unit is set for the data block size B which is equal to a multiple of the data storage format of the I/O device. For example, in a disk drive storing data in 1 kilobyte sectors, it is preferable that the predetermined block size be a whole number of kilobytes. However, the data blocks may be of different sizes controlled according to the firmware or software. The firmware or software may or may not maintain an internal history of data transfers to each I/O device and adjust the block size B according to the history to provide a further level of adaptability to adjust to operating conditions. The block size B can thus respond to conditions of the I/O device at the time of operation, rather than a static decision made at design time or boot time.
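The two ideas in this paragraph, keeping the block size B a multiple of the device's storage format and adjusting B from a history of transfers, can be sketched as below. The text does not specify how the history is used; the averaging heuristic here is an assumption made purely for illustration, as are the function names.

```python
from typing import List


def round_to_sector_multiple(size: int, sector: int) -> int:
    """Round size up to the nearest positive multiple of the sector size."""
    if sector <= 0:
        raise ValueError("sector must be positive")
    multiples = max(1, -(-size // sector))  # ceiling division, at least one sector
    return multiples * sector


def tune_block_size(history: List[int], sector: int, fallback: int) -> int:
    """Choose a block size B from recent transfer sizes to one device.

    Assumed heuristic (not from the source): use the mean of the recorded
    transfer sizes, rounded up to a multiple of the sector size. With no
    history, fall back to a configured default.
    """
    if not history:
        return round_to_sector_multiple(fallback, sector)
    mean = sum(history) // len(history)
    return round_to_sector_multiple(mean, sector)
```

Other policies, such as tracking a percentile or weighting recent transfers more heavily, would fit the same interface; the point is only that B is recomputed during operation rather than fixed in advance.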
Although an example embodiment, the invention is not limited to the switched fabric configuration or to a host server as illustrated in
Other features of the invention may be apparent to those skilled in the art from the detailed description of the example embodiments and claims when read in connection with the accompanying drawings. While the foregoing and following written and illustrated disclosure focuses on disclosing example embodiments of the invention, it should be understood that the same is by way of illustration and example only, is not to be taken by way of limitation and may be modified in learned practice of the invention. While the foregoing has described what are considered to be example embodiments of the invention, it is understood that various modifications may be made therein and that the invention may be implemented in various forms and embodiments, and that it may be applied in numerous applications, only some of which have been described herein. It is intended by the following claims to claim all such modifications and variations.
Claims
1-31. (canceled)
32. A method comprising:
- at least one of issuing a block from a first device to a second device and receiving the block at the second device, the block including a command that is associated with data, if the data has a size that is less than or equal to a maximum payload size, the block also including all of the data with which the command is associated.
33. The method of claim 32, wherein:
- the second device is associated with a data storage format; and
- the block is associated with a data block size that is equal to a multiple of the data storage format.
34. The method of claim 32, wherein:
- the block is associated with a data block size; and
- the method further comprises adjusting the data block size.
35. The method of claim 32, further comprising:
- determining whether the size of the data with which the command is associated is greater than the maximum payload size.
36. The method of claim 32, further comprising:
- selecting the maximum payload size.
37. The method of claim 32, wherein:
- the first device comprises an initiator device; and
- the second device comprises a target device.
38. A method comprising:
- at least one of issuing one block from a first device to a second device and receiving the one block at the second device, the one block including a command that is associated with data, if the data has a size that is greater than a maximum payload size, the one block including the command, an amount of the data that is equal to the maximum payload size, and an indication that less than all of the data is included in the one block.
39. The method of claim 38, further comprising:
- at least one of transferring to the second device from the first device another block and receiving at the second device the another block, the another block including a remaining portion of the data that is not included in the one block.
40. The method of claim 39, further comprising:
- at least one of issuing a request from the second device and receiving at the first device the request, the request requesting transfer to the second device of the remaining portion.
41. The method of claim 38, wherein:
- the first device comprises an initiator device; and
- the second device comprises a target device.
42. An apparatus comprising:
- circuitry that is capable of at least one of issuing a block from a first device to a second device and receiving the block at the second device, the block including a command that is associated with data, if the data has a size that is less than or equal to a maximum payload size, the block also including all of the data with which the command is associated.
43. The apparatus of claim 42, wherein:
- the second device is associated with a data storage format; and
- the block is associated with a data block size that is equal to a multiple of the data storage format.
44. The apparatus of claim 42, wherein:
- the block is associated with a data block size; and
- the circuitry is also capable of adjusting the data block size.
45. The apparatus of claim 42, wherein:
- the circuitry is also capable of determining whether the size of the data with which the command is associated is greater than the maximum payload size.
46. The apparatus of claim 42, wherein:
- the circuitry is also capable of selecting the maximum payload size.
47. The apparatus of claim 42, wherein:
- the first device comprises an initiator device; and
- the second device comprises a target device.
48. An apparatus comprising:
- circuitry that is capable of at least one of issuing one block from a first device to a second device and receiving the one block at the second device, the one block including a command that is associated with data, if the data has a size that is greater than a maximum payload size, the one block including the command, an amount of the data that is equal to the maximum payload size, and an indication that less than all of the data is included in the one block.
49. The apparatus of claim 48, wherein:
- the circuitry is also capable of at least one of transferring to the second device from the first device another block and receiving at the second device the another block, the another block including a remaining portion of the data that is not included in the one block.
50. The apparatus of claim 49, wherein:
- the circuitry is also capable of at least one of issuing a request from the second device and receiving at the first device the request, the request requesting transfer to the second device of the remaining portion.
51. The apparatus of claim 48, wherein:
- the first device comprises an initiator device; and
- the second device comprises a target device.
52. An article comprising:
- a storage medium storing instructions that when executed by a machine result in the following:
- at least one of issuing a block from a first device to a second device and receiving the block at the second device, the block including a command that is associated with data, if the data has a size that is less than or equal to a maximum payload size, the block also including all of the data with which the command is associated.
53. The article of claim 52, wherein:
- the second device is associated with a data storage format; and
- the block is associated with a data block size that is equal to a multiple of the data storage format.
54. The article of claim 52, wherein:
- the block is associated with a data block size; and
- the instructions when executed by the machine also result in adjusting the data block size.
55. The article of claim 52, wherein:
- the instructions when executed by the machine also result in determining whether the size of the data with which the command is associated is greater than the maximum payload size.
56. The article of claim 52, wherein:
- the instructions when executed by the machine also result in selecting the maximum payload size.
57. The article of claim 52, wherein:
- the first device comprises an initiator device; and
- the second device comprises a target device.
58. An article comprising:
- a storage medium storing instructions that when executed by a machine result in the following:
- at least one of issuing one block from a first device to a second device and receiving the one block at the second device, the one block including a command that is associated with data, if the data has a size that is greater than a maximum payload size, the one block including the command, an amount of the data that is equal to the maximum payload size, and an indication that less than all of the data is included in the one block.
59. The article of claim 58, wherein:
- the instructions when executed by the machine also result in at least one of transferring to the second device from the first device another block and receiving at the second device the another block, the another block including a remaining portion of the data that is not included in the one block.
60. The article of claim 59, wherein:
- the instructions when executed by the machine also result in at least one of issuing a request from the second device and receiving at the first device the request, the request requesting transfer to the second device of the remaining portion.
61. The article of claim 58, wherein:
- the first device comprises an initiator device; and
- the second device comprises a target device.
62. A system comprising:
- a first device;
- a second device;
- one or more channels to communicatively couple the first device and the second device;
- the first device being capable of issuing a block to a second device via the one or more channels;
- the second device being capable of receiving the block via the one or more channels; and
- the block including a command that is associated with data, if the data has a size that is less than or equal to a maximum payload size, the block also including all of the data with which the command is associated.
63. The system of claim 62, wherein:
- the one or more channels comprise one or more point-to-point connections.
64. A system comprising:
- a first device;
- a second device;
- one or more channels to communicatively couple the first device and the second device;
- the first device being capable of issuing a block to a second device via the one or more channels;
- the second device being capable of receiving the block via the one or more channels; and
- the one block including a command that is associated with data, if the data has a size that is greater than a maximum payload size, the one block including the command, an amount of the data that is equal to the maximum payload size, and an indication that less than all of the data is included in the one block.
65. The system of claim 64, wherein:
- the one or more channels comprise one or more point-to-point connections.
Type: Application
Filed: Feb 2, 2005
Publication Date: Jul 14, 2005
Inventor: Cecil Simpson (Beaverton, OR)
Application Number: 11/050,143