Even though sending files over HTTP might sound like a simple task that can be accomplished in only a few ways, the opposite is true.
During the development of grizzly-sendfile, several kinds of algorithms were identified, and there may be more types that have not been categorized yet:
- classification based on IO mode used
  - blocking algorithms
  - non-blocking algorithms
- classification based on number of write bursts
  - single-burst algorithms
  - multi-burst algorithms
There are two modes that can be used to write data into a SocketChannel - blocking and non-blocking mode.
Blocking algorithms will block on a write, until the write buffer is empty or until the client closes the connection.
Non-blocking algorithms will attempt to write all the data from the buffer and return immediately without blocking; depending on the client's readiness to accept data, this results in all, some, or none of the data being written.
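The difference between the two modes can be sketched with plain `java.nio` calls (a minimal illustration; the helper class and method names here are not grizzly-sendfile's API):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Illustrative helper showing the two SocketChannel write modes.
class WriteModes {

    // Blocking mode: write() parks the calling thread until the buffer has
    // been drained (or the client closes the connection).
    static void blockingWrite(SocketChannel channel, ByteBuffer buffer) throws IOException {
        channel.configureBlocking(true);
        while (buffer.hasRemaining()) {
            channel.write(buffer); // blocks until the socket can take more data
        }
    }

    // Non-blocking mode: write() returns immediately; the return value may be
    // anything from 0 to buffer.remaining() depending on the client's
    // readiness, so the caller must track how much data is left.
    static int nonBlockingWrite(SocketChannel channel, ByteBuffer buffer) throws IOException {
        channel.configureBlocking(false);
        return channel.write(buffer); // 0 means the client cannot accept data right now
    }
}
```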
A write burst is one or a few write attempts executed by a dedicated worker thread. In order to understand write bursts better, you should check out the download life cycle.
Single-burst algorithms will attempt to send an entire file to the client in one burst, meaning that there is a dedicated thread exclusively assigned to this transfer.
Multi-burst algorithms will attempt to send a file in multiple bursts. Each burst is executed by a thread assigned to the transfer for the duration of that one burst. Once a write burst is completed, the transfer is paused and put into a queue, where it waits until the client is ready to receive more data and a thread is available to execute the next burst. This rescheduling occurs fast enough that in most cases the client is not aware of sharing threads with other downloads.
The theory behind multi-burst algorithms is that clients are usually slower than the server, which results in server threads spending a lot of time waiting for the client to get ready to accept more data. So instead of waiting, the thread can serve a burst for another download, and when the client is ready, an available thread is assigned to the download and serves the next burst.
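The scheduling idea above can be sketched as a worker loop over a queue of ready transfers (a minimal sketch; `Transfer`, `serveBurst`, and `awaitWritability` are illustrative names, not grizzly-sendfile's actual API):

```java
import java.util.concurrent.BlockingQueue;

// Illustrative abstraction of one file transfer in a multi-burst scheme.
interface Transfer {
    /** Write one burst; returns true once the whole file has been sent. */
    boolean serveBurst();
    /** Register with a selector and re-queue this transfer once the client is writable again. */
    void awaitWritability(BlockingQueue<Transfer> ready);
}

// Each worker borrows a transfer for exactly one burst, then moves on,
// so a small thread pool can multiplex many concurrent downloads.
class BurstWorker implements Runnable {
    private final BlockingQueue<Transfer> ready; // transfers whose clients can accept data

    BurstWorker(BlockingQueue<Transfer> ready) {
        this.ready = ready;
    }

    @Override
    public void run() {
        try {
            while (true) {
                Transfer t = ready.take();        // borrow a transfer for one burst only
                boolean finished = t.serveBurst();
                if (!finished) {
                    // Pause the transfer: it rejoins the queue via the
                    // selector once the client is ready for more data.
                    t.awaitWritability(ready);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```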
| Name | IO Mode | Burst Mode | Notes |
|------|---------|------------|-------|
| SimpleBlockingAlgorithm (SBA) | blocking | single-burst | Simple and reliable algorithm with limited scalability but excellent performance. |
| EqualNonBlockingAlgorithm (ENBA) | non-blocking | multi-burst | Performance only slightly worse (~10%) than single-burst blocking algorithms under low load; under high load, performance degrades equally for all file transfers in order to achieve better scalability. |
| EqualBlockingAlgorithm (EBA) | blocking | multi-burst | Performance almost as good as single-burst blocking algorithms under low load; under high load, performance degrades equally for all file transfers in order to achieve better scalability. This is currently the default algorithm. |
Specifying the algorithm that grizzly-sendfile should use is as simple as setting a JVM parameter. Please see the grizzly-sendfile configuration page for all the details.
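For illustration only, such a JVM parameter could look like the following; the actual system property name is documented on the configuration page, and `grizzly-sendfile.algorithm` here is a hypothetical placeholder:

```shell
# Hypothetical property name -- consult the grizzly-sendfile
# configuration page for the real one.
java -Dgrizzly-sendfile.algorithm=EqualBlockingAlgorithm -jar server.jar
```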
A thorough comparison of the performance and scalability characteristics of the different algorithms can be found in the blog post: grizzly-sendfile and Comparison of Blocking and NonBlocking IO
While blocking algorithms that employ the traditional (single-burst) approach can achieve better performance for really fast clients, they don't scale well. On the diagram below, clients requested 6 downloads at approximately the same time, but because the server uses blocking IO without multiplexing (SimpleBlockingAlgorithm) and has only 3 worker threads available to handle these downloads, 3 of the downloads are put into the queue and won't start until there is a free thread to handle them.
On the second diagram, the server is using non-blocking IO with multiplexing (EqualNonBlockingAlgorithm). The average download time is slightly longer, but the 3 threads can handle all 6 downloads concurrently, which results in much better throughput. This is most likely because the threads are utilized whenever any of the clients is able to receive more data.
The last diagram represents blocking IO with multiplexing (EqualBlockingAlgorithm), which combines the first two approaches. The benchmarks executed against this algorithm suggest that the amount of blocking is much lower compared to SimpleBlockingAlgorithm (possibly because OS and network buffers have more time to prepare for receiving more data), yet the throughput is higher (most likely due to less frequent selector re-registration, which translates to lower overhead).