Java Resizable Array()

  本篇文章为你整理了Java Resizable Array()的详细内容,包含有 Java Resizable Array,希望能帮助你了解 Java Resizable Array。

   Sometimes you want to keep data (typically raw bytes) in a single, consecutive array for fast and easy access, but need the array to be

   resizable, or at least expandable. Java arrays are not resizable, so using an array by itself is not sufficient.

   Thus, to get a resizable array for primitive types you need to implement

   it yourself. In this tutorial I will show you how to implement a resizable array in Java.

  
You might wonder why you cannot just use a Java ArrayList for this purpose. The answer is - you can if

   one of these conditions are true:

  
You are storing some object type in the array.

   You are storing primitive types in the array and performance and memory usage doesnt matter.

  
The Java ArrayList class only works for objects - not for primitive types (byte, int, long etc). If you use a

   Java ArrayList to store primitive data it will be autoboxed into an object when inserted into the ArrayList, and

   unboxed out of this object back to a primitive type. This boxing and unboxing adds an overhead to insertion of and

   access to each element - which you do not want when you are trying to optimize for performance (this article is

   part of my Java Performance trail).

   Additionally, you have no guarantee for where in memory the autoboxed objects are stored. They can be scattered all

   over the heap. Thus consecutive access

   to them can be much, much slower than when stored consecutively in memory - in a primitive type array.

   Additionally, storing a primitive in its object counterpart (e.g. a long in a Long object) incurs significant

   extra memory overhead per element.

  
 

  Java Resizable Array - GitHub Repository

   The code for the Java resizable array implemented in this tutorial can be found on GitHub (for easy access):

   https://github.com/jjenkov/java-resizable-array

   The code consist of 3 Java classes and 2 unit tests.

  
Imagine you have a server receiving messages of varying sizes. Some messages are very small (e.g. less than 4KB),

   others as large as 1MB or even larger. If the server cannot see from the beginning of the message how big the message

   will end up being, there is no way of pre-allocating the correct amount of memory for that message. The

   alternative is to "over-allocate", meaning allocate a memory block that we believe is big enough to hold the

   message.

   If the server is receiving messages from many (100K +) connections at the same time, we need to limit how much

   memory we allocate for each message. We cannot just allocate the maximal message size (e.g. 1MB or 16MB etc.) for

   each message buffer. That would deplete the server memory fast when the number of connections and messages go up!

   100.000 x 1MB = 100GB (approximately - not precisely - but you get the picture).

   The alternative is to start with a smaller buffer and if that buffer turns out not to be big enough to hold the

   full message, we expand the buffer.

  
This is an expensive set of operations! Especially the copying of the old data to the new data gets more and

   more expensive the larger the old array is.

   Note, that in languages with built-in garbage collection (like Java, C# etc.) you do not explicitly have to

   deallocate the old array, but the VM still has to do it, so your application still pays the deallocation price

   sooner or later.

  
To minimize the price paid for array expansion you want to expand the array as few times as possible, in order

   to make it large enough to hold the total message.

   If we assume that most messages are small, we can start with a small buffer. If a message grows beyond

   the small buffer size, we allocate a new, medium sized array and copy the data to the new, medium sized array.

   If the message outgrows the medium sized array too, a large array is allocated, and the message is copied to that.

   The large array should be the largest size allowed for a message. If the message outgrows the large message buffer,

   it is too large to be processed and should be rejected by the server.

   Using this strategy, most messages will only ever use a small buffer. This means that the server memory is

   used more efficiently. 100.000 x 4KB (small buffer size) is only 400MB. Most servers should be able to handle that.

   Even at 4GB (1.000.000 x 4KB) modern servers should be able to handle it.

  
The ResizableArrayBuffer contains a single, large array. This array is divided into three sections.

   One section for small arrays, one for medium size arrays and one for large arrays.

   The ResizableArray represents a single, resizable array, with its data stored in the array in

   the ResizableArrayBuffer.

   Here is a diagram illustrating the large array divided into sections, and each section divided into blocks.

   By reserving an area of the large array in the ResizableArrayBuffer for each message size interval

   (small, medium, large), we make sure that the array cannot be fully filled with either size messages. For instance,

   receiving a lot of small messages cannot take all the memory and thus block for medium and large messages. Similarly,

   receiving a lot of large messages cannot take up all memory and block for medium and small messages.

   Since all messages start as small messages, if the amount of small arrays is depleted, no new arrays can be allocated,

   regardless of whether or not there is space in the medium and large array sections. However, it is possible to

   make the small array section large enough that the probability of this happening is very low.

   It is still possible to grow small messages into medium and large messages, even if the small message section

   is fully used.

  
A possible optimization is to use only a single block size. Thus, sometimes a newly allocated block might

   be allocted just after the block that needs to be expanded. In that case you dont need to copy the data from

   the old array to the new. You can simply "expand" the block to include both of the two allocated blocks, and

   write new data into the expanded block from where the second block was added. This saves copying of all array

   data when a memory block can be "expanded" into the next consecutive block.

   A drawback of the above mentioned possible optimization is, that in the cases where it is not possible to expand

   into the next memory block, copying of data is still necessary. Thus you have a small overhead of checking

   if expansion is possible. Additionally, if you use a small block size, you may have to expand blocks more often

   than if you use e.g. a small, medium and large block size.

  
The large array inside the ResizableArrayBuffer is split up into three sections. Each section is split

   up into smaller blocks. Each block in each section has the same size. Blocks in the small message section has the

   same small block size. Blocks in the medium message section has the same medium block size. And blocks in the

   large message section has the same large block size.

   When all blocks within a section has the same size, it is easier to keep track of used and unused blocks.

   You can simply use a queue containing the start indexes of each block. One queue is needed for each section of

   the shared array. Thus, one queue is needed to keep track of free small blocks, one queue for free medium blocks

   and one queue for free large blocks.

   Allocating a block from any of the sections can be accomplished simply by taking the next free block start index

   from the queue associated with the desired section. Freeing a block is done by putting the start index back into

   the corresponding queue.

   As a queue I have used a simple Ring Buffer implementation. The implementation

   used in the GitHubRepository is called QueueIntFlip.

  
The Resizable array will expand itself when you write data to it. If you attempt to write more data to it than

   it has space for in its currently allocated block, it will allocate a new, larger block and copy all its data

   to that new block. The previous smaller block is then freed.

  
Once you are done with a resizable array you should free it again, so that it can be used for other messages.

  
I want to show you how to use the ResizableArrayBuffer (from the GitHub repository).

  
This example creates a ResizableArrayBuffer with a small array size of 4KB, medium array size of

   128KB and a larger array size of 1MB. The ResizableArrayBuffer contains space for 1024 small arrays (4MB in total),

   32 medium arrays (4MB in total) and 4 large arrays (4MB in total) - for a full shared array size of 12MB.

  
To obtain a ResizableArray instance, call the ResizableArrayBuffers getArray()

   method, like this:

  

 

 

  ResizableArray resizableArray = arrayBuffer.getArray();

  

 

  
This example obtains a ResizableArray of the smallest size (4KB the small size was set to earlier).

  
You write to a ResizableArray by calling its write() method. The ResizableArray

   class in the GitHub repository only contains a single write() method which takes a ByteBuffer

   as parameter. It should be pretty easy to add more write() methods yourself, though.

   Here is a write example:

  

 

 

  ByteBuffer byteBuffer = ByteBuffer.allocate(1024);

  for(int i=0; i ByteBuffer into the array (block) of the ResizableArray.

   The value returned by write() is the number of bytes copied from the ByteBuffer.

   In case the ByteBuffer had contained more data than the ResizableArray, the

   ResizableArray will attempt to expand itself to make space for the data in the ByteBuffer.

   If the ResizableArray cannot contain all the data in the ByteBuffer after expanding

   itself to the max size, the write() method will return -1 and no data will have been

   copied at all!

  
When you read from a ResizableArray you do so directly in the shared array which all

   ResizableArray instances share. The ResizableArray contains the following

   public field:

  

 

 

  public byte[] sharedArray = null;

  public int offset = 0;

  public int capacity = 0;

  public int length = 0;

  

 

   The sharedArray field references the shared array which all ResizableArray instances

   keep their data in. That is the array kept internally in the ResizableArrayBuffer too.

   The offset field contains the start index in the shared array where this ResizableArray

   keeps its data (the block assigned to it).

   The capacity field contains the size of the block in the shared array assigned to this

   ResizableArray instance.

   The length field contains how much of the assigned block the ResizableArray is actually

   using.

   To read the data written to a ResizableArray simply read the bytes from sharedArray[offset]

   to sharedArray[offset + length -1].

  
Once you are done using a ResizableArray instance you should free it. You do so simply by calling

   the free() method on the ResizableArray, like this:

  

 

 

  resizableArray.free();

  

 

   Calling free() takes care of returning the used block to the correct block queue, regardless of

   the size of the block allocated to the ResizableArray.

  
You can change the design of my ResizableArrayBuffer to meet your needs. For instance, you can

   create more than 3 sections inside it. It should be fairly easy to do. Just look at the code in the GitHub

   repository and modify it.

  以上就是Java Resizable Array()的详细内容,想要了解更多 Java Resizable Array的内容,请持续关注盛行IT软件开发工作室。

郑重声明:本文由网友发布,不代表盛行IT的观点,版权归原作者所有,仅为传播更多信息之目的,如有侵权请联系,我们将第一时间修改或删除,多谢。

留言与评论(共有 条评论)
   
验证码: