As the Vector is not trendy any more (and performs unnecessary synchronization), my first thought was to just switch to the ArrayList, as everybody advise.
ArrayList is not a direct substitution, though. It lacks the setSize() method that would create holes in the Array - elements filled with nulls up to the requested index.
I know I could overcome this problem with looped add(null), but I decided to go the easy way. I just created a plain old java array[] and provided my own private ensureCapacity equivalent to do the growing.
Lesson for me: don't blindly assume that what everybody (including Joshua Bloch) says is applicable in all situations. There are contexts where simple array is better then the newest Collections framework :-)
Why there is no direct substitution for Vector is another story.
10 comments:
First of all you should use interfaces like List or Collection. Second thing is that you can do what you want using ArrayList constructor: http://java.sun.com/javase/6/docs/api/java/util/ArrayList.html#ArrayList(int)
Hope that helps.
Sorry - I missed "dynamically growing array". Anyway you probably should just use ArrayList (which dynamically grows if you reach capacity limit) or LinkedList. Implementing your own collection "smells" unless it's very primitive. With your own implementation you have to remember to "nullify" each element of the array when you do the clean-up. Otherwise you're just causing memory-leak (yes, it's possible in Java :) because you don't release references to the objects in your array and GC will never collect them.
What I need is efectively:
array[0] = new Something();
array[7] = new SomethingElse();
array[lotsa] = new Something();
Why I need that this way - because later I do
if (array[x] != null) doSomething(array[x])
for some arbitrary x.
See java.util.BitSet - I practically rewrite it (but I use references instead of longs)
There is just no clean way to do it using ArrayList, and instead of playing some tricks to force ArrayList to behave like Vector I just reimplemented pieces of Vector that I needed.
What's happening here is that Java is trying to send you a message that you're using the wrong collection type. To almost quote the Javadoc, a List is an "ordered collection" of elements, otherwise known as a "sequence". It's a data structure designed for keeping a bunch of values in a particular order.
You're trying to use it as a way to associate values with arbitrary numbers. The data structure you're really looking for is an "associative array", which Java calls a Map.
I pretty much agree with Charles - it looks like Map...
Try Google collection framework to see - maybe they implemented what you need.
Again - it's quite amazing and unbelievable (yet slightly possible) that there is no implementation of a collection you need. Maybe you have some perverted needs ;) and should rethink the implementation? Note that some other programmer looking at your code will have no bloody idea what you wrote...
I did some experimentation with (Hash)Set of Longs vs sorted long[] array with binary search.
Turned out that for small sizes ( < ~100) binary searched array performed order of magnitude better then {Set<Long>} set.contains(new Long(value)).
For bigger sets/arrays performance was similar - there was no significant performance gain from using the HashSet up to sizes of ~1000 (I did not need anything bigger).
(longs represented timestamps)
So, unless you tweak hashcode() & stuff, don't take O(1) HashSet lookup for granted.
Have I mentioned that this peace of code is performance-critical (CPU, not memory)?
BTW - I challenge you and call you "everybody" in the sense defined in my original post ;-)
What you propose is the "best practice", applicable in most situations, but not all.
BTW, if anyone knows an unsynchronized Vector implementation, please send a pointer.
Could you post the test program you used to justify "I did some experimentation with (Hash)Set of Longs vs sorted long[] array with binary search.
Turned out that for small sizes ( < ~100) binary searched array performed order of magnitude better then {Set<Long>} set.contains(new Long(value))"
I did some experimentation too and I don't agree with you - my results show that HashSet is faster than sorted array.
The smaller value of 20, the more visible the difference.
import java.util.Random;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
public class PerfTest {
public static final int LOOPS = 1000;
public static void main(String[] args) {
long[] data = createData(20);
long[] search = createSearch(data);
HashSet<Long> set = new HashSet<Long>(data.length);
for (long l : data) {
set.add(new Long(l));
}
int hitCount = 0;
long t1 = System.currentTimeMillis();
for (int i = 0; i < LOOPS; i++) {
for (long l : search) {
if (Arrays.binarySearch(data, l) >= 0) ++hitCount;
}
}
long t2 = System.currentTimeMillis();
for (int i = 0; i < LOOPS; i++) {
for (long l : search) {
if (set.contains(new Long(l))) ++hitCount;
}
}
long t3 = System.currentTimeMillis();
System.out.println("T1 = " + (t2 - t1) + " T2 = " + (t3 - t2));
}
static long[] createData(int len) {
Random rnd = new Random();
long[] array = new long[len];
for (int i = 0; i < len; i++) {
array[i] = rnd.nextLong();
}
Arrays.sort(array);
return array;
}
static long[] createSearch(long[] data) {
Random rnd = new Random();
long[] array = new long[data.length * 2];
System.arraycopy(data, 0, array, 0, data.length);
for (int i = data.length; i < array.length; i++) {
array[i] = rnd.nextLong();
}
Collections.shuffle(Arrays.asList(array));
return array;
}
}
Funny ;) Well my conclusion is that it only makes sense when the array size is less than 12. With number greater than this HashSet was basically faster (change LOOPS to 100000 to see something) - Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Was it really such a big problem for you that you had to re-implement Vector? :) Is the gain of 30 ms (during 100K iterations) worth such overengineering?
Too many loops and the HotSpot will change the outcome ;-)
Such microbenchmarks are very fragile anyway.
The problem with HashSet/array was different then the one from the post, I just mentioned it as an argument against implementing arrays over HashMaps.
The HashSet performance problem was one of top performance bottlenecks reported by profiler - 30ms do make difference when invoked in a loop.
Post a Comment