2004-05-19 The Java Specialists' Newsletter [Issue 088] - Resetting ObjectOutputStream

Author: Dr. Heinz M. Kabutz


Welcome to the 88th edition of The Java(tm) Specialists' Newsletter. Our readership has increased to 100 countries, with the recent addition of Botswana. A special welcome to my neighbouring country :-) Please remember to forward these newsletters to friends and colleagues who would be interested in joining. The bigger the readership, the more pressure I will be under to write new, good, newsletters :-)

For all the Chinese readers of this newsletter, please read our Mandarin translation of the Design Patterns Brochure and tell us what you think of it. [For the English version, please click here] Please let me know if you would be able to translate some of our newsletters into Mandarin.

There comes a time in any company when it becomes important to know what its "vision" and "mission" are. A vision and a mission are supposed to help staff be more focused and provide better, more consistent service to customers. After some thought, we came up with the following Vision for Maximum Solutions - The Java(tm) Specialists (not set in concrete yet):

Maximum Solutions develops and provides the best training in the world for professional Java programmers.

There is a saying that goes round: "Those who cannot do, teach." The impression is that trainers are usually teaching others because they themselves are not good enough to be in the real world. The saying is perhaps a bit unfair, so I am determined to change this perception. A part of our Mission is that in order to stay "the best" at training, we make sure that all our trainers are active in developing real software.

That said, our collection of courses is growing:

  1. Java Course on Java 2 Standard Edition
  2. Java Course on Java 2 Enterprise Edition
  3. Design Patterns in Java
  4. Design Patterns in Delphi
  5. Java Data Objects
  6. UML and Object Orientation (By Thanassis Tsintsifas)
  7. Webservices (By Thilo Frotscher)
  8. Java Performance Tuning (By Jack Shirazi)

Please let me know via email if you would like more information about the courses that we offer. I don't mind getting lots of emails, so please don't hesitate to email me :-) [the SPAMmers have no qualms in sending me lots of exciting offers]

Resetting ObjectOutputStream

A class with many mysteries is java.io.ObjectOutputStream. For instance, when and why should you reset the stream?

Let's look at an example. First we have class Person, which is the class that we want to send over the network:

public class Person implements java.io.Serializable {
  private final String firstName;
  private final String surname;
  private int age;

  public Person(String firstName, String surname, int age) {
    this.firstName = firstName;
    this.surname = surname;
    this.age = age;
  }

  public String toString() {
    return firstName + " " + surname + ", " + age;
  }

  public void setAge(int age) {
    this.age = age;
  }
}
  

Next we have the code that receives lots of Person objects (Receiver) and the code that sends them (Sender):

import java.net.*;
import java.io.*;

public class Receiver {
  public static void main(String[] args) throws Exception {
    ServerSocket ss = new ServerSocket(7000);
    Socket socket = ss.accept();
    ObjectInputStream ois = new ObjectInputStream(
        socket.getInputStream());
    int count=0;
    while(true) {
      Person p = (Person) ois.readObject();
      if (count++ % 1000 == 0) {
        System.out.println(p);
      }
    }
  }
}


import java.net.Socket;
import java.io.*;

public class Sender {
  public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis();
    Socket s = new Socket("localhost", 7000);
    ObjectOutputStream oos = new ObjectOutputStream(
        s.getOutputStream());
    Person p = new Person("Heinz", "Kabutz", 0);
    for (int age=0; age < 1500 * 1000; age++) {
      p.setAge(age);
      oos.writeObject(p);
    }
    long end = System.currentTimeMillis();
    System.out.println("That took " + (end-start) + "ms");
  }
}

The output was:

java Receiver:
  *snip*
  Heinz Kabutz, 0
  Heinz Kabutz, 0
  Heinz Kabutz, 0
  Heinz Kabutz, 0
  Heinz Kabutz, 0
  Heinz Kabutz, 0

java Sender:
  That took 19548ms
  

When we run this, we see lots of Person objects on the Receiver side, but all the age values are 0, even though we changed the age on the Sender side. Why is this?

When you construct an ObjectOutputStream and an ObjectInputStream, each of them keeps a cache of the objects that have already been sent across the stream. The cache relies on object identity rather than on equals() and hashCode(), so it behaves more like a java.util.IdentityHashMap than a normal java.util.HashMap. If you resend the same object, only a pointer to it (a small handle into that cache) is sent across the network. This is very clever, and saves network bandwidth. However, the ObjectOutputStream cannot detect whether your object was changed internally, so the Receiver just sees the same object over and over again. You will notice that this was quite fast: we sent 1'500'000 objects in 19548ms on my machine (well, we only sent one object, and 1'499'999 pointers to that object).
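The caching behaviour can be reproduced without a network. The following is only a minimal sketch: it writes to a ByteArrayOutputStream instead of a socket, and the class name BackReferenceDemo, the trimmed-down nested Person and its getAge() method are my own additions for illustration, not part of the listings in this newsletter.

```java
import java.io.*;

class BackReferenceDemo {
  // Trimmed-down stand-in for Person, nested so the sketch is self-contained.
  static class Person implements Serializable {
    private final String name;
    private int age;
    Person(String name, int age) { this.name = name; this.age = age; }
    void setAge(int age) { this.age = age; }
    int getAge() { return age; }
  }

  // Writes the same Person twice, changing the age in between,
  // then reads both objects back from the same stream.
  static Person[] writeTwiceAndRead()
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(bytes);
    Person p = new Person("Heinz", 0);
    oos.writeObject(p);  // first write: the full object state goes out
    p.setAge(32);
    oos.writeObject(p);  // same identity: only a handle goes out
    oos.close();

    ObjectInputStream ois = new ObjectInputStream(
        new ByteArrayInputStream(bytes.toByteArray()));
    Person first = (Person) ois.readObject();
    Person second = (Person) ois.readObject();
    ois.close();
    return new Person[] {first, second};
  }

  public static void main(String[] args) throws Exception {
    Person[] read = writeTwiceAndRead();
    System.out.println(read[0] == read[1]);  // true
    System.out.println(read[1].getAge());    // 0
  }
}
```

Both reads return the very same object on the reading side, and its age is the one it had at the time of the first write.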

So there is a problem with sending the same Person object many times, especially if the contents of that Person change in between. Due to the optimisation in ObjectOutputStream, only the pointer to the Person is sent each time. So, what would happen if we simply sent a new Person each time? Let's try it out...

import java.net.Socket;
import java.io.*;

public class Sender2 {
  public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis();
    Socket s = new Socket("localhost", 7000);
    ObjectOutputStream oos = new ObjectOutputStream(
        s.getOutputStream());
    for (int age=0; age < 1500 * 1000; age++) {
      oos.writeObject(new Person("Heinz", "Kabutz", age));
    }
    long end = System.currentTimeMillis();
    System.out.println("That took " + (end-start) + "ms");
  }
}

This seems to run fine for a while, until all of a sudden we see an OutOfMemoryError on both the Receiver and the Sender2. Someone once challenged me regarding the pathetic speed of Java. They claimed that Java was so slow that the Garbage Collector could not even keep up with objects that were being read over the network. It sounded strange to me that Java should run out of memory, so after some questioning, we traced the problem to the object cache growing in the Receiver and never being cleared. Since the Person objects are always distinct, every one of them is put into the cache on both sides of the connection. The Receiver's side cannot clear entries from its table, since it does not know which entries the Sender might send again. The cache then keeps on growing until the JVM runs out of memory.

Resetting ObjectOutputStream

One hack^H^H^H^Hsolution to the OutOfMemoryError problem is to reset the cache on both sides every time that you send an object. Let's try out what that does to our performance:

import java.net.Socket;
import java.io.*;

public class Sender3 {
  public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis();
    Socket s = new Socket("localhost", 7000);
    ObjectOutputStream oos = new ObjectOutputStream(
        s.getOutputStream());
    for (int age=0; age < 1500 * 1000; age++) {
      oos.writeObject(new Person("Heinz", "Kabutz", age));
      oos.reset();
    }
    long end = System.currentTimeMillis();
    System.out.println("That took " + (end-start) + "ms");
  }
}

When I ran that, it worked without causing any OutOfMemoryError, so I should be happy. But am I happy? I am old, after having to wait 314242ms for it to complete, i.e. 16 times longer than with Sender. Sender was fast, but incorrect. Sender2 ran out of memory. Sender3 was correct, but slow. Is there no better way?
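Before looking for a better way, it may help to see what reset() does to both ends of the stream. Here is a small sketch, again using an in-memory stream rather than a socket. The class name ResetDemo and the cut-down nested Person with getAge() are assumptions made for this example only.

```java
import java.io.*;

class ResetDemo {
  // Cut-down stand-in for Person, nested so the sketch is self-contained.
  static class Person implements Serializable {
    private final String name;
    private int age;
    Person(String name, int age) { this.name = name; this.age = age; }
    void setAge(int age) { this.age = age; }
    int getAge() { return age; }
  }

  // Writes the same Person twice with a reset() in between,
  // then reads both objects back from the same stream.
  static Person[] writeResetWrite()
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(bytes);
    Person p = new Person("Heinz", 0);
    oos.writeObject(p);
    p.setAge(32);
    oos.reset();         // forget what was written; marks the reset in the stream
    oos.writeObject(p);  // written in full again, with the new age
    oos.close();

    ObjectInputStream ois = new ObjectInputStream(
        new ByteArrayInputStream(bytes.toByteArray()));
    Person first = (Person) ois.readObject();
    Person second = (Person) ois.readObject();
    ois.close();
    return new Person[] {first, second};
  }

  public static void main(String[] args) throws Exception {
    Person[] read = writeResetWrite();
    System.out.println(read[0] == read[1]);  // false
    System.out.println(read[0].getAge());    // 0
    System.out.println(read[1].getAge());    // 32
  }
}
```

The reset is recorded in the stream itself, so the ObjectInputStream clears its own table when it reaches that point and hands back a second, distinct Person.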

The problem with reset() is that it clears the cache of ALL objects, even constants such as the Strings "Heinz" and "Kabutz". So, we end up sending these constants over the network time and time again! Unfortunately, reset() is all-or-nothing, so the entire cache is lost each time. But perhaps, if we don't clear it all the time, we can get the advantage of speed and correctness? Let's try that out:

import java.net.Socket;
import java.io.*;

public class Sender4 {
  public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis();
    Socket s = new Socket("localhost", 7000);
    ObjectOutputStream oos = new ObjectOutputStream(
        s.getOutputStream());
    for (int age=0; age < 1500 * 1000; age++) {
      oos.writeObject(new Person("Heinz", "Kabutz", age));
      if (age % 1000 == 0) oos.reset();
    }
    long end = System.currentTimeMillis();
    System.out.println("That took " + (end-start) + "ms");
  }
}

Because I don't reset the cache on every call, Sender4 avoids sending the Strings "Heinz" and "Kabutz" over the network 1'500'000 times, and it completes in just 66015ms. In fact, it only has to send these Strings about 1'500 times. If we reset the ObjectOutputStream too frequently, we use more network bandwidth, and if we do not reset it often enough, the caches grow and we increase the burden on our Garbage Collector. Like all things in Java Performance Tuning, you have to set it to the correct number, not too big and not too small.
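The bandwidth half of that trade-off can be made visible by counting bytes instead of milliseconds. This is only a rough sketch under my own assumptions: the class ResetIntervalDemo and its bytesWritten() method are invented for illustration, the objects go to a ByteArrayOutputStream rather than a socket, and the reset is done after every resetEvery-th write, which differs slightly from the `age % 1000 == 0` test in Sender4. It says nothing about memory use or elapsed time.

```java
import java.io.*;

class ResetIntervalDemo {
  // Same shape as Person, nested so the sketch is self-contained.
  static class Person implements Serializable {
    private final String firstName;
    private final String surname;
    private final int age;
    Person(String firstName, String surname, int age) {
      this.firstName = firstName;
      this.surname = surname;
      this.age = age;
    }
  }

  // Serializes `count` distinct Person objects, calling reset() after
  // every `resetEvery` writes, and returns the total bytes produced.
  static int bytesWritten(int count, int resetEvery) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(bytes);
    for (int age = 0; age < count; age++) {
      oos.writeObject(new Person("Heinz", "Kabutz", age));
      if ((age + 1) % resetEvery == 0) oos.reset();
    }
    oos.close();
    return bytes.size();
  }

  public static void main(String[] args) throws IOException {
    int count = 10000;
    System.out.println("reset every write: "
        + bytesWritten(count, 1) + " bytes");
    System.out.println("reset every 1000:  "
        + bytesWritten(count, 1000) + " bytes");
    // Integer.MAX_VALUE is never reached here, so reset() is never called.
    System.out.println("never reset:       "
        + bytesWritten(count, Integer.MAX_VALUE) + " bytes");
  }
}
```

Resetting after every write produces the largest stream, because the class description and both Strings are written out again for every single Person. Resetting every 1000 writes comes close to the size of the stream that is never reset, while still letting both sides drop their caches regularly.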

What About RMI?

I seem to recall that at some point, RMI used the ObjectOutputStream mechanism to convert the parameters of your functions into a byte[]. The interesting part was that it would make the ObjectOutputStream, write the objects, and then close the ObjectOutputStream. This is akin to resetting the stream each time that you write to it.
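To illustrate the idea (and only the idea: this is not RMI's actual code), here is a sketch of a stream that is created, used once and closed again for every message. The class FreshStreamDemo and the helpers toBytes() and fromBytes() are hypothetical names of my own.

```java
import java.io.*;

class FreshStreamDemo {
  // Serializes one object with its own short-lived ObjectOutputStream.
  // Nothing is remembered between calls, which has the same effect as
  // resetting a long-lived stream after every write.
  static byte[] toBytes(Serializable obj) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(bytes);
    oos.writeObject(obj);
    oos.close();
    return bytes.toByteArray();
  }

  // Reads a single object back from a byte[] produced by toBytes().
  static Object fromBytes(byte[] data)
      throws IOException, ClassNotFoundException {
    ObjectInputStream ois = new ObjectInputStream(
        new ByteArrayInputStream(data));
    try {
      return ois.readObject();
    } finally {
      ois.close();
    }
  }

  public static void main(String[] args) throws Exception {
    int[] age = {0};               // arrays are Serializable
    byte[] first = toBytes(age);
    age[0] = 32;
    byte[] second = toBytes(age);  // same object, but a fresh stream

    System.out.println(((int[]) fromBytes(first))[0]);   // 0
    System.out.println(((int[]) fromBytes(second))[0]);  // 32
  }
}
```

Every message carries the current state of the object, but also pays for the stream header and the full class description each time.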

Depending on how you would want to transfer your data between two machines, and depending on how many times there will be identical objects sent across the network, it may pay you to use ObjectOutputStreams directly, and be careful to reset the stream before you run out of memory.

In the last newsletter, I suggested that you could use the sun.* classes in your code. I did not emphasize strongly enough that you should be very careful about using sun.* classes, since doing so makes your Java code non-portable between JVM vendors. This is a newsletter for Java Specialists so I will sometimes leave out such obvious details :-) However, several readers mentioned that you could achieve the same with a SecurityManager, which I had forgotten about. I guess if you were not able to use the SecurityManager, you could generate a stack trace and find out who called you. However, generating a stack trace would be rather inefficient (another obvious fact that is hardly worth mentioning ;-)

I want to personally thank you for taking the time to read my newsletters. They are a wonderful hobby for me and I thoroughly enjoy publishing them as a free resource to other Java Specialists. Please remember to forward them to friends, mention them on mailing lists, tell colleagues, etc. so that others can also enjoy them :-)

Lastly, I am collecting quotes of what my happy readers think of my newsletter. If you have some nice words that would make others subscribe to The Java(tm) Specialists' Newsletter, would you please send them to me?

Kind regards

Heinz