<?xml version="1.0" encoding="UTF-8"?>
<page>
  <created-at type="datetime">2009-05-06T03:04:34Z</created-at>
  <description></description>
  <id type="integer">1545</id>
  <name>Home</name>
  <number type="integer">12</number>
  <person-id type="integer">5777</person-id>
  <text>= List Diffs =

This is a simple library which has been used in &lt;a href=&quot;http://netbeans.org&quot;&gt;NetBeans&lt;/a&gt; for years.  It does one thing:  Generate Diffs between two &lt;code&gt;java.util.List&lt;/code&gt; instances.

Why is that useful?  There are many APIs that take &lt;code&gt;List&lt;/code&gt;s and it is not uncommon to write code that wants to respond to updates to a list.  Typically this ends up being handled by creating wrapper lists, or observable list subclasses.  But sometimes you are being given a list by a library you are calling - you cannot change the type of it, or wrap it in a List implementation that will tell you about mutations to the list.

That's where this library comes in.

== Usage ==
There are two classes and one enum to be concerned with:
* Diff - a diff encapsulates a set of changes.  It has three methods
** &lt;code&gt;getOld()&lt;/code&gt; gets the original ist
** &lt;code&gt;getNew()&lt;/code&gt; gets the new list
** &lt;code&gt;getChanges()&lt;/code&gt; gets a list of &lt;code&gt;Change&lt;/code&gt; objects representing insertions, deletions and replacements indexed by start and end position
* Change - A change is an insertion, deletion or replacement.  It has a start and end index. (Note that not all algorithms even have a concept of a replacement - any replacement can be accurately described as a deletion and an insertion - but the concept is more human-friendly).
* Algorithm - an enum of algorithms that the diff library can use to construct your diff

You get a Diff instance by calling one of the static methods of the Diff class, i.e.
&lt;pre&gt;
List&lt;T&gt; old, nue;
...
Diff diff = Diff.create (old, nue, Algorithm.ITERATIVE);
&lt;/pre&gt;

== Swing JLists and the Diff Library ==
This library also includes a subproject, Swing List Diffs, which provides a &lt;code&gt;javax.swing.ListModel&lt;/code&gt; that is driven by &lt;code&gt;java.util.List&lt;/code&gt;s and internally uses the Diff library to generate &lt;code&gt;ListModelEvent&lt;/code&gt;.  Creating a JList that shows the contents of a &lt;code&gt;List&lt;/code&gt; is as simple as 
&lt;pre&gt;JDiffList theJList = new JDiffList (someList);&lt;/pre&gt;
and updating it is as simple as 
&lt;pre&gt;theJList.setModel (someOtherList);&lt;/pre&gt;

Rather than tracking changes in one list, you simply pass in a new list when you want to change the contents of the JList's model.  The model will preserve selection and fire appropriate events based on what changes have occurred.

Note there is no requirement to use &lt;code&gt;JDiffList&lt;/code&gt; - you can use any &lt;code&gt;JList&lt;/code&gt; and a &lt;code&gt;ListListModel&lt;/code&gt; as its model.  &lt;code&gt;JDiffList&lt;/code&gt; simply contains some convenience methods, uses a generic type parameter so you can get the selection without having to cast, and has some protections against accidentally being passed a standard Swing &lt;code&gt;ListModel&lt;/code&gt; or the wrong object type.

== Limitations of Diffs ==

This library is not a solution for all problems.  Creating a diff between two lists means iterating every element of both lists at least once, usually more times.  If you are dealing with lists of thousands of elements, probably you need a solution where whatever is creating the lists tells you about the changes made, rather than using a diff to figure out what happened ex-post-facto.  Diffing is a problem that by definition, does not scale well.  Used judiciously, however, it can save programming time and solve a certain subset of problems where you simply cannot find out what happened to one list without comparing it with another one.

Diffs are by definition fuzzy.  Say that you have the sequence ''A,B,C,D,E''.  You are given a new sequence, ''A,X,D,E''.  The changes can be described in a number of ways, each accurate - and different diff algorithms will produce different results:
* B and C were deleted at index 1.  X was inserted at index 1.
* B was replaced with X at index 1.  C was deleted.
* C was replaced with X at index 2.  B was deleted.
* A,B,C,D,E was replaced with A,X,D,E
Which one is the truth?  What '''really''' happened?

A computer will never tell you.  

The definition of an accurate diff is that it can be reversed and produce the original sequence.  There is no ''what really happened'' in the world of diffs, only accuracy, which in practice means that the diff can be reversed.  Different algorithms have different trade-offs of performance versus how minimal a diff they generate, and what patterns of mutations they work best with, either in terms of performance or resulting diff complexity.

If you find this topic interesting, a much more thorough discussion can be found [http://en.wikipedia.org/wiki/Longest_common_subsequence_problem in Wikipedia].

== Algorithms ==

There are currently two algroithms implemented:
* ''Iterative'' - This is a relatively fast diff algorithm for cases where there are typically a few insertions or deletions, not wholesale reordering of the list contents.  It has the drawback that it does not tolerate duplicate elements in the lists it is diffing.  It essentially just runs two iterators in parallel, continuing with one when a difference is encountered until it encounters the same element the other is on.  Performance will fall off with the complexity of the diff.  This algorithm will not be found in any academic paper, but was the result of the [http://weblogs.java.net/blog/timboudreau/ original project author] thinking way too hard about the problem for far too long.
* ''Longest Common Subsequence'' - Martin Matula, Tomas Hurka and Pavel Flaska's implementation from the NetBeans Java module, adapted to the Diff API.  Origin of the specific algorithm used is unknown.  Tolerates duplicates and performs reasonably.

Contributions of other diff algorithms (with tests!) are welcome.

== Proving Accuracy ==

The following code should work with any implementation of Diff to prove that it is accurate.  What it does is simply make a copy of the original list, perform all of the changes the diff describes to that original list, and compare the result with the new list the diff describes.
&lt;pre&gt;
 List &amp;lt;T&amp;gt; list = new ArrayList&amp;lt;T&amp;gt; (diff.getOld());
 List &amp;lt;T&amp;gt; target = diff.getNew();
 List &amp;lt;Change&amp;gt; changes = diff.getChanges();
 for (Change change : changes) {
     int start = change.getStart();
     int end = change.getEnd();
     switch (change.getType()) {
       case Change.CHANGE :
         for (int i=start; i &lt;= end; i++) {
             list.set (i, target.get(i));
         }
         break;
       case Change.INSERT :
         int ct = 0;
         for (int i=end; i &gt;= start; i--) {
             Object o = target.get(i);
             list.add(start, o);
         }
         break;
       case Change.DELETE :
         for (int i=end; i &gt;= start; i--) {
             list.remove(i);
         }
         break;
     }
}
assert list.equals(target);
&lt;/pre&gt;
The minimal requirement for any new algorithm implementations is that they come with a test that shows that the above code works for lists of reasonable complexity.

== History ==
This library was originally created by Tim Boudreau as part of the Navigator component in NetBeans 4.0 - as the user edits a file, the list of class members changes.  The Java parser hands out a new list of class members (fields, methods, etc.);  the only way to determine what changed is to compare it with the previous list.  It is also used in the &lt;a href=&quot;http://nbwicketsupport.dev.java.net&quot;&gt;NetBeans Wicket Support&lt;/a&gt; plugin, to compare trees of wicket IDs in HTML markup with trees of wicket IDs in Java source code, to determine if the Java component hierarchy matches the HTML tag hierarchy;  it is also used in miscellaneous other projects.</text>
  <text-as-html>&lt;h1&gt;&lt;a name='List_Diffs'&gt;&lt;/a&gt; List Diffs &lt;/h1&gt;
&lt;p&gt;
This is a simple library which has been used in &lt;a href=&quot;http://netbeans.org&quot;&gt;NetBeans&lt;/a&gt; for years.  It does one thing:  Generate Diffs between two &lt;code&gt;java.util.List&lt;/code&gt; instances.

&lt;/p&gt;&lt;p&gt;Why is that useful?  There are many APIs that take &lt;code&gt;List&lt;/code&gt;s and it is not uncommon to write code that wants to respond to updates to a list.  Typically this ends up being handled by creating wrapper lists, or observable list subclasses.  But sometimes you are being given a list by a library you are calling - you cannot change the type of it, or wrap it in a List implementation that will tell you about mutations to the list.

&lt;/p&gt;&lt;p&gt;That's where this library comes in.

&lt;/p&gt;&lt;h2&gt;&lt;a name='Usage'&gt;&lt;/a&gt; Usage &lt;/h2&gt;
&lt;p&gt;There are two classes and one enum to be concerned with:
&lt;/p&gt;&lt;ul&gt;&lt;li&gt; Diff - a diff encapsulates a set of changes.  It has three methods
&lt;ul&gt;&lt;li&gt; &lt;code&gt;getOld()&lt;/code&gt; gets the original ist
&lt;/li&gt;&lt;li&gt; &lt;code&gt;getNew()&lt;/code&gt; gets the new list
&lt;/li&gt;&lt;li&gt; &lt;code&gt;getChanges()&lt;/code&gt; gets a list of &lt;code&gt;Change&lt;/code&gt; objects representing insertions, deletions and replacements indexed by start and end position
&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt; Change - A change is an insertion, deletion or replacement.  It has a start and end index. (Note that not all algorithms even have a concept of a replacement - any replacement can be accurately described as a deletion and an insertion - but the concept is more human-friendly).
&lt;/li&gt;&lt;li&gt; Algorithm - an enum of algorithms that the diff library can use to construct your diff
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
You get a Diff instance by calling one of the static methods of the Diff class, i.e.
&lt;pre&gt;
List&amp;lt;T&amp;gt; old, nue;
...
Diff diff = Diff.create (old, nue, Algorithm.ITERATIVE);
&lt;/pre&gt;

&lt;/p&gt;&lt;h2&gt;&lt;a name='Swing_JLists_and_the_Diff_Library'&gt;&lt;/a&gt; Swing JLists and the Diff Library &lt;/h2&gt;
&lt;p&gt;This library also includes a subproject, Swing List Diffs, which provides a &lt;code&gt;javax.swing.ListModel&lt;/code&gt; that is driven by &lt;code&gt;java.util.List&lt;/code&gt;s and internally uses the Diff library to generate &lt;code&gt;ListModelEvent&lt;/code&gt;.  Creating a JList that shows the contents of a &lt;code&gt;List&lt;/code&gt; is as simple as 
&lt;pre&gt;JDiffList theJList = new JDiffList (someList);&lt;/pre&gt;
and updating it is as simple as 
&lt;pre&gt;theJList.setModel (someOtherList);&lt;/pre&gt;

&lt;/p&gt;&lt;p&gt;Rather than tracking changes in one list, you simply pass in a new list when you want to change the contents of the JList's model.  The model will preserve selection and fire appropriate events based on what changes have occurred.

&lt;/p&gt;&lt;p&gt;Note there is no requirement to use &lt;code&gt;JDiffList&lt;/code&gt; - you can use any &lt;code&gt;JList&lt;/code&gt; and a &lt;code&gt;ListListModel&lt;/code&gt; as its model.  &lt;code&gt;JDiffList&lt;/code&gt; simply contains some convenience methods, uses a generic type parameter so you can get the selection without having to cast, and has some protections against accidentally being passed a standard Swing &lt;code&gt;ListModel&lt;/code&gt; or the wrong object type.

&lt;/p&gt;&lt;h2&gt;&lt;a name='Limitations_of_Diffs'&gt;&lt;/a&gt; Limitations of Diffs &lt;/h2&gt;
&lt;p&gt;
This library is not a solution for all problems.  Creating a diff between two lists means iterating every element of both lists at least once, usually more times.  If you are dealing with lists of thousands of elements, probably you need a solution where whatever is creating the lists tells you about the changes made, rather than using a diff to figure out what happened ex-post-facto.  Diffing is a problem that by definition, does not scale well.  Used judiciously, however, it can save programming time and solve a certain subset of problems where you simply cannot find out what happened to one list without comparing it with another one.

&lt;/p&gt;&lt;p&gt;Diffs are by definition fuzzy.  Say that you have the sequence &lt;i&gt;A,B,C,D,E&lt;/i&gt;.  You are given a new sequence, &lt;i&gt;A,X,D,E&lt;/i&gt;.  The changes can be described in a number of ways, each accurate - and different diff algorithms will produce different results:
&lt;/p&gt;&lt;ul&gt;&lt;li&gt; B and C were deleted at index 1.  X was inserted at index 1.
&lt;/li&gt;&lt;li&gt; B was replaced with X at index 1.  C was deleted.
&lt;/li&gt;&lt;li&gt; C was replaced with X at index 2.  B was deleted.
&lt;/li&gt;&lt;li&gt; A,B,C,D,E was replaced with A,X,D,E
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Which one is the truth?  What &lt;b&gt;really&lt;/b&gt; happened?

&lt;/p&gt;&lt;p&gt;A computer will never tell you.  

&lt;/p&gt;&lt;p&gt;The definition of an accurate diff is that it can be reversed and produce the original sequence.  There is no &lt;i&gt;what really happened&lt;/i&gt; in the world of diffs, only accuracy, which in practice means that the diff can be reversed.  Different algorithms have different trade-offs of performance versus how minimal a diff they generate, and what patterns of mutations they work best with, either in terms of performance or resulting diff complexity.

&lt;/p&gt;&lt;p&gt;If you find this topic interesting, a much more thorough discussion can be found &lt;a class='external' href=&quot;http://en.wikipedia.org/wiki/Longest_common_subsequence_problem&quot;&gt;in Wikipedia&lt;/a&gt;.

&lt;/p&gt;&lt;h2&gt;&lt;a name='Algorithms'&gt;&lt;/a&gt; Algorithms &lt;/h2&gt;
&lt;p&gt;
There are currently two algroithms implemented:
&lt;/p&gt;&lt;ul&gt;&lt;li&gt; &lt;i&gt;Iterative&lt;/i&gt; - This is a relatively fast diff algorithm for cases where there are typically a few insertions or deletions, not wholesale reordering of the list contents.  It has the drawback that it does not tolerate duplicate elements in the lists it is diffing.  It essentially just runs two iterators in parallel, continuing with one when a difference is encountered until it encounters the same element the other is on.  Performance will fall off with the complexity of the diff.  This algorithm will not be found in any academic paper, but was the result of the &lt;a class='external' href=&quot;http://weblogs.java.net/blog/timboudreau/&quot;&gt;original project author&lt;/a&gt; thinking way too hard about the problem for far too long.
&lt;/li&gt;&lt;li&gt; &lt;i&gt;Longest Common Subsequence&lt;/i&gt; - Martin Matula, Tomas Hurka and Pavel Flaska's implementation from the NetBeans Java module, adapted to the Diff API.  Origin of the specific algorithm used is unknown.  Tolerates duplicates and performs reasonably.
&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;
Contributions of other diff algorithms (with tests!) are welcome.

&lt;/p&gt;&lt;h2&gt;&lt;a name='Proving_Accuracy'&gt;&lt;/a&gt; Proving Accuracy &lt;/h2&gt;
&lt;p&gt;
The following code should work with any implementation of Diff to prove that it is accurate.  What it does is simply make a copy of the original list, perform all of the changes the diff describes to that original list, and compare the result with the new list the diff describes.
&lt;pre&gt;
 List &amp;amp;lt;T&amp;amp;gt; list = new ArrayList&amp;amp;lt;T&amp;amp;gt; (diff.getOld());
 List &amp;amp;lt;T&amp;amp;gt; target = diff.getNew();
 List &amp;amp;lt;Change&amp;amp;gt; changes = diff.getChanges();
 for (Change change : changes) {
     int start = change.getStart();
     int end = change.getEnd();
     switch (change.getType()) {
       case Change.CHANGE :
         for (int i=start; i &amp;lt;= end; i++) {
             list.set (i, target.get(i));
         }
         break;
       case Change.INSERT :
         int ct = 0;
         for (int i=end; i &amp;gt;= start; i--) {
             Object o = target.get(i);
             list.add(start, o);
         }
         break;
       case Change.DELETE :
         for (int i=end; i &amp;gt;= start; i--) {
             list.remove(i);
         }
         break;
     }
}
assert list.equals(target);
&lt;/pre&gt;
The minimal requirement for any new algorithm implementations is that they come with a test that shows that the above code works for lists of reasonable complexity.

&lt;/p&gt;&lt;h2&gt;&lt;a name='History'&gt;&lt;/a&gt; History &lt;/h2&gt;
&lt;p&gt;This library was originally created by Tim Boudreau as part of the Navigator component in NetBeans 4.0 - as the user edits a file, the list of class members changes.  The Java parser hands out a new list of class members (fields, methods, etc.);  the only way to determine what changed is to compare it with the previous list.  It is also used in the &lt;a href=&quot;http://nbwicketsupport.dev.java.net&quot;&gt;NetBeans Wicket Support&lt;/a&gt; plugin, to compare trees of wicket IDs in HTML markup with trees of wicket IDs in Java source code, to determine if the Java component hierarchy matches the HTML tag hierarchy;  it is also used in miscellaneous other projects.&lt;/p&gt;</text-as-html>
  <updated-at type="datetime">2009-05-06T04:27:27Z</updated-at>
  <wiki-id type="integer">6333</wiki-id>
</page>
