Parallel Sets
Parallel Sets (ParSets) is a visualization application for categorical data, such as census and survey data, inventory, and many other kinds of data that can be summed up in a cross-tabulation. ParSets provides a simple, interactive way to explore and analyze such data.
Even though the screenshots here show the Mac version, the program also runs on Windows and Linux. Links to the executables are in the Download Section.
Basic Operation
To open an existing dataset, select it in the list and either double-click it or click the Open button. The left tab switches to a list of dimensions in the data set. Click the checkboxes next to the name of a dimension to add it to the display.
The horizontal bars in the visualization show the absolute frequency of how often each category occurred: in this example, the top line shows the distribution between the passenger classes on the Titanic and the crew. Notice that the crew was the largest group of people, larger even than the Third Class! Within the passengers, the Third Class is of course the largest, with the First Class being second, and the Second Class the smallest.
The middle dimension shows a male to female ratio of almost 4 to 1. The bottom dimension, survival, gives you an impression of how many people survived the disaster: about 1/3 survived, 2/3 died.
Between the dimension bars are ribbons that connect categories and split up. This shows you how combinations of categories are distributed, and how a particular subset (say, the women in Third Class) can be further subdivided (e.g., into those who survived and those who did not).
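To make the relationship between bars and ribbons concrete, here is a minimal Python sketch of the underlying counting. It is not the program's own code (ParSets is written in Java), and the records are a made-up toy sample, not the real Titanic numbers. The names `bar_counts` and `ribbon_counts` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter

# Hypothetical toy records (NOT the real Titanic counts): one tuple per person,
# in the dimension order (class, sex, survived).
records = [
    ("Crew", "Male", "No"),
    ("Crew", "Male", "Yes"),
    ("Third", "Female", "No"),
    ("Third", "Female", "Yes"),
    ("Third", "Male", "No"),
    ("First", "Female", "Yes"),
]

def bar_counts(records, dim_index):
    """Absolute frequency of each category in one dimension (one row of bars)."""
    return Counter(rec[dim_index] for rec in records)

def ribbon_counts(records, depth):
    """Frequency of each combination of categories over the first `depth` dimensions."""
    return Counter(rec[:depth] for rec in records)

class_bars = bar_counts(records, 0)    # sizes of the bars in the top dimension
ribbons_2 = ribbon_counts(records, 2)  # ribbons between class and sex
ribbons_3 = ribbon_counts(records, 3)  # the same ribbons, split further by survival

# Share of one category relative to the entire data set.
share_third = class_bars["Third"] / len(records)
```

Each ribbon at a deeper level is a subdivision of a ribbon one level up: the counts for ("Third", "Female", "Yes") and ("Third", "Female", "No") add up to the count for ("Third", "Female").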
Interaction
Move your mouse over the display to see the tooltip telling you more about the data. The bars tell you absolute numbers and percentages for each category as a fraction of the entire data set. When you move the mouse over a ribbon, it shows you what combination of criteria that ribbon represents, and also absolute and relative numbers.

Categories can be rearranged by clicking and holding the mouse on one and dragging it around. The other categories will move out of its way so you can place it wherever you want. Dimensions can be reordered in a similar way by clicking and dragging a dimension's label.

In addition to manually reordering categories, you can also sort them alphabetically or by size, in either ascending or descending order. When the mouse hovers over a dimension's label (or to the right of it), two small "buttons" appear. Move the mouse over them to highlight the desired sorting, and click. The categories will change their order as needed.
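The four sort orders can be illustrated with a short Python sketch. This is only an assumption about what "by size" and "alphabetically" mean in terms of the category counts, not the program's actual implementation. The function `sort_categories` and the counts are hypothetical.

```python
def sort_categories(counts, by="size", descending=False):
    """Return category names ordered alphabetically ("name") or by frequency ("size")."""
    if by == "name":
        key = lambda item: item[0]
    elif by == "size":
        # Break ties between equally sized categories by name (an assumption).
        key = lambda item: (item[1], item[0])
    else:
        raise ValueError("by must be 'name' or 'size'")
    ordered = sorted(counts.items(), key=key, reverse=descending)
    return [name for name, _ in ordered]

# Hypothetical category counts for one dimension.
counts = {"Third": 3, "Crew": 4, "Second": 1, "First": 2}

by_size_desc = sort_categories(counts, by="size", descending=True)
by_name_asc = sort_categories(counts, by="name")
```

With these counts, `by_size_desc` puts the largest category first, and `by_name_asc` lists the categories in alphabetical order.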

Clicking the little triangle next to a dimension name in the list of dimensions will show all its categories. Individual categories can be turned on and off using their checkboxes, allowing you to filter out data that is not currently of interest.
Downloading Online Data Sets
Clicking on the Online Data tab on the far left brings up a list of data sets that can be downloaded. These currently include mostly U.S. census data, but more will be coming soon. Simply select a data set and click the Download button at the bottom. A progress bar will appear below the button, indicating activity. Once the data set is downloaded, it appears in the list of local datasets under the Database tab.
Importing Your Own Data
You can import your own data into the local database by clicking the Import button on the Database tab. The only data format that is currently supported is CSV (comma-separated values). This is a simple text format that can be exported from almost any program (Excel, Numbers, etc.).
The importer assumes that the first line contains the column headings, and that the file contains nothing other than the data (no titles, notes, or blank lines). If the import fails, a file that violates these assumptions is the most likely cause.
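If you want to check a file against these two assumptions before importing it, a small script along the following lines may help. It is a sketch using Python's standard `csv` module, not part of ParSets, and the function name `check_csv` and the sample texts are hypothetical. It only checks that the header has no empty names and that every row has as many fields as the header.

```python
import csv
import io

def check_csv(text):
    """Return a list of problems found; an empty list means the text looks importable."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header row has an empty column name")
    # Rows are numbered from 1, so the first data row is row 2.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no} has {len(row)} fields, expected {len(header)}"
            )
    return problems

# Hypothetical examples: a clean file, and one with a title line above the header.
good = "class,sex,survived\nCrew,Male,No\nThird,Female,Yes\n"
bad = "Exported on Monday\nclass,sex,survived\nCrew,Male,No\n"

good_problems = check_csv(good)
bad_problems = check_csv(bad)
```

To check a real file, read it with `open(path, newline="").read()` and pass the result to `check_csv`. The extra title line in `bad` is treated as a one-column header, so every following row is reported as having the wrong number of fields.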
Clicking the import button opens the CSV Import window and a file requester where you can pick the file to be imported. Once the file is read in, the window will show the data it has found, plus additional information you can fill out.
In the top panel, you can give the dataset a name and a section, under which it will appear in the list. The Source and URL fields are for linking back to where the data came from (especially for data that was downloaded), and can be left empty.
The Data part lists the data as it appears in the data file. In many cases (like in the census data shown above), the values there are abbreviated or encoded. To map those codes to readable labels, select a column in the table and then edit the values in the list on the right. The human-readable name of the dimension can also be changed there. The key at the top is the column header that appears in the file.
This is also where the data type can be changed. For numerical data (i.e., data that makes sense as a number, such as measurements, money amounts, etc.), this should be set accordingly. Importing numerical columns as categories leads to very large data files and the resulting data sets are not very useful.
Once all the data items are properly described, click Save to DB. Depending on the dataset size, importing can take anywhere from a second or two to several minutes.
There is currently no way to exclude a column from the import. Columns with a large number of categories (100 or more) should either be removed from the file before importing or be imported as numbers. Very large data sets (in the millions of records with dozens of dimensions and many categories per dimension) cause the program to become slow. Once the dimensions are added to the display, interaction is fast, but adding and removing dimensions can take some time. There are no strict limits on the size of data sets or the number of dimensions, but very complex data sets can cause the program to run out of memory while importing. If that happens, try removing dimensions from the file before you import.
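Since the importer cannot skip columns, one workaround is to find and remove high-cardinality columns in the CSV file before importing it. The following Python sketch shows one way to do that with the standard `csv` module. It is not part of ParSets: the function names `wide_columns` and `drop_columns` are hypothetical, and the limit of 100 is simply taken from the guideline above.

```python
import csv
import io

MAX_CATEGORIES = 100  # threshold taken from the guideline above

def wide_columns(text, limit=MAX_CATEGORIES):
    """Names of columns that have `limit` or more distinct values."""
    reader = csv.DictReader(io.StringIO(text))
    seen = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name in seen:
            seen[name].add(row[name])
    return [name for name, values in seen.items() if len(values) >= limit]

def drop_columns(text, names):
    """Return the CSV text with the given columns removed."""
    reader = csv.DictReader(io.StringIO(text))
    kept = [n for n in reader.fieldnames or [] if n not in names]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

# Hypothetical sample: an "id" column with 150 distinct values and a "sex" column with 2.
sample = "id,sex\n" + "".join(
    f"{i},{'Male' if i % 2 else 'Female'}\n" for i in range(150)
)

too_wide = wide_columns(sample)
trimmed = drop_columns(sample, set(too_wide))
```

Here `too_wide` contains only the `id` column, and `trimmed` is the same data with that column removed, ready to be saved to a new file and imported. A column that holds real measurements is usually better kept and imported as a number than dropped.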
Data sets cannot currently be edited once they have been imported. The only option is to delete the data set from the Database tab, and import again.
Should the database become corrupted, it can be re-initialized using the Reinitialize DB option from the Data Set menu. A warning is shown before the operation, as it removes all data from the database.

Crash Reports
While we have extensively tested the program, it is certainly not entirely bug-free. If the program encounters a fatal crash, it will show a dialog asking you if you want it to submit a crash report. These reports contain some technical information about your system (your operating system, Java version, etc.), log messages the program creates at certain points, and the reason for the crash. There is no personal information and none of your imported data in a crash report.
If you decide to let the program submit the report, it will give you a reference code once the data is uploaded (which should only take a second or two). You can use this code to refer to your crash report when you contact us (it's always useful to know what you did when the program was crashing).
Program Updates
The program checks for updates when it is started. If it finds a new program version, it will tell you what is new, and offer to take you to this website to download it. We expect to release a new version about every one or two months.
We strongly encourage you to upgrade to new versions as soon as possible. This is academic software: we do not have a large development team, much less a quality assurance division. New versions fix bugs and introduce new features. They might also change the database schema, which means that older versions will no longer be able to download online datasets (new versions will always be backwards-compatible, i.e., they will be able to read older databases, but older programs will not be able to access data in new databases).
Requirements
Windows: Java 6 must be installed. The program was tested on Windows XP, and it should run on Vista and Windows 7, as well (let us know if it works!).
Mac OS X: Requires a recent version of Mac OS X 10.5 (Leopard) on a 64-bit architecture. This includes all but the first generation of Intel-based Macs, and anything running on a G5. The program uses Java 6, which on the Mac is only available in a 64-bit flavor.
Linux: The program should run, but has not been tested on Linux at this point. We will support Linux on x86 with more testing and a binary package soon.
Download
Parallel Sets for Mac OS X and Windows is available from the download page.
Source Code
This is an open source program, which means that the program code is freely available. It can be checked out from the repository using the Mercurial distributed version control system. There is a description of how to do that on our Google Code page.
The program is written in Java, using a number of open-source packages and OpenGL (through JOGL). At this point, code documentation is rather scarce, but we are working on improving that.
Questions, Bugs, Feature Requests
For tracking bugs and submitting feature requests, please use the Issue Tracker on the Google Code page. Google's tracker is quite user-friendly, and don't worry if you can't fill out all the details - we'll ask if necessary.
For questions, you can leave a comment below or use the contact form to send an email.
Authors, Acknowledgments
The majority of the current program was developed by Robert Kosara and Caroline Ziemkiewicz. The program includes some code from a previous version that Shree D. Chhatwal and Shilpa Sharma contributed to.
Parts of this program were developed with support from the National Visualization and Analytics Center (NVAC), a U.S. Department of Homeland Security Program, under the auspices of the SouthEast Regional Visualization and Analytics Center (SRVAC).


Comments
Par Sets
I have started using the programme for my work. What I really want to be able to do is export the chart to a Word document: what is the best way to do that?
Screenshot in 2.1
Version 2.1 will have a screenshot function that saves a PNG. It's not the perfect solution, but it is at least a first step. 2.1 will be released next week.
Thanks!
I just wanted to send a quick thanks your way, this is a nifty little app that encourages me to play around with making my own.
You mentioned basic export capabilities for the next version, would it somehow be possible that the respective values also be shown next to each category? I am really eager to use Parallel Sets in some presentations and would love to see this functionality.
Again, thanks for releasing this!
-M
thank you!!
Wow I have been in love with parallel sets ever since I saw your beautiful Titanic graph in the BBC article on understanding risk earlier this year. I was just about to start making one that looks like that, but was so so so happy to stumble on your webpage and the new release! I cannot thank you enough for making this accessible!
small bug
So when I import a csv with less than 10 rows, it can't separate the columns. Not a big deal but thought I'd pass it along. Love the functionality--things work so smoothly!
We never tested with so little data
That is something that I had kind of thought might happen, but we never tested that. We have rewritten a good part of the data import for the next release (coming up very soon); I'll make sure to test it with a small dataset to make sure we don't have that same bug there, as well.
Font unreadable
Hi, first of all, thanks for making this great software available for everyone.
I have a problem. The font that should show the information on screen has disappeared, or something is wrong with it. I've uploaded a screenshot of what parallel-sets displays to me:
Thanks again!
Álvaro
Very odd
I'll email you to get some more information. This is very strange, I've never seen it do that. If anybody else has seen this problem, please let us know!
Feature Request
Awesome visualization/analysis tool! If I understand correctly how this works, the horizontal bars show the absolute frequency of each category only. It would be very powerful to have the option of displaying the horizontal bars based on another value in the database. I'd like to know the split of the dimension categories based on this total value vs. the frequency. For example, if my database columns include Publisher, Author, and $Sales, what is the sales split between Publishers and Authors?