Parallel Sets

Parallel Sets (ParSets) is a visualization application for categorical data, such as census and survey data, inventory, and many other kinds of data that can be summed up in a cross-tabulation. ParSets provides a simple, interactive way to explore and analyze such data.

Even though the screenshots here show the Mac version, the program also runs on Windows and Linux. Links to the executables are in the Download section.

Parallel Sets

Basic Operation

To open an existing data set, select it in the list and either double-click it or click the Open button. The left tab switches to a list of the dimensions in the data set. Click the checkbox next to a dimension's name to add it to the display.

ParSets showing Titanic data

The horizontal bars in the visualization show the absolute frequency of each category: in this example, the top line shows the distribution across the Titanic's three passenger classes and its crew. Notice that the crew was the largest group of people, larger even than the Third Class! Among the passengers, the Third Class is of course the largest, with the First Class second and the Second Class the smallest.

The middle dimension shows a male-to-female ratio of almost 4 to 1. The bottom dimension, survival, gives you an impression of how many people survived the disaster: about one third survived, and two thirds died.

Between the dimension bars are ribbons that connect categories and split up as they pass from one dimension to the next. They show how combinations of categories are distributed, and how a particular subset (say, the women in Third Class) is further subdivided (e.g., into those who survived and those who did not).
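The bars and ribbons are, in effect, views of a cross-tabulation. The following Python sketch illustrates the counting involved; it is not ParSets' own (Java) code. It assumes each record is one person with one value per dimension, and, following the Third Class women example, that the ribbons arriving at a dimension are split by every dimension above it. The rows and the helper names `bar_counts` and `ribbon_counts` are made up for illustration.

```python
from collections import Counter

# Made-up example rows (one per person); NOT the real Titanic data set.
records = [
    {"Class": "Crew",  "Sex": "Male",   "Survived": "No"},
    {"Class": "Crew",  "Sex": "Male",   "Survived": "Yes"},
    {"Class": "Third", "Sex": "Female", "Survived": "Yes"},
    {"Class": "Third", "Sex": "Female", "Survived": "No"},
    {"Class": "Third", "Sex": "Male",   "Survived": "No"},
    {"Class": "First", "Sex": "Female", "Survived": "Yes"},
]

def bar_counts(records, dim):
    """Absolute frequency of each category of one dimension (the bar widths)."""
    return Counter(r[dim] for r in records)

def ribbon_counts(records, dims):
    """Frequency of each combination of categories over `dims` (listed top to
    bottom); assumed to match the widths of the ribbons reaching the last one."""
    return Counter(tuple(r[d] for d in dims) for r in records)

classes = bar_counts(records, "Class")
ribbons = ribbon_counts(records, ["Class", "Sex", "Survived"])
# The "women in Third Class" subset, split by survival:
print(ribbons[("Third", "Female", "Yes")], ribbons[("Third", "Female", "No")])  # prints 1 1
```

Because every record falls into exactly one combination, the count of a ribbon equals the sum of the ribbons it splits into one level further down, and all ribbons together add up to the total number of records.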

Interaction

Move your mouse over the display to see tooltips with more information about the data. Over a bar, the tooltip gives the absolute number and the percentage of each category as a fraction of the entire data set. Over a ribbon, it shows which combination of categories that ribbon represents, along with absolute and relative numbers.

Bar and ribbon tooltips
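As a rough illustration of the numbers in a tooltip, here is a small Python sketch that formats an absolute count together with its share of the whole data set. The function names, the text layout, and the counts are hypothetical; the real tooltips may be worded differently and may show additional ratios.

```python
def percent_of_total(count: int, total: int) -> float:
    """Share of the entire data set, in percent."""
    return 100.0 * count / total

def bar_tooltip(category: str, count: int, total: int) -> str:
    """Hypothetical tooltip text for one category on a bar."""
    return f"{category}: {count} ({percent_of_total(count, total):.1f}%)"

def ribbon_tooltip(categories: list[str], count: int, total: int) -> str:
    """Hypothetical tooltip text for a ribbon: its combination of categories."""
    combo = " & ".join(categories)
    return f"{combo}: {count} ({percent_of_total(count, total):.1f}%)"

# Made-up numbers, for illustration only.
print(bar_tooltip("Crew", 250, 1000))
print(ribbon_tooltip(["Third", "Female"], 50, 1000))
```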

Categories can be rearranged by clicking and holding the mouse on one and dragging it around. The other categories will move out of its way so you can place it wherever you want. Dimensions can be reordered in a similar way by clicking and dragging a dimension’s label.

Moving a category

In addition to manually reordering categories, you can also sort them alphabetically or by size, in either ascending or descending order. When the mouse hovers over a dimension's label (or to the right of it), two small "buttons" appear. Move the mouse over them to highlight the desired sorting, and click. The categories will change their order as needed.

Reordering categories
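The two sort options come down to choosing a sort key (the category name or its count) and a direction. A minimal Python sketch, assuming per-category counts are already available; `sort_categories` and the counts are hypothetical, not part of ParSets.

```python
from operator import itemgetter

def sort_categories(counts, by="name", descending=False):
    """Order category names alphabetically (by="name") or by frequency (by="size")."""
    if by not in ("name", "size"):
        raise ValueError("by must be 'name' or 'size'")
    key = itemgetter(0) if by == "name" else itemgetter(1)
    ordered = sorted(counts.items(), key=key, reverse=descending)
    return [name for name, _ in ordered]

counts = {"First": 3, "Crew": 9, "Third": 7, "Second": 2}  # made-up counts
print(sort_categories(counts))                              # alphabetical, ascending
print(sort_categories(counts, by="size", descending=True))  # largest first
```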

Clicking the little triangle next to a dimension name in the list of dimensions will show all its categories. Individual categories can be turned on and off using their checkboxes, allowing you to filter out data that is not currently of interest.

Filtering categories
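Filtering can be thought of as dropping every record whose value in the filtered dimension belongs to an unchecked category, and then recounting the remaining dimensions. The Python sketch below works under that assumption (that unchecking a category removes its records from the whole display); the rows and the `filter_records` helper are made up for illustration.

```python
from collections import Counter

records = [  # made-up rows
    {"Class": "First",  "Sex": "Female"},
    {"Class": "Second", "Sex": "Male"},
    {"Class": "Third",  "Sex": "Female"},
    {"Class": "Crew",   "Sex": "Male"},
    {"Class": "Crew",   "Sex": "Male"},
]

def filter_records(records, dim, enabled):
    """Keep only the records whose value in `dim` is one of the enabled categories."""
    return [r for r in records if r[dim] in enabled]

# Uncheck "Crew": only passengers remain, and the Sex bar is recounted.
passengers = filter_records(records, "Class", {"First", "Second", "Third"})
print(Counter(r["Sex"] for r in passengers))
```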

Downloading Online Data Sets

Clicking on the Online Data tab on the far left brings up a list of data sets that can be downloaded. These currently include mostly U.S. census data, but more will be coming soon. Simply select a data set and click the Download button at the bottom. A progress bar will appear below the button, indicating activity. Once the data set is downloaded, it appears in the list of local data sets under the Database tab.

Online Data Sets

Importing Your Own Data

You can import your own data into the local database by clicking the Import button on the Database tab. The only data format that is currently supported is CSV (comma-separated values). This is a simple text format that can be exported from almost any program (Excel, Numbers, etc.).

The importer assumes that the first line will contain column headings, and that there is nothing other than the data in the file. If the import fails, this is the most likely cause.
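To see why extra lines break the import, here is a Python sketch of a reader that makes the same two assumptions: the first line holds the column headings, and every following non-blank line is a data row with the same number of fields. It is a hypothetical stand-in, not the program's own Java importer; `read_csv_with_header` is an illustrative name.

```python
import csv
import io

def read_csv_with_header(text):
    """Parse CSV text whose first line contains the column headings.

    Returns (header, rows), where rows is a list of dicts keyed by heading.
    Raises ValueError on an empty file or on a row whose field count differs
    from the header, e.g. because of a title or notes line in the file.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        raise ValueError("empty file")
    rows = []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        if len(row) != len(header):
            raise ValueError(
                f"line {reader.line_num}: expected {len(header)} fields, got {len(row)}"
            )
        rows.append(dict(zip(header, row)))
    return header, rows

header, rows = read_csv_with_header("Class,Sex\nCrew,Male\nThird,Female\n")
```

A file that starts with a title line such as `Passenger list` fails under this sketch, because the title is taken as a one-column header and the real heading row then has too many fields.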

Clicking the Import button opens the CSV Import window and a file dialog where you can pick the file to be imported. Once the file is read in, the window shows the data it has found, plus additional information you can fill out.

CSV Import

In the top panel, you can give the dataset a name and a section, under which it will appear in the list. The Source and URL fields are for linking back to where the data came from (especially for data that was downloaded), and can be left empty.

The Data part lists the data as it appears in the data file. In many cases (like in the census data shown above), the values there are abbreviated or encoded. To map those codes to readable labels, select a column in the table and then edit the values in the list on the right. The human-readable name of the dimension can also be changed there. The key at the top is the column header that appears in the file.
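Mapping codes to labels amounts to a per-column lookup table. A minimal Python sketch, assuming the rows have been read into dictionaries keyed by the column header from the file; the `SEX` column, its codes, and `apply_labels` are hypothetical examples, not actual census codes or ParSets functions.

```python
# Hypothetical codes for one column; real census code books differ.
SEX_LABELS = {"1": "Male", "2": "Female"}

def apply_labels(rows, column, labels):
    """Replace coded values in `column` with readable labels.

    Codes missing from `labels` are kept unchanged so nothing is silently lost.
    """
    return [{**row, column: labels.get(row[column], row[column])} for row in rows]

rows = [{"SEX": "1"}, {"SEX": "2"}, {"SEX": "9"}]
print(apply_labels(rows, "SEX", SEX_LABELS))
```

Keeping unknown codes as they are, rather than dropping or blanking them, makes a gap in the mapping visible in the display instead of hiding it.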

This is also where the data type can be changed. For numerical data (i.e., data that makes sense as a number, such as measurements, money amounts, etc.), the type should be set to numerical. Importing numerical columns as categories leads to very large data files, and the resulting data sets are not very useful.
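One way to spot candidate numerical columns before importing is to check whether every value parses as a number. The Python sketch below is a hypothetical heuristic, not a feature of the importer. Note that coded categorical columns (for example, 1 and 2 standing for two categories) also pass this test, so it can only suggest a type; the choice stays with you.

```python
def looks_numeric(values):
    """Heuristic: True if every non-empty value parses as a number."""
    seen_value = False
    for v in values:
        v = v.strip()
        if not v:  # empty cells do not decide the type
            continue
        try:
            float(v)
        except ValueError:
            return False
        seen_value = True
    return seen_value  # an all-empty column is not considered numeric

print(looks_numeric(["12.5", "3", "", "40"]))  # measurements
print(looks_numeric(["Crew", "First"]))        # categories
```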

Once all the data items are properly described, click Save to DB. Depending on the dataset size, importing can take anywhere from a second or two to several minutes.

There is currently no way to exclude a column from the import. Columns with a large number of categories (100 or more) should not be imported, or should be imported as numbers. Very large data sets (millions of records with dozens of dimensions and many categories per dimension) make the program slow. Once the dimensions are added to the display, interaction is fast, but adding and removing dimensions can take some time. There are no strict limits on the size of data sets or the number of dimensions, but very complex data sets can cause the program to run out of memory while importing. If that happens, try removing dimensions before you import.
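Since columns cannot be excluded in the importer, it can help to check a file for high-cardinality columns beforehand and remove them from the CSV. A Python sketch, assuming rows are dictionaries keyed by column header; `high_cardinality_columns` is a hypothetical helper, and the threshold of 100 follows the guideline above.

```python
def high_cardinality_columns(rows, threshold=100):
    """Return the names of columns with at least `threshold` distinct values."""
    distinct = {}
    for row in rows:
        for column, value in row.items():
            distinct.setdefault(column, set()).add(value)
    return sorted(c for c, values in distinct.items() if len(values) >= threshold)

# Made-up data: an ID column (150 distinct values) and a two-category column.
rows = [{"id": str(i), "sex": "M" if i % 2 else "F"} for i in range(150)]
print(high_cardinality_columns(rows))
```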

Data sets cannot currently be edited once they have been imported. The only option is to delete the data set from the Database tab, and import again.

Should the database become corrupted, it can be re-initialized using the Reinitialize DB option from the Data Set menu. A warning is shown before the operation, as it removes all data from the database.

Reinitialize DB warning

Crash Reports

While we have extensively tested the program, it is certainly not entirely bug-free. If the program encounters a fatal crash, it will show a dialog asking you if you want it to submit a crash report. These reports contain some technical information about your system (your operating system, Java version, etc.), log messages the program creates at certain points, and the reason for the crash. There is no personal information and none of your imported data in a crash report.

If you decide to let the program submit the report, it will give you a reference code once the data is uploaded (which should only take a second or two). You can use this code to refer to your crash report when you contact us (it's always useful to know what you were doing when the program crashed).

Program Updates

The program checks for updates when it is launched. If it finds a new program version, it will tell you what is new, and offer to take you to this website to download it. We expect to release a new version every one to two months.

We strongly encourage you to upgrade to new versions as soon as possible. This is academic software; we do not have a large development team, much less a quality assurance division. New versions fix bugs and introduce new features. They might also change the database schema, which means that if you do not upgrade, you will no longer be able to download online data sets (new versions will always be backwards-compatible, i.e., they will be able to read older databases, but older programs will not be able to access data in new databases).
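The compatibility rule above can be stated as a single comparison. The Python sketch below assumes, hypothetically, that program and database schemas carry increasing integer version numbers; the source does not say how ParSets actually records schema versions.

```python
def can_open(program_schema: int, db_schema: int) -> bool:
    """Backwards compatibility: a program reads databases of its own schema
    version or older, but not databases written with a newer schema."""
    return db_schema <= program_schema

print(can_open(program_schema=3, db_schema=2))  # newer program, older database
print(can_open(program_schema=2, db_schema=3))  # older program, newer database
```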

Requirements

  • Windows: Java 6 must be installed. The program was tested on Windows XP, and it should run on Vista and Windows 7, as well (let us know if it works!).
  • Mac OS X: Requires at least Mac OS X 10.5 (Leopard) on a 64-bit architecture. This includes all but the first generation of Intel-based Macs, and anything running on a G5. The program uses Java 6, which is only available in a 64-bit flavor.
  • Linux: The program should run, but has not been tested on Linux at this point. We will support Linux on x86 with more testing and a binary package soon.

Download

Parallel Sets for Mac OS X and Windows is available from the download page.

Source Code

This is an open source program, which means that the program code is freely available. It can be checked out from the repository using the Mercurial distributed version control system. There is a description of how to do that on our Google Code page.

The program is written in Java, using a number of open-source packages and OpenGL (through JOGL). At this point, code documentation is rather scarce, but we are working on improving that.

Questions, Bugs, Feature Requests

For tracking bugs and submitting feature requests, please use the Issue Tracker on the Google Code page. Google’s tracker is quite user-friendly, and don’t worry if you can’t fill out all the details – we’ll ask if necessary.

For questions, you can leave a comment below or use the contact form to send an email.

Authors, Acknowledgments

The majority of the current program was developed by Robert Kosara and Caroline Ziemkiewicz. The program includes some code from a previous version that Shree D. Chhatwal and Shilpa Sharma contributed to.

Parts of this program were developed with support from the National Visualization and Analytics Center (NVAC), a U.S. Department of Homeland Security Program, under the auspices of the SouthEast Regional Visualization and Analytics Center (SRVAC).