Visualizing Time Series Data

A time series is a sequence of events or observations ordered along one dimension: time. If successive observations depend on each other, it may make sense to display them sorted by time, e.g., as a scatter plot. Alternatively, one might want to know how many observations of a certain value have been made. In this case, one would sort the observations by value, count the number of observations for each value, and derive a histogram. Time series data can be continuous, i.e., there is an observation at every instant of time (see figure below), or discrete, i.e., observations exist only at regularly or irregularly spaced points in time.
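The two views described above (time-ordered display versus a histogram of values) can be sketched in a few lines of Python. This is only an illustration: the sample data and the function names `time_ordered` and `value_histogram` are made up for this example and are not part of the InfoVis code repository.

```python
from collections import Counter

# Hypothetical observations as (time, value) pairs, deliberately out of order.
observations = [(3, 5), (1, 2), (4, 5), (2, 2), (5, 7), (6, 5)]


def time_ordered(obs):
    """Return the observations sorted by time, ready for a time-ordered plot."""
    return sorted(obs, key=lambda pair: pair[0])


def value_histogram(obs):
    """Count how often each value occurs; keys are returned in ascending order."""
    counts = Counter(value for _, value in obs)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # View 1: observations in time order.
    print(time_ordered(observations))
    # View 2: a text histogram with one '#' per observation of that value.
    for value, count in value_histogram(observations).items():
        print(f"{value}: {'#' * count}")
```

The first view preserves the dependence between successive observations; the second discards time entirely and only answers how often each value was seen.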
Time series are recorded, analyzed and used in diverse domains of science. Check out the Time Series Data Library maintained by Rob Hyndman and Muhammad Akram for numerous data sets from a variety of fields: Agriculture, Chemistry, Crime, Demography, Ecology, Finance, Health, Hydrology, Industry, the Labor market, Macroeconomics, Meteorology, Microeconomics, Physics, Production, Sales, Simulated series, Sports, Transport & Tourism and Utilities.
The InfoVis code repository provides two algorithms to analyze and interactively visualize time series data.

1.) Burst Detection
Currently, the burst code can only be run in stand-alone mode. We are in the process of making it available via the XML toolkit. The burst code uses a C program for efficiency reasons; therefore, you can only run it under Unix. To run the code on a data file such as 'burst_sample_data' stored in the 'IVR/data' directory, go to the main '/IVR' directory and type 'ant burst'. This runs a Perl program that converts the input file into the file format required by the burst code. The burst code is then started and produces a result file.

Note: If you feed the burst code too short a time slice, it is very likely that no bursts will be detectable. If you would like to analyze other data sets, check out Jon Kleinberg's sample results from the burst detection algorithm.

2.) TimeSearcher
TimeSearcher is a stand-alone program that combines diverse visualizations for the analysis of time series data. It will not be integrated into the XML toolkit. If you are working on a Unix machine, go to the main '/IVR' directory and type 'ant timesearcher'. If you are working on a DOS machine and are remotely logged into a Unix machine, 'ant timesearcher' will not be able to open a window. In that case, use 'timesearcher.bat' in the '/IVR/bin' directory to run the software.
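To give a rough feel for what detecting heavy-activity periods involves, and why very short input tends to yield nothing, here is a deliberately simple sketch. It is not the burst code shipped in the repository and not Kleinberg's algorithm: it swaps in a plain threshold heuristic (event count above mean plus k standard deviations). The function names, the slice length, and the parameter `k` are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def count_per_slice(timestamps, slice_length):
    """Bin timestamps into consecutive slices and count the events in each."""
    if not timestamps:
        return []
    start = min(timestamps)
    counts = Counter(int((t - start) // slice_length) for t in timestamps)
    n_slices = max(counts) + 1
    # Slices without any event get an explicit count of zero.
    return [counts.get(i, 0) for i in range(n_slices)]


def flag_busy_slices(counts, k=1.5):
    """Return indices of slices whose count exceeds mean + k * std deviation.

    Simplified heuristic for illustration only, not Kleinberg's method.
    """
    if len(counts) < 2:
        return []  # too little data to compare slices against each other
    threshold = mean(counts) + k * pstdev(counts)
    return [i for i, c in enumerate(counts) if c > threshold]


if __name__ == "__main__":
    # Hypothetical event times: steady activity with a cluster around t = 40.
    timestamps = [0, 10, 20, 30, 40, 41, 42, 43, 44, 45, 50, 60, 70]
    counts = count_per_slice(timestamps, slice_length=10)
    print(counts)
    print(flag_busy_slices(counts))
    # A very short series leaves nothing to stand out against.
    print(flag_busy_slices(count_per_slice([0, 10], slice_length=10)))
```

With only a handful of observations there is no baseline against which a period can look unusually busy, which is the same intuition behind the note about short time slices.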
There are many time series data sets available from everyday
activities: streams of emails arrive in your mailbox; many papers are
published every day; people continuously log into and exit software systems
or real world places. Interestingly, most of those streams are threaded,
e.g., emails follow up on each other, papers cite each other, and people
follow their peers. Almost all of these streams exhibit times with little
or no activity and times with heavy activity. The identification and visualization
of the number and magnitude of threads or existing activity patterns often
results in surprising discoveries. For example, one could examine an increase
in SPAM and identify when most of the SPAM messages arrive. Alternatively,
one might be curious to identify activity patterns in project-related
communications. To prepare your own data sets for TimeSearcher, please follow
the instructions below:

If you want to generate the TimeSearcher-readable file yourself, please make sure your data fields do not contain characters such as parentheses, commas, or spaces.
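The character restriction above can be enforced before writing a file. The following is a minimal sketch, assuming that replacing the offending characters with an underscore is acceptable for your data. The helper `sanitize_field` is hypothetical and not part of TimeSearcher; the sketch cleans individual fields only and does not attempt to produce the TimeSearcher file format itself.

```python
import re

# Characters the text above says to avoid: parentheses, commas, and spaces.
# Assumption: any whitespace (tabs, newlines) is treated like a space.
_FORBIDDEN = re.compile(r"[(),\s]+")


def sanitize_field(field, replacement="_"):
    """Replace each run of forbidden characters with `replacement`.

    Leading and trailing replacement characters are trimmed afterwards.
    """
    cleaned = _FORBIDDEN.sub(replacement, field.strip())
    return cleaned.strip(replacement)


if __name__ == "__main__":
    for raw in ["Smith, John (PI)", "weekly report", "plain"]:
        print(repr(raw), "->", repr(sanitize_field(raw)))
```

Choosing a visible replacement such as an underscore, rather than deleting the characters, keeps multi-word labels readable after cleaning.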
In this learning module you applied a data analysis algorithm and a visualization algorithm to make sense of a time series data set. Answer the subsequent questions for yourself:
See references for the burst detection algorithm and TimeSearcher.
This documentation was compiled by Katy Börner and Bruce William Herr. We would like to thank Ketan Mane, Ning Yu, and Sidharth Thakur for their help in preparing the sample data sets and parsers.