Data Summary Set

S. Marka, B. Mours, V. Sannibale
Draft
October 4, 2000


1. Motivations

The purpose of the Data Summary Set (DSS) is to provide a small data set (around 10 kBytes/s per site, or about 1 GByte/day) containing the main GW channel(s) plus a description of the running conditions. The target size is chosen so that the data are easy to transfer over the Internet for real-time burst searches and network analysis. Such a size also makes it possible to keep months of data on spinning media for detector investigations or astrophysical searches. This data set is close to the Level 3 data type defined in the LSC White Paper on Data Analysis, but in addition to the GW channel it contains monitoring information, which should account for the same volume of data (around 5 kBytes/s). This additional information will be critical in the first stage of the data analysis, when we will be searching for the (detector) origin of the observed events.

The monitoring information will consist of three sets:

The design of the Data Summary Set should support multiple interferometers. This is done through a naming convention (prefix).

Please note that this document is only an attempt to define such a data set. The initial goal is to prototype it in order to check the validity of this approach.


2. File Organization

To be compatible with the various software environments, data will be stored in frames. Since storage is a key issue, the files will contain many frames. This allows us to use static data efficiently and to have small frames plus global information. To be efficient, a file should be at least several minutes long.
If we use 1000 frames per file (about 17 minutes), we will have 10 MByte files and about 86 files per day. With 10000 frames per file (about 3 hours), the file is still manageable (100 MBytes) and we have only about 8 files per day.

If we succeed in keeping the data rate below 8 kBytes/s, one day of data will fit on one CD-ROM, or one week of data on a DVD.
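The sizing arithmetic of this section can be checked with a short sketch. It assumes one-second frames, a data rate expressed in kBytes per second, and decimal units (1 MByte = 1000 kBytes); the function names are illustrative and not part of any DSS software.

```python
SECONDS_PER_DAY = 86_400


def files_per_day(frames_per_file, frame_length_s=1.0):
    """Number of files written per day (assumes fixed-length frames)."""
    return SECONDS_PER_DAY / (frames_per_file * frame_length_s)


def file_size_mb(frames_per_file, rate_kb_per_s, frame_length_s=1.0):
    """File size in MBytes for a constant data rate in kBytes/s."""
    return frames_per_file * frame_length_s * rate_kb_per_s / 1000.0


def daily_volume_mb(rate_kb_per_s):
    """Data volume in MBytes accumulated over one day."""
    return rate_kb_per_s * SECONDS_PER_DAY / 1000.0


if __name__ == "__main__":
    for n_frames in (1000, 10000):
        print(n_frames, "frames/file:",
              round(files_per_day(n_frames), 1), "files/day,",
              file_size_mb(n_frames, 10.0), "MBytes/file")
    day = daily_volume_mb(8.0)
    print("8 kBytes/s:", day, "MBytes/day,", 7 * day, "MBytes/week")
```

The daily and weekly volumes printed at the end are the numbers to compare with the capacity of the chosen removable media.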

The frame file will be labeled X-GPSTIME.DSS, where X is the site (H or L) and GPSTIME is the GPS time of the first frame in the file. For frames containing data from several sites, the prefix will be N.
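As an illustration of the naming rule, here is a minimal sketch that builds and parses such file names. It assumes GPSTIME is written as an integer number of seconds without padding; the helper names and the GPS value in the usage example are hypothetical.

```python
import re

# Prefix H or L for a single site, N for frames with data from several sites.
_DSS_NAME = re.compile(r"^([HLN])-(\d+)\.DSS$")


def dss_file_name(prefix, gps_start):
    """Build the file name from the prefix and the GPS time of the first frame."""
    if prefix not in ("H", "L", "N"):
        raise ValueError(f"unknown prefix: {prefix!r}")
    return f"{prefix}-{int(gps_start)}.DSS"


def parse_dss_file_name(name):
    """Return (prefix, gps_start) or raise ValueError if the name does not match."""
    match = _DSS_NAME.match(name)
    if match is None:
        raise ValueError(f"not a DSS file name: {name!r}")
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    name = dss_file_name("H", 654321000)  # hypothetical GPS time
    print(name, parse_dss_file_name(name))
```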


3. Channel Summary Information

There are two types of channels: the fast ones, with a sampling rate of 256 Hz or above, and the slow ones (16 Hz or below). We also define a third type of channel: the key channels, for which we want a better description than just 5 numbers.

Fast Channels. For each fast channel, the following values will be computed and stored:

Slow Channels. For each slow channel we compute the mean value (average of the 16 values). As for the fast channels, we store the variation of the mean value expressed as a percentage of the rms value.
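One possible reading of the slow-channel summary is sketched below. It assumes the variation is measured against a longer-term reference mean and rms (such as the per-channel values kept in the static data), and that the result is rounded and clipped to fit a signed 16-bit integer. Both points are interpretations rather than statements from this note, and the function name is illustrative.

```python
def slow_channel_summary(samples, ref_mean, ref_rms):
    """Return (frame mean, variation of the mean in percent of the reference rms).

    samples  : values of one slow channel in one frame (16 values at 16 Hz)
    ref_mean : assumed longer-term mean of the frame means
    ref_rms  : assumed longer-term rms of the frame means (must be > 0)
    """
    if not samples:
        raise ValueError("no samples in frame")
    if ref_rms <= 0:
        raise ValueError("reference rms must be positive")
    mean = sum(samples) / len(samples)
    percent = 100.0 * (mean - ref_mean) / ref_rms
    # Clip to the range of a signed 16-bit integer (assumed storage type).
    stored = max(-32768, min(32767, round(percent)))
    return mean, stored


if __name__ == "__main__":
    print(slow_channel_summary([1.0] * 16, ref_mean=0.5, ref_rms=0.25))
```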

Key Channels. For a limited number of fast key channels (<40 channels, list TBD), we compute more parameters. The preliminary list is:


4. Quality flags

Quality flags could be defined and computed in many different ways. The main goal here is to define flags which tell us whether the data can safely be used for data analysis or whether there are some doubts. In this design we foresee three steps/levels: Channel level, Group level, Frame Level. Each of these flags is computed on a one-second frame basis.

4.1 Channel level

For each selected channel one or several tests are performed. It could be

The output of the test is one of the following flags:

The output of the channel test is bit-encoded by group of channels. There are three 32-bit words for each group: one for the channels tagged Faire, one for the channels tagged Suspicious, and one for the channels tagged Fatal.
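The bit encoding of the channel test can be sketched as follows. The sketch assumes that bit i of each word corresponds to the i-th channel of the group and that a channel passing all tests (called "Gold" here, by analogy with the IFO-level flag) sets no bit; neither convention is specified in this note.

```python
WORD_NAMES = ("Faire", "Suspicious", "Fatal")


def encode_group(channel_flags):
    """Encode the flags of up to 32 channels into three 32-bit words.

    channel_flags : list of flag names, one per channel of the group.
    Returns (faire_word, suspicious_word, fatal_word).
    """
    if len(channel_flags) > 32:
        raise ValueError("a 32-bit word holds at most 32 channels")
    words = {name: 0 for name in WORD_NAMES}
    for bit, flag in enumerate(channel_flags):
        if flag == "Gold":          # assumed: a good channel sets no bit
            continue
        if flag not in words:
            raise ValueError(f"unknown flag: {flag!r}")
        words[flag] |= 1 << bit     # assumed: bit index = channel index
    return tuple(words[name] for name in WORD_NAMES)


if __name__ == "__main__":
    flags = ["Gold", "Faire", "Fatal", "Suspicious", "Faire"]
    print([f"{w:#010x}" for w in encode_group(flags)])
```

The same packing applies one level up, with one bit per group instead of one bit per channel.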

4.2 Group Level

Channel information is collected by group to build quality flags per logical detector part (the PSL, a mirror and its seismic suspension, ...). There are about 20 such groups per interferometer. The output of this test is similar to the channel test. It is one of the following flags:

The result of the Group test is bit-encoded by interferometer. There are three 32-bit words for each group: one for the groups tagged Faire, one for the groups tagged Suspicious, and one for the groups tagged Fatal.

The proposed groups are (listed only for Hanford):


4.3 The IFO level

The group information is collected to form the interferometer quality information. The IFO Quality Flag will be:

The result of the IFO quality flag will be stored in the frame header with 2 bits per IFO (instead of 1 as described in the frame spec.) (GOLD = 3, Faire = 2, Suspicious = 1, Fatal = 0).
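A minimal sketch of the 2-bit-per-IFO encoding follows. The numeric values are those quoted above; the placement of IFO number i in bits 2i and 2i+1 of the header word is an assumption made for illustration.

```python
IFO_QUALITY = {"Fatal": 0, "Suspicious": 1, "Faire": 2, "Gold": 3}
_QUALITY_NAME = {value: name for name, value in IFO_QUALITY.items()}


def pack_ifo_quality(flags):
    """Pack one quality flag per IFO into an integer, 2 bits per IFO."""
    word = 0
    for i, name in enumerate(flags):
        word |= IFO_QUALITY[name] << (2 * i)  # assumed bit layout
    return word


def unpack_ifo_quality(word, n_ifo):
    """Recover the list of flag names for n_ifo interferometers."""
    return [_QUALITY_NAME[(word >> (2 * i)) & 0b11] for i in range(n_ifo)]


if __name__ == "__main__":
    word = pack_ifo_quality(["Gold", "Suspicious", "Fatal"])
    print(word, unpack_ifo_quality(word, 3))
```

With this ordering of values, a simple numeric comparison is enough to require a minimum quality level.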

4.4 Quality Flag usage

To be useful, these quality flags should not only tag obvious problems. The thresholds therefore have to be set in such a way that we get a chance to see potential problems. We can probably afford up to a few percent of suspicious frames without losing too many good events. In that case, the typical use would be to require Gold or Faire frames for short bursts and for the end of a binary coalescence. Since some low-mass inspirals may last many frames, we may tolerate a suspicious frame at the beginning of the inspiral, provided it does not carry a large fraction of the signal-to-noise ratio, in order to limit the inefficiency due to quality checks.
The problem for the CW search is different, since the signal is very weak and statistical tests could be performed on the main output itself. Such an analysis will probably care only about Fatal flags.
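The usage described above amounts to a per-analysis minimum quality level. The sketch below shows one way to express it, using the numeric values of the IFO flag; the analysis labels and the function name are illustrative, and the special treatment of the beginning of an inspiral is not modeled.

```python
QUALITY = {"Fatal": 0, "Suspicious": 1, "Faire": 2, "Gold": 3}

# Minimum acceptable flag per analysis type (labels are illustrative).
MIN_QUALITY = {
    "burst": QUALITY["Faire"],    # Gold or Faire frames only
    "cw": QUALITY["Suspicious"],  # everything except Fatal
}


def usable_frames(frame_flags, analysis):
    """Return the indices of the frames acceptable for the given analysis."""
    threshold = MIN_QUALITY[analysis]
    return [i for i, flag in enumerate(frame_flags)
            if QUALITY[flag] >= threshold]


if __name__ == "__main__":
    flags = ["Gold", "Suspicious", "Faire", "Fatal"]
    print(usable_frames(flags, "burst"), usable_frames(flags, "cw"))
```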


5. Frame Content

The frames will contain several parts:
Channel Name (X should be H or L) | Frame Structure Type | Data Type | Sampling rate/size | Total size (MB) for a 1000 seconds file* | Comments
------------------------------------------------------------------------------------------------------------------------------------------
X:Raw-h | FrProcData | INT_4S | 2048 Hz | 4 | Main channel filtered down to 2 kHz
X:QcValues | FrSummary | INT_4U | ~3*70 values | 0.3 | Quality flags values
X:CsFMean | FrSummary | INT_2S | ~500 channels | 0.6 | Mean value for fast channels**
X:CsFRms | FrSummary | INT_2S | ~500 channels | 0.6 | rms value for one channel (on a frame basis)**
X:CsFPwr32-128 | FrSummary | INT_2S | ~500 channels | 0.6 | Power in the 32Hz-128Hz band**
X:CsFPwr128-1K | FrSummary | INT_2S | ~400 real channels | 0.5 | Power in the 128Hz-1kHz band**
X:CsFPwr1K-4K | FrSummary | INT_2S | ~70 real channels | 0.2 | Power in the 1kHz-8kHz band**
X:CsFChi2 | FrSummary | INT_2S | ~500 channels | 0.6 | Chi-square for fast channels
X:CsSMean | FrSummary | INT_2S | ~700 channels | 0.6 | Slow channel mean values**
X:CsKey | FrSummary | FLOAT | 15 values for 40 channels | 1.4 | Table of parameters for the key channels
Original channel name | FrAdcData | - | on average, no more than one channel every two frames | 2 | Channel with strange behavior

Table 1: Information changing every frame
* Including the structure overhead and using a typical compression factor of 2. This value was measured during the first tests.
** We store the relative variation of these parameters.
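The sizes in Table 1 can be cross-checked with a rough estimate. The sketch below uses the compression factor of 2 quoted in the footnote and ignores the structure overhead, so it is expected to fall somewhat below the tabulated values; the function name is illustrative.

```python
def compressed_size_mb(values_per_frame, bytes_per_value,
                       n_frames=1000, compression=2.0):
    """Estimated size in MBytes of one vector stored in every frame.

    Ignores the frame structure overhead (an assumption of this sketch).
    """
    raw_bytes = values_per_frame * bytes_per_value * n_frames
    return raw_bytes / compression / 1e6


if __name__ == "__main__":
    # X:Raw-h: 2048 samples per one-second frame, 4 bytes each (INT_4S).
    print("Raw-h  :", compressed_size_mb(2048, 4), "MBytes")
    # X:CsFMean: about 500 channels, 2 bytes each (INT_2S).
    print("CsFMean:", compressed_size_mb(500, 2), "MBytes")
```

The estimate gives about 4.1 MBytes for the main channel and 0.5 MBytes for a 500-channel summary vector, to be compared with the 4 and 0.6 MBytes listed in the table.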
 



 
Channel Name | Data Type | Sampling rate/size | Update frequency | Total size (in MBytes) in a 1000 frames file | Comments
------------------------------------------------------------------------------------------------------------------------
X:QcNames | STRING | ~60 values | once per file | .001 | Quality flags names (see table 2)
X:TF | REAL | ?? | once per file | ?? | Overall Calibration/Transfer Function (TBD)
X:CsFName | STRING | ~500 channels | once per file | .01 | Fast channel (>16Hz) names
X:CsFRates | INT_2U | ~500 channels | once per file | .001 | Fast channel rates
X:CsSNames | STRING | ~700 channels | once per file | .015 | Slow channel (<= 16Hz) names
X:CsSMean-<> | FLOAT | ~700 channels | every 50 frames | .06 | Mean of Mean Value for Slow channels
X:CsSMean-rms | FLOAT | ~700 channels | every 50 frames | .06 | rms of the Mean Value for Slow channels
X:CsFMean-<> | FLOAT | ~500 channels | every 50 frames | .04 | Mean of Mean Value for Fast channels
X:CsFMean-rms | FLOAT | ~500 channels | every 50 frames | .04 | rms of the Mean Value for Fast channels
X:CsFRms-<> | FLOAT | ~500 channels | every 50 frames | .04 | Mean of rms Value for Fast channels
X:CsFrms-rms | FLOAT | ~500 channels | every 50 frames | .04 | rms of the rms Value for Fast channels
X:CsFPwr32-128<> | FLOAT | ~500 channels | every 50 frames | .04 | Mean of the power in the 32-128 band
X:CsFPwr32-128-rms | FLOAT | ~500 channels | every 50 frames | .04 | rms of the power in the 32-128 band
X:CsFPwr128-1k-<> | FLOAT | ~500 channels | every 50 frames | .04 | Mean of the power in the 128-1k band
X:CsFPwr128-1k-rms | FLOAT | ~500 channels | every 50 frames | .04 | rms of the power in the 128-1k band
X:CsFPwr1k-8k-<> | FLOAT | ~500 channels | every 50 frames | .04 | Mean of the power in the 1k-8k band
X:CsFPwr1k-8k-rms | FLOAT | ~500 channels | every 50 frames | .04 | rms of the power in the 1k-8k band

Table 2: Static Data (Frame type: FrStatData)
Remark: Other information available in the FrAdcData could be copied into the static data if needed.
 
 

Type of Data | Size for a 1000 frames file (MBytes)
---------------------------------------------------
GW data | 4
Channel Summary (Slow channels) | 0.6
Channel Summary (Fast channels) | 3.1
Channel Summary (Key channels) | 1.4
Detailed Quality Information | 0.3
Static Data | 0.56
Snapshots | 2
Frame Header, History, TOC | 0.4
Total | 12.4

Table 3: File Size
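The total of Table 3 and the data rate it implies can be recomputed with the short sketch below. It assumes one-second frames and decimal units; the dictionary simply transcribes the table, and the names are illustrative.

```python
# Sizes in MBytes for a 1000 frames file, transcribed from Table 3.
FILE_BUDGET_MB = {
    "GW data": 4.0,
    "Channel Summary (Slow channels)": 0.6,
    "Channel Summary (Fast channels)": 3.1,
    "Channel Summary (Key channels)": 1.4,
    "Detailed Quality Information": 0.3,
    "Static Data": 0.56,
    "Snapshots": 2.0,
    "Frame Header, History, TOC": 0.4,
}


def total_mb(budget):
    """Total file size in MBytes."""
    return sum(budget.values())


def rate_kb_per_s(budget, n_frames=1000, frame_length_s=1.0):
    """Average data rate in kBytes/s implied by the file size."""
    return 1000.0 * total_mb(budget) / (n_frames * frame_length_s)


if __name__ == "__main__":
    print(round(total_mb(FILE_BUDGET_MB), 2), "MBytes per file")
    print(round(rate_kb_per_s(FILE_BUDGET_MB), 2), "kBytes/s")
```

With the tabulated values, a 1000 frames file of 12.4 MBytes corresponds to roughly 12.4 kBytes/s, which can be compared directly with the target rate stated in Section 1.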


6. Prototype results

A prototype version of a DSS builder has been set up and run online on one of the DMT computers at Hanford. This gives us access to an almost unlimited amount of data for testing. It is a preliminary step before deciding which parts of this work need to be integrated within LDAS.

Here are some plots of the channels, taken on October 4. Each figure contains two plots. The top one shows the parameter value for all channels (horizontal axis) for the last frame. The second plot has time as the horizontal scale (about half an hour of data) and the channel number as the vertical scale. There is an entry in this plot if the value is above some threshold.

A vertical line on the scatter plot indicates strange running conditions, which could be tagged as bad data. A horizontal line corresponds to a channel which has a non-stationary behavior.

All these plots are preliminary and are shown to give an idea of what we can do.