13.4 Offline Data Analysis System


13 Offline Computing System
13.1 Overview
The BES detector has been in operation for more than 12 years, and the BES offline
data analysis environment has been developed and upgraded along with the BES
hardware and software. At present the BES data are processed on both an HP-UNIX
farm and a PC farm. The network consists of a 1000 Mbps optical fiber network
together with a distributed 100 Mbps Fast Ethernet system and a 100 Mbps FDDI
local area network.
Based on the existing BES computing environment, the following points should be taken
into account for the future BESIII offline computing system and software environment:
The system should be set up by adopting or referring to the latest technology
commonly used in the HEP community, in both hardware and software, in order to
benefit the collaboration and to ease exchanges with other experiments.
The system should support hundreds of the existing BES software packages and
should serve for both experts of the BESII software and new members in the
Many of the BESII packages will be modified or re-designed to suit the new
computing environment.
The BESIII computing facility and software system will operate for many years.
They should therefore be scalable enough to keep up with the development of
technology in both hardware and software, and they should be highly flexible,
powerful, reliable and easy to maintain.
13.2 Requirements
13.2.1 BESIII Data Yields
The peak luminosity of the BESIII at the J/ψ resonance will be about 10^33
cm^-2 s^-1. The event rate recorded on tape is estimated to be about 3000 Hz, and the
event size is about 12 Kbytes/event for raw data, 24 Kbytes/event for reconstructed
data (Rec.) and 2 Kbytes/event for summary data (DST).
We assume that BESIII will take J/ψ data for one year or more at the beginning of
data taking, and then move to the ψ′ energy region. The maximum data yield per year is
then about 1×10^10 J/ψ events, and the total raw data size in the first year is
12×10^3 × 1×10^10 = 120×10^12 bytes.
BESIII Detector
Detailed information is listed in Table 13.2-1.
Table 13.2-1 Estimate of the BESIII data yields in the first year
Data type    Event size (Kbytes)    Total data size (10^12 bytes)
M.C. Rec.
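The yield estimate above is simple arithmetic and can be checked with a short sketch. The Python below is only an illustration: the names (EVENT_SIZE_BYTES, yearly_size_tbytes) are hypothetical, and it assumes 1 Kbyte = 10^3 bytes and 1×10^10 events per year for every data type, as quoted in the text for raw J/ψ data.

```python
# Event sizes in bytes, from the text (1 Kbyte taken as 10^3 bytes, an assumption).
EVENT_SIZE_BYTES = {"raw": 12_000, "rec": 24_000, "dst": 2_000}
EVENTS_PER_YEAR = 10**10      # maximum J/psi yield per year quoted in the text
BYTES_PER_TBYTE = 10**12


def yearly_size_tbytes(data_type: str, passes: int = 1) -> float:
    """Size in Tbytes of one year of data of the given type, for `passes` processing passes."""
    total_bytes = EVENT_SIZE_BYTES[data_type] * EVENTS_PER_YEAR * passes
    return total_bytes / BYTES_PER_TBYTE


if __name__ == "__main__":
    for data_type in EVENT_SIZE_BYTES:
        print(data_type, yearly_size_tbytes(data_type), "Tbytes per pass")
```

For raw data this reproduces the 120×10^12 bytes given in the text; the Rec. and DST figures are per processing pass and follow from the quoted event sizes only.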
13.2.2 Data Storage and Management
All kinds of data, including raw data and reconstructed data, are stored on tapes
mounted on a robot in the computer center. The total amount of raw data in 5 years is
estimated to be about 240×10^12 bytes, which includes 120 Tbytes of J/ψ data and
120 Tbytes of ψ′, D and Ds data. If the data reconstruction is repeated three
times per year, the total size of the Rec. and DST data will be about 1440 Tbytes and
120 Tbytes respectively. The size of the Rec. and DST data from Monte Carlo simulation
will be about the same as that of the real data.
All of the raw and Rec. data, about 3120 Tbytes, will be put on the tape library. A
total of 240 Tbytes of DST data will be stored on a disk array accessed via a high-speed
network system. Details are listed in Table 13.2-2.
Table 13.2-2 Requirements of the tape and disk space for BESIII data
Sort of data    Amounts of data (Tbytes)
Tape Lib.
Tape Lib.
M.C. Rec.
Tape Lib.
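The storage totals quoted in this section follow from the raw data volume, the event-size ratios and the number of reconstruction passes. The sketch below retraces that arithmetic; the function name storage_budget and its parameter names are hypothetical, and it assumes, as the text states, that the Monte Carlo Rec. and DST samples are the same size as the real ones.

```python
def storage_budget(raw_tbytes=240, raw_kb=12, rec_kb=24, dst_kb=2, passes=3):
    """Return the tape and disk requirements in Tbytes.

    Rec. and DST volumes scale with the event-size ratio to raw data and with
    the number of reconstruction passes; MC samples are assumed equal in size
    to the corresponding real-data samples.
    """
    rec = raw_tbytes * rec_kb / raw_kb * passes
    dst = raw_tbytes * dst_kb / raw_kb * passes
    mc_rec, mc_dst = rec, dst
    return {
        "raw": raw_tbytes,
        "rec": rec,
        "dst": dst,
        "tape": raw_tbytes + rec + mc_rec,  # raw + real Rec. + MC Rec.
        "disk": dst + mc_dst,               # real DST + MC DST
    }


if __name__ == "__main__":
    for name, tbytes in storage_budget().items():
        print(f"{name}: {tbytes:.0f} Tbytes")
```

With the default inputs this gives 1440 Tbytes of Rec. data, 120 Tbytes of DST data, 3120 Tbytes on tape and 240 Tbytes on disk, matching the figures in the text.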
13.2.3 CPU Power Requirement
According to the experience of data processing at BESII, the CPU power required for
data reconstruction is about 20 MIPS·s per event. Assuming that the total active running
time of the computers is about 2×10^7 seconds per year and that the data reconstruction
is repeated three times a year to improve calibration and reconstruction, the total
required CPU power is about 130000 MIPS. Details are listed in Table 13.2-3.
Table 13.2-3 The CPU power required for handling the BESIII data
Job type    Total event    (MIPS·s)    Total CPU (MIPS)
Data Rec.
MC Sim.
MC Rec.
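The data-reconstruction share of the CPU budget can be estimated from the numbers in the text. The sketch below is an illustration under stated assumptions: the event count of 1×10^10 per year is taken from section 13.2.1, the function name required_mips is hypothetical, and the Monte Carlo simulation and reconstruction contributions to the quoted 130000 MIPS total are not derived here because their per-event costs are not given in the text.

```python
CPU_PER_EVENT_MIPS_S = 20            # reconstruction cost per event, from the text
EVENTS_PER_YEAR = 10**10             # assumption: yearly yield from section 13.2.1
PASSES_PER_YEAR = 3                  # reconstruction repeated three times a year
ACTIVE_SECONDS_PER_YEAR = 2 * 10**7  # active computer running time per year


def required_mips(events, cpu_per_event, passes, active_seconds):
    """CPU power in MIPS needed to process `events` `passes` times within `active_seconds`."""
    return events * cpu_per_event * passes / active_seconds


data_rec_mips = required_mips(
    EVENTS_PER_YEAR, CPU_PER_EVENT_MIPS_S, PASSES_PER_YEAR, ACTIVE_SECONDS_PER_YEAR
)

if __name__ == "__main__":
    print(f"Data reconstruction: {data_rec_mips:.0f} MIPS")
```

Under these assumptions data reconstruction alone needs about 30000 MIPS; the remainder of the total would come from the MC Sim. and MC Rec. jobs listed in the table.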
13.2.4 Bandwidth for Data Transfer
The bandwidth required for online data transfer from the online computing system to
the offline data server should be more than 400 Mbps, which is determined by the product
of the trigger rate and the event length, i.e. 4000 Hz × 12 Kbytes × 8 bits/byte ≈ 384 Mbps.
The network system should also be highly stable and secure to avoid event losses.
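The bandwidth figure is the product of trigger rate, event size and 8 bits per byte. The short sketch below evaluates it for both common readings of "Kbyte"; the function name required_mbps is hypothetical, and which convention the authors intended is an assumption either way.

```python
def required_mbps(trigger_rate_hz: int, event_size_bytes: int) -> float:
    """Sustained bandwidth in Mbps (10^6 bits/s) for the given trigger rate and event size."""
    return trigger_rate_hz * event_size_bytes * 8 / 1e6


decimal_kbyte = required_mbps(4000, 12 * 1000)  # 1 Kbyte = 1000 bytes
binary_kbyte = required_mbps(4000, 12 * 1024)   # 1 Kbyte = 1024 bytes

if __name__ == "__main__":
    print(f"{decimal_kbyte:.1f} Mbps (decimal Kbyte)")
    print(f"{binary_kbyte:.1f} Mbps (binary Kbyte)")
```

Both readings give a value just below 400 Mbps (384 Mbps and about 393 Mbps), which is consistent with the requirement of more than 400 Mbps stated above.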
The bandwidth required for data transfer from the data server (i.e. RAID disk) to the
reconstruction farm depends mainly on the processor speed of the selected machines: the
higher the processor speed, the larger the bandwidth required. Because of the very high
data traffic in the local network, it is necessary to create an isolated BES computing
environment, which is separated from the other parts of the IHEP network and can ensure
a reasonable efficiency in data transfer.
13.3 Computing Environment
The main tasks of the BESIII computing environment can be divided into four parts:
the first is data handling, such as data reconstruction and offline analysis; the second
is the transport of data; the third is the storage and management of data and
documents; the fourth is the communication between users and system devices.
To satisfy these requirements, the system should offer good performance, including
stability, reliability and flexibility, at a reasonable and acceptable cost. The rapid
development of computer hardware and software should also be followed closely so
that the system can benefit from the latest technology. In particular, a high-speed
network is essential for the mass storage system, such as a robot tape library and a
disk array. Fig.13.3-1 shows a preliminary scheme of the computing system for the
BESIII. The main considerations are the following:
Fig.13.3-1 The scheme of the BESIII computing system
CPU type and architecture: A high-quality computing system based on PC/Cluster
or PC/Grid technology will be adopted. The CPU type can be any or all of Intel, AMD or
Data storage: The BESIII storage system will adopt the virtualization technology of
the disk array and tape library with HSM (Hierarchical Storage Management). A
SAN (Storage Area Network) architecture can satisfy the requirements of large data
volume, high access speed and expandability. In such a system, all the storage
sub-systems, such as the disk array and the tape library, are connected through a switch
and are independent of the server.
Network and I/O control: In order to increase the data access speed and to reduce
the interference, a second network based on SAN will be adopted to separate data transfer
and normal network traffic. In addition, all nodes will have both 100TX/1000TX network
cards, in which 100TX provides traditional TCP/IP services while 1000 TX provides NFS
System software: The BESIII offline computing system will mainly adopt free
software to reduce the cost and to ease exchanges with other experiments in the
world. The main components are the following:
RedHat/Linux as the operating system;
Castor or MySQL or PostgreSQL for the database system;
PBS for the batch system;
YP for user management and auto-mount for document management.
13.4 Offline Data Analysis System
The main task of the BESIII software system is to convert the raw data of the detector
responses into physics results. It consists of a main framework, the data reconstruction
and calibration packages, the Monte Carlo simulation of physics processes and detector
responses, the database management and interfaces, various utility packages, and the
users' physics analysis packages. It should also manage documents, software code and
libraries. The system should take advantage of Object-Oriented technology by using the
C++ language, while still keeping the possibility of incorporating some of the
existing BES Fortran software packages. The system should also take into account
practical needs, such as usability, stability and flexibility, and accommodate the
conflicting needs of experts and novices.
1. Framework of the BESIII Offline Software
In order to take advantage of modern technology and to utilize the common tools of
other HEP experiments in the world, the main framework of the BESIII offline software
will be based on the Object-Oriented methodology and the C++ language, and will take
into account the following points:
It should support some of the existing BES packages written in Fortran;
It should use existing HEP libraries as much as possible;
It should provide uniform data management, code and library management, and
database access.
The BESF, the BESIII software framework, is based on the Belle AnalysiS
Framework (BASF)[7] and was developed in the summer of 2003. In order to make the
framework more flexible and robust, and able to handle offline data and MC events as
well as the online Event Filter (EF) system, some software components and
infrastructure are taken from other experiments, such as the Service in Gaudi [8] and
the data management infrastructure from the Babar software.
The major packages of the BESF framework are shown in Fig.13.4-1. The BesKernel is
the core part of the framework and implements the control of data processing. It
depends on four other packages: the EventIO package, which manages event input and
output; the UserInterface package, which provides a friendly interface for running jobs;
the Panther[9] package, which is an integrated data management system; and the BesEvent
package, which implements the interface to the ProxyDict originally developed in the
Babar experiment.
The ROOT and CERNLIB are the only two external libraries needed by Histogram
Fig.13.4-1 Software packages and dependencies in the BESF
2. Calibration and Reconstruction
Most of the sub-detectors of the BESIII differ from those of the BESII, so most of
the calibration and reconstruction code will be re-written. Whether it is written in
C++ or in Fortran, the software should have well-separated calibration and
reconstruction sequences with a modular structure, so that a change in an intermediate
step does not force modifications of code in a later stage. If C++ is adopted,
some object orientation will have to be compromised; for example, data and operations
should be well separated.
The main tasks of the reconstruction include track finding and fitting, cluster finding,
shower fitting and reconstruction, scintillation timing reconstruction, muon track finding
in muon chambers and particle identification.
Data calibration will be done at various stages, both online and offline. Calibration
constants will be stored in a database. It is also foreseen to have several calibration
iterations so that data will be processed several times over a year.
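The idea of a modular sequence with data kept separate from operations can be illustrated with a toy sketch. The Python below is not the BESF design: all names (Event, calibrate, find_clusters, run_sequence) and the constants are hypothetical, the "cluster finding" is a simple threshold cut used as a stand-in, and Python is used only for brevity even though the framework itself targets C++.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    """Plain data container: holds raw hits and derived products, no algorithms."""
    raw_hits: List[float]
    products: Dict[str, List[float]] = field(default_factory=dict)


# A step is an operation that reads from and writes to the event.
Step = Callable[[Event, Dict[str, float]], None]


def calibrate(event: Event, constants: Dict[str, float]) -> None:
    """Apply pedestal subtraction and gain correction to the raw hits."""
    event.products["calibrated_hits"] = [
        (hit - constants["pedestal"]) * constants["gain"] for hit in event.raw_hits
    ]


def find_clusters(event: Event, constants: Dict[str, float]) -> None:
    """Toy stand-in for cluster finding: keep calibrated hits above a threshold."""
    event.products["clusters"] = [
        hit for hit in event.products["calibrated_hits"] if hit > constants["threshold"]
    ]


def run_sequence(event: Event, steps: List[Step], constants: Dict[str, float]) -> Event:
    """Run the steps in order; each step depends only on named products."""
    for step in steps:
        step(event, constants)
    return event


if __name__ == "__main__":
    constants = {"pedestal": 10.0, "gain": 2.0, "threshold": 3.0}
    event = run_sequence(Event([10.0, 12.0, 30.0]), [calibrate, find_clusters], constants)
    print(event.products["clusters"])
```

Because each step communicates only through named products in the event, the calibration step can be replaced or re-run with new constants without touching the code of the later steps, which is the property the text asks for.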
3. Monte Carlo Simulation
Most of the event generators of the BESII can be re-used, although some
modifications may be needed. The simulation of the detector response will be a new
package based on the GEANT4 program, while a Fortran code based on the GEANT3
program will be kept as a backup and for comparison. Detailed simulation of
the drift chamber resolution using the output of Garfield will be investigated. Light
transport in the scintillators of the Time-of-Flight system and the time resolution can
be well simulated using GEANT4.
The BESIII simulation package based on Geant4[1], BOOST, consists of three main
parts: the event generator, the particle tracking and the detector response. The XML[5]
language will be used for the detector description. The "raw" data format is used for
the final output of BOOST. At present, the hit information from most sub-detectors can
be used to test or tune the offline reconstruction program.
4. Common Tools and Libraries
Commonly used CERN libraries, both in C++ and in Fortran, will be used extensively.
Physics analysis will be based on HBOOK, PAW, PAW++, ROOT, MN_FIT, Fitver and so on.
Some of the BESII libraries in Fortran, such as Telesis for kinematic fitting, event
vertex fitting and event-kink fitting, can be re-used.
The database of the BESIII contains the detector geometry, calibration constants,
detector running status and conditions, environment parameters, etc. Some of the tables
in the offline database are kept identical to those of the online database, while other
tables will appear in only one of the two databases. The database will be managed by
free software based on the SQL language, such as PostgreSQL, MySQL or MiniSQL.
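As an illustration of how calibration constants might be kept in an SQL database, the sketch below stores and looks up a constant by run number. It is a minimal sketch under several assumptions: the table layout, the column names and the run-range validity scheme are hypothetical and not taken from the BESIII design, and Python's built-in sqlite3 module stands in for PostgreSQL, MySQL or MiniSQL so that the example is self-contained.

```python
import sqlite3


def open_calibration_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical calibration table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE calib_constants (
               subdetector TEXT    NOT NULL,
               name        TEXT    NOT NULL,
               first_run   INTEGER NOT NULL,
               last_run    INTEGER NOT NULL,
               value       REAL    NOT NULL
           )"""
    )
    return conn


def store_constant(conn, subdetector, name, first_run, last_run, value):
    """Insert one constant valid for runs first_run..last_run (inclusive)."""
    conn.execute(
        "INSERT INTO calib_constants VALUES (?, ?, ?, ?, ?)",
        (subdetector, name, first_run, last_run, value),
    )
    conn.commit()


def lookup_constant(conn, subdetector, name, run):
    """Return the constant valid for the given run, or None if there is none."""
    row = conn.execute(
        """SELECT value FROM calib_constants
           WHERE subdetector = ? AND name = ? AND first_run <= ? AND last_run >= ?
           ORDER BY first_run DESC LIMIT 1""",
        (subdetector, name, run, run),
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = open_calibration_db()
    store_constant(conn, "TOF", "time_offset_ns", 1000, 1999, 0.25)
    store_constant(conn, "TOF", "time_offset_ns", 2000, 2999, 0.5)
    print(lookup_constant(conn, "TOF", "time_offset_ns", 2500))
```

Keying constants by a run range is one common way to let repeated calibration iterations coexist with earlier ones; a production schema would also need versioning, which is omitted here.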
Commercial software packages can also be used, as long as they are well received by
the HEP community. For example, the software code will most likely be managed by CVS,
RCVS, AFS or DFS and so on.
[1] http://cern.ch/geant4
[2] http://www.slac.stanford.edu/bfroot/computing/offline/simualtion/web
[3] http://cmsdoc.cern.ch/oscar
[4] http://www.thep.lu.se/~torbjorn/pythia.html
[5] http://gdml.web.cern.ch/gdml
[6] http://root.cern.ch
[7] Itoh, R., BASF - BELLE AnalysiS Framework, Talk given at Computing in
High-energy Physics (CHEP 97), Berlin, Germany, 7-11 Apr 1997
[8] Barrand, G. and others, GAUDI - A software architecture and framework for building
HEP data processing applications, Comput. Phys. Commun., 140(2001) 45-55
[9] Shojiro Nagayama, Panther User’s guide version 3.0
[10] http://lhcb.web.cern.ch/lhcb/
