2014 The 7th International Conference on Machine Vision
(ICMV 2014)
2014 The 4th International Conference on Communication and Network
(ICCNS 2014)
2014 International Conference on Digital Signal Processing
(ICDSP 2014)
* ICMV 2014 conference papers will be published by SPIE and included in the SPIE
Digital Library, which provides them to the Web of Science Conference Proceedings Citation
Index-Science, Scopus, Ei Compendex, Inspec, Google Scholar, Microsoft Academic
Search, and others, to ensure maximum awareness of the Proceedings. Authors will be
requested to upload their papers to SPIE before the conference; the conference
proceedings will be published after the conference and posted to your address.
*ICCNS 2014 conference papers have been selected for publication in several journals. Owing to
the publication schedule, only a few papers will be published before the conference,
and those journals can be collected on site; most papers are scheduled to be published after the
conference, and we will post your journal to your address once it is published.
*ICDSP 2014 conference papers have been selected for publication in several journals.
According to the publication schedule, only a few papers will be published after the
conference; the others can be collected on site. We will post the rest to your address.
We are pleased to welcome you to the 2014 SCIEI Milano conferences, which will take
place at the Novotel Milano Nord Ca Granda Hotel from November 19-21, 2014.
After several rounds of review, the program committee accepted the papers to
be published in journals and conference proceedings. We wish to express our sincere
appreciation to all the individuals who have contributed to the ICMV 2014, ICCNS 2014 and
ICDSP 2014 conferences in various ways. Special thanks are extended to our colleagues on the
program committee for their thorough review of all the submissions, which is vital to the
success of the conference, and also to the members of the organizing committee and the
volunteers who dedicated their time and effort to planning, promoting, organizing
and helping the conference. Last but not least, our special thanks go to the invited
keynote speakers as well as all the authors for contributing their latest research to the
conference.
This conference program is highlighted by three keynote speakers: Prof. Antanas Verikas
from Halmstad University, Sweden; Prof. Branislav Vuksanovic from the University of
Portsmouth, United Kingdom; and Prof. Petia Radeva from the University of Barcelona, Spain.
One best presentation will be selected from each session, evaluated on: Originality;
Applicability; Technical Merit; PPT; English. The winner will be announced at the end of
each session and awarded a certificate at the dinner. The winners’ photos will be
posted on the SCIEI official website: www.sciei.org.
Milan, the capital of the former Western Roman Empire and “The Moral Capital of Italy”
is not only home to some of Italy’s most prominent educational and research institutions,
but also the center of culture with a history of more than two millennia. We hope that
your stay in Milano will be enriching and memorable! The technical program will send
you back home motivated, enthusiastic, and full of innovative ideas.
We wish you a successful conference and an enjoyable visit to Italy.
Amanda F. Wu
Science and Engineering Institute
Honorary Chair
Prof. Petia Radeva, University of Barcelona, Spain
Conference Chair
Prof. Antanas Verikas, Halmstad University, Sweden
Dr. Branislav Vuksanovic, School of Engineering, University of Portsmouth, UK
Prof. Garcia-Teodoro, Faculty of Computer Science and Telecommunications, Periodista
Daniel Saucedo Aranda, Spain (Workshop Chair of ICCNS)
Program Chair
Prof. Enrique Nava, University of Malaga, Spain
Prof. Andreas Nuchter, University of Wuerzburg, Germany
Prof. Alexander Bernstein, Institute for Information Transmission Problems (Kharkevich
Institute), Russian Academy of Sciences, Moscow, Russian Federation
Prof. Klaus Simon, Swiss Federal Laboratories for Materials Testing and Research (EMPA),
Switzerland
Prof. Sei-ichiro Kamata, Waseda University, Japan
Technical Committee
Prof. Aristidis Likas, Department of Computer Science, University of Ioannina, Greece
Prof. Mourad Zaied, REGIM, Tunisia
Prof. Reyer Zwiggelaar, Aberystwyth University, United Kingdom
Prof. Dmitry Nikolaev, Russian Academy of Science, Russia
Prof. Luca Iocchi, Sapienza University of Rome, Italy
Prof. Francesco Viti, University of Luxembourg, Luxembourg
Prof. Manuel F. González Penedo, Universidade da Coruña, Spain
Prof. Qin Zhang, Communication University of China
Prof. Jianxiong WANG, Guangzhou University, China
Prof. Wafa AlSharafat, Al Al-Bayt University, Jordan
Prof. Mohamed El-Sayed Farag, Al-Azhar University, Egypt
Prof. Li Jun, Chongqing University, China
Prof. Anrong Xue, Jiangsu University, China
Prof. Cristina Ofelia STANCIU, Tibiscus University in Timisoara, Romania
Prof. Chi-Cheng Cheng, National Sun Yat-Sen University, Taiwan
Prof. Zahurin Samad, Universiti Sains Malaysia, Malaysia
Prof. Ming LIANG, University of Ottawa, Canada
Prof. Qassim Nasir, Electrical and Computer Engineering, UAE
Prof. Huwida E. Said, Zayed University, Dubai, UAE
Dr. Jinoh Kim, Texas A&M University-Commerce, USA
Dr. Murat Orhun, Istanbul Bilgi University, Istanbul, Turkey
Dr. Weiliang LIN, School of Physical Culture, Guangzhou University, China
Dr. Mohamed Basel Al Mourad, Zayed University, UAE
Dr. Chiung Ching Ho, Multimedia University, Malaysia
Hotel Novotel Milano Nord Ca Granda
Address: Viale Suzzani 13, 20162 MILANO, ITALY
Tel: (+39)02/641151, Fax: (+39)02/66101961
Airport: LINATE Airport / MALPENSA Airport
Railway Station: Stazione Cadorna FNM / Stazione centrale / Stazione Garibaldi
Underground Station: CA GRANDA
Bus: Bus 42, Viale Suzzani
The currency is the Euro. You can exchange foreign currency 24 hours a day at the airport,
or at a bank or money exchanger.
Milan is famous for its wealth of historical and modern sights: the Duomo, one of the
biggest and grandest Gothic cathedrals in the world; La Scala, one of the best
established opera houses in the world; the Galleria Vittorio Emanuele, an ancient and
glamorous arcaded shopping gallery; the Brera art gallery, with some of the finest artistic
works in Europe; the Pirelli tower, a majestic example of 1960s modernist Italian
architecture; the San Siro, a huge and famed stadium; the Castello Sforzesco, a grand
medieval castle; and the UNESCO World Heritage Site Santa Maria delle Grazie Basilica,
containing one of the world's most famous paintings, Leonardo da Vinci's The Last Supper.
Attention Please:
Please take care of your belongings in public places, buses, and the metro. Don’t stay out
too late in the city, and don’t go alone to remote areas. Be aware of strangers who offer you
services, charity signatures, etc., at many scenic spots.
You can search more Tourist Information and Security tips online.
Ambulance: 118
Police: 113
Electricity: 230 V, 50 Hz
Oral Presentations
Timing: a maximum of 15 minutes total, including speaking time and discussion.
Please make sure your presentation is well timed. Please keep in mind that the
program is full and that the speaker after you would like their allocated time
available to them.
You can use a CD or USB flash drive (memory stick); please scan it for viruses on
your own computer. Each speaker is required to meet his/her session chair in the
corresponding session room 10 minutes before the session starts and copy the slide
file (PPT or PDF) to the computer.
It is suggested that you email a copy of your presentation to your personal inbox as
a backup. If for some reason the files can’t be accessed from your flash drive, you
will be able to download them to the computer from your email.
Please note that each session room will be equipped with an LCD projector, screen,
pointing device, microphone, and a laptop with general presentation software such as
Microsoft PowerPoint and Adobe Reader. Please make sure that your files are
compatible and readable with our operating system by using commonly used fonts
and symbols. If you plan to use your own computer, please try the connection and
make sure it works before your presentation.
Movies: If your PowerPoint files contain movies please make sure that they are well
formatted and connected to the main files.
Poster Presentations
Maximum poster size is 36 inches wide by 48 inches high (3 ft. x 4 ft.).
Posters should be concise and attractive, with characters large enough to be
read from 1 meter away.
Please note that during your poster session, authors should stay by their
posters to explain and discuss their papers with visiting delegates.
Dress code
Please wear formal clothes or national dress.
“Mining data with Random Forests: Adaptive models for prediction and data exploration”
Abstract: Random Forest (RF) is a general data mining tool and has been successfully
applied in a variety of fields for data classification, prediction, analysis of data similarities
and variable importance, replacing missing values, outlier detection and other tasks.
In this work, a new approach to building adaptive—data dependent—random forests and
committees of random forests is presented. Both forest/committee size and parameters
used to aggregate trees or forests can be data dependent. A new way to explore data
and decisions obtained from random forests is also suggested. Results of experimental
investigations concerning several real world problems substantiate effectiveness of the
developed techniques and demonstrate how insights into automatic decisions can be
obtained using information available from random forests. We also investigate
consistency, stability and generality of variable importance estimates obtained from
standard random forest architectures as well as suitability of random forests for solving a
data classification problem at hand.
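The forests and committees described above are aggregation schemes; as a generic, hypothetical illustration (not Prof. Verikas's adaptive algorithm), the sketch below builds a bagged committee of decision stumps in Python and weights each member's vote by its out-of-bag accuracy, one simple way to make the aggregation data dependent:

```python
import random

def train_stump(data):
    # data: list of (x, label) pairs; choose the split minimising training error
    best = None
    for t in sorted({x for x, _ in data}):
        for sign in (1, -1):
            err = sum((1 if sign * (x - t) > 0 else 0) != y for x, y in data)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) > 0 else 0

def bagged_committee(data, n_members=25, seed=0):
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(len(data)) for _ in data]   # bootstrap sample
        chosen = set(idx)
        stump = train_stump([data[i] for i in idx])
        oob = [i for i in range(len(data)) if i not in chosen]  # out-of-bag rows
        # data-dependent aggregation weight: accuracy on rows this stump never saw
        acc = (sum(stump(data[i][0]) == data[i][1] for i in oob) / len(oob)
               if oob else 0.5)
        members.append((stump, acc))
    def predict(x):
        score = sum(w * (1 if s(x) == 1 else -1) for s, w in members)
        return 1 if score > 0 else 0
    return predict

# toy 1-D problem: points above 0.5 belong to class 1
data = [(i / 20, 1 if i / 20 > 0.5 else 0) for i in range(21)]
model = bagged_committee(data)
print(model(0.9), model(0.1))
```

In the same spirit, the committee size itself could be chosen in a data-dependent way, e.g. by growing members until the out-of-bag error stops improving.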
About Prof. Antanas Verikas: He was awarded a PhD degree in
pattern recognition from Kaunas University of Technology, Lithuania.
Currently he holds professor positions at both Halmstad University,
Sweden, where he leads the Department of Intelligent Systems, and
Kaunas University of Technology, Lithuania. His research interests
include image processing, computer vision, pattern recognition, learning
systems, classification, fuzzy logic, and visual media technology. He published more than
150 peer reviewed articles in international journals and conference proceedings and
served as Program committee member in numerous international conferences. He is a
member of the International Pattern Recognition Society, European Neural Network
Society, International Association of Science and Technology for Development, and a
member of the IEEE.
“Image-based life-logging: what it’s about?”
Abstract: The recently emerged life-logging technology consists in acquiring personal data
by wearing an electronic sensor over long periods of time. Wearable computing is
moving to the cusp of commonplace consumer technology, sitting at the top of the Hype
Cycle in Gartner's industry analysis. The benefit of image-based life-logging acquired
by a wearable camera is that it gives rich information able to generate explanations and
visualize the circumstances of the person’s activities, scene, state, environment and
social context that definitely influence his/her way of life (contextual information). At the
same time, several natural privacy and legal concerns appear for people wearing devices
at all times that can record video and take pictures.
In this talk, I will present our research on Life-logging based on the use of wearable
cameras for recording person’s lifestyle. One of the challenges of image-based
Life-logging is how to extract meaningful semantic information from the huge amount of
images (4,000 images acquired per day, or more than 100,000 images acquired per
month). To this purpose, I will address several computer vision problems like: Video
segmentation and analysis; Event detection; Social interaction, and Lifestyle
characterization. We will discuss potential health applications of life-logging, such as
reinforcing the memory of people with mild cognitive impairment, and improving healthy
habits through monitoring, analysis, and lifestyle characterization from life-logging data.
Dr. Petia Radeva (PhD 1993, Universitat Autonoma de Barcelona,
Spain) is a senior researcher and associate professor at UB. She
received her Ph.D. degree from the Universitat Autonoma de Barcelona
in 1998. Her present research interests are in the development of
learning-based approaches (in particular, statistical methods) for
computer vision. She has led one EU project and 12 Spanish projects, and holds
14 patents in the field of computer vision and medical imaging applied to health. Dr. Petia
Radeva has published more than 80 SCI-indexed articles and more than 150
international publications. She has an H-index of 29 with 3166 citations. Dr. Petia
Radeva was awarded the prize of "Antoni Caparros" for best technology transfer project
of University of Barcelona (2013) and the prize of "Jaume Casademont" (2012) for her
research activities in the field of lifelogging.
“Multivariate Data Analysis – PCA, ICA and All That Stuff”
Abstract: Multivariate or multidimensional data appear in all areas of science.
Multivariate data analysis first considers two aspects of the data sets to be analysed –
the number of data dimensions and the number of measurements available for a given data
set. For example, a data set consisting of the number of wins for a single football team at
each of several years is a single-dimensional (in this case, longitudinal) data set. A data
set consisting of the number of wins for several football teams in a single year is also a
single-dimensional (in this case, cross-sectional) data set. A data set consisting of the
number of wins for several football teams over several years is a two-dimensional data set.
Data dimensionality reduction is an important analysis technique usually applied to
multivariate data. The main aim is to preserve as much of the information present in
the data as possible while reducing the dimensionality of the original set. This
presentation will review some dimensionality reduction techniques and present results
achieved by Dr Vuksanovic when applying those techniques to various data sets
during his previous research.
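To ground the discussion, PCA, the most common of these techniques, can be sketched in a few lines of numpy (an illustrative example with synthetic data, not Dr Vuksanovic's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 samples in 3-D that really vary along a single dominant direction
t = rng.normal(size=(200, 1))
X = t @ np.array([[3.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(200, 3))

Xc = X - X.mean(axis=0)                  # centre the data
cov = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance matrix
eigval, eigvec = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
order = np.argsort(eigval)[::-1]
components = eigvec[:, order]            # principal axes, strongest first
explained = eigval[order] / eigval.sum()

Z = Xc @ components[:, :1]               # project onto the first component
print(explained[0])                      # most of the variance survives in 1-D
```

The fraction `explained[0]` is what justifies dropping the remaining dimensions: here almost all of the variance lies along the first principal axis.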
Branislav Vuksanovic graduated from the University of Belgrade,
Serbia, with a degree in Electrical and Power Engineering. He holds an MSc
degree in Measurement and Instrumentation from South Bank University,
London, and a PhD in Active Noise Control from the University of
Huddersfield, UK.
Previously, he worked as a Project Engineer for Croatian Electricity Board in Osijek,
Croatia. During his academic career he worked as a Research Fellow at Sheffield and
Birmingham Universities on Optical Brain Imaging and Medical Video Compression
projects. He also worked as a Lecturer at the University of Derby where he was a
member of Sensors and Controls Research Group. Currently he works as a Senior
Lecturer at the University of Portsmouth, School of Engineering. He has published papers
in the field of active noise control, biomedical signal processing and pattern recognition
for intrusion detection and knowledge based authentication. He published one book in
Digital Electronics and Microcontrollers field.
Dr Branislav Vuksanovic is a member of IET, ILT and IACSIT. His current research
interests are in the application of pattern recognition techniques for power systems and
analysis of ground penetrating radar and ECG data.
Registration (Lobby) & PPT copy
Nov. 19th | Wednesday
Add & Tel: Viale Suzzani 13, 20162 MILANO, ITALY | Tel: (+39)02/641151
*Collecting conference materials
*Delegates will get the certificate at the registration desk.
*The organizer won't provide accommodation, and we suggest you make an early
reservation.
Nov. 20th | Thursday | Morning
Registration (Lobby)
Milano Room
Opening Remarks:
Prof. Antanas Verikas, Halmstad University, Sweden
Keynote Speech 1:
Prof. Petia Radeva, University of Barcelona & Computer Vision
Center, Spain
Keynote Speech 2:
Dr. Branislav Vuksanovic, University of Portsmouth, UK
Keynote Speech 3:
Prof. Antanas Verikas, Halmstad University, Sweden
Registration (Lobby)
Milano Room
Nov. 20th 丨 Thursday 丨 Afternoon
Session 1: Computer Vision
Session 2: Image processing
Torino Room
Session 3: Video Processing
Session 4: Medical Image Processing
Napoli Room
Session 5: Signal Processing
Session 6: Pattern Recognition
Calla Room
Session 7: Network and Communication
Session 8: Image Classification
Venue: Room 1
Chair: Prof. Antanas Verikas
Time: 9:00am-12:00pm
Opening Remarks
Prof. Antanas Verikas,
Halmstad University, Sweden
Keynote Speech I
Prof. Petia Radeva,
University of Barcelona & Computer Vision Center, Spain
“Image-based life-logging: what it’s about?”
Keynote Speech II
Dr. Branislav Vuksanovic,
University of Portsmouth, UK
“Multivariate Data Analysis – PCA, ICA and All That Stuff”
Coffee Break & Group Photo
Keynote Speech III
Prof. Antanas Verikas,
Halmstad University, Sweden
“Mining data with Random Forests: Adaptive models for
prediction and data exploration”
*The Group Photo will be updated on the conference webpages and the SCIEI official website:
www.sciei.org.
**One best presentation will be selected from each session; the winner will be announced and
awarded a certificate at the end of each session, and the winners’ photos will be updated on
the SCIEI official website: www.sciei.org.
***Best Presentations will be evaluated on: Originality; Applicability; Technical Merit; PPT; English.
**** Please arrive at the conference room 10 minutes before the session starts and copy your
PPT to the laptop.
Venue: Milano Room
Chair: Prof. Antanas Verikas, Halmstad University, Sweden
Time: 13:30-16:00
Input Support System for Medical Records Created Using a Voice Memo Recorded by a Mobile
Device
Mr Keisuke Kurumizawa, Hiromitsu Nishizaki, Kanae Nishizaki and Hitoshi Ikegami
University of Yamanashi, Japan
This paper describes the development of an input support system for medical records using voice
memos recorded by a mobile device. The goal of this research is to increase the efficiency of
creating medical records. The proposed prototype system enables medical staff to record
treatment memos using a mobile device soon after completing patient treatment and to edit a
voice memo transcribed by an automatic speech recognition (ASR) system. This system also has
ASR error correction technology and a user friendly interface for editing. We evaluated the
prototype system by assessing the voice quality recorded by a mobile device using an ASR
performance measure, measuring the time required to perform editing, and administering a
questionnaire to provide subjective assessments. The experimental results showed that the
proposed system was useful and that it may result in more efficient creation of medical records.
Optical Flow Based Velocity Estimation for Mobile Robots
Beijing University of Technology, China
This paper presents a novel optical flow based technique to perceive the instant motion velocity
of mobile robots. The primary focus of this study is to determine the robot’s ego-motion using
the displacement field in temporally consecutive image pairs. In contrast to most previous
approaches for estimating velocity, we employ a polynomial expansion based dense optical flow
approach and propose a quadratic model based RANSAC refinement of flow fields to render our
method more robust with respect to noise and outliers. Accordingly, techniques for geometrical
transformation and interpretation of the inter-frame motion are presented. The advantages of our
proposal are validated by real experimental results conducted on a Pioneer robot.
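The RANSAC idea itself is easy to illustrate outside the paper's quadratic flow model: fit the simplest possible motion model, a pure 2-D translation, to noisy flow vectors contaminated by gross outliers. All data below is synthetic, and the one-point sampling is a simplifying assumption for brevity:

```python
import random

def ransac_translation(flows, iters=200, tol=0.5, seed=1):
    """flows: list of (dx, dy) vectors; returns a robust mean translation."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        cand = rng.choice(flows)                    # 1-point model: a translation
        inliers = [f for f in flows
                   if abs(f[0] - cand[0]) < tol and abs(f[1] - cand[1]) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    n = len(best_inliers)                           # refine on the consensus set
    return (sum(f[0] for f in best_inliers) / n,
            sum(f[1] for f in best_inliers) / n)

# 90 consistent flow vectors around (2, 1) plus 10 gross outliers
gen = random.Random(0)
flows = [(2 + gen.uniform(-0.2, 0.2), 1 + gen.uniform(-0.2, 0.2)) for _ in range(90)]
flows += [(gen.uniform(-10, 10), gen.uniform(-10, 10)) for _ in range(10)]
dx, dy = ransac_translation(flows)
print(dx, dy)   # close to the true translation (2, 1)
```

A naive mean over all vectors would be pulled away by the outliers; the consensus step discards them before averaging.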
Seam Tracking with Adaptive Image Capture for Fine-tuning of a High Power Laser Welding
Process
Olli Lahdenoja, Tero Säntti, Ari Paasio, Mika Laiho, Dr. Jonne Poikonen
Technology Research Center, University of Turku, Finland
This paper presents the development of methods for real-time fine-tuning of a high power laser
welding process of thick steel by using a compact smart camera system. When performing
welding in butt-joint configuration, the laser beam’s location needs to be adjusted exactly
according to the seam line in order to allow the injected energy to be absorbed uniformly into
both steel sheets. In this paper, on-line extraction of seam parameters is targeted by taking
advantage of a combination of dynamic image intensity compression, image segmentation with a
focal-plane processor ASIC, and Hough transform on an associated FPGA. Additional filtering
of Hough line candidates based on temporal windowing is further applied to reduce unrealistic
frame-to-frame tracking variations. The proposed methods are implemented in Matlab by using
image data captured with adaptive integration time. The simulations are performed in a hardware
oriented way to allow real-time implementation of the algorithms on the smart camera system.
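The Hough transform step used for seam-line extraction can be sketched independently of the camera hardware. The following minimal numpy version votes edge pixels into a (theta, rho) accumulator and reads off the strongest line (synthetic data, not the authors' FPGA implementation):

```python
import numpy as np

def hough_strongest_line(points, shape, n_theta=180):
    """Vote edge pixels (y, x) into a (theta, rho) accumulator, return the peak."""
    h, w = shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((n_theta, 2 * diag + 1), dtype=int)
    for y, x in points:
        rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[np.arange(n_theta), rho + diag] += 1   # one vote per (theta, rho)
    t, r = np.unravel_index(acc.argmax(), acc.shape)
    return np.rad2deg(thetas[t]), r - diag

# synthetic "seam": a vertical edge at x = 30 in a 100x100 image
pts = [(y, 30) for y in range(100)]
theta, rho = hough_strongest_line(pts, (100, 100))
print(theta, rho)   # theta = 0 degrees, rho = 30: the vertical line x = 30
```

The temporal filtering the paper adds would then reject peak positions that jump implausibly between frames.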
Escaping Path Approach for Speckle Noise Reduction
Asst. Prof. Marek Szczepanski, Krystian Radlak
Silesian University of Technology, Poland
A novel fast filtering technique for multiplicative noise removal in ultrasound images is
presented in this paper. The proposed algorithm utilizes the concept of digital paths created
on the image grid presented in [1], adapted to the needs of multiplicative noise reduction.
The new approach uses a special type of digital path, the so-called Escaping Path Model,
and a modified path length calculation based on topological as well as gray-scale distances.
The experiments confirmed that the proposed algorithm achieves results comparable with
existing state-of-the-art denoising schemes in suppressing multiplicative noise in ultrasound images.
A Rotation Invariant Local Zernike Moment based Interest Point Detector
Mr. Gökhan Özbulak, Muhittin Gökmen
Istanbul Technical University, Turkey
Detecting interest points in an image is an important phase of the object detection problem
in computer vision, and corners are good candidates for such interest points. In this
study, a rotation-invariant interest point detector is proposed by optimizing Ghosal's corner
model based on local Zernike moments (LZM) and using the LZM representation presented
by Sariyanidi et al. The performance of the proposed detector is evaluated on Mikolajczyk's
dataset prepared for rotation invariance, and our method outperforms well-known methods
such as SIFT and SURF in terms of the repeatability criterion.
Noncontact Surface Roughness Measurement using a Vision System
Erdinç Koçer, Erkan Horozoglu, Assoc. Prof. İlhan Asiltürk
Selcuk University, Konya, Turkey
Surface roughness measurement is one of the basic measurements that determine the quality and
performance of the final product. After machining operations, stylus-type instruments are
commonly used in industry to measure the roughness produced on the surface. This
measurement technique has disadvantages, such as user errors, because the device requires
calibration during measurement. In this study, measurement and evaluation techniques
were conducted using imaging devices on images of the machined surfaces. Image-based
measurement simplifies the measurement process because it is non-contact and does not
cause any damage. Measurement of surface
roughness and its analysis were thus conducted more precisely and accurately. The
experimentally obtained contact measurements were compared with the results of the
non-contact image processing software, and satisfactory results were obtained.
Effect of High Altitude Aeronautical Platforms with Cognitive Relay on Radar Performance
Mr Ashagrie Getnet Flattie
Ethio-Telecom, Ethiopia
Cognitive radio (CR) is a promising technology for improving the utilization of wireless
spectrum resources. The key characteristic of a CR system is that it allows unlicensed users
to use licensed spectrum bands opportunistically without affecting licensed users' performance.
The use of cooperative relay networks can help cognitive radio systems improve their
utilization by reducing their transmit power. Here, High Altitude Aeronautical Platform
(HAAP) cognitive relays are introduced in order to achieve cooperative diversity in the
cognitive radar (CRs) system. This system typically consists of the Primary Radar, the Target,
the Cognitive HAAP Relays, and the Cognitive Controller. In this paper, the cooperative
Amplify-and-Forward (AAF) strategy is considered, which achieves diversity by using
Maximal Ratio Combining (MRC). Performance metrics such as probability of false alarm (Pf),
probability of detection (Pd) and signal to noise ratio are evaluated. Matlab simulations were
carried out, and the results illustrate that the proposed schemes achieve notable performance
improvements compared to direct transmission (i.e., without HAAP cognitive relay assistance),
especially as performance improves substantially with an increasing number of HAAP cognitive relays.
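The MRC step has a simple closed form worth recalling: with maximal ratio combining, the combined output SNR is the sum of the branch SNRs. A toy numeric check (branch values assumed, not taken from the paper's simulations):

```python
import math

# branch SNRs at the combiner input (illustrative values, in dB)
branch_snr_db = [3.0, 5.0, 7.0]

# MRC: combined output SNR is the sum of the branch SNRs on a linear scale
linear = [10 ** (s / 10) for s in branch_snr_db]
combined_db = 10 * math.log10(sum(linear))
print(round(combined_db, 2))   # about 10.07 dB -- better than the best single branch
```

This is why adding relay branches raises the detection probability: every extra branch adds its SNR to the combiner output.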
Measuring the Engagement Level of Children for Multiple Intelligence Test using Kinect
Mr. Dongjin Lee, Woo-Han Yun, Chankyu Park, Hosub Yoon, Jaehong Kim and Cheong Hee
Electronics and Telecommunications Research Institute, South Korea
In this paper, we present an affect recognition system for measuring the engagement level of
children using the Kinect while they perform a multiple intelligence test on a computer. First of
all, we recorded 12 children while solving the test and manually created ground truth data for
the engagement levels of each child. For feature extraction, the Kinect for Windows SDK provides
support for user segmentation and skeleton tracking, so that we can get the 3D joint positions of
a child's upper-body skeleton. After analyzing the children's movement, the engagement level of
children’s responses is classified into two classes: high or low. We present the classification results
using the proposed features and identify the significant features in measuring engagement.
Effect of Boron Addition on Mechanical Properties of 60SiCr7 Steel
Assoc. Prof. Hayrettin DÜZCÜKOĞLU and Selman ÇETİNTÜRK
Selcuk University, Technology Faculty, Turkey
Boron, as an alloying element, and its compounds are used in a wide range of applications.
It has been determined that boron and its compounds improve the physical, chemical, mechanical
and metallurgical properties of materials. With boronizing, materials gain properties such as
high hardness, high wear resistance,
a lower friction coefficient and higher corrosion resistance. In this study, the effect of
boron addition on the mechanical properties of 60SiCr7 steel was investigated. 60SiCr7 is a
spring steel whose ductility is relatively low compared with heat-treatable steels, and
improving its mechanical properties by heat treatment and by alloying with various amounts
of boron (in ppm ratios) was attempted. As a result, the tensile strength and fracture
toughness of the steel improved with 15-30 ppm of added boron.
Using the predictive method of adaptive 4x4/8x8 Transformations to reduce the computation in
Motion Estimation
Mr Kuo Hsiang Chou and Chung-Ping Chung
National Chiao Tung University, Taiwan
H.264 with an “adaptive transform with variable block” (ATVB) can improve coding
compression and display quality in bitstream playback. However, these added compression
and display benefits are offset by a demand for double the computation power. In addition
to this increase in computing power, an extra round of Motion Estimation (ME) is also
required when using a 4x4 or 8x8 transform module. Therefore, this paper proposes a
predictive method to reduce the computation requirements by 30%-45% using ATVB, with
0.25 dB display distortion and a 2.5% bitrate increment.
Correction of Spectral Interference of Ethanol and Glucose for Rice Wine Evaluation
Dr Satoru Suzuki, Satsuki Hosono, Pradeep K. W. Abeygunawardha, Ichiro Ishimaru, Kenji
Wada and Akira Nishiyama
Kagawa University, Japan
Rice wine, the traditional Japanese alcohol, contains ethanol and glucose, whose concentrations
are evaluated to manage their quality. Recently, infrared techniques have been introduced to rice
wine measurement, and absorption spectra are analyzed to monitor rice wine components during
fermentation. However, it is difficult to precisely evaluate ethanol and glucose concentrations
because of an overlapping of their absorption peaks. This paper proposes a spectral separation
method using a single absorption band for evaluating the ethanol and glucose concentrations in
rice wine. We evaluate their concentrations by separating a mixed spectrum into ethanol and
glucose spectra. The simulation results showed a decrease in the estimation errors of ethanol and
glucose concentrations from 5.40 ± 5.41 %(v/v) and 0.73 ± 0.73 g/dl, respectively, using
regression analysis (RA) to 0.46 ± 0.65 %(v/v) and 0.40 ± 0.57 g/dl, respectively, using the
proposed method.
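The underlying separation idea, expressing a measured spectrum as a linear mix of known component spectra and solving for the concentrations, can be sketched with ordinary least squares. The Gaussian bands below are synthetic stand-ins, not real ethanol or glucose spectra:

```python
import numpy as np

wav = np.linspace(0.0, 1.0, 50)
# synthetic component spectra: two overlapping Gaussian absorption bands
ethanol = np.exp(-((wav - 0.4) / 0.1) ** 2)
glucose = np.exp(-((wav - 0.6) / 0.1) ** 2)

true_c = np.array([0.7, 0.3])                       # assumed concentrations
mixed = true_c[0] * ethanol + true_c[1] * glucose   # overlapping measured spectrum

A = np.column_stack([ethanol, glucose])
est, *_ = np.linalg.lstsq(A, mixed, rcond=None)     # solve A @ c ~= mixed
print(est)                                          # recovers the concentrations
```

Even though the two bands overlap, the least-squares solution separates them exactly here because the component shapes are linearly independent; with measurement noise the estimates degrade gracefully instead.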
Venue: Milano Room
Time: 16:20-19:30
Sparse Decomposition Learning based Dynamic MRI Reconstruction
Peifei Zhu, Qieshi Zhang, and Prof. Sei-ichiro Kamata
Waseda University, Japan
Dynamic MRI is widely used for many clinical exams, but slow data acquisition remains a
serious problem. The application of Compressed Sensing (CS) has demonstrated great potential
to increase imaging speed. However, the performance of CS largely depends on the sparsity of
the image sequence in the transform domain, where there is still much to be improved. In this
work, the sparsity is exploited by the proposed Sparse Decomposition Learning (SDL) algorithm,
which is a combination of low-rank plus sparsity and Blind Compressed Sensing (BCS). With this
decomposition, only the sparse component is modeled as a sparse linear combination of temporal
basis functions. This makes the coefficients sparser and retains more details of the dynamic
components compared with learning the whole images. Reconstruction is performed on the
undersampled data, where joint multicoil data consistency is enforced by combining Parallel
Imaging (PI). The experimental results show the proposed method decreases the
Mean Square Error (MSE) by about 15-20% compared to other existing methods.
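The sparse component in decompositions of this kind is typically computed with a soft-thresholding (shrinkage) operator, the proximal map of the L1 penalty; a generic numpy sketch, not the authors' SDL algorithm:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||x||_1: zero out small values, shrink the rest."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

coeffs = np.array([3.0, -0.2, 0.05, -1.5, 0.0])
sparse = soft_threshold(coeffs, 0.3)
print(sparse)   # small coefficients are zeroed, large ones shrunk toward zero by 0.3
```

Iterating this shrinkage against a data-consistency step is the core loop of most CS reconstruction algorithms.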
A Comparative study of transform based secure image steganography
Assoc. Prof. Dr. Sushil Kumar
Rajdhani College, University of Delhi, India
In this paper, a comparative study of Contourlet-based steganography based on a modified LSB
varying mode in gray images is presented. The main goal of steganography is hiding information
or an information file within another file in a way that is undetectable both
perceptually and statistically. To provide an additional layer of security, cryptography and
source encoding methods are used in conjunction with Steganography. In addition to proposing
Contourlet-based Steganography, we propose to use Self-Synchronizing variable length codes,
called T-codes as source encoder to obtain the secret data from the original embedding data. We
demonstrate through the experiments that Contourlet performs better than the CDF9/7 and
Slantlet in terms of PSNR, SSIM and KLDiv. Comparing Haar Transform with Contourlet it is
found that though Haar Wavelet based LSB varying mode method provides better PSNR, SSIM
and KLDiv values than Contourlet, the Contourlet lowers detectability and provides more
embedding capacity.
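The plain spatial-domain LSB baseline that such transform-domain schemes improve on can be stated in a few lines; this sketch hides message bytes in the least significant bits of cover samples (a generic illustration, not the paper's contourlet-domain method):

```python
def embed(cover, message):
    """Hide message bytes in the LSBs of cover samples (one bit per sample)."""
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]  # LSB-first
    assert len(bits) <= len(cover), "cover too small for the message"
    stego = list(cover)
    for i, b in enumerate(bits):
        stego[i] = (stego[i] & ~1) | b          # overwrite the sample's LSB
    return stego

def extract(stego, n_bytes):
    bits = [s & 1 for s in stego[:8 * n_bytes]]
    return bytes(sum(bits[8 * k + i] << i for i in range(8)) for k in range(n_bytes))

cover = list(range(17, 100))                    # stand-in for pixel values
stego = embed(cover, b"hi")
print(extract(stego, 2))                        # b'hi'
```

Each sample changes by at most 1, which is perceptually invisible but statistically detectable; embedding in transform coefficients, as the paper does, is one way to lower that detectability.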
Segmentation of Color Images using Genetic Algorithm with Image Histogram
Ms. P. Sneha Latha, Pawan Kumar, Samruddhi Kahu, K. M. Bhurchandi
Visvesvaraya National Institute Of Technology, Nagpur, India
This paper proposes a family of color image segmentation algorithms that use a genetic
approach and a color similarity threshold determined by the limitations and capabilities of
human vision. Most soft-computing image segmentation techniques initially segment images
using one of the clustering techniques and then use genetic algorithms (GA) only as an
optimization tool. The proposed technique instead uses GA directly for optimized segmentation
of color images. Applying GA to large color images is computationally heavy, so it is applied to
a 4D color image histogram table. The proposed algorithms are evaluated on the Berkeley
segmentation database in addition to general images. Their performance is benchmarked against
color-histogram-based segmentation and the Fuzzy C-means algorithm using the Probabilistic
Rand Index (PRI). Results show that the proposed algorithms yield better analytical and visual
results.
Interactive Object Segmentation Using Color Similarity based Nearest Neighbor Regions
Jun Zhang, and Dr. Qieshi Zhang
Waseda University, Japan
Effective object segmentation is an important task in computer vision. Since automatic image
segmentation struggles to extract objects from natural scenes, interactive approaches have
become a good solution. In this paper, a region merging approach based on a color similarity
measure and interactive operation is proposed. Some local regions belonging to the background
and to the object are first marked interactively. To judge whether two adjacent regions should
be merged, a color similarity measure guided by these marks is proposed. Merging is executed
based on the background marks, and the two regions with maximum similarity are merged until
all candidate regions have been examined. The object is then segmented by discarding the
merged background. Experiments show that the proposed method obtains more accurate results
on natural scenes.
Independent Transmission of Sign Language Interpreter in DVB - Assessment of Image
Mr. Petr Zatloukal, Martin Bernas, Lukáš Dvořák
Czech Technical University in Prague (CTU in Prague), Czech Republic
Sign language on television provides deaf viewers with information that they cannot get from
the audio content. If the sign language interpreter is transmitted over an independent data
stream, the aim is to ensure sufficient intelligibility and subjective image quality of the
interpreter at a minimum bit rate. This work deals with ROI-based video compression of a
Czech sign language interpreter, implemented in the x264 open source library. The results of
this approach are verified in subjective tests with deaf viewers, which examine the intelligibility
of sign language expressions containing minimal pairs at different levels of compression and
various resolutions of the interpreter image, and evaluate the subjective quality of the final
image for a good viewing experience.
Remotely Sensed Image Restoration Using Partial Differential Equations and Watershed
Ms. Avishan Nazari , Amin Zehtabian , Marco Gribaudo , Hassan Ghassemian
Department of Information Technology, Politecnico di Milano University, IRAN
This paper proposes a novel approach to remotely sensed image restoration. The main goal of
this study is to mitigate the two most common types of noise in remote sensing images while
preserving important details such as edges. To this end, a novel method based on partial
differential equations is proposed. The parameters of the proposed algorithm are set adaptively
according to the noise type and the texture of the noisy datasets. Moreover, we propose
applying a segmentation pre-processing step based on the Watershed transformation to localize
the denoising process. The performance of the restoration techniques is measured using the
PSNR criterion. For further assessment, we also feed the original, noisy and denoised images
into an SVM classifier and analyze the results.
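As a brief aside (our illustration, not part of the paper), the PSNR criterion used above for assessing restoration quality can be sketched as:

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between a reference and a processed image."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher PSNR values indicate that the denoised image is closer to the noise-free reference.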
Method of center localization for objects containing concentric arcs
Ms. Elena Kuznetsova, Evgeny Shvets, Dmitry Nikolayev
A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of
Sciences, Russian Federation
This paper proposes a method for the automatic center localization of objects containing
concentric arcs. The method utilizes structure tensor analysis and a voting scheme optimized
with the Fast Hough Transform. Two applications of the proposed method are considered: (i)
wheel tracking in a video-based system for automatic vehicle classification and (ii) tree growth
ring analysis on tree cross-cut images.
Highly Accurate and Noise-tolerant Texture Descriptor
Mr. Alireza Akoushideh and Babak Mazloom-Nezhad Maybodi
Electrical and Computer Department, Shahid-Beheshti University G.C, Tehran, Iran
In this paper, we extend the pyramid transform domain approach to the local binary pattern
(PLBP) to build a highly accurate and noise-tolerant texture descriptor. We combine PLBP
information from sub-band images, obtained using the wavelet transform, at different
resolutions, and construct several new descriptors. Multi-level and -resolution LBP
(MPR_LBP), multi-level and -band LBP (MPB_LBP), and multi-level, -band and -resolution
LBP (MPBR_LBP) are the proposed descriptors, applied to unsupervised classification of
texture images on the Outex, UIUC and Scene-13 data sets. Experimental results show that the
proposed descriptors not only achieve acceptable texture classification accuracy with
significantly lower feature length, but are also more robust to noise than a number of recent
state-of-the-art LBP extensions.
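As background for readers unfamiliar with LBP (a sketch of ours, not the authors' descriptors), the basic 3x3 local binary pattern that PLBP builds on, and its normalized histogram, can be computed as:

```python
import numpy as np

def lbp_image(img):
    """Basic 3x3 LBP codes for the interior pixels of a grayscale image:
    each of the 8 neighbours >= centre contributes one bit of the code."""
    img = np.asarray(img).astype(np.int32)
    c = img[1:-1, 1:-1]
    # neighbour offsets, clockwise from top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.int32) << bit)
    return code

def lbp_histogram(img, bins=256):
    """Normalized histogram of LBP codes, usable as a texture feature vector."""
    codes = lbp_image(img)
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / hist.sum()
```

The pyramid/sub-band variants in the paper aggregate such histograms across scales.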
Improving Color Image Segmentation by Spatial-Color Pixel Clustering
Assoc. Prof. Henryk Palus and Mariusz Frackiewicz
Silesian University of Technology, Gliwice, Poland
Image segmentation is one of the most difficult steps in the computer vision process. Pixel
clustering is only one among many techniques used in image segmentation. This paper proposes
a new segmentation technique that performs clustering in a five-dimensional feature space built
from three color components and two spatial coordinates. The advantages of taking the
information about image structure into account in pixel clustering are shown. The proposed 5D
k-means technique requires, like other segmentation techniques, additional postprocessing to
eliminate oversegmentation. Our approach is evaluated on both simple and complex images.
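A minimal sketch of the 5D clustering idea (our illustration with deterministic initialization and a hypothetical spatial_weight parameter, not the authors' implementation):

```python
import numpy as np

def kmeans_5d(image, k=3, iters=20, spatial_weight=1.0):
    """Cluster pixels in a 5-D feature space: three colour channels plus the
    two spatial coordinates (normalised to the colour range and weighted)."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([
        image.reshape(-1, 3).astype(np.float64),
        spatial_weight * ys.ravel() * 255.0 / max(h - 1, 1),
        spatial_weight * xs.ravel() * 255.0 / max(w - 1, 1),
    ])
    # deterministic init: k evenly spaced pixels in raster order
    centers = feats[np.linspace(0, len(feats) - 1, k).astype(int)].copy()
    for _ in range(iters):
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep centre if cluster emptied
                centers[j] = feats[labels == j].mean(axis=0)
    return labels.reshape(h, w)
```

The spatial_weight knob trades colour homogeneity against spatial compactness of the resulting segments.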
A Review of State-of-the-art Speckle Reduction Techniques for Optical Coherence Tomography
Fingertip Scans
Mr. Luke Nicholas Darlow and Sharat Saurabh Akhoury and James Connan
Council for Industrial and Scientific Research (CSIR); and Rhodes University, South Africa
Standard surface fingerprint scanners are vulnerable to counterfeiting attacks and to failure due
to skin damage and distortion. A highly secure and damage-resistant means of fingerprint
acquisition is therefore needed, providing scope for new approaches and technologies. Optical
Coherence Tomography (OCT) is a high resolution imaging technology that can be used to
image the human fingertip and allow the extraction of a subsurface fingerprint. Being robust to
spoofing and damage, the subsurface fingerprint is an attractive solution. However, the nature
of the OCT scanning process induces speckle: a correlative and multiplicative noise. Six
speckle reducing filters for the digital enhancement of OCT fingertip scans have been evaluated.
The optimized Bayesian non-local means algorithm improved the structural similarity between
processed and reference images by 34%, increased the signal-to-noise ratio, and yielded the
most promising visual results. An adaptive wavelet approach, originally designed for ultrasound
imaging, and a speckle reducing anisotropic diffusion approach also yielded promising results.
A reformulation of these in future work, with an OCT-specific speckle model, may improve
their performance.
Kernel weights optimization for error diffusion halftoning method
Dr. Victor Fedoseev
Image Processing Systems Institute, RAS; Samara State Aerospace University, Russia
This paper describes a study to find the best error diffusion kernel for digital halftoning under
various restrictions on the number of non-zero kernel coefficients and their set of values.
WSNR was used as the objective quality measure. The multidimensional optimization problem
was solved numerically using several well-known algorithms: Nelder-Mead, BFGS, and others.
The study found a kernel that provides a quality gain of about 5% over the best of the
commonly used kernels, introduced by Floyd and Steinberg. Other kernels obtained allow a
significant reduction in the computational complexity of the halftoning process without
reducing its quality.
High-Speed Segmentation-Driven High-Resolution Matching
Mr. Fredrik Ekstrand, Carl Ahlberg, Mikael Ekström, Giacomo Spampinato
Malardalen University, Sweden
This paper proposes a segmentation-based approach for matching high-resolution stereo images
in real time. The approach employs direct region matching in a raster-scan fashion influenced
by scanline approaches, but with pixel decoupling. To enable real-time performance, it is
implemented as a heterogeneous system of an FPGA and a sequential processor. Additionally,
the approach is designed for low resource usage in order to qualify as part of unified image
processing in an embedded system.
Venue: Torino Room
Time: 13:30-16:00
Reducing Motion Blur by Adjusting Integration Time for Scanning Camera with TDI CMOS
Mr Haengpal Heo and Sung Woong Ra
Korea Aerospace Research Institute, Korea
CMOS image sensors are being used in more and more remote sensing camera systems, because
systems with CMOS can be compact and require less power. The performance drawbacks of
CMOS are steadily being improved, and TDI CMOS image sensors are now utilized in high
resolution Earth observation camera systems. The intrinsic weak point of TDI CMOS,
compared to TDI CCD, is motion blur, but many ways of overcoming it have been introduced.
As motion blur is not critical in a TDI CCD camera thanks to its multi-phased clocking scheme,
some TDI CMOS sensors sample the signal more than once to mimic a TDI CCD. One active
area (photodiode) can be divided into two or more sections for better synchronization, or a
physical mask gap can be placed between pixels to mitigate blur in the scanning direction. In
the latter case, motion blur can be minimized effectively, but the amount of signal that can be
collected is reduced: motion blur compensation is achieved at the expense of other design
parameters. In this paper, an alternative way of operating a TDI CMOS camera system is
introduced. Thanks to the TDI function, SNR performance is less challenging than the MTF,
which is degraded by the motion blur. Therefore, the capability of the proximity electronics to
adjust the integration time makes it possible to reduce the motion blur at the expense of the
abundant signal. If the integration time is reduced below the line time, instead of physical
masking between pixels, the resulting effect is the same, but the system can also still be
operated with the full integration time; the mode is selectable. Depending on the characteristics
of the target, an SNR-prioritized or an MTF-prioritized image can be obtained.
Feature Integration with Random Forests for Real-time Human Activity Recognition
Hirokatsu Kataoka, Dr. Kiyoshi Hashimoto, Yoshimitsu Aoki
Keio University, Japan
This paper presents an approach for real-time human activity recognition. Three different kinds
of features (flow, shape, and a keypoint-based feature) are applied in activity recognition. We
use random forests for feature integration and activity classification: a forest is created for each
feature and performs as a weak classifier. The International Classification of Functioning,
Disability and Health (ICF) proposed by the WHO is applied in order to set a novel definition
of activities. Experiments on human activity recognition using the proposed framework show
99.2% (Weizmann action dataset), 95.5% (KTH human actions dataset), and 54.6% (UCF50
dataset) recognition accuracy at real-time processing speed. The feature integration and
activity-class definition allow us to accomplish high-accuracy recognition matching the state of
the art in real time.
A Combined Vision-Inertial Fusion Approach for 6-DoF Object Pose Estimation
Ms. Juan Li, Ana M. Bernardos, Paula Tarrío, José R. Casar
Universidad Politécnica de Madrid, Spain
The estimation of the 3D position and orientation of moving objects ('pose' estimation) is a
critical process for many applications in robotics, computer vision and mobile services.
Although major research efforts have been devoted to designing accurate, fast and robust
indoor pose estimation systems, providing a low-cost, easy-to-deploy and reliable solution
remains an open challenge. Addressing this issue, this paper describes a hybrid approach for 6
degrees of freedom (6-DoF) pose estimation that fuses acceleration data and stereo vision to
overcome the respective weaknesses of single-technology approaches. The system relies on
COTS technologies (standard webcams, accelerometers) and printable colored markers. It uses
a set of infrastructure cameras, located so that the tracked object is visible for most of the
operation time; the target object has to include an embedded accelerometer and be tagged with
a fiducial marker. This simple marker has been designed for easy detection and segmentation,
and it may be adapted (in shape and colors) to different service scenarios. Experimental results
show that the proposed system provides high accuracy while satisfactorily meeting the
real-time requirements.
A new method for high-capacity information hiding in video robust against temporal
desynchronization
Vitaly Mitekin and Dr. Victor Fedoseev
Image Processing Systems Institute, RAS; Samara State Aerospace University, Russia
This paper presents a new method for high-capacity information hiding in digital video,
together with embedding and extraction algorithms based on this method. These algorithms do
not require temporal synchronization to provide robustness against both malicious and
non-malicious frame dropping (temporal desynchronization). At the same time, due to the
randomized distribution of hidden information bits across the video frames, the proposed
method makes it possible to increase the hiding capacity in proportion to the number of frames
used for information embedding. The proposed method is also robust against the "watermark
estimation" attack, which aims to estimate the hidden information without knowing the
embedding key or the non-watermarked video. The presented experimental results demonstrate
the declared features of this method.
Generalization of the Viola-Jones method as a Decision Tree of Strong Classifiers for Real-time
Object Recognition in Video Stream
Alina Minkina, Dmitry Nikolaev, Mr. Sergey Usilin and Vladimir Kozyrev
Institute for Systems Analysis of Russian Academy of Sciences, Moscow, Russia
In this paper, we present a new modification of Viola-Jones complex classifiers. We describe a
complex classifier in the form of a decision tree and provide a training method for such
classifiers. The performance impact of the tree structure is analyzed, and the precision and
performance of the presented method are compared with those of the classical cascade. Various
tree architectures are studied experimentally. The task of detecting vehicle wheels in images
obtained from an automatic vehicle classification system is taken as an example.
Analysis of Feature-Based Video Stabilization/Registration Techniques within an Application
of Traffic Data Collection
Mr. Mojtaba T. Sadat ; Francesco Viti
University of Luxembourg, Luxembourg
Machine vision is rapidly gaining popularity in the field of Intelligent Transportation Systems.
In particular, advantages are foreseen in the exploitation of Aerial Vehicles (AVs), which can
deliver a superior view of traffic phenomena. However, vibration on AVs makes it difficult to
extract moving objects on the ground. To partly overcome this issue, image
stabilization/registration procedures are adopted to correct and stitch multiple frames taken of
the same scene from different positions, angles, or sensors. In this study, we examine the
impact of multiple feature-based techniques for stabilization, and we show that the SURF
detector outperforms the others in terms of time efficiency and output similarity.
Experimental Comparison of Methods for Estimating the Observed Velocity of a Vehicle in a
Video Stream
Mr. Ivan Konovalenko, Elena Kuznetsova
Institute for Information Transmission Problems Russian Academy of Sciences (IITP
RAS),Russian Federation
In this paper, we consider the problem of estimating an object's velocity from a video stream by
comparing three new velocity estimation methods: a vertical edge algorithm, a modified
Lucas-Kanade method, and a feature point algorithm. As an applied example, the task of
automatically estimating vehicle velocities from video streams on toll roads is chosen. We took
videos from cameras mounted on toll roads and annotated them to determine the true velocities.
The proposed methods are compared with each other in terms of correct velocity detection. The
practical relevance of this paper lies in the implementation of these methods and in overcoming
the difficulties of a real deployment.
Video Partitioning by Segmenting Moving Object Trajectories
Asst. Prof. Neeta Nain, Deepa Modi
Malaviya National Institute of Technology Jaipur, India
Video partitioning is involved in a number of applications; it provides solutions for monitoring
and tracking a particular person's trajectory, and it also helps to generate a semantic analysis of
a single entity or of the entire video. Many recent advances in object detection and tracking
concern motion structure and the data association used to assign labels to trajectories and
analyze them independently. In this work we propose an approach for video partitioning,
together with a structure for storing the motion of the target set to be monitored in the video.
Spatiotemporal tubes separate individual objects, which helps to generate a semantic analysis
report for each object individually. The semantic analysis system based on this framework
provides not only efficient synopsis generation but also handles spatial collisions, where
temporal consistency can be resolved for the representation of the semantic knowledge of each
object. To keep the computational complexity low, trajectories are generated online, while
classification, knowledge representation and arrangement over the spatial domain are
performed offline.
A Unified Approach for Development of Urdu Corpus for OCR and Demographic Purpose
Prakash Choudhary, Asst. Prof. Neeta Nain, Mushtaq Ahmed
Malaviya National Institute of Technology Jaipur, India
This paper presents a methodology for the development of an Urdu handwritten text image
corpus, and the application of corpus linguistics to OCR and information retrieval from
handwritten documents. Compared to other language scripts, Urdu script is somewhat
complicated for data entry: entering a single character requires a combination of multiple
keystrokes. Here, a mixed approach is proposed and demonstrated for building an Urdu corpus
for OCR and demographic data collection. The demographic part of the database could be used
to train a system to fetch data automatically, which would help to simplify the existing manual
data-processing tasks involved in data collection from input forms such as Passport, Ration
Card, Voting Card, AADHAR, driving licence, Indian Railway reservation and census forms.
This would increase the participation of the Urdu language community in understanding and
benefiting from government schemes. To make the database available and applicable across a
broad area of corpus linguistics, we propose a methodology for data collection, mark-up, digital
transcription, and XML metadata for benchmarking.
Accurate and Robust Spherical Camera Pose Estimation using Consistent Points
Mr. Christiano Couto Gava, Bernd Krolla and Didier Stricker
Technische Universitaet Kaiserslautern, Germany
This paper addresses the problem of multi-view camera pose estimation for high resolution, full
spherical images. A novel approach that simultaneously retrieves camera poses along with a
sparse point cloud is designed for large scale scenes. We introduce the concept of consistent
points, which allows the most reliable 3D points to be dynamically selected for nonlinear pose
refinement. In contrast to classical bundle adjustment approaches, we propose reducing the
parameter search space while jointly optimizing camera poses and scene geometry. Our method
notably improves the accuracy and robustness of camera pose estimation, as shown by
experiments carried out on real image data.
Image Boundaries Detection: From Thresholding to Implicit Curve Evolution
Dr. Souleymane Balla-Arabé, Fan Yang, Vincent Brost
LE2I CNRS-UMR, Laboratory of Electronic, Computing and Imaging Sciences, University of
Burgundy, Dijon, France
The development of high dimensional, large-scale imaging devices increases the need for fast,
robust and accurate image segmentation methods. Due to its intrinsic advantages, such as the
ability to extract complex boundaries while handling topological changes automatically, the
level set method (LSM) has been widely used in boundary detection. Nevertheless, its
computational complexity limits its use in real-time systems. Furthermore, most LSMs share
the drawback of very often converging to a local minimum, while the effectiveness of many
computer vision applications depends on the boundaries of the whole image. In this paper,
using the image thresholding and implicit curve evolution frameworks, we design a novel
boundary detection model that addresses these drawbacks of LSMs. In order to accelerate the
method on graphics processing units, we use the explicit and highly parallelizable lattice
Boltzmann method to solve the level set equation. The introduced algorithm is fast and
achieves global image segmentation in a spectacular manner. Experimental results on various
kinds of images demonstrate the effectiveness and efficiency of the proposed method.
Venue: Torino Room
Time: 16:20-19:30
Lossless 4D Medical Images Compression with Motion Compensation and Lifting Wavelet
Ms Leila Belhadef and Zoulikha Mekkakia Maaza
Department of Computer Sciences, University of Sciences and Technology of Oran USTO-MB
The lossless compression of 4D medical images is widely used to reduce the large size of these
images while permitting the reconstruction of an image identical to the original for diagnostic
purposes. In this paper, we present a lossless compression technique for 4D medical images
based on motion compensation and temporal filtering. The technique applies a 2D integer
wavelet transform followed by motion compensation or a lifting wavelet in order to efficiently
eliminate the spatial and temporal redundancies between the 2D slices of a 4D image. The
resulting wavelet coefficients are coded with the SPIHT3D algorithm. Experimental results
show that the proposed technique can give better compression rates than classic techniques
such as SPIHT3D.
Computer-Aided Diagnosis Method for MRI-guided Prostate Biopsy within the Peripheral
Zone using Grey Level Histograms
Mr. Andrik Rampun, Paul Malcolm and Reyer Zwiggelaar
Department of Computer Science, Aberystwyth University, UK
This paper describes a computer-aided diagnosis method for targeted prostate biopsies within
the peripheral zone in T2-weighted MRI. We subdivide the peripheral zone into four regions,
compare each sub-region's grey level histogram with malignant and normal histogram models,
and use specific metrics to estimate the presence of abnormality. The initial evaluation is based
on 200 MRI slices taken from 40 different patients; we achieved an 87% correct classification
rate, with 89% sensitivity and 86% specificity. The main contribution of this paper is a novel
computer-aided diagnosis approach based on grey level histogram analysis between
sub-regions. From a clinical point of view, the developed method could assist clinicians in
performing targeted biopsies, which are better than the random ones currently used.
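An illustrative sketch of the histogram comparison idea (ours, not the authors' method; the Bhattacharyya coefficient stands in for the unspecified "specific metrics", and the model histograms are assumed to be given):

```python
import numpy as np

def grey_level_histogram(region, bins=32, max_val=255):
    """Normalized grey level histogram of an image region."""
    h, _ = np.histogram(region, bins=bins, range=(0, max_val + 1))
    return h / max(h.sum(), 1)

def bhattacharyya(p, q):
    """Bhattacharyya coefficient: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return float(np.sum(np.sqrt(p * q)))

def classify_region(region, normal_model, malignant_model, bins=32):
    """Label a sub-region by whichever model histogram it resembles more."""
    p = grey_level_histogram(region, bins)
    if bhattacharyya(p, malignant_model) > bhattacharyya(p, normal_model):
        return "malignant"
    return "normal"
```

In practice the model histograms would be estimated from annotated training slices.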
X-ray Fluorescence Tomography: Jacobian matrix and confidence of the reconstructed images
Dmitry Nikolaev, Dr. Marina Chukalina
Institute of Microelectronics Technology and High Purity Materials RAS, Russia
The goal of X-ray Fluorescence Computed Tomography (XFCT) is to give a quantitative
description of an object under investigation (the sample) in terms of its element composition.
However, light and heavy elements inside the object contribute differently to the attenuation of
the X-ray probe and of the fluorescence. As a result, elements located in the shadow area make
no contribution to the registered spectrum. Iterative reconstruction procedures will try to set to
zero the variables describing the element content of the corresponding unit volumes, as these
variables do not change the system's condition number, and inversion of the XFCT Radon
transform gives random values in these areas. To evaluate the confidence of the reconstructed
images, we propose to calculate, in addition to the reconstructed images, a generalized image
based on the Jacobian matrix. This image highlights the areas of doubt, if any exist. In this
work we attempt to demonstrate the advisability of such an approach; for this purpose, we
analyze in detail the process of tomographic projection formation.
Filter-based Feature Selection and Support Vector Machine for False Positive Reduction in
Computer-Aided Mass Detection in Mammogram
Viet Dung Nguyen, Duc Thuan Nguyen, Tien Dung Nguyen, Viet Anh Phan and Mr. Quang
Doan Truong
Hanoi University of Science and Technology/ Tokyo Institute of Technology.
In this paper, a method for reducing false positives in computer-aided mass detection in
screening mammograms is proposed. A set of 32 features, including First Order Statistics
(FOS) features, Gray-Level Co-occurrence Matrix (GLCM) features, Block Difference Inverse
Probability (BDIP) features, and Block Variation of Local Correlation coefficients (BVLC)
features, is extracted from detected Regions of Interest (ROIs). An optimal subset of 8 features
is selected from the full feature set by means of a filter-based Sequential Backward Selection
(SBS). A Support Vector Machine (SVM) is then utilized to classify the ROIs into mass
regions or normal regions. The method's performance is evaluated using the area under the
Receiver Operating Characteristic (ROC) curve (AUC or AZ). On a dataset consisting of about
2700 ROIs detected in the mini-MIAS database of mammograms, the proposed method
achieves AZ = 0.938.
The Brain MRI Classification Problem From Wavelets Perspective
Mr. Mohamed Mokhtar Bendib, Hayet Farida Merouani, Fatma Diaba
LRI Laboratory, Badji-Mokhtar University-Annaba,Algeria
Haar and Daubechies 4 (DB4) are the most used wavelets for brain MRI (Magnetic Resonance
Imaging) classification. The former is simple and fast to compute while the latter is more
complex and offers a better resolution. This paper explores the potential of both of them in
performing Normal versus Pathological discrimination on the one hand, and Multiclassification
on the other hand. The Whole Brain Atlas is used as a validation database and the Random
Forest (RF) algorithm is employed as a learning approach. The achieved results are discussed
and statistically compared.
Classification of Asthmatic Breath Sounds by Using Wavelet Transforms and Neural Networks
Ms Fatma Zehra Göğüş, Bekir Karlık and Güneş Harman
Selçuk University, Turkey
In this study, respiratory sounds of asthmatic patients and healthy individuals are analyzed and
classified to diagnose asthma. Normal and asthmatic breath sound signals are divided into
segments which include a single respiration cycle as inspiration and expiration. Analyses of
these sound segments are carried out by using both discrete wavelet transform (DWT) and
wavelet packet transform (WPT). Each sound segment is decomposed into frequency
sub-bands using DWT and WPT. Feature vectors are constructed by extracting statistical
features from the sub-bands. An artificial neural network (ANN) is used to classify respiratory
sound signals as normal or by level of asthmatic disease (mild, moderate and severe asthma).
The classification results of DWT and WPT are compared with each other in
terms of classification accuracy.
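A minimal sketch of sub-band statistical feature extraction with a Haar DWT (our illustration; the authors use DWT and WPT with their own feature set):

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform:
    returns the approximation (low-pass) and detail (high-pass) sub-bands."""
    s = np.asarray(signal, dtype=np.float64)
    if len(s) % 2:                       # pad odd-length signals
        s = np.append(s, s[-1])
    approx = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return approx, detail

def subband_features(signal, levels=3):
    """Mean absolute value and standard deviation of each detail sub-band,
    plus the final approximation, as a feature vector for a classifier."""
    feats = []
    a = np.asarray(signal, dtype=np.float64)
    for _ in range(levels):
        a, d = haar_dwt(a)
        feats += [np.mean(np.abs(d)), np.std(d)]
    feats += [np.mean(np.abs(a)), np.std(a)]
    return np.array(feats)
```

Such vectors, computed per respiration cycle, would then be fed to the ANN classifier.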
Electromyography Low Pass Filtering Effects on the Classification of Hand Movements in
Amputated Subjects
Dr Manfredo Atzori and Henning Müller
University of Applied Sciences Western Switzerland (HES-SO Valais), Switzerland
People with transradial hand amputations can control prosthetic hands via surface
electromyography (sEMG), but the control systems are limited and usually not natural. In the
scientific literature, the application of pattern recognition techniques to classify hand
movements from sEMG has led to remarkable results, but the evaluations are usually far from
real-life applications, with all their uncertainties and noise. There is therefore a need to improve
movement classification accuracy in real settings.
Smoothing the signal with a low-pass filter is a common pre-processing procedure to remove
high-frequency noise. However, the filtering frequency modifies the signal strongly and can
therefore affect the classification results.
In this paper we analyze the dependence of the classification accuracy on the pre-processing
low-pass filtering frequency in 3 hand-amputated subjects performing 50 different movements.
The results highlight two main aspects. First, the filtering frequency strongly affects the
classification accuracy, and choosing the right frequency between 1 Hz and 5 Hz can improve
the accuracy by up to 5%. Second, different subjects obtain their best classification
performance at different frequencies. These facts could affect all similar classification
procedures, reducing the classification uncertainty, and therefore help bring the field closer to
real-life applications, which could deeply change the lives of hand-amputated subjects.
Analysis of Brain Signals for the Discrimination of Observations of Correct and Incorrect
Actions
Pantelis Asvestas, Ms Alexandra Korda, Irene Karanasiou, Spiros Kostopoulos, George
Matsopoulos and Errikos Ventouras
The aim of this paper is to present a methodology capable of discriminating between
observations of correct actions and observations of incorrect ones. Towards this end,
Event-Related Potentials (ERPs) were recorded from 47 locations on the scalp of 16 healthy
volunteers, who observed correct or incorrect actions of other subjects. The recorded signals
were analyzed in the frequency domain and the normalized signal power at various frequency
bands was calculated. Feature selection was applied in order to reduce the number of available
features. Finally, the obtained feature vectors were clustered using the fuzzy c-means
algorithm, resulting in a clustering accuracy of 84.4%.
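For reference, the fuzzy c-means clustering step can be sketched as follows (a plain NumPy illustration of the standard algorithm, not the authors' code; the fuzzifier m and iteration count are illustrative):

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means: returns cluster centres and the (n, c) membership matrix."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)        # memberships sum to 1 per sample
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                # avoid division by zero
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
    return centers, U
```

Hard labels (as needed for reporting a clustering accuracy) are obtained by taking the argmax over each membership row.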
Feature Extraction of Probe Mark Image and Automatic Detection of Probing Pad Defects in
Semiconductor Using CSVM
Mr. Jeong-Hoon Lee, Jee-Hyong Lee
Semiconductor Division, Samsung Electronics / Sungkyunkwan University, Korea
As the semiconductor micro-fabrication process continues to advance, the probing pads in a
chip also become smaller. A probe needle contacts each probing pad for electrical testing;
however, the needle may touch a pad incorrectly. Such contact failures damage the probing
pads and cause qualification problems. In order to detect contact failures, the current system
inspects the probing marks on the pads, but because of its low accuracy, engineers have to
redundantly verify its results once more, which lowers efficiency. We suggest an approach to
automatic defect detection that solves these problems using image processing and a CSVM. We
develop significant features of the probing marks to classify contact failures more accurately,
reducing the engineers' workload by 38%.
Disparity Estimation from Monocular Image Sequence
Dr. Qieshi Zhang, and Sei-ichiro Kamata
Waseda University, Japan
This paper proposes a novel method for estimating disparity accurately. To achieve this,
an optimal adjusting framework is proposed to address noise, occlusions, and
outliers. Unlike typical multi-view stereo (MVS) methods, the proposed approach
uses not only a color constraint but also a geometric constraint that associates multiple
frames from the image sequence. The results show disparity maps with good visual quality:
most of the noise is eliminated, errors in occlusion areas are suppressed, and the details of
scene objects are preserved.
A one-bit approach for image registration
Mr. An Nguyen, Mark Pickering, Andrew Lambert
University of New South Wales, Australia
Motion estimation or optic flow computation for automatic navigation and obstacle avoidance
programs running on Unmanned Aerial Vehicles (UAVs) is a challenging task. These
challenges come from the requirements of real-time processing speed and small light-weight
image processing hardware with very limited resources (especially memory space) embedded
on the UAVs. Solutions towards both simplifying computation and saving hardware resources
have recently received much interest. This paper presents an approach to image registration
using binary images that addresses both requirements. The approach uses the translational
information between two corresponding patches of binary images to estimate global motion.
These low bit-resolution images require very little memory to store
and allow simple logic operations such as XOR and AND to be used instead of more complex
computations such as subtractions and multiplications.
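The XOR-based matching idea can be sketched as follows. This is a minimal illustration of one-bit patch matching, not the authors' code; the search range and the binarization step are assumptions, and the returned offset is the index shift of `ref` relative to `cur` under this sign convention:

```python
import numpy as np

def one_bit_translation(ref, cur, max_shift=4):
    """Estimate the translation between two binary patches by minimizing
    the XOR mismatch count over a window of candidate shifts.

    XOR of two bit-images counts mismatched pixels, replacing the
    subtractions and multiplications of a sum-of-squared-differences search.
    """
    h, w = ref.shape
    best, best_shift = None, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # overlapping regions of ref and cur under shift (dy, dx)
            a = ref[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            b = cur[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
            cost = np.logical_xor(a, b).mean()   # normalized mismatch
            if best is None or cost < best:
                best, best_shift = cost, (dy, dx)
    return best_shift
```

On hardware, the inner XOR-and-count reduces to a popcount over packed bit words, which is the source of the memory and logic savings the abstract describes.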
Atmospheric Correction of Hyperspectral Images Using Approximate Solution of MODTRAN
Transmittance Equation
Dr. Alexander Belov, Myasnikov V.V.
Samara State Aerospace University; Image Processing Systems Institute, Russian Academy of
Sciences, Russian Federation
The paper presents a method for the atmospheric correction of remote sensing hyperspectral images.
The method is based on an approximate solution of the MODTRAN transmittance equation, using
simultaneous analysis of a remote sensing hyperspectral image and an “ideal” hyperspectral image
that is free of atmospheric distortions. Experimental results show that the proposed method is
applicable for performing atmospheric correction.
Semi-automated segmentation of neuroblastoma nuclei using the Gradient Energy Tensor: a
user driven approach
Mr. Florian Kromp, Sabine Taschner-Mandl, Magdalena Schwarz, Johanna Blaha, Peter F.
Ambros, Michael Reiter
Vienna University of Technology; Children’s Cancer Research Institute; Labdia
Labordiagnostik GmbH, Austria
We propose a user-driven method for the segmentation of neuroblastoma nuclei in microscopic
fluorescence images involving the gradient energy tensor. Multispectral fluorescence images
contain intensity and spatial information about antigen expression, fluorescence in situ
hybridization (FISH) signals, and nucleus morphology. The latter serves as the basis for the
detection of single cells and the calculation of shape features, which are used to validate the
segmentation and to reject false detections. Accurate segmentation is difficult due to varying
staining intensities and aggregated cells. It requires several (meta-) parameters, which have a
strong influence on the segmentation results and have to be selected carefully for each sample
(or group of similar samples) by user interactions. Because our method is designed for
clinicians and biologists, who may have only limited image processing background, an
interactive parameter selection step allows the implicit tuning of parameter values. With this
simple but intuitive method, segmentation results with high precision for a large number of
cells can be achieved by minimal user interaction. The strategy was validated on
hand-segmented datasets of three neuroblastoma cell lines.
Venue: Napoli Room
Time: 13:30-16:00
Orthogonal wavelet moments and their multifractal invariants
Mr. Dmitry Uchaev, Denis Uchaev and Vasiliy Malinnikov
Moscow State University of Geodesy and Cartography, Russia
This paper introduces a new family of moments, namely orthogonal wavelet moments (OWMs),
which are an orthogonal realization of wavelet moments (WMs). In contrast to WMs with a
nonorthogonal kernel function, these moments can be used for multiresolution image
representation and image reconstruction. The paper also introduces multifractal invariants (MIs)
of OWMs, which can be used instead of OWMs. Reconstruction tests performed with
noise-free and noisy images demonstrate that MIs of OWMs can also be used for image
smoothing, sharpening, and denoising. It is established that the reconstruction quality for MIs of
OWMs can be better than that for the corresponding orthogonal moments (OMs), and reduces to the
reconstruction quality of the OMs if the zero scale level is used.
Voice Morphing based on Spectral Features and Prosodic Modification
Mr Abdul Qavi, Shoab Ahmed Khan and Kashif Basir
Center for Advanced Studies in Engineering, Islamabad, Pakistan
This paper aims at morphing speech uttered by a source speaker so that it
seems to be spoken by another, target speaker: a new identity is given while the
original content is preserved. The proposed method transforms the vocal tract parameters and glottal
excitation of the source speaker into the target speaker's acoustic characteristics. It involves the
development of appropriate vocal tract models that can capture speaker-specific information
and the estimation of the model parameters that most closely relate to the model of the target speaker.
The method detects the pitch and separates the glottal excitation and vocal tract spectral features. The glottal
excitation of the source is taken, a voiced/unvoiced decision is made, the prosody information is
found, PSOLA is used to modify the pitch, the spectral features are found, and finally the speech is
modified using the target's spectral features and prosody. Subjective experiments show that the
proposed method improves the quality of conversion and captures the vocal and glottal
characteristics of the target speaker.
FPGA based image processing for optical surface inspection with real time constraints
Mr. Ylber Hasani, Ernst Bodenstorfer, Jörg Brodersen, Konrad J. Mayer
AIT Austrian Institute of Technology GmbH, Austria
Today, high-quality printing products like banknotes, stamps, or vouchers, are automatically
checked by optical surface inspection systems. In a typical optical surface inspection system,
several digital cameras acquire the printing products with fine resolution from different viewing
angles and at multiple wavelengths of the visible and also near infrared spectrum of light. The
cameras deliver data streams with a huge amount of image data that have to be processed by an
image processing system in real time. Due to the printing industry’s demand for higher
throughput together with the necessity to check finer details of the print and its security features,
the data rates to be processed tend to explode. In this contribution, a solution is proposed, where
the image processing load is distributed between FPGAs and digital signal processors (DSPs) in
such a way that the strengths of both technologies can be exploited. The focus lies upon the
implementation of image processing algorithms in an FPGA and its advantages. In the presented
application, FPGA-based image-preprocessing enables real-time implementation of an optical
color surface inspection system with a spatial resolution of 100 μm and for object speeds over
10 m/s. For the implementation of image processing algorithms in the FPGA, pipeline
parallelism with clock frequencies up to 150 MHz together with spatial parallelism based on
multiple instantiations of modules for parallel processing of multiple data streams are exploited
for the processing of the image data of two cameras and three color channels. It is shown that,
due to their flexibility and fast response times, FPGAs are ideally suited for realizing a
configurable all-digital PLL for the processing of camera line-trigger signals with frequencies
of about 100 kHz, using purely synchronous digital circuit design.
Improving Parametric Active Contours by Using Object Center Of Gravity Distance Map
Asst. Prof. Abdelkader MAROUF and Amrane HOUACINE
USTHB, Algiers
In this paper, we propose an improvement of the classical parametric active contours. The
method, presented here, consists in adding a new energy term based on the object center of
gravity distance map. This additional term acts as attraction forces that constrain the contour to
remain in the vicinity of the object. The distance map introduced here differs from the classical
one since it is not based on a binary image, but rather constitutes a simplified and very fast
version that relates only to one point, defined as the expected center of gravity of the object.
The additional forces, so introduced, act as a kind of balloon method with improved
convergence. The method is evaluated for object segmentation in images, and also for object
tracking. The center of gravity is computed from the initial contour for each image of the
sequence considered. Compared to the balloon method, the presented approach appears to be
faster and less prone to looping, and it behaves better for object tracking.
Vehicle Passes Detector Based on Multi-Sensor Analysis
D. Bocharov, Mr. Sidorchuk Dmitry, I.Konovalenko, I.Koptelov
Moscow Institute of Physics and Technology (State University), Visillect Service Limited,
This study deals with a new approach to the problem of detecting vehicle passes in a
vision-based automatic vehicle classification system. Essential non-affine image variations
and signals from an induction loop are events that can be considered indicators of an object's
presence. We propose several vehicle detection techniques based on image processing and
induction loop signal analysis, and we suggest a combined method based on multi-sensor
analysis to improve vehicle detection performance. Experimental results in complex outdoor
environments show that the proposed multi-sensor algorithm is effective for vehicle detection.
Evaluating Word Semantic Properties Using Sketch Engine
Asst. Prof. Velislava Stoykova
Institute for Bulgarian Language, Bulgarian Academy of Sciences, Bulgaria
The paper describes an approach that uses the statistically based tools incorporated into the Sketch
Engine system for electronic text corpora processing to mine big textual data and to search for and
extract word semantic properties. It presents and compares a series of word search experiments using
different statistical approaches and evaluates the results of searching the Bulgarian-language
EUROPARL 7 corpus with respect to the extracted word semantic properties. Finally, the methodology is
extended to multilingual application using the Slovak-language EUROPARL 7 corpus.
Optimized Curvelet-based Empirical Mode Decomposition
Dr. Renjie Wu, Qieshi Zhang and Sei-ichiro Kamata
Recent years have seen immense improvement in the development of signal processing based
on the Curvelet transform. The Curvelet transform provides a new multi-resolution representation
whose frame elements exhibit higher directional sensitivity and anisotropy than
Wavelets, multi-Wavelets, steerable pyramids, and so on. These features are based on the
anisotropic notion of scaling. In practice, time series signal processing problems are
often encountered, and time-frequency analysis based methods are studied to solve them.
However, time-frequency analysis cannot always be trusted, and many new
methods have been proposed. The Empirical Mode Decomposition (EMD) is one of them and is
widely used. The EMD aims to decompose functions that are superpositions of a reasonably
small number of components, well separated in the time-frequency plane, into their building
blocks, where each component can be viewed as locally approximately harmonic.
However, it cannot solve the problem of directionality in high dimensions. In this paper, a reallocated
method of the Curvelet transform (optimized Curvelet-based EMD) is proposed. We
introduce a definition for a class of functions that can be viewed as a superposition of a
reasonably small number of approximately harmonic components with respect to an optimized Curvelet family.
We analyze this algorithm and demonstrate its results on data. The experimental results prove
the effectiveness of our method.
An Agglomerative Approach for Shot Summarization Based on Content Homogeneity
Antonis Ioannidis, Vasileios Chasanis and Prof. Aristidis Likas
Department of Computer Science and Engineering, University of Ioannina, Greece
An efficient shot summarization method is presented based on agglomerative clustering of the
shot frames. Unlike other agglomerative methods, our approach relies on a cluster merging
criterion that computes the content homogeneity of a merged cluster. An important feature of
the proposed approach is the automatic estimation of the number of a shot's most representative
frames, called keyframes. The method starts by splitting each video sequence into small, equal
sized clusters (segments). Then, agglomerative clustering is performed, where from the current
set of clusters, a pair of clusters is selected and merged to form a larger unimodal
(homogeneous) cluster. The algorithm proceeds until no further cluster merging is possible. At
the end, the medoid of each of the final clusters is selected as keyframe and the set of keyframes
constitutes the summary of the shot. Numerical experiments demonstrate that our method
reasonably estimates the number of ground-truth keyframes, while extracting non-repetitive
keyframes that efficiently summarize the content of each shot.
Stereoscopic Roadside Curb Height Measurement Using V-Disparity
Mr. Florin Octavian Matu, Iskren Vlaykov, Mikkel Thogersen, Kamal Nasrollahi, Thomas
Aalborg University,Aalborg, Denmark
Managing road assets, such as roadside curbs, is one of the interests of municipalities. As an
interesting application of computer vision, this paper proposes a system for automated
measurement of the height of the roadside curbs. The developed system uses the spatial
information available in the disparity image obtained from a stereo setup. Data about the
geometry of the scene is extracted in the form of a row-wise histogram of the disparity map
(the v-disparity). By parameterizing the two strongest lines, each pixel can be labeled as belonging to one
plane: ground, sidewalk, or curb candidates. Experimental results show that the system
can measure the height of the roadside curb with good accuracy and precision.
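The row-wise disparity histogram (v-disparity) at the core of such systems can be sketched in NumPy as follows. This is illustrative only; the line fitting and curb-height measurement steps are omitted, and the disparity range is an assumption:

```python
import numpy as np

def v_disparity(disp, max_d=64):
    """Row-wise histogram of a disparity map.

    Rows of the result correspond to image rows, columns to disparity values.
    Planar surfaces such as the road project to straight lines in this map.
    """
    h = disp.shape[0]
    vmap = np.zeros((h, max_d), dtype=np.int32)
    for r in range(h):
        row = disp[r]
        row = row[(row >= 0) & (row < max_d)]        # keep valid disparities
        vmap[r] = np.bincount(row.astype(np.int64), minlength=max_d)[:max_d]
    return vmap
```

The two strongest lines mentioned in the abstract would then be extracted from `vmap`, e.g. with a Hough transform, before labeling pixels as ground, sidewalk, or curb.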
SubPatch: random kd-tree on sub-sampled patches for nearest neighbor field estimation
Fabrizio Pedersoli, Dr. Sergio Benini, Nicola Adami, Masahiro Okuda and Riccardo Leonardi
Dept. of Information Engineering, University of Brescia, ITALY
We propose a new method to compute the approximate nearest-neighbors field (ANNF)
between image pairs using random kd-tree and patch set sub-sampling. By exploiting image
coherence we demonstrate that it is possible to reduce the number of patches on which we
compute the ANNF, while maintaining high overall accuracy on the final result. Information on
missing patches is then recovered by interpolation and propagation of good matches. The
introduction of the sub-sampling factor on patch sets also allows for setting the desired trade-off
between accuracy and speed, providing a flexibility that state-of-the-art methods lack. Tests
conducted on a public database show that our algorithm achieves superior performance with
respect to the PatchMatch (PM) and Coherence Sensitivity Hashing (CSH) algorithms in
comparable computational time.
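The sub-sampling trade-off can be illustrated with a brute-force NumPy stand-in. The paper's random kd-tree and match propagation are replaced here by exhaustive search for clarity; patch size and step are arbitrary, and real image sizes would require the kd-tree acceleration:

```python
import numpy as np

def annf_subsampled(src, dst, patch=3, step=2):
    """Nearest-neighbour field from src to dst patches, computed only on a
    sub-sampled set of src patches; step controls the accuracy/speed trade-off.
    Brute-force matching stands in for the random kd-tree of the paper."""
    def patches(img):
        h, w = img.shape
        coords = [(y, x) for y in range(h - patch + 1)
                         for x in range(w - patch + 1)]
        feats = np.stack([img[y:y + patch, x:x + patch].ravel()
                          for y, x in coords])
        return np.array(coords), feats

    dc, df = patches(dst)
    sc, sf = patches(src)
    sel = np.arange(0, len(sc), step)                # sub-sampled source patches
    # squared L2 distance between every selected src patch and every dst patch
    d2 = ((sf[sel][:, None, :] - df[None, :, :]) ** 2).sum(axis=2)
    return sc[sel], dc[d2.argmin(axis=1)]            # matched coordinates
```

Matches for the skipped patches would then be recovered by interpolation and propagation of good matches, as the abstract describes.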
Genetic Algorithms for Mesh Surface Smoothing
Assoc. Prof. Mehmet Yasin Özsağlama, Mehmet Çunkaş
Selcuk University, Technology Faculty, Department of Electric-Electronics Engineering, Turkey
This paper presents a new 3D mesh smoothing algorithm based on evolutionary
methods: smoothing is formulated as an optimization problem solved with a genetic algorithm. The main
approach is based on expanding the search space by generating new meshes as genetic
individuals. Feature preservation and model shrinkage, which are the main problems of
existing smoothing algorithms, are handled: with this method, over-smoothing effects are reduced and
undesirable noise is effectively removed.
On improvements of neural network accuracy with fixed number of active neurons
Ms. Natalia Sokolova, Dmitry P. Nikolaev, Dmitry Polevoy
Smart Engines Ltd, Russia
In this paper, the possibility of improving multilayer perceptron based classifiers by using a
composite classifier scheme with a predictor function is explored. Recognition of embossed
number characters on plastic cards in images taken by a mobile camera was used as a model
Venue: Napoli Room
Time: 16:20-19:30
Object Detection using Categorised 3D Edges
Ms. Lilita Kiforenko, Anders Glent Buch, Leon Bodenhagen and Norbert Krüger
University of Southern Denmark, Denmark
In this paper we present an object detection method that uses edge categorisation in combination
with a local multi-modal histogram descriptor, all based on RGB-D data. Our target application
is robust detection and pose estimation of known objects. We propose to apply a recently
introduced edge categorisation algorithm for describing objects in terms of their different edge
types. Relying on edge information allows our system to deal with objects with little or no texture
or surface variation. We show that edge categorisation improves matching performance due to
the higher level of discrimination, which is made possible by the explicit use of edge categories
in the feature descriptor. We quantitatively compare our approach with the state-of-the-art
template-based Linemod method, which also provides an effective way of dealing with
texture-less objects; tests were performed on our own object dataset. Our results show that
detection based on the edge local multi-modal histogram descriptor outperforms Linemod with a
significantly smaller number of templates.
Auto-SEIA: Simultaneous optimization of image processing and machine learning algorithms
Dr. Valentina Negro Maggio, Luca Iocchi
Sapienza University of Rome, Italy
Object classification from images is an important task for machine vision and it is a crucial
ingredient for many computer vision applications, ranging from security and surveillance to
marketing. Image based object classification techniques properly integrate image processing and
machine learning (i.e., classification) procedures. In this paper we present a system for
automatic simultaneous optimization of algorithms and parameters for object classification from
images. More specifically, the proposed system is able to process a dataset of labelled images
and to return the best configuration of image processing and classification algorithms, and of their
parameters, with respect to classification accuracy. Experiments with real public datasets
are used to demonstrate the effectiveness of the developed system.
Information Based Universal Feature Extraction
Mr. Mohammad Amiri , Rüdiger Brause
Goethe University, Germany
In many real-world image-based pattern recognition tasks, the extraction and use of
task-relevant features is the most crucial part of the diagnosis. In the standard approach, features
mostly remain task-specific, although humans who perform such tasks always use the same
image features, trained in early childhood. It seems that universal feature sets exist, but they have
not yet been systematically found. In our contribution, we tried to find universal image feature
sets that are valuable for most image-related tasks. In our approach, we trained a neural network
on natural and non-natural images of objects and background, using a Shannon
information-based algorithm and learning constraints. The goal was to extract the features that
give the most valuable information for the classification of visual objects and hand-written digits. This
gives a good start and a performance increase for other image learning tasks, implementing
a transfer learning approach. As a result, we found that we could indeed extract
features that are valid in all three kinds of tasks.
Interactive change detection based on dissimilarity image and decision tree
Ms Yan Wang, Alain Crouzil, Jean-Baptiste Puel
University Toulouse 3 Paul Sabatier, France
Our study focuses on detecting changed regions in two images of the same scene taken by
digital cameras at different times. Images taken by digital cameras generally provide less
information than multi-channel remote sensing images. Moreover, application-dependent
insignificant changes, such as shadows or clouds, may cause the failure of classical methods
based on image differences. The machine learning approach seems promising, but the lack
of a sufficient volume of training data for photographic landscape observatories rules out many
methods. So in this work we investigate the interactive learning approach and provide a
discriminative model based on a 16-dimensional feature space comprising textural appearance
and contextual information. Dissimilarity measures at different neighborhood sizes are used to
detect the differences within the neighborhood of an image pair. To detect changes between two
images, the user designates change and non-change samples (pixel sets) in the images using a
selection tool. These data are used to train a decision tree classifier, which is
then applied to all the other pixels of the image pair. The experiments have demonstrated the potential
of the proposed approach.
Real time rectangular document detection
Ms.Natalya Skoryukina, Dmitry P. Nikolaev, Alexander Sheshkus, Dmitry Polevoy
National University of Science and Technology "MISIS", Domodedovo, Russian Federation
In this paper we propose an algorithm for real-time rectangular document borders detection in
mobile device based applications. The proposed algorithm is based on combinatorial assembly
of possible quadrangle candidates from a set of line segments and projective document
reconstruction using the known focal length. The Fast Hough Transform is used for line detection,
and a 1D modification of an edge detector is proposed for the algorithm.
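The quadrangle-assembly step can be illustrated with homogeneous line intersections. This is a hedged sketch, not the authors' algorithm: the line ordering, the (a, b, c) line representation, and the degenerate-case handling are assumptions:

```python
import numpy as np

def quad_from_lines(lines):
    """Assemble a quadrangle candidate from 4 lines given as (a, b, c) with
    a*x + b*y + c = 0, ordered around the contour (e.g. top, right, bottom,
    left). Each corner is the intersection of two consecutive lines, computed
    as the cross product of their homogeneous coordinates."""
    corners = []
    for i in range(4):
        p = np.cross(lines[i], lines[(i + 1) % 4])   # homogeneous intersection
        if abs(p[2]) < 1e-9:
            return None                              # parallel pair: no corner
        corners.append(p[:2] / p[2])                 # back to Cartesian
    return np.array(corners)
```

A full detector would enumerate many 4-tuples of detected segments, build candidates this way, and score them, e.g. by projective consistency with the known focal length.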
Automatic Identification of Vessel Crossovers in Retinal Images
Ms. Maria Luisa Sánchez Brea, Noelia Barreira Rodríguez, Manuel F. González Penedo and
Brais Cancela Barizo
University of A Coruña, Spain
Crossovers and bifurcations are interest points of the retinal vascular tree that are useful for diagnosing
diseases. Specifically, detecting these interest points and identifying which of them are crossings
gives us the opportunity to search for arteriovenous nicking, that is, an alteration of the
vessel tree where an artery is crossed by a vein and the former compresses the latter. These
formations are a clear indicator of hypertension, among other medical problems. Several
studies have attempted to define an accurate and reliable method to detect and
classify these relevant points. In this article, we propose a new method to identify crossovers.
Our approach is based on segmenting the vascular tree and analyzing the surrounding area of
each interest point. The minimal path between vessel points in this area is computed in order to
identify the connected vessel segments and, as a result, to distinguish between bifurcations and
crossovers. Our method was tested using retinographies from the public databases DRIVE and
VICAVR, obtaining an accuracy of 90%.
Diamond recognition algorithm using two-channel X-ray radiographic separator
Dmitry P. Nikolaev, Andrey Gladkov, Timofey Chernov, Mr. Konstantin Bulatov
Institute for Systems Analysis RAS, National University of Science and Technology “MISIS”,
Russian Federation
In this paper, a real-time classification method for two-channel X-ray radiographic diamond
separation is discussed. The proposed method does not require direct hardware calibration but uses
sample images as a training dataset. It includes an online dynamic time warping algorithm for
inter-channel synchronization. Additionally, algorithms for online source signal control are
discussed, including X-ray intensity control, optical noise detection and sensor occlusion
detection.
Improving text recognition by distinguishing scene and overlay text
Mr. Bernhard Quehl, Haojin Yang, Harald Sack
Hasso Plattner Institute, Potsdam, Germany
Video texts are closely related to the content of a video and provide a valuable source for the
indexing and interpretation of video data. Text detection and recognition tasks in images or
videos typically distinguish between overlay and scene text. Overlay text is artificially
superimposed on the image at the time of editing, while scene text is text captured by the recording
system. Typically, OCR systems are specialized for one kind of text type; however, in video
images both types of text can be found. In this paper, we propose a method to automatically
distinguish between overlay and scene text in order to dynamically control and optimize the post-processing
steps following text detection. Based on a combination of features, a Support Vector Machine (SVM)
is trained to classify scene and overlay text. We show how this distinction between overlay and scene
text improves the word recognition rate. The accuracy of the proposed method has been evaluated
using publicly available test datasets.
LBP and SIFT based Facial Expression Recognition
Mr. Ömer Sümer, Ece Olcay Güneş
Istanbul Technical University, Germany
This study compares the performance of local binary patterns (LBP) and the scale invariant feature
transform (SIFT) with support vector machines (SVM) in the automatic classification of discrete
facial expressions. Facial expression recognition is a multiclass classification problem with seven
classes: happiness, anger, sadness, disgust, surprise, fear, and contempt. Using
SIFT feature vectors and a linear SVM, 93.1% mean accuracy is achieved on the CK+ database. On
the other hand, the performance of an LBP-based classifier with a linear SVM is reported on SFEW
using the strictly person independent (SPI) protocol; the seven-class mean accuracy on SFEW is
59.76%. Experiments on both databases show that LBP features can be used in a fairly
descriptive way if good localization of facial points and a good partitioning strategy are followed.
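A basic 3x3 LBP descriptor of the kind compared in this study can be sketched as follows. This is illustrative only; the uniform-pattern variant and the face partitioning into blocks used in practice are omitted:

```python
import numpy as np

def lbp_histogram(img):
    """Basic 3x3 LBP: each interior pixel is encoded by thresholding its 8
    neighbours against the centre value, and the resulting 8-bit codes are
    pooled into a normalized 256-bin histogram."""
    c = img[1:-1, 1:-1]                               # centre pixels
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]       # 8 neighbours, clockwise
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy,
                 1 + dx:img.shape[1] - 1 + dx]        # shifted neighbour view
        code |= (nb >= c).astype(np.uint8) << bit     # set bit if neighbour >= centre
    hist = np.bincount(code.ravel(), minlength=256)
    return hist / hist.sum()                          # normalized descriptor
```

In a full pipeline, the face is split into a grid of blocks, per-block histograms are concatenated, and the resulting vector is fed to the linear SVM.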
Apply Lightweight Recognition Algorithms in Optical Music Recognition
Mr. Viet-Khoi Pham, Hai-Dang Nguyen, Tung-Anh Nguyen Khac, Minh-Triet Tran
University of Science, VNU-HCM, Vietnam
The problems of digitizing and transforming musical scores into machine-readable
format need to be solved, since doing so helps people enjoy and learn music, conserves
music sheets, and can even assist music composers. However, the results of existing
methods still require improvement for higher accuracy. Therefore, the authors propose
lightweight algorithms for Optical Music Recognition to help people recognize and
automatically play musical scores. In our proposal, after removing staff lines and extracting
symbols, each music symbol is represented as a grid of identical cells, and the features
are extracted and classified with multiple lightweight SVM classifiers. Through experiments, the
authors find that a grid size of 10 × 12 cells yields the highest precision. Experimental
results on a dataset of 4929 music symbols taken from 18 modern music sheets in
the Synthetic Score Database show that the proposed method is able to classify printed musical
scores with an accuracy of up to 99.56%.
Road Shape Recognition Based On Scene Self-Similarity
Vassili Postnikov, Darya Krohina and Mr. Viktor Prun
MIPT, Russia
A method of determining the road shape and direction is proposed. The road can potentially
have a curved shape as well as be seen unclearly due to weather effects or relief features. The
proposed method uses video taken from a frontal camera rigidly mounted in a car as input
data. The method is based on the self-similarity of a typical road image, i.e., a smaller image inside
the road region is close to the downscaled initial image.
A comparative study of local descriptors for Arabic character recognition on mobile devices
Ms. Maroua Tounsi, Ikram Moalla, Adel M. Alimi, Frank Lebouregois
INSA of Lyon / University of Sfax, Tunisia
Nowadays, the number of mobile applications based on image registration and recognition is
increasing. Among the most interesting applications are mobile translators, which can read text
characters in the real world and translate them into the native language instantaneously. In this
context, we aim to recognize characters in natural scenes by computing significant points, so-called
keypoints or interest points, in the image. It is therefore important to compare and
evaluate feature descriptors in terms of matching accuracy and processing time in the particular
context of natural scene images.
In this paper, we compare the efficiency of binary features as
alternatives to the traditional SIFT and SURF in matching Arabic characters extracted from
natural scenes. We demonstrate that the binary descriptor ORB yields not only character matching
performance similar to the well-known SIFT, but also faster computation
suitable for mobile applications.
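The speed advantage of binary descriptors such as ORB comes from Hamming-distance matching, which can be sketched as follows. This is an illustrative NumPy stand-in for a brute-force Hamming matcher; production pipelines would typically use OpenCV's `BFMatcher` with `NORM_HAMMING`, which maps to hardware popcount instructions:

```python
import numpy as np

def hamming_match(desc_a, desc_b):
    """Brute-force matching of binary descriptors (e.g. ORB-style 256-bit
    strings packed into 32 uint8 bytes per descriptor).

    XOR leaves a set bit wherever two descriptors disagree; counting those
    bits gives the Hamming distance, with no floating-point arithmetic."""
    x = desc_a[:, None, :] ^ desc_b[None, :, :]      # per-pair XOR of bytes
    dist = np.unpackbits(x, axis=2).sum(axis=2)      # popcount -> Hamming distance
    return dist.argmin(axis=1)                       # best match in desc_b
```

For SIFT or SURF the same matching step requires Euclidean distances over 128 or 64 floats per descriptor, which is the computational gap the abstract reports.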
Thermal Face Recognition Using Moments Invariants
Dr Naser Zaer, Faris Baker and Rabie Dib
Arab Open University, Kuwait
Face recognition using different imaging modalities, particularly infrared imaging sensors, has
become an area of growing interest. The use of thermal IR images can improve the performance
of face recognition under uncontrolled illumination conditions. In this paper, we present a new
technique for face recognition based on statistical calculations on thermal images. We propose
the use of moment invariants, which have become one of the most important shape descriptors. The
proposed feature vector consists of 11 different moments: three are geometric
moments and the remaining eight are central geometric moments, which offer robustness against
variability due to changes in localized regions of the faces. The new method has been tested on a
new database comprising images with different expressions and lightings, taken with
different time lapses. The work is reinforced by a discussion of the body and face physiology
behind thermal face recognition.
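Raw and central geometric moments of the kind that make up such a feature vector can be sketched as follows. This is a generic illustration; the exact 11-moment vector of the paper is not reproduced, and the moment orders are an arbitrary choice:

```python
import numpy as np

def geometric_moments(img, p_max=2):
    """Raw and central geometric moments of a 2D intensity image.

    m_pq = sum_x sum_y x^p * y^q * I(x, y); central moments mu_pq use
    coordinates shifted to the intensity centroid, which makes them
    invariant to translation of the pattern."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]    # y: rows, x: columns
    m = {(p, q): (x**p * y**q * img).sum()
         for p in range(p_max + 1) for q in range(p_max + 1)}
    xc, yc = m[(1, 0)] / m[(0, 0)], m[(0, 1)] / m[(0, 0)]   # intensity centroid
    mu = {(p, q): ((x - xc)**p * (y - yc)**q * img).sum()
          for p in range(p_max + 1) for q in range(p_max + 1)}
    return m, mu
```

A face descriptor would stack selected `m` and `mu` values (possibly normalized) into a vector and compare faces by distance in that feature space.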
Automatic Enrollment for Gait-based Person Re-identification
Mr. Javier Ortells, Raúl Martín-Félez and Ramón A. Mollineda
Universitat Jaume I, Spain
Automatic enrollment involves a critical decision-making process in the people re-identification
context; however, this process has traditionally been undervalued. This paper studies the
problem of automatic person enrollment from a realistic perspective, relying on gait analysis.
Experiments simulating random flows of people, with considerable appearance variations
between different observations of a person, have been conducted, modeling both short- and
long-term scenarios. Promising results based on ROC analysis show that automatically enrolling
people by their gait is feasible with high success rates.
Venue: Calla Room
Time: 13:30-16:00
Multi-Line Grid based graphical password on Recolor, Reshape and Hidden icons
Dr. Ali Shabani, Mazyar Shabani, Noris Ismail, Nur Syarizan Mohd Akbar, Arash Habibi
Limkokwing University of Creative Technology, Malaysia
Today, user authentication stands out as one of the most essential areas of information security,
and it can be implemented in several ways. From time immemorial, authentication schemes
that apply strong text-based passwords have typically been expected to offer some assurance of
security. But committing such strong passwords to memory can prove quite a daunting task,
forcing users to resort to writing them down on pieces of paper or even storing them
in a computer file. As a means of thwarting such habits, graphical authentication has been
proposed as a replacement for text-based authentication. This has been spurred by the fact that
humans have a natural inclination to remember images more easily than text.
CluSiBotHealer: Botnet Detection Through Similarity Analysis of Clusters
Assistant Professor Pijush Barthakur, Manoj Dahal, and Mrinal Kanti Ghose
Sikkim Manipal University, India
Botnets are responsible for most of the security threats on the Internet. Botnet attacks often
leverage the coordinated structure among bots spread over a vast geographical area. In this
paper, we propose CluSiBotHealer, a novel framework for the detection of Peer-to-Peer (P2P)
botnets through data mining techniques. P2P botnets are a more resilient form of botnet,
(re)designed to overcome the single point of failure of centralized botnets. Our proposed system
is based on clustering the C&C flows of suspected bots within a monitored network. Leveraging
the similarity of packet structures and flow structures of frequently exchanged C&C flows
within a P2P botnet, the system first clusters flows and then applies the Jaccard similarity
coefficient to sample sets derived from the clusters for accurate detection of bots. The
framework is effective and novel, and can be used for proactive detection of P2P bots
within a monitored network. We empirically validated our model on traces collected from three
different P2P botnets, namely Nugache, Waledac and P2P Zeus.
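The Jaccard coefficient used in the final detection step is a simple set-similarity measure. A minimal sketch, where the set elements stand in for whatever flow fingerprints the clusters produce (the paper's exact sample-set construction is not reproduced here):

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sample sets.

    Two empty sets are treated as identical (similarity 1.0).
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Two hosts whose cluster-derived sample sets score close to 1.0 exchange near-identical C&C traffic and are therefore likely members of the same botnet.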
An Innovative UDP Port Scanning Technique
Mr. Sumit Kumar and Sithu D Sudarsan
ABB Global Industries and Services Limited, India
In this paper, we address the challenge of speeding up UDP port scans. We propose a novel
scanning technique that performs significantly faster UDP port scanning when the target machine
is directly connected to the port scanner, by utilizing multiple IP addresses. Our experiments show
that both Linux and Windows systems can be scanned faster. In particular, the speed-up on
tested Linux systems using our scanner is about 19,000% compared with the traditional port
scan method.
Mobile Botnets Development: Issues and Solutions
Paolo Farina, Enrico Cambiaso, Dr. Gianluca Papaleo, Maurizio Aiello
CNR-IEIIT, National Research Council, Genoa, Italy
Due to their limited capabilities, mobile devices have rarely been adopted as attack vectors. In
this paper, we consider the execution of coordinated and distributed attacks perpetrated by
mobile devices (a mobile botnet). We first describe current botnet architectures, analyzing their
strengths and weaknesses. Then, we identify problems deriving from the development of a
mobile botnet. Appropriate solutions to these problems are proposed, providing an
important resource for the design and development stages of a mobile botnet.
Secure Object Stores (SOS): Non-Volatile Memory Architecture for Secure Computing
Mr. Rodel Felipe Miguel, Sivaraman Sundaram, and K. M. M. Aung
Institute for Infocomm Research, Singapore
Next-generation non-volatile memory (NVM) technologies will change the design of major
operating system components and how applications are written, because they deviate from the
volatility and capacity assumptions made for primary memory in conventional computer systems
design. A program's persistent data and run-time objects may contain sensitive information;
without proper security mechanisms in place, they are exposed to critical attacks. In this paper, we
introduce Secure Object Stores (SOS), an object-based mechanism for accessing
NVM. We also illustrate different use cases that can take advantage of SOS.
Complying with Security Requirements in Cloud Storage Systems
Rodrigo Roman, Mr. Rodel Miguel, Jianying Zhou and Eugene Phua
Institute for Infocomm Research, Singapore
One of the necessary steps to ensure the security of cloud storage systems is to adequately
protect the infrastructure itself – the hardware and software that implements the storage
services. Starting from an analysis of the security requirements that affect these storage systems,
this paper studies the different strategies and approaches that are currently used to fulfill such
requirements. The goal of this paper is twofold. Firstly, we aim to analyze the security
components that should be used to provide a basic level of protection to storage systems,
examining the actual technologies that are used to construct them. Secondly, we aim to identify
gaps in the provisioning of security services, highlighting any areas in need of further research.
A Vision of a Future IoT Architecture Supporting Messaging, Storage, and Computation
Asst. Prof. Van Nguyen, and Audrey Gendreau
Saint Leo University, USA
In the future, things in the Internet of Things should be able to provide solutions for any task
that we ask them to perform. Imagine that we had hundreds of billions of devices
connected to the Internet, each possibly with hundreds of gigabytes of storage
and memory, processing power on the order of giga-instructions per second, and bandwidth of
gigabits per second. When these devices are connected so as to work together holistically, we
would have almost unbounded capacity. This paper introduces an architecture that utilizes all the
available resources of the things on the Internet in the hope of providing unprecedented
computational power to benefit society. There are three important features that a thing in this
architecture could support: messaging, storage, and/or computation. With this architecture, a
thing would be capable of discovering, working, and sharing with other things on the Internet.
The architecture is based on the service-oriented concept to provide ease of use.
A New Hardware Implementation of Base 2 Logarithm for FPGA
Ahmed M. Mansour, Mr Ali M. El-Sawy, Moataz S. Aziz, and Ahmed T. Sayed
Wasiela, Egypt
Logarithms reduce products to sums and powers to products; they play an important role in
signal processing, communication and information theory. They are primarily used in hardware
calculations, handling multiplications, divisions, powers, and roots efficiently. There are three
commonly used bases for logarithms: base 10 (the common logarithm), base e (the natural
logarithm), and base 2 (the binary logarithm). This paper demonstrates different methods for
calculating log2, showing the complexity of each, identifying the most accurate and efficient,
and giving insights into their hardware design. We present a new method called Floor Shift for
fast calculation of log2, and then combine this algorithm with a Taylor series to improve the
accuracy of the output, illustrating this with two examples. We finally compare the algorithms
and conclude with our remarks.
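The two-stage idea can be illustrated in software: shifting extracts the integer (floor) part of log2, and a series on the normalized mantissa supplies the fractional part. The paper's actual Floor Shift hardware is not reproduced here; this is a hedged Python sketch of the same decomposition.

```python
import math

def log2_approx(x, terms=40):
    """Write x = m * 2^k with m in [1, 2): k is the integer (floor) part of
    log2(x), obtained by shifting; the fractional part comes from the
    Mercator series for ln(1 + u) with u = m - 1, rescaled to base 2."""
    if x <= 0:
        raise ValueError("log2 undefined for non-positive input")
    k = 0
    while x >= 2.0:   # in hardware this is a right shift per step
        x /= 2.0
        k += 1
    while x < 1.0:    # left shift for inputs below 1
        x *= 2.0
        k -= 1
    u = x - 1.0       # u in [0, 1)
    ln = sum((-1.0) ** (n + 1) * u ** n / n for n in range(1, terms + 1))
    return k + ln / math.log(2.0)
```

For exact powers of two the series term is zero and the shift count alone gives the answer; for other inputs the series refines the fractional bits, mirroring the Floor Shift + Taylor combination described above.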
Reconfigurable Planar Inverted-F Antenna for Cognitive Radio and Mobile Systems
Dr S. Manoj and Ashwin. G. Kothari
Dept. of Electronics Engineering, VNIT, Nagpur, India
This paper proposes a method of controlling the operation of a frequency-reconfigurable
planar inverted-F antenna (FRPIFA) by FPGA. In order to tune the antenna's operating
frequency, the PIN diodes (electronic switches) are operated through switches on the FPGA
board rather than by a predefined code, which provides high flexibility for testing and
performance evaluation. Very low return loss, and hence low VSWR values, are obtained. We were
able to tune to frequencies as low as 766 MHz and as high as 8.4 GHz. The antenna also
covers the ISM band and can be used in a cognitive radio module as a reconfigurable antenna.
Introducing Orthus, a Dual-headed Authentication Protocol
Mr Dean Rogers
LJMU, Germany
This document describes the messaging architecture and internal message components of an
authentication protocol called 'Orthus'. For insecure closed LAN networks,
Kerberos is the most popular authentication protocol, currently in official release Version
V [1]. Kerberos' objectives include protecting the privacy of the message transfers necessary to
achieve authentication, together with safeguards against replay and man-in-the-middle (MitM)
attacks. Orthus is intended to operate in precisely this environment; here, however, the
Authentication Server, instead of delivering a ticket to the client for use with the Ticket
Granting Server (TGS), delivers that ticket directly to the TGS, and the TGS then delivers
service-granting tickets directly to the client. This offers a simpler message flow and therefore
fewer opportunities for message corruption or interception.
Message quantization scheme for nonbinary LDPC decoder FPGA implementation
Dr Wojciech Sulek
Silesian University of Technology, Poland
The non-binary Low Density Parity Check (LDPC) codes over Galois fields GF(q = 2^p) have
evolved from the binary LDPC codes that are today an industry standard for channel coding.
The performance of short-block-length codes is significantly higher for non-binary LDPC, at the
cost of increased decoding complexity. Efficient decoder hardware implementation is still a
challenging task. Most recently proposed hardware realizations are ASIC-oriented, as they
employ multiplierless computation units. This article concerns a different decoder design
approach, specifically intended for FPGA implementation. A reformulated mixed-domain
FFT-BP decoding algorithm is applied that does not exclude multiplication units. This allows
mapping part of the algorithm to the multiplier cores embedded in an FPGA. In this article we
concentrate on the important issue of properly selecting the numeric precision employed. We
present a partially parallel extension of the mixed-domain decoder that operates on structured
codes. We then carefully analyze the finite-precision effects on decoding performance. Through
simulation and synthesis results we show that it is advantageous to segment the decoder
dataflow into three parts with different precisions. The provided results also facilitate precision
selection for maximum performance or for a performance-complexity tradeoff.
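The precision-selection question above comes down to how decoder messages behave under fixed-point rounding and saturation. As a generic illustration (the paper's concrete bit widths are not reproduced here), a signed fixed-point quantizer can be modeled as:

```python
def quantize(x, frac_bits, total_bits):
    """Round x to a signed fixed-point value with `frac_bits` fractional bits
    and `total_bits` bits overall (two's-complement range), saturating at the
    range limits, then rescale back to a real number."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))          # most negative representable code
    hi = (1 << (total_bits - 1)) - 1       # most positive representable code
    q = max(lo, min(hi, round(x * scale)))
    return q / scale
```

Simulating the decoder with different `(frac_bits, total_bits)` pairs per dataflow segment is one way to reproduce the kind of finite-precision analysis the article performs.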
BER Performance Analysis of a New Hybrid Relay Selection Protocol
Dr Khalid Mohamed Alajel
Al-Mergib University, Libya
Recently, cooperative communication has attracted significant attention as a way to tackle the
limitations of multiple-input multiple-output (MIMO) technology. To eliminate these
limitations and increase spectral efficiency, the best-relay selection technique was proposed. In this
paper, the performance of a new hybrid relay selection protocol is investigated. In particular,
closed-form expressions for the bit error probability (BEP) are developed. BEP results
are presented to demonstrate the proposed system.
Venue: Calla Room
Time: 16:20-19:30
Discrimination, Classification and Filtering of Ground Echoes in Meteorological Radar Images
Based on Textural Analysis of a Training Dataset of Ground Echoes
Mr Abdenasser Djafri and Boualem Haddad
University of Sciences and Technology – Houari Boumediene, Algiers, Algeria
This paper deals with the processing of backscattered echoes from meteorological radars. These
echoes contain information about the nature of precipitation, such as its localization and
intensity. However, backscattered echoes do not always carry information about clouds.
Because of electromagnetic wave propagation laws and the secondary lobes of the receiving
antennas, meteorological radars often display echoes backscattered from the Earth's surface,
such as mountains and buildings. These echoes are called ground echoes, and they are known
to meteorologists as clutter. The aim of our work is to find a solution to this problem through
targeted processing of the radar images. The solution we propose is to use a training dataset of
ground echo images taken in clear weather, perform calculations on them, and compare the
results with precipitation echoes. Through these experiments, we observed that criteria such as
energy, location and the MaxMinFactor can be retained to distinguish efficiently between the
two types of echoes.
Pedestrian Detection System Based on HOG and a Modified Version of CSS
Daniel Luis Cosmo, Evandro Ottoni Teatini Salles, Asst. Prof. Patrick Marques Ciarelli
Federal University of Espírito Santo, Brazil
This paper describes a complete pedestrian detection system based on sliding windows. Two
feature vector extraction techniques are used: HOG (Histogram of Oriented Gradient) and CSS
(Color Self Similarities), and to classify windows we use linear SVM (Support Vector
Machines). Besides these techniques, we use mean shift and hierarchical clustering to fuse
multiple overlapping detections. The results we obtain on the INRIA Person dataset show that
the proposed system, using only HOG descriptors, achieves better results than similar systems,
with a log-average miss rate of 43%, against 46%, due to the cropping of final detections to
better adapt them to the modified annotations. The addition of the modified CSS increases the
efficiency of the system, leading to a log-average miss rate of 39%.
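Sliding-window detectors fire several times around each pedestrian, so overlapping detections must be fused into one. The paper uses mean shift and hierarchical clustering for this; as a simpler stand-in that shows the fusion step, the sketch below greedily groups boxes by intersection-over-union (the 0.5 threshold is an illustrative choice, not the paper's):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def fuse_detections(boxes, thr=0.5):
    """Greedily group detections whose IoU with a group's first box exceeds
    `thr`, then average each group's coordinates into a single detection."""
    groups = []
    for box in boxes:
        for g in groups:
            if iou(box, g[0]) > thr:
                g.append(box)
                break
        else:
            groups.append([box])
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
```

Mean shift performs essentially the same merging but in a continuous detection-density space, which avoids the order-dependence of this greedy variant.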
Sub-word Image Clustering in Farsi Printed Books
Mr. Mohammad Reza Soheili, Ehsanollah Kabir and Didier Stricker
German Research Center for Artificial Intelligence (DFKI), Germany
Most OCR systems are designed for the recognition of a single page. In the case of unfamiliar
font faces, low-quality paper and degraded prints, the performance of these products drops
sharply. However, an OCR system can use the redundancy of word occurrences in large
documents to improve recognition results. In this paper, we propose a sub-word image
clustering method for applications dealing with large printed documents. We assume that the
whole document is printed in a single unknown font with low print quality. Our proposed
method finds clusters of equivalent sub-word images with an incremental algorithm. Due to the
low print quality, we propose an image matching algorithm for measuring the distance between
two sub-word images, based on the Hamming distance and the ratio of the area to the perimeter
of the connected components. We built a ground-truth dataset of more than 111,000 sub-word
images to evaluate our method. All of these images were extracted from an old Farsi book. We
cluster all of these sub-words, including isolated letters and even punctuation marks. Then all
centers of the created clusters are labeled manually. We show that all sub-words of the book
can be recognized with more than 99.7% accuracy by assigning the label of each cluster center
to all of its members.
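The core loop can be sketched compactly: a Hamming-style pixel distance between binary sub-word images, and a single-pass incremental assignment to the nearest existing cluster center. This omits the paper's second cue (the area-to-perimeter ratio of connected components), and the 0.1 threshold is a hypothetical value:

```python
import numpy as np

def hamming_distance(a, b):
    """Fraction of differing pixels between two equal-size binary images."""
    return float(np.mean(a != b))

def incremental_cluster(images, threshold=0.1):
    """Single-pass incremental clustering: assign each image to the nearest
    existing cluster center if close enough, otherwise open a new cluster."""
    centers, labels = [], []
    for img in images:
        dists = [hamming_distance(img, c) for c in centers]
        if dists and min(dists) <= threshold:
            labels.append(int(np.argmin(dists)))
        else:
            centers.append(img)
            labels.append(len(centers) - 1)
    return centers, labels
```

Labeling only the resulting centers, as the paper does, then propagates each label to every member of its cluster.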
Fusing the RGB Channels of Images for Maximizing the Between Class Distances
Ali GÜNEŞ, Mr. Efkan DURMUŞ, Assistant Prof. Habil KALKAN and Ahmet Seçkin
Suleyman Demirel University, TURKEY
In many machine vision applications, objects or scenes are imaged in color (red, green and
blue) but then transformed into grayscale images before processing. One can use equal weights
for the contributions of the color components to the grayscale image, or the unequal
weights provided by the luminance mapping of the National Television Standards Committee
(NTSC) standard. The NTSC weights, which primarily enhance the visual properties of images,
may not perform well for classification purposes. In this study, we propose an adaptive
color-to-grayscale conversion approach that increases the accuracy of image
classification. The method optimizes the contributions of the color components to
increase the between-class distances of images in opposing classes. The experimental
results show that the proposed method increases the distances between images in different
classes by 1% to 87% depending on the dataset, which yields increases in classification
accuracy of between 1% and 4% on benchmark classifiers.
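The conversion being optimized is just a weighted sum of the three channels. A minimal sketch, with the standard NTSC luminance weights as the fixed baseline; the adaptive weights the paper learns would simply be passed in place of them:

```python
import numpy as np

NTSC_WEIGHTS = (0.299, 0.587, 0.114)  # NTSC luminance mapping for R, G, B

def to_gray(rgb, weights=NTSC_WEIGHTS):
    """Weighted RGB-to-grayscale conversion over the last axis. `weights`
    may be the fixed NTSC triple, equal weights, or adaptively optimized
    values; they are normalized so output stays in the input range."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return rgb @ w
```

With equal weights this reduces to the channel mean; the paper's contribution is choosing the triple that maximizes between-class distance rather than visual quality.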
Classification of similar but differently paced activities in the KTH dataset
Ms. Shreeya Sengupta, Hui Wang, Piyush Ojha and William Blackburn
University of Ulster, Northern Ireland, United Kingdom
The KTH video dataset [1] contains three activities -- walking, jogging and running -- which
are very similar but are carried out at different natural paces. We show that explicit inclusion
of a feature which may be interpreted as a measure of the overall state of motion in a frame
improves a classifier's ability to discriminate between these activities.
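The abstract does not define its motion measure, but one common scalar proxy for the overall state of motion in a frame is the mean absolute inter-frame difference; the sketch below is an illustrative assumption, not the paper's feature:

```python
import numpy as np

def motion_feature(prev_frame, curr_frame):
    """Mean absolute inter-frame pixel difference: a crude scalar proxy for
    how much motion a frame contains. Faster activities (running) tend to
    produce larger values than slower ones (walking)."""
    diff = curr_frame.astype(float) - prev_frame.astype(float)
    return float(np.mean(np.abs(diff)))
```

Appending such a scalar to each frame's feature vector is the kind of explicit pace cue the paper argues helps separate walking, jogging and running.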
An approach for combining multiple descriptors for image classification
Mr. Duc Toan TRAN, Bart JANSEN, Rudi DEKLERCK, Olivier DEBEIR
VUB, Belgium
Recently, efficient image descriptors have shown promise for image classification tasks.
Moreover, methods based on the combination of multiple image features provide better
- 45 -
performance compared to methods based on a single feature. This work presents a simple and
efficient approach for combining multiple image descriptors. We first employ a Naive-Bayes
Nearest-Neighbor scheme to evaluate four widely used descriptors. For all features,
“Image-to-Class” distances are directly computed without descriptor quantization. Since
distances measured by different metrics can be of different nature and they may not be on the
same numerical scale, a normalization step is essential to transform these distances into a
common domain prior to combining them. Our experiments conducted on a challenging
database indicate that z-score normalization followed by a simple sum of distances fusion
technique can significantly improve performance compared to using individual features alone.
We also observed that our experimental results on the Caltech 101 dataset outperform
previously reported results.
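The fusion rule described above is short enough to state directly: z-score normalize each descriptor's Image-to-Class distances, then sum. A minimal sketch (the choice of descriptors and distance metrics is left abstract):

```python
import numpy as np

def zscore(d):
    """Map a vector of distances to zero mean and unit variance, so distances
    from different metrics become comparable before fusion."""
    d = np.asarray(d, dtype=float)
    return (d - d.mean()) / d.std()

def fuse_distances(per_descriptor_distances):
    """Normalize each descriptor's Image-to-Class distances, then sum them;
    the predicted class is the argmin of the fused vector."""
    return sum(zscore(d) for d in per_descriptor_distances)
```

Without the normalization step, a descriptor whose distances happen to live on a larger numerical scale would dominate the sum, which is exactly the failure mode the paragraph above describes.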
Vision-Based Industrial Automatic Vehicle Classifier
Mr. Timur Khanipov, Ivan Koptelov, Anton Grigoryev, Elena Kuznetsova, Dmitriy
IITP RAS, Russia
The paper describes an automatic video-stream-based motor vehicle classification system.
The system determines vehicle type at payment collection plazas on toll roads. Classification is
performed in accordance with a preconfigured set of rules which determine the type from the
number of wheel axles, vehicle length, height over the first axle and full height. These
characteristics are calculated using various computer vision algorithms: contour detectors,
correlation analysis, the fast Hough transform, Viola-Jones detectors, connected-component
analysis, elliptic shape detectors and others. The input data contains video streams and
induction loop signals. The output signals are vehicle entry and exit events, vehicle type,
motion direction, speed and the above-mentioned features.
An Iterative Undersampling of Extremely Imbalanced Data Using CSVM
Mr. Jong Bum Lee, Jee Hyong Lee
Semiconductor Division, Samsung Electronics; College of Information & Communication
Engineering, Sungkyunkwan University, Suwon, Korea
Semiconductors are major components of electronic devices and require very high reliability
and productivity. If defective chips can be predicted in advance, product quality will improve
and productivity will increase through reduced test costs. However, the performance of
classifiers on defective chips is very poor because semiconductor data is extremely
imbalanced, at roughly 1:1000. In this paper, an iterative undersampling method using CSVM
is employed to deal with the class imbalance. The main idea is to select the informative
majority-class samples around the decision boundary determined by the classifier. Our
experimental results demonstrate that our method outperforms other sampling methods with
regard to accuracy on defective chips in highly imbalanced data.
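One step of the selection idea can be sketched as follows: given decision scores from the current classifier, keep only the majority-class samples nearest the boundary (smallest absolute score). In the paper this alternates with retraining a CSVM; the standalone function below is an illustrative fragment of that loop, not the full method:

```python
import numpy as np

def undersample_near_boundary(decision_scores, majority_idx, keep):
    """Return the `keep` majority-class sample indices whose |decision score|
    is smallest, i.e. those closest to the classifier's decision boundary.
    These are the informative majority samples; far-away ones are discarded."""
    majority_idx = np.asarray(majority_idx)
    scores = np.asarray(decision_scores, dtype=float)
    order = np.argsort(np.abs(scores[majority_idx]))
    return majority_idx[order[:keep]]
```

Iterating this shrinks the majority class toward the boundary region, so the rare defective-chip class carries proportionally more weight in each retraining round.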
Comparison of Two Algorithm Modifications for Projective-Invariant Recognition of Plane
Boundaries with a Single Concavity
Natalia Pritula, Dmitrii P. Nikolaev, Mr. Alexander Sheshkus, Mikhail Pritula, Petr P.
ISA RAS, Russian Federation
In this paper we present two modifications of an algorithm for projective-invariant recognition
of plane boundaries with a single concavity. The input images are created with an orthographic
pinhole camera with a fixed focal length, so the variety of possible projective transformations
is limited. The first modification treats the task more generally; the other uses prior information
about the camera model. We test the hypothesis that the second modification has better
accuracy. Results of around 20,000 numerical experiments that confirm the hypothesis are included.
Automatic Emotional Expression Analysis from Eye Area
Ms. Betül AKKOÇ and Ahmet ARSLAN
Selçuk University, Turkey
Eyes play an important role in expressing emotions in nonverbal communication. In the
present study, emotional expression classification was performed based on features that
were automatically extracted from the eye area. First, the face area and the eye area were
automatically extracted from the captured image. Afterwards, the parameters to be used for the
analysis were obtained from the eye area through discrete wavelet transformation. Using these
parameters, emotional expression analysis was performed with artificial intelligence
techniques. As a result of the experimental studies, six universal emotions consisting of
expressions of happiness, sadness, surprise, disgust, anger and fear were classified at a success
rate of 84% using artificial neural networks.
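The abstract does not specify which wavelet was used; as an illustration of the feature-extraction step, a single level of the 2-D Haar transform (the simplest discrete wavelet, with one common normalization convention) splits an eye-area patch into approximation and detail bands:

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2-D Haar wavelet transform on an even-sized image:
    returns the approximation band (LL) and the horizontal, vertical and
    diagonal detail bands (LH, HL, HH), each at half resolution."""
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # local averages (approximation)
    lh = (a - b + c - d) / 4.0   # horizontal detail (column differences)
    hl = (a + b - c - d) / 4.0   # vertical detail (row differences)
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh
```

Statistics of the detail bands (energies, means, variances) are the kind of parameters typically fed to a neural network in this sort of pipeline.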
Memory-efficient Large-scale Linear Support Vector Machine
Mr. Abdullah Alrajeh, Akiko Takeda and Mahesan Niranjan
University of Southampton, United Kingdom
Stochastic gradient descent has been advanced as a computationally efficient method for
large-scale problems. In classification problems, many proposed linear support vector machines
are very effective. However, they assume that the data is already in memory, which might not
always be the case. Recent work suggests a classical method that divides such a problem into
smaller blocks and then solves the sub-problems iteratively. We show that a simple modification
of shrinking the dataset early produces significant savings in computation and memory. We
further find that on problems larger than previously considered, our approach is able to reach
solutions on top-end desktop machines while competing methods cannot.
Search-Free License Plate Localization Based on Saliency and Local Variance Estimation
Mr. AMIN SAFAEI, H.L. Tang and S. Sanei
Faculty of Engineering and Physical Sciences, University of Surrey, UK
In recent years, the performance and accuracy of automatic license plate recognition
(ALPR) systems have greatly improved; however, the increasing number of applications for
such systems has made ALPR research more challenging than ever. The inherent
computational complexity of search-dependent algorithms remains a major problem for current
ALPR systems. This paper proposes a novel search-free localization method based on the
estimation of saliency and local variance. Gabor functions are then used to validate the choice
of candidate license plates. The algorithm was applied to three image datasets with different
levels of complexity, and the results were compared with a number of benchmark methods,
particularly in terms of speed. The proposed method outperforms state-of-the-art methods
and can be used for real-time applications.
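Local variance is a natural cue here because the dense character strokes of a plate produce higher intensity variation than the surrounding bodywork. A minimal sketch of a local-variance map (a direct sliding-window version; the paper's estimator is not reproduced, and real-time use would require a box-filter formulation):

```python
import numpy as np

def local_variance(img, k=3):
    """Variance of intensities inside each k x k window (valid positions
    only). High-variance regions are candidate plate locations."""
    h, w = img.shape
    out = np.empty((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = img[i:i + k, j:j + k].var()
    return out
```

Thresholding such a map, combined with a saliency map, yields plate candidates without any exhaustive window search.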
A Robust SIFT-Based Descriptor for Video Classification
MS. Raziyeh Salarifard, Mahshid Alsadat Hosseini, Mahmood Karimian and Shohreh Kasaei
Sharif University of Technology,Iran
The voluminous amount of video in today's world has made the objective (or
semi-objective) classification of videos very popular. Among the various descriptors used
for video classification, SIFT and LIFT can lead to highly accurate classifiers. However, the
SIFT descriptor does not consider video motion, and LIFT is time-consuming. In this paper, a
robust descriptor for semi-supervised classification based on video content is proposed. It
retains the benefits of the LIFT and SIFT descriptors and overcomes their shortcomings to some
extent. To extract this descriptor, the SIFT descriptor is first applied, and the motion of the
extracted keypoints is then employed to improve the accuracy of the subsequent classification
stage. As the SIFT descriptor is scale invariant, the proposed method is also robust to zooming.
Also, using the global motion of keypoints in videos helps to disregard the local motions caused
by the cameraman during video capture. In comparison to other works that consider the motion
and mobility of videos, the proposed descriptor requires fewer computations. Results obtained
on the TRECVID 2006 dataset show that the proposed method achieves more accurate results
than SIFT in content-based video classification, by about 15 percent.
Unified Framework of Single-frame Face Super-resolution across Multiple Modalities
Xiang Ma, Junhui Liu, Mr. Wenmin Li
School of Information Engineering, Chang’an University, Xi’an, China
Face hallucination in a single-modality environment has been heavily studied; in real-world
environments with multiple modalities, it is still in its early stages. This paper presents a unified
framework to solve the face hallucination problem across multiple modalities, i.e., different
expressions, poses and illuminations. Almost all state-of-the-art face super-resolution
methods generate only a single output with the same modality as the low-resolution input. Our
proposed framework is able to generate multiple outputs of different new modalities from only
a single low-resolution input. It includes a global transformation with diagonal loading for
modeling the mappings among different new facial modalities, and a local position-patch based
method with weight compensation for incorporating image details. Experimental results
illustrate the superiority of our framework.
Error Analysis of Rigid Body Posture Measurement System Based on Circular Feature Points
Ju Huo, Dr. Jiashan Cui, Ning Yang
Department of Electrical Engineering, Harbin Institute of Technology, Harbin 150001, China
For the problem of determining pose parameters with monocular vision, using target feature
points on a planar quadrilateral, an improved two-stage iterative algorithm is proposed to
optimize the rigid-body posture measurement model. A monocular-vision rigid-body posture
measurement system is designed; a unified method is used experimentally to bring the measured
coordinates of each feature point into a common coordinate system; and the sources of error
in the rigid-body posture measurement system are analyzed theoretically and through simulation
experiments. Combined with analysis of the actual experimental system under simulated error
conditions, the pose measurement accuracy is assessed and the comprehensive error of the
measurement system is given, providing theoretical guidance for improving measurement
precision.
A Novel Palmprint Representation for Palmprint Recognition
Hengjian Li , Jiwen Dong , Jinping Li, Lei Wang
Shandong Provincial Key Laboratory of Network based Intelligent Computing, University of
Jinan; Shandong Provincial Key Laboratory of computer Network, Shandong Computer
Science Center, China
In this paper, we propose a novel palmprint recognition algorithm. First, the palmprint images
are represented by anisotropic filters. The filters are built from Gaussian functions along one
direction and second derivatives of Gaussian functions in the orthogonal direction. This
choice is motivated by the optimal joint spatial and frequency localization of the Gaussian
kernel; such filters can therefore better approximate the edges and lines of palmprint images. A
palmprint image is processed with a bank of anisotropic filters at different scales and rotations
for robust palmprint feature extraction. Once these features are extracted, subspace analysis is
applied to the feature vectors for dimension reduction as well as class separability.
Experimental results on a public palmprint database show that accuracy is improved by the
proposed novel representation compared with Gabor filters.
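A kernel of the kind described (Gaussian along one axis, second derivative of a Gaussian along the orthogonal axis) can be constructed directly; the sketch below is an illustrative construction with hypothetical parameter names, not the paper's exact filter bank:

```python
import numpy as np

def anisotropic_kernel(size, sigma_x, sigma_y, theta=0.0):
    """Build a size x size kernel with a Gaussian profile along one (rotated)
    axis and a second derivative of Gaussian along the orthogonal axis,
    responding to line- and edge-like structures at orientation `theta`."""
    r = (size - 1) / 2.0
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # smoothing axis
    yr = -x * np.sin(theta) + y * np.cos(theta)     # derivative axis
    smooth = np.exp(-xr ** 2 / (2.0 * sigma_x ** 2))
    second_deriv = (yr ** 2 / sigma_y ** 4 - 1.0 / sigma_y ** 2) \
        * np.exp(-yr ** 2 / (2.0 * sigma_y ** 2))
    return smooth * second_deriv
```

Convolving a palmprint image with such kernels at several scales and rotations gives the multi-orientation responses from which the feature vectors are built; like any second-derivative filter, the kernel has (near-)zero mean, so flat regions produce no response.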
Autonomous Landing of a Helicopter UAV with a Ground-Based Multisensor Fusion System
Dr. Zhou Dianle, Zhong Zhiwei, Zhang Daibing, Shen Lincheng, Yan Chengping
College of Mechatronics and Automation National University of Defense Technology,
Changsha, China
This paper focuses on vision-based autonomous landing problems for helicopter unmanned
aerial vehicles (UAVs). We propose a multisensor fusion approach to the autonomous landing
of a UAV. The system includes an infrared camera, an ultra-wideband radar that measures the
distance between the UAV and the ground-based system, and a pan-tilt unit (PTU). In order to
identify UAV targets in all weather conditions, we use infrared cameras. To reduce the
complexity of computing the target's three-dimensional coordinates with stereo vision or a
single camera, the ultra-wideband radar distance module provides depth information, while
real-time image-PTU tracking follows the UAV and computes its three-dimensional coordinates.
Compared to DGPS, the test results show that the approach is effective and robust.
Mr. Ahmed olusegun isaac
Onyis Nigeria Limited, Nigeria
Mr. Amin Zehtabian
Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran
Mr. Denis Uchaev
Moscow State University of Geodesy and Cartography,Russia
Assoc. Prof. Francesco Viti
University of Luxembourg, Luxembourg
National Institute of Advanced Industrial Science and Technology (AIST),JAPAN
Assoc. Prof. Dmitry P. Nikolaev
Institute for Information Transmission Problems (Kharkevich Institute) RAS, Russian Federation
Vassili Postnikov
Dr. Takumi Kobayashi
National Institute of Advanced Industrial Science and Technology, Japan