May 6, 2019

3347 words 16 mins read

Paper Group ANR 386

Hierarchical Manifold Clustering on Diffusion Maps for Connectomics (MIT 18.S096 final project). Learning to Track at 100 FPS with Deep Regression Networks. Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages. Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networ …

Hierarchical Manifold Clustering on Diffusion Maps for Connectomics (MIT 18.S096 final project)

Title Hierarchical Manifold Clustering on Diffusion Maps for Connectomics (MIT 18.S096 final project)
Authors Gergely Odor
Abstract In this paper, we introduce a novel algorithm for segmentation of imperfect boundary probability maps (BPM) in connectomics. Our algorithm can be considered an extension of spectral clustering. Instead of clustering the diffusion maps with traditional clustering algorithms, we learn the manifold and compute an estimate of the minimum normalized cut. We proceed by divide and conquer. We also introduce a novel criterion for determining whether further splits are necessary in a component, based on its topological properties. Our algorithm complements the currently popular agglomeration approaches in connectomics, which overlook the geometrical aspects of this segmentation problem.
Tasks
Published 2016-07-20
URL http://arxiv.org/abs/1607.06318v1
PDF http://arxiv.org/pdf/1607.06318v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-manifold-clustering-on-diffusion
Repo
Framework
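
The algorithm above builds on diffusion maps and a minimum normalized cut estimate, applied recursively. As a rough, hedged illustration of that general idea (not the authors' implementation), the Python sketch below computes a diffusion-map embedding from a symmetric affinity matrix and bisects the points by the sign of the leading diffusion coordinate; the affinity construction, diffusion time `t`, and the topological stopping criterion are simplified or omitted assumptions.

```python
# Illustrative sketch only: diffusion-map embedding followed by a crude
# two-way split. Affinity construction, diffusion time t, and the paper's
# topological stopping criterion are simplified assumptions.
import numpy as np
from scipy.linalg import eigh

def diffusion_map(affinity, n_components=2, t=1):
    """Diffusion-map embedding from a symmetric affinity matrix."""
    d = affinity.sum(axis=1)
    # Symmetrized transition operator D^{-1/2} W D^{-1/2}
    S = affinity / np.sqrt(np.outer(d, d))
    vals, vecs = eigh(S)
    order = np.argsort(vals)[::-1][1:n_components + 1]  # skip trivial eigenvector
    psi = vecs[:, order] / np.sqrt(d)[:, None]           # eigenvectors of D^{-1} W
    return psi * (vals[order] ** t)                      # diffusion coordinates

def bisect(embedding):
    """Split points by the sign of the leading diffusion coordinate
    (a crude stand-in for a minimum normalized cut estimate)."""
    return embedding[:, 0] >= 0
```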

Learning to Track at 100 FPS with Deep Regression Networks

Title Learning to Track at 100 FPS with Deep Regression Networks
Authors David Held, Sebastian Thrun, Silvio Savarese
Abstract Machine learning techniques are often used in computer vision due to their ability to leverage large amounts of training data to improve performance. Unfortunately, most generic object trackers are still trained from scratch online and do not benefit from the large number of videos that are readily available for offline training. We propose a method for offline training of neural networks that can track novel objects at test-time at 100 fps. Our tracker is significantly faster than previous methods that use neural networks for tracking, which are typically very slow to run and not practical for real-time applications. Our tracker uses a simple feed-forward network with no online training required. The tracker learns a generic relationship between object motion and appearance and can be used to track novel objects that do not appear in the training set. We test our network on a standard tracking benchmark to demonstrate our tracker’s state-of-the-art performance. Further, our performance improves as we add more videos to our offline training set. To the best of our knowledge, our tracker is the first neural-network tracker that learns to track generic objects at 100 fps.
Tasks
Published 2016-04-06
URL http://arxiv.org/abs/1604.01802v2
PDF http://arxiv.org/pdf/1604.01802v2.pdf
PWC https://paperswithcode.com/paper/learning-to-track-at-100-fps-with-deep
Repo
Framework
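
To make the tracker's architecture concrete, here is a minimal PyTorch sketch of a GOTURN-style regression tracker: a crop of the target from the previous frame and a search-region crop from the current frame pass through shared convolutional features, and a small fully connected head regresses the four bounding-box coordinates. The layer sizes and the 227×227 input are assumptions, not the paper's exact CaffeNet-based architecture.

```python
# Sketch of a feed-forward regression tracker (assumed layer sizes, not the
# authors' exact architecture). No online training is performed at test time.
import torch
import torch.nn as nn

class RegressionTracker(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared convolutional feature extractor applied to both crops.
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2), nn.ReLU(), nn.MaxPool2d(3, 2),
            nn.Conv2d(64, 128, 5, stride=2), nn.ReLU(), nn.MaxPool2d(3, 2),
            nn.Conv2d(128, 256, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(6),
        )
        # Fully connected head regresses the box in search-region coordinates.
        self.head = nn.Sequential(
            nn.Linear(2 * 256 * 6 * 6, 1024), nn.ReLU(),
            nn.Linear(1024, 4),                      # (x1, y1, x2, y2)
        )

    def forward(self, prev_crop, curr_search):
        f1 = self.features(prev_crop).flatten(1)
        f2 = self.features(curr_search).flatten(1)
        return self.head(torch.cat([f1, f2], dim=1))

# boxes = RegressionTracker()(torch.rand(1, 3, 227, 227), torch.rand(1, 3, 227, 227))
```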

Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages

Title Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages
Authors Krupakar Hans, R S Milton
Abstract The advent of the attention mechanism in neural machine translation models has improved the performance of machine translation systems by enabling selective lookup into the source sentence. In this paper, the efficiencies of translation using bidirectional encoder attention decoder models were studied with respect to translation involving morphologically rich languages. The English - Tamil language pair was selected for this analysis. First, the use of Word2Vec embedding for both the English and Tamil words improved the translation results by 0.73 BLEU points over the baseline RNNSearch model, which scored 4.84 BLEU. The use of morphological segmentation before word vectorization, splitting the morphologically rich Tamil words into their respective morphemes prior to translation, reduced the target vocabulary size by a factor of 8. This model (RNNMorph) also improved the performance of neural machine translation by 7.05 BLEU points over the RNNSearch model trained on the same corpus. Since the BLEU evaluation of the RNNMorph model might be unreliable due to an increase in the number of matching tokens per sentence, the performances of the translations were also compared by means of the human evaluation metrics of adequacy, fluency and relative ranking. Further, the use of morphological segmentation also improved the efficacy of the attention mechanism.
Tasks Machine Translation
Published 2016-12-07
URL http://arxiv.org/abs/1612.02482v2
PDF http://arxiv.org/pdf/1612.02482v2.pdf
PWC https://paperswithcode.com/paper/improving-the-performance-of-neural-machine
Repo
Framework
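
The RNNMorph pipeline above rests on segmenting Tamil words into morphemes before learning Word2Vec embeddings. A hedged sketch of that preprocessing step follows; `segment_tamil` is a hypothetical placeholder for an unsupervised morphological segmenter, and the gensim parameters are illustrative rather than taken from the paper.

```python
# Sketch: morpheme-level preprocessing before training embeddings for NMT.
# `segment_tamil` is a hypothetical placeholder for a trained morphological
# segmenter; the paper does not prescribe this exact code.
from gensim.models import Word2Vec

def segment_tamil(word):
    # Placeholder: a real system would return the word's morphemes here.
    return [word]

def to_morpheme_stream(sentences):
    """Replace each word with its morphemes, shrinking the target vocabulary."""
    return [[m for w in sent.split() for m in segment_tamil(w)]
            for sent in sentences]

corpus = to_morpheme_stream(["..."])   # "..." stands in for Tamil sentences
emb = Word2Vec(corpus, vector_size=300, window=5, min_count=1, workers=4)
# emb.wv can then initialise the embedding table of the attention-based
# encoder-decoder (RNNSearch-style) model.
```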

Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networks Approaches

Title Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networks Approaches
Authors Erik Rodner, Marcel Simon, Robert B. Fisher, Joachim Denzler
Abstract In this paper, we study the sensitivity of CNN outputs with respect to image transformations and noise in the area of fine-grained recognition. In particular, we answer the following questions: (1) how sensitive are CNNs with respect to image transformations encountered during wild image capture?; (2) how can we predict CNN sensitivity?; and (3) can we increase the robustness of CNNs with respect to image degradations? To answer the first question, we provide an extensive empirical sensitivity analysis of commonly used CNN architectures (AlexNet, VGG19, GoogleNet) across various types of image degradations. This allows for predicting CNN performance for new domains comprising images of lower quality or captured from a different viewpoint. We also show how the sensitivity of CNN outputs can be predicted for single images. Furthermore, we demonstrate that input layer dropout or pre-filtering during test time only reduces CNN sensitivity for high levels of degradation. Experiments for fine-grained recognition tasks reveal that VGG19 is more robust to severe image degradations than AlexNet and GoogleNet. However, small intensity noise can lead to dramatic changes in CNN performance even for VGG19.
Tasks
Published 2016-10-21
URL http://arxiv.org/abs/1610.06756v1
PDF http://arxiv.org/pdf/1610.06756v1.pdf
PWC https://paperswithcode.com/paper/fine-grained-recognition-in-the-noisy-wild
Repo
Framework
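
The sensitivity analysis above amounts to sweeping a degradation level and measuring the accuracy drop of a fixed CNN. A small sketch of that protocol follows, assuming a pretrained torchvision VGG19 and Gaussian intensity noise; the noise levels, model weights, and omitted input normalization are illustrative assumptions, not the paper's setup.

```python
# Sketch of a noise-sensitivity sweep for a pretrained CNN classifier.
# Model choice and noise levels are assumptions; ImageNet preprocessing
# (resize, normalization) is omitted for brevity.
import torch
from torchvision import models

model = models.vgg19(weights="IMAGENET1K_V1").eval()  # downloads weights

@torch.no_grad()
def accuracy_under_noise(images, labels, sigma):
    """Top-1 accuracy after adding Gaussian intensity noise with std `sigma`.
    Assumes `images` are in [0, 1]."""
    noisy = (images + sigma * torch.randn_like(images)).clamp(0, 1)
    preds = model(noisy).argmax(dim=1)
    return (preds == labels).float().mean().item()

# for sigma in [0.0, 0.02, 0.05, 0.1]:
#     print(sigma, accuracy_under_noise(batch_images, batch_labels, sigma))
```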

Predicting Enemy’s Actions Improves Commander Decision-Making

Title Predicting Enemy’s Actions Improves Commander Decision-Making
Authors Michael Ownby, Alexander Kott
Abstract The Defense Advanced Research Projects Agency (DARPA) Real-time Adversarial Intelligence and Decision-making (RAID) program is investigating the feasibility of “reading the mind of the enemy” - to estimate and anticipate, in real-time, the enemy’s likely goals, deceptions, actions, movements and positions. This program focuses specifically on urban battles at echelons of battalion and below. The RAID program leverages approximate game-theoretic and deception-sensitive algorithms to provide real-time enemy estimates to a tactical commander. A key hypothesis of the program is that these predictions and recommendations will make the commander more effective, i.e., the commander should be able to achieve operational goals more safely, quickly, and efficiently. Realistic experimentation and evaluation drive the development process, using human-in-the-loop wargames to compare humans and the RAID system. Two experiments were conducted in 2005 as part of Phase I to determine whether the RAID software could make predictions and recommendations as effectively and accurately as a 4-person experienced staff. This report discusses the intriguing and encouraging results of these first two experiments conducted by the RAID program. It also provides details about the experiment environment and methodology that were used to demonstrate and prove the research goals.
Tasks Decision Making
Published 2016-07-22
URL http://arxiv.org/abs/1607.06759v1
PDF http://arxiv.org/pdf/1607.06759v1.pdf
PWC https://paperswithcode.com/paper/predicting-enemys-actions-improves-commander
Repo
Framework

Automated Word Prediction in Bangla Language Using Stochastic Language Models

Title Automated Word Prediction in Bangla Language Using Stochastic Language Models
Authors Md. Masudul Haque, Md. Tarek Habib, Md. Mokhlesur Rahman
Abstract Word completion and word prediction are two important phenomena in typing that benefit users who type using a keyboard or other similar devices. They can have a profound impact on the typing of disabled people. Our work addresses word prediction for Bangla sentences using stochastic, i.e., N-gram, language models such as unigram, bigram, trigram, deleted interpolation, and backoff models, auto-completing a sentence by predicting the correct word, which saves typing time and keystrokes and also reduces misspelling. We use a large Bangla corpus of different word types to predict the correct word with as much accuracy as possible. We have found promising results. We hope that our work will have an impact on the baseline for automated Bangla typing.
Tasks Language Modelling
Published 2016-02-25
URL http://arxiv.org/abs/1602.07803v1
PDF http://arxiv.org/pdf/1602.07803v1.pdf
PWC https://paperswithcode.com/paper/automated-word-prediction-in-bangla-language
Repo
Framework
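
Because the entry above centres on stochastic N-gram prediction, a compact sketch of next-word prediction with deleted interpolation over unigram, bigram, and trigram estimates is given below; the interpolation weights are illustrative assumptions, whereas the paper estimates them (and also evaluates backoff models).

```python
# Sketch: next-word prediction with deleted interpolation over
# unigram/bigram/trigram maximum-likelihood estimates. The lambda weights
# are illustrative; in practice they are estimated from held-out data.
from collections import Counter

def train_counts(sentences):
    uni, bi, tri = Counter(), Counter(), Counter()
    for tokens in sentences:
        uni.update(tokens)
        bi.update(zip(tokens, tokens[1:]))
        tri.update(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri

def interpolated_prob(w, w1, w2, uni, bi, tri, lambdas=(0.2, 0.3, 0.5)):
    """P(w | w1 w2) as a weighted mix of unigram, bigram, and trigram MLEs."""
    n = sum(uni.values())
    p1 = uni[w] / n if n else 0.0
    p2 = bi[(w2, w)] / uni[w2] if uni[w2] else 0.0
    p3 = tri[(w1, w2, w)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    l1, l2, l3 = lambdas
    return l1 * p1 + l2 * p2 + l3 * p3

def predict_next(w1, w2, uni, bi, tri):
    """Predict the most likely next word given the previous two words."""
    return max(uni, key=lambda w: interpolated_prob(w, w1, w2, uni, bi, tri))
```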

Laplacian LRR on Product Grassmann Manifolds for Human Activity Clustering in Multi-Camera Video Surveillance

Title Laplacian LRR on Product Grassmann Manifolds for Human Activity Clustering in Multi-Camera Video Surveillance
Authors Boyue Wang, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin
Abstract In multi-camera video surveillance, it is challenging to represent videos from different cameras properly and fuse them efficiently for specific applications such as human activity recognition and clustering. In this paper, a novel representation for multi-camera video data, namely the Product Grassmann Manifold (PGM), is proposed to model video sequences as points on the Grassmann manifold and integrate them as a whole in the product manifold form. Additionally, with a new geometry metric on the product manifold, the conventional Low Rank Representation (LRR) model is extended onto PGM and the new LRR model can be used for clustering non-linear data, such as multi-camera video data. To evaluate the proposed method, a number of clustering experiments are conducted on several multi-camera video datasets of human activity, including Dongzhimen Transport Hub Crowd action dataset, ACT 42 Human action dataset and SKIG action dataset. The experiment results show that the proposed method outperforms many state-of-the-art clustering methods.
Tasks Activity Recognition, Human Activity Recognition
Published 2016-06-13
URL http://arxiv.org/abs/1606.03838v1
PDF http://arxiv.org/pdf/1606.03838v1.pdf
PWC https://paperswithcode.com/paper/laplacian-lrr-on-product-grassmann-manifolds
Repo
Framework
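
As a rough illustration of the representation above (not the Laplacian LRR optimisation), the sketch below maps each camera's clip to a point on a Grassmann manifold via a thin SVD of its stacked frames and treats the tuple of per-camera subspaces as a point on the product manifold; the subspace dimension `p` and the projection distance are illustrative choices.

```python
# Sketch: representing a multi-camera clip as a point on a product of
# Grassmann manifolds. The LRR clustering itself is not reproduced; the
# subspace dimension p is an illustrative assumption.
import numpy as np

def grassmann_point(frames, p=5):
    """frames: (n_pixels, n_frames) matrix for one camera -> orthonormal basis."""
    U, _, _ = np.linalg.svd(frames, full_matrices=False)
    return U[:, :p]

def product_grassmann_point(clips, p=5):
    """One subspace per camera; the tuple is a point on the product manifold."""
    return [grassmann_point(c, p) for c in clips]

def projection_distance(X, Y):
    """Distance between two Grassmann points via their projection matrices."""
    return np.linalg.norm(X @ X.T - Y @ Y.T, "fro")
```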

Survey on the attention based RNN model and its applications in computer vision

Title Survey on the attention based RNN model and its applications in computer vision
Authors Feng Wang, David M. J. Tax
Abstract Recurrent neural networks (RNNs) can be used to solve sequence-to-sequence problems, where both the input and the output have sequential structures. Usually there are some implicit relations between those structures, but it is hard for the common RNN model to fully explore the relations between the sequences. In this survey, we introduce some attention-based RNN models which can focus on different parts of the input for each output item, in order to explore and take advantage of the implicit relations between the input and the output items. The different attention mechanisms are described in detail. We then introduce some applications in computer vision which apply the attention-based RNN models. The superiority of the attention-based RNN model is shown by the experimental results. Finally, some future research directions are given.
Tasks
Published 2016-01-25
URL http://arxiv.org/abs/1601.06823v1
PDF http://arxiv.org/pdf/1601.06823v1.pdf
PWC https://paperswithcode.com/paper/survey-on-the-attention-based-rnn-model-and
Repo
Framework
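
Since the survey above revolves around attention over encoder states, here is a minimal NumPy sketch of additive (Bahdanau-style) attention, one of the mechanisms such surveys cover; the weight shapes and random example values are illustrative only.

```python
# Sketch of additive (Bahdanau-style) attention over encoder states.
# Shapes and parameters are illustrative, not tied to any one surveyed model.
import numpy as np

def additive_attention(decoder_state, encoder_states, W_d, W_e, v):
    """decoder_state: (d,), encoder_states: (T, e) -> (context (e,), weights (T,))."""
    scores = np.tanh(encoder_states @ W_e.T + decoder_state @ W_d.T) @ v  # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over source positions
    return weights @ encoder_states, weights       # context vector, attention weights

# Example with decoder/encoder dim 4 and attention dim 8
rng = np.random.default_rng(0)
ctx, attn = additive_attention(rng.normal(size=4), rng.normal(size=(6, 4)),
                               rng.normal(size=(8, 4)), rng.normal(size=(8, 4)),
                               rng.normal(size=8))
```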

Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization

Title Novel Word Embedding and Translation-based Language Modeling for Extractive Speech Summarization
Authors Kuan-Yu Chen, Shih-Hung Liu, Berlin Chen, Hsin-Min Wang, Hsin-Hsi Chen
Abstract Word embedding methods revolve around learning continuous distributed vector representations of words with neural networks, which can capture semantic and/or syntactic cues, and in turn be used to induce similarity measures among words, sentences and documents in context. Celebrated methods can be categorized as prediction-based and count-based methods according to the training objectives and model architectures. Their pros and cons have been extensively analyzed and evaluated in recent studies, but there is relatively less work continuing the line of research to develop an enhanced learning method that brings together the advantages of the two model families. In addition, the interpretation of the learned word representations still remains somewhat opaque. Motivated by the observations and considering the pressing need, this paper presents a novel method for learning the word representations, which not only inherits the advantages of classic word embedding methods but also offers a clearer and more rigorous interpretation of the learned word representations. Built upon the proposed word embedding method, we further formulate a translation-based language modeling framework for the extractive speech summarization task. A series of empirical evaluations demonstrate the effectiveness of the proposed word representation learning and language modeling techniques in extractive speech summarization.
Tasks Language Modelling, Representation Learning
Published 2016-07-22
URL http://arxiv.org/abs/1607.06532v1
PDF http://arxiv.org/pdf/1607.06532v1.pdf
PWC https://paperswithcode.com/paper/novel-word-embedding-and-translation-based
Repo
Framework
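
To make the translation-based language modeling idea above concrete, the hedged sketch below scores a sentence by how well it "generates" the document under a smoothed word-translation model; the translation table `trans_prob` and the smoothing weight are placeholders, not the embedding-derived probabilities the paper proposes.

```python
# Sketch of translation-based sentence scoring for extractive summarization.
# trans_prob and alpha are placeholder assumptions; the background model is
# simplified to the document's own unigram distribution.
import math
from collections import Counter

def sentence_score(sentence, document, trans_prob, alpha=0.7):
    """log P(document | sentence) under a smoothed translation model."""
    doc_counts = Counter(document)
    bg_total = sum(doc_counts.values())
    score = 0.0
    for w, c in doc_counts.items():
        # Probability that the sentence "translates" into word w.
        p_trans = sum(trans_prob.get((w, s), 0.0) for s in sentence) / max(len(sentence), 1)
        p_bg = doc_counts[w] / bg_total            # background unigram probability
        score += c * math.log(alpha * p_trans + (1 - alpha) * p_bg)
    return score

# Summarize by ranking sentences with sentence_score and keeping the top few.
```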

Correlated-PCA: Principal Components’ Analysis when Data and Noise are Correlated

Title Correlated-PCA: Principal Components’ Analysis when Data and Noise are Correlated
Authors Namrata Vaswani, Han Guo
Abstract Given a matrix of observed data, Principal Components Analysis (PCA) computes a small number of orthogonal directions that contain most of its variability. Provably accurate solutions for PCA have been in use for a long time. However, to the best of our knowledge, all existing theoretical guarantees for it assume that the data and the corrupting noise are mutually independent, or at least uncorrelated. This is often valid in practice, but not always. In this paper, we study the PCA problem in the setting where the data and noise can be correlated. Such noise is often also referred to as “data-dependent noise”. We obtain a correctness result for the standard eigenvalue decomposition (EVD) based solution to PCA under simple assumptions on the data-noise correlation. We also develop and analyze a generalization of EVD, cluster-EVD, that improves upon EVD in certain regimes.
Tasks
Published 2016-08-15
URL http://arxiv.org/abs/1608.04320v2
PDF http://arxiv.org/pdf/1608.04320v2.pdf
PWC https://paperswithcode.com/paper/correlated-pca-principal-components-analysis
Repo
Framework
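
The analysis above concerns the standard EVD route to PCA. A short sketch of that baseline (principal directions from the eigendecomposition of the empirical covariance) is given below; the cluster-EVD generalisation studied in the paper is not reproduced.

```python
# Sketch: PCA via eigenvalue decomposition (EVD) of the empirical covariance.
# This is the standard baseline the paper analyses; cluster-EVD is not shown.
import numpy as np

def pca_evd(Y, r):
    """Y: (n, d) data matrix; returns the top-r principal directions, shape (d, r)."""
    Yc = Y - Y.mean(axis=0)
    cov = Yc.T @ Yc / Y.shape[0]
    vals, vecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    return vecs[:, np.argsort(vals)[::-1][:r]]     # keep the r leading eigenvectors

# basis = pca_evd(np.random.randn(1000, 20), r=3)
```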

Recurrent neural network models for disease name recognition using domain invariant features

Title Recurrent neural network models for disease name recognition using domain invariant features
Authors Sunil Kumar Sahu, Ashish Anand
Abstract Hand-crafted features based on linguistic and domain knowledge play a crucial role in determining the performance of disease name recognition systems. Such methods are further limited by the scope of these features, or in other words, by their ability to cover the contexts or word dependencies within a sentence. In this work, we focus on reducing such dependencies and propose a domain-invariant framework for the disease name recognition task. In particular, we propose various end-to-end recurrent neural network (RNN) models for the tasks of disease name recognition and their classification into four pre-defined categories. We also utilize a convolutional neural network (CNN) in cascade with the RNN to obtain character-based embedded features and employ them along with word-embedded features in our model. We compare our models with the state-of-the-art results for the two tasks on the NCBI disease dataset. Our results for the disease mention recognition task indicate that state-of-the-art performance can be obtained without relying on feature engineering. Further, the proposed models obtained improved performance on the classification task of disease names.
Tasks Feature Engineering
Published 2016-06-30
URL http://arxiv.org/abs/1606.09371v1
PDF http://arxiv.org/pdf/1606.09371v1.pdf
PWC https://paperswithcode.com/paper/recurrent-neural-network-models-for-disease
Repo
Framework
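
A hedged PyTorch sketch of the kind of end-to-end tagger described above: character embeddings pass through a small CNN, are concatenated with word embeddings, and a bidirectional LSTM emits per-token labels. The vocabulary sizes, dimensions, and tag set are assumptions, not the paper's configuration.

```python
# Sketch of a BiLSTM tagger with character-CNN features for disease mention
# recognition. Vocabulary sizes, dimensions, and tag set are assumptions.
import torch
import torch.nn as nn

class CharCNNBiLSTMTagger(nn.Module):
    def __init__(self, n_words=10000, n_chars=100, n_tags=5,
                 word_dim=100, char_dim=25, char_out=30, hidden=100):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_out, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(word_dim + char_out, hidden,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, word_ids, char_ids):
        # word_ids: (B, T); char_ids: (B, T, L) character indices per token
        B, T, L = char_ids.shape
        chars = self.char_emb(char_ids.view(B * T, L)).transpose(1, 2)
        char_feats = self.char_cnn(chars).max(dim=2).values.view(B, T, -1)
        x = torch.cat([self.word_emb(word_ids), char_feats], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                       # per-token tag scores

# scores = CharCNNBiLSTMTagger()(torch.zeros(2, 7, dtype=torch.long),
#                                torch.zeros(2, 7, 12, dtype=torch.long))
```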

Design and implementation of image processing system for Lumen social robot-humanoid as an exhibition guide for Electrical Engineering Days 2015

Title Design and implementation of image processing system for Lumen social robot-humanoid as an exhibition guide for Electrical Engineering Days 2015
Authors Setyaki Sholata Sya, Ary Setijadi Prihatmanto
Abstract Lumen Social Robot is a humanoid robot developed with the goal of being a good friend to all people. This year, Lumen Social Robot is being developed into a guide for the exhibition and the seminar of the final exams of undergraduate and graduate students in Electrical Engineering ITB, named Electrical Engineering Days 2015. To serve as the guide on that occasion, Lumen is supported by several things: Nao robot components, servers, and multiple processor systems. The image processing system is an application that allows Lumen to recognize and identify an object in the image taken from its camera eye. The image processing system comprises four modules: a face detection module to detect a person’s face, a face recognition module to recognize a person’s face, a face tracking module to follow a person’s face, and a human detection module to detect humans based on the upper part of a person’s body. The face detection module and the human detection module are implemented using the haarcascade.xml classifiers in Emgu CV. The face recognition module is implemented by adding each detected face to a database and storing it there. The face tracking module is implemented by applying a Gaussian smoothing filter to the image. —– Lumen Social Robot is a humanoid robot developed so that it can be a friend to many people. The image processing system is a processing application intended to let Lumen recognize and identify an object in the image captured by Lumen’s camera eye. The image processing system is equipped with four modules: a face detection module to detect a person’s face, a face recognition module to recognize that person’s face, a face tracking module to follow a person’s face, and a human detection module to detect humans based on the upper part of the person’s body.
Tasks Face Detection, Face Recognition, Human Detection
Published 2016-07-16
URL http://arxiv.org/abs/1607.04760v1
PDF http://arxiv.org/pdf/1607.04760v1.pdf
PWC https://paperswithcode.com/paper/design-and-implementation-of-image-processing
Repo
Framework
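
The face detection and human detection modules above rely on Haar cascade classifiers through Emgu CV; for illustration, here is a minimal Python/OpenCV equivalent of the face detection step. The cascade file is the standard one shipped with OpenCV, assumed rather than taken from the Lumen project.

```python
# Sketch: Haar-cascade face detection, the same idea as the Lumen face
# detection module but using OpenCV's Python bindings instead of Emgu CV.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(bgr_image):
    """Return (x, y, w, h) boxes for detected frontal faces."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# frame = cv2.imread("camera_frame.jpg")
# for (x, y, w, h) in detect_faces(frame):
#     cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```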

The Learning and Prediction of Application-level Traffic Data in Cellular Networks

Title The Learning and Prediction of Application-level Traffic Data in Cellular Networks
Authors Rongpeng Li, Zhifeng Zhao, Jianchao Zheng, Chengli Mei, Yueming Cai, Honggang Zhang
Abstract Traffic learning and prediction is at the heart of the evaluation of the performance of telecommunications networks and attracts a lot of attention in wired broadband networks. Now, benefiting from the big data in cellular networks, it becomes possible to carry the analyses one step further, into the application level. In this paper, we first collect a significant amount of application-level traffic data from cellular network operators. Afterwards, with the aid of this traffic “big data”, we make a comprehensive study of the modeling and prediction framework of cellular network traffic. Our results solidly demonstrate that there universally exist some traffic statistical modeling characteristics, including an α-stable modeled property in the temporal domain and sparsity in the spatial domain. Meanwhile, the results also demonstrate the distinctions originating from the uniqueness of different service types of applications. Furthermore, we propose a new traffic prediction framework to encompass and explore these aforementioned characteristics, and then develop a dictionary learning-based alternating direction method to solve it. Finally, we validate the prediction accuracy improvement and the robustness of the proposed framework through extensive simulation results.
Tasks Dictionary Learning, Traffic Prediction
Published 2016-06-15
URL http://arxiv.org/abs/1606.04778v2
PDF http://arxiv.org/pdf/1606.04778v2.pdf
PWC https://paperswithcode.com/paper/the-learning-and-prediction-of-application
Repo
Framework
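
The framework above combines an α-stable temporal model with spatial sparsity solved by a dictionary learning-based alternating direction method. The sketch below illustrates only the sparsity ingredient, fitting a sparse dictionary to a cells-by-time traffic matrix with scikit-learn; the matrix sizes and sparsity level are assumptions, and the paper's solver is not reproduced.

```python
# Sketch: sparse dictionary decomposition of an application-level traffic
# matrix (cells x time slots). Only the sparsity component is illustrated;
# the paper's alternating-direction method and alpha-stable temporal model
# are not reproduced. All sizes are illustrative.
import numpy as np
from sklearn.decomposition import DictionaryLearning

traffic = np.random.rand(50, 168)            # 50 cells, one week of hourly load

dl = DictionaryLearning(n_components=10, alpha=1.0, max_iter=100)
codes = dl.fit_transform(traffic)            # sparse codes, shape (50, 10)
dictionary = dl.components_                  # temporal atoms, shape (10, 168)

reconstruction = codes @ dictionary          # low-dimensional traffic estimate
```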

Unconstrained Still/Video-Based Face Verification with Deep Convolutional Neural Networks

Title Unconstrained Still/Video-Based Face Verification with Deep Convolutional Neural Networks
Authors Jun-Cheng Chen, Rajeev Ranjan, Swami Sankaranarayanan, Amit Kumar, Ching-Hui Chen, Vishal M. Patel, Carlos D. Castillo, Rama Chellappa
Abstract Over the last five years, methods based on Deep Convolutional Neural Networks (DCNNs) have shown impressive performance improvements for object detection and recognition problems. This has been made possible due to the availability of large annotated datasets, a better understanding of the non-linear mapping between input images and class labels as well as the affordability of GPUs. In this paper, we present the design details of a deep learning system for unconstrained face recognition, including modules for face detection, association, alignment and face verification. The quantitative performance evaluation is conducted using the IARPA Janus Benchmark A (IJB-A), the JANUS Challenge Set 2 (JANUS CS2), and the LFW dataset. The IJB-A dataset includes real-world unconstrained faces of 500 subjects with significant pose and illumination variations which are much harder than the Labeled Faces in the Wild (LFW) and Youtube Face (YTF) datasets. JANUS CS2 is the extended version of IJB-A which contains not only all the images/frames of IJB-A but also includes the original videos for evaluating the video-based face verification system. Some open issues regarding DCNNs for face verification problems are then discussed.
Tasks Face Detection, Face Recognition, Face Verification, Object Detection
Published 2016-05-09
URL http://arxiv.org/abs/1605.02686v3
PDF http://arxiv.org/pdf/1605.02686v3.pdf
PWC https://paperswithcode.com/paper/unconstrained-stillvideo-based-face
Repo
Framework
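
The verification stage of the pipeline above ultimately compares deep features of two aligned faces. A hedged sketch follows, with `embed` as a hypothetical placeholder for the pipeline's DCNN feature extractor and a cosine-similarity threshold chosen purely for illustration.

```python
# Sketch: face verification as thresholded cosine similarity between deep
# embeddings. `embed` is a hypothetical placeholder for the DCNN feature
# extractor applied after detection and alignment; the threshold is an
# illustrative assumption, not a value from the paper.
import numpy as np

def embed(aligned_face):
    # Placeholder: a real system returns the DCNN feature vector here.
    return np.asarray(aligned_face, dtype=float).ravel()

def same_identity(face_a, face_b, threshold=0.6):
    a, b = embed(face_a), embed(face_b)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return cos >= threshold, cos
```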

Spatially Aware Dictionary Learning and Coding for Fossil Pollen Identification

Title Spatially Aware Dictionary Learning and Coding for Fossil Pollen Identification
Authors Shu Kong, Surangi Punyasena, Charless Fowlkes
Abstract We propose a robust approach for performing automatic species-level recognition of fossil pollen grains in microscopy images that exploits both global shape and local texture characteristics in a patch-based matching methodology. We introduce a novel criterion for selecting meaningful and discriminative exemplar patches. We optimize this criterion during training using a greedy submodular function optimization framework that gives a near-optimal solution with bounded approximation error. We use these selected exemplars as a dictionary basis and propose a spatially-aware sparse coding method to match testing images for identification while maintaining global shape correspondence. To accelerate the coding process for fast matching, we introduce a relaxed form that uses spatially-aware soft-thresholding during coding. Finally, we carry out an experimental study that demonstrates the effectiveness and efficiency of our exemplar selection and classification mechanisms, achieving 86.13% accuracy on a difficult fine-grained species classification task distinguishing three types of fossil spruce pollen.
Tasks Dictionary Learning
Published 2016-05-03
URL http://arxiv.org/abs/1605.00775v1
PDF http://arxiv.org/pdf/1605.00775v1.pdf
PWC https://paperswithcode.com/paper/spatially-aware-dictionary-learning-and
Repo
Framework
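
The relaxed coding step above replaces full sparse coding with spatially-aware soft thresholding. The sketch below shows plain soft-thresholding coding of patches against an exemplar dictionary; the spatial weighting and the submodular exemplar selection are not reproduced, and the threshold value is an assumption.

```python
# Sketch: coding image patches against an exemplar dictionary with plain
# soft thresholding. The paper's spatially-aware weighting and greedy
# submodular exemplar selection are not reproduced; the threshold is an
# illustrative assumption.
import numpy as np

def soft_threshold_codes(patches, dictionary, threshold=0.1):
    """patches: (n, d); dictionary: (k, d) with unit-norm rows -> codes (n, k)."""
    similarities = patches @ dictionary.T                # correlation with atoms
    return np.sign(similarities) * np.maximum(np.abs(similarities) - threshold, 0.0)

# D = np.random.randn(20, 64)
# D /= np.linalg.norm(D, axis=1, keepdims=True)          # unit-norm atoms
# codes = soft_threshold_codes(np.random.randn(100, 64), D)
```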