Paper Group AWR 107
CVABS: Moving Object Segmentation with Common Vector Approach for Videos
Title | CVABS: Moving Object Segmentation with Common Vector Approach for Videos |
Authors | Şahin Işık, Kemal Özkan, Ömer Nezih Gerek |
Abstract | Background modelling is a fundamental step for several real-time computer vision applications, such as security systems and monitoring. An accurate background model helps in detecting the activity of moving objects in the video. In this work, we have developed a new subspace-based background modelling algorithm using the Common Vector Approach with Gram-Schmidt orthogonalization. Once the background model capturing the common characteristics of different views of the same scene is acquired, a smart foreground detection and background updating procedure is applied based on dynamic control parameters. A variety of experiments are conducted on different problem types related to dynamic backgrounds. Several types of metrics are utilized as objective measures, and the obtained visual results are judged subjectively. We observed that the proposed method, which updates the background frames with a self-learning feedback mechanism, performs successfully on all problem types reported in the CDNet2014 dataset. |
Tasks | Semantic Segmentation |
Published | 2018-10-19 |
URL | http://arxiv.org/abs/1810.08412v1 |
http://arxiv.org/pdf/1810.08412v1.pdf | |
PWC | https://paperswithcode.com/paper/cvabs-moving-object-segmentation-with-common |
Repo | https://github.com/isahhin/cvabs |
Framework | none |
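To make the linear algebra concrete, here is a minimal NumPy sketch of the Common Vector Approach described in the abstract: the common vector is a reference frame minus its projection onto the Gram-Schmidt-orthogonalized difference subspace of the background frames. The threshold, toy data, and mask step are illustrative assumptions; the paper's dynamic control parameters and self-learning feedback are not modeled here.

```python
# Minimal sketch of CVA-based background modelling. The threshold and toy
# data below are hypothetical; the paper's update/feedback rules are omitted.
import numpy as np

def common_vector(frames):
    """frames: (n, d) array of n flattened background frames."""
    ref = frames[0]
    diffs = (frames[1:] - ref).T            # (d, n-1) difference subspace
    q, _ = np.linalg.qr(diffs)              # orthonormal basis (Gram-Schmidt)
    # Common vector = reference minus its projection onto the subspace.
    return ref - q @ (q.T @ ref)

def foreground_mask(frame, background, thresh=30.0):
    """Flag pixels deviating from the background model (hypothetical threshold)."""
    return np.abs(frame - background) > thresh

# Toy usage: ten noisy views of the same 32x32 scene.
rng = np.random.default_rng(0)
bg = rng.uniform(0, 255, size=1024)
frames = bg + rng.normal(0, 2, size=(10, 1024))
model = common_vector(frames)
mask = foreground_mask(frames[0] + 100 * (np.arange(1024) < 50), model)
```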
Unsupervised Learning of GMM with a Uniform Background Component
Title | Unsupervised Learning of GMM with a Uniform Background Component |
Authors | Sida Liu, Adrian Barbu |
Abstract | Gaussian Mixture Models are one of the most studied and mature models in unsupervised learning. However, outliers are often present in the data and can influence the cluster estimation. In this paper, we study a new model that assumes that data comes from a mixture of a number of Gaussians as well as a uniform "background" component assumed to contain outliers and other non-interesting observations. We develop a novel method based on robust loss minimization that performs well in clustering such GMMs with a uniform background. We give theoretical guarantees that our clustering algorithm obtains the best clustering results with high probability. Moreover, we show that the result of our algorithm does not depend on initialization or local optima, and that parameter tuning is easy. In numerical simulations, we demonstrate that our algorithm enjoys high accuracy and achieves the best clustering results given a large enough sample size. Finally, experimental comparisons with typical clustering methods on real datasets demonstrate the potential of our algorithm in real applications. |
Tasks | |
Published | 2018-04-08 |
URL | https://arxiv.org/abs/1804.02744v4 |
https://arxiv.org/pdf/1804.02744v4.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-mixture-models-with |
Repo | https://github.com/newstar1993/CRLM |
Framework | none |
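The generative model in the abstract (k Gaussians plus a uniform outlier component) is easy to write down; the sketch below fits it with plain EM in NumPy. Note the hedge: the paper's actual algorithm is based on robust loss minimization, not EM, so this is only a baseline illustration of the mixture being clustered.

```python
# EM for a mixture of k isotropic Gaussians plus one uniform "background"
# component. Illustrative only: the paper's method is robust loss
# minimization, not EM.
import numpy as np

def em_gmm_uniform(x, k, n_iter=100, seed=0):
    n, d = x.shape
    rng = np.random.default_rng(seed)
    vol = np.prod(x.max(0) - x.min(0))       # support volume of the uniform part
    mu = x[rng.choice(n, k, replace=False)]
    var = np.full(k, x.var())
    pi = np.full(k + 1, 1.0 / (k + 1))       # last weight = uniform component
    for _ in range(n_iter):
        r = np.empty((n, k + 1))             # E-step: responsibilities
        for j in range(k):
            sq = ((x - mu[j]) ** 2).sum(1)
            r[:, j] = pi[j] * np.exp(-0.5 * sq / var[j]) / (2 * np.pi * var[j]) ** (d / 2)
        r[:, k] = pi[k] / vol
        r /= r.sum(1, keepdims=True)
        nk = r.sum(0)                        # M-step
        pi = nk / n
        for j in range(k):
            mu[j] = (r[:, j:j + 1] * x).sum(0) / nk[j]
            var[j] = (r[:, j] * ((x - mu[j]) ** 2).sum(1)).sum() / (d * nk[j])
    return mu, var, pi, r

# Two well-separated clusters plus uniform outliers.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2)),
               rng.uniform(-10, 10, (30, 2))])
mu, var, pi, r = em_gmm_uniform(x, k=2)
```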
NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification
Title | NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification |
Authors | Rongcheng Lin, Jing Xiao, Jianping Fan |
Abstract | This paper introduces a fast and efficient network architecture, NeXtVLAD, to aggregate frame-level features into a compact feature vector for large-scale video classification. The basic idea is to decompose a high-dimensional feature into a group of relatively low-dimensional vectors with attention before applying NetVLAD aggregation over time. This NeXtVLAD approach turns out to be both effective and parameter-efficient in aggregating temporal information. In the 2nd Youtube-8M video understanding challenge, a single NeXtVLAD model with fewer than 80M parameters achieves a GAP score of 0.87846 on the private leaderboard. A mixture of 3 NeXtVLAD models results in 0.88722, ranked 3rd among 394 teams. The code is publicly available at https://github.com/linrongc/youtube-8m. |
Tasks | Video Classification, Video Understanding |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.05014v1 |
http://arxiv.org/pdf/1811.05014v1.pdf | |
PWC | https://paperswithcode.com/paper/nextvlad-an-efficient-neural-network-to |
Repo | https://github.com/linrongc/youtube-8m |
Framework | tf |
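The abstract's one-sentence recipe, expand the frame feature, split it into groups, gate each group with attention, then aggregate NetVLAD-style residuals over time, can be sketched directly. All projections and cluster centers below are random placeholders for learned parameters, and the dimensions are assumptions rather than the paper's configuration.

```python
# NumPy sketch of the NeXtVLAD idea: grouped, attention-gated NetVLAD
# aggregation over time. Projections/centers are random placeholders.
import numpy as np

def nextvlad_aggregate(x, G=4, K=8, expand=2, seed=0):
    """x: (T, N) frame-level features -> compact video-level descriptor."""
    T, N = x.shape
    rng = np.random.default_rng(seed)
    xe = x @ rng.normal(size=(N, expand * N))       # expand to lambda*N dims
    d = expand * N // G
    groups = xe.reshape(T, G, d)                    # decompose into G groups
    att = 1 / (1 + np.exp(-(xe @ rng.normal(size=(expand * N, G)))))  # (T, G) gates
    logits = groups @ rng.normal(size=(d, K))       # soft cluster assignment
    a = np.exp(logits - logits.max(-1, keepdims=True))
    a /= a.sum(-1, keepdims=True)                   # (T, G, K)
    centers = rng.normal(size=(K, d))
    resid = groups[:, :, None, :] - centers         # (T, G, K, d) residuals
    w = (att[:, :, None] * a)[..., None]            # attention-weighted assignments
    return (w * resid).sum((0, 1)).reshape(-1)      # (K * d,) descriptor

v = nextvlad_aggregate(np.random.default_rng(1).normal(size=(30, 128)))
```

With N=128, expand=2, G=4 and K=8 this yields a 512-dimensional descriptor, which illustrates why grouping keeps the aggregated output compact.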
Triad-based Neural Network for Coreference Resolution
Title | Triad-based Neural Network for Coreference Resolution |
Authors | Yuanliang Meng, Anna Rumshisky |
Abstract | We propose a triad-based neural network system that generates affinity scores between entity mentions for coreference resolution. The system accepts three mentions as input simultaneously, taking the mutual dependency and logical constraints of all three mentions into account, and thus makes more accurate predictions than the traditional pairwise approach. Depending on system choices, the affinity scores can be further used in clustering or mention ranking. Our experiments show that standard hierarchical clustering using the scores produces state-of-the-art results with gold mentions on the English portion of the CoNLL 2012 Shared Task. The model does not rely on many handcrafted features and is easy to train and use. The triads can also be easily extended to polyads of higher orders. To our knowledge, this is the first neural network system to model the mutual dependency of more than two members at the mention level. |
Tasks | Coreference Resolution |
Published | 2018-09-18 |
URL | http://arxiv.org/abs/1809.06491v1 |
http://arxiv.org/pdf/1809.06491v1.pdf | |
PWC | https://paperswithcode.com/paper/triad-based-neural-network-for-coreference |
Repo | https://github.com/text-machine-lab/entity-coref |
Framework | tf |
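One way to read the triad idea is sketched below: a scorer looks at three mentions at once, pairwise affinities are obtained by averaging a pair's triad scores over every choice of third mention, and standard hierarchical clustering does the rest. The scorer here is a random placeholder for the paper's neural network; only the data flow is illustrated.

```python
# Triad scores -> pairwise affinities -> hierarchical clustering (sketch).
# `triad_score` is a placeholder for the paper's neural scorer.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def triad_score(ei, ej, ek, w):
    """Placeholder: affinity of mentions (i, j) in the context of mention k."""
    return 1 / (1 + np.exp(-np.concatenate([ei, ej, ek]) @ w))

def pairwise_affinity(emb, w):
    n = len(emb)
    aff = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Average the (i, j) affinity over all choices of third mention.
            s = [triad_score(emb[i], emb[j], emb[k], w)
                 for k in range(n) if k not in (i, j)]
            aff[i, j] = aff[j, i] = np.mean(s)
    return aff

rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 16))               # 6 mentions, 16-dim embeddings
aff = pairwise_affinity(emb, rng.normal(size=48))
dist = squareform(1 - aff, checks=False)     # affinities -> condensed distances
labels = fcluster(linkage(dist, method='average'), t=0.5, criterion='distance')
```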
Music Transformer
Title | Music Transformer |
Authors | Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck |
Abstract | Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to the reuse of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani et al., 2017), a sequence model based on self-attention, has achieved compelling results in many generation tasks that require maintaining long-range coherence. This suggests that self-attention might also be well-suited to modeling music. In musical composition and performance, however, relative timing is critically important. Existing approaches for representing relative positional information in the Transformer modulate attention based on pairwise distance (Shaw et al., 2018). This is impractical for long sequences such as musical compositions, since their memory complexity for intermediate relative information is quadratic in the sequence length. We propose an algorithm that reduces the intermediate memory requirement to linear in the sequence length. This enables us to demonstrate that a Transformer with our modified relative attention mechanism can generate minute-long compositions (thousands of steps, four times the length modeled in Oore et al., 2018) with compelling structure, generate continuations that coherently elaborate on a given motif, and, in a seq2seq setup, generate accompaniments conditioned on melodies. We evaluate the Transformer with our relative attention mechanism on two datasets, JSB Chorales and Piano-e-Competition, and obtain state-of-the-art results on the latter. |
Tasks | Music Modeling |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04281v3 |
http://arxiv.org/pdf/1809.04281v3.pdf | |
PWC | https://paperswithcode.com/paper/music-transformer |
Repo | https://github.com/scpark20/Music-GPT-2 |
Framework | tf |
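The memory reduction mentioned in the abstract comes from a "skewing" trick: compute the (L, L) product of queries with the relative-position embeddings, then pad, reshape, and slice to align it, avoiding the O(L^2 D) intermediate tensor of Shaw et al. (2018). A NumPy sketch:

```python
# "Skewing" sketch for relative attention: align Q @ E^T with O(L*D)
# intermediate memory instead of materialising an (L, L, D) tensor.
import numpy as np

def relative_logits(q, e):
    """q: (L, D) queries; e: (L, D) embeddings for distances -(L-1)..0."""
    qe = q @ e.T                             # (L, L), not yet aligned
    L = qe.shape[0]
    padded = np.pad(qe, ((0, 0), (1, 0)))    # prepend one dummy column
    return padded.reshape(L + 1, L)[1:]      # reshape + slice = skew

rng = np.random.default_rng(0)
q, e = rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
s_rel = relative_logits(q, e)                # add to q @ k.T before the softmax
```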
FastNet
Title | FastNet |
Authors | John Olafenwa, Moses Olafenwa |
Abstract | Inception and the ResNet family of Convolutional Neural Network architectures have broken records in the past few years, but recent state-of-the-art models have also incurred very high computational costs in terms of training, inference, and model size, making their deployment on edge devices impractical. In light of this, we present a novel architecture designed for high computational efficiency on both GPUs and CPUs that is highly suited for deployment on mobile applications, smart cameras, IoT devices and controllers, as well as low-cost drones. Our architecture achieves competitive accuracy on standard datasets, even outperforming the original ResNet. We present below the motivation for this research, the architecture of the network, single test accuracies on CIFAR-10 and CIFAR-100, a detailed comparison with other well-known architectures, and a link to an implementation in Keras. |
Tasks | |
Published | 2018-01-17 |
URL | http://arxiv.org/abs/1802.02186v1 |
http://arxiv.org/pdf/1802.02186v1.pdf | |
PWC | https://paperswithcode.com/paper/fastnet |
Repo | https://github.com/johnolafenwa/FastNet |
Framework | tf |
Inferring network connectivity from event timing patterns
Title | Inferring network connectivity from event timing patterns |
Authors | Jose Casadiego, Dimitra Maoutsa, Marc Timme |
Abstract | Reconstructing network connectivity from the collective dynamics of a system typically requires access to its complete continuous-time evolution, which is often experimentally inaccessible. Here we propose a theory for revealing the physical connectivity of networked systems using only the event time series generated by their intrinsic collective dynamics. Representing the patterns of event timings in an event space spanned by inter-event and cross-event intervals, we reveal which other units directly influence the inter-event times of any given unit. For illustration, we linearize an event-space mapping constructed from the spiking patterns in model neural circuits to reveal the presence or absence of synapses between any pair of neurons, as well as whether the coupling acts in an inhibiting or activating (excitatory) manner. The proposed model-independent reconstruction theory is scalable to larger networks and may thus play an important role in the reconstruction of networks from biology to social science and engineering. |
Tasks | Time Series |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.09974v2 |
http://arxiv.org/pdf/1803.09974v2.pdf | |
PWC | https://paperswithcode.com/paper/inferring-network-connectivity-from-event |
Repo | https://github.com/networkinference/ESL |
Framework | none |
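As a rough illustration of the event-space idea, the sketch below explains each unit's inter-event intervals from the other units' most recent cross-event intervals and reads connectivity off the regression weights. The paper linearizes a nonlinear event-space mapping; the plain least-squares fit and the `dt_max` cap here are simplifying assumptions.

```python
# Crude event-space regression: which units' recent events help explain
# unit `unit`'s inter-event intervals? Least squares stands in for the
# paper's linearised event-space mapping; dt_max is an assumed cap.
import numpy as np

def infer_incoming_links(spikes, unit, dt_max=1e3):
    """spikes: list of sorted event-time arrays, one per unit."""
    times = spikes[unit]
    y = np.diff(times)                       # inter-event intervals to explain
    X = np.empty((len(y), len(spikes)))
    for j, other in enumerate(spikes):
        # Cross-event interval: time since unit j last fired before each event.
        idx = np.searchsorted(other, times[1:]) - 1
        X[:, j] = np.where(idx >= 0, times[1:] - other[idx.clip(min=0)], dt_max)
    X[:, unit] = 1.0                         # own column doubles as an intercept
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w                                 # large |w[j]| suggests a link j -> unit
```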
Conditional Generative Adversarial and Convolutional Networks for X-ray Breast Mass Segmentation and Shape Classification
Title | Conditional Generative Adversarial and Convolutional Networks for X-ray Breast Mass Segmentation and Shape Classification |
Authors | Vivek Kumar Singh, Santiago Romani, Hatem A. Rashwan, Farhan Akram, Nidhi Pandey, Md. Mostafa Kamal Sarker, Jordina Torrents Barrena, Saddam Abdulwahab, Adel Saleh, Miguel Arquez, Meritxell Arenas, Domenec Puig |
Abstract | This paper proposes a novel approach based on conditional Generative Adversarial Networks (cGAN) for breast mass segmentation in mammography. We hypothesized that the cGAN structure is well-suited to accurately outline the mass area, especially when the training data is limited. The generative network learns intrinsic features of tumors while the adversarial network enforces segmentations to be similar to the ground truth. Experiments performed on dozens of malignant tumors extracted from the public DDSM dataset and from our in-house private dataset confirm our hypothesis with very high Dice coefficients and Jaccard indices (>94% and >89%, respectively), outperforming the scores obtained by other state-of-the-art approaches. Furthermore, in order to portray significant morphological features of the segmented tumor, a specific Convolutional Neural Network (CNN) has also been designed for classifying the segmented tumor areas into four types (irregular, lobular, oval and round), which provides an overall accuracy of about 72% on the DDSM dataset. |
Tasks | |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10207v2 |
http://arxiv.org/pdf/1805.10207v2.pdf | |
PWC | https://paperswithcode.com/paper/conditional-generative-adversarial-and |
Repo | https://github.com/ankit-ai/GAN_breast_mammography_segmentation |
Framework | tf |
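For orientation, here is a minimal PyTorch sketch of the cGAN segmentation objective the abstract describes: a generator predicts the mass mask, a discriminator judges (image, mask) pairs, and a pixel-wise term ties predictions to ground truth. Both networks and the loss weight are tiny placeholders, not the paper's architectures or settings.

```python
# cGAN-style segmentation losses (sketch); networks are toy placeholders.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
D = nn.Sequential(nn.Conv2d(2, 8, 3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
adv = nn.BCEWithLogitsLoss()

x = torch.randn(4, 1, 64, 64)                # mammogram patches (random stand-ins)
y = torch.rand(4, 1, 64, 64).round()         # ground-truth masks

mask = G(x)
d_fake = D(torch.cat([x, mask], dim=1))
# Generator: fool the discriminator + stay close to the ground truth.
g_loss = adv(d_fake, torch.ones_like(d_fake)) \
       + 10.0 * nn.functional.binary_cross_entropy(mask, y)   # weight is assumed
# Discriminator: real (image, mask) pairs vs. generated ones.
d_real = D(torch.cat([x, y], dim=1))
d_loss = adv(d_real, torch.ones_like(d_real)) \
       + adv(D(torch.cat([x, mask.detach()], dim=1)), torch.zeros_like(d_fake))
```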
ADVIO: An authentic dataset for visual-inertial odometry
Title | ADVIO: An authentic dataset for visual-inertial odometry |
Authors | Santiago Cortés, Arno Solin, Esa Rahtu, Juho Kannala |
Abstract | The lack of realistic and open benchmarking datasets for pedestrian visual-inertial odometry has made it hard to pinpoint differences between published methods. Existing datasets either lack a full six-degree-of-freedom ground truth or are limited to small spaces with optical tracking systems. We take advantage of advances in pure inertial navigation and develop a set of versatile and challenging real-world computer vision benchmark sets for visual-inertial odometry. For this purpose, we have built a test rig equipped with an iPhone, a Google Pixel Android phone, and a Google Tango device. We provide a wide range of raw sensor data that is accessible on almost any modern-day smartphone, together with a high-quality ground-truth track. We also compare the resulting visual-inertial tracks from Google Tango, ARCore, and Apple ARKit with two recent methods published in academic forums. The datasets cover both indoor and outdoor cases, with stairs, escalators, elevators, office environments, a shopping mall, and a metro station. |
Tasks | |
Published | 2018-07-25 |
URL | http://arxiv.org/abs/1807.09828v1 |
http://arxiv.org/pdf/1807.09828v1.pdf | |
PWC | https://paperswithcode.com/paper/advio-an-authentic-dataset-for-visual |
Repo | https://github.com/AaltoVision/ADVIO |
Framework | none |
Unsupervised Neural Text Simplification
Title | Unsupervised Neural Text Simplification |
Authors | Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, Karthik Sankaranarayanan |
Abstract | The paper presents a first attempt at unsupervised neural text simplification that relies only on unlabeled text corpora. The core framework is composed of a shared encoder and a pair of attentional decoders, and gains knowledge of simplification through discrimination-based losses and denoising. The framework is trained using unlabeled text collected from an English Wikipedia dump. Our analysis (both quantitative and qualitative, involving human evaluators) on public test data shows that the proposed model can perform text simplification at both the lexical and syntactic levels, competitively with existing supervised methods. Adding a few labelled pairs improves performance further. |
Tasks | Denoising, Text Simplification |
Published | 2018-10-18 |
URL | https://arxiv.org/abs/1810.07931v6 |
https://arxiv.org/pdf/1810.07931v6.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-neural-text-simplification |
Repo | https://github.com/subramanyamdvss/UnsupNTS |
Framework | pytorch |
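The framework's shape, a shared encoder feeding two style-specific decoders trained with denoising, can be sketched compactly in PyTorch (the listed framework for this repo). Attention and the discrimination-based losses are omitted; vocabulary size, dimensions, and the word-dropout noise are assumptions.

```python
# Shared encoder + two decoders with a denoising reconstruction loss (sketch).
import torch
import torch.nn as nn

V, E, H = 1000, 64, 128                      # vocab / embedding / hidden sizes (assumed)
emb = nn.Embedding(V, E)
enc = nn.GRU(E, H, batch_first=True)         # shared encoder
dec_simple = nn.GRU(E, H, batch_first=True)  # one decoder per style
dec_complex = nn.GRU(E, H, batch_first=True)
out = nn.Linear(H, V)
ce = nn.CrossEntropyLoss()

def add_noise(tokens, p=0.1):
    """Word dropout: the denoising objective reconstructs the clean sentence."""
    drop = torch.rand_like(tokens, dtype=torch.float) < p
    return tokens.masked_fill(drop, 0)       # 0 stands in for an <unk> id

tokens = torch.randint(1, V, (8, 20))        # a batch of token-id sequences
_, h = enc(emb(add_noise(tokens)))           # encode the corrupted input
logits = out(dec_simple(emb(tokens), h)[0])  # teacher-forced reconstruction
loss = ce(logits.reshape(-1, V), tokens.reshape(-1))
```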
Deformable ConvNets v2: More Deformable, Better Results
Title | Deformable ConvNets v2: More Deformable, Better Results |
Authors | Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai |
Abstract | The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation of Deformable ConvNets that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training. The modeling power is enhanced through a more comprehensive integration of deformable convolution within the network, and by introducing a modulation mechanism that expands the scope of deformation modeling. To effectively harness this enriched modeling capability, we guide network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features. With the proposed contributions, this new version of Deformable ConvNets yields significant performance gains over the original model and produces leading results on the COCO benchmark for object detection and instance segmentation. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.11168v2 |
http://arxiv.org/pdf/1811.11168v2.pdf | |
PWC | https://paperswithcode.com/paper/deformable-convnets-v2-more-deformable-better |
Repo | https://github.com/qilei123/DeformableConvV2 |
Framework | mxnet |
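The modulation mechanism the abstract adds to deformable convolution is easy to state: each sampling point of the kernel is moved by a learned offset and its contribution is scaled by a learned scalar in [0, 1]. The NumPy sketch below computes one output value; in the real layer a separate conv branch predicts the offsets and modulations.

```python
# One modulated deformable 3x3 sample: sum_k w_k * m_k * x(p + p_k + dp_k).
# Offsets and modulations are random here; a conv branch predicts them.
import numpy as np

def bilinear(img, y, x):
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < h and 0 <= xx < w:
                val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy, xx]
    return val

def modulated_deform_sample(img, py, px, offsets, mod, weights):
    out, k = 0.0, 0
    for dy in (-1, 0, 1):                    # the regular 3x3 grid p_k ...
        for dx in (-1, 0, 1):
            y = py + dy + offsets[k, 0]      # ... shifted by learned offsets dp_k
            x = px + dx + offsets[k, 1]
            out += weights[k] * mod[k] * bilinear(img, y, x)
            k += 1
    return out

rng = np.random.default_rng(0)
val = modulated_deform_sample(rng.normal(size=(16, 16)), 8, 8,
                              rng.normal(scale=0.5, size=(9, 2)),
                              rng.uniform(size=9), rng.normal(size=9))
```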
Collapse of Deep and Narrow Neural Nets
Title | Collapse of Deep and Narrow Neural Nets |
Authors | Lu Lu, Yanhui Su, George Em Karniadakis |
Abstract | Recent theoretical work has demonstrated that deep neural networks have superior performance over shallow networks, but their training is more difficult, e.g., they suffer from the vanishing gradient problem. This problem can typically be resolved by the rectified linear unit (ReLU) activation. However, here we show that even with such an activation, deep and narrow neural networks (NNs) will converge to erroneous mean or median states of the target function, depending on the loss, with high probability. Deep and narrow NNs are encountered in solving partial differential equations with high-order derivatives. We demonstrate this collapse of such NNs both numerically and theoretically, and provide estimates of the probability of collapse. We also construct a diagram of a safe region for designing NNs that avoid the collapse to erroneous states. Finally, we examine different ways of initialization and normalization that may avoid the collapse problem. Asymmetric initializations may reduce the probability of collapse but do not totally eliminate it. |
Tasks | |
Published | 2018-08-15 |
URL | http://arxiv.org/abs/1808.04947v2 |
http://arxiv.org/pdf/1808.04947v2.pdf | |
PWC | https://paperswithcode.com/paper/collapse-of-deep-and-narrow-neural-nets |
Repo | https://github.com/ericpts/vae-res |
Framework | tf |
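The collapse is straightforward to reproduce. The PyTorch sketch below fits a deep, narrow ReLU network to y = x^2 with an MSE loss; with non-trivial probability over initializations, the trained network is nearly constant at the target's mean rather than the target itself. Depth, width, and optimizer settings are illustrative choices, not the paper's exact experiments.

```python
# Deep-and-narrow ReLU net on y = x^2: often collapses to the target mean.
import torch
import torch.nn as nn

torch.manual_seed(0)
width, depth = 2, 10                         # deliberately deep and narrow
dims = [1] + [width] * depth + [1]
layers = []
for i in range(len(dims) - 1):
    layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
net = nn.Sequential(*layers[:-1])            # no ReLU on the output

x = torch.linspace(-1, 1, 200).unsqueeze(1)
y = x ** 2
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
# Inspect net(x): a collapsed run sits near y.mean() ~ 0.33 everywhere.
```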
Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow
Title | Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow |
Authors | Qiao Zheng, Hervé Delingette, Nicholas Ayache |
Abstract | We propose a method to classify cardiac pathology based on a novel approach to extracting image-derived features that characterize the shape and motion of the heart. An original semi-supervised learning procedure, which makes efficient use of a large amount of non-segmented images and a small amount of images segmented manually by experts, is developed to generate pixel-wise apparent flow between two time points of a 2D+t cine MRI image sequence. Combining the apparent flow maps and cardiac segmentation masks, we obtain a local apparent flow corresponding to the 2D motion of the myocardium and ventricular cavities. This leads to the generation of time series of the radius and thickness of myocardial segments to represent cardiac motion. These time series of motion features are reliable and explainable characteristics of pathological cardiac motion. Furthermore, they are combined with shape-related features to classify cardiac pathologies. Using only nine feature values as input, we propose an explainable, simple and flexible model for pathology classification. The model achieves classification accuracies of 95% on the ACDC training set and 94% on the testing set; its performance is hence comparable to the state-of-the-art. Comparison with various other models is performed to outline some advantages of our model. |
Tasks | Cardiac Segmentation, Time Series |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03433v2 |
http://arxiv.org/pdf/1811.03433v2.pdf | |
PWC | https://paperswithcode.com/paper/explainable-cardiac-pathology-classification |
Repo | https://github.com/julien-zheng/CardiacMotionFlow |
Framework | tf |
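Since the classifier consumes only nine feature values per subject, the final stage is deliberately simple. The sketch below stands in for it with logistic regression on synthetic placeholders; the paper's exact model and the real ACDC features are not reproduced here. ACDC distinguishes five classes (four pathologies plus normal).

```python
# Nine explainable features -> five-class prediction (placeholder data/model).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))                # nine motion/shape features per subject
y = rng.integers(0, 5, size=100)             # five ACDC classes
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:3]))
```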
Single Shot Scene Text Retrieval
Title | Single Shot Scene Text Retrieval |
Authors | Lluís Gómez, Andrés Mafla, Marçal Rusiñol, Dimosthenis Karatzas |
Abstract | Textual information found in scene images provides high-level semantic information about the image and its context, and it can be leveraged for better scene understanding. In this paper we address the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. The novelty of the proposed model lies in the use of a single-shot CNN architecture that predicts bounding boxes and a compact text representation of the words within them at the same time. In this way, the text-based image retrieval task can be cast as a simple nearest-neighbor search of the query text representation over the outputs of the CNN over the entire image database. Our experiments demonstrate that the proposed architecture outperforms the previous state-of-the-art while offering a significant increase in processing speed. |
Tasks | Image Retrieval, Scene Understanding |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.09044v1 |
http://arxiv.org/pdf/1808.09044v1.pdf | |
PWC | https://paperswithcode.com/paper/single-shot-scene-text-retrieval |
Repo | https://github.com/lluisgomez/single-shot-str |
Framework | tf |
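Once the CNN has emitted a compact representation for every detected word in every image, retrieval reduces to nearest-neighbor search against the embedded query, as the abstract notes. The character-count embedding below is a toy stand-in for the paper's text representation, and the database is invented.

```python
# Query-by-text as nearest-neighbour search over per-word embeddings (sketch).
import numpy as np

def embed(word):
    """Toy character-frequency embedding (stand-in for the paper's descriptor)."""
    v = np.zeros(26)
    for c in word.lower():
        if c.isalpha():
            v[ord(c) - ord('a')] += 1
    return v / (np.linalg.norm(v) + 1e-8)

# Hypothetical database: per image, embeddings of the words the CNN detected.
db = {'img1': [embed('exit'), embed('coffee')],
      'img2': [embed('parking')],
      'img3': [embed('exlt')]}               # a noisy detection of "exit"

q = embed('exit')
scores = {img: max(float(q @ w) for w in words) for img, words in db.items()}
ranked = sorted(scores, key=scores.get, reverse=True)   # best match first
```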
BOLD5000: A public fMRI dataset of 5000 images
Title | BOLD5000: A public fMRI dataset of 5000 images |
Authors | Nadine Chang, John A. Pyles, Abhinav Gupta, Michael J. Tarr, Elissa M. Aminoff |
Abstract | Vision science, particularly machine vision, has been revolutionized by the introduction of large-scale image datasets and statistical learning approaches. Yet human neuroimaging studies of visual perception still rely on small numbers of images (around 100) due to time-constrained experimental procedures. To apply statistical learning approaches that integrate neuroscience, the number of images used in neuroimaging must be significantly increased. We present BOLD5000, a human functional MRI (fMRI) study that includes almost 5,000 distinct images depicting real-world scenes. Beyond dramatically increasing image dataset size relative to prior fMRI studies, BOLD5000 also accounts for image diversity, overlapping with standard computer vision datasets by incorporating images from the Scene UNderstanding (SUN), Common Objects in Context (COCO), and ImageNet datasets. The scale and diversity of these image datasets, combined with a slow event-related fMRI design, enable fine-grained exploration into the neural representation of a wide range of visual features, categories, and semantics. Concurrently, BOLD5000 brings us closer to realizing Marr's dream of a singular vision science - the intertwined study of biological and computer vision. |
Tasks | Scene Understanding |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01281v1 |
http://arxiv.org/pdf/1809.01281v1.pdf | |
PWC | https://paperswithcode.com/paper/bold5000-a-public-fmri-dataset-of-5000-images |
Repo | https://github.com/nchang430/BOLD5000-Scripts |
Framework | none |