October 21, 2019

3025 words 15 mins read

Paper Group AWR 107

CVABS: Moving Object Segmentation with Common Vector Approach for Videos. Unsupervised Learning of GMM with a Uniform Background Component. NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification. Triad-based Neural Network for Coreference Resolution. Music Transformer. FastNet. Inferring network …

CVABS: Moving Object Segmentation with Common Vector Approach for Videos

Title CVABS: Moving Object Segmentation with Common Vector Approach for Videos
Authors Şahin Işık, Kemal Özkan, Ömer Nezih Gerek
Abstract Background modelling is a fundamental step for several real-time computer vision applications, such as security systems and monitoring. An accurate background model helps detect the activity of moving objects in the video. In this work, we have developed a new subspace-based background modelling algorithm using the concept of the Common Vector Approach with Gram-Schmidt orthogonalization. Once the background model, which involves the common characteristics of different views corresponding to the same scene, is acquired, a smart foreground detection and background updating procedure is applied based on dynamic control parameters. A variety of experiments are conducted on different problem types related to dynamic backgrounds. Several types of metrics are utilized as objective measures, and the obtained visual results are judged subjectively. We observed that the proposed method performs successfully on all problem types reported in the CDNet2014 dataset by updating the background frames with a self-learning feedback mechanism.
Tasks Semantic Segmentation
Published 2018-10-19
URL http://arxiv.org/abs/1810.08412v1
PDF http://arxiv.org/pdf/1810.08412v1.pdf
PWC https://paperswithcode.com/paper/cvabs-moving-object-segmentation-with-common
Repo https://github.com/isahhin/cvabs
Framework none
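
As an aside, here is a minimal numpy sketch of the common-vector idea: build an orthonormal basis of the difference subspace with Gram-Schmidt, remove the reference frame's component in that subspace to obtain the common (background) vector, and flag pixels that deviate from it as foreground. The function names, threshold, and toy data are illustrative; the authors' full CVABS pipeline additionally uses dynamic control parameters and a self-learning feedback update.

```python
import numpy as np

def common_vector(frames):
    """Estimate the common (background) vector of a set of frames via
    Gram-Schmidt orthogonalization of the difference subspace (illustrative sketch)."""
    X = np.asarray(frames, dtype=float)          # shape: (n_frames, n_pixels)
    ref = X[0]
    diffs = X[1:] - ref                          # difference vectors span the "difference subspace"
    basis = []                                   # Gram-Schmidt: orthonormal basis of that subspace
    for d in diffs:
        for b in basis:
            d = d - np.dot(d, b) * b
        n = np.linalg.norm(d)
        if n > 1e-10:
            basis.append(d / n)
    common = ref.copy()                          # remove the difference-subspace component
    for b in basis:
        common -= np.dot(common, b) * b
    return common

def foreground_mask(frame, common, threshold=30.0):
    """Label pixels that deviate from the common background vector (threshold is illustrative)."""
    return np.abs(frame - common) > threshold

# Toy usage: five noisy copies of a static background, one frame with a synthetic object
rng = np.random.default_rng(0)
bg = rng.uniform(0, 255, size=64 * 64)
frames = [bg + rng.normal(0, 2, bg.shape) for _ in range(5)]
cv = common_vector(frames)
test = bg.copy(); test[1000:1100] += 120
print(foreground_mask(test, cv).sum())           # roughly 100 pixels flagged
```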

Unsupervised Learning of GMM with a Uniform Background Component

Title Unsupervised Learning of GMM with a Uniform Background Component
Authors Sida Liu, Adrian Barbu
Abstract Gaussian Mixture Models are one of the most studied and mature models in unsupervised learning. However, outliers are often present in the data and can influence the cluster estimation. In this paper, we study a new model that assumes the data come from a mixture of a number of Gaussians as well as a uniform "background" component assumed to contain outliers and other non-interesting observations. We develop a novel method based on robust loss minimization that performs well in clustering such GMMs with a uniform background. We give theoretical guarantees that our clustering algorithm obtains the best clustering results with high probability. Besides, we show that the result of our algorithm does not depend on initialization or local optima, and that parameter tuning is an easy task. Through numerical simulations, we demonstrate that our algorithm enjoys high accuracy and achieves the best clustering results given a large enough sample size. Finally, experimental comparisons with typical clustering methods on real datasets demonstrate the potential of our algorithm in real applications.
Tasks
Published 2018-04-08
URL https://arxiv.org/abs/1804.02744v4
PDF https://arxiv.org/pdf/1804.02744v4.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-of-mixture-models-with
Repo https://github.com/newstar1993/CRLM
Framework none
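
For orientation, here is a hedged EM sketch of the model class described above: k spherical Gaussians plus a uniform "background" component over the data's bounding box that absorbs outliers. The paper's own algorithm is based on robust loss minimization rather than EM, so this only illustrates the mixture model, not the proposed method.

```python
import numpy as np

def gmm_with_uniform_background(X, k, n_iter=100, seed=0):
    """EM sketch for a k-component spherical GMM plus a uniform 'background' component.
    (Illustrates the model class only; the paper uses robust loss minimization instead of EM.)"""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    vol = np.prod(X.max(0) - X.min(0))                 # support volume of the uniform component
    mu = X[rng.choice(n, k, replace=False)]            # initial means
    var = np.full(k, X.var())                          # spherical variances
    pi = np.full(k + 1, 1.0 / (k + 1))                 # mixing weights; last entry = background
    for _ in range(n_iter):
        # E-step: responsibilities for the k Gaussians and the uniform background
        dist2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)          # (n, k)
        gauss = np.exp(-0.5 * dist2 / var) / (2 * np.pi * var) ** (d / 2)
        dens = np.column_stack([pi[:k] * gauss, np.full(n, pi[k] / vol)])
        resp = dens / dens.sum(1, keepdims=True)
        # M-step: update means, variances and mixing weights
        nk = resp[:, :k].sum(0)
        mu = (resp[:, :k].T @ X) / nk[:, None]
        dist2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        var = (resp[:, :k] * dist2).sum(0) / (nk * d)
        pi = resp.sum(0) / n
    return mu, var, pi, resp

# Toy usage: two tight clusters plus uniformly scattered outliers
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (200, 2)), rng.normal(4, 0.3, (200, 2)),
               rng.uniform(-2, 6, (50, 2))])
mu, var, pi, resp = gmm_with_uniform_background(X, k=2)
print(np.round(mu, 2), np.round(pi, 2))
```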

NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification

Title NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification
Authors Rongcheng Lin, Jing Xiao, Jianping Fan
Abstract This paper introduces a fast and efficient network architecture, NeXtVLAD, to aggregate frame-level features into a compact feature vector for large-scale video classification. Briefly speaking, the basic idea is to decompose a high-dimensional feature into a group of relatively low-dimensional vectors with attention before applying NetVLAD aggregation over time. This NeXtVLAD approach turns out to be both effective and parameter-efficient in aggregating temporal information. In the 2nd YouTube-8M video understanding challenge, a single NeXtVLAD model with fewer than 80M parameters achieves a GAP score of 0.87846 on the private leaderboard. A mixture of 3 NeXtVLAD models achieves 0.88722, ranking 3rd out of 394 teams. The code is publicly available at https://github.com/linrongc/youtube-8m.
Tasks Video Classification, Video Understanding
Published 2018-11-12
URL http://arxiv.org/abs/1811.05014v1
PDF http://arxiv.org/pdf/1811.05014v1.pdf
PWC https://paperswithcode.com/paper/nextvlad-an-efficient-neural-network-to
Repo https://github.com/linrongc/youtube-8m
Framework tf
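
A numpy sketch of the aggregation idea follows: the frame feature is expanded and decomposed into low-dimensional group vectors, a sigmoid attention weight is computed per group, and attention-weighted VLAD residuals are summed over time and groups into one compact descriptor. The weight shapes and random toy inputs are illustrative; the released TensorFlow code linked above is the authoritative implementation.

```python
import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis, keepdims=True)); return e / e.sum(axis, keepdims=True)

def nextvlad(X, W_expand, W_attn, W_assign, centers, groups):
    """Forward-pass sketch of NeXtVLAD-style aggregation (weight shapes are illustrative).
    X: (T, D) frame features; centers: (K, Dl) cluster centers."""
    T, D = X.shape
    K, Dl = centers.shape
    Xe = X @ W_expand                            # (T, groups*Dl) expanded feature
    Xg = Xe.reshape(T, groups, Dl)               # decompose into low-dimensional group vectors
    attn = sigmoid(X @ W_attn)                   # (T, groups) per-group attention
    assign = softmax((Xe @ W_assign).reshape(T, groups, K), axis=-1)  # soft cluster assignment
    w = attn[..., None] * assign                 # (T, groups, K) combined weights
    # weighted residual aggregation over time and groups: sum_{t,g} w * (x_{tg} - c_k)
    vlad = np.einsum('tgk,tgd->kd', w, Xg) - w.sum((0, 1))[:, None] * centers
    return vlad.reshape(-1)                      # compact video-level descriptor

# Toy usage with random weights
rng = np.random.default_rng(0)
T, D, G, K, Dl = 30, 128, 8, 16, 32
X = rng.normal(size=(T, D))
v = nextvlad(X, rng.normal(size=(D, G * Dl)), rng.normal(size=(D, G)),
             rng.normal(size=(G * Dl, G * K)), rng.normal(size=(K, Dl)), G)
print(v.shape)                                   # (K*Dl,) = (512,)
```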

Triad-based Neural Network for Coreference Resolution

Title Triad-based Neural Network for Coreference Resolution
Authors Yuanliang Meng, Anna Rumshisky
Abstract We propose a triad-based neural network system that generates affinity scores between entity mentions for coreference resolution. The system simultaneously accepts three mentions as input, taking the mutual dependency and logical constraints of all three mentions into account, and thus makes more accurate predictions than the traditional pairwise approach. Depending on system choices, the affinity scores can be further used in clustering or mention ranking. Our experiments show that a standard hierarchical clustering using the scores produces state-of-the-art results with gold mentions on the English portion of the CoNLL 2012 Shared Task. The model does not rely on many handcrafted features and is easy to train and use. The triads can also be easily extended to polyads of higher orders. To our knowledge, this is the first neural network system to model the mutual dependency of more than two members at the mention level.
Tasks Coreference Resolution
Published 2018-09-18
URL http://arxiv.org/abs/1809.06491v1
PDF http://arxiv.org/pdf/1809.06491v1.pdf
PWC https://paperswithcode.com/paper/triad-based-neural-network-for-coreference
Repo https://github.com/text-machine-lab/entity-coref
Framework tf
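
A simplified sketch of how triad scores can feed standard hierarchical clustering: pairwise affinities are obtained by averaging the triad model's output for a mention pair over all possible third mentions, and scipy's hierarchical clustering is run on the resulting distances. The `triad_score` callable, the averaging rule, and the dummy scorer are stand-ins for the trained network and the paper's exact procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def triad_to_pairwise(triad_score, mentions):
    """Average the triad model's pairwise output over all third mentions (simplified aggregation)."""
    n = len(mentions)
    aff, cnt = np.zeros((n, n)), np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k in (i, j):
                    continue
                aff[i, j] += triad_score(mentions[i], mentions[j], mentions[k])
                cnt[i, j] += 1
    aff = np.where(cnt > 0, aff / np.maximum(cnt, 1), 0.0)
    return aff + aff.T

def cluster_mentions(affinity, threshold=0.5):
    """Standard hierarchical clustering on (1 - affinity) distances, as in the paper's pipeline."""
    dist = squareform(1.0 - affinity, checks=False)
    Z = linkage(dist, method='average')
    return fcluster(Z, t=1.0 - threshold, criterion='distance')

# Toy usage with a dummy scorer: mentions of the same entity get affinity 1, others 0
mentions = ['Obama', 'he', 'Obama', 'Paris', 'the city']
dummy = lambda a, b, c: 1.0 if (a == b or {a, b} <= {'Obama', 'he'} or {a, b} <= {'Paris', 'the city'}) else 0.0
print(cluster_mentions(triad_to_pairwise(dummy, mentions)))   # e.g. [1 1 1 2 2]
```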

Music Transformer

Title Music Transformer
Authors Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck
Abstract Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to reusing of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani et al., 2017), a sequence model based on self-attention, has achieved compelling results in many generation tasks that require maintaining long-range coherence. This suggests that self-attention might also be well-suited to modeling music. In musical composition and performance, however, relative timing is critically important. Existing approaches for representing relative positional information in the Transformer modulate attention based on pairwise distance (Shaw et al., 2018). This is impractical for long sequences such as musical compositions since their memory complexity for intermediate relative information is quadratic in the sequence length. We propose an algorithm that reduces their intermediate memory requirement to linear in the sequence length. This enables us to demonstrate that a Transformer with our modified relative attention mechanism can generate minute-long compositions (thousands of steps, four times the length modeled in Oore et al., 2018) with compelling structure, generate continuations that coherently elaborate on a given motif, and in a seq2seq setup generate accompaniments conditioned on melodies. We evaluate the Transformer with our relative attention mechanism on two datasets, JSB Chorales and Piano-e-Competition, and obtain state-of-the-art results on the latter.
Tasks Music Modeling
Published 2018-09-12
URL http://arxiv.org/abs/1809.04281v3
PDF http://arxiv.org/pdf/1809.04281v3.pdf
PWC https://paperswithcode.com/paper/music-transformer
Repo https://github.com/scpark20/Music-GPT-2
Framework tf
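
Below is a numpy sketch of the memory-efficient relative attention ("skewing") trick that reduces the intermediate relative-position memory from quadratic to linear in the sequence length: relative logits are computed against an (L, D) embedding table and then skewed into alignment, instead of materialising an (L, L, D) tensor. The indexing convention for the embedding table and the single-head, unbatched setup are assumptions of this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis, keepdims=True)); return e / e.sum(axis, keepdims=True)

def relative_attention(Q, K, V, Er):
    """Relative self-attention with the memory-efficient skewing trick (sketch).
    Er's last row is assumed to embed relative distance 0, the row before it distance -1, etc."""
    L, D = Q.shape
    logits = Q @ K.T                              # (L, L) content-based logits
    rel = Q @ Er.T                                # (L, L) relative logits, not yet aligned
    # skew: pad one column on the left, reshape to (L+1, L), drop the first row;
    # afterwards skewed[i, j] holds the logit for relative distance (j - i) for j <= i
    padded = np.pad(rel, ((0, 0), (1, 0)))
    skewed = padded.reshape(L + 1, L)[1:]
    mask = np.triu(np.full((L, L), -1e9), k=1)    # causal mask: only attend to positions j <= i
    att = softmax((logits + skewed) / np.sqrt(D) + mask, axis=-1)
    return att @ V

# Toy usage
rng = np.random.default_rng(0)
L, D = 6, 8
Q, K, V, Er = (rng.normal(size=(L, D)) for _ in range(4))
print(relative_attention(Q, K, V, Er).shape)      # (6, 8)
```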

FastNet

Title FastNet
Authors John Olafenwa, Moses Olafenwa
Abstract Inception and the ResNet family of Convolutional Neural Network architectures have broken records in the past few years, but recent state-of-the-art models have also incurred very high computational cost in terms of training, inference and model size, making the deployment of these models on edge devices impractical. In light of this, we present a novel architecture that is designed for high computational efficiency on both GPUs and CPUs, and is highly suited for deployment on mobile applications, smart cameras, IoT devices and controllers as well as low-cost drones. Our architecture achieves competitive accuracy on standard datasets, even outperforming the original ResNet. We present below the motivation for this research, the architecture of the network, single test accuracies on CIFAR-10 and CIFAR-100, a detailed comparison with other well-known architectures, and a link to an implementation in Keras.
Tasks
Published 2018-01-17
URL http://arxiv.org/abs/1802.02186v1
PDF http://arxiv.org/pdf/1802.02186v1.pdf
PWC https://paperswithcode.com/paper/fastnet
Repo https://github.com/johnolafenwa/FastNet
Framework tf

Inferring network connectivity from event timing patterns

Title Inferring network connectivity from event timing patterns
Authors Jose Casadiego, Dimitra Maoutsa, Marc Timme
Abstract Reconstructing network connectivity from the collective dynamics of a system typically requires access to its complete continuous-time evolution, which is often experimentally inaccessible. Here we propose a theory for revealing the physical connectivity of networked systems only from the event time series their intrinsic collective dynamics generate. Representing the patterns of event timings in an event space spanned by inter-event and cross-event intervals, we reveal which other units directly influence the inter-event times of any given unit. For illustration, we linearize an event space mapping constructed from the spiking patterns in model neural circuits to reveal the presence or absence of synapses between any pair of neurons as well as whether the coupling acts in an inhibitory or excitatory manner. The proposed model-independent reconstruction theory is scalable to larger networks and may thus play an important role in the reconstruction of networks from biology to social science and engineering.
Tasks Time Series
Published 2018-03-27
URL http://arxiv.org/abs/1803.09974v2
PDF http://arxiv.org/pdf/1803.09974v2.pdf
PWC https://paperswithcode.com/paper/inferring-network-connectivity-from-event
Repo https://github.com/networkinference/ESL
Framework none
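
As a rough illustration of the event-space idea, the sketch below regresses each unit's inter-event intervals on the cross-event intervals to all other units and reads connectivity off the coefficients. This is a simplified linear-regression stand-in for the paper's linearized event-space mapping; the feature construction and ridge penalty are assumptions of the sketch.

```python
import numpy as np

def infer_connectivity(spike_times, n_units, ridge=1e-3):
    """For each unit, regress its inter-event intervals on the cross-event intervals to the
    other units; large coefficients suggest a direct influence (simplified illustration)."""
    W = np.zeros((n_units, n_units))
    for i in range(n_units):
        t_i = spike_times[i]
        rows, targets = [], []
        for idx in range(1, len(t_i)):
            isi = t_i[idx] - t_i[idx - 1]                 # inter-event interval of unit i
            feats = []
            for j in range(n_units):
                if j == i:
                    feats.append(0.0); continue
                t_j = spike_times[j]
                prev = t_j[t_j < t_i[idx]]                # most recent event of unit j
                feats.append(t_i[idx] - prev[-1] if len(prev) else 0.0)  # cross-event interval
            rows.append(feats); targets.append(isi)
        A, y = np.asarray(rows), np.asarray(targets)
        W[i] = np.linalg.solve(A.T @ A + ridge * np.eye(n_units), A.T @ y)  # ridge regression
    return W

# Toy usage: random spike trains (no real structure, just to show the shapes)
rng = np.random.default_rng(0)
spikes = [np.sort(rng.uniform(0, 100, size=200)) for _ in range(4)]
print(np.round(infer_connectivity(spikes, 4), 3))
```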

Conditional Generative Adversarial and Convolutional Networks for X-ray Breast Mass Segmentation and Shape Classification

Title Conditional Generative Adversarial and Convolutional Networks for X-ray Breast Mass Segmentation and Shape Classification
Authors Vivek Kumar Singh, Santiago Romani, Hatem A. Rashwan, Farhan Akram, Nidhi Pandey, Md. Mostafa Kamal Sarker, Jordina Torrents Barrena, Saddam Abdulwahab, Adel Saleh, Miguel Arquez, Meritxell Arenas, Domenec Puig
Abstract This paper proposes a novel approach based on conditional Generative Adversarial Networks (cGAN) for breast mass segmentation in mammography. We hypothesized that the cGAN structure is well-suited to accurately outline the mass area, especially when the training data is limited. The generative network learns intrinsic features of tumors while the adversarial network enforces segmentations to be similar to the ground truth. Experiments performed on dozens of malignant tumors extracted from the public DDSM dataset and from our in-house private dataset confirm our hypothesis with very high Dice coefficient and Jaccard index (>94% and >89%, respectively), outperforming the scores obtained by other state-of-the-art approaches. Furthermore, in order to portray significant morphological features of the segmented tumor, a specific Convolutional Neural Network (CNN) has also been designed for classifying the segmented tumor areas into four types (irregular, lobular, oval and round), which provides an overall accuracy of about 72% with the DDSM dataset.
Tasks
Published 2018-05-25
URL http://arxiv.org/abs/1805.10207v2
PDF http://arxiv.org/pdf/1805.10207v2.pdf
PWC https://paperswithcode.com/paper/conditional-generative-adversarial-and
Repo https://github.com/ankit-ai/GAN_breast_mammography_segmentation
Framework tf
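
For reference, the two overlap metrics quoted above (Dice coefficient and Jaccard index) can be computed from binary segmentation masks as follows; the toy masks are illustrative.

```python
import numpy as np

def dice_and_jaccard(pred, gt):
    """Dice coefficient and Jaccard index for binary masks (the two metrics reported above)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum())
    jaccard = inter / np.logical_or(pred, gt).sum()
    return dice, jaccard

# Toy usage with two overlapping rectangular masks
pred = np.zeros((64, 64), bool); pred[10:40, 10:40] = True
gt = np.zeros((64, 64), bool); gt[12:40, 12:42] = True
print(dice_and_jaccard(pred, gt))
```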

ADVIO: An authentic dataset for visual-inertial odometry

Title ADVIO: An authentic dataset for visual-inertial odometry
Authors Santiago Cortés, Arno Solin, Esa Rahtu, Juho Kannala
Abstract The lack of realistic and open benchmarking datasets for pedestrian visual-inertial odometry has made it hard to pinpoint differences in published methods. Existing datasets either lack a full six-degree-of-freedom ground truth or are limited to small spaces with optical tracking systems. We take advantage of advances in pure inertial navigation, and develop a set of versatile and challenging real-world computer vision benchmark sets for visual-inertial odometry. For this purpose, we have built a test rig equipped with an iPhone, a Google Pixel Android phone, and a Google Tango device. We provide a wide range of raw sensor data that is accessible on almost any modern-day smartphone together with a high-quality ground-truth track. We also compare the resulting visual-inertial tracks from Google Tango, ARCore, and Apple ARKit with two recent methods published in academic forums. The data sets cover both indoor and outdoor cases, with stairs, escalators, elevators, office environments, a shopping mall, and a metro station.
Tasks
Published 2018-07-25
URL http://arxiv.org/abs/1807.09828v1
PDF http://arxiv.org/pdf/1807.09828v1.pdf
PWC https://paperswithcode.com/paper/advio-an-authentic-dataset-for-visual
Repo https://github.com/AaltoVision/ADVIO
Framework none

Unsupervised Neural Text Simplification

Title Unsupervised Neural Text Simplification
Authors Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, Karthik Sankaranarayanan
Abstract The paper presents a first attempt towards unsupervised neural text simplification that relies only on unlabeled text corpora. The core framework is composed of a shared encoder and a pair of attentional decoders, and gains knowledge of simplification through discrimination-based losses and denoising. The framework is trained using unlabeled text collected from the en-Wikipedia dump. Our analysis (both quantitative and qualitative, involving human evaluators) on public test data shows that the proposed model can perform text simplification at both the lexical and syntactic levels, and is competitive with existing supervised methods. Adding a few labelled pairs improves performance further.
Tasks Denoising, Text Simplification
Published 2018-10-18
URL https://arxiv.org/abs/1810.07931v6
PDF https://arxiv.org/pdf/1810.07931v6.pdf
PWC https://paperswithcode.com/paper/unsupervised-neural-text-simplification
Repo https://github.com/subramanyamdvss/UnsupNTS
Framework pytorch
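
The denoising part of such unsupervised encoder-decoder training corrupts the input and asks the model to reconstruct it. A minimal corruption sketch (random word drops plus local shuffling) is shown below; the exact noise model and its parameters in the paper may differ.

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3, seed=None):
    """Denoising-style corruption of a token sequence (word dropping plus local shuffling),
    of the kind used to train unsupervised encoder-decoder models; parameters are illustrative."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() > drop_prob]        # random word drops
    # local shuffle: each surviving token moves at most `shuffle_window` positions
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

print(add_noise("the quick brown fox jumps over the lazy dog".split(), seed=0))
```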

Deformable ConvNets v2: More Deformable, Better Results

Title Deformable ConvNets v2: More Deformable, Better Results
Authors Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai
Abstract The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation of Deformable ConvNets that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training. The modeling power is enhanced through a more comprehensive integration of deformable convolution within the network, and by introducing a modulation mechanism that expands the scope of deformation modeling. To effectively harness this enriched modeling capability, we guide network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features. With the proposed contributions, this new version of Deformable ConvNets yields significant performance gains over the original model and produces leading results on the COCO benchmark for object detection and instance segmentation.
Tasks Instance Segmentation, Object Detection, Semantic Segmentation
Published 2018-11-27
URL http://arxiv.org/abs/1811.11168v2
PDF http://arxiv.org/pdf/1811.11168v2.pdf
PWC https://paperswithcode.com/paper/deformable-convnets-v2-more-deformable-better
Repo https://github.com/qilei123/DeformableConvV2
Framework mxnet
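
Here is a single-location, single-channel numpy sketch of modulated deformable convolution, the building block DCNv2 adds: each 3x3 kernel tap samples the input at its regular grid position plus a learned offset via bilinear interpolation, and its contribution is scaled by a learned modulation scalar. The offsets, modulation masks, and weights are random placeholders here; in the real network they are predicted by extra convolutional layers.

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinear sampling of a 2D array at a fractional location (zero outside the image)."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < H and 0 <= xx < W:
                val += wy * wx * img[yy, xx]
    return val

def modulated_deform_conv_at(img, weights, offsets, masks, py, px):
    """Modulated deformable 3x3 convolution at one output location (single-channel sketch).
    Each tap samples at its grid position plus a learned offset, scaled by a modulation scalar."""
    out, k = 0.0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            oy, ox = offsets[k]
            out += weights[k] * masks[k] * bilinear(img, py + dy + oy, px + dx + ox)
            k += 1
    return out

# Toy usage with random parameters
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))
print(modulated_deform_conv_at(img, rng.normal(size=9),
                               rng.normal(scale=0.5, size=(9, 2)),
                               rng.uniform(size=9), 4, 4))
```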

Collapse of Deep and Narrow Neural Nets

Title Collapse of Deep and Narrow Neural Nets
Authors Lu Lu, Yanhui Su, George Em Karniadakis
Abstract Recent theoretical work has demonstrated that deep neural networks have superior performance over shallow networks, but their training is more difficult, e.g., they suffer from the vanishing gradient problem. This problem can be typically resolved by the rectified linear unit (ReLU) activation. However, here we show that even for such activation, deep and narrow neural networks (NNs) will converge to erroneous mean or median states of the target function depending on the loss with high probability. Deep and narrow NNs are encountered in solving partial differential equations with high-order derivatives. We demonstrate this collapse of such NNs both numerically and theoretically, and provide estimates of the probability of collapse. We also construct a diagram of a safe region for designing NNs that avoid the collapse to erroneous states. Finally, we examine different ways of initialization and normalization that may avoid the collapse problem. Asymmetric initializations may reduce the probability of collapse but do not totally eliminate it.
Tasks
Published 2018-08-15
URL http://arxiv.org/abs/1808.04947v2
PDF http://arxiv.org/pdf/1808.04947v2.pdf
PWC https://paperswithcode.com/paper/collapse-of-deep-and-narrow-neural-nets
Repo https://github.com/ericpts/vae-res
Framework tf

Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow

Title Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow
Authors Qiao Zheng, Hervé Delingette, Nicholas Ayache
Abstract We propose a method to classify cardiac pathology based on a novel approach to extract image-derived features that characterize the shape and motion of the heart. An original semi-supervised learning procedure, which makes efficient use of a large amount of non-segmented images and a small amount of images segmented manually by experts, is developed to generate pixel-wise apparent flow between two time points of a 2D+t cine MRI image sequence. Combining the apparent flow maps and cardiac segmentation masks, we obtain a local apparent flow corresponding to the 2D motion of the myocardium and ventricular cavities. This leads to the generation of time series of the radius and thickness of myocardial segments to represent cardiac motion. These time series of motion features are reliable and explainable characteristics of pathological cardiac motion. Furthermore, they are combined with shape-related features to classify cardiac pathologies. Using only nine feature values as input, we propose an explainable, simple and flexible model for pathology classification. On the ACDC training and testing sets, the model achieves classification accuracies of 95% and 94%, respectively. Its performance is hence comparable to that of the state of the art. Comparison with various other models is performed to outline some advantages of our model.
Tasks Cardiac Segmentation, Time Series
Published 2018-11-08
URL http://arxiv.org/abs/1811.03433v2
PDF http://arxiv.org/pdf/1811.03433v2.pdf
PWC https://paperswithcode.com/paper/explainable-cardiac-pathology-classification
Repo https://github.com/julien-zheng/CardiacMotionFlow
Framework tf
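
As a trivial stand-in for the "explainable, simple and flexible" classifier over nine feature values, the sketch below fits a multinomial logistic regression on nine features per case. The random features and labels are placeholders; the paper's actual model and the ACDC motion/shape features differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hedged sketch: a simple multi-class classifier on nine motion/shape feature values per case.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 9))          # nine feature values per patient (placeholders)
y_train = rng.integers(0, 5, size=100)       # five ACDC pathology classes (placeholder labels)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.predict(rng.normal(size=(3, 9))))  # predicted classes for three new cases
```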

Single Shot Scene Text Retrieval

Title Single Shot Scene Text Retrieval
Authors Lluís Gómez, Andrés Mafla, Marçal Rusiñol, Dimosthenis Karatzas
Abstract Textual information found in scene images provides high-level semantic information about the image and its context, and it can be leveraged for better scene understanding. In this paper we address the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. The novelty of the proposed model consists in the usage of a single-shot CNN architecture that predicts, at the same time, bounding boxes and a compact text representation of the words within them. In this way, the text-based image retrieval task can be cast as a simple nearest-neighbor search of the query text representation over the outputs of the CNN over the entire image database. Our experiments demonstrate that the proposed architecture outperforms the previous state of the art while offering a significant increase in processing speed.
Tasks Image Retrieval, Scene Understanding
Published 2018-08-27
URL http://arxiv.org/abs/1808.09044v1
PDF http://arxiv.org/pdf/1808.09044v1.pdf
PWC https://paperswithcode.com/paper/single-shot-scene-text-retrieval
Repo https://github.com/lluisgomez/single-shot-str
Framework tf
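
A simplified sketch of the retrieval step: given the compact text representations the detector predicts for each image's words, a query is answered by a nearest-neighbor search over those embeddings and images are ranked by their best-matching word. The embedding dimensionality and the toy data are illustrative, and the real system's text representation (predicted jointly with the boxes) is only stood in for here by random vectors.

```python
import numpy as np

def retrieve(query_emb, image_word_embs):
    """Rank images by nearest-neighbor distance between the query text embedding and the
    word embeddings predicted for each image (simplified retrieval sketch)."""
    scores = []
    for embs in image_word_embs:                       # embs: (n_words_in_image, d)
        if len(embs) == 0:
            scores.append(np.inf); continue
        d = np.linalg.norm(embs - query_emb, axis=1)   # distance to every detected word
        scores.append(d.min())                         # best match in the image
    return np.argsort(scores)                          # image indices, best first

# Toy usage: 3 images with random word embeddings, query close to a word in image 1
rng = np.random.default_rng(0)
images = [rng.normal(size=(5, 16)) for _ in range(3)]
query = images[1][2] + rng.normal(scale=0.01, size=16)
print(retrieve(query, images))                         # image 1 should come first
```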

BOLD5000: A public fMRI dataset of 5000 images

Title BOLD5000: A public fMRI dataset of 5000 images
Authors Nadine Chang, John A. Pyles, Abhinav Gupta, Michael J. Tarr, Elissa M. Aminoff
Abstract Vision science, particularly machine vision, has been revolutionized by introducing large-scale image datasets and statistical learning approaches. Yet, human neuroimaging studies of visual perception still rely on small numbers of images (around 100) due to time-constrained experimental procedures. To apply statistical learning approaches that integrate neuroscience, the number of images used in neuroimaging must be significantly increased. We present BOLD5000, a human functional MRI (fMRI) study that includes almost 5,000 distinct images depicting real-world scenes. Beyond dramatically increasing image dataset size relative to prior fMRI studies, BOLD5000 also accounts for image diversity, overlapping with standard computer vision datasets by incorporating images from the Scene UNderstanding (SUN), Common Objects in Context (COCO), and ImageNet datasets. The scale and diversity of these image datasets, combined with a slow event-related fMRI design, enable fine-grained exploration into the neural representation of a wide range of visual features, categories, and semantics. Concurrently, BOLD5000 brings us closer to realizing Marr’s dream of a singular vision science - the intertwined study of biological and computer vision.
Tasks Scene Understanding
Published 2018-09-05
URL http://arxiv.org/abs/1809.01281v1
PDF http://arxiv.org/pdf/1809.01281v1.pdf
PWC https://paperswithcode.com/paper/bold5000-a-public-fmri-dataset-of-5000-images
Repo https://github.com/nchang430/BOLD5000-Scripts
Framework none