Paper Group AWR 107
CVABS: Moving Object Segmentation with Common Vector Approach for Videos
Title | CVABS: Moving Object Segmentation with Common Vector Approach for Videos |
Authors | Şahin Işık, Kemal Özkan, Ömer Nezih Gerek |
Abstract | Background modelling is a fundamental step for several real-time computer vision applications, such as security systems and monitoring. An accurate background model helps in detecting the activity of moving objects in the video. In this work, we have developed a new subspace-based background modelling algorithm using the Common Vector Approach with Gram-Schmidt orthogonalization. Once the background model capturing the common characteristics of different views of the same scene is acquired, a smart foreground detection and background updating procedure is applied based on dynamic control parameters. A variety of experiments are conducted on different problem types related to dynamic backgrounds. Several types of metrics are utilized as objective measures, and the obtained visual results are judged subjectively. We observed that the proposed method, which updates the background frames with a self-learning feedback mechanism, performs successfully on all problem types reported in the CDNet2014 dataset. |
Tasks | Semantic Segmentation |
Published | 2018-10-19 |
URL | http://arxiv.org/abs/1810.08412v1 |
http://arxiv.org/pdf/1810.08412v1.pdf | |
PWC | https://paperswithcode.com/paper/cvabs-moving-object-segmentation-with-common |
Repo | https://github.com/isahhin/cvabs |
Framework | none |
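To make the linear algebra concrete, here is a minimal NumPy sketch of the Common Vector Approach described in the abstract: the common vector is a reference frame minus its projection onto the Gram-Schmidt-orthogonalized difference subspace of the background frames. The threshold, toy data, and mask step are illustrative assumptions; the paper's dynamic control parameters and self-learning feedback are not modeled here.

```python
# Minimal sketch of CVA-based background modelling. The threshold and toy
# data below are hypothetical; the paper's update/feedback rules are omitted.
import numpy as np

def common_vector(frames):
    """frames: (n, d) array of n flattened background frames."""
    ref = frames[0]
    diffs = (frames[1:] - ref).T            # (d, n-1) difference subspace
    q, _ = np.linalg.qr(diffs)              # orthonormal basis (Gram-Schmidt)
    # Common vector = reference minus its projection onto the subspace.
    return ref - q @ (q.T @ ref)

def foreground_mask(frame, background, thresh=30.0):
    """Flag pixels deviating from the background model (hypothetical threshold)."""
    return np.abs(frame - background) > thresh

# Toy usage: ten noisy views of the same 32x32 scene.
rng = np.random.default_rng(0)
bg = rng.uniform(0, 255, size=1024)
frames = bg + rng.normal(0, 2, size=(10, 1024))
model = common_vector(frames)
mask = foreground_mask(frames[0] + 100 * (np.arange(1024) < 50), model)
```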
Unsupervised Learning of GMM with a Uniform Background Component
Title | Unsupervised Learning of GMM with a Uniform Background Component |
Authors | Sida Liu, Adrian Barbu |
Abstract | Gaussian Mixture Models are one of the most studied and mature models in unsupervised learning. However, outliers are often present in the data and can influence the cluster estimation. In this paper, we study a new model that assumes that data comes from a mixture of a number of Gaussians as well as a uniform "background" component assumed to contain outliers and other non-interesting observations. We develop a novel method based on robust loss minimization that performs well in clustering such GMMs with a uniform background. We give theoretical guarantees that our clustering algorithm obtains the best clustering results with high probability. Moreover, we show that the result of our algorithm does not depend on initialization or local optima, and that parameter tuning is easy. In numerical simulations, we demonstrate that our algorithm enjoys high accuracy and achieves the best clustering results given a large enough sample size. Finally, experimental comparisons with typical clustering methods on real datasets demonstrate the potential of our algorithm in real applications. |
Tasks | |
Published | 2018-04-08 |
URL | https://arxiv.org/abs/1804.02744v4 |
https://arxiv.org/pdf/1804.02744v4.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-mixture-models-with |
Repo | https://github.com/newstar1993/CRLM |
Framework | none |
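The generative model in the abstract (k Gaussians plus a uniform outlier component) is easy to write down; the sketch below fits it with plain EM in NumPy. Note the hedge: the paper's actual algorithm is based on robust loss minimization, not EM, so this is only a baseline illustration of the mixture being clustered.

```python
# EM for a mixture of k isotropic Gaussians plus one uniform "background"
# component. Illustrative only: the paper's method is robust loss
# minimization, not EM.
import numpy as np

def em_gmm_uniform(x, k, n_iter=100, seed=0):
    n, d = x.shape
    rng = np.random.default_rng(seed)
    vol = np.prod(x.max(0) - x.min(0))       # support volume of the uniform part
    mu = x[rng.choice(n, k, replace=False)]
    var = np.full(k, x.var())
    pi = np.full(k + 1, 1.0 / (k + 1))       # last weight = uniform component
    for _ in range(n_iter):
        r = np.empty((n, k + 1))             # E-step: responsibilities
        for j in range(k):
            sq = ((x - mu[j]) ** 2).sum(1)
            r[:, j] = pi[j] * np.exp(-0.5 * sq / var[j]) / (2 * np.pi * var[j]) ** (d / 2)
        r[:, k] = pi[k] / vol
        r /= r.sum(1, keepdims=True)
        nk = r.sum(0)                        # M-step
        pi = nk / n
        for j in range(k):
            mu[j] = (r[:, j:j + 1] * x).sum(0) / nk[j]
            var[j] = (r[:, j] * ((x - mu[j]) ** 2).sum(1)).sum() / (d * nk[j])
    return mu, var, pi, r

# Two well-separated clusters plus uniform outliers.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2)),
               rng.uniform(-10, 10, (30, 2))])
mu, var, pi, r = em_gmm_uniform(x, k=2)
```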
NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification
Title | NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification |
Authors | Rongcheng Lin, Jing Xiao, Jianping Fan |
Abstract | This paper introduces a fast and efficient network architecture, NeXtVLAD, to aggregate frame-level features into a compact feature vector for large-scale video classification. The basic idea is to decompose a high-dimensional feature into a group of relatively low-dimensional vectors with attention before applying NetVLAD aggregation over time. This NeXtVLAD approach turns out to be both effective and parameter-efficient in aggregating temporal information. In the 2nd Youtube-8M video understanding challenge, a single NeXtVLAD model with fewer than 80M parameters achieves a GAP score of 0.87846 on the private leaderboard. A mixture of 3 NeXtVLAD models results in 0.88722, ranked 3rd among 394 teams. The code is publicly available at https://github.com/linrongc/youtube-8m. |
Tasks | Video Classification, Video Understanding |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.05014v1 |
http://arxiv.org/pdf/1811.05014v1.pdf | |
PWC | https://paperswithcode.com/paper/nextvlad-an-efficient-neural-network-to |
Repo | https://github.com/linrongc/youtube-8m |
Framework | tf |
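The abstract's one-sentence recipe, expand the frame feature, split it into groups, gate each group with attention, then aggregate NetVLAD-style residuals over time, can be sketched directly. All projections and cluster centers below are random placeholders for learned parameters, and the dimensions are assumptions rather than the paper's configuration.

```python
# NumPy sketch of the NeXtVLAD idea: grouped, attention-gated NetVLAD
# aggregation over time. Projections/centers are random placeholders.
import numpy as np

def nextvlad_aggregate(x, G=4, K=8, expand=2, seed=0):
    """x: (T, N) frame-level features -> compact video-level descriptor."""
    T, N = x.shape
    rng = np.random.default_rng(seed)
    xe = x @ rng.normal(size=(N, expand * N))       # expand to lambda*N dims
    d = expand * N // G
    groups = xe.reshape(T, G, d)                    # decompose into G groups
    att = 1 / (1 + np.exp(-(xe @ rng.normal(size=(expand * N, G)))))  # (T, G) gates
    logits = groups @ rng.normal(size=(d, K))       # soft cluster assignment
    a = np.exp(logits - logits.max(-1, keepdims=True))
    a /= a.sum(-1, keepdims=True)                   # (T, G, K)
    centers = rng.normal(size=(K, d))
    resid = groups[:, :, None, :] - centers         # (T, G, K, d) residuals
    w = (att[:, :, None] * a)[..., None]            # attention-weighted assignments
    return (w * resid).sum((0, 1)).reshape(-1)      # (K * d,) descriptor

v = nextvlad_aggregate(np.random.default_rng(1).normal(size=(30, 128)))
```

With N=128, expand=2, G=4 and K=8 this yields a 512-dimensional descriptor, which illustrates why grouping keeps the aggregated output compact.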
Triad-based Neural Network for Coreference Resolution
Title | Triad-based Neural Network for Coreference Resolution |
Authors | Yuanliang Meng, Anna Rumshisky |
Abstract | We propose a triad-based neural network system that generates affinity scores between entity mentions for coreference resolution. The system accepts three mentions as input simultaneously, taking the mutual dependency and logical constraints of all three mentions into account, and thus makes more accurate predictions than the traditional pairwise approach. Depending on system choices, the affinity scores can be further used in clustering or mention ranking. Our experiments show that standard hierarchical clustering using the scores produces state-of-the-art results with gold mentions on the English portion of the CoNLL 2012 Shared Task. The model does not rely on many handcrafted features and is easy to train and use. The triads can also be easily extended to polyads of higher orders. To our knowledge, this is the first neural network system to model the mutual dependency of more than two members at the mention level. |
Tasks | Coreference Resolution |
Published | 2018-09-18 |
URL | http://arxiv.org/abs/1809.06491v1 |
http://arxiv.org/pdf/1809.06491v1.pdf | |
PWC | https://paperswithcode.com/paper/triad-based-neural-network-for-coreference |
Repo | https://github.com/text-machine-lab/entity-coref |
Framework | tf |
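One way to read the triad idea is sketched below: a scorer looks at three mentions at once, pairwise affinities are obtained by averaging a pair's triad scores over every choice of third mention, and standard hierarchical clustering does the rest. The scorer here is a random placeholder for the paper's neural network; only the data flow is illustrated.

```python
# Triad scores -> pairwise affinities -> hierarchical clustering (sketch).
# `triad_score` is a placeholder for the paper's neural scorer.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def triad_score(ei, ej, ek, w):
    """Placeholder: affinity of mentions (i, j) in the context of mention k."""
    return 1 / (1 + np.exp(-np.concatenate([ei, ej, ek]) @ w))

def pairwise_affinity(emb, w):
    n = len(emb)
    aff = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Average the (i, j) affinity over all choices of third mention.
            s = [triad_score(emb[i], emb[j], emb[k], w)
                 for k in range(n) if k not in (i, j)]
            aff[i, j] = aff[j, i] = np.mean(s)
    return aff

rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 16))               # 6 mentions, 16-dim embeddings
aff = pairwise_affinity(emb, rng.normal(size=48))
dist = squareform(1 - aff, checks=False)     # affinities -> condensed distances
labels = fcluster(linkage(dist, method='average'), t=0.5, criterion='distance')
```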
Music Transformer
Title | Music Transformer |
Authors | Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck |
Abstract | Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to the reuse of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani et al., 2017), a sequence model based on self-attention, has achieved compelling results in many generation tasks that require maintaining long-range coherence. This suggests that self-attention might also be well-suited to modeling music. In musical composition and performance, however, relative timing is critically important. Existing approaches for representing relative positional information in the Transformer modulate attention based on pairwise distance (Shaw et al., 2018). This is impractical for long sequences such as musical compositions, since their memory complexity for intermediate relative information is quadratic in the sequence length. We propose an algorithm that reduces the intermediate memory requirement to linear in the sequence length. This enables us to demonstrate that a Transformer with our modified relative attention mechanism can generate minute-long compositions (thousands of steps, four times the length modeled in Oore et al., 2018) with compelling structure, generate continuations that coherently elaborate on a given motif, and, in a seq2seq setup, generate accompaniments conditioned on melodies. We evaluate the Transformer with our relative attention mechanism on two datasets, JSB Chorales and Piano-e-Competition, and obtain state-of-the-art results on the latter. |
Tasks | Music Modeling |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04281v3 |
http://arxiv.org/pdf/1809.04281v3.pdf | |
PWC | https://paperswithcode.com/paper/music-transformer |
Repo | https://github.com/scpark20/Music-GPT-2 |
Framework | tf |
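The memory reduction mentioned in the abstract comes from a "skewing" trick: compute the (L, L) product of queries with the relative-position embeddings, then pad, reshape, and slice to align it, avoiding the O(L^2 D) intermediate tensor of Shaw et al. (2018). A NumPy sketch:

```python
# "Skewing" sketch for relative attention: align Q @ E^T with O(L*D)
# intermediate memory instead of materialising an (L, L, D) tensor.
import numpy as np

def relative_logits(q, e):
    """q: (L, D) queries; e: (L, D) embeddings for distances -(L-1)..0."""
    qe = q @ e.T                             # (L, L), not yet aligned
    L = qe.shape[0]
    padded = np.pad(qe, ((0, 0), (1, 0)))    # prepend one dummy column
    return padded.reshape(L + 1, L)[1:]      # reshape + slice = skew

rng = np.random.default_rng(0)
q, e = rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
s_rel = relative_logits(q, e)                # add to q @ k.T before the softmax
```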
FastNet
Title | FastNet |
Authors | John Olafenwa, Moses Olafenwa |
Abstract | Inception and the ResNet family of Convolutional Neural Network architectures have broken records in the past few years, but recent state-of-the-art models have also incurred very high computational costs in terms of training, inference, and model size, making their deployment on edge devices impractical. In light of this, we present a novel architecture designed for high computational efficiency on both GPUs and CPUs that is highly suited for deployment on mobile applications, smart cameras, IoT devices and controllers, as well as low-cost drones. Our architecture achieves competitive accuracy on standard datasets, even outperforming the original ResNet. We present below the motivation for this research, the architecture of the network, single test accuracies on CIFAR-10 and CIFAR-100, a detailed comparison with other well-known architectures, and a link to an implementation in Keras. |
Tasks | |
Published | 2018-01-17 |
URL | http://arxiv.org/abs/1802.02186v1 |
http://arxiv.org/pdf/1802.02186v1.pdf | |
PWC | https://paperswithcode.com/paper/fastnet |
Repo | https://github.com/johnolafenwa/FastNet |
Framework | tf |
Inferring network connectivity from event timing patterns
Title | Inferring network connectivity from event timing patterns |
Authors | Jose Casadiego, Dimitra Maoutsa, Marc Timme |
Abstract | Reconstructing network connectivity from the collective dynamics of a system typically requires access to its complete continuous-time evolution, which is often experimentally inaccessible. Here we propose a theory for revealing the physical connectivity of networked systems using only the event time series generated by their intrinsic collective dynamics. Representing the patterns of event timings in an event space spanned by inter-event and cross-event intervals, we reveal which other units directly influence the inter-event times of any given unit. For illustration, we linearize an event-space mapping constructed from the spiking patterns in model neural circuits to reveal the presence or absence of synapses between any pair of neurons, as well as whether the coupling acts in an inhibiting or activating (excitatory) manner. The proposed model-independent reconstruction theory is scalable to larger networks and may thus play an important role in the reconstruction of networks from biology to social science and engineering. |
Tasks | Time Series |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.09974v2 |
http://arxiv.org/pdf/1803.09974v2.pdf | |
PWC | https://paperswithcode.com/paper/inferring-network-connectivity-from-event |
Repo | https://github.com/networkinference/ESL |
Framework | none |
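As a rough illustration of the event-space idea, the sketch below explains each unit's inter-event intervals from the other units' most recent cross-event intervals and reads connectivity off the regression weights. The paper linearizes a nonlinear event-space mapping; the plain least-squares fit and the `dt_max` cap here are simplifying assumptions.

```python
# Crude event-space regression: which units' recent events help explain
# unit `unit`'s inter-event intervals? Least squares stands in for the
# paper's linearised event-space mapping; dt_max is an assumed cap.
import numpy as np

def infer_incoming_links(spikes, unit, dt_max=1e3):
    """spikes: list of sorted event-time arrays, one per unit."""
    times = spikes[unit]
    y = np.diff(times)                       # inter-event intervals to explain
    X = np.empty((len(y), len(spikes)))
    for j, other in enumerate(spikes):
        # Cross-event interval: time since unit j last fired before each event.
        idx = np.searchsorted(other, times[1:]) - 1
        X[:, j] = np.where(idx >= 0, times[1:] - other[idx.clip(min=0)], dt_max)
    X[:, unit] = 1.0                         # own column doubles as an intercept
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w                                 # large |w[j]| suggests a link j -> unit
```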
Conditional Generative Adversarial and Convolutional Networks for X-ray Breast Mass Segmentation and Shape Classification
Title | Conditional Generative Adversarial and Convolutional Networks for X-ray Breast Mass Segmentation and Shape Classification |
Authors | Vivek Kumar Singh, Santiago Romani, Hatem A. Rashwan, Farhan Akram, Nidhi Pandey, Md. Mostafa Kamal Sarker, Jordina Torrents Barrena, Saddam Abdulwahab, Adel Saleh, Miguel Arquez, Meritxell Arenas, Domenec Puig |
Abstract | This paper proposes a novel approach based on conditional Generative Adversarial Networks (cGAN) for breast mass segmentation in mammography. We hypothesized that the cGAN structure is well-suited to accurately outline the mass area, especially when the training data is limited. The generative network learns intrinsic features of tumors while the adversarial network enforces segmentations to be similar to the ground truth. Experiments performed on dozens of malignant tumors extracted from the public DDSM dataset and from our in-house private dataset confirm our hypothesis with very high Dice coefficients and Jaccard indices (>94% and >89%, respectively), outperforming the scores obtained by other state-of-the-art approaches. Furthermore, in order to portray significant morphological features of the segmented tumor, a specific Convolutional Neural Network (CNN) has also been designed for classifying the segmented tumor areas into four types (irregular, lobular, oval and round), which provides an overall accuracy of about 72% on the DDSM dataset. |
Tasks | |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10207v2 |
http://arxiv.org/pdf/1805.10207v2.pdf | |
PWC | https://paperswithcode.com/paper/conditional-generative-adversarial-and |
Repo | https://github.com/ankit-ai/GAN_breast_mammography_segmentation |
Framework | tf |
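For orientation, here is a minimal PyTorch sketch of the cGAN segmentation objective the abstract describes: a generator predicts the mass mask, a discriminator judges (image, mask) pairs, and a pixel-wise term ties predictions to ground truth. Both networks and the loss weight are tiny placeholders, not the paper's architectures or settings.

```python
# cGAN-style segmentation losses (sketch); networks are toy placeholders.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
D = nn.Sequential(nn.Conv2d(2, 8, 3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
adv = nn.BCEWithLogitsLoss()

x = torch.randn(4, 1, 64, 64)                # mammogram patches (random stand-ins)
y = torch.rand(4, 1, 64, 64).round()         # ground-truth masks

mask = G(x)
d_fake = D(torch.cat([x, mask], dim=1))
# Generator: fool the discriminator + stay close to the ground truth.
g_loss = adv(d_fake, torch.ones_like(d_fake)) \
       + 10.0 * nn.functional.binary_cross_entropy(mask, y)   # weight is assumed
# Discriminator: real (image, mask) pairs vs. generated ones.
d_real = D(torch.cat([x, y], dim=1))
d_loss = adv(d_real, torch.ones_like(d_real)) \
       + adv(D(torch.cat([x, mask.detach()], dim=1)), torch.zeros_like(d_fake))
```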
ADVIO: An authentic dataset for visual-inertial odometry
Title | ADVIO: An authentic dataset for visual-inertial odometry |
Authors | Santiago Cortés, Arno Solin, Esa Rahtu, Juho Kannala |
Abstract | The lack of realistic and open benchmarking datasets for pedestrian visual-inertial odometry has made it hard to pinpoint differences between published methods. Existing datasets either lack a full six-degree-of-freedom ground truth or are limited to small spaces with optical tracking systems. We take advantage of advances in pure inertial navigation and develop a set of versatile and challenging real-world computer vision benchmark sets for visual-inertial odometry. For this purpose, we have built a test rig equipped with an iPhone, a Google Pixel Android phone, and a Google Tango device. We provide a wide range of raw sensor data that is accessible on almost any modern-day smartphone, together with a high-quality ground-truth track. We also compare the resulting visual-inertial tracks from Google Tango, ARCore, and Apple ARKit with two recent methods published in academic forums. The datasets cover both indoor and outdoor cases, with stairs, escalators, elevators, office environments, a shopping mall, and a metro station. |
Tasks | |
Published | 2018-07-25 |
URL | http://arxiv.org/abs/1807.09828v1 |
http://arxiv.org/pdf/1807.09828v1.pdf | |
PWC | https://paperswithcode.com/paper/advio-an-authentic-dataset-for-visual |
Repo | https://github.com/AaltoVision/ADVIO |
Framework | none |
Unsupervised Neural Text Simplification
Title | Unsupervised Neural Text Simplification |
Authors | Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, Karthik Sankaranarayanan |
Abstract | The paper presents a first attempt at unsupervised neural text simplification that relies only on unlabeled text corpora. The core framework is composed of a shared encoder and a pair of attentional decoders, and gains knowledge of simplification through discrimination-based losses and denoising. The framework is trained using unlabeled text collected from an English Wikipedia dump. Our analysis (both quantitative and qualitative, involving human evaluators) on public test data shows that the proposed model can perform text simplification at both the lexical and syntactic levels, competitively with existing supervised methods. Adding a few labelled pairs improves performance further. |
Tasks | Denoising, Text Simplification |
Published | 2018-10-18 |
URL | https://arxiv.org/abs/1810.07931v6 |
https://arxiv.org/pdf/1810.07931v6.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-neural-text-simplification |
Repo | https://github.com/subramanyamdvss/UnsupNTS |
Framework | pytorch |
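The framework's shape, a shared encoder feeding two style-specific decoders trained with denoising, can be sketched compactly in PyTorch (the listed framework for this repo). Attention and the discrimination-based losses are omitted; vocabulary size, dimensions, and the word-dropout noise are assumptions.

```python
# Shared encoder + two decoders with a denoising reconstruction loss (sketch).
import torch
import torch.nn as nn

V, E, H = 1000, 64, 128                      # vocab / embedding / hidden sizes (assumed)
emb = nn.Embedding(V, E)
enc = nn.GRU(E, H, batch_first=True)         # shared encoder
dec_simple = nn.GRU(E, H, batch_first=True)  # one decoder per style
dec_complex = nn.GRU(E, H, batch_first=True)
out = nn.Linear(H, V)
ce = nn.CrossEntropyLoss()

def add_noise(tokens, p=0.1):
    """Word dropout: the denoising objective reconstructs the clean sentence."""
    drop = torch.rand_like(tokens, dtype=torch.float) < p
    return tokens.masked_fill(drop, 0)       # 0 stands in for an <unk> id

tokens = torch.randint(1, V, (8, 20))        # a batch of token-id sequences
_, h = enc(emb(add_noise(tokens)))           # encode the corrupted input
logits = out(dec_simple(emb(tokens), h)[0])  # teacher-forced reconstruction
loss = ce(logits.reshape(-1, V), tokens.reshape(-1))
```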
Deformable ConvNets v2: More Deformable, Better Results
Title | Deformable ConvNets v2: More Deformable, Better Results |
Authors | Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai |
Abstract | The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects. Through an examination of its adaptive behavior, we observe that while the spatial support for its neural features conforms more closely than regular ConvNets to object structure, this support may nevertheless extend well beyond the region of interest, causing features to be influenced by irrelevant image content. To address this problem, we present a reformulation of Deformable ConvNets that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training. The modeling power is enhanced through a more comprehensive integration of deformable convolution within the network, and by introducing a modulation mechanism that expands the scope of deformation modeling. To effectively harness this enriched modeling capability, we guide network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features. With the proposed contributions, this new version of Deformable ConvNets yields significant performance gains over the original model and produces leading results on the COCO benchmark for object detection and instance segmentation. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.11168v2 |
http://arxiv.org/pdf/1811.11168v2.pdf | |
PWC | https://paperswithcode.com/paper/deformable-convnets-v2-more-deformable-better |
Repo | https://github.com/qilei123/DeformableConvV2 |
Framework | mxnet |
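The modulation mechanism the abstract adds to deformable convolution is easy to state: each sampling point of the kernel is moved by a learned offset and its contribution is scaled by a learned scalar in [0, 1]. The NumPy sketch below computes one output value; in the real layer a separate conv branch predicts the offsets and modulations.

```python
# One modulated deformable 3x3 sample: sum_k w_k * m_k * x(p + p_k + dp_k).
# Offsets and modulations are random here; a conv branch predicts them.
import numpy as np

def bilinear(img, y, x):
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < h and 0 <= xx < w:
                val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy, xx]
    return val

def modulated_deform_sample(img, py, px, offsets, mod, weights):
    out, k = 0.0, 0
    for dy in (-1, 0, 1):                    # the regular 3x3 grid p_k ...
        for dx in (-1, 0, 1):
            y = py + dy + offsets[k, 0]      # ... shifted by learned offsets dp_k
            x = px + dx + offsets[k, 1]
            out += weights[k] * mod[k] * bilinear(img, y, x)
            k += 1
    return out

rng = np.random.default_rng(0)
val = modulated_deform_sample(rng.normal(size=(16, 16)), 8, 8,
                              rng.normal(scale=0.5, size=(9, 2)),
                              rng.uniform(size=9), rng.normal(size=9))
```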
Collapse of Deep and Narrow Neural Nets
Title | Collapse of Deep and Narrow Neural Nets |
Authors | Lu Lu, Yanhui Su, George Em Karniadakis |
Abstract | Recent theoretical work has demonstrated that deep neural networks have superior performance over shallow networks, but their training is more difficult, e.g., they suffer from the vanishing gradient problem. This problem can typically be resolved by the rectified linear unit (ReLU) activation. However, here we show that even with such an activation, deep and narrow neural networks (NNs) will converge to erroneous mean or median states of the target function, depending on the loss, with high probability. Deep and narrow NNs are encountered in solving partial differential equations with high-order derivatives. We demonstrate this collapse of such NNs both numerically and theoretically, and provide estimates of the probability of collapse. We also construct a diagram of a safe region for designing NNs that avoid the collapse to erroneous states. Finally, we examine different ways of initialization and normalization that may avoid the collapse problem. Asymmetric initializations may reduce the probability of collapse but do not totally eliminate it. |
Tasks | |
Published | 2018-08-15 |
URL | http://arxiv.org/abs/1808.04947v2 |
http://arxiv.org/pdf/1808.04947v2.pdf | |
PWC | https://paperswithcode.com/paper/collapse-of-deep-and-narrow-neural-nets |
Repo | https://github.com/ericpts/vae-res |
Framework | tf |
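The collapse is straightforward to reproduce. The PyTorch sketch below fits a deep, narrow ReLU network to y = x^2 with an MSE loss; with non-trivial probability over initializations, the trained network is nearly constant at the target's mean rather than the target itself. Depth, width, and optimizer settings are illustrative choices, not the paper's exact experiments.

```python
# Deep-and-narrow ReLU net on y = x^2: often collapses to the target mean.
import torch
import torch.nn as nn

torch.manual_seed(0)
width, depth = 2, 10                         # deliberately deep and narrow
dims = [1] + [width] * depth + [1]
layers = []
for i in range(len(dims) - 1):
    layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
net = nn.Sequential(*layers[:-1])            # no ReLU on the output

x = torch.linspace(-1, 1, 200).unsqueeze(1)
y = x ** 2
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
# Inspect net(x): a collapsed run sits near y.mean() ~ 0.33 everywhere.
```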
Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow
Title | Explainable cardiac pathology classification on cine MRI with motion characterization by semi-supervised learning of apparent flow |
Authors | Qiao Zheng, Hervé Delingette, Nicholas Ayache |
Abstract | We propose a method to classify cardiac pathology based on a novel approach to extracting image-derived features that characterize the shape and motion of the heart. An original semi-supervised learning procedure, which makes efficient use of a large amount of non-segmented images and a small amount of images segmented manually by experts, is developed to generate pixel-wise apparent flow between two time points of a 2D+t cine MRI image sequence. Combining the apparent flow maps and cardiac segmentation masks, we obtain a local apparent flow corresponding to the 2D motion of the myocardium and ventricular cavities. This leads to the generation of time series of the radius and thickness of myocardial segments to represent cardiac motion. These time series of motion features are reliable and explainable characteristics of pathological cardiac motion. Furthermore, they are combined with shape-related features to classify cardiac pathologies. Using only nine feature values as input, we propose an explainable, simple and flexible model for pathology classification. The model achieves classification accuracies of 95% on the ACDC training set and 94% on the testing set; its performance is hence comparable to the state-of-the-art. Comparison with various other models is performed to outline some advantages of our model. |
Tasks | Cardiac Segmentation, Time Series |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03433v2 |
http://arxiv.org/pdf/1811.03433v2.pdf | |
PWC | https://paperswithcode.com/paper/explainable-cardiac-pathology-classification |
Repo | https://github.com/julien-zheng/CardiacMotionFlow |
Framework | tf |
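Since the classifier consumes only nine feature values per subject, the final stage is deliberately simple. The sketch below stands in for it with logistic regression on synthetic placeholders; the paper's exact model and the real ACDC features are not reproduced here. ACDC distinguishes five classes (four pathologies plus normal).

```python
# Nine explainable features -> five-class prediction (placeholder data/model).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))                # nine motion/shape features per subject
y = rng.integers(0, 5, size=100)             # five ACDC classes
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:3]))
```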
Single Shot Scene Text Retrieval
Title | Single Shot Scene Text Retrieval |
Authors | Lluís Gómez, Andrés Mafla, Marçal Rusiñol, Dimosthenis Karatzas |
Abstract | Textual information found in scene images provides high-level semantic information about the image and its context, and it can be leveraged for better scene understanding. In this paper we address the problem of scene text retrieval: given a text query, the system must return all images containing the queried text. The novelty of the proposed model lies in the use of a single-shot CNN architecture that predicts bounding boxes and a compact text representation of the words within them at the same time. In this way, the text-based image retrieval task can be cast as a simple nearest-neighbor search of the query text representation over the outputs of the CNN over the entire image database. Our experiments demonstrate that the proposed architecture outperforms the previous state-of-the-art while offering a significant increase in processing speed. |
Tasks | Image Retrieval, Scene Understanding |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.09044v1 |
http://arxiv.org/pdf/1808.09044v1.pdf | |
PWC | https://paperswithcode.com/paper/single-shot-scene-text-retrieval |
Repo | https://github.com/lluisgomez/single-shot-str |
Framework | tf |
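Once the CNN has emitted a compact representation for every detected word in every image, retrieval reduces to nearest-neighbor search against the embedded query, as the abstract notes. The character-count embedding below is a toy stand-in for the paper's text representation, and the database is invented.

```python
# Query-by-text as nearest-neighbour search over per-word embeddings (sketch).
import numpy as np

def embed(word):
    """Toy character-frequency embedding (stand-in for the paper's descriptor)."""
    v = np.zeros(26)
    for c in word.lower():
        if c.isalpha():
            v[ord(c) - ord('a')] += 1
    return v / (np.linalg.norm(v) + 1e-8)

# Hypothetical database: per image, embeddings of the words the CNN detected.
db = {'img1': [embed('exit'), embed('coffee')],
      'img2': [embed('parking')],
      'img3': [embed('exlt')]}               # a noisy detection of "exit"

q = embed('exit')
scores = {img: max(float(q @ w) for w in words) for img, words in db.items()}
ranked = sorted(scores, key=scores.get, reverse=True)   # best match first
```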
BOLD5000: A public fMRI dataset of 5000 images
Title | BOLD5000: A public fMRI dataset of 5000 images |
Authors | Nadine Chang, John A. Pyles, Abhinav Gupta, Michael J. Tarr, Elissa M. Aminoff |
Abstract | Vision science, particularly machine vision, has been revolutionized by the introduction of large-scale image datasets and statistical learning approaches. Yet human neuroimaging studies of visual perception still rely on small numbers of images (around 100) due to time-constrained experimental procedures. To apply statistical learning approaches that integrate neuroscience, the number of images used in neuroimaging must be significantly increased. We present BOLD5000, a human functional MRI (fMRI) study that includes almost 5,000 distinct images depicting real-world scenes. Beyond dramatically increasing image dataset size relative to prior fMRI studies, BOLD5000 also accounts for image diversity, overlapping with standard computer vision datasets by incorporating images from the Scene UNderstanding (SUN), Common Objects in Context (COCO), and ImageNet datasets. The scale and diversity of these image datasets, combined with a slow event-related fMRI design, enable fine-grained exploration into the neural representation of a wide range of visual features, categories, and semantics. Concurrently, BOLD5000 brings us closer to realizing Marr's dream of a singular vision science - the intertwined study of biological and computer vision. |
Tasks | Scene Understanding |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01281v1 |
http://arxiv.org/pdf/1809.01281v1.pdf | |
PWC | https://paperswithcode.com/paper/bold5000-a-public-fmri-dataset-of-5000-images |
Repo | https://github.com/nchang430/BOLD5000-Scripts |
Framework | none |