February 1, 2020

3045 words 15 mins read

Paper Group AWR 91

Unsupervised Label Noise Modeling and Loss Correction. On improving deep learning generalization with adaptive sparse connectivity. Deep Density-aware Count Regressor. Generative Adversarial Networks for text using word2vec intermediaries. Learning to Optimize Multigrid PDE Solvers. Attention model for articulatory features detection. Graph Embedde …

Unsupervised Label Noise Modeling and Loss Correction

Title Unsupervised Label Noise Modeling and Loss Correction
Authors Eric Arazo, Diego Ortego, Paul Albert, Noel E. O’Connor, Kevin McGuinness
Abstract Despite being robust to small amounts of label noise, convolutional neural networks trained with stochastic gradient methods have been shown to easily fit random labels. When there is a mixture of correct and mislabelled targets, networks tend to fit the former before the latter. This suggests using a suitable two-component mixture model as an unsupervised generative model of sample loss values during training, allowing online estimation of the probability that a sample is mislabelled. Specifically, we propose a beta mixture to estimate this probability and correct the loss by relying on the network prediction (the so-called bootstrapping loss). We further adapt mixup augmentation to drive our approach a step further. Experiments on CIFAR-10/100 and TinyImageNet demonstrate robustness to label noise that substantially outperforms the recent state of the art. Source code is available at https://git.io/fjsvE
Tasks
Published 2019-04-25
URL https://arxiv.org/abs/1904.11238v2
PDF https://arxiv.org/pdf/1904.11238v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-label-noise-modeling-and-loss
Repo https://github.com/PaulAlbert31/LabelNoiseCorrection
Framework pytorch
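
A minimal sketch of the core idea above: fit a two-component beta mixture to the per-sample training losses with EM, and read the posterior probability that a sample is mislabelled off the high-loss component. The initialization and the weighted method-of-moments updates are illustrative assumptions; the authors' actual implementation is in the linked repo.

```python
import numpy as np
from scipy.stats import beta

def noisy_posterior(losses, n_iters=10, eps=1e-6):
    """EM for a two-component beta mixture over losses rescaled to (0, 1).
    Returns P(mislabelled | loss), i.e. the high-loss component's posterior."""
    x = (losses - losses.min()) / (losses.max() - losses.min())
    x = np.clip(x, eps, 1 - eps)
    params = np.array([[2.0, 5.0], [5.0, 2.0]])  # (alpha, beta): clean, noisy
    weights = np.array([0.5, 0.5])
    for _ in range(n_iters):
        # E-step: responsibility of each component for each sample.
        pdf = np.stack([w * beta.pdf(x, a, b)
                        for (a, b), w in zip(params, weights)])
        resp = pdf / pdf.sum(axis=0, keepdims=True)
        # M-step: weighted method-of-moments update per beta component.
        for k in range(2):
            m = np.average(x, weights=resp[k])
            v = np.average((x - m) ** 2, weights=resp[k])
            s = max(m * (1 - m) / v - 1, eps)  # guard against v >= m(1 - m)
            params[k] = [m * s, (1 - m) * s]
            weights[k] = resp[k].mean()
    pdf = np.stack([w * beta.pdf(x, a, b)
                    for (a, b), w in zip(params, weights)])
    return pdf[1] / pdf.sum(axis=0)
```

The resulting probabilities can then gate the bootstrapping loss, weighting the network's own prediction more heavily for samples deemed noisy.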

On improving deep learning generalization with adaptive sparse connectivity

Title On improving deep learning generalization with adaptive sparse connectivity
Authors Shiwei Liu, Decebal Constantin Mocanu, Mykola Pechenizkiy
Abstract Large neural networks are very successful in various tasks. However, with limited data, the generalization capabilities of deep neural networks are also very limited. In this paper, we begin to show empirically that intrinsically sparse neural networks with adaptive sparse connectivity, which by design have a strict parameter budget during the training phase, have better generalization capabilities than their fully-connected counterparts. Besides this, we propose a new technique to train these sparse models by combining the Sparse Evolutionary Training (SET) procedure with neuron pruning. Applied to MultiLayer Perceptrons (MLPs) and tested on 15 datasets, our proposed technique zeros out around 50% of the hidden neurons during training, while keeping the number of parameters to optimize linear in the number of neurons. The results show competitive classification and generalization performance.
Tasks
Published 2019-06-27
URL https://arxiv.org/abs/1906.11626v1
PDF https://arxiv.org/pdf/1906.11626v1.pdf
PWC https://paperswithcode.com/paper/on-improving-deep-learning-generalization
Repo https://github.com/dcmocanu/sparse-evolutionary-artificial-neural-networks
Framework tf
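
The SET procedure mentioned above keeps a fixed parameter budget by periodically rewiring each sparse layer: the weakest connections are dropped and the same number are regrown at random positions. A NumPy sketch of one rewiring step; the drop fraction `zeta` and the regrowth initialization are illustrative assumptions, not the repo's exact values.

```python
import numpy as np

def set_rewire(weight_mask, weights, zeta=0.3, rng=None):
    """One SET rewiring step. `weight_mask` and `weights` share a shape;
    mask entries are 1 where a connection exists."""
    rng = rng or np.random.default_rng()
    active = np.flatnonzero(weight_mask)
    n_drop = int(zeta * active.size)
    # Drop the smallest-magnitude active connections.
    drop = active[np.argsort(np.abs(weights.flat[active]))[:n_drop]]
    weight_mask.flat[drop] = 0
    weights.flat[drop] = 0.0
    # Regrow the same number of connections at random empty positions.
    empty = np.flatnonzero(weight_mask == 0)
    grow = rng.choice(empty, size=n_drop, replace=False)
    weight_mask.flat[grow] = 1
    weights.flat[grow] = rng.normal(0.0, 0.01, size=n_drop)
    return weight_mask, weights
```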

Deep Density-aware Count Regressor

Title Deep Density-aware Count Regressor
Authors Zhuojun Chen, Junhao Cheng, Yuchen Yuan, Dongping Liao, Yizhou Li, Jiancheng Lv
Abstract We seek to improve crowd counting, as we perceive limits of the currently prevalent density-map estimation approach in both prediction accuracy and time efficiency. We show that a CNN regressing a global count, trained with density-map supervision, can make more accurate predictions. We introduce multilayer gradient fusion for training a density-aware global count regressor. More specifically, at training time a backbone network receives gradients from multiple branches to learn the density information, whereas those branches are detached to accelerate inference. By taking advantage of this method, our model improves benchmark results on public datasets and presents itself as a new practical solution to the crowd counting problem.
Tasks Crowd Counting
Published 2019-08-09
URL https://arxiv.org/abs/1908.03314v2
PDF https://arxiv.org/pdf/1908.03314v2.pdf
PWC https://paperswithcode.com/paper/deep-density-aware-count-regressor
Repo https://github.com/GeorgeChenZJ/deepcount
Framework tf
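
A toy PyTorch sketch of the detachable-branch idea: an auxiliary density head injects gradients into the backbone during training and is simply skipped at inference, leaving only the cheap global count head. The layer sizes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CountRegressor(nn.Module):
    """Global count regressor with a detachable density branch."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.density_head = nn.Conv2d(64, 1, 1)  # auxiliary branch, training only
        self.count_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, x):
        feat = self.backbone(x)
        count = self.count_head(feat)
        if self.training:
            # Density supervision feeds gradients into the backbone.
            return count, self.density_head(feat)
        return count  # branch detached: cheaper inference
```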

Generative Adversarial Networks for text using word2vec intermediaries

Title Generative Adversarial Networks for text using word2vec intermediaries
Authors Akshay Budhkar, Krishnapriya Vishnubhotla, Safwan Hossain, Frank Rudzicz
Abstract Generative adversarial networks (GANs) have shown considerable success, especially in the realistic generation of images. In this work, we apply similar techniques for the generation of text. We propose a novel approach to handle the discrete nature of text, during training, using word embeddings. Our method is agnostic to vocabulary size and achieves competitive results relative to methods with various discrete gradient estimators.
Tasks Word Embeddings
Published 2019-04-04
URL http://arxiv.org/abs/1904.02293v1
PDF http://arxiv.org/pdf/1904.02293v1.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-networks-for-text
Repo https://github.com/adventure2165/GAN2vec
Framework none
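
The word2vec-intermediary trick sidesteps discrete sampling: the generator emits continuous vectors in embedding space, and text is only recovered afterwards by nearest-neighbor lookup. A sketch of that decoding step, assuming L2-normalized embedding rows; the names are illustrative.

```python
import numpy as np

def decode_embeddings(generated, emb_matrix, vocab):
    """Map generated vectors (seq_len, dim) back to words by cosine
    nearest neighbor in the word2vec embedding matrix (vocab_size, dim)."""
    gen = generated / np.linalg.norm(generated, axis=-1, keepdims=True)
    sims = gen @ emb_matrix.T                 # (seq_len, vocab_size)
    return [vocab[i] for i in sims.argmax(axis=-1)]
```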

Learning to Optimize Multigrid PDE Solvers

Title Learning to Optimize Multigrid PDE Solvers
Authors Daniel Greenfeld, Meirav Galun, Ron Kimmel, Irad Yavneh, Ronen Basri
Abstract Constructing fast numerical solvers for partial differential equations (PDEs) is crucial for many scientific disciplines. A leading technique for solving large-scale PDEs is using multigrid methods. At the core of a multigrid solver is the prolongation matrix, which relates different scales of the problem. This matrix is strongly problem-dependent, and its optimal construction is critical to the efficiency of the solver. In practice, however, devising multigrid algorithms for new problems often poses formidable challenges. In this paper we propose a framework for learning multigrid solvers. Our method learns a (single) mapping from a family of parameterized PDEs to prolongation operators. We train a neural network once for the entire class of PDEs, using an efficient and unsupervised loss function. Experiments on a broad class of 2D diffusion problems demonstrate improved convergence rates compared to the widely used Black-Box multigrid scheme, suggesting that our method successfully learned rules for constructing prolongation matrices.
Tasks
Published 2019-02-25
URL https://arxiv.org/abs/1902.10248v3
PDF https://arxiv.org/pdf/1902.10248v3.pdf
PWC https://paperswithcode.com/paper/learning-to-optimize-multigrid-pde-solvers
Repo https://github.com/danielgreenfeld3/Learning-to-optimize-multigrid-solvers
Framework tf
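
For context, here is where the prolongation matrix P sits inside a standard two-grid correction; the paper learns P, while the sketch below uses classical components (weighted Jacobi smoothing, a Galerkin coarse operator) and dense solves for brevity.

```python
import numpy as np

def two_grid_step(A, b, x, P, n_smooth=2, omega=2 / 3):
    """One two-grid correction for A x = b with prolongation matrix P."""
    D = np.diag(np.diag(A))
    for _ in range(n_smooth):                      # pre-smoothing (weighted Jacobi)
        x = x + omega * np.linalg.solve(D, b - A @ x)
    r = b - A @ x                                  # fine-grid residual
    A_c = P.T @ A @ P                              # Galerkin coarse operator
    e_c = np.linalg.solve(A_c, P.T @ r)            # coarse-grid solve
    x = x + P @ e_c                                # prolongate the correction
    for _ in range(n_smooth):                      # post-smoothing
        x = x + omega * np.linalg.solve(D, b - A @ x)
    return x
```

Everything in this step is fixed by textbook choices except P, which is why learning a good problem-dependent P can change the convergence rate.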

Attention model for articulatory features detection

Title Attention model for articulatory features detection
Authors Ievgen Karaulov, Dmytro Tkanov
Abstract Articulatory distinctive features, as well as phonetic transcription, play an important role in speech-related tasks: computer-assisted pronunciation training, text-to-speech conversion (TTS), studying speech production mechanisms, and speech recognition for low-resourced languages. End-to-end approaches to speech-related tasks have gained a lot of traction in recent years. We apply the Listen, Attend and Spell (LAS) architecture to phone recognition on a small training set such as TIMIT. We also introduce a novel decoding technique that allows training manner- and place-of-articulation detectors end-to-end using attention models, and we explore joint phone recognition and articulatory feature detection in a multitask learning setting.
Tasks Manner Of Articulation Detection, Speech Recognition
Published 2019-07-02
URL https://arxiv.org/abs/1907.01914v1
PDF https://arxiv.org/pdf/1907.01914v1.pdf
PWC https://paperswithcode.com/paper/attention-model-for-articulatory-features
Repo https://github.com/sciforce/phones-las
Framework tf

Graph Embedded Pose Clustering for Anomaly Detection

Title Graph Embedded Pose Clustering for Anomaly Detection
Authors Amir Markovitz, Gilad Sharir, Itamar Friedman, Lihi Zelnik-Manor, Shai Avidan
Abstract We propose a new method for anomaly detection of human actions. Our method works directly on human pose graphs that can be computed from an input video sequence. This makes the analysis independent of nuisance parameters such as viewpoint or illumination. We map these graphs to a latent space and cluster them. Each action is then represented by its soft-assignment to each of the clusters. This gives a kind of “bag of words” representation of the data, where every action is represented by its similarity to a group of base action-words. Then, we use a Dirichlet process based mixture, which is useful for handling proportional data such as our soft-assignment vectors, to determine whether an action is normal or not. We evaluate our method on two types of data sets. The first is a fine-grained anomaly detection data set (e.g. ShanghaiTech) where we wish to detect unusual variations of some action. The second is a coarse-grained anomaly detection data set (e.g., a Kinetics-based data set) where few actions are considered normal, and every other action should be considered abnormal. Extensive experiments on the benchmarks show that our method performs considerably better than other state-of-the-art methods.
Tasks Anomaly Detection
Published 2019-12-26
URL https://arxiv.org/abs/1912.11850v1
PDF https://arxiv.org/pdf/1912.11850v1.pdf
PWC https://paperswithcode.com/paper/graph-embedded-pose-clustering-for-anomaly
Repo https://github.com/amirmk89/gepc
Framework pytorch
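
A sketch of the soft-assignment step: each latent pose-graph embedding is re-expressed as a distribution over cluster centers, giving the "bag of words" representation described above. The Student-t affinity kernel below is a common choice in deep clustering and an illustrative assumption here, not necessarily the paper's exact kernel.

```python
import numpy as np

def soft_assign(latents, centers, alpha=1.0):
    """Soft-assignment of embeddings (N, D) to cluster centers (K, D);
    each row of the result is a probability vector over the K clusters."""
    d2 = ((latents[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, K)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1) / 2)
    return q / q.sum(axis=1, keepdims=True)
```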

IStego100K: Large-scale Image Steganalysis Dataset

Title IStego100K: Large-scale Image Steganalysis Dataset
Authors Zhongliang Yang, Ke Wang, Sai Ma, Yongfeng Huang, Xiangui Kang, Xianfeng Zhao
Abstract In order to promote the rapid development of image steganalysis technology, in this paper we construct and release a multivariable large-scale image steganalysis dataset called IStego100K. It contains 208,104 images of the same size, 1024x1024. Among them, 200,000 images (100,000 cover-stego image pairs) form the training set and the remaining 8,104 the test set. In addition, we hope that IStego100K can help researchers further explore the development of universal image steganalysis algorithms, so we try to place few restrictions on the images in IStego100K. For each image in IStego100K, the quality factor is randomly set in the range 75-95, the steganographic algorithm is randomly selected from three well-known steganographic algorithms (J-uniward, nsF5, and UERD), and the embedding rate is randomly set to a value in 0.1-0.4. Moreover, considering the possible mismatch between training and test samples in real environments, we add a test set (DS-Test) whose samples come from a different source than the training set. We hope that this test set can help to evaluate the robustness of steganalysis algorithms. We tested the performance of some of the latest steganalysis algorithms on IStego100K, with detailed results and analysis in the experimental section. We hope that the IStego100K dataset will further promote the development of universal image steganalysis technology. The description of IStego100K and instructions for use can be found at https://github.com/YangzlTHU/IStego100K
Tasks
Published 2019-11-13
URL https://arxiv.org/abs/1911.05542v1
PDF https://arxiv.org/pdf/1911.05542v1.pdf
PWC https://paperswithcode.com/paper/istego100k-large-scale-image-steganalysis
Repo https://github.com/YangzlTHU/IStego100K
Framework none
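
The per-image randomization described above is easy to picture as a sampling step; the ranges below come from the abstract, while the function name is hypothetical.

```python
import random

ALGORITHMS = ["J-uniward", "nsF5", "UERD"]

def sample_stego_config():
    """Draw the randomized embedding settings used for one IStego100K image."""
    return {
        "quality_factor": random.randint(75, 95),    # JPEG quality in 75-95
        "algorithm": random.choice(ALGORITHMS),      # one of three embedders
        "embedding_rate": random.uniform(0.1, 0.4),  # payload in 0.1-0.4
    }
```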

SiamVGG: Visual Tracking using Deeper Siamese Networks

Title SiamVGG: Visual Tracking using Deeper Siamese Networks
Authors Yuhong Li, Xiaofan Zhang
Abstract Recently, we have seen rapid development of Deep Neural Network (DNN) based visual tracking solutions. Some trackers combine DNN-based solutions with Discriminative Correlation Filters (DCF) to extract semantic features and successfully deliver state-of-the-art tracking accuracy. However, these solutions are highly compute-intensive and require long processing times, so real-time performance cannot be guaranteed. To deliver both high accuracy and reliable real-time performance, we propose a novel tracker called SiamVGG. It combines a Convolutional Neural Network (CNN) backbone with a cross-correlation operator and takes advantage of features from exemplar images for more accurate object tracking. The architecture of SiamVGG is customized from VGG-16, with parameters shared between the exemplar image and the input video frames. We demonstrate the proposed SiamVGG on the OTB-2013/50/100 and VOT 2015/2016/2017 datasets, achieving state-of-the-art accuracy while maintaining decent real-time performance of 50 FPS on a GTX 1080Ti. Our design achieves 2% higher Expected Average Overlap (EAO) than ECO and C-COT in the VOT2017 Challenge.
Tasks Object Tracking, Visual Object Tracking, Visual Tracking
Published 2019-02-07
URL http://arxiv.org/abs/1902.02804v2
PDF http://arxiv.org/pdf/1902.02804v2.pdf
PWC https://paperswithcode.com/paper/siamvgg-visual-tracking-using-deeper-siamese
Repo https://github.com/leeyeehoo/SiamVGG
Framework pytorch
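
The cross-correlation operator at the heart of Siamese trackers treats the exemplar's feature map as a convolution kernel slid over the search region's feature map; the peak of the response map locates the target. A minimal PyTorch sketch:

```python
import torch.nn.functional as F

def cross_correlate(exemplar_feat, search_feat):
    """exemplar_feat: (1, C, h, w); search_feat: (1, C, H, W), H >= h, W >= w.
    Returns a response map of shape (1, 1, H - h + 1, W - w + 1)."""
    return F.conv2d(search_feat, exemplar_feat)
```

In a full tracker both feature maps come from the same shared backbone (VGG-16-derived here), which is what makes the correlation comparable across frames.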

Exploring Randomly Wired Neural Networks for Image Recognition

Title Exploring Randomly Wired Neural Networks for Image Recognition
Authors Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He
Abstract Neural networks for image recognition have evolved through extensive manual design from simple chain-like models to structures with multiple wiring paths. The success of ResNets and DenseNets is due in large part to their innovative wiring plans. Now, neural architecture search (NAS) studies are exploring the joint optimization of wiring and operation types; however, the space of possible wirings remains constrained and, despite being searched, still driven by manual design. In this paper, we explore a more diverse set of connectivity patterns through the lens of randomly wired neural networks. To do this, we first define the concept of a stochastic network generator that encapsulates the entire network generation process. Encapsulation provides a unified view of NAS and randomly wired networks. Then, we use three classical random graph models to generate randomly wired graphs for networks. The results are surprising: several variants of these random generators yield network instances that have competitive accuracy on the ImageNet benchmark. These results suggest that new efforts focusing on designing better network generators may lead to new breakthroughs by exploring less constrained search spaces with more room for novel design.
Tasks Image Classification, Neural Architecture Search
Published 2019-04-02
URL http://arxiv.org/abs/1904.01569v2
PDF http://arxiv.org/pdf/1904.01569v2.pdf
PWC https://paperswithcode.com/paper/exploring-randomly-wired-neural-networks-for
Repo https://github.com/leaderj1001/RandWireNN
Framework pytorch
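
A sketch of one stochastic network generator: sample a classical random graph (Watts-Strogatz here, one of the three families the paper uses), then orient each edge from lower- to higher-indexed node so the wiring forms a DAG that data can flow through. The parameter values are illustrative, not the paper's.

```python
import networkx as nx

# Sample an undirected random graph, then orient edges min -> max to get a DAG.
g = nx.watts_strogatz_graph(n=32, k=4, p=0.75, seed=0)
dag = nx.DiGraph((min(u, v), max(u, v)) for u, v in g.edges())

inputs = [n for n in dag.nodes if dag.in_degree(n) == 0]    # feed the input here
outputs = [n for n in dag.nodes if dag.out_degree(n) == 0]  # aggregate these
```

Each node then becomes a small op (e.g. ReLU-conv-BN over the aggregated inputs), so the random graph fully determines the network's wiring.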

Accurate and Scalable Version Identification Using Musically-Motivated Embeddings

Title Accurate and Scalable Version Identification Using Musically-Motivated Embeddings
Authors Furkan Yesiler, Joan Serrà, Emilia Gómez
Abstract The version identification (VI) task deals with the automatic detection of recordings that correspond to the same underlying musical piece. Despite many efforts, VI is still an open problem, with much room for improvement, especially with regard to combining accuracy and scalability. In this paper, we present MOVE, a musically-motivated method for accurate and scalable version identification. MOVE achieves state-of-the-art performance on two publicly-available benchmark sets by learning scalable embeddings in a Euclidean distance space, using a triplet loss and a hard triplet mining strategy. It improves over previous work by employing an alternative input representation, and introducing a novel technique for temporal content summarization, a standardized latent space, and a data augmentation strategy specifically designed for VI. In addition to the main results, we perform an ablation study to highlight the importance of our design choices, and study the relation between embedding dimensionality and model performance.
Tasks Data Augmentation
Published 2019-10-28
URL https://arxiv.org/abs/1910.12551v1
PDF https://arxiv.org/pdf/1910.12551v1.pdf
PWC https://paperswithcode.com/paper/accurate-and-scalable-version-identification
Repo https://github.com/furkanyesiler/move
Framework pytorch
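
A generic sketch of the triplet loss with batch-hard mining in Euclidean space, the training signal described above; this is a standard formulation, not MOVE's exact code.

```python
import torch
import torch.nn.functional as F

def hard_triplet_loss(emb, labels, margin=1.0):
    """emb: (N, D) embeddings; labels: (N,) version/work ids."""
    d = torch.cdist(emb, emb)                               # pairwise distances
    same = labels[:, None] == labels[None, :]
    pos = (d + (~same).float() * -1e9).max(dim=1).values    # hardest positive
    neg = (d + same.float() * 1e9).min(dim=1).values        # hardest negative
    return F.relu(pos - neg + margin).mean()
```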

Fast Sparse ConvNets

Title Fast Sparse ConvNets
Authors Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan
Abstract Historically, the pursuit of efficient inference has been one of the driving forces behind research into new deep learning architectures and building blocks. Some recent examples include: the squeeze-and-excitation module, depthwise separable convolutions in Xception, and the inverted bottleneck in MobileNet v2. Notably, in all of these cases, the resulting building blocks enabled not only higher efficiency, but also higher accuracy, and found wide adoption in the field. In this work, we further expand the arsenal of efficient building blocks for neural network architectures; but instead of combining standard primitives (such as convolution), we advocate for the replacement of these dense primitives with their sparse counterparts. While the idea of using sparsity to decrease the parameter count is not new, the conventional wisdom is that this reduction in theoretical FLOPs does not translate into real-world efficiency gains. We aim to correct this misconception by introducing a family of efficient sparse kernels for ARM and WebAssembly, which we open-source for the benefit of the community as part of the XNNPACK library. Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet v1, MobileNet v2 and EfficientNet architectures substantially outperform strong dense baselines on the efficiency-accuracy curve. On Snapdragon 835 our sparse networks outperform their dense equivalents by 1.3-2.4×, equivalent to approximately one entire generation of MobileNet-family improvement. We hope that our findings will facilitate wider adoption of sparsity as a tool for creating efficient and accurate deep learning architectures.
Tasks
Published 2019-11-21
URL https://arxiv.org/abs/1911.09723v1
PDF https://arxiv.org/pdf/1911.09723v1.pdf
PWC https://paperswithcode.com/paper/fast-sparse-convnets-1
Repo https://github.com/google/XNNPACK
Framework tf
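
The key observation is that a 1x1 convolution is a matrix multiply over the channel dimension, so a sparse weight matrix turns it into a sparse-dense matrix product. A toy SciPy sketch of the idea; the actual ARM/WebAssembly kernels live in XNNPACK.

```python
import numpy as np
from scipy.sparse import csr_matrix

C_in, C_out, HW = 64, 128, 56 * 56
# 90%-sparse 1x1 conv weights and a dense feature map flattened over space.
W = np.random.randn(C_out, C_in) * (np.random.rand(C_out, C_in) < 0.1)
x = np.random.randn(C_in, HW).astype(np.float32)
y = csr_matrix(W) @ x   # sparse 1x1 conv over all spatial positions at once
```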

AFS: An Attention-based mechanism for Supervised Feature Selection

Title AFS: An Attention-based mechanism for Supervised Feature Selection
Authors Ning Gui, Danni Ge, Ziyin Hu
Abstract As an effective data preprocessing step, feature selection has shown its effectiveness in preparing high-dimensional data for many machine learning tasks. The proliferation of high-dimension, huge-volume big data, however, has brought major challenges, e.g. computational complexity and stability on noisy data, to existing feature-selection techniques. This paper introduces a novel neural network-based feature selection architecture, dubbed Attention-based Feature Selection (AFS). AFS consists of two detachable modules: an attention module for feature weight generation and a learning module for problem modeling. The attention module formulates the correlation problem between features and the supervision target as a binary classification problem, supported by a shallow attention net for each feature. Feature weights are generated based on the distribution of the respective feature selection patterns, adjusted by backpropagation during the training process. The detachable structure allows existing off-the-shelf models to be reused directly, which greatly reduces training time, training-data demands, and required expertise. A hybrid initialization method is also introduced to boost selection accuracy for datasets without enough samples for feature weight generation. Experimental results show that AFS achieves the best accuracy and stability in comparison to several state-of-the-art feature selection algorithms on MNIST, noisy MNIST, and several small-sample datasets.
Tasks Feature Selection
Published 2019-02-28
URL http://arxiv.org/abs/1902.11074v1
PDF http://arxiv.org/pdf/1902.11074v1.pdf
PWC https://paperswithcode.com/paper/afs-an-attention-based-mechanism-for
Repo https://github.com/upup123/AAAI-2019-AFS
Framework tf
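
A PyTorch sketch of the attention module's shape: one tiny attention net per feature emits a gate in (0, 1) computed from the whole input vector, and the gated features feed any off-the-shelf learning module. The hidden size and activations are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class AttentionFeatureSelector(nn.Module):
    """One shallow attention net per feature, producing selection weights."""
    def __init__(self, n_features, hidden=8):
        super().__init__()
        self.nets = nn.ModuleList(
            nn.Sequential(nn.Linear(n_features, hidden), nn.Tanh(),
                          nn.Linear(hidden, 1), nn.Sigmoid())
            for _ in range(n_features))

    def forward(self, x):
        # Weight for feature j is computed from the whole input vector.
        w = torch.cat([net(x) for net in self.nets], dim=1)  # (batch, n_features)
        return x * w, w   # gated features plus the learned weights
```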

Seesaw-Net: Convolution Neural Network With Uneven Group Convolution

Title Seesaw-Net: Convolution Neural Network With Uneven Group Convolution
Authors Jintao Zhang
Abstract In this paper, we are interested in boosting the representation capability of convolutional neural networks that utilize the inverted residual structure. Building on the success of the Inverted Residual structure [Sandler et al. 2018] and Interleaved Low-Rank Group Convolutions [Sun et al. 2018], we rethink these two patterns of neural network structure. Rather than using NAS (neural architecture search) methods [Zoph and Le 2017; Pham et al. 2018; Liu et al. 2018b], we introduce uneven point-wise group convolution, which provides a novel search space for designing basic blocks with a better trade-off between representation capability and computational cost. Meanwhile, we propose two novel information flow patterns that enable cross-group information flow for multiple group convolution layers, with and without channel permute/shuffle operations. Extensive experiments on the image classification task show that our proposed model, named Seesaw-Net, achieves state-of-the-art (SOTA) performance with limited computation and memory cost. Our code and pre-trained models will be open-sourced.
Tasks Image Classification, Neural Architecture Search
Published 2019-05-09
URL https://arxiv.org/abs/1905.03672v5
PDF https://arxiv.org/pdf/1905.03672v5.pdf
PWC https://paperswithcode.com/paper/190503672
Repo https://github.com/cvtower/SeesawNet-pytorch-reimplement
Framework pytorch
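
A PyTorch sketch of uneven point-wise group convolution: the channels are split into groups of different sizes, each processed by its own 1x1 convolution. The split ratios below are illustrative, not Seesaw-Net's.

```python
import torch
import torch.nn as nn

class UnevenGroupConv1x1(nn.Module):
    """Point-wise convolution over unevenly sized channel groups."""
    def __init__(self, splits_in=(16, 48), splits_out=(16, 48)):
        super().__init__()
        self.splits_in = list(splits_in)
        self.convs = nn.ModuleList(
            nn.Conv2d(ci, co, kernel_size=1)
            for ci, co in zip(splits_in, splits_out))

    def forward(self, x):
        chunks = torch.split(x, self.splits_in, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.convs, chunks)], dim=1)
```

A standard (even) group convolution is the special case where all splits are equal; making them uneven is what opens the extra design space the paper searches.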

Learning to Reconstruct People in Clothing from a Single RGB Camera

Title Learning to Reconstruct People in Clothing from a Single RGB Camera
Authors Thiemo Alldieck, Marcus Magnor, Bharat Lal Bhatnagar, Christian Theobalt, Gerard Pons-Moll
Abstract We present a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving, in less than 10 seconds with a reconstruction accuracy of 5mm. Our model learns to predict the parameters of a statistical body model and instance displacements that add clothing and hair to the shape. The model achieves fast and accurate predictions based on two key design choices. First, by predicting shape in a canonical T-pose space, the network learns to encode the images of the person into pose-invariant latent codes, where the information is fused. Second, based on the observation that feed-forward predictions are fast but do not always align with the input images, we predict using both bottom-up and top-down streams (one per view), allowing information to flow in both directions. Learning relies only on synthetic 3D data. Once learned, the model can take a variable number of frames as input, and is able to reconstruct shapes even from a single image with an accuracy of 6mm. Results on 3 different datasets demonstrate the efficacy and accuracy of our approach.
Tasks
Published 2019-03-14
URL http://arxiv.org/abs/1903.05885v2
PDF http://arxiv.org/pdf/1903.05885v2.pdf
PWC https://paperswithcode.com/paper/learning-to-reconstruct-people-in-clothing
Repo https://github.com/thmoa/octopus
Framework tf