Paper Group AWR 91
Unsupervised Label Noise Modeling and Loss Correction. On improving deep learning generalization with adaptive sparse connectivity. Deep Density-aware Count Regressor. Generative Adversarial Networks for text using word2vec intermediaries. Learning to Optimize Multigrid PDE Solvers. Attention model for articulatory features detection. Graph Embedded Pose Clustering for Anomaly Detection. IStego100K: Large-scale Image Steganalysis Dataset. SiamVGG: Visual Tracking using Deeper Siamese Networks. Exploring Randomly Wired Neural Networks for Image Recognition. Accurate and Scalable Version Identification Using Musically-Motivated Embeddings. Fast Sparse ConvNets. AFS: An Attention-based mechanism for Supervised Feature Selection. Seesaw-Net: Convolution Neural Network With Uneven Group Convolution. Learning to Reconstruct People in Clothing from a Single RGB Camera.
Unsupervised Label Noise Modeling and Loss Correction
Title | Unsupervised Label Noise Modeling and Loss Correction |
Authors | Eric Arazo, Diego Ortego, Paul Albert, Noel E. O’Connor, Kevin McGuinness |
Abstract | Despite being robust to small amounts of label noise, convolutional neural networks trained with stochastic gradient methods have been shown to easily fit random labels. When there is a mixture of correct and mislabelled targets, networks tend to fit the former before the latter. This suggests using a suitable two-component mixture model as an unsupervised generative model of sample loss values during training to allow online estimation of the probability that a sample is mislabelled. Specifically, we propose a beta mixture to estimate this probability and correct the loss by relying on the network prediction (the so-called bootstrapping loss). We further adapt mixup augmentation to drive our approach a step further. Experiments on CIFAR-10/100 and TinyImageNet demonstrate a robustness to label noise that substantially outperforms recent state-of-the-art. Source code is available at https://git.io/fjsvE |
Tasks | |
Published | 2019-04-25 |
URL | https://arxiv.org/abs/1904.11238v2 |
PDF | https://arxiv.org/pdf/1904.11238v2.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-label-noise-modeling-and-loss |
Repo | https://github.com/PaulAlbert31/LabelNoiseCorrection |
Framework | pytorch |
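The two-component modeling step is simple to prototype. Below is a minimal NumPy/SciPy sketch, assuming per-sample losses already rescaled to (0, 1): EM fits a two-component beta mixture, and the posterior of the high-loss component estimates the probability that a sample is mislabelled. The initialization and moment-based M-step are illustrative assumptions, not the authors' released code (see the repo above).

```python
import numpy as np
from scipy.stats import beta

def fit_beta_mixture(losses, n_iter=10, eps=1e-4):
    """EM for a two-component beta mixture over normalized losses."""
    x = np.clip(losses, eps, 1 - eps)
    a = np.array([2.0, 4.0])   # assumed init: component 0 = clean (low loss)
    b = np.array([5.0, 2.0])   # component 1 = noisy (high loss)
    w = np.array([0.5, 0.5])   # mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        p = np.stack([w[k] * beta.pdf(x, a[k], b[k]) for k in range(2)])
        r = p / p.sum(axis=0, keepdims=True)
        # M-step: weighted method-of-moments update of each component
        for k in range(2):
            m = np.average(x, weights=r[k])
            v = np.average((x - m) ** 2, weights=r[k]) + eps
            c = max(m * (1 - m) / v - 1, eps)
            a[k], b[k] = m * c, (1 - m) * c
        w = r.mean(axis=1)
    return a, b, w

def prob_mislabelled(losses, a, b, w, eps=1e-4):
    """Posterior probability that each sample is in the noisy component."""
    x = np.clip(losses, eps, 1 - eps)
    p = np.stack([w[k] * beta.pdf(x, a[k], b[k]) for k in range(2)])
    return p[1] / p.sum(axis=0)
```

Samples with a high estimated noise probability can then have their loss corrected toward the network's own prediction, in the spirit of the bootstrapping loss.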
On improving deep learning generalization with adaptive sparse connectivity
Title | On improving deep learning generalization with adaptive sparse connectivity |
Authors | Shiwei Liu, Decebal Constantin Mocanu, Mykola Pechenizkiy |
Abstract | Large neural networks are very successful in various tasks. However, with limited data, the generalization capabilities of deep neural networks are also very limited. In this paper, we empirically show that intrinsically sparse neural networks with adaptive sparse connectivity, which by design have a strict parameter budget during the training phase, have better generalization capabilities than their fully-connected counterparts. Besides this, we propose a new technique to train these sparse models by combining the Sparse Evolutionary Training (SET) procedure with neuron pruning. Applied to MultiLayer Perceptrons (MLPs) and tested on 15 datasets, our proposed technique zeros out around 50% of the hidden neurons during training, while having a linear number of parameters to optimize with respect to the number of neurons. The results show competitive classification and generalization performance. |
Tasks | |
Published | 2019-06-27 |
URL | https://arxiv.org/abs/1906.11626v1 |
PDF | https://arxiv.org/pdf/1906.11626v1.pdf |
PWC | https://paperswithcode.com/paper/on-improving-deep-learning-generalization |
Repo | https://github.com/dcmocanu/sparse-evolutionary-artificial-neural-networks |
Framework | tf |
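As a rough illustration of the SET procedure this paper extends, here is a toy rewiring step on a single weight matrix: prune the fraction `zeta` of smallest-magnitude active connections, regrow the same number at random empty positions, and report hidden units left with no incoming connections (a crude stand-in for the paper's neuron pruning). The `zeta` value and init scale are assumptions.

```python
import numpy as np

def set_rewire(W, mask, zeta=0.3, rng=None):
    """One SET step: prune weakest connections, regrow at random positions."""
    rng = rng if rng is not None else np.random.default_rng(0)
    vals = np.sort(np.abs(W[mask]))
    k = int(zeta * vals.size)
    if k > 0:
        mask = mask & (np.abs(W) >= vals[k])   # drop the weakest connections
        empty = np.flatnonzero(~mask)          # candidate growth positions
        grow = rng.choice(empty, size=min(k, empty.size), replace=False)
        mask.flat[grow] = True
        W = W * mask                           # keep the parameter budget fixed
        W.flat[grow] = rng.normal(0, 0.1, size=grow.size)  # fresh weights
    dead = np.flatnonzero(~mask.any(axis=0))   # units with no fan-in left
    return W, mask, dead

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 10))
mask = rng.random(W.shape) < 0.2               # start ~80% sparse
W, mask, dead = set_rewire(W * mask, mask)
print(mask.mean(), dead)                       # sparsity level is unchanged
```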
Deep Density-aware Count Regressor
Title | Deep Density-aware Count Regressor |
Authors | Zhuojun Chen, Junhao Cheng, Yuchen Yuan, Dongping Liao, Yizhou Li, Jiancheng Lv |
Abstract | We seek to improve crowd counting as we perceive limits of the currently prevalent density map estimation approach, in both prediction accuracy and time efficiency. We show that a CNN regressing a global count, trained with density map supervision, can make more accurate predictions. We introduce multilayer gradient fusion for training a density-aware global count regressor. More specifically, at the training stage, a backbone network receives gradients from multiple branches to learn the density information, whereas those branches are detached to accelerate inference. By taking advantage of this method, our model improves benchmark results on public datasets and offers a new practical solution to the crowd counting problem. |
Tasks | Crowd Counting |
Published | 2019-08-09 |
URL | https://arxiv.org/abs/1908.03314v2 |
PDF | https://arxiv.org/pdf/1908.03314v2.pdf |
PWC | https://paperswithcode.com/paper/deep-density-aware-count-regressor |
Repo | https://github.com/GeorgeChenZJ/deepcount |
Framework | tf |
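A hedged PyTorch sketch of the multilayer gradient fusion idea: auxiliary density-map branches inject gradients into a shared backbone during training, and only the global count head is used at inference. The layer sizes and two-branch layout are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DensityAwareCounter(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))
        # auxiliary branches predict density maps at two depths (train only)
        self.density1 = nn.Conv2d(32, 1, 1)
        self.density2 = nn.Conv2d(64, 1, 1)
        self.count_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(64, 1))

    def forward(self, x, with_density=True):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        count = self.count_head(f2)
        if with_density:                # branches detached at inference time
            return count, self.density1(f1), self.density2(f2)
        return count

model = DensityAwareCounter()
imgs = torch.randn(2, 3, 64, 64)
count, d1, d2 = model(imgs)                    # training: branch gradients fuse
count_only = model(imgs, with_density=False)   # inference: count head only
```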
Generative Adversarial Networks for text using word2vec intermediaries
Title | Generative Adversarial Networks for text using word2vec intermediaries |
Authors | Akshay Budhkar, Krishnapriya Vishnubhotla, Safwan Hossain, Frank Rudzicz |
Abstract | Generative adversarial networks (GANs) have shown considerable success, especially in the realistic generation of images. In this work, we apply similar techniques to the generation of text. We propose a novel approach that handles the discrete nature of text during training using word embeddings. Our method is agnostic to vocabulary size and achieves competitive results relative to methods with various discrete gradient estimators. |
Tasks | Word Embeddings |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02293v1 |
PDF | http://arxiv.org/pdf/1904.02293v1.pdf |
PWC | https://paperswithcode.com/paper/generative-adversarial-networks-for-text |
Repo | https://github.com/adventure2165/GAN2vec |
Framework | none |
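The key trick is that the generator emits continuous word2vec-style vectors, so no discrete gradient estimator is needed; words are only materialized by a nearest-neighbour lookup at decode time. A minimal sketch under assumptions (a tiny random embedding table stands in for a trained word2vec model):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]          # placeholder vocabulary
E = rng.normal(size=(len(vocab), 8))                # stand-in embeddings
E /= np.linalg.norm(E, axis=1, keepdims=True)

def decode(generated):
    """Map generated (seq_len, dim) vectors to the nearest vocabulary words."""
    g = generated / np.linalg.norm(generated, axis=1, keepdims=True)
    ids = (g @ E.T).argmax(axis=1)                  # cosine nearest neighbour
    return [vocab[i] for i in ids]

fake = rng.normal(size=(3, 8))                      # stub for generator output
print(decode(fake))
```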
Learning to Optimize Multigrid PDE Solvers
Title | Learning to Optimize Multigrid PDE Solvers |
Authors | Daniel Greenfeld, Meirav Galun, Ron Kimmel, Irad Yavneh, Ronen Basri |
Abstract | Constructing fast numerical solvers for partial differential equations (PDEs) is crucial for many scientific disciplines. A leading technique for solving large-scale PDEs is using multigrid methods. At the core of a multigrid solver is the prolongation matrix, which relates different scales of the problem. This matrix is strongly problem-dependent, and its optimal construction is critical to the efficiency of the solver. In practice, however, devising multigrid algorithms for new problems often poses formidable challenges. In this paper we propose a framework for learning multigrid solvers. Our method learns a (single) mapping from a family of parameterized PDEs to prolongation operators. We train a neural network once for the entire class of PDEs, using an efficient and unsupervised loss function. Experiments on a broad class of 2D diffusion problems demonstrate improved convergence rates compared to the widely used Black-Box multigrid scheme, suggesting that our method successfully learned rules for constructing prolongation matrices. |
Tasks | |
Published | 2019-02-25 |
URL | https://arxiv.org/abs/1902.10248v3 |
PDF | https://arxiv.org/pdf/1902.10248v3.pdf |
PWC | https://paperswithcode.com/paper/learning-to-optimize-multigrid-pde-solvers |
Repo | https://github.com/danielgreenfeld3/Learning-to-optimize-multigrid-solvers |
Framework | tf |
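To make the role of the prolongation matrix concrete, here is a classical two-grid cycle on a 1D Poisson model problem. The paper learns the prolongation P with a network for families of 2D diffusion problems; the hand-coded linear-interpolation P below is a stand-in for that learned operator, not the paper's method.

```python
import numpy as np

n = 7                                    # fine-grid interior points (odd)
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian
P = np.zeros((n, (n - 1) // 2))          # prolongation: coarse -> fine
for j in range(P.shape[1]):
    P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]            # linear interpolation
R = 0.5 * P.T                            # restriction as scaled transpose
Ac = R @ A @ P                           # Galerkin coarse-grid operator

def two_grid(u, b, nu=2):
    for _ in range(nu):                  # pre-smoothing (weighted Jacobi)
        u = u + 0.67 * (b - A @ u) / 2.0
    e = np.linalg.solve(Ac, R @ (b - A @ u))           # coarse-grid correction
    u = u + P @ e
    for _ in range(nu):                  # post-smoothing
        u = u + 0.67 * (b - A @ u) / 2.0
    return u

b = np.ones(n)
u = np.zeros(n)
for _ in range(10):
    u = two_grid(u, b)
print(np.linalg.norm(b - A @ u))         # residual shrinks every cycle
```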
Attention model for articulatory features detection
Title | Attention model for articulatory features detection |
Authors | Ievgen Karaulov, Dmytro Tkanov |
Abstract | Articulatory distinctive features, as well as phonetic transcription, play an important role in speech-related tasks: computer-assisted pronunciation training, text-to-speech conversion (TTS), studying speech production mechanisms, and speech recognition for low-resourced languages. End-to-end approaches to speech-related tasks have gained a lot of traction in recent years. We apply the Listen, Attend and Spell (LAS) architecture to phone recognition on a small training set such as TIMIT. We also introduce a novel decoding technique that allows training manner- and place-of-articulation detectors end-to-end using attention models. Finally, we explore joint phone recognition and articulatory feature detection in a multitask learning setting. |
Tasks | Manner Of Articulation Detection, Speech Recognition |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01914v1 |
PDF | https://arxiv.org/pdf/1907.01914v1.pdf |
PWC | https://paperswithcode.com/paper/attention-model-for-articulatory-features |
Repo | https://github.com/sciforce/phones-las |
Framework | tf |
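As a rough sketch of the multitask setting described above, a shared speech encoder can feed separate heads for phones, manner, and place of articulation. Per-frame linear classifiers replace the LAS attention decoder here purely for brevity, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskSpeechModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=128,
                 n_phones=48, n_manner=8, n_place=9):  # assumed label counts
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True,
                               bidirectional=True)     # shared encoder
        self.phones = nn.Linear(2 * hidden, n_phones)
        self.manner = nn.Linear(2 * hidden, n_manner)
        self.place = nn.Linear(2 * hidden, n_place)

    def forward(self, x):                # x: (batch, time, feat_dim) features
        h, _ = self.encoder(x)
        return self.phones(h), self.manner(h), self.place(h)

model = MultiTaskSpeechModel()
feats = torch.randn(2, 100, 40)          # fake filterbank features
phone_logits, manner_logits, place_logits = model(feats)
print(phone_logits.shape)                # per-frame phone scores
```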
Graph Embedded Pose Clustering for Anomaly Detection
Title | Graph Embedded Pose Clustering for Anomaly Detection |
Authors | Amir Markovitz, Gilad Sharir, Itamar Friedman, Lihi Zelnik-Manor, Shai Avidan |
Abstract | We propose a new method for anomaly detection of human actions. Our method works directly on human pose graphs that can be computed from an input video sequence. This makes the analysis independent of nuisance parameters such as viewpoint or illumination. We map these graphs to a latent space and cluster them. Each action is then represented by its soft-assignment to each of the clusters. This gives a kind of “bag of words” representation to the data, where every action is represented by its similarity to a group of base action-words. Then, we use a Dirichlet process-based mixture, which is useful for handling proportional data such as our soft-assignment vectors, to determine if an action is normal or not. We evaluate our method on two types of data sets. The first is a fine-grained anomaly detection data set (e.g., ShanghaiTech) where we wish to detect unusual variations of some action. The second is a coarse-grained anomaly detection data set (e.g., a Kinetics-based data set) where few actions are considered normal, and every other action should be considered abnormal. Extensive experiments on the benchmarks show that our method performs considerably better than other state-of-the-art methods. |
Tasks | Anomaly Detection |
Published | 2019-12-26 |
URL | https://arxiv.org/abs/1912.11850v1 |
PDF | https://arxiv.org/pdf/1912.11850v1.pdf |
PWC | https://paperswithcode.com/paper/graph-embedded-pose-clustering-for-anomaly |
Repo | https://github.com/amirmk89/gepc |
Framework | pytorch |
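The normality-scoring stage can be prototyped directly: soft-assignment vectors are proportional data, and a Dirichlet-process mixture scores how likely each one is under normal behaviour. Below, scikit-learn's BayesianGaussianMixture stands in for the paper's model; the synthetic vectors and threshold are assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# fake soft-assignment vectors over 3 clusters; the redundant last coordinate
# is dropped so the simplex-valued data has full-rank covariance
train = rng.dirichlet(alpha=[5.0, 2.0, 1.0], size=500)[:, :2]
dpmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(train)

test = rng.dirichlet(alpha=[1.0, 1.0, 10.0], size=5)[:, :2]  # unusual pattern
threshold = np.quantile(dpmm.score_samples(train), 0.05)     # assumed cutoff
print(dpmm.score_samples(test) < threshold)                  # True = anomalous
```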
IStego100K: Large-scale Image Steganalysis Dataset
Title | IStego100K: Large-scale Image Steganalysis Dataset |
Authors | Zhongliang Yang, Ke Wang, Sai Ma, Yongfeng Huang, Xiangui Kang, Xianfeng Zhao |
Abstract | In order to promote the rapid development of image steganalysis technology, in this paper we construct and release a multivariable large-scale image steganalysis dataset called IStego100K. It contains 208,104 images of the same size, 1024x1024. Among them, 200,000 images (100,000 cover-stego image pairs) are designated as the training set and the remaining 8,104 as the test set. In addition, we hope that IStego100K can help researchers further explore the development of universal image steganalysis algorithms, so we try to reduce limits on the images in IStego100K. For each image in IStego100K, the quality factor is randomly set in the range 75-95, the steganographic algorithm is randomly selected from three well-known steganographic algorithms (J-uniward, nsF5, and UERD), and the embedding rate is also randomly set to a value in the range 0.1-0.4. Furthermore, considering the possible mismatch between training samples and test samples in real environments, we add a test set (DS-Test) whose samples come from a different source than the training set. We hope that this test set can help to evaluate the robustness of steganalysis algorithms. We tested the performance of some of the latest steganalysis algorithms on IStego100K, with specific results and analysis details in the experimental part. We hope that the IStego100K dataset will further promote the development of universal image steganalysis technology. The description of IStego100K and instructions for use can be found at https://github.com/YangzlTHU/IStego100K |
Tasks | |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05542v1 |
PDF | https://arxiv.org/pdf/1911.05542v1.pdf |
PWC | https://paperswithcode.com/paper/istego100k-large-scale-image-steganalysis |
Repo | https://github.com/YangzlTHU/IStego100K |
Framework | none |
SiamVGG: Visual Tracking using Deeper Siamese Networks
Title | SiamVGG: Visual Tracking using Deeper Siamese Networks |
Authors | Yuhong Li, Xiaofan Zhang |
Abstract | Recently, we have seen a rapid development of Deep Neural Network (DNN) based visual tracking solutions. Some trackers combine DNN-based solutions with Discriminative Correlation Filters (DCF) to extract semantic features and successfully deliver state-of-the-art tracking accuracy. However, these solutions are highly compute-intensive, requiring long processing times and resulting in unreliable real-time performance. To deliver both high accuracy and reliable real-time performance, we propose a novel tracker called SiamVGG. It combines a Convolutional Neural Network (CNN) backbone and a cross-correlation operator, and takes advantage of the features from exemplar images for more accurate object tracking. The architecture of SiamVGG is customized from VGG-16, with the parameters shared by both exemplar images and desired input video frames. We demonstrate the proposed SiamVGG on OTB-2013/50/100 and VOT 2015/2016/2017 datasets with state-of-the-art accuracy while maintaining decent real-time performance of 50 FPS running on a GTX 1080Ti. Our design achieves 2% higher Expected Average Overlap (EAO) compared to ECO and C-COT in the VOT2017 Challenge. |
Tasks | Object Tracking, Visual Object Tracking, Visual Tracking |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.02804v2 |
PDF | http://arxiv.org/pdf/1902.02804v2.pdf |
PWC | https://paperswithcode.com/paper/siamvgg-visual-tracking-using-deeper-siamese |
Repo | https://github.com/leeyeehoo/SiamVGG |
Framework | pytorch |
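The cross-correlation operator at the heart of the tracker is easy to demonstrate: features of the exemplar patch serve as the convolution kernel slid over features of the search region, and the response peak locates the target. The toy conv stack below stands in for the VGG-16-derived backbone, and all shapes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# shared backbone: the same weights embed exemplar and search images
backbone = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))

exemplar = torch.randn(1, 3, 32, 32)     # target patch from the first frame
search = torch.randn(1, 3, 64, 64)       # crop of the current frame
z = backbone(exemplar)                   # (1, 32, 28, 28) kernel features
x = backbone(search)                     # (1, 32, 60, 60) search features
response = F.conv2d(x, z)                # cross-correlation response map
print(response.shape)                    # (1, 1, 33, 33); peak locates target
```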
Exploring Randomly Wired Neural Networks for Image Recognition
Title | Exploring Randomly Wired Neural Networks for Image Recognition |
Authors | Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He |
Abstract | Neural networks for image recognition have evolved through extensive manual design from simple chain-like models to structures with multiple wiring paths. The success of ResNets and DenseNets is due in large part to their innovative wiring plans. Neural architecture search (NAS) studies are now exploring the joint optimization of wiring and operation types; however, the space of possible wirings is constrained and still driven by manual design despite being searched. In this paper, we explore a more diverse set of connectivity patterns through the lens of randomly wired neural networks. To do this, we first define the concept of a stochastic network generator that encapsulates the entire network generation process. Encapsulation provides a unified view of NAS and randomly wired networks. Then, we use three classical random graph models to generate randomly wired graphs for networks. The results are surprising: several variants of these random generators yield network instances that have competitive accuracy on the ImageNet benchmark. These results suggest that new efforts focusing on designing better network generators may lead to new breakthroughs by exploring less constrained search spaces with more room for novel design. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01569v2 |
PDF | http://arxiv.org/pdf/1904.01569v2.pdf |
PWC | https://paperswithcode.com/paper/exploring-randomly-wired-neural-networks-for |
Repo | https://github.com/leaderj1001/RandWireNN |
Framework | pytorch |
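A sketch of the generator stage under assumptions: draw a Watts-Strogatz random graph (one of the three classical models the paper uses), orient every edge from lower to higher node index to obtain a DAG, and read off each node's inputs. Mapping nodes to conv blocks is omitted, and the small graph parameters here are not the paper's settings.

```python
import networkx as nx

g = nx.watts_strogatz_graph(n=8, k=4, p=0.75, seed=0)
edges = {(min(u, v), max(u, v)) for u, v in g.edges()}  # orient: low -> high
inputs = {v: sorted(u for u, w in edges if w == v) for v in g.nodes()}
outputs = {u: sorted(w for x, w in edges if x == u) for u in g.nodes()}
sources = [v for v in g.nodes() if not inputs[v]]   # fed by the network stem
sinks = [v for v in g.nodes() if not outputs[v]]    # averaged into the output
print(inputs)
print(sources, sinks)
```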
Accurate and Scalable Version Identification Using Musically-Motivated Embeddings
Title | Accurate and Scalable Version Identification Using Musically-Motivated Embeddings |
Authors | Furkan Yesiler, Joan Serrà, Emilia Gómez |
Abstract | The version identification (VI) task deals with the automatic detection of recordings that correspond to the same underlying musical piece. Despite many efforts, VI is still an open problem, with much room for improvement, especially with regard to combining accuracy and scalability. In this paper, we present MOVE, a musically-motivated method for accurate and scalable version identification. MOVE achieves state-of-the-art performance on two publicly-available benchmark sets by learning scalable embeddings in a Euclidean distance space, using a triplet loss and a hard triplet mining strategy. It improves over previous work by employing an alternative input representation, and introducing a novel technique for temporal content summarization, a standardized latent space, and a data augmentation strategy specifically designed for VI. In addition to the main results, we perform an ablation study to highlight the importance of our design choices, and study the relation between embedding dimensionality and model performance. |
Tasks | Data Augmentation |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12551v1 |
PDF | https://arxiv.org/pdf/1910.12551v1.pdf |
PWC | https://paperswithcode.com/paper/accurate-and-scalable-version-identification |
Repo | https://github.com/furkanyesiler/move |
Framework | pytorch |
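The training objective is concrete enough to sketch: a triplet loss in Euclidean space with hard triplet mining. The batch-hard variant below (hardest positive and hardest negative per anchor) is an assumption about the mining strategy; MOVE's network and input representation are not reproduced.

```python
import torch

def batch_hard_triplet_loss(emb, labels, margin=1.0):
    """Batch-hard triplet loss on a batch of embeddings."""
    d = torch.cdist(emb, emb)                  # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool)
    # hardest (farthest) positive and hardest (closest) negative per anchor
    pos = d.masked_fill(~same | eye, float("-inf")).max(dim=1).values
    neg = d.masked_fill(same, float("inf")).min(dim=1).values
    return torch.relu(pos - neg + margin).mean()

emb = torch.randn(8, 16)                        # fake version embeddings
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3]) # clique (musical piece) ids
print(batch_hard_triplet_loss(emb, labels))
```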
Fast Sparse ConvNets
Title | Fast Sparse ConvNets |
Authors | Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan |
Abstract | Historically, the pursuit of efficient inference has been one of the driving forces behind research into new deep learning architectures and building blocks. Some recent examples include: the squeeze-and-excitation module, depthwise separable convolutions in Xception, and the inverted bottleneck in MobileNet v2. Notably, in all of these cases, the resulting building blocks enabled not only higher efficiency, but also higher accuracy, and found wide adoption in the field. In this work, we further expand the arsenal of efficient building blocks for neural network architectures; but instead of combining standard primitives (such as convolution), we advocate for the replacement of these dense primitives with their sparse counterparts. While the idea of using sparsity to decrease the parameter count is not new, the conventional wisdom is that this reduction in theoretical FLOPs does not translate into real-world efficiency gains. We aim to correct this misconception by introducing a family of efficient sparse kernels for ARM and WebAssembly, which we open-source for the benefit of the community as part of the XNNPACK library. Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet v1, MobileNet v2 and EfficientNet architectures substantially outperform strong dense baselines on the efficiency-accuracy curve. On Snapdragon 835 our sparse networks outperform their dense equivalents by 1.3-2.4×, equivalent to approximately one entire generation of MobileNet-family improvement. We hope that our findings will facilitate wider adoption of sparsity as a tool for creating efficient and accurate deep learning architectures. |
Tasks | |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09723v1 |
PDF | https://arxiv.org/pdf/1911.09723v1.pdf |
PWC | https://paperswithcode.com/paper/fast-sparse-convnets-1 |
Repo | https://github.com/google/XNNPACK |
Framework | tf |
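The core observation is that a 1x1 convolution is a matrix multiply over a (C_in, H*W) activation matrix, so a sparse weight matrix does proportionally less work. A sketch of that equivalence, with scipy standing in for the paper's ARM/WebAssembly XNNPACK kernels; the sizes and sparsity level are illustrative.

```python
import numpy as np
import scipy.sparse as sp

c_in, c_out, hw = 64, 128, 56 * 56
rng = np.random.default_rng(0)
W = rng.normal(size=(c_out, c_in))
W[rng.random(W.shape) < 0.9] = 0.0       # 90% sparse 1x1-conv weights
X = rng.normal(size=(c_in, hw))          # activations flattened over H*W

dense_out = W @ X                        # dense 1x1 convolution
sparse_out = sp.csr_matrix(W) @ X        # same result, ~10% of the multiplies
print(np.allclose(dense_out, sparse_out))
```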
AFS: An Attention-based mechanism for Supervised Feature Selection
Title | AFS: An Attention-based mechanism for Supervised Feature Selection |
Authors | Ning Gui, Danni Ge, Ziyin Hu |
Abstract | As an effective data preprocessing step, feature selection has shown its effectiveness in preparing high-dimensional data for many machine learning tasks. The proliferation of high-dimension, huge-volume big data, however, has brought major challenges, e.g. computation complexity and stability on noisy data, to existing feature-selection techniques. This paper introduces a novel neural network-based feature selection architecture, dubbed Attention-based Feature Selection (AFS). AFS consists of two detachable modules: an attention module for feature weight generation and a learning module for problem modeling. The attention module formulates the correlation problem between features and the supervision target as a binary classification problem, supported by a shallow attention net for each feature. Feature weights are generated based on the distribution of the respective feature selection patterns, adjusted by backpropagation during the training process. The detachable structure allows existing off-the-shelf models to be directly reused, which greatly reduces training time, training-data demands, and expertise requirements. A hybrid initialization method is also introduced to boost the selection accuracy for datasets without enough samples for feature weight generation. Experimental results show that AFS achieves the best accuracy and stability in comparison to several state-of-the-art feature selection algorithms on MNIST, noisy MNIST, and several small-sample datasets. |
Tasks | Feature Selection |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.11074v1 |
PDF | http://arxiv.org/pdf/1902.11074v1.pdf |
PWC | https://paperswithcode.com/paper/afs-an-attention-based-mechanism-for |
Repo | https://github.com/upup123/AAAI-2019-AFS |
Framework | tf |
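A hedged PyTorch sketch of where the attention module sits: a small attention net produces a weight per input feature, the reweighted features feed any off-the-shelf learning module, and the weights are adjusted by backpropagation. The paper's per-feature binary-classification formulation is simplified here, and the sizes are assumptions.

```python
import torch
import torch.nn as nn

class AttentionFeatureSelector(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        # shallow attention net mapping the input to one weight per feature
        self.attention = nn.Sequential(nn.Linear(n_features, hidden), nn.Tanh(),
                                       nn.Linear(hidden, n_features))

    def forward(self, x):
        weights = torch.softmax(self.attention(x), dim=1)  # feature weights
        return x * weights, weights                        # reweighted input

selector = AttentionFeatureSelector(n_features=20)
x = torch.randn(4, 20)
weighted_x, w = selector(x)     # feed weighted_x to any off-the-shelf model
print(w.sum(dim=1))             # softmax weights sum to 1 per sample
```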
Seesaw-Net: Convolution Neural Network With Uneven Group Convolution
Title | Seesaw-Net: Convolution Neural Network With Uneven Group Convolution |
Authors | Jintao Zhang |
Abstract | In this paper, we are interested in boosting the representation capability of convolutional neural networks that utilize the inverted residual structure. Building on the success of the inverted residual structure [Sandler et al. 2018] and interleaved low-rank group convolutions [Sun et al. 2018], we rethink these two patterns of neural network structure. Rather than using NAS (neural architecture search) methods [Zoph and Le 2017; Pham et al. 2018; Liu et al. 2018b], we introduce uneven point-wise group convolution, which provides a novel search space for designing basic blocks to obtain a better trade-off between representation capability and computational cost. Meanwhile, we propose two novel information flow patterns that enable cross-group information flow for multiple group convolution layers, with and without any channel permute/shuffle operation. Extensive experiments on the image classification task show that our proposed model, named Seesaw-Net, achieves state-of-the-art (SOTA) performance with limited computation and memory cost. Our code will be open-source and available together with pre-trained models. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.03672v5 |
PDF | https://arxiv.org/pdf/1905.03672v5.pdf |
PWC | https://paperswithcode.com/paper/190503672 |
Repo | https://github.com/cvtower/SeesawNet-pytorch-reimplement |
Framework | pytorch |
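The basic block is easy to sketch: an uneven point-wise group convolution splits the input channels into groups of different sizes, applies an independent 1x1 convolution to each, and concatenates the results. The specific splits below, and the omission of the paper's cross-group information-flow patterns, are simplifying assumptions.

```python
import torch
import torch.nn as nn

class UnevenGroupConv1x1(nn.Module):
    def __init__(self, in_splits, out_splits):
        super().__init__()
        assert len(in_splits) == len(out_splits)
        self.in_splits = list(in_splits)
        self.convs = nn.ModuleList(nn.Conv2d(ci, co, kernel_size=1)
                                   for ci, co in zip(in_splits, out_splits))

    def forward(self, x):
        chunks = torch.split(x, self.in_splits, dim=1)  # uneven channel groups
        return torch.cat([conv(c) for conv, c in zip(self.convs, chunks)],
                         dim=1)

layer = UnevenGroupConv1x1(in_splits=[8, 24], out_splits=[16, 48])
y = layer(torch.randn(1, 32, 14, 14))
print(y.shape)                           # torch.Size([1, 64, 14, 14])
```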
Learning to Reconstruct People in Clothing from a Single RGB Camera
Title | Learning to Reconstruct People in Clothing from a Single RGB Camera |
Authors | Thiemo Alldieck, Marcus Magnor, Bharat Lal Bhatnagar, Christian Theobalt, Gerard Pons-Moll |
Abstract | We present a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving, in less than 10 seconds and with a reconstruction accuracy of 5mm. Our model learns to predict the parameters of a statistical body model and instance displacements that add clothing and hair to the shape. The model achieves fast and accurate predictions based on two key design choices. First, by predicting shape in a canonical T-pose space, the network learns to encode the images of the person into pose-invariant latent codes, where the information is fused. Second, based on the observation that feed-forward predictions are fast but do not always align with the input images, we predict using both bottom-up and top-down streams (one per view), allowing information to flow in both directions. Learning relies only on synthetic 3D data. Once learned, the model can take a variable number of frames as input, and is able to reconstruct shapes even from a single image with an accuracy of 6mm. Results on 3 different datasets demonstrate the efficacy and accuracy of our approach. |
Tasks | |
Published | 2019-03-14 |
URL | http://arxiv.org/abs/1903.05885v2 |
PDF | http://arxiv.org/pdf/1903.05885v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-reconstruct-people-in-clothing |
Repo | https://github.com/thmoa/octopus |
Framework | tf |
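A hedged sketch of the fusion design choice: each frame is encoded to a pose-invariant latent code, the codes are fused across a variable number of frames (mean fusion here), and a decoder regresses canonical T-pose shape parameters. The sizes, the toy encoder, and the mean-fusion choice are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class ShapeFromFrames(nn.Module):
    def __init__(self, latent=64, n_shape_params=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                     nn.Linear(16, latent))
        self.decoder = nn.Linear(latent, n_shape_params)

    def forward(self, frames):           # frames: (n_frames, 3, H, W)
        codes = self.encoder(frames)     # one pose-invariant code per frame
        fused = codes.mean(dim=0)        # fuse a variable number of frames
        return self.decoder(fused)       # canonical-space shape parameters

model = ShapeFromFrames()
print(model(torch.randn(5, 3, 64, 64)).shape)  # works for any frame count
```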