Paper Group ANR 476
A Simple Differentiable Programming Language. MixUp as Directional Adversarial Training. TinBiNN: Tiny Binarized Neural Network Overlay in about 5,000 4-LUTs and 5mW. Removing Stripes, Scratches, and Curtaining with Non-Recoverable Compressed Sensing. Pruning from Scratch. Towards Automatic Embryo Staging in 3D+T Microscopy Images using Convolutional …
A Simple Differentiable Programming Language
Title | A Simple Differentiable Programming Language |
Authors | Martin Abadi, Gordon D. Plotkin |
Abstract | Automatic differentiation plays a prominent role in scientific computing and in modern machine learning, often in the context of powerful programming systems. The relation of the various embodiments of automatic differentiation to the mathematical notion of derivative is not always entirely clear—discrepancies can arise, sometimes inadvertently. In order to study automatic differentiation in such programming contexts, we define a small but expressive programming language that includes a construct for reverse-mode differentiation. We give operational and denotational semantics for this language. The operational semantics employs popular implementation techniques, while the denotational semantics employs notions of differentiation familiar from real analysis. We establish that these semantics coincide. |
Tasks | |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04523v4 |
PDF | https://arxiv.org/pdf/1911.04523v4.pdf |
PWC | https://paperswithcode.com/paper/a-simple-differentiable-programming-language |
Repo | |
Framework | |
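The paper's operational semantics formalizes the trace-based ("tape") technique common in reverse-mode AD implementations. As background, here is a minimal Python sketch of taped reverse mode, an illustration of the general technique rather than the paper's language or semantics: a global tape records operations in creation order, and a reverse sweep accumulates adjoints.

```python
TAPE = []  # records every Var in creation order (a topological order)

class Var:
    def __init__(self, value, parents=()):
        self.value = value      # primal value
        self.grad = 0.0         # adjoint, filled in by backward()
        self.parents = parents  # (parent, local_derivative) pairs
        TAPE.append(self)

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def backward(out):
    """Sweep the tape in reverse, accumulating adjoints into parents."""
    out.grad = 1.0
    for node in reversed(TAPE):
        for parent, local in node.parents:
            parent.grad += local * node.grad

x, y = Var(3.0), Var(2.0)
z = x * y + x            # z = x*y + x
backward(z)
print(x.grad, y.grad)    # 3.0 (= y + 1), 3.0 (= x)
```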
MixUp as Directional Adversarial Training
Title | MixUp as Directional Adversarial Training |
Authors | Guillaume P. Archambault, Yongyi Mao, Hongyu Guo, Richong Zhang |
Abstract | In this work, we explain the working mechanism of MixUp in terms of adversarial training. We introduce a new class of adversarial training schemes, which we refer to as directional adversarial training, or DAT. In a nutshell, a DAT scheme perturbs a training example in the direction of another example but keeps its original label as the training target. We prove that MixUp is equivalent to a special subclass of DAT, in that it has the same expected loss function and corresponds to the same optimization problem asymptotically. This understanding not only serves to explain the effectiveness of MixUp, but also reveals a more general family of MixUp schemes, which we call Untied MixUp. We prove that the family of Untied MixUp schemes is equivalent to the entire class of DAT schemes. We establish empirically the existence of Untied MixUp schemes that improve upon MixUp. |
Tasks | |
Published | 2019-06-17 |
URL | https://arxiv.org/abs/1906.06875v1 |
PDF | https://arxiv.org/pdf/1906.06875v1.pdf |
PWC | https://paperswithcode.com/paper/mixup-as-directional-adversarial-training |
Repo | |
Framework | |
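The distinction the abstract draws is easy to state in code. A hedged sketch contrasting standard MixUp with a DAT-style perturbation; the Beta(alpha, alpha) mixing distribution is the conventional MixUp choice, not the paper's exact policy, and Untied MixUp (not shown) additionally decouples the input weighting from the loss weighting.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=1.0):
    """Standard MixUp: interpolate both the inputs and the labels."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def dat_perturb(x1, y1, x2, alpha=1.0):
    """DAT as described in the abstract: move x1 toward x2,
    but keep the original label y1 as the training target."""
    lam = rng.beta(alpha, alpha)
    x = x1 + (1 - lam) * (x2 - x1)   # = lam*x1 + (1-lam)*x2
    return x, y1                      # label is NOT mixed

# toy usage on one-hot labels
x1, x2 = rng.normal(size=8), rng.normal(size=8)
y1, y2 = np.eye(3)[0], np.eye(3)[1]
print(mixup(x1, y1, x2, y2)[1])      # soft mixed label
print(dat_perturb(x1, y1, x2)[1])    # hard original label
```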
TinBiNN: Tiny Binarized Neural Network Overlay in about 5,000 4-LUTs and 5mW
Title | TinBiNN: Tiny Binarized Neural Network Overlay in about 5,000 4-LUTs and 5mW |
Authors | Guy G. F. Lemieux, Joe Edwards, Joel Vandergriendt, Aaron Severance, Ryan De Iaco, Abdullah Raouf, Hussein Osman, Tom Watzka, Satwant Singh |
Abstract | Reduced-precision arithmetic improves the size, cost, power and performance of neural networks in digital logic. In convolutional neural networks, the use of 1b weights can achieve state-of-the-art error rates while eliminating multiplication, reducing storage and improving power efficiency. The BinaryConnect binary-weighted system, for example, achieves 9.9% error using floating-point activations on the CIFAR-10 dataset. In this paper, we introduce TinBiNN, a lightweight vector processor overlay for accelerating inference computations with 1b weights and 8b activations. The overlay is very small – it uses about 5,000 4-input LUTs and fits into a low cost iCE40 UltraPlus FPGA from Lattice Semiconductor. To show this can be useful, we build two embedded ‘person detector’ systems by shrinking the original BinaryConnect network. The first is a 10-category classifier with an 89% smaller network that runs in 1,315ms and achieves 13.6% error. The other is a 1-category classifier that is even smaller, runs in 195ms, and has only 0.4% error. In both classifiers, the error can be attributed entirely to training and not reduced precision. |
Tasks | |
Published | 2019-03-05 |
URL | http://arxiv.org/abs/1903.06630v1 |
PDF | http://arxiv.org/pdf/1903.06630v1.pdf |
PWC | https://paperswithcode.com/paper/tinbinn-tiny-binarized-neural-network-overlay |
Repo | |
Framework | |
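The key arithmetic claim, that 1b weights eliminate multiplication, is visible in a toy sketch: with weights in {-1, +1}, a dot product reduces to signed accumulation. This illustrates the principle only, not TinBiNN's vector-processor implementation.

```python
import numpy as np

def binary_dot(w_sign, a):
    """Dot product with 1b weights in {-1, +1} and 8b activations:
    every 'multiply' becomes an add or a subtract."""
    acc = 0
    for w, x in zip(w_sign, a):
        acc += x if w > 0 else -x
    return acc

rng = np.random.default_rng(0)
w = rng.choice([-1, 1], size=16)       # 1b weights
a = rng.integers(0, 256, size=16)      # 8b activations
assert binary_dot(w, a) == int(np.dot(w, a))
print(binary_dot(w, a))
```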
Removing Stripes, Scratches, and Curtaining with Non-Recoverable Compressed Sensing
Title | Removing Stripes, Scratches, and Curtaining with Non-Recoverable Compressed Sensing |
Authors | Jonathan Schwartz, Yi Jiang, Yongjie Wang, Anthony Aiello, Pallab Bhattacharya, Hui Yuan, Zetian Mi, Nabil Bassim, Robert Hovden |
Abstract | Highly-directional image artifacts such as ion mill curtaining, mechanical scratches, or image striping from beam instability degrade the interpretability of micrographs. These unwanted, aperiodic features extend the image along a primary direction and occupy a small wedge of information in Fourier space. Deleting this wedge of data replaces stripes, scratches, or curtaining with more complex streaking and blurring artifacts, known within the tomography community as missing wedge artifacts. Here, we overcome this problem by recovering the missing region using total variation minimization, which leverages image-sparsity-based reconstruction techniques, colloquially referred to as compressed sensing, to reliably restore images corrupted by stripe-like features. Our approach removes beam instability, ion mill curtaining, mechanical scratches, or any stripe features and remains robust at low signal-to-noise ratios. The success of this approach is achieved by exploiting compressed sensing's inability to recover directional structures that are highly localized and missing in Fourier space. |
Tasks | |
Published | 2019-01-23 |
URL | http://arxiv.org/abs/1901.08001v1 |
PDF | http://arxiv.org/pdf/1901.08001v1.pdf |
PWC | https://paperswithcode.com/paper/removing-stripes-scratches-and-curtaining |
Repo | |
Framework | |
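A hedged sketch of the pipeline the abstract describes: mask a narrow wedge of Fourier space along the stripe direction, then restore the missing region by total variation minimization while holding the measured Fourier coefficients fixed. The wedge angle, step size, and iteration count are illustrative choices, not the paper's.

```python
import numpy as np

def wedge_mask(shape, half_angle_deg=5.0):
    """Boolean mask deleting a narrow wedge of Fourier space around the
    vertical frequency axis, where horizontal stripes concentrate."""
    h, w = shape
    ky = np.abs(np.fft.fftfreq(h))[:, None]
    kx = np.abs(np.fft.fftfreq(w))[None, :]
    angle = np.degrees(np.arctan2(kx, ky + 1e-12))
    keep = angle > half_angle_deg
    keep[0, 0] = True                    # always keep the DC term
    return keep

def tv_grad(u, eps=1e-3):
    """Gradient of the smoothed total variation of u (periodic BCs)."""
    ux = np.roll(u, -1, axis=1) - u
    uy = np.roll(u, -1, axis=0) - u
    mag = np.sqrt(ux**2 + uy**2 + eps)
    px, py = ux / mag, uy / mag
    return (np.roll(px, 1, axis=1) - px) + (np.roll(py, 1, axis=0) - py)

def destripe(img, iters=200, step=0.2):
    """Delete the stripe wedge, then inpaint it by TV minimization,
    re-imposing the measured Fourier data after every step."""
    mask = wedge_mask(img.shape)
    F = np.fft.fft2(img) * mask
    u = np.fft.ifft2(F).real
    for _ in range(iters):
        u = u - step * tv_grad(u)
        U = np.fft.fft2(u)
        U[mask] = F[mask]                # known coefficients stay fixed
        u = np.fft.ifft2(U).real
    return u

img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0                                  # object
img += 0.3 * np.sin(np.arange(64) * 2.0)[:, None]        # horizontal stripes
clean = destripe(img)
```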
Pruning from Scratch
Title | Pruning from Scratch |
Authors | Yulong Wang, Xiaolu Zhang, Lingxi Xie, Jun Zhou, Hang Su, Bo Zhang, Xiaolin Hu |
Abstract | Network pruning is an important research field aiming at reducing the computational costs of neural networks. Conventional approaches follow a fixed paradigm which first trains a large and redundant network and then determines which units (e.g., channels) are less important and thus can be removed. In this work, we find that pre-training an over-parameterized model is not necessary for obtaining the target pruned structure. In fact, a fully-trained over-parameterized model will reduce the search space for the pruned structure. We empirically show that more diverse pruned structures can be directly pruned from randomly initialized weights, including potential models with better performance. Therefore, we propose a novel network pruning pipeline which allows pruning from scratch. In experiments compressing classification models on the CIFAR-10 and ImageNet datasets, our approach not only greatly reduces the pre-training burden of traditional pruning methods but also achieves similar or even higher accuracy under the same computation budgets. Our results encourage the community to rethink the effectiveness of existing techniques used for network pruning. |
Tasks | Network Pruning |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1909.12579v1 |
PDF | https://arxiv.org/pdf/1909.12579v1.pdf |
PWC | https://paperswithcode.com/paper/pruning-from-scratch |
Repo | |
Framework | |
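A minimal PyTorch sketch of the pruning-from-scratch idea: freeze randomly initialized weights, learn per-channel gates under an L1 sparsity penalty, and read the pruned structure off the gates. The gate formulation, threshold, and penalty weight are assumptions for illustration; the paper's pipeline and selection criteria are its own.

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    """Conv layer with a learnable per-channel gate. The weights stay at
    their random initialization; only the gates are optimized."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.conv.weight.requires_grad_(False)   # frozen random weights
        self.conv.bias.requires_grad_(False)
        self.gate = nn.Parameter(torch.ones(c_out))

    def forward(self, x):
        return self.conv(x) * self.gate.view(1, -1, 1, 1)

net = nn.Sequential(GatedConv(3, 32), nn.ReLU(), GatedConv(32, 64))
opt = torch.optim.SGD([p for p in net.parameters() if p.requires_grad], lr=0.1)

x = torch.randn(8, 3, 16, 16)
task_loss = net(x).pow(2).mean()               # placeholder task loss
gates = [m.gate for m in net.modules() if isinstance(m, GatedConv)]
l1 = sum(g.abs().sum() for g in gates)         # sparsity pressure on gates
(task_loss + 1e-3 * l1).backward()
opt.step()

# After gate training, channels with large |gate| define the pruned
# structure, which is then trained from scratch.
keep = (gates[0].detach().abs() > 0.5).nonzero().squeeze(1)
```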
Towards Automatic Embryo Staging in 3D+T Microscopy Images using Convolutional Neural Networks and PointNets
Title | Towards Automatic Embryo Staging in 3D+T Microscopy Images using Convolutional Neural Networks and PointNets |
Authors | Manuel Traub, Johannes Stegmaier |
Abstract | Automatic analyses and comparisons of different stages of embryonic development largely depend on a highly accurate spatio-temporal alignment of the investigated data sets. In this contribution, we compare multiple approaches to perform automatic staging of developing embryos that were imaged with time-resolved 3D light-sheet microscopy. The methods comprise image-based convolutional neural networks as well as an approach based on the PointNet architecture that directly operates on 3D point clouds of detected cell nuclei centroids. The proof-of-concept experiments with four wild-type zebrafish embryos render both approaches suitable for automatic staging with average deviations of 0.45 - 0.57 hours. |
Tasks | |
Published | 2019-10-01 |
URL | https://arxiv.org/abs/1910.00443v1 |
PDF | https://arxiv.org/pdf/1910.00443v1.pdf |
PWC | https://paperswithcode.com/paper/towards-automatic-embryo-staging-in-3dt |
Repo | |
Framework | |
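Of the two approaches compared, the point-cloud one is the less conventional; a minimal sketch shows its core property, a shared per-point MLP followed by a permutation-invariant max-pool over nuclei centroids. Layer sizes and the regression head are illustrative choices, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Minimal PointNet-style regressor: a per-point MLP with shared
    weights, then a symmetric max-pool, mapping centroid clouds to a
    developmental stage. A sketch only; training details differ."""
    def __init__(self):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Linear(128, 1)           # regress stage (hours)

    def forward(self, pts):                     # pts: (batch, n_points, 3)
        feats = self.point_mlp(pts)             # shared across points
        global_feat = feats.max(dim=1).values   # permutation-invariant
        return self.head(global_feat)

model = TinyPointNet()
centroids = torch.randn(2, 500, 3)              # detected nuclei positions
print(model(centroids).shape)                   # torch.Size([2, 1])
```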
Estimating Fingertip Forces, Torques, and Local Curvatures from Fingernail Images
Title | Estimating Fingertip Forces, Torques, and Local Curvatures from Fingernail Images |
Authors | Nutan Chen, Göran Westling, Benoni B. Edin, Patrick van der Smagt |
Abstract | The study of dexterous manipulation has provided important insights into human sensorimotor control as well as inspiration for manipulation strategies in robotic hands. Previous work focused on restricted experimental environments. Here we describe a method that uses the deformation and color distribution of the fingernail and its surrounding skin to estimate the fingertip forces, torques and contact surface curvatures for various objects, including the shape and material of the contact surfaces and the weight of the objects. The proposed method circumvents limitations associated with sensorized objects, gloves or fixed contact surface types. In addition, compared with previous single-finger estimation in an experimental environment, we extend the approach to multi-finger force estimation, which can be used for applications such as human grasping analysis. Four algorithms are used, namely Gaussian processes (GP), Convolutional Neural Networks (CNN), Neural Networks with Fast Dropout (NN-FD) and Recurrent Neural Networks with Fast Dropout (RNN-FD), to model a mapping from images to the corresponding labels. The results further show that the proposed method predicts force, torque and contact surface curvature with high accuracy. |
Tasks | |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.05659v1 |
PDF | https://arxiv.org/pdf/1909.05659v1.pdf |
PWC | https://paperswithcode.com/paper/estimating-fingertip-forces-torques-and-local |
Repo | |
Framework | |
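The GP variant of the image-to-force mapping is straightforward to set up; the sketch below uses scikit-learn with synthetic stand-in features and labels, since the paper's fingernail features, force labels, and GP configuration are not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical stand-in data: feature vectors extracted from fingernail
# images, mapped to a fingertip force magnitude.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                       # per-image features
force = X @ rng.normal(size=16) + 0.05 * rng.normal(size=200)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=4.0), alpha=1e-2)
gp.fit(X[:150], force[:150])
pred, std = gp.predict(X[150:], return_std=True)     # mean + uncertainty
print(pred.shape, std.shape)                         # (50,), (50,)
```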
Neural Network Pruning with Residual-Connections and Limited-Data
Title | Neural Network Pruning with Residual-Connections and Limited-Data |
Authors | Jian-Hao Luo, Jianxin Wu |
Abstract | Filter level pruning is an effective method to accelerate the inference speed of deep CNN models. Although numerous pruning algorithms have been proposed, there are still two open issues. The first problem is how to prune residual connections. Most previous filter level pruning algorithms only prune channels inside residual blocks, leaving the number of output channels unchanged. We show that pruning both channels inside and outside the residual connections is crucial to achieve better performance. The second issue is pruning with limited data. We observe an interesting phenomenon: directly pruning on a small dataset is usually worse than fine-tuning a small model which is pruned or trained from scratch on the large dataset. In this paper, we propose a novel method, namely Compression Using Residual-connections and Limited-data (CURL), to tackle these two challenges. Experiments on a large-scale dataset demonstrate the effectiveness of CURL. CURL significantly outperforms previous state-of-the-art methods on ImageNet. More importantly, when pruning on small datasets, CURL achieves comparable or much better performance than fine-tuning a pretrained small model. |
Tasks | Network Pruning |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08114v2 |
PDF | https://arxiv.org/pdf/1911.08114v2.pdf |
PWC | https://paperswithcode.com/paper/neural-network-pruning-with-residual |
Repo | |
Framework | |
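The residual-connection issue can be made concrete: pruning the output channels of a residual block only works if the identity path is sliced with the same kept-channel indices as the residual branch, otherwise the addition misaligns. A sketch with a hypothetical kept-channel set; CURL's actual channel-selection criterion is not shown.

```python
import torch
import torch.nn as nn

class PrunedResidualBlock(nn.Module):
    """Residual block pruned both inside (mid channels) and outside
    (output channels). The skip path is indexed with the same kept
    channels as the residual branch so the addition stays aligned."""
    def __init__(self, c_in, c_mid_kept, keep_out):
        super().__init__()
        self.keep_out = keep_out                         # kept output channels
        self.conv1 = nn.Conv2d(c_in, c_mid_kept, 3, padding=1)   # inside pruning
        self.conv2 = nn.Conv2d(c_mid_kept, len(keep_out), 3, padding=1)

    def forward(self, x):
        shortcut = x[:, self.keep_out]                   # prune identity path too
        return shortcut + self.conv2(torch.relu(self.conv1(x)))

keep = torch.tensor([0, 2, 5, 7])                        # hypothetical kept set
block = PrunedResidualBlock(c_in=8, c_mid_kept=4, keep_out=keep)
print(block(torch.randn(1, 8, 16, 16)).shape)            # (1, 4, 16, 16)
```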
Adaptive Gradient-Based Meta-Learning Methods
Title | Adaptive Gradient-Based Meta-Learning Methods |
Authors | Mikhail Khodak, Maria-Florina Balcan, Ameet Talwalkar |
Abstract | We build a theoretical framework for designing and understanding practical meta-learning methods that integrates sophisticated formalizations of task-similarity with the extensive literature on online convex optimization and sequential prediction algorithms. Our approach enables the task-similarity to be learned adaptively, provides sharper transfer-risk bounds in the setting of statistical learning-to-learn, and leads to straightforward derivations of average-case regret bounds for efficient algorithms in settings where the task-environment changes dynamically or the tasks share a certain geometric structure. We use our theory to modify several popular meta-learning algorithms and improve their meta-test-time performance on standard problems in few-shot learning and federated learning. |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02717v3 |
PDF | https://arxiv.org/pdf/1906.02717v3.pdf |
PWC | https://paperswithcode.com/paper/adaptive-gradient-based-meta-learning-methods |
Repo | |
Framework | |
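As a toy illustration of the adaptivity the abstract claims (not the paper's actual updates, which are derived from online convex optimization), the sketch below learns an initialization plus per-coordinate step sizes from how far each coordinate travels within tasks; coordinates on which tasks differ more end up with larger steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch: meta-learn an initialization and adaptive per-coordinate
# step sizes across tasks. All update rules are illustrative stand-ins.
dim, n_tasks, inner_steps = 5, 200, 20
init = np.zeros(dim)                       # meta-learned initialization
disp = np.full(dim, 1e-8)                  # running squared travel per coord

for t in range(1, n_tasks + 1):
    # tasks differ strongly in the first three coords, barely in the rest
    target = rng.normal(scale=[1.0, 1.0, 1.0, 0.01, 0.01])
    eta = np.maximum(np.sqrt(disp / t), 0.05)   # adaptive per-coord step
    w = init.copy()
    for _ in range(inner_steps):
        w -= eta * (w - target)            # grad of 0.5*||w - target||^2
    disp += (w - init) ** 2
    init += (w - init) / t                 # running-mean meta-update

print(np.round(np.sqrt(disp / n_tasks), 2))   # larger steps where tasks vary
```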
UDFNet: Unsupervised Disparity Fusion with Adversarial Networks
Title | UDFNet: Unsupervised Disparity Fusion with Adversarial Networks |
Authors | Can Pu, Robert B. Fisher |
Abstract | Existing disparity fusion methods based on deep learning achieve state-of-the-art performance, but they require ground truth disparity data to train. To the best of our knowledge, this is the first unsupervised disparity fusion method that does not use ground truth disparity data. In this paper, a mathematical model for disparity fusion is proposed to guide an adversarial network to train effectively without ground truth disparity data. The initial disparity maps from the left view, together with auxiliary information (gradient, left and right intensity images), are input into the refiner, which is trained to output the refined disparity map registered on the left view. The refined left disparity map and left intensity image are used to reconstruct a fake right intensity image. Finally, the fake and real right intensity images (from the right stereo vision camera) are fed into the discriminator. In the model, the refiner is trained to output a refined disparity value close to the weighted sum of the disparity inputs for global initialisation. Then, three refinement principles are adopted to refine the results further. (1) The reconstructed intensity error between the fake and real right intensity image is minimised. (2) The similarities between the fake and real right image in different receptive fields are maximised. (3) The refined disparity map is smoothed based on the corresponding intensity image. The adversarial network architecture is effective for the fusion task, and the fusion time using the proposed network is small. The network achieves 90 fps using an Nvidia GeForce GTX 1080Ti on the KITTI 2015 dataset when the input resolution is 1242 × 375 (width × height) without downsampling and cropping. The accuracy of this work is equal to (or better than) the state-of-the-art supervised methods. |
Tasks | |
Published | 2019-04-22 |
URL | http://arxiv.org/abs/1904.10044v1 |
PDF | http://arxiv.org/pdf/1904.10044v1.pdf |
PWC | https://paperswithcode.com/paper/udfnet-unsupervised-disparity-fusion-with |
Repo | |
Framework | |
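The supervision signal without ground truth rests on view reconstruction: warp the left image by the refined disparity to synthesize a fake right image, then compare it photometrically with the real one. A nearest-neighbor forward-warping sketch (practical systems use differentiable bilinear sampling instead):

```python
import numpy as np

def warp_to_right(left, disparity):
    """Synthesize a fake right view by shifting each left-image pixel by
    its left-registered disparity. Collisions at image borders are
    resolved naively; this is a sketch, not a production warper."""
    h, w = left.shape
    right = np.zeros_like(left)
    xs = np.arange(w)
    for y in range(h):
        x_r = np.clip(np.round(xs - disparity[y]).astype(int), 0, w - 1)
        right[y, x_r] = left[y, xs]
    return right

left = np.tile(np.arange(8.0), (4, 1))
disp = np.full((4, 8), 2.0)                  # uniform 2-pixel disparity
fake_right = warp_to_right(left, disp)
# In training, a photometric loss |fake_right - real_right| supervises
# the refiner, so no ground truth disparity is needed.
print(fake_right)                            # columns shifted left by 2
```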
A Unified Framework for Tuning Hyperparameters in Clustering Problems
Title | A Unified Framework for Tuning Hyperparameters in Clustering Problems |
Authors | Xinjie Fan, Yuguang Yue, Purnamrita Sarkar, Y. X. Rachel Wang |
Abstract | Selecting hyperparameters for unsupervised learning problems is challenging in general due to the lack of ground truth for validation. Despite the prevalence of this issue in statistics and machine learning, especially in clustering problems, there are not many methods for tuning these hyperparameters with theoretical guarantees. In this paper, we provide a framework with provable guarantees for selecting hyperparameters in a number of distinct models. We consider both the subgaussian mixture model and network models to serve as examples of i.i.d. and non-i.i.d. data. We demonstrate that the same framework can be used to choose the Lagrange multipliers of penalty terms in semi-definite programming (SDP) relaxations for community detection, and the bandwidth parameter for constructing kernel similarity matrices for spectral clustering. By incorporating a cross-validation procedure, we show the framework can also do consistent model selection for network models. Using a variety of simulated and real data examples, we show that our framework outperforms other widely used tuning procedures in a broad range of parameter settings. |
Tasks | Community Detection, Model Selection |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.08018v2 |
PDF | https://arxiv.org/pdf/1910.08018v2.pdf |
PWC | https://paperswithcode.com/paper/a-unified-framework-for-tuning |
Repo | |
Framework | |
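The overall tuning loop for the bandwidth example has the shape below; note the selection criterion used here (silhouette score) is a common baseline placeholder, not the paper's provable goodness-of-fit statistic.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

# Grid search over the kernel bandwidth for spectral clustering on a
# toy two-cluster dataset; swap in the framework's criterion for the
# placeholder score to reproduce the paper's procedure.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])

best = None
for bw in [0.05, 0.1, 0.5, 1.0, 5.0]:
    labels = SpectralClustering(n_clusters=2, affinity="rbf",
                                gamma=1.0 / (2 * bw**2),
                                random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)      # placeholder criterion
    if best is None or score > best[0]:
        best = (score, bw)
print("selected bandwidth:", best[1])
```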
Fair Adversarial Gradient Tree Boosting
Title | Fair Adversarial Gradient Tree Boosting |
Authors | Vincent Grari, Boris Ruf, Sylvain Lamprier, Marcin Detyniecki |
Abstract | Fair classification has become an important topic in machine learning research. While most bias mitigation strategies focus on neural networks, we noticed a lack of work on fair classifiers based on decision trees even though they have proven very efficient. In an up-to-date comparison of state-of-the-art classification algorithms on tabular data, tree boosting outperforms deep learning. For this reason, we have developed a novel adversarial gradient tree boosting approach. The objective of the algorithm is to predict the output $Y$ with gradient tree boosting while minimizing the ability of an adversarial neural network to predict the sensitive attribute $S$. The approach incorporates at each iteration the gradient of the neural network directly into the gradient tree boosting. We empirically assess our approach on 4 popular data sets and compare it against state-of-the-art algorithms. The results show that our algorithm achieves higher accuracy while obtaining the same level of fairness, as measured using a set of common fairness definitions. |
Tasks | |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05369v2 |
PDF | https://arxiv.org/pdf/1911.05369v2.pdf |
PWC | https://paperswithcode.com/paper/fair-adversarial-gradient-tree-boosting |
Repo | |
Framework | |
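The core update is boosting on a combined gradient: each round's trees fit the task-loss gradient minus $\lambda$ times the gradient of an adversary that predicts $S$ from the boosted score. The sketch below replaces the paper's adversarial neural network with a logistic adversary for brevity; data, $\lambda$, and the learning rate are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n = 1000
S = rng.integers(0, 2, n)                       # sensitive attribute
X = rng.normal(size=(n, 5)) + 0.5 * S[:, None]  # features correlate with S
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(float)

F = np.zeros(n)                                 # boosted score
lam, lr = 1.0, 0.1
for _ in range(50):
    p = sigmoid(F)
    adv = LogisticRegression().fit(F.reshape(-1, 1), S)   # adversary on score
    p_s = adv.predict_proba(F.reshape(-1, 1))[:, 1]
    grad_task = p - y                           # d(logloss)/dF
    grad_adv = (p_s - S) * adv.coef_[0, 0]      # d(adv logloss)/dF
    residual = -(grad_task - lam * grad_adv)    # lower task loss, raise adv loss
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    F += lr * tree.predict(X)
print("accuracy:", ((sigmoid(F) > 0.5) == y).mean())
```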
Hard-Aware Fashion Attribute Classification
Title | Hard-Aware Fashion Attribute Classification |
Authors | Yun Ye, Yixin Li, Bo Wu, Wei Zhang, Lingyu Duan, Tao Mei |
Abstract | Fashion attribute classification is of great importance to many high-level tasks such as fashion item search, fashion trend analysis, fashion recommendation, etc. The task is challenging due to the extremely imbalanced data distribution, particularly for the attributes with only a few positive samples. In this paper, we introduce a hard-aware pipeline to make full use of “hard” samples/attributes. We first propose Hard-Aware BackPropagation (HABP) to efficiently and adaptively focus training on “hard” data. Then, for the identified hard labels, we propose to synthesize more complementary samples for training. To stabilize training, we extend the semi-supervised GAN by directly deactivating outputs for synthetic complementary samples (Deact). In general, our method is more effective in addressing “hard” cases: HABP weights “hard” samples more heavily, and for “hard” attributes with insufficient training data, Deact provides more stable synthetic samples and further improves performance. Our method is verified on a large-scale fashion dataset, outperforming other state-of-the-art methods without any additional supervision. |
Tasks | |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.10839v1 |
PDF | https://arxiv.org/pdf/1907.10839v1.pdf |
PWC | https://paperswithcode.com/paper/hard-aware-fashion-attribute-classification |
Repo | |
Framework | |
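A hedged sketch of the hard-aware weighting idea: scale each label's BCE term by its current difficulty, so harder samples/attributes dominate the gradient. The weighting function and `gamma` are illustrative assumptions; HABP's exact scheme is defined in the paper.

```python
import torch

def hard_aware_bce(logits, targets, gamma=2.0):
    """Weight per-sample, per-attribute BCE by current difficulty:
    larger loss implies a larger weight (hypothetical weighting)."""
    bce = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    weights = (bce.detach() / bce.detach().mean()).clamp(max=5.0) ** gamma
    return (weights * bce).mean()

logits = torch.randn(4, 10)                 # 10 binary fashion attributes
targets = torch.randint(0, 2, (4, 10)).float()
print(hard_aware_bce(logits, targets))
```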
Multi-Module System for Open Domain Chinese Question Answering over Knowledge Base
Title | Multi-Module System for Open Domain Chinese Question Answering over Knowledge Base |
Authors | Yiying Yang, Xiahui He, Kaijie Zhou, Zhongyu Wei |
Abstract | For the task of open domain Knowledge Based Question Answering in CCKS2019, we propose a method combining information retrieval and semantic parsing. This multi-module system extracts the topic entity and the most related relation predicate from a question and transforms the question into a SPARQL query statement. Our method obtained an F1 score of 70.45% on the test data. |
Tasks | Information Retrieval, Question Answering, Semantic Parsing |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12477v1 |
PDF | https://arxiv.org/pdf/1910.12477v1.pdf |
PWC | https://paperswithcode.com/paper/multi-module-system-for-open-domain-chinese |
Repo | |
Framework | |
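The pipeline reduces to two trained modules feeding a query template; a stub sketch (entity linking and relation ranking replaced by placeholders) shows the final SPARQL-generation step.

```python
# Sketch of the pipeline in the abstract: extract a topic entity and the
# best-matching relation predicate, then emit a SPARQL query. The two
# extraction callables are stubs; the system uses trained modules.
def build_sparql(question, entity_linker, relation_ranker):
    entity = entity_linker(question)               # e.g. "<姚明>"
    predicate = relation_ranker(question, entity)  # e.g. "<身高>"
    return f"SELECT ?x WHERE {{ {entity} {predicate} ?x . }}"

q = "姚明的身高是多少？"  # "What is Yao Ming's height?"
print(build_sparql(q, lambda q: "<姚明>", lambda q, e: "<身高>"))
# SELECT ?x WHERE { <姚明> <身高> ?x . }
```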
Privacy Preserving Stochastic Channel-Based Federated Learning with Neural Network Pruning
Title | Privacy Preserving Stochastic Channel-Based Federated Learning with Neural Network Pruning |
Authors | Rulin Shao, Hui Liu, Dianbo Liu |
Abstract | Artificial neural networks have achieved unprecedented success in a wide variety of domains such as classifying, predicting and recognizing objects. This success depends on the availability of big data, since the training process requires massive and representative data sets. However, data collection is often prevented by privacy concerns, and people want to take control over their sensitive information during both training and inference. To address this problem, we propose a privacy-preserving method for distributed systems, Stochastic Channel-Based Federated Learning (SCBF), which enables the participants to train a high-performance model cooperatively without sharing their inputs. We design, implement and evaluate a channel-based update algorithm for the central server in a distributed system, which selects the channels corresponding to the most active features in a training loop and uploads them as learned information from local datasets. A pruning process based on the validation set is applied to the algorithm and serves as a model accelerator. In our experiments, the model matches the performance of, and saturates faster than, the Federated Averaging method, which reveals all parameters of local models to the server when updating. We also demonstrate that the convergence rate can be increased by introducing the pruning process. |
Tasks | Network Pruning |
Published | 2019-10-04 |
URL | https://arxiv.org/abs/1910.02115v1 |
PDF | https://arxiv.org/pdf/1910.02115v1.pdf |
PWC | https://paperswithcode.com/paper/privacy-preserving-stochastic-channel-based |
Repo | |
Framework | |
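A sketch of the channel-selection step described in the abstract: each client uploads only the k channels whose features were most active in the last training loop, instead of the full model. The activity measure used here (mean absolute activation) is an assumption for illustration; the paper defines its own criterion.

```python
import numpy as np

def select_active_channels(update, activations, k):
    """Return only the k most active channels of a local weight update,
    keyed by channel index, for upload to the federated server."""
    activity = np.abs(activations).mean(axis=(0, 2, 3))   # per-channel score
    top = np.argsort(activity)[-k:]                        # most active k
    return {int(c): update[c] for c in top}

rng = np.random.default_rng(0)
delta = rng.normal(size=(32, 16, 3, 3))      # local conv-weight update
acts = rng.normal(size=(8, 32, 14, 14))      # activations for 32 channels
upload = select_active_channels(delta, acts, k=8)
print(sorted(upload))                        # the 8 channel indices sent
```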