Paper Group ANR 801
The Ant Swarm Neuro-Evolution Procedure for Optimizing Recurrent Networks
Title | The Ant Swarm Neuro-Evolution Procedure for Optimizing Recurrent Networks |
Authors | AbdElRahman A. ElSaid, Alexander G. Ororbia, Travis J. Desell |
Abstract | Hand-crafting effective and efficient structures for recurrent neural networks (RNNs) is a difficult, expensive, and time-consuming process. To address this challenge, we propose a novel neuro-evolution algorithm based on ant colony optimization (ACO), called ant swarm neuro-evolution (ASNE), for directly optimizing RNN topologies. The procedure selects from multiple modern recurrent cell types such as Delta-RNN, GRU, LSTM, MGU and UGRNN cells, as well as recurrent connections which may span multiple layers and/or steps of time. In order to introduce an inductive bias that encourages the formation of sparser synaptic connectivity patterns, we investigate several variations of the core algorithm. We do so primarily by formulating different functions that drive the underlying pheromone simulation process (which mimic L1 and L2 regularization in standard machine learning) as well as by introducing ant agents with specialized roles (inspired by how real ant colonies operate), i.e., explorer ants that construct the initial feed forward structure and social ants which select nodes from the feed forward connections to subsequently craft recurrent memory structures. We also incorporate a Lamarckian strategy for weight initialization which reduces the number of backpropagation epochs required to locally train candidate RNNs, speeding up the neuro-evolution process. Our results demonstrate that the sparser RNNs evolved by ASNE significantly outperform traditional one and two layer architectures consisting of modern memory cells, as well as the well-known NEAT algorithm. Furthermore, we improve upon prior state-of-the-art results on the time series dataset utilized in our experiments. |
Tasks | L2 Regularization, Time Series |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.11849v2 |
https://arxiv.org/pdf/1909.11849v2.pdf | |
PWC | https://paperswithcode.com/paper/the-ant-swarm-neuro-evolution-procedure-for |
Repo | |
Framework | |
Few-shot Adaptive Faster R-CNN
Title | Few-shot Adaptive Faster R-CNN |
Authors | Tao Wang, Xiaopeng Zhang, Li Yuan, Jiashi Feng |
Abstract | To mitigate the detection performance drop caused by domain shift, we aim to develop a novel few-shot adaptation approach that requires only a few target domain images with limited bounding box annotations. To this end, we first observe several significant challenges. First, the target domain data is highly insufficient, making most existing domain adaptation methods ineffective. Second, object detection involves simultaneous localization and classification, further complicating the model adaptation process. Third, the model suffers from over-adaptation (similar to overfitting when training with only a few data examples) and instability risk that may lead to degraded detection performance in the target domain. To address these challenges, we first introduce a pairing mechanism over source and target features to alleviate the issue of insufficient target domain samples. We then propose a bi-level module to adapt the source-trained detector to the target domain: 1) the split-pooling-based image-level adaptation module uniformly extracts and aligns paired local patch features over locations, with different scales and aspect ratios; 2) the instance-level adaptation module semantically aligns paired object features while avoiding inter-class confusion. Meanwhile, a source model feature regularization (SMFR) is applied to stabilize the adaptation process of the two modules. Combining these contributions gives a novel few-shot adaptive Faster R-CNN framework, termed FAFRCNN, which effectively adapts to the target domain with a few labeled samples. Experiments with multiple datasets show that our model achieves new state-of-the-art performance under both the few-shot domain adaptation (FDA) setting of interest and the unsupervised domain adaptation (UDA) setting. |
Tasks | Domain Adaptation, Object Detection, Unsupervised Domain Adaptation |
Published | 2019-03-22 |
URL | http://arxiv.org/abs/1903.09372v1 |
http://arxiv.org/pdf/1903.09372v1.pdf | |
PWC | https://paperswithcode.com/paper/few-shot-adaptive-faster-r-cnn |
Repo | |
Framework | |
Semi-Supervised Multitask Learning on Multispectral Satellite Images Using Wasserstein Generative Adversarial Networks (GANs) for Predicting Poverty
Title | Semi-Supervised Multitask Learning on Multispectral Satellite Images Using Wasserstein Generative Adversarial Networks (GANs) for Predicting Poverty |
Authors | Anthony Perez, Swetava Ganguli, Stefano Ermon, George Azzari, Marshall Burke, David Lobell |
Abstract | Obtaining reliable data describing local poverty metrics at a granularity that is informative to policy-makers requires expensive and logistically difficult surveys, particularly in the developing world. Not surprisingly, poverty-stricken regions are also the ones that have a high probability of being a war zone, have poor infrastructure, and sometimes have governments that do not cooperate with internationally funded development efforts. We train a CNN on free and publicly available daytime satellite images of the African continent from Landsat 7 to build a model for predicting local economic livelihoods. Only 5% of the satellite images can be associated with labels (which are obtained from DHS Surveys), and thus a semi-supervised approach using a GAN (similar to the approach of Salimans, et al. (2016)) is used, albeit with a more stable-to-train flavor of GAN called the Wasserstein GAN regularized with gradient penalty (Gulrajani, et al. (2017)). The method of multitask learning is employed to regularize the network and also create an end-to-end model for the prediction of multiple poverty metrics. |
Tasks | |
Published | 2019-02-13 |
URL | http://arxiv.org/abs/1902.11110v2 |
http://arxiv.org/pdf/1902.11110v2.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-multitask-learning-on |
Repo | |
Framework | |
Collapsed Amortized Variational Inference for Switching Nonlinear Dynamical Systems
Title | Collapsed Amortized Variational Inference for Switching Nonlinear Dynamical Systems |
Authors | Zhe Dong, Bryan A. Seybold, Kevin P. Murphy, Hung H. Bui |
Abstract | We propose an efficient inference method for switching nonlinear dynamical systems. The key idea is to learn an inference network which can be used as a proposal distribution for the continuous latent variables, while performing exact marginalization of the discrete latent variables. This allows us to use the reparameterization trick, and apply end-to-end training with stochastic gradient descent. We show that the proposed method can successfully segment time series data, including videos and 3D human pose, into meaningful "regimes" by using the piece-wise nonlinear dynamics. |
Tasks | Time Series |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09588v2 |
https://arxiv.org/pdf/1910.09588v2.pdf | |
PWC | https://paperswithcode.com/paper/collapsed-amortized-variational-inference-for |
Repo | |
Framework | |
Improving learnability of neural networks: adding supplementary axes to disentangle data representation
Title | Improving learnability of neural networks: adding supplementary axes to disentangle data representation |
Authors | Kim Bukweon, Lee Sung Min, Seo Jin Keun |
Abstract | Over-parameterized deep neural networks have proven to be able to learn an arbitrary dataset with 100% training accuracy. Because of the risk of overfitting and computational cost issues, we cannot afford to increase the number of network nodes if we want to achieve better training results for medical images. Previous deep learning research shows that the training ability of a neural network improves dramatically (for the same number of training epochs) when a few nodes with supplementary information are added to the network. These few informative nodes allow the network to learn features that are otherwise difficult to learn by generating a disentangled data representation. This paper analyzes how concatenation of additional information as supplementary axes affects the training of neural networks. This analysis was conducted for a simple multilayer perceptron (MLP) classification model with a rectified linear unit (ReLU) on two-dimensional training data. We compared the networks with and without concatenation of supplementary information to support our analysis. The model with concatenation showed more robust and accurate training results compared to the model without concatenation. We also confirmed that our findings are valid for deeper convolutional neural networks (CNN) using ultrasound images and for a conditional generative adversarial network (cGAN) using the MNIST data. |
Tasks | |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04205v1 |
http://arxiv.org/pdf/1902.04205v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-learnability-of-neural-networks |
Repo | |
Framework | |
Accurate Trajectory Prediction for Autonomous Vehicles
Title | Accurate Trajectory Prediction for Autonomous Vehicles |
Authors | Michael Diodato, Yu Li, Antonia Lovjer, Minsu Yeom, Albert Song, Yiyang Zeng, Abhay Khosla, Benedikt Schifferer, Manik Goyal, Iddo Drori |
Abstract | Predicting vehicle trajectories, angle and speed is important for safe and comfortable driving. We demonstrate the best predicted angle, speed, and best performance overall winning the top three places of the ICCV 2019 Learning to Drive challenge. Our key contributions are (i) a general neural network system architecture which embeds and fuses together multiple inputs by encoding, and decodes multiple outputs using neural networks, (ii) using pre-trained neural networks for augmenting the given input data with segmentation maps and semantic information, and (iii) leveraging the form and distribution of the expected output in the model. |
Tasks | Autonomous Vehicles, Trajectory Prediction |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.08568v1 |
https://arxiv.org/pdf/1911.08568v1.pdf | |
PWC | https://paperswithcode.com/paper/accurate-trajectory-prediction-for-autonomous |
Repo | |
Framework | |
Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks
Title | Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks |
Authors | David Stutz, Matthias Hein, Bernt Schiele |
Abstract | Adversarial training yields robust models against a specific threat model, e.g., $L_\infty$ adversarial examples. Typically robustness does not generalize to previously unseen threat models, e.g., other $L_p$ norms, or larger perturbations. Our confidence-calibrated adversarial training (CCAT) tackles this problem by biasing the model towards low confidence predictions on adversarial examples. By allowing to reject examples with low confidence, robustness generalizes beyond the threat model employed during training. CCAT, trained only on $L_\infty$ adversarial examples, increases robustness against larger $L_\infty$, $L_2$, $L_1$ and $L_0$ attacks, adversarial frames, distal adversarial examples and corrupted examples and yields better clean accuracy compared to adversarial training. For thorough evaluation we developed novel white- and black-box attacks directly attacking CCAT by maximizing confidence. For each threat model, we use $7$ attacks with up to $50$ restarts and $5000$ iterations and report worst-case robust test error, extended to our confidence-thresholded setting, across all attacks. |
Tasks | |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06259v3 |
https://arxiv.org/pdf/1910.06259v3.pdf | |
PWC | https://paperswithcode.com/paper/confidence-calibrated-adversarial-training |
Repo | |
Framework | |
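The CCAT abstract above evaluates models in a confidence-thresholded setting, where low-confidence predictions are rejected. The following is a minimal sketch of that idea only, assuming that confidence is the maximum predicted class probability and that error is measured over the predictions that survive the threshold. The function name, the threshold name `tau`, and the exact error definition are illustrative assumptions, not the paper's metric.

```python
def confidence_thresholded_error(probs, labels, tau):
    """Return (error rate among kept predictions, number kept).

    probs  -- list of per-class probability lists, one per example
    labels -- list of integer class labels
    tau    -- hypothetical confidence threshold; predictions whose
              maximum probability is below tau are rejected
    """
    kept = 0
    wrong = 0
    for p, y in zip(probs, labels):
        conf = max(p)
        if conf < tau:
            continue  # rejected: contributes neither to kept nor to wrong
        kept += 1
        if p.index(conf) != y:
            wrong += 1
    error = wrong / kept if kept else 0.0
    return error, kept


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.4, 0.6]]
    labels = [0, 1, 1, 1]
    print(confidence_thresholded_error(probs, labels, tau=0.0))
    print(confidence_thresholded_error(probs, labels, tau=0.7))
```

Raising `tau` trades coverage for accuracy: fewer predictions are kept, and a model that assigns low confidence to adversarial inputs can reject them instead of misclassifying them.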
Dilated Convolution with Dilated GRU for Music Source Separation
Title | Dilated Convolution with Dilated GRU for Music Source Separation |
Authors | Jen-Yu Liu, Yi-Hsuan Yang |
Abstract | Stacked dilated convolutions used in WaveNet have been shown to be effective for generating high-quality audio. By replacing pooling/striding with dilation in convolution layers, they can preserve high-resolution information and still reach distant locations. Producing high-resolution predictions is also crucial in music source separation, whose goal is to separate different sound sources while maintaining the quality of the separated sounds. Therefore, this paper investigates using stacked dilated convolutions as the backbone for music source separation. However, while stacked dilated convolutions can reach wider context than standard convolutions, their effective receptive fields are still fixed and may not be wide enough for complex music audio signals. To reach information at remote locations, we propose to combine dilated convolution with a modified version of gated recurrent units (GRU) called the 'Dilated GRU' to form a block. A Dilated GRU unit receives information from k steps before instead of the previous step for a fixed k. This modification allows a GRU unit to reach a location with fewer recurrent steps and run faster because it can execute partially in parallel. We show that the proposed model with a stack of such blocks performs as well as or better than the state-of-the-art models for separating vocals and accompaniments. |
Tasks | Music Source Separation |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01203v1 |
https://arxiv.org/pdf/1906.01203v1.pdf | |
PWC | https://paperswithcode.com/paper/dilated-convolution-with-dilated-gru-for |
Repo | |
Framework | |
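The Dilated GRU recurrence described in the abstract above (the state at step t is computed from the state at step t - k rather than t - 1) can be sketched as follows. This is a minimal scalar illustration using the standard GRU gate equations. The weight names, the zero initial state, and the scalar inputs are assumptions made for the example and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_cell(x, h_prev, w):
    """One scalar GRU step; `w` is a dict of hypothetical weights."""
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev)                # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev)                # reset gate
    h_tilde = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde


def dilated_gru(xs, k, w):
    """Run a GRU whose recurrent input at step t is the state from step t - k.

    Steps with t < k start from a zero state (an assumption of this sketch).
    """
    hs = []
    for t, x in enumerate(xs):
        h_prev = hs[t - k] if t >= k else 0.0
        hs.append(gru_cell(x, h_prev, w))
    return hs


if __name__ == "__main__":
    weights = {"wz": 0.5, "uz": 0.3, "wr": 0.4, "ur": 0.2, "wh": 0.9, "uh": 0.7}
    print(dilated_gru([1.0, 2.0, 3.0, 4.0, 5.0], k=2, w=weights))
```

With dilation k, the sequence splits into k interleaved chains (steps 0, k, 2k, ... and so on) that do not depend on one another, which is why such a unit can run partially in parallel. Setting k = 1 recovers an ordinary GRU.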
What does it mean to understand a neural network?
Title | What does it mean to understand a neural network? |
Authors | Timothy P. Lillicrap, Konrad P. Kording |
Abstract | We can define a neural network that can learn to recognize objects in less than 100 lines of code. However, after training, it is characterized by millions of weights that contain the knowledge about many object types across visual scenes. Such networks are thus dramatically easier to understand in terms of the code that makes them than the resulting properties, such as tuning or connections. In analogy, we conjecture that rules for development and learning in brains may be far easier to understand than their resulting properties. The analogy suggests that neuroscience would benefit from a focus on learning and development. |
Tasks | |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06374v1 |
https://arxiv.org/pdf/1907.06374v1.pdf | |
PWC | https://paperswithcode.com/paper/what-does-it-mean-to-understand-a-neural |
Repo | |
Framework | |
Crypto Mining Makes Noise
Title | Crypto Mining Makes Noise |
Authors | Maurantonio Caprolu, Simone Raponi, Gabriele Oligeri, Roberto Di Pietro |
Abstract | A new cybersecurity attack (cryptojacking) is emerging, both in the literature and in the wild, where an adversary illicitly runs Crypto-client software on the devices of unaware users. This attack has proved to be very effective given the simplicity of running a Crypto-client on a target device, e.g., by means of web-based Java scripting. In this scenario, we propose Crypto-Aegis, a solution to detect and identify Crypto-client network traffic, even when it is VPN-ed. In detail, our contributions are the following: (i) We identify and model a new type of attack, i.e., the sponge-attack, being a generalization of cryptojacking; (ii) We provide a detailed analysis of real network traffic generated by 3 major cryptocurrencies; (iii) We investigate how VPN tunneling shapes the network traffic generated by Crypto-clients by considering two major VPN brands; (iv) We propose Crypto-Aegis, a Machine Learning (ML) based framework that builds on the previous steps to detect crypto-mining activities; and, finally, (v) We compare our results against competing solutions in the literature. Evidence from our experimental campaign shows the exceptional quality and viability of our solution: Crypto-Aegis achieves an F1-score of 0.96 and an AUC of 0.99. Given the extent and novelty of the addressed threat, we believe that our approach and our results, other than being interesting on their own, also pave the way for further research in this area. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09272v1 |
https://arxiv.org/pdf/1910.09272v1.pdf | |
PWC | https://paperswithcode.com/paper/crypto-mining-makes-noise |
Repo | |
Framework | |
Secure Distributed On-Device Learning Networks With Byzantine Adversaries
Title | Secure Distributed On-Device Learning Networks With Byzantine Adversaries |
Authors | Yanjie Dong, Julian Cheng, Md. Jahangir Hossain, Victor C. M. Leung |
Abstract | Privacy concerns arise when a central server holds copies of the datasets. Hence, there is a paradigm shift for learning networks from centralized in-cloud learning to distributed on-device learning. Benefiting from parallel computing, on-device learning networks have a lower bandwidth requirement than in-cloud learning networks. Moreover, on-device learning networks also have several desirable characteristics such as privacy preservation and flexibility. However, on-device learning networks are vulnerable to malfunctioning terminals across the networks. The worst-case malfunctioning terminals are the Byzantine adversaries, which can perform arbitrary harmful operations to compromise the learned model based on full knowledge of the networks. Hence, the design of secure learning algorithms has become an emerging topic in on-device learning networks with Byzantine adversaries. In this article, we present a comprehensive overview of the prevalent secure learning algorithms for the two promising on-device learning networks: Federated-Learning networks and decentralized-learning networks. We also review several future research directions in Federated-Learning and decentralized-learning networks. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00887v1 |
https://arxiv.org/pdf/1906.00887v1.pdf | |
PWC | https://paperswithcode.com/paper/190600887 |
Repo | |
Framework | |
Active Learning with TensorBoard Projector
Title | Active Learning with TensorBoard Projector |
Authors | Francois Luus, Naweed Khan, Ismail Akhalwaya |
Abstract | An ML-based system for interactive labeling of image datasets is contributed in TensorBoard Projector to speed up image annotation performed by humans. The tool visualizes feature spaces and makes them directly editable by online integration of applied labels, and it is a system for verifying and managing machine learning data pertaining to labels. We propose realistic annotation emulation to evaluate the system design of interactive active learning, based on our improved semi-supervised extension of t-SNE dimensionality reduction. Our active learning tool can significantly increase labeling efficiency compared to uncertainty sampling, and we show that fewer than 100 labeling actions are typically sufficient for good classification on a variety of specialized image datasets. Our contribution is unique given that it needs to perform dimensionality reduction, feature space visualization and editing, interactive label propagation, low-complexity active learning, human perceptual modeling, annotation emulation and unsupervised feature extraction for specialized datasets in a production-quality implementation. |
Tasks | Active Learning, Dimensionality Reduction |
Published | 2019-01-03 |
URL | http://arxiv.org/abs/1901.00675v1 |
http://arxiv.org/pdf/1901.00675v1.pdf | |
PWC | https://paperswithcode.com/paper/active-learning-with-tensorboard-projector |
Repo | |
Framework | |
TMAV: Temporal Motionless Analysis of Video using CNN in MPSoC
Title | TMAV: Temporal Motionless Analysis of Video using CNN in MPSoC |
Authors | Somdip Dey, Amit K. Singh, Dilip K. Prasad, Klaus D. McDonald-Maier |
Abstract | Analyzing video for traffic categorization is an important pillar of Intelligent Transport Systems. However, it is difficult to analyze and predict traffic based on image frames because the representation of each frame may vary significantly within a short time period. A single frame would also inaccurately represent the traffic over a longer period of time, as in the case of video. We propose a novel bio-inspired methodology that integrates analysis of the previous image frames of the video to represent the analysis of the current image frame, the same way a human being analyzes the current situation based on past experience. In our proposed methodology, called IRON-MAN (Integrated Rational prediction and Motionless ANalysis), we utilize Bayesian update on top of the individual image frame analysis in the videos, and this has resulted in highly accurate prediction of Temporal Motionless Analysis of the Videos (TMAV) for most of the chosen test cases. The proposed approach could be used for TMAV using a Convolutional Neural Network (CNN) for applications where the number of objects in an image is the deciding factor for prediction, and results also show that our proposed approach outperforms the state-of-the-art for the chosen test case. We also introduce a new metric named Energy Consumption per Training Image (ECTI). Since different CNN-based models have different training capability and computing resource utilization, some of the models are more suitable for embedded device implementation than others, and the ECTI metric is useful to assess the suitability of using a CNN model in multi-processor systems-on-chips (MPSoCs) with a focus on energy consumption and reliability in terms of lifespan of the embedded device using these MPSoCs. |
Tasks | |
Published | 2019-02-15 |
URL | http://arxiv.org/abs/1902.05657v2 |
http://arxiv.org/pdf/1902.05657v2.pdf | |
PWC | https://paperswithcode.com/paper/tmav-temporal-motionless-analysis-of-video |
Repo | |
Framework | |
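The TMAV abstract above applies a Bayesian update on top of per-frame analysis so that earlier frames inform the current prediction. The sketch below shows one generic way such a recursive update could look, assuming each frame yields per-class scores that are treated as likelihoods and that frames are conditionally independent given the class. The function names and these modeling assumptions are illustrative and are not taken from the paper.

```python
def bayes_update(prior, likelihood):
    """Posterior over classes, proportional to prior times likelihood."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)  # assumed non-zero in this sketch
    return [u / total for u in unnormalized]


def fuse_frames(frame_scores, prior):
    """Fold per-frame class scores into a single belief, frame by frame.

    frame_scores -- list of per-class score lists, one per video frame
    prior        -- initial belief over the same classes
    """
    belief = list(prior)
    for scores in frame_scores:
        belief = bayes_update(belief, scores)
    return belief


if __name__ == "__main__":
    # Two hypothetical classes, e.g. "heavy traffic" and "light traffic".
    frames = [[0.8, 0.2], [0.4, 0.6], [0.7, 0.3]]
    print(fuse_frames(frames, prior=[0.5, 0.5]))
```

A single ambiguous frame (the second one here) shifts the belief only partially, so the fused prediction is steadier than any per-frame prediction taken alone.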
Learning Curves for Deep Neural Networks: A Gaussian Field Theory Perspective
Title | Learning Curves for Deep Neural Networks: A Gaussian Field Theory Perspective |
Authors | Omry Cohen, Or Malka, Zohar Ringel |
Abstract | A series of recent works established a rigorous correspondence between very wide deep neural networks (DNNs), trained in a particular manner, and noiseless Bayesian Inference with a certain Gaussian Process (GP) known as the Neural Tangent Kernel (NTK). Here we extend a known field-theory formalism for GP inference to get a detailed understanding of learning curves in DNNs trained in the regime of this correspondence (NTK regime). In particular, a renormalization-group approach is used to show that noiseless GP inference using NTK, which lacks a good analytical handle, can be well approximated by noisy GP inference on a related kernel we call the renormalized NTK. Following this, a perturbation-theory analysis is carried out in one over the dataset size, yielding analytical expressions for the (fixed-teacher/fixed-target) leading and sub-leading asymptotics of the learning curves. At least for uniform datasets, a coherent picture emerges wherein fully-connected DNNs have a strong implicit bias towards functions which are low-order polynomials of the input. |
Tasks | Bayesian Inference, Gaussian Processes |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05301v2 |
https://arxiv.org/pdf/1906.05301v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-curves-for-deep-neural-networks-a |
Repo | |
Framework | |
Stochastic Fairness and Language-Theoretic Fairness in Planning on Nondeterministic Domains
Title | Stochastic Fairness and Language-Theoretic Fairness in Planning on Nondeterministic Domains |
Authors | Benjamin Aminof, Giuseppe De Giacomo, Sasha Rubin |
Abstract | We address two central notions of fairness in the literature of planning on nondeterministic fully observable domains. The first, which we call stochastic fairness, is classical, and assumes an environment which operates probabilistically using possibly unknown probabilities. The second, which is language-theoretic, assumes that if an action is taken from a given state infinitely often then all its possible outcomes should appear infinitely often (we call this state-action fairness). While the two notions coincide for standard reachability goals, they diverge for temporally extended goals. This important difference has been overlooked in the planning literature, and we argue has led to confusion in a number of published algorithms which use reductions that were stated for state-action fairness, for which they are incorrect, while being correct for stochastic fairness. We remedy this and provide an optimal sound and complete algorithm for solving state-action fair planning for LTL/LTLf goals, as well as a correct proof of the lower bound of the goal-complexity (our proof is general enough that it provides new proofs also for the no-fairness and stochastic-fairness cases). Overall, we show that stochastic fairness is better behaved than state-action fairness. |
Tasks | |
Published | 2019-12-24 |
URL | https://arxiv.org/abs/1912.11203v1 |
https://arxiv.org/pdf/1912.11203v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-fairness-and-language-theoretic |
Repo | |
Framework | |