Paper Group ANR 265
A Novel Aspect-Guided Deep Transition Model for Aspect Based Sentiment Analysis
Title | A Novel Aspect-Guided Deep Transition Model for Aspect Based Sentiment Analysis |
Authors | Yunlong Liang, Fandong Meng, Jinchao Zhang, Jinan Xu, Yufeng Chen, Jie Zhou |
Abstract | Aspect based sentiment analysis (ABSA) aims to identify the sentiment polarity towards a given aspect in a sentence, whereas previous models typically exploit an aspect-independent (weakly associative) encoder to generate the sentence representation. In this paper, we propose a novel Aspect-Guided Deep Transition model, named AGDT, which utilizes the given aspect to guide the sentence encoding from scratch with a specially designed deep transition architecture. Furthermore, an aspect-oriented objective forces AGDT to reconstruct the given aspect from the generated sentence representation. In doing so, AGDT can generate accurate aspect-specific sentence representations and thus make more accurate sentiment predictions. Experimental results on multiple SemEval datasets demonstrate the effectiveness of the proposed approach, which significantly outperforms the best reported results under the same setting. |
Tasks | Aspect-Based Sentiment Analysis, Sentiment Analysis |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.00324v1 |
https://arxiv.org/pdf/1909.00324v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-aspect-guided-deep-transition-model |
Repo | |
Framework | |
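A minimal sketch of the aspect-guided encoding idea described in the abstract, assuming hypothetical dimensions and module names; it uses a plain GRU with an aspect-conditioned gate plus an aspect-reconstruction head, not the paper's exact deep transition cells:

```python
import torch
import torch.nn as nn

class AspectGuidedEncoder(nn.Module):
    def __init__(self, vocab, emb=100, hid=128, n_aspects=8, n_labels=3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, emb)
        self.aspect_emb = nn.Embedding(n_aspects, hid)
        self.gru = nn.GRU(emb, hid, batch_first=True)   # stand-in for deep transition cells
        self.gate = nn.Linear(2 * hid, hid)             # aspect-conditioned gating
        self.cls = nn.Linear(hid, n_labels)             # sentiment head
        self.recon = nn.Linear(hid, n_aspects)          # aspect reconstruction head

    def forward(self, tokens, aspect):
        h, _ = self.gru(self.word_emb(tokens))               # (B, T, H)
        a = self.aspect_emb(aspect).unsqueeze(1).expand_as(h)
        g = torch.sigmoid(self.gate(torch.cat([h, a], -1)))  # gate guided by the aspect
        rep = (g * h).mean(dim=1)                            # aspect-specific sentence rep
        return self.cls(rep), self.recon(rep)

model = AspectGuidedEncoder(vocab=5000)
tokens = torch.randint(0, 5000, (4, 12))
aspect = torch.randint(0, 8, (4,))
logits, recon = model(tokens, aspect)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 3, (4,))) \
     + nn.functional.cross_entropy(recon, aspect)  # aspect-oriented auxiliary objective
loss.backward()
```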
HDDL – A Language to Describe Hierarchical Planning Problems
Title | HDDL – A Language to Describe Hierarchical Planning Problems |
Authors | D. Höller, G. Behnke, P. Bercher, S. Biundo, H. Fiorino, D. Pellier, R. Alford |
Abstract | Research in hierarchical planning has made considerable progress in the last few years. Many recent systems no longer rely on hand-tailored advice to find solutions, but are domain-independent systems that come with sophisticated solving techniques. In principle, this development makes the comparison between systems easier (because the domains are no longer tailored to a single system) and – much more importantly – also eases the integration into other systems, because the modeling process is less tedious (due to the lack of advice) and there is no (or less) commitment to the particular planning system the model is created for. However, these advantages are undermined by the lack of a common input language and feature set supported by the different systems. In this paper, we propose an extension of PDDL, the description language used in non-hierarchical planning, that addresses the needs of hierarchical planning systems. We restrict our language to a basic feature set shared by many recent systems, give an extension of PDDL’s EBNF syntax definition, and discuss our extensions with respect to several planner-specific input languages from related work. |
Tasks | |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05499v1 |
https://arxiv.org/pdf/1911.05499v1.pdf | |
PWC | https://paperswithcode.com/paper/hddl-a-language-to-describe-hierarchical |
Repo | |
Framework | |
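To make the hierarchical semantics concrete: HDDL extends PDDL with abstract tasks and decomposition methods (its `(:task ...)` and `(:method ...)` blocks). A toy Python sketch of that decomposition idea, with an entirely hypothetical "deliver" domain:

```python
# Hypothetical mini-HTN domain: an abstract task decomposes, via a method,
# into an ordered sequence of subtasks, bottoming out at primitive actions.
METHODS = {
    "deliver": [["pickup", "move", "drop"]],   # one decomposition method
}
PRIMITIVE = {"pickup", "move", "drop"}

def decompose(task):
    """Recursively expand a task into a primitive, totally ordered plan."""
    if task in PRIMITIVE:
        return [task]
    plan = []
    for sub in METHODS[task][0]:   # take the first (only) method here
        plan.extend(decompose(sub))
    return plan

print(decompose("deliver"))  # ['pickup', 'move', 'drop']
```

A real HDDL solver additionally handles preconditions, effects, and partially ordered task networks; this sketch only shows the decomposition skeleton the language describes.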
A Stochastic Extra-Step Quasi-Newton Method for Nonsmooth Nonconvex Optimization
Title | A Stochastic Extra-Step Quasi-Newton Method for Nonsmooth Nonconvex Optimization |
Authors | Minghan Yang, Andre Milzarek, Zaiwen Wen, Tong Zhang |
Abstract | In this paper, a novel stochastic extra-step quasi-Newton method is developed to solve a class of nonsmooth nonconvex composite optimization problems. We assume that the gradient of the smooth part of the objective function can only be approximated by stochastic oracles. The proposed method combines general stochastic higher order steps derived from an underlying proximal type fixed-point equation with additional stochastic proximal gradient steps to guarantee convergence. Based on suitable bounds on the step sizes, we establish global convergence to stationary points in expectation, and an extension of the approach using variance reduction techniques is discussed. Motivated by large-scale and big data applications, we investigate a stochastic coordinate-type quasi-Newton scheme that allows us to generate cheap and tractable stochastic higher order directions. Finally, the proposed algorithm is tested on large-scale logistic regression and deep learning problems and it is shown that it compares favorably with other state-of-the-art methods. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09373v1 |
https://arxiv.org/pdf/1910.09373v1.pdf | |
PWC | https://paperswithcode.com/paper/a-stochastic-extra-step-quasi-newton-method |
Repo | |
Framework | |
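A minimal numpy sketch of the two-step structure on a toy stochastic ℓ1-regularized least-squares problem. Heavy caveat: the curvature model is collapsed to the identity here, so the "higher-order" step degenerates into a plain proximal gradient step; the paper instead builds a quasi-Newton model around the proximal fixed-point residual. Names and step sizes are illustrative assumptions:

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def extra_step(x, grad_oracle, lam=0.1, mu=0.05, iters=200):
    for _ in range(iters):
        g = grad_oracle(x)
        z = prox_l1(x - mu * g, mu * lam)                # stand-in for the higher-order step
        x = prox_l1(z - mu * grad_oracle(z), mu * lam)   # extra proximal gradient step guards convergence
    return x

# toy problem: f(x) = E_i ||A_i x - b_i||^2 / 2 + lam * ||x||_1
A, b = np.random.randn(100, 10), np.random.randn(100)
def grad_oracle(x, batch=10):
    i = np.random.choice(100, batch)                     # stochastic gradient of the smooth part
    return A[i].T @ (A[i] @ x - b[i]) / batch

print(extra_step(np.zeros(10), grad_oracle))
```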
OrthographicNet: A Deep Learning Approach for 3D Object Recognition in Open-Ended Domains
Title | OrthographicNet: A Deep Learning Approach for 3D Object Recognition in Open-Ended Domains |
Authors | Hamidreza Kasaei |
Abstract | Service robots are expected to be more autonomous and to work efficiently in human-centric environments. For such robots, open-ended object recognition is a challenging task because it demands two essential capabilities: (i) accurate, real-time response, and (ii) the ability to learn new object categories from very few examples on-site. These capabilities are required because, no matter how extensive the training data used for batch learning, the robot may face an unknown object when operating in everyday environments. In this work, we present OrthographicNet, a deep transfer-learning-based approach for 3D object recognition in open-ended domains. In particular, OrthographicNet generates a rotation- and scale-invariant global feature for a given object, enabling it to recognize the same or similar objects seen from different perspectives. Experimental results show that our approach yields significant improvements over previous state-of-the-art approaches concerning scalability, memory usage, and object recognition performance. Regarding real-time performance, two real-world demonstrations validate the promising performance of the proposed architecture. Moreover, our approach demonstrates the capability of learning from very few training examples in a real-world setting. |
Tasks | 3D Object Recognition, Object Recognition, Transfer Learning |
Published | 2019-02-08 |
URL | http://arxiv.org/abs/1902.03057v1 |
http://arxiv.org/pdf/1902.03057v1.pdf | |
PWC | https://paperswithcode.com/paper/orthographicnet-a-deep-learning-approach-for |
Repo | |
Framework | |
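A small numpy sketch of the orthographic-projection idea implied by the name: render a normalized point cloud as three axis-aligned occupancy images, which a CNN backbone would then encode (resolution and occupancy encoding are illustrative assumptions, not the paper's exact rendering):

```python
import numpy as np

def orthographic_views(points, res=32):
    """Render a point cloud as three axis-aligned occupancy images."""
    p = points - points.min(0)
    p = p / p.max()                                  # scale into the unit cube
    idx = np.minimum((p * (res - 1)).astype(int), res - 1)
    views = np.zeros((3, res, res))
    for axis, (u, v) in enumerate([(1, 2), (0, 2), (0, 1)]):  # drop x, y, z in turn
        views[axis, idx[:, u], idx[:, v]] = 1.0
    return views

cloud = np.random.rand(500, 3)
print(orthographic_views(cloud).shape)  # (3, 32, 32)
```

In the paper's pipeline each view feeds a CNN, and the per-view features are pooled into a single rotation- and scale-invariant global descriptor.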
High-performance stock index trading: making effective use of a deep LSTM neural network
Title | High-performance stock index trading: making effective use of a deep LSTM neural network |
Authors | Chariton Chalvatzis, Dimitrios Hristu-Varsakelis |
Abstract | We present a deep long short-term memory (LSTM)-based neural network for predicting asset prices, together with a successful trading strategy for generating profits based on the model’s predictions. Our work is motivated by the fact that the effectiveness of any prediction model is inherently coupled to the trading strategy it is used with, and vice versa. This highlights the difficulty in developing models and strategies which are jointly optimal, but also points to avenues of investigation which are broader than prevailing approaches. Our LSTM model is structurally simple and generates predictions based on price observations over a modest number of past trading days. The model’s architecture is tuned to promote profitability, as opposed to accuracy, under a strategy that does not trade simply based on whether the price is predicted to rise or fall, but rather takes advantage of the distribution of predicted returns, and the fact that a prediction’s position within that distribution carries useful information about the expected profitability of a trade. The proposed model and trading strategy were tested on the S&P 500, Dow Jones Industrial Average (DJIA), NASDAQ and Russell 2000 stock indices, and achieved cumulative returns of 340%, 185%, 371% and 360%, respectively, over 2010-2018, far outperforming the benchmark buy-and-hold strategy as well as other recent efforts. |
Tasks | |
Published | 2019-02-07 |
URL | https://arxiv.org/abs/1902.03125v2 |
https://arxiv.org/pdf/1902.03125v2.pdf | |
PWC | https://paperswithcode.com/paper/high-performance-stock-index-trading-making |
Repo | |
Framework | |
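A minimal sketch of the distribution-aware trading rule described above: trade only when a prediction sits high within the distribution of recent predictions. The quantile threshold, window length, and long/cash positioning are illustrative assumptions, not the paper's tuned strategy:

```python
import numpy as np

def trade_signal(pred, recent_preds, q=0.7):
    """Go long only when today's predicted return sits high in the
    distribution of recent predictions."""
    return 1 if pred >= np.quantile(recent_preds, q) else 0   # 1 = long, 0 = cash

def backtest(prices, preds, window=250, q=0.7):
    """Cumulative return of the threshold strategy."""
    rets = np.diff(prices) / prices[:-1]
    equity = 1.0
    for t in range(window, len(rets)):
        pos = trade_signal(preds[t], preds[t - window:t], q)
        equity *= 1.0 + pos * rets[t]
    return equity - 1.0

prices = np.cumprod(1 + np.random.randn(1000) * 0.01) * 100
preds = np.random.randn(1000) * 0.01     # LSTM return predictions would go here
print(backtest(prices, preds))
```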
Instant Quantization of Neural Networks using Monte Carlo Methods
Title | Instant Quantization of Neural Networks using Monte Carlo Methods |
Authors | Gonçalo Mordido, Matthijs Van Keirsbilck, Alexander Keller |
Abstract | Low bit-width integer weights and activations are very important for efficient inference, especially with respect to lower power consumption. We propose Monte Carlo methods to quantize the weights and activations of pre-trained neural networks without any re-training. By performing importance sampling we obtain quantized low bit-width integer values from full-precision weights and activations. The precision, sparsity, and complexity are easily configurable by the amount of sampling performed. Our approach, called Monte Carlo Quantization (MCQ), is linear in both time and space, with the resulting quantized, sparse networks showing minimal accuracy loss when compared to the original full-precision networks. Our method either outperforms or achieves competitive results on multiple benchmarks compared to previous quantization methods that do require additional training. |
Tasks | Quantization |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12253v2 |
https://arxiv.org/pdf/1905.12253v2.pdf | |
PWC | https://paperswithcode.com/paper/instant-quantization-of-neural-networks-using |
Repo | |
Framework | |
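A compact numpy sketch of the importance-sampling idea MCQ describes: sample weight indices in proportion to their magnitudes, and let the signed visit counts become sparse integer weights. Scale renormalization and the activation case are omitted, and the sample budget is an illustrative parameter:

```python
import numpy as np

def mc_quantize(w, n_samples):
    """Monte Carlo Quantization sketch: importance-sample indices by |w|;
    the signed visit counts serve as low bit-width integer weights."""
    p = np.abs(w) / np.abs(w).sum()                  # importance-sampling distribution
    idx = np.random.choice(w.size, size=n_samples, p=p)
    counts = np.bincount(idx, minlength=w.size)
    return np.sign(w) * counts                       # sparse integer weights

w = np.random.randn(10)
print(mc_quantize(w, n_samples=30))                  # more samples -> higher precision
```

Precision and sparsity are indeed tuned by `n_samples` alone, which matches the abstract's claim that the method is configurable without any re-training.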
Defocused images removal of axial overlapping scattering particles by using three-dimensional nonlinear diffusion based on digital holography
Title | Defocused images removal of axial overlapping scattering particles by using three-dimensional nonlinear diffusion based on digital holography |
Authors | Wei-Na Li, Zhengyun Zhang, Jianshe Ma, Xiaohao Wang, Ping Su |
Abstract | We propose a three-dimensional nonlinear diffusion method that implements an autofocusing-like function for multiple micro-objects and simultaneously removes their defocused images, making it possible to distinguish the locations of certain-sized scattering particles that overlap along the z-axis. It is applied to all of the reconstruction slices generated from the captured hologram after each back-propagation. For sufficiently small particles, the maximum gradient magnitude across the reconstruction slices peaks at the ground-truth z position after applying the proposed scheme, provided the reconstruction range along the z-axis is sufficiently long and the reconstruction depth spacing is sufficiently fine. Therefore, the reconstructed image at the ground-truth z position is retained, while the defocused images are diffused away. The results demonstrate that the proposed scheme can diffuse away defocused images 20 um from the ground-truth z position, even when several scattering particles with different diameters completely overlap along the z-axis at a distance of 800 um and the hologram pixel pitch is 2 um. They also show that the sparsity distribution of the ground-truth z slice is not affected by the sparsity distribution of the corresponding defocused images when the particle diameter is at most 35 um and the reconstruction depth spacing is at least 20 um. |
Tasks | |
Published | 2019-04-24 |
URL | https://arxiv.org/abs/1904.10613v4 |
https://arxiv.org/pdf/1904.10613v4.pdf | |
PWC | https://paperswithcode.com/paper/unfocused-images-removal-of-z-axis |
Repo | |
Framework | |
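A small numpy sketch of the focus-selection criterion in the abstract: scan the reconstructed z-stack for the slice whose maximum gradient magnitude peaks. The 3-D nonlinear diffusion itself (a Perona–Malik-style smoothing across x, y, z) and the holographic back-propagation are omitted; the stack below is a random stand-in:

```python
import numpy as np

def grad_mag(img):
    gy, gx = np.gradient(img)
    return np.sqrt(gx ** 2 + gy ** 2)

def in_focus_slice(stack):
    """Index of the slice whose maximum gradient magnitude peaks across z —
    the selection criterion applied after diffusion removes defocused energy."""
    return int(np.argmax([grad_mag(s).max() for s in stack]))

stack = np.random.rand(50, 64, 64)   # reconstruction slices would come from the hologram
print(in_focus_slice(stack))
```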
HMM-guided frame querying for bandwidth-constrained video search
Title | HMM-guided frame querying for bandwidth-constrained video search |
Authors | Bhairav Chidambaram, Mason McGill, Pietro Perona |
Abstract | We design an agent to search for frames of interest in video stored on a remote server, under bandwidth constraints. Using a convolutional neural network to score individual frames and a hidden Markov model to propagate predictions across frames, our agent accurately identifies temporal regions of interest based on sparse, strategically sampled frames. On a subset of the ImageNet-VID dataset, we demonstrate that using a hidden Markov model to interpolate between frame scores allows requests of 98% of frames to be omitted, without compromising frame-of-interest classification accuracy. |
Tasks | |
Published | 2019-12-31 |
URL | https://arxiv.org/abs/2001.00057v1 |
https://arxiv.org/pdf/2001.00057v1.pdf | |
PWC | https://paperswithcode.com/paper/hmm-guided-frame-querying-for-bandwidth |
Repo | |
Framework | |
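A minimal numpy sketch of the interpolation step described above: run standard forward-backward over a two-state chain (frame of interest / not), with CNN likelihoods available only at the sparsely transmitted frames and uniform likelihoods elsewhere. The transition matrix and scores below are assumed placeholders:

```python
import numpy as np

def forward_backward(like, T, prior):
    """Standard forward-backward; rows of `like` are per-frame likelihoods,
    uniform for frames that were never transmitted."""
    n, k = like.shape
    alpha = np.zeros((n, k)); beta = np.ones((n, k))
    alpha[0] = prior * like[0]; alpha[0] /= alpha[0].sum()
    for t in range(1, n):
        alpha[t] = like[t] * (alpha[t - 1] @ T)
        alpha[t] /= alpha[t].sum()
    for t in range(n - 2, -1, -1):
        beta[t] = T @ (like[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post / post.sum(1, keepdims=True)

T = np.array([[0.95, 0.05], [0.10, 0.90]])     # assumed temporal dynamics
like = np.ones((50, 2))                        # most frames unobserved
like[0] = [0.9, 0.1]; like[25] = [0.2, 0.8]    # CNN scores at sampled frames
post = forward_backward(like, T, np.array([0.5, 0.5]))
print(post[:5])                                # interpolated interest posteriors
```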
Generative Exploration and Exploitation
Title | Generative Exploration and Exploitation |
Authors | Jiechuan Jiang, Zongqing Lu |
Abstract | Sparse reward is one of the biggest challenges in reinforcement learning (RL). In this paper, we propose a novel method called Generative Exploration and Exploitation (GENE) to overcome sparse reward. GENE automatically generates start states to encourage the agent to explore the environment and to exploit received reward signals. GENE can adaptively trade off exploration and exploitation according to the varying distributions of states experienced by the agent as learning progresses. GENE relies on no prior knowledge about the environment and can be combined with any RL algorithm, whether on-policy or off-policy, single-agent or multi-agent. Empirically, we demonstrate that GENE significantly outperforms existing methods in three tasks with only binary rewards, including Maze, Maze Ant, and Cooperative Navigation. Ablation studies verify the emergence of progressive exploration and automatic reversing. |
Tasks | |
Published | 2019-04-21 |
URL | https://arxiv.org/abs/1904.09605v2 |
https://arxiv.org/pdf/1904.09605v2.pdf | |
PWC | https://paperswithcode.com/paper/generative-exploration-and-exploitation |
Repo | |
Framework | |
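A toy numpy sketch of the start-state generation idea: favor start states drawn from sparsely visited regions of experienced states. A histogram density is used as a crude stand-in for the paper's generative model, and the 2-D state space is an illustrative assumption:

```python
import numpy as np

def sample_start_states(visited, n, bins=20):
    """Pick start states from sparsely visited regions of state space
    (histogram density stands in for a learned generative model)."""
    hist, edges = np.histogramdd(visited, bins=bins)
    idx = tuple(np.clip(np.digitize(visited[:, d], edges[d]) - 1, 0, bins - 1)
                for d in range(visited.shape[1]))
    w = 1.0 / (hist[idx] + 1e-6)       # inverse-density weight per visited state
    w /= w.sum()
    return visited[np.random.choice(len(visited), size=n, p=w)]

states = np.random.randn(1000, 2)      # states experienced during training
print(sample_start_states(states, 5))  # candidate start states for the next episodes
```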
Artificial Constraints and Lipschitz Hints for Unconstrained Online Learning
Title | Artificial Constraints and Lipschitz Hints for Unconstrained Online Learning |
Authors | Ashok Cutkosky |
Abstract | We provide algorithms that guarantee regret $R_T(u)\le \tilde O(G\|u\|^3 + G(\|u\|+1)\sqrt{T})$ or $R_T(u)\le \tilde O(G\|u\|^3T^{1/3} + GT^{1/3} + G\|u\|\sqrt{T})$ for online convex optimization with $G$-Lipschitz losses for any comparison point $u$ without prior knowledge of either $G$ or $\|u\|$. Previous algorithms dispense with the $O(\|u\|^3)$ term at the expense of knowledge of one or both of these parameters, while a lower bound shows that some additional penalty term over $G\|u\|\sqrt{T}$ is necessary. Previous penalties were exponential while our bounds are polynomial in all quantities. Further, given a known bound $\|u\|\le D$, our same techniques allow us to design algorithms that adapt optimally to the unknown value of $\|u\|$ without requiring knowledge of $G$. |
Tasks | |
Published | 2019-02-24 |
URL | http://arxiv.org/abs/1902.09013v1 |
http://arxiv.org/pdf/1902.09013v1.pdf | |
PWC | https://paperswithcode.com/paper/artificial-constraints-and-lipschitz-hints |
Repo | |
Framework | |
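For readers outside online learning, the regret quantity bounded above is the standard one, for losses $\ell_t$ and learner iterates $x_t$:

```latex
% Standard regret definition assumed in the abstract: the gap between the
% learner's cumulative loss and that of a fixed comparison point u.
R_T(u) \;=\; \sum_{t=1}^{T} \ell_t(x_t) \;-\; \sum_{t=1}^{T} \ell_t(u)
```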
Evaluating the Underlying Gender Bias in Contextualized Word Embeddings
Title | Evaluating the Underlying Gender Bias in Contextualized Word Embeddings |
Authors | Christine Basta, Marta R. Costa-jussà, Noe Casas |
Abstract | Gender bias strongly affects natural language processing applications. Word embeddings have been shown both to retain and to amplify gender biases present in current data sources. Recently, contextualized word embeddings have enhanced previous word embedding techniques by computing word vector representations that depend on the sentence they appear in. In this paper, we study the impact of this conceptual change in word embedding computation in relation to gender bias. Our analysis includes different measures previously applied in the literature to standard word embeddings. Our findings suggest that contextualized word embeddings are less biased than standard ones, even when the latter are debiased. |
Tasks | Word Embeddings |
Published | 2019-04-18 |
URL | http://arxiv.org/abs/1904.08783v1 |
http://arxiv.org/pdf/1904.08783v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-the-underlying-gender-bias-in |
Repo | |
Framework | |
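A minimal sketch of one bias measure of the kind applied in this literature (a Bolukbasi-style "direct bias": projection onto a gender direction). The word list and random vectors are placeholders, not the paper's data or exact measure:

```python
import numpy as np

def gender_direction(emb, pairs=(("he", "she"), ("man", "woman"))):
    """Average difference vector over definitional gender pairs."""
    d = np.mean([emb[a] - emb[b] for a, b in pairs], axis=0)
    return d / np.linalg.norm(d)

def bias_score(emb, word, direction):
    """|cosine| between a word vector and the gender direction."""
    v = emb[word] / np.linalg.norm(emb[word])
    return float(abs(v @ direction))

emb = {w: np.random.randn(50) for w in ["he", "she", "man", "woman", "nurse"]}
print(bias_score(emb, "nurse", gender_direction(emb)))
```

For contextualized embeddings the same score would be computed per occurrence, since each sentence yields a different vector for the same word.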
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
Title | Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization |
Authors | Xinyan Li, Qilong Gu, Yingxue Zhou, Tiancong Chen, Arindam Banerjee |
Abstract | While stochastic gradient descent (SGD) and variants have been surprisingly successful for training deep nets, several aspects of the optimization dynamics and generalization are still not well understood. In this paper, we present new empirical observations and theoretical results on both the optimization dynamics and generalization behavior of SGD for deep nets based on the Hessian of the training loss and associated quantities. We consider three specific research questions: (1) what is the relationship between the Hessian of the loss and the second moment of stochastic gradients (SGs)? (2) how can we characterize the stochastic optimization dynamics of SGD with fixed and adaptive step sizes and diagonal pre-conditioning based on the first and second moments of SGs? and (3) how can we characterize a scale-invariant generalization bound of deep nets based on the Hessian of the loss, which by itself is not scale invariant? We shed light on these three questions using theoretical results supported by extensive empirical observations, with experiments on synthetic data, MNIST, and CIFAR-10, with different batch sizes, and with different difficulty levels by synthetically adding random labels. |
Tasks | Stochastic Optimization |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10732v1 |
https://arxiv.org/pdf/1907.10732v1.pdf | |
PWC | https://paperswithcode.com/paper/hessian-based-analysis-of-sgd-for-deep-nets |
Repo | |
Framework | |
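Hessian-based analyses like this typically avoid forming the Hessian explicitly; a standard trick is a Hessian-vector product via double backpropagation, combined with power iteration to estimate the top eigenvalue. A self-contained PyTorch sketch on a toy loss (the loss and sizes are placeholders):

```python
import torch

def hvp(loss_fn, w, v):
    """Hessian-vector product via double backprop (no explicit Hessian)."""
    loss = loss_fn(w)
    (g,) = torch.autograd.grad(loss, w, create_graph=True)
    (h,) = torch.autograd.grad((g * v).sum(), w)
    return h

loss_fn = lambda w: (w ** 4).sum()        # any twice-differentiable loss works
w = torch.randn(5, requires_grad=True)
v = torch.randn(5); v = v / v.norm()
for _ in range(50):                        # power iteration on the Hessian
    hv = hvp(loss_fn, w, v)
    v = hv / hv.norm()
print(float(v @ hvp(loss_fn, w, v)))       # estimated largest eigenvalue
```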
The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
Title | The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study |
Authors | Daniel S. Park, Jascha Sohl-Dickstein, Quoc V. Le, Samuel L. Smith |
Abstract | We investigate how the final parameters found by stochastic gradient descent are influenced by over-parameterization. We generate families of models by increasing the number of channels in a base network, and then perform a large hyper-parameter search to study how the test error depends on learning rate, batch size, and network width. We find that the optimal SGD hyper-parameters are determined by a “normalized noise scale,” which is a function of the batch size, learning rate, and initialization conditions. In the absence of batch normalization, the optimal normalized noise scale is directly proportional to width. Wider networks, with their higher optimal noise scale, also achieve higher test accuracy. These observations hold for MLPs, ConvNets, and ResNets, and for two different parameterization schemes (“Standard” and “NTK”). We observe a similar trend with batch normalization for ResNets. Surprisingly, since the largest stable learning rate is bounded, the largest batch size consistent with the optimal normalized noise scale decreases as the width increases. |
Tasks | |
Published | 2019-05-09 |
URL | https://arxiv.org/abs/1905.03776v1 |
https://arxiv.org/pdf/1905.03776v1.pdf | |
PWC | https://paperswithcode.com/paper/190503776 |
Repo | |
Framework | |
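The "noise scale" the normalized variant builds on is, under the small-batch approximation of Smith & Le, roughly the learning rate times dataset size over batch size. A one-liner sketch, with the momentum correction included as an assumption:

```python
def noise_scale(lr, dataset_size, batch_size, momentum=0.0):
    """SGD noise scale g ~ eps * N / B, divided by (1 - m) with momentum
    (small-batch approximation; the paper further normalizes this
    quantity to compare across parameterizations)."""
    return lr * dataset_size / (batch_size * (1.0 - momentum))

print(noise_scale(lr=0.1, dataset_size=50_000, batch_size=128))
```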
Explaining and Interpreting LSTMs
Title | Explaining and Interpreting LSTMs |
Authors | Leila Arras, Jose A. Arjona-Medina, Michael Widrich, Grégoire Montavon, Michael Gillhofer, Klaus-Robert Müller, Sepp Hochreiter, Wojciech Samek |
Abstract | While neural networks have acted as a strong unifying force in the design of modern AI systems, the neural network architectures themselves remain highly heterogeneous due to the variety of tasks to be solved. In this chapter, we explore how to adapt the Layer-wise Relevance Propagation (LRP) technique used for explaining the predictions of feed-forward networks to the LSTM architecture used for sequential data modeling and forecasting. The special accumulators and gated interactions present in the LSTM require both a new propagation scheme and an extension of the underlying theoretical framework to deliver faithful explanations. |
Tasks | |
Published | 2019-09-25 |
URL | https://arxiv.org/abs/1909.12114v1 |
https://arxiv.org/pdf/1909.12114v1.pdf | |
PWC | https://paperswithcode.com/paper/explaining-and-interpreting-lstms |
Repo | |
Framework | |
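For context, the base LRP rule that the chapter extends: relevance flowing into a linear layer is redistributed over its inputs in proportion to their contributions. A numpy sketch of the epsilon-stabilized rule (the LSTM-specific treatment of gated interactions, e.g. routing relevance through the signal rather than the gate, is what the chapter adds on top):

```python
import numpy as np

def lrp_linear(a, W, b, R_out, eps=1e-6):
    """Epsilon-LRP through one linear layer: input j receives relevance
    proportional to its contribution a_j * W[j, i] to each output i."""
    z = a @ W + b                          # forward pre-activations
    s = R_out / (z + eps * np.sign(z))     # stabilized relevance ratio
    return a * (W @ s)                     # relevance assigned to the inputs

a = np.random.randn(4)
W = np.random.randn(4, 3)
R_in = lrp_linear(a, W, np.zeros(3), R_out=np.random.rand(3))
print(R_in.sum())                          # ~ conserves total relevance when b = 0
```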
Distributed Deep Learning for Precipitation Nowcasting
Title | Distributed Deep Learning for Precipitation Nowcasting |
Authors | Siddharth Samsi, Christopher J. Mattioli, Mark S. Veillette |
Abstract | Effective training of Deep Neural Networks requires massive amounts of data and compute. As a result, longer times are needed to train complex models requiring large datasets, which can severely limit research on model development and the exploitation of all available data. In this paper, this problem is investigated in the context of precipitation nowcasting, a term used to describe highly detailed short-term forecasts of precipitation and other hazardous weather. Convolutional Neural Networks (CNNs) are a powerful class of models that are well-suited for this task; however, the high resolution input weather imagery combined with model complexity required to process this data makes training CNNs to solve this task time consuming. To address this issue, a data-parallel model is implemented where a CNN is replicated across multiple compute nodes and the training batches are distributed across multiple nodes. By leveraging multiple GPUs, we show that the training time for a given nowcasting model architecture can be reduced from 59 hours to just over 1 hour. This will allow for faster iterations for improving CNN architectures and will facilitate future advancement in the area of nowcasting. |
Tasks | |
Published | 2019-08-28 |
URL | https://arxiv.org/abs/1908.10964v1 |
https://arxiv.org/pdf/1908.10964v1.pdf | |
PWC | https://paperswithcode.com/paper/distributed-deep-learning-for-precipitation |
Repo | |
Framework | |
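A minimal sketch of the data-parallel pattern described above, using PyTorch DistributedDataParallel; the tiny conv layer and synthetic batches stand in for the paper's nowcasting CNN and radar imagery. Launch with e.g. `torchrun --nproc_per_node=2 train.py` (a hypothetical script name):

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("gloo")                # "nccl" on GPU clusters
    model = DDP(nn.Conv2d(1, 8, 3, padding=1))     # replica of the model on each rank
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(10):                            # each rank trains on its own shard
        x = torch.randn(4, 1, 64, 64)              # stand-in weather imagery batch
        loss = model(x).pow(2).mean()              # placeholder loss
        opt.zero_grad()
        loss.backward()                            # DDP all-reduces gradients here
        opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```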