February 1, 2020

2898 words 14 mins read

Paper Group AWR 195

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

Title MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Authors Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger
Abstract A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output against references based on their semantics rather than surface forms. In this paper, we investigate strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality. We validate our new metric, namely MoverScore, on a number of text generation tasks including summarization, machine translation, image captioning, and data-to-text generation, where the outputs are produced by a variety of neural and non-neural systems. Our findings suggest that metrics combining contextualized representations with a distance measure perform the best. Such metrics also demonstrate strong generalization capability across tasks. For ease of use, we make our metrics available as a web service.
Tasks Data-to-Text Generation, Image Captioning, Machine Translation, Text Generation
Published 2019-09-05
URL https://arxiv.org/abs/1909.02622v2
PDF https://arxiv.org/pdf/1909.02622v2.pdf
PWC https://paperswithcode.com/paper/moverscore-text-generation-evaluating-with
Repo https://github.com/AIPHES/emnlp19-moverscore
Framework none
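
To make the idea concrete, here is a minimal, hedged sketch of scoring by matching contextualized token embeddings. It is not the authors' implementation (see the repo above): it substitutes a hard one-to-one assignment (Hungarian algorithm) for the full Earth Mover Distance and assumes the embeddings are precomputed and L2-normalized.

```python
# Hedged sketch of a MoverScore-style similarity: match system tokens to
# reference tokens so that total embedding distance is minimized.
# Hard 1:1 assignment is a stand-in for the full Earth Mover Distance;
# the released MoverScore also adds IDF weighting, omitted here.
import numpy as np
from scipy.optimize import linear_sum_assignment

def mover_sim(sys_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """sys_emb, ref_emb: (n_tokens, dim) L2-normalized embedding matrices."""
    # Cost = 1 - cosine similarity between every system/reference token pair.
    cost = 1.0 - sys_emb @ ref_emb.T
    rows, cols = linear_sum_assignment(cost)      # optimal hard alignment
    return 1.0 - cost[rows, cols].mean()          # higher = more similar

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 8)); a /= np.linalg.norm(a, axis=1, keepdims=True)
b = rng.normal(size=(6, 8)); b /= np.linalg.norm(b, axis=1, keepdims=True)
print(mover_sim(a, b))
```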

A Deep Learning Approach to Grasping the Invisible

Title A Deep Learning Approach to Grasping the Invisible
Authors Yang Yang, Hengyue Liang, Changhyun Choi
Abstract We study an emerging problem named “grasping the invisible” in robotic manipulation, in which a robot is tasked to grasp an initially invisible target object via a sequence of pushing and grasping actions. In this problem, pushes are needed to search for the target and rearrange cluttered objects around it to enable effective grasps. We propose to solve the problem by formulating a deep learning approach in a critic-policy format. The target-oriented motion critic, which maps both visual observations and target information to the expected future rewards of pushing and grasping motion primitives, is learned via deep Q-learning. We divide the problem into two subtasks, and two policies are proposed to tackle each of them by combining the critic predictions with relevant domain knowledge. A Bayesian-based policy accounting for past action experience performs pushing to search for the target; once the target is found, a classifier-based policy coordinates target-oriented pushing and grasping to grasp the target in clutter. The motion critic and the classifier are trained in a self-supervised manner through robot-environment interactions. Our system achieves 93% and 87% task success rates on the two subtasks in simulation and an 85% task success rate in real-robot experiments on the whole problem, outperforming several baselines by large margins. Supplementary material is available at https://sites.google.com/umn.edu/grasping-invisible.
Tasks Q-Learning
Published 2019-09-11
URL https://arxiv.org/abs/1909.04840v2
PDF https://arxiv.org/pdf/1909.04840v2.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-approach-to-grasping-the
Repo https://github.com/choicelab/grasping-invisible
Framework pytorch
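
For intuition, here is a hedged PyTorch sketch of a target-oriented motion critic of the kind described: a small fully-convolutional network mapping a visual observation plus a target mask to dense Q-value maps for the push and grasp primitives. Channel counts and layer sizes are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: a fully-convolutional critic producing pixel-wise Q-maps
# for two motion primitives (push, grasp), conditioned on a target mask.
# Layer sizes are illustrative assumptions, not the authors' network.
import torch
import torch.nn as nn

class MotionCritic(nn.Module):
    def __init__(self, in_ch: int = 4):  # e.g. an RGB-D observation
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.push_head = nn.Conv2d(32, 1, 1)   # Q-map for pushing
        self.grasp_head = nn.Conv2d(32, 1, 1)  # Q-map for grasping

    def forward(self, obs, target_mask):
        x = self.trunk(torch.cat([obs, target_mask], dim=1))
        return self.push_head(x), self.grasp_head(x)

obs = torch.randn(1, 4, 64, 64)                  # toy RGB-D observation
mask = torch.zeros(1, 1, 64, 64); mask[..., 30:40, 30:40] = 1.0
q_push, q_grasp = MotionCritic()(obs, mask)
print(q_push.shape, q_grasp.shape)               # pixel-wise action values
```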

Simple and Effective Paraphrastic Similarity from Parallel Translations

Title Simple and Effective Paraphrastic Similarity from Parallel Translations
Authors John Wieting, Kevin Gimpel, Graham Neubig, Taylor Berg-Kirkpatrick
Abstract We present a model and methodology for learning paraphrastic sentence embeddings directly from bitext, removing the time-consuming intermediate step of creating paraphrase corpora. Further, we show that the resulting model can be applied to cross-lingual tasks where it both outperforms and is orders of magnitude faster than more complex state-of-the-art baselines.
Tasks Sentence Embeddings
Published 2019-09-30
URL https://arxiv.org/abs/1909.13872v1
PDF https://arxiv.org/pdf/1909.13872v1.pdf
PWC https://paperswithcode.com/paper/simple-and-effective-paraphrastic-similarity-1
Repo https://github.com/jwieting/simple-and-effective-paraphrastic-similarity
Framework pytorch
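
A hedged sketch of the simple model family this paper advocates: a sentence is embedded as the normalized average of its (sub)word vectors and trained with a margin loss so that aligned bitext pairs outscore in-batch negatives. Vocabulary size, dimension, and margin are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: averaging encoder + margin loss over parallel translations.
# Hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AvgEncoder(nn.Module):
    def __init__(self, vocab: int = 10000, dim: int = 300):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)

    def forward(self, ids):                      # ids: (batch, seq_len)
        return F.normalize(self.emb(ids).mean(dim=1), dim=-1)

def margin_loss(src, tgt, margin: float = 0.4):
    sims = src @ tgt.T                           # (batch, batch) cosine sims
    pos = sims.diag()                            # aligned bitext pairs
    # Hardest in-batch negative for each source sentence.
    neg = sims.masked_fill(torch.eye(len(sims), dtype=torch.bool), -1.0).max(1).values
    return F.relu(margin - pos + neg).mean()     # push positives above negatives

enc = AvgEncoder()
src_ids = torch.randint(0, 10000, (8, 12))
tgt_ids = torch.randint(0, 10000, (8, 12))
print(margin_loss(enc(src_ids), enc(tgt_ids)).item())
```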

Fisher-Bures Adversary Graph Convolutional Networks

Title Fisher-Bures Adversary Graph Convolutional Networks
Authors Ke Sun, Piotr Koniusz, Zhen Wang
Abstract In a graph convolutional network, we assume that the graph $G$ is generated with respect to some observation noise. During learning, we make small random perturbations $\Delta G$ of the graph and try to improve generalization. Based on quantum information geometry, $\Delta G$ can be characterized by the eigendecomposition of the graph Laplacian matrix. We try to minimize the loss with respect to the perturbed $G+\Delta G$ while making $\Delta G$ effective in terms of the Fisher information of the neural network. Our proposed model can consistently improve graph convolutional networks on semi-supervised node classification tasks with reasonable computational overhead. We present three different geometries on the manifold of graphs: the intrinsic geometry measures the information-theoretic dynamics of a graph; the extrinsic geometry characterizes how such dynamics can externally affect a graph neural network; the embedding geometry is for measuring node embeddings. These new analytical tools are useful in developing a good understanding of graph neural networks and fostering new techniques.
Tasks Node Classification
Published 2019-03-11
URL https://arxiv.org/abs/1903.04154v2
PDF https://arxiv.org/pdf/1903.04154v2.pdf
PWC https://paperswithcode.com/paper/fisher-bures-adversary-graph-convolutional
Repo https://github.com/stellargraph/FisherGCN
Framework tf
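
To illustrate the spectral view of graph perturbations, here is a hedged numpy sketch: eigendecompose the normalized Laplacian, jitter the spectrum, and reassemble the operator. The isotropic noise scale is an arbitrary choice here; the paper shapes the perturbation using Fisher information rather than plain Gaussian noise.

```python
# Hedged sketch: parameterize a small graph perturbation Delta G in the
# eigenbasis of the normalized Laplacian by jittering its spectrum.
# The noise scale is an illustrative assumption.
import numpy as np

def perturbed_laplacian(adj: np.ndarray, eps: float = 0.05, seed: int = 0):
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt  # normalized Laplacian
    evals, evecs = np.linalg.eigh(lap)                      # spectral decomposition
    noise = np.random.default_rng(seed).normal(scale=eps, size=evals.shape)
    evals = np.clip(evals + noise, 0.0, 2.0)                # keep a valid spectrum
    return evecs @ np.diag(evals) @ evecs.T                 # L + Delta L

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(perturbed_laplacian(adj).round(2))
```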

Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations

Title Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations
Authors Mingda Chen, Zewei Chu, Kevin Gimpel
Abstract Prior work on pretrained sentence embeddings and benchmarks focuses on the capabilities of stand-alone sentences. We propose DiscoEval, a test suite of tasks to evaluate whether sentence representations include broader context information. We also propose a variety of training objectives that make use of natural annotations from Wikipedia to build sentence encoders capable of modeling discourse. We benchmark sentence encoders pretrained with our proposed training objectives, as well as other popular pretrained sentence encoders, on DiscoEval and other sentence evaluation tasks. Empirically, we show that these training objectives help to encode different aspects of information in document structures. Moreover, BERT and ELMo demonstrate strong performance on DiscoEval, with individual hidden layers showing different characteristics.
Tasks Sentence Embeddings
Published 2019-08-31
URL https://arxiv.org/abs/1909.00142v2
PDF https://arxiv.org/pdf/1909.00142v2.pdf
PWC https://paperswithcode.com/paper/evaluation-benchmarks-and-learning
Repo https://github.com/ZeweiChu/DiscoEval
Framework pytorch
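
A hedged sketch of the evaluation protocol behind probing suites like DiscoEval: freeze a sentence encoder, embed the sentences, and fit a light classifier on a discourse-related label. The random "embeddings" and synthetic label below are stand-ins, not the benchmark's data or tasks.

```python
# Hedged sketch of a frozen-encoder probing evaluation. The embeddings and
# label here are synthetic stand-ins for a real encoder and DiscoEval task.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 64))                 # "frozen" sentence embeddings
labels = (emb[:, 0] > 0).astype(int)             # synthetic discourse label

probe = LogisticRegression(max_iter=1000).fit(emb[:150], labels[:150])
print("probe accuracy:", probe.score(emb[150:], labels[150:]))
```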

An Optimistic Perspective on Offline Reinforcement Learning

Title An Optimistic Perspective on Offline Reinforcement Learning
Authors Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
Abstract Off-policy reinforcement learning (RL) using a fixed offline dataset of logged interactions is an important consideration in real-world applications. This paper studies offline RL using the DQN replay dataset comprising the entire replay experience of a DQN agent on 60 Atari 2600 games. We demonstrate that recent off-policy deep RL algorithms, even when trained solely on this replay dataset, outperform the fully trained DQN agent. To enhance generalization in the offline setting, we present Random Ensemble Mixture (REM), a robust Q-learning algorithm that enforces optimal Bellman consistency on random convex combinations of multiple Q-value estimates. Offline REM trained on the DQN replay dataset surpasses strong RL baselines. The results here present an optimistic view that robust RL algorithms trained on sufficiently large and diverse offline datasets can lead to high-quality policies. The DQN replay dataset can serve as an offline RL benchmark and is open-sourced.
Tasks Atari Games, Q-Learning
Published 2019-07-10
URL https://arxiv.org/abs/1907.04543v3
PDF https://arxiv.org/pdf/1907.04543v3.pdf
PWC https://paperswithcode.com/paper/striving-for-simplicity-in-off-policy-deep
Repo https://github.com/google-research/batch_rl
Framework tf
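
The REM loss is easy to state in code. Below is a hedged PyTorch sketch: draw a random convex combination over K Q-heads and enforce Bellman consistency on the mixture. Network sizes and the discount are illustrative, and a separate target network is omitted for brevity.

```python
# Hedged sketch of the Random Ensemble Mixture (REM) loss: a random convex
# combination over K Q-heads, with Bellman consistency on the mixture.
# Sizes and discount are illustrative; no separate target network here.
import torch
import torch.nn as nn

K, n_actions, gamma = 4, 6, 0.99
q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, K * n_actions))

def rem_loss(s, a, r, s_next, done):
    batch = s.shape[0]
    alphas = torch.rand(batch, K, 1)
    alphas = alphas / alphas.sum(dim=1, keepdim=True)      # random convex weights
    q = (q_net(s).view(batch, K, n_actions) * alphas).sum(1)
    with torch.no_grad():
        q_next = (q_net(s_next).view(batch, K, n_actions) * alphas).sum(1)
        target = r + gamma * (1 - done) * q_next.max(dim=1).values
    chosen = q.gather(1, a.unsqueeze(1)).squeeze(1)
    return nn.functional.smooth_l1_loss(chosen, target)    # Huber, as in DQN

s = torch.randn(32, 8); a = torch.randint(0, n_actions, (32,))
r = torch.randn(32); done = torch.zeros(32)
print(rem_loss(s, a, r, torch.randn(32, 8), done).item())
```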

TreeGrad: Transferring Tree Ensembles to Neural Networks

Title TreeGrad: Transferring Tree Ensembles to Neural Networks
Authors Chapman Siu
Abstract Gradient Boosting Decision Trees (GBDTs) are popular machine learning algorithms with implementations such as LightGBM and in popular machine learning toolkits like Scikit-Learn. Many implementations can only produce trees in an offline, greedy manner. We explore ways to convert existing GBDT implementations to known neural network architectures with minimal performance loss, in order to allow decision splits to be updated in an online manner, and provide extensions that allow split points to be altered as a neural architecture search problem. We provide learning bounds for our neural network.
Tasks Neural Architecture Search
Published 2019-04-25
URL https://arxiv.org/abs/1904.11132v3
PDF https://arxiv.org/pdf/1904.11132v3.pdf
PWC https://paperswithcode.com/paper/treegrad-transferring-tree-ensembles-to
Repo https://github.com/chappers/TreeGrad
Framework tf
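
To see why a decision split can receive gradients, here is a hedged sketch of one split as a differentiable unit: the hard routing indicator becomes a sigmoid gate over (w·x − b), so the split's feature weights and threshold become trainable. In practice one would initialize these from a fitted GBDT's splits; the leaf values and temperature below are illustrative assumptions.

```python
# Hedged sketch: one axis-aligned decision split as a differentiable unit.
# A full tree composes many of these; parameters would be initialized from
# a fitted GBDT in practice. Values here are illustrative.
import torch
import torch.nn as nn

class SoftSplit(nn.Module):
    def __init__(self, n_features: int, temperature: float = 10.0):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(n_features))  # soft feature selector
        self.b = nn.Parameter(torch.zeros(1))           # split threshold
        self.leaf = nn.Parameter(torch.randn(2))        # left / right leaf values
        self.t = temperature

    def forward(self, x):                               # x: (batch, n_features)
        p_right = torch.sigmoid(self.t * (x @ self.w - self.b))
        return (1 - p_right) * self.leaf[0] + p_right * self.leaf[1]

split = SoftSplit(n_features=5)
x = torch.randn(16, 5)
print(split(x).shape)   # per-example prediction; differentiable end to end
```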

Adaptive gradient descent without descent

Title Adaptive gradient descent without descent
Authors Yura Malitsky, Konstantin Mishchenko
Abstract We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don’t increase the stepsize too fast and 2) don’t overstep the local curvature. No need for function values, no line search, no information about the function except for the gradients. By following these rules, you get a method adaptive to the local geometry, with convergence guarantees depending only on smoothness in a neighborhood of a solution. Given that the problem is convex, our method will converge even if the global smoothness constant is infinite. As an illustration, it can minimize an arbitrary twice continuously differentiable convex function. We examine its performance on a range of convex and nonconvex problems, including matrix factorization and training of ResNet-18.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09529v1
PDF https://arxiv.org/pdf/1910.09529v1.pdf
PWC https://paperswithcode.com/paper/adaptive-gradient-descent-without-descent
Repo https://github.com/ymalitsky/adaptive_gd
Framework none
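
The two rules translate directly into an update. Below is a hedged numpy sketch following the stepsize rule stated in the paper, λ_k = min(√(1+θ_{k−1}) λ_{k−1}, ‖x_k − x_{k−1}‖ / (2‖∇f(x_k) − ∇f(x_{k−1})‖)) with θ_k = λ_k/λ_{k−1}, demonstrated on a toy quadratic; the test function, initial stepsize, and iteration count are arbitrary choices.

```python
# Hedged sketch of the adaptive stepsize rules described in the abstract,
# on a toy ill-conditioned quadratic. Initialization choices are arbitrary.
import numpy as np

def adgd(grad, x0, lam0=1e-6, steps=200):
    x_prev, g_prev = x0, grad(x0)
    x, lam, theta = x0 - lam0 * g_prev, lam0, np.inf
    for _ in range(steps):
        g = grad(x)
        # Rule 1: don't increase the stepsize too fast.
        cap = np.sqrt(1.0 + theta) * lam
        # Rule 2: don't overstep the local curvature.
        denom = 2.0 * np.linalg.norm(g - g_prev)
        local = np.linalg.norm(x - x_prev) / denom if denom > 0 else np.inf
        lam_new = min(cap, local)
        theta, lam = lam_new / lam, lam_new
        x_prev, g_prev = x, g
        x = x - lam * g
    return x

A = np.diag([1.0, 10.0, 100.0])             # ill-conditioned quadratic
print(adgd(lambda x: A @ x, np.ones(3)))    # converges toward the origin
```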

MLPerf Inference Benchmark

Title MLPerf Inference Benchmark
Authors Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, Yuchen Zhou
Abstract Demand for machine-learning (ML) hardware and software systems is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and four orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf implements a set of rules and practices to ensure comparability across systems with wildly differing architectures. In this paper, we present the method and design principles of the initial MLPerf Inference release. The first call for submissions garnered more than 600 inference-performance measurements from 14 organizations, representing over 30 systems that show a range of capabilities.
Tasks
Published 2019-11-06
URL https://arxiv.org/abs/1911.02549v1
PDF https://arxiv.org/pdf/1911.02549v1.pdf
PWC https://paperswithcode.com/paper/mlperf-inference-benchmark
Repo https://github.com/mlperf/inference
Framework pytorch
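
For a feel of what the benchmark measures, here is a hedged sketch of a tail-latency measurement loop. The model here is a stand-in sleep; the real benchmark drives the system under test through LoadGen with defined scenarios (single-stream, multi-stream, server, offline) and strict reporting rules, none of which are reproduced here.

```python
# Hedged sketch of inference tail-latency measurement. The "model" is a
# stand-in sleep; MLPerf's LoadGen scenarios and rules are not reproduced.
import time
import numpy as np

def model_under_test(x):
    time.sleep(0.001 + 0.0005 * np.random.rand())   # stand-in inference cost
    return x

latencies = []
for _ in range(200):
    t0 = time.perf_counter()
    model_under_test(np.zeros(8))
    latencies.append(time.perf_counter() - t0)

print("p50 = %.2f ms, p90 = %.2f ms, p99 = %.2f ms" % tuple(
    1000 * np.percentile(latencies, q) for q in (50, 90, 99)))
```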

Manifold Criterion Guided Transfer Learning via Intermediate Domain Generation

Title Manifold Criterion Guided Transfer Learning via Intermediate Domain Generation
Authors Lei Zhang, Shanshan Wang, Guang-Bin Huang, Wangmeng Zuo, Jian Yang, David Zhang
Abstract In many practical transfer learning scenarios, the feature distribution is different across the source and target domains (i.e., non-i.i.d.). Maximum mean discrepancy (MMD), as a domain discrepancy metric, has achieved promising performance in unsupervised domain adaptation (DA). We argue that MMD-based DA methods ignore the data locality structure, which, to some extent, would cause the negative transfer effect. The locality plays an important role in minimizing the nonlinear local domain discrepancy underlying the marginal distributions. For better exploiting the domain locality, a novel local generative discrepancy metric (LGDM) based intermediate domain generation learning called Manifold Criterion guided Transfer Learning (MCTL) is proposed in this paper. The merits of the proposed MCTL are four-fold: 1) the concept of manifold criterion (MC) is first proposed as a measure validating the distribution matching across domains, and domain adaptation is achieved if the MC is satisfied; 2) the proposed MC can well guide the generation of the intermediate domain sharing a similar distribution with the target domain, by minimizing the local domain discrepancy; 3) a global generative discrepancy metric (GGDM) is presented, such that both the global and local discrepancy can be effectively and positively reduced; 4) a simplified version of MCTL called MCTL-S is presented under a perfect domain generation assumption for a more generic learning scenario. Experiments on a number of benchmark visual transfer tasks demonstrate the superiority of the proposed manifold criterion guided generative transfer method, by comparing with other state-of-the-art methods. The source code is available at https://github.com/wangshanshanCQU/MCTL.
Tasks Domain Adaptation, Transfer Learning, Unsupervised Domain Adaptation
Published 2019-03-25
URL http://arxiv.org/abs/1903.10211v1
PDF http://arxiv.org/pdf/1903.10211v1.pdf
PWC https://paperswithcode.com/paper/manifold-criterion-guided-transfer-learning
Repo https://github.com/wangshanshanCQU/MCTL
Framework none
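
MMD, the discrepancy metric this paper starts from, fits in a few lines. Below is a hedged numpy sketch of the biased RBF-kernel estimator; the bandwidth is an arbitrary choice, and MCTL's actual contributions (the local LGDM and global GGDM criteria guiding intermediate-domain generation) are not reproduced here.

```python
# Hedged sketch: biased RBF-kernel MMD^2 estimator between source and
# target features. Bandwidth is an illustrative assumption.
import numpy as np

def rbf_mmd2(X: np.ndarray, Y: np.ndarray, sigma: float = 1.0) -> float:
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    # E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(100, 5))       # source-domain features
tgt = rng.normal(0.5, 1.0, size=(100, 5))       # shifted target domain
print(rbf_mmd2(src, tgt))                        # > 0 under domain shift
```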

QFlip: An Adaptive Reinforcement Learning Strategy for the FlipIt Security Game

Title QFlip: An Adaptive Reinforcement Learning Strategy for the FlipIt Security Game
Authors Lisa Oakley, Alina Oprea
Abstract A rise in Advanced Persistent Threats (APTs) has introduced a need for robustness against long-running, stealthy attacks which circumvent existing cryptographic security guarantees. FlipIt is a security game that models attacker-defender interactions in advanced scenarios such as APTs. Previous work extensively analyzed non-adaptive strategies in FlipIt, but adaptive strategies arise naturally in practical interactions as players receive feedback during the game. We model the FlipIt game as a Markov Decision Process and introduce QFlip, an adaptive strategy for FlipIt based on temporal difference reinforcement learning. We prove theoretical results on the convergence of our new strategy against an opponent playing with a Periodic strategy. We confirm our analysis experimentally by extensive evaluation of QFlip against specific opponents. QFlip converges to the optimal adaptive strategy for Periodic and Exponential opponents using associated state spaces. Finally, we introduce a generalized QFlip strategy with composite state space that outperforms a Greedy strategy for several distributions including Periodic and Uniform, without prior knowledge of the opponent’s strategy. We also release an OpenAI Gym environment for FlipIt to facilitate future research.
Tasks Q-Learning
Published 2019-06-27
URL https://arxiv.org/abs/1906.11938v3
PDF https://arxiv.org/pdf/1906.11938v3.pdf
PWC https://paperswithcode.com/paper/playing-adaptively-against-stealthy-opponents
Repo https://github.com/lisaoakley/gym-flipit
Framework none
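
For context, here is a hedged sketch of the tabular temporal-difference Q-learning update that QFlip builds on. The toy environment, state space, and hyperparameters are stand-ins; in QFlip the state encodes game feedback such as the time since the opponent's last observed move.

```python
# Hedged sketch: tabular TD Q-learning with epsilon-greedy exploration.
# The environment and hyperparameters are toy stand-ins, not FlipIt.
import numpy as np

n_states, n_actions = 10, 2                      # e.g. actions: wait / flip
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):                                  # stand-in environment dynamics
    return (s + 1) % n_states, rng.normal(loc=1.0 if a == 1 else 0.0)

s = 0
for _ in range(5000):
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    # TD update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
print(Q.round(2))
```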

Dynamic Distribution Pruning for Efficient Network Architecture Search

Title Dynamic Distribution Pruning for Efficient Network Architecture Search
Authors Xiawu Zheng, Rongrong Ji, Lang Tang, Yan Wan, Baochang Zhang, Yongjian Wu, Yunsheng Wu, Ling Shao
Abstract Network architectures obtained by Neural Architecture Search (NAS) have shown state-of-the-art performance in various computer vision tasks. Despite the exciting progress, the computational complexity of the forward-backward propagation and the search process makes it difficult to apply NAS in practice. In particular, most previous methods require thousands of GPU days for the search process to converge. In this paper, we propose a dynamic distribution pruning method towards extremely efficient NAS, which samples architectures from a joint categorical distribution. The search space is dynamically pruned every few epochs to update this distribution, and the optimal neural architecture is obtained when only one structure remains. We conduct experiments on two widely-used datasets in NAS. On CIFAR-10, the optimal structure obtained by our method achieves the state-of-the-art 1.9% test error, while the search process is more than 1,000 times faster (only 1.5 GPU hours on a Tesla V100) than the state-of-the-art NAS algorithms. On ImageNet, our model achieves 75.2% top-1 accuracy under the MobileNet settings, with a time cost of only 2 GPU days, a 100% acceleration over the fastest NAS algorithm. The code is available at https://github.com/tanglang96/DDPNAS
Tasks Neural Architecture Search
Published 2019-05-28
URL https://arxiv.org/abs/1905.13543v2
PDF https://arxiv.org/pdf/1905.13543v2.pdf
PWC https://paperswithcode.com/paper/190513543
Repo https://github.com/tanglang96/DDPNAS
Framework pytorch
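
A hedged sketch of the search loop's overall shape: keep a categorical distribution over candidate operations, sample and score, reweight, and periodically prune the weakest candidate until one remains. The scoring function and the exponentiated-reward update below are stand-ins, not the paper's estimator.

```python
# Hedged sketch of dynamic distribution pruning over candidate operations.
# The noisy "validation score" and the reweighting rule are stand-ins.
import numpy as np

rng = np.random.default_rng(0)
ops = ["conv3x3", "conv5x5", "maxpool", "skip", "sep_conv"]
quality = rng.random(len(ops))                   # hidden "true" op quality
probs = np.ones(len(ops)) / len(ops)

while len(ops) > 1:
    for _ in range(20):                          # sample and score candidates
        i = rng.choice(len(ops), p=probs)
        score = quality[i] + rng.normal(scale=0.1)   # noisy validation proxy
        probs[i] *= np.exp(0.5 * score)          # stand-in reweighting rule
        probs /= probs.sum()
    worst = int(probs.argmin())                  # prune the weakest candidate
    ops.pop(worst); quality = np.delete(quality, worst)
    probs = np.delete(probs, worst); probs /= probs.sum()

print("selected op:", ops[0])
```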

Comprehensible Context-driven Text Game Playing

Title Comprehensible Context-driven Text Game Playing
Authors Xusen Yin, Jonathan May
Abstract In order to train a computer agent to play a text-based computer game, we must represent each hidden state of the game. A Long Short-Term Memory (LSTM) model running over observed texts is a common choice for state construction. However, a normal Deep Q-learning Network (DQN) for such an agent requires millions of steps of training or more to converge. As such, an LSTM-based DQN can take tens of days to finish the training process. Though we can use a Convolutional Neural Network (CNN) as a text encoder to construct states much faster than the LSTM, doing so without an understanding of the syntactic context of the words being analyzed can slow convergence. In this paper, we use a fast CNN to encode position- and syntax-oriented structures extracted from observed texts as states. We additionally augment the reward signal in a universal and practical manner. Together, we show that our improvements not only speed up the process by one order of magnitude but also yield a superior agent.
Tasks Q-Learning
Published 2019-05-06
URL https://arxiv.org/abs/1905.02265v3
PDF https://arxiv.org/pdf/1905.02265v3.pdf
PWC https://paperswithcode.com/paper/comprehensible-context-driven-text-game
Repo https://github.com/yinxusen/dqn-zork
Framework none
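
A hedged PyTorch sketch of the fast CNN state-encoder idea: 1-D convolutions over word embeddings with max-over-time pooling yield a fixed-size state vector for a DQN head. Filter widths and dimensions are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch: CNN text encoder producing a DQN state vector.
# Vocabulary, dimensions, and filter widths are illustrative assumptions.
import torch
import torch.nn as nn

class CNNStateEncoder(nn.Module):
    def __init__(self, vocab=5000, dim=64, n_filters=32, widths=(2, 3, 4)):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.convs = nn.ModuleList(nn.Conv1d(dim, n_filters, w) for w in widths)

    def forward(self, ids):                       # ids: (batch, seq_len)
        x = self.emb(ids).transpose(1, 2)         # -> (batch, dim, seq_len)
        # Max-over-time pooling captures local n-gram features.
        pooled = [c(x).relu().max(dim=2).values for c in self.convs]
        return torch.cat(pooled, dim=1)           # state vector for the DQN head

enc = CNNStateEncoder()
state = enc(torch.randint(0, 5000, (4, 30)))
q_head = nn.Linear(state.shape[1], 8)             # 8 toy game actions
print(q_head(state).shape)
```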

ImJoy: an open-source computational platform for the deep learning era

Title ImJoy: an open-source computational platform for the deep learning era
Authors Wei Ouyang, Florian Mueller, Martin Hjelmare, Emma Lundberg, Christophe Zimmer
Abstract Deep learning methods have shown extraordinary potential for analyzing very diverse biomedical data, but their dissemination beyond developers is hindered by important computational hurdles. We introduce ImJoy (https://imjoy.io/), a flexible and open-source browser-based platform designed to facilitate widespread reuse of deep learning solutions in biomedical research. We highlight ImJoy’s main features and illustrate its functionalities with deep learning plugins for mobile and interactive image analysis and genomics.
Tasks
Published 2019-05-30
URL https://arxiv.org/abs/1905.13105v1
PDF https://arxiv.org/pdf/1905.13105v1.pdf
PWC https://paperswithcode.com/paper/imjoy-an-open-source-computational-platform
Repo https://github.com/oeway/ImJoy
Framework none

Semantic Neural Machine Translation using AMR

Title Semantic Neural Machine Translation using AMR
Authors Linfeng Song, Daniel Gildea, Yue Zhang, Zhiguo Wang, Jinsong Su
Abstract It is intuitive that semantic representations can be useful for machine translation, mainly because they can help in enforcing meaning preservation and handling data sparsity (many sentences correspond to one meaning) of machine translation models. On the other hand, little work has been done on leveraging semantics for neural machine translation (NMT). In this work, we study the usefulness of AMR (short for abstract meaning representation) for NMT. Experiments on a standard English-to-German dataset show that incorporating AMR as additional knowledge can significantly improve a strong attention-based sequence-to-sequence neural translation model.
Tasks Machine Translation
Published 2019-02-19
URL http://arxiv.org/abs/1902.07282v1
PDF http://arxiv.org/pdf/1902.07282v1.pdf
PWC https://paperswithcode.com/paper/semantic-neural-machine-translation-using-amr
Repo https://github.com/freesunshine0316/semantic-nmt
Framework tf
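
A hedged sketch of the dual-attention idea in miniature: the decoder attends separately over sequential source states and AMR node states, and the two context vectors are concatenated before prediction. Dimensions are arbitrary, and the paper's graph recurrent network for encoding the AMR is not reproduced here.

```python
# Hedged sketch: a decoder state attending over both sequential source
# states and AMR node states. Dimensions are illustrative assumptions.
import torch
import torch.nn.functional as F

def attend(query, keys):                      # query: (d,), keys: (n, d)
    scores = keys @ query                     # dot-product attention
    return F.softmax(scores, dim=0) @ keys    # weighted context vector

d = 32
src_states = torch.randn(10, d)               # sequential encoder outputs
amr_states = torch.randn(6, d)                # AMR graph-node encodings
dec_state = torch.randn(d)

context = torch.cat([attend(dec_state, src_states),
                     attend(dec_state, amr_states)])
print(context.shape)                          # (2*d,) fed to the output layer
```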