July 30, 2019


Paper Group AWR 3

Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions. ReBNet: Residual Binarized Neural Network. A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions. Improved Texture Networks: Maximizing Quality and Diversity in Feed-forward Stylization and Texture Synthesis. Adversarial PoseNet: A …

Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions


Title	Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions
Authors	Jun Hatori, Yuta Kikuchi, Sosuke Kobayashi, Kuniyuki Takahashi, Yuta Tsuboi, Yuya Unno, Wilson Ko, Jethro Tan
Abstract	Comprehension of spoken natural language is an essential component for robots to communicate with humans effectively. However, handling unconstrained spoken instructions is challenging due to (1) complex structures including a wide variety of expressions used in spoken language and (2) inherent ambiguity in interpretation of human instructions. In this paper, we propose the first comprehensive system that can handle unconstrained spoken language and is able to effectively resolve ambiguity in spoken instructions. Specifically, we integrate deep-learning-based object detection together with natural language processing technologies to handle unconstrained spoken instructions, and propose a method for robots to resolve instruction ambiguity through dialogue. Through our experiments on both a simulated environment and a physical industrial robot arm, we demonstrate the ability of our system to understand natural instructions from human operators effectively, and how higher success rates of the object picking task can be achieved through an interactive clarification process.
Tasks	Object Detection
Published	2017-10-17
URL	http://arxiv.org/abs/1710.06280v2
PDF	http://arxiv.org/pdf/1710.06280v2.pdf
PWC	https://paperswithcode.com/paper/interactively-picking-real-world-objects-with
Repo	https://github.com/pfnet-research/picking-instruction
Framework	none

ReBNet: Residual Binarized Neural Network


Title	ReBNet: Residual Binarized Neural Network
Authors	Mohammad Ghasemzadeh, Mohammad Samragh, Farinaz Koushanfar
Abstract	This paper proposes ReBNet, an end-to-end framework for training reconfigurable binary neural networks on software and developing efficient accelerators for execution on FPGA. Binary neural networks offer an intriguing opportunity for deploying large-scale deep learning models on resource-constrained devices. Binarization reduces the memory footprint and replaces the power-hungry matrix-multiplication with light-weight XnorPopcount operations. However, binary networks suffer from degraded accuracy compared to their fixed-point counterparts. We show that the state-of-the-art methods for optimizing the accuracy of binary networks significantly increase the implementation cost and complexity. To compensate for the degraded accuracy while adhering to the simplicity of binary networks, we devise the first reconfigurable scheme that can adjust the classification accuracy based on the application. Our proposition improves the classification accuracy by representing features with multiple levels of residual binarization. Unlike previous methods, our approach does not exacerbate the area cost of the hardware accelerator. Instead, it provides a tradeoff between throughput and accuracy while the area overhead of multi-level binarization is negligible.
Tasks
Published	2017-11-03
URL	http://arxiv.org/abs/1711.01243v3
PDF	http://arxiv.org/pdf/1711.01243v3.pdf
PWC	https://paperswithcode.com/paper/rebnet-residual-binarized-neural-network
Repo	https://github.com/mohaghasemzadeh/ReBNet
Framework	mxnet
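
The "multiple levels of residual binarization" mentioned in the abstract can be illustrated with a minimal sketch. It assumes a fixed list of per-level scales (`gammas`, a hypothetical name; ReBNet itself trains its scaling factors) and is not the authors' implementation. Each level stores one sign bit per feature and binarizes whatever error the previous levels left behind.

```python
def residual_binarize(x, gammas):
    """Approximate each value in x by a sum of signed scales, one per level.

    gammas are fixed illustrative scales (an assumption of this sketch).
    """
    residual = list(x)
    approx = [0.0] * len(x)
    bits = []
    for gamma in gammas:
        # One bit per feature: the sign of what is still unexplained.
        signs = [1 if r >= 0 else -1 for r in residual]
        bits.append(signs)
        approx = [a + gamma * s for a, s in zip(approx, signs)]
        residual = [r - gamma * s for r, s in zip(residual, signs)]
    return bits, approx


if __name__ == "__main__":
    features = [0.9, -0.2, 0.4, -0.7]
    for levels in (1, 2, 3):
        scales = [0.5 / (2 ** i) for i in range(levels)]
        _, approx = residual_binarize(features, scales)
        worst = max(abs(a - b) for a, b in zip(features, approx))
        print(levels, "level(s): worst-case error", round(worst, 3))
```

Adding a level costs one more bit per feature and one more pass, which is the throughput-versus-accuracy tradeoff the abstract describes.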

A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions


Title	A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions
Authors	Mejbah Alam, Justin Gottschlich, Nesime Tatbul, Javier Turek, Timothy Mattson, Abdullah Muzahid
Abstract	The field of machine programming (MP), the automation of the development of software, is making notable research advances. This is, in part, due to the emergence of a wide range of novel techniques in machine learning. In this paper, we apply MP to the automation of software performance regression testing. A performance regression is a software performance degradation caused by a code change. We present AutoPerf - a novel approach to automate regression testing that utilizes three core techniques: (i) zero-positive learning, (ii) autoencoders, and (iii) hardware telemetry. We demonstrate AutoPerf’s generality and efficacy against 3 types of performance regressions across 10 real performance bugs in 7 benchmark and open-source programs. On average, AutoPerf exhibits 4% profiling overhead and accurately diagnoses more performance bugs than prior state-of-the-art approaches. Thus far, AutoPerf has produced no false negatives.
Tasks
Published	2017-09-21
URL	https://arxiv.org/abs/1709.07536v6
PDF	https://arxiv.org/pdf/1709.07536v6.pdf
PWC	https://paperswithcode.com/paper/autoperf-a-generalized-zero-positive-learning
Repo	https://github.com/mejbah/AutoPerf
Framework	none

Improved Texture Networks: Maximizing Quality and Diversity in Feed-forward Stylization and Texture Synthesis


Title	Improved Texture Networks: Maximizing Quality and Diversity in Feed-forward Stylization and Texture Synthesis
Authors	Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky
Abstract	The recent work of Gatys et al., who characterized the style of an image by the statistics of convolutional neural network filters, ignited a renewed interest in the texture generation and image stylization problems. While their image generation technique uses a slow optimization process, recently several authors have proposed to learn generator neural networks that can produce similar outputs in one quick forward pass. While generator networks are promising, they are still inferior in visual quality and diversity compared to generation-by-optimization. In this work, we advance them in two significant ways. First, we introduce an instance normalization module to replace batch normalization with significant improvements to the quality of image stylization. Second, we improve diversity by introducing a new learning formulation that encourages generators to sample unbiasedly from the Julesz texture ensemble, which is the equivalence class of all images characterized by certain filter responses. Together, these two improvements take feed forward texture synthesis and image stylization much closer to the quality of generation-via-optimization, while retaining the speed advantage.
Tasks	Image Generation, Image Stylization, Texture Synthesis
Published	2017-01-09
URL	http://arxiv.org/abs/1701.02096v2
PDF	http://arxiv.org/pdf/1701.02096v2.pdf
PWC	https://paperswithcode.com/paper/improved-texture-networks-maximizing-quality
Repo	https://github.com/DmitryUlyanov/texture_nets
Framework	torch
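
Instance normalization, the first of the two improvements, can be sketched in a few lines. This is a plain-Python illustration on nested lists shaped `[channel][row][column]` for a single image. It omits the learnable scale and shift that practical layers usually add, and `eps` is an assumed stabilizer value.

```python
import math


def instance_norm(image, eps=1e-5):
    """Normalize each channel of ONE image over its own spatial positions.

    Batch normalization would instead pool statistics across all images
    in the batch; here every image is normalized independently.
    """
    out = []
    for channel in image:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.append([[(v - mean) * scale for v in row] for row in channel])
    return out


if __name__ == "__main__":
    image = [[[1.0, 2.0], [3.0, 4.0]], [[10.0, 10.0], [10.0, 30.0]]]
    for channel in instance_norm(image):
        print(channel)
```

Because the statistics come from the image itself, rescaling the contrast of the input leaves the normalized output essentially unchanged.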

Adversarial PoseNet: A Structure-aware Convolutional Network for Human Pose Estimation


Title	Adversarial PoseNet: A Structure-aware Convolutional Network for Human Pose Estimation
Authors	Yu Chen, Chunhua Shen, Xiu-Shen Wei, Lingqiao Liu, Jian Yang
Abstract	For human pose estimation in monocular images, joint occlusions and overlapping upon human bodies often result in deviated pose predictions. Under these circumstances, biologically implausible pose predictions may be produced. In contrast, human vision is able to predict poses by exploiting geometric constraints of joint inter-connectivity. To address the problem by incorporating priors about the structure of human bodies, we propose a novel structure-aware convolutional network to implicitly take such priors into account during training of the deep network. Explicit learning of such constraints is typically challenging. Instead, we design discriminators to distinguish the real poses from the fake ones (such as biologically implausible ones). If the pose generator (G) generates results that the discriminator fails to distinguish from real ones, the network successfully learns the priors.
Tasks	Pose Estimation
Published	2017-04-30
URL	http://arxiv.org/abs/1705.00389v2
PDF	http://arxiv.org/pdf/1705.00389v2.pdf
PWC	https://paperswithcode.com/paper/adversarial-posenet-a-structure-aware
Repo	https://github.com/rohitrango/Adversarial-Pose-Estimation
Framework	pytorch

Gated Multimodal Units for Information Fusion


Title	Gated Multimodal Units for Information Fusion
Authors	John Arevalo, Thamar Solorio, Manuel Montes-y-Gómez, Fabio A. González
Abstract	This paper presents a novel model for multimodal learning based on gated neural networks. The Gated Multimodal Unit (GMU) model is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. The GMU learns to decide how modalities influence the activation of the unit using multiplicative gates. It was evaluated on a multilabel scenario for genre classification of movies using the plot and the poster. The GMU improved the macro f-score performance of single-modality approaches and outperformed other fusion strategies, including mixture of experts models. Along with this work, the MM-IMDb dataset is released which, to the best of our knowledge, is the largest publicly available multimodal dataset for genre prediction on movies.
Tasks
Published	2017-02-07
URL	http://arxiv.org/abs/1702.01992v1
PDF	http://arxiv.org/pdf/1702.01992v1.pdf
PWC	https://paperswithcode.com/paper/gated-multimodal-units-for-information-fusion
Repo	https://github.com/johnarevalo/gmu-mmimdb
Framework	none
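
To make the multiplicative gating concrete, here is a minimal two-modality sketch. It assumes the commonly cited bimodal form (a tanh projection per modality and a sigmoid gate computed from the concatenated inputs). The weight matrices and the names `W_v`, `W_t`, `W_z` are hypothetical placeholders, not values or identifiers from the released code.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gmu(x_v, x_t, W_v, W_t, W_z):
    """Fuse a visual vector x_v and a text vector x_t (both Python lists)."""
    h_v = [math.tanh(a) for a in matvec(W_v, x_v)]  # visual candidate
    h_t = [math.tanh(a) for a in matvec(W_t, x_t)]  # textual candidate
    # The gate sees both modalities and decides, per unit, which one to trust.
    z = [1.0 / (1.0 + math.exp(-a)) for a in matvec(W_z, x_v + x_t)]
    return [zi * hv + (1.0 - zi) * ht for zi, hv, ht in zip(z, h_v, h_t)]


if __name__ == "__main__":
    x_v, x_t = [0.5, 0.1], [0.2, -0.3]
    W_v, W_t = [[1.0, 0.0]], [[0.0, 1.0]]
    print(gmu(x_v, x_t, W_v, W_t, [[0.0, 0.0, 0.0, 0.0]]))    # gate at 0.5
    print(gmu(x_v, x_t, W_v, W_t, [[100.0, 0.0, 0.0, 0.0]]))  # gate near 1
```

With a zero gate matrix the unit averages the two modalities; a strongly positive gate lets the visual candidate dominate.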

Exploring Models and Data for Remote Sensing Image Caption Generation


Title	Exploring Models and Data for Remote Sensing Image Caption Generation
Authors	Xiaoqiang Lu, Binqiang Wang, Xiangtao Zheng, Xuelong Li
Abstract	Inspired by recent developments in artificial satellites, remote sensing images have attracted extensive attention. Recently, noticeable progress has been made in scene classification and target detection. However, it is still not clear how to describe the remote sensing image content with accurate and concise sentences. In this paper, we investigate how to describe remote sensing images with accurate and flexible sentences. First, some annotated instructions are presented to better describe the remote sensing images considering the special characteristics of remote sensing images. Second, in order to exhaustively exploit the contents of remote sensing images, a large-scale aerial image data set is constructed for remote sensing image captioning. Finally, a comprehensive review is presented on the proposed data set to fully advance the task of remote sensing captioning. Extensive experiments on the proposed data set demonstrate that the content of the remote sensing image can be completely described by generating language descriptions. The data set is available at https://github.com/201528014227051/RSICD_optimal
Tasks	Scene Classification
Published	2017-12-21
URL	http://arxiv.org/abs/1712.07835v1
PDF	http://arxiv.org/pdf/1712.07835v1.pdf
PWC	https://paperswithcode.com/paper/exploring-models-and-data-for-remote-sensing
Repo	https://github.com/201528014227051/RSICD_optimal
Framework	none

Context encoders as a simple but powerful extension of word2vec


Title	Context encoders as a simple but powerful extension of word2vec
Authors	Franziska Horn
Abstract	With a simple architecture and the ability to learn meaningful word embeddings efficiently from texts containing billions of words, word2vec remains one of the most popular neural language models used today. However, as only a single embedding is learned for every word in the vocabulary, the model fails to optimally represent words with multiple meanings. Additionally, it is not possible to create embeddings for new (out-of-vocabulary) words on the spot. Based on an intuitive interpretation of the continuous bag-of-words (CBOW) word2vec model’s negative sampling training objective in terms of predicting context based similarities, we motivate an extension of the model we call context encoders (ConEc). By multiplying the matrix of trained word2vec embeddings with a word’s average context vector, out-of-vocabulary (OOV) embeddings and representations for a word with multiple meanings can be created based on the word’s local contexts. The benefits of this approach are illustrated by using these word embeddings as features in the CoNLL 2003 named entity recognition (NER) task.
Tasks	Named Entity Recognition, Word Embeddings
Published	2017-06-08
URL	http://arxiv.org/abs/1706.02496v1
PDF	http://arxiv.org/pdf/1706.02496v1.pdf
PWC	https://paperswithcode.com/paper/context-encoders-as-a-simple-but-powerful
Repo	https://github.com/cod3licious/conec
Framework	none
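
The core ConEc operation, multiplying the trained embedding matrix by a word's average context vector, amounts to a weighted average of the embeddings of the words seen around it. The sketch below assumes a toy vocabulary, a symmetric window, and context weights normalized to sum to one. These are illustrative choices and may differ from the weighting used in the paper's code.

```python
from collections import Counter


def average_context(tokens, target, embeddings, window=2):
    """Relative frequency of in-vocabulary words seen near `target`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in embeddings:
                counts[tokens[j]] += 1
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}


def conec_embedding(context, embeddings):
    """Sparse product of the context vector with the embedding matrix."""
    dim = len(next(iter(embeddings.values())))
    vector = [0.0] * dim
    for word, weight in context.items():
        for k, value in enumerate(embeddings[word]):
            vector[k] += weight * value
    return vector


if __name__ == "__main__":
    # Toy 2-d embeddings; "zorp" is out of vocabulary.
    embeddings = {"the": [1.0, 0.0], "barked": [0.0, 1.0],
                  "at": [0.5, 0.5], "cat": [0.2, 0.8]}
    tokens = "the zorp barked at the cat".split()
    context = average_context(tokens, "zorp", embeddings, window=1)
    print(conec_embedding(context, embeddings))
```

Restricting the context to a single occurrence instead of all occurrences gives a sense-specific embedding, which is how the same operation covers words with multiple meanings.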

Learning Graphs with Monotone Topology Properties and Multiple Connected Components


Title	Learning Graphs with Monotone Topology Properties and Multiple Connected Components
Authors	Eduardo Pavez, Hilmi E. Egilmez, Antonio Ortega
Abstract	Recent papers have formulated the problem of learning graphs from data as an inverse covariance estimation with graph Laplacian constraints. While such problems are convex, existing methods cannot guarantee that solutions will have specific graph topology properties (e.g., being $k$-partite), which are desirable for some applications. In fact, the problem of learning a graph with given topology properties, e.g., finding the $k$-partite graph that best matches the data, is in general non-convex. In this paper, we develop novel theoretical results that provide performance guarantees for an approach to solve these problems. Our solution decomposes this problem into two sub-problems, for which efficient solutions are known. Specifically, a graph topology inference (GTI) step is employed to select a feasible graph topology, i.e., one having the desired property. Then, a graph weight estimation (GWE) step is performed by solving a generalized graph Laplacian estimation problem, where edges are constrained by the topology found in the GTI step. Our main result is a bound on the error of the GWE step as a function of the error in the GTI step. This error bound indicates that the GTI step should be solved using an algorithm that approximates the similarity matrix by another matrix whose entries have been thresholded to zero to have the desired type of graph topology. The GTI stage can leverage existing methods (e.g., state of the art approaches for graph coloring) which are typically based on minimizing the total weight of removed edges. Since the GWE stage is formulated as an inverse covariance estimation problem with linear constraints, it can be solved using existing convex optimization methods. We demonstrate that our two step approach can achieve good results for both synthetic and texture image data.
Tasks
Published	2017-05-31
URL	http://arxiv.org/abs/1705.10934v4
PDF	http://arxiv.org/pdf/1705.10934v4.pdf
PWC	https://paperswithcode.com/paper/learning-graphs-with-monotone-topology
Repo	https://github.com/STAC-USC/graph_learning_properties
Framework	none

Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science


Title	Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science
Authors	George C. Linderman, Gal Mishne, Yuval Kluger, Stefan Steinerberger
Abstract	If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $k$-nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in $[0,1]^d$ it suffices to connect every point to $c_{d,1} \log{\log{n}}$ points chosen randomly among its $c_{d,2} \log{n}$-nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log\log{n}$ instead of $\sim n \log{n}$ edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the $k$-nearest neighbors, one can often pick $k' \ll k$ random points out of the $k$-nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation; we illustrate this with several numerical examples.
Tasks
Published	2017-11-13
URL	http://arxiv.org/abs/1711.04712v1
PDF	http://arxiv.org/pdf/1711.04712v1.pdf
PWC	https://paperswithcode.com/paper/randomized-near-neighbor-graphs-giant
Repo	https://github.com/KlugerLab/pyFIt-SNE
Framework	none
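
The construction in the abstract (keep $k'$ random neighbors out of each point's $k$ nearest) can be sketched as follows. This is a brute-force illustration with a hypothetical function name. A practical version would use a spatial index rather than sorting all pairwise distances.

```python
import random


def randomized_knn_graph(points, k, k_prime, rng):
    """For each point, keep k_prime neighbors sampled from its k nearest."""
    edges = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort candidate neighbors by squared Euclidean distance to p.
        others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])))
        edges[i] = sorted(rng.sample(others[:k], k_prime))
    return edges


if __name__ == "__main__":
    rng = random.Random(0)
    points = [(rng.random(), rng.random()) for _ in range(200)]
    graph = randomized_knn_graph(points, k=12, k_prime=3, rng=rng)
    print(sum(len(v) for v in graph.values()), "directed edges instead of", 200 * 12)
```

The edge count drops from $nk$ to $nk'$, which is where the savings in any downstream affinity-matrix computation come from.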

Fast Matrix Factorization for Online Recommendation with Implicit Feedback


Title	Fast Matrix Factorization for Online Recommendation with Implicit Feedback
Authors	Xiangnan He, Hanwang Zhang, Min-Yen Kan, Tat-Seng Chua
Abstract	This paper contributes improvements on both the effectiveness and efficiency of Matrix Factorization (MF) methods for implicit feedback. We highlight two critical issues of existing works. First, due to the large space of unobserved feedback, most existing works resort to assigning a uniform weight to the missing data to reduce computational complexity. However, such a uniform assumption is invalid in real-world settings. Second, most methods are also designed in an offline setting and fail to keep up with the dynamic nature of online data. We address the above two issues in learning MF models from implicit feedback. We first propose to weight the missing data based on item popularity, which is more effective and flexible than the uniform-weight assumption. However, such a non-uniform weighting poses an efficiency challenge in learning the model. To address this, we specifically design a new learning algorithm based on the element-wise Alternating Least Squares (eALS) technique, for efficiently optimizing an MF model with variably-weighted missing data. We exploit this efficiency to then seamlessly devise an incremental update strategy that instantly refreshes an MF model given new feedback. Through comprehensive experiments on two public datasets in both offline and online protocols, we show that our eALS method consistently outperforms state-of-the-art implicit MF methods. Our implementation is available at https://github.com/hexiangnan/sigir16-eals.
Tasks
Published	2017-08-16
URL	https://arxiv.org/abs/1708.05024v1
PDF	https://arxiv.org/pdf/1708.05024v1.pdf
PWC	https://paperswithcode.com/paper/fast-matrix-factorization-for-online
Repo	https://github.com/hexiangnan/sigir16-eals
Framework	none

Global optimization of Lipschitz functions


Title	Global optimization of Lipschitz functions
Authors	Cédric Malherbe, Nicolas Vayatis
Abstract	The goal of the paper is to design sequential strategies which lead to efficient optimization of an unknown function under the only assumption that it has a finite Lipschitz constant. We first identify sufficient conditions for the consistency of generic sequential algorithms and formulate the expected minimax rate for their performance. We introduce and analyze a first algorithm called LIPO which assumes the Lipschitz constant to be known. Consistency and minimax rates for LIPO are proved, as well as fast rates under an additional Hölder-like condition. An adaptive version of LIPO is also introduced for the more realistic setup where the Lipschitz constant is unknown and has to be estimated along with the optimization. Similar theoretical guarantees are shown to hold for the adaptive LIPO algorithm and a numerical assessment is provided at the end of the paper to illustrate the potential of this strategy with respect to state-of-the-art methods over typical benchmark problems for global optimization.
Tasks	Hyperparameter Optimization
Published	2017-03-07
URL	http://arxiv.org/abs/1703.02628v3
PDF	http://arxiv.org/pdf/1703.02628v3.pdf
PWC	https://paperswithcode.com/paper/global-optimization-of-lipschitz-functions
Repo	https://github.com/Sycor4x/lipo
Framework	none
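
A minimal one-dimensional sketch of a LIPO-style search with a known Lipschitz constant follows. It assumes maximization over an interval and uses the acceptance rule usually associated with LIPO: a random candidate is evaluated only if the Lipschitz upper bound built from past evaluations says it could still beat the best value seen. The function name and the `max_samples` safety cap are additions for this illustration.

```python
import random


def lipo_maximize(f, low, high, lipschitz, n_evals, rng, max_samples=100000):
    """Maximize f on [low, high] given a valid Lipschitz constant."""
    xs = [rng.uniform(low, high)]
    ys = [f(xs[0])]
    samples = 0
    while len(xs) < n_evals and samples < max_samples:
        samples += 1
        x = rng.uniform(low, high)
        # Tightest upper bound on f(x) implied by every past evaluation.
        upper = min(y + lipschitz * abs(x - xi) for xi, y in zip(xs, ys))
        if upper >= max(ys):  # x could still be a maximizer: spend an evaluation
            xs.append(x)
            ys.append(f(x))
    best = max(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best]


if __name__ == "__main__":
    def objective(x):
        return -(x - 0.3) ** 2  # |f'| <= 1.4 on [0, 1], so 2.0 is a valid constant

    print(lipo_maximize(objective, 0.0, 1.0, 2.0, 30, random.Random(0)))
```

Candidates that the bound rules out cost only a cheap comparison, not a function evaluation, which is the point when evaluations are expensive.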

Faster Greedy MAP Inference for Determinantal Point Processes


Title	Faster Greedy MAP Inference for Determinantal Point Processes
Authors	Insu Han, Prabhanjan Kambadur, Kyoungsoo Park, Jinwoo Shin
Abstract	Determinantal point processes (DPPs) are popular probabilistic models that arise in many machine learning tasks, where distributions of diverse sets are characterized by matrix determinants. In this paper, we develop fast algorithms to find the most likely configuration (MAP) of large-scale DPPs, which is NP-hard in general. Due to the submodular nature of the MAP objective, greedy algorithms have been used with empirical success. Greedy implementations require computation of log-determinants, matrix inverses or solving linear systems at each iteration. We present faster implementations of the greedy algorithms by utilizing the complementary benefits of two log-determinant approximation schemes: (a) first-order expansions to the matrix log-determinant function and (b) high-order expansions to the scalar log function with stochastic trace estimators. In our experiments, our algorithms are orders of magnitude faster than their competitors, while sacrificing marginal accuracy.
Tasks	Point Processes
Published	2017-03-09
URL	http://arxiv.org/abs/1703.03389v2
PDF	http://arxiv.org/pdf/1703.03389v2.pdf
PWC	https://paperswithcode.com/paper/faster-greedy-map-inference-for-determinantal
Repo	https://github.com/insuhan/fastdppmap
Framework	none
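
For reference, the baseline greedy procedure that the paper accelerates can be sketched as below. This version recomputes an exact log-determinant (via a small Cholesky routine) for every candidate at every step, which is the cost the abstract refers to. It does not implement the paper's approximation schemes, and the kernel is a made-up example.

```python
import math


def log_det(matrix):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][j])
            else:
                chol[i][j] = s / chol[j][j]
    return total


def greedy_dpp_map(kernel, max_size):
    """Repeatedly add the item with the largest positive log-det gain."""
    selected, current = [], 0.0
    while len(selected) < max_size:
        best_gain, best_item = 0.0, None
        for i in range(len(kernel)):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[a][b] for b in idx] for a in idx]
            gain = log_det(sub) - current
            if gain > best_gain:
                best_gain, best_item = gain, i
        if best_item is None:  # no item increases the determinant
            break
        selected.append(best_item)
        current += best_gain
    return selected


if __name__ == "__main__":
    # Items 0 and 1 are near-duplicates; item 2 is dissimilar to both.
    kernel = [[2.0, 1.9, 0.1],
              [1.9, 2.0, 0.1],
              [0.1, 0.1, 1.5]]
    print(greedy_dpp_map(kernel, max_size=3))
```

On this kernel the procedure keeps one of the two near-duplicates plus the dissimilar item and then stops, since adding the remaining duplicate would shrink the determinant.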

Optic Disc and Cup Segmentation Methods for Glaucoma Detection with Modification of U-Net Convolutional Neural Network


Title	Optic Disc and Cup Segmentation Methods for Glaucoma Detection with Modification of U-Net Convolutional Neural Network
Authors	Artem Sevastopolsky
Abstract	Glaucoma is the second leading cause of blindness all over the world, with approximately 60 million cases reported worldwide in 2010. If undiagnosed in time, glaucoma causes irreversible damage to the optic nerve leading to blindness. The optic nerve head examination, which involves measurement of cup-to-disc ratio, is considered one of the most valuable methods of structural diagnosis of the disease. Estimation of cup-to-disc ratio requires segmentation of optic disc and optic cup on eye fundus images and can be performed by modern computer vision algorithms. This work presents a universal approach for automatic optic disc and cup segmentation, which is based on deep learning, namely a modification of the U-Net convolutional neural network. Our experiments include comparison with the best known methods on the publicly available databases DRIONS-DB, RIM-ONE v.3, and DRISHTI-GS. For both optic disc and cup segmentation, our method achieves quality comparable to current state-of-the-art methods, outperforming them in terms of the prediction time.
Tasks
Published	2017-04-04
URL	http://arxiv.org/abs/1704.00979v1
PDF	http://arxiv.org/pdf/1704.00979v1.pdf
PWC	https://paperswithcode.com/paper/optic-disc-and-cup-segmentation-methods-for
Repo	https://github.com/seva100/optic-nerve-cnn
Framework	tf

ZhuSuan: A Library for Bayesian Deep Learning


Title	ZhuSuan: A Library for Bayesian Deep Learning
Authors	Jiaxin Shi, Jianfei Chen, Jun Zhu, Shengyang Sun, Yucen Luo, Yihong Gu, Yuhao Zhou
Abstract	In this paper we introduce ZhuSuan, a Python probabilistic programming library for Bayesian deep learning, which conjoins the complementary advantages of Bayesian methods and deep learning. ZhuSuan is built upon TensorFlow. Unlike existing deep learning libraries, which are mainly designed for deterministic neural networks and supervised tasks, ZhuSuan is featured for its deep root into Bayesian inference, thus supporting various kinds of probabilistic models, including both the traditional hierarchical Bayesian models and recent deep generative models. We use running examples to illustrate the probabilistic programming on ZhuSuan, including Bayesian logistic regression, variational auto-encoders, deep sigmoid belief networks and Bayesian recurrent neural networks.
Tasks	Bayesian Inference, Probabilistic Programming
Published	2017-09-18
URL	http://arxiv.org/abs/1709.05870v1
PDF	http://arxiv.org/pdf/1709.05870v1.pdf
PWC	https://paperswithcode.com/paper/zhusuan-a-library-for-bayesian-deep-learning
Repo	https://github.com/thu-ml/zhusuan
Framework	tf