July 28, 2019

3221 words 16 mins read

Paper Group ANR 178

Acceleration and Averaging in Stochastic Mirror Descent Dynamics. Beyond Grand Theft Auto V for Training, Testing and Enhancing Deep Learning in Self Driving Cars. Attacking Automatic Video Analysis Algorithms: A Case Study of Google Cloud Video Intelligence API. A Comment on “Analysis of Video Image Sequences Using Point and Line Correspondences”. …

Acceleration and Averaging in Stochastic Mirror Descent Dynamics

Title Acceleration and Averaging in Stochastic Mirror Descent Dynamics
Authors Walid Krichene, Peter L. Bartlett
Abstract We formulate and study a general family of (continuous-time) stochastic dynamics for accelerated first-order minimization of smooth convex functions. Building on an averaging formulation of accelerated mirror descent, we propose a stochastic variant in which the gradient is contaminated by noise, and study the resulting stochastic differential equation. We prove a bound on the rate of change of an energy function associated with the problem, then use it to derive estimates of convergence rates of the function values (a.s. and in expectation), both for persistent and asymptotically vanishing noise. We discuss the interaction between the parameters of the dynamics (learning rate and averaging weights) and the covariation of the noise process, and show, in particular, how the asymptotic rate of covariation affects the choice of parameters and, ultimately, the convergence rate.
Tasks
Published 2017-07-19
URL http://arxiv.org/abs/1707.06219v1
PDF http://arxiv.org/pdf/1707.06219v1.pdf
PWC https://paperswithcode.com/paper/acceleration-and-averaging-in-stochastic-1
Repo
Framework
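
For readers who want the shape of the dynamics, a minimal sketch follows, assuming the standard averaging formulation of accelerated mirror descent (notation ours; the paper's exact weighting may differ). A dual variable $Z_t$ accumulates noisy gradients while the primal $X_t$ averages the mirrored trajectory:

$$ dZ_t = -\eta(t)\big(\nabla f(X_t)\,dt + \sigma(t)\,dB_t\big), \qquad dX_t = a(t)\big(\nabla\psi^*(Z_t) - X_t\big)\,dt $$

Here $\psi^*$ is the conjugate of the mirror map, $B_t$ is a Brownian motion, $\eta(t)$ is the learning rate and $a(t)$ the averaging weight; the deterministic accelerated case corresponds to choices like $\eta(t)=t/r$ and $a(t)=r/t$, and the persistent vs. vanishing noise regimes correspond to $\sigma(t)$ bounded away from zero vs. $\sigma(t)\to 0$.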

Beyond Grand Theft Auto V for Training, Testing and Enhancing Deep Learning in Self Driving Cars

Title Beyond Grand Theft Auto V for Training, Testing and Enhancing Deep Learning in Self Driving Cars
Authors Mark Martinez, Chawin Sitawarin, Kevin Finch, Lennart Meincke, Alex Yablonski, Alain Kornhauser
Abstract As an initial assessment, over 480,000 labeled virtual images of normal highway driving were readily generated in Grand Theft Auto V’s virtual environment. Using these images, a CNN was trained to detect following distance to cars/objects ahead, lane markings, and driving angle (angular heading relative to the lane centerline): all variables necessary for basic autonomous driving. Encouraging results were obtained when tested on over 50,000 labeled virtual images from substantially different GTA-V driving environments. This initial assessment begins to define the range and scope of the labeled images needed for training, as well as the range and scope of labeled images needed to test and delineate the boundaries and limitations of trained networks. It is the efficacy and flexibility of a “GTA-V”-like virtual environment that is expected to provide an efficient, well-defined foundation for the training and testing of Convolutional Neural Networks for safe driving. Additionally described is the Princeton Virtual Environment (PVE) for the training, testing and enhancement of safe-driving AI, which is being developed using the video-game engine Unity. PVE is being developed to recreate rare but critical corner cases that can be used in re-training and enhancing machine learning models and in understanding the limitations of current self-driving models. The Florida Tesla crash is being used as an initial reference.
Tasks Autonomous Driving, Self-Driving Cars
Published 2017-12-04
URL http://arxiv.org/abs/1712.01397v1
PDF http://arxiv.org/pdf/1712.01397v1.pdf
PWC https://paperswithcode.com/paper/beyond-grand-theft-auto-v-for-training
Repo
Framework
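
As an illustration of the kind of model the abstract describes (a CNN regressing following distance, lane-marking position and heading angle from a virtual frame), here is a minimal PyTorch sketch; the layer sizes and input resolution are assumptions, not the authors' architecture.

```python
# Illustrative sketch only: a small CNN regressing the three driving
# variables named in the abstract (following distance, lane-marking
# position, heading angle). Layer sizes are assumptions.
import torch
import torch.nn as nn

class DrivingCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 3),  # distance, lane offset, heading angle
        )

    def forward(self, x):
        return self.head(self.features(x))

model = DrivingCNN()
frame = torch.randn(1, 3, 120, 160)  # stand-in for a labeled virtual frame
print(model(frame).shape)  # torch.Size([1, 3])
```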

Attacking Automatic Video Analysis Algorithms: A Case Study of Google Cloud Video Intelligence API

Title Attacking Automatic Video Analysis Algorithms: A Case Study of Google Cloud Video Intelligence API
Authors Hossein Hosseini, Baicen Xiao, Andrew Clark, Radha Poovendran
Abstract Due to the growth of video data on the Internet, automatic video analysis has gained a lot of attention from academia as well as from companies such as Facebook, Twitter and Google. In this paper, we examine the robustness of video analysis algorithms in adversarial settings. Specifically, we propose targeted attacks on two fundamental classes of video analysis algorithms, namely video classification and shot detection. We show that an adversary can subtly manipulate a video in such a way that a human observer would perceive the content of the original video, but the video analysis algorithm will return the adversary’s desired outputs. We then apply the attacks to the recently released Google Cloud Video Intelligence API. The API takes a video file and returns the video labels (objects within the video), shot changes (scene changes within the video) and shot labels (descriptions of video events over time). Through experiments, we show that the API generates video and shot labels by processing only the first frame of every second of the video. Hence, an adversary can deceive the API into outputting only her desired video and shot labels by periodically inserting an image into the video at the rate of one frame per second. We also show that the pattern of shot changes returned by the API can be mostly recovered by an algorithm that compares the histograms of consecutive frames. Based on our equivalent model, we develop a method for slightly modifying the video frames in order to deceive the API into generating our desired pattern of shot changes. We perform extensive experiments with different videos and show that our attacks are consistently successful across videos with different characteristics. Finally, we propose introducing randomness into video analysis algorithms as a countermeasure to our attacks.
Tasks Video Classification
Published 2017-08-14
URL http://arxiv.org/abs/1708.04301v1
PDF http://arxiv.org/pdf/1708.04301v1.pdf
PWC https://paperswithcode.com/paper/attacking-automatic-video-analysis-algorithms
Repo
Framework
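
The frame-insertion attack on the label endpoints is simple enough to sketch directly: since the API appears to process only the first frame of every second, overwriting exactly those frames with an adversary-chosen image controls the returned labels. A hedged OpenCV sketch (file names are placeholders):

```python
# Sketch of the frame-insertion attack described above: overwrite the
# first frame of every one-second block with an adversary-chosen image.
import cv2

cap = cv2.VideoCapture("original.mp4")
fps = int(round(cap.get(cv2.CAP_PROP_FPS)))
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("adversarial.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

target = cv2.resize(cv2.imread("target_image.png"), (w, h))

i = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Replace the first frame of each second; keep all others intact.
    out.write(target if i % fps == 0 else frame)
    i += 1

cap.release()
out.release()
```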

A Comment on “Analysis of Video Image Sequences Using Point and Line Correspondences”

Title A Comment on “Analysis of Video Image Sequences Using Point and Line Correspondences”
Authors Mieczysław A. Kłopotek
Abstract In this paper we refute the results of Wang et al. by raising two fundamental claims: (i) a line does not contribute anything to the recovery of motion parameters from two images, and (ii) four traceable points are not sufficient to recover motion parameters from two perspective views. To be constructive, however, we show that four traceable points are sufficient to recover motion parameters from two frames under orthogonal projection, and that five points suffice to reduce the two-frame problem under orthogonal projection to solving a linear equation system.
Tasks
Published 2017-04-18
URL http://arxiv.org/abs/1704.05267v1
PDF http://arxiv.org/pdf/1704.05267v1.pdf
PWC https://paperswithcode.com/paper/a-comment-on-analysis-of-video-image
Repo
Framework

Functional Decision Theory: A New Theory of Instrumental Rationality

Title Functional Decision Theory: A New Theory of Instrumental Rationality
Authors Eliezer Yudkowsky, Nate Soares
Abstract This paper describes and motivates a new decision theory known as functional decision theory (FDT), as distinct from causal decision theory and evidential decision theory. Functional decision theorists hold that the normative principle for action is to treat one’s decision as the output of a fixed mathematical function that answers the question, “Which output of this very function would yield the best outcome?” Adhering to this principle delivers a number of benefits, including the ability to maximize wealth in an array of traditional decision-theoretic and game-theoretic problems where CDT and EDT perform poorly. Using one simple and coherent decision rule, functional decision theorists (for example) achieve more utility than CDT on Newcomb’s problem, more utility than EDT on the smoking lesion problem, and more utility than both in Parfit’s hitchhiker problem. In this paper, we define FDT, explore its prescriptions in a number of different decision problems, compare it to CDT and EDT, and give philosophical justifications for FDT as a normative theory of decision-making.
Tasks Decision Making
Published 2017-10-13
URL http://arxiv.org/abs/1710.05060v2
PDF http://arxiv.org/pdf/1710.05060v2.pdf
PWC https://paperswithcode.com/paper/functional-decision-theory-a-new-theory-of
Repo
Framework
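
As a concrete instance of the utility comparison, consider Newcomb's problem with the standard payoffs (this worked example is ours, not quoted from the paper): a predictor with accuracy $p$ puts 1,000,000 dollars in an opaque box only if it predicts one-boxing, and a transparent box always contains 1,000 dollars. Then

$$ \mathbb{E}[\text{one-box}] = p \cdot 10^6, \qquad \mathbb{E}[\text{two-box}] = (1-p)\cdot 10^6 + 10^3, $$

so one-boxing wins in expectation whenever $p > 0.5005$; FDT (like EDT) one-boxes here, while CDT two-boxes and predictably earns less.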

Learning 3D Object Categories by Looking Around Them

Title Learning 3D Object Categories by Looking Around Them
Authors David Novotny, Diane Larlus, Andrea Vedaldi
Abstract Traditional approaches for learning 3D object categories use either synthetic data or manual supervision. In this paper, we propose a method which does not require manual annotations and is instead cued by observing objects from a moving vantage point. Our system builds on two innovations: a Siamese viewpoint factorization network that robustly aligns different videos together without explicitly comparing 3D shapes; and a 3D shape completion network that can extract the full shape of an object from partial observations. We also demonstrate the benefits of configuring networks to perform probabilistic predictions as well as of geometry-aware data augmentation schemes. We obtain state-of-the-art results on publicly-available benchmarks.
Tasks Data Augmentation
Published 2017-05-10
URL http://arxiv.org/abs/1705.03951v2
PDF http://arxiv.org/pdf/1705.03951v2.pdf
PWC https://paperswithcode.com/paper/learning-3d-object-categories-by-looking
Repo
Framework

Audio Cover Song Identification using Convolutional Neural Network

Title Audio Cover Song Identification using Convolutional Neural Network
Authors Sungkyun Chang, Juheon Lee, Sang Keun Choe, Kyogu Lee
Abstract In this paper, we propose a new approach to cover song identification using a CNN (convolutional neural network). Most previous studies extract feature vectors that characterize the cover song relation from a pair of songs and use them to compute the (dis)similarity between the two songs. Based on the observation that there is a meaningful pattern between cover songs and that this can be learned, we have reformulated the cover song identification problem in a machine learning framework. To do this, we first build the CNN taking as input a cross-similarity matrix generated from a pair of songs. We then construct a data set composed of cover song pairs and non-cover song pairs, which are used as positive and negative training samples, respectively. The trained CNN outputs the probability of the cover song relation given a cross-similarity matrix generated from any two pieces of music, and identifies the cover song by ranking on these probabilities. Experimental results show that the proposed algorithm achieves performance better than or comparable to the state of the art.
Tasks
Published 2017-12-01
URL http://arxiv.org/abs/1712.00166v1
PDF http://arxiv.org/pdf/1712.00166v1.pdf
PWC https://paperswithcode.com/paper/audio-cover-song-identification-using
Repo
Framework
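
The CNN input described above, a cross-similarity matrix between the per-frame features of two songs, can be sketched in a few lines. Feature extraction is stubbed with random data; in practice one would use, e.g., beat-synchronous chroma features, and the matrix would be resized or cropped to a fixed shape before entering the CNN.

```python
# Minimal sketch of the CNN input: a cosine cross-similarity matrix
# between frame-level features of two songs.
import numpy as np

def cross_similarity(feat_a, feat_b):
    """feat_a: (Ta, d), feat_b: (Tb, d) frame features.
    Returns the (Ta, Tb) cosine cross-similarity matrix."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    return a @ b.T

chroma_a = np.random.rand(180, 12)  # stand-in for chroma features, song A
chroma_b = np.random.rand(200, 12)  # stand-in for chroma features, song B
S = cross_similarity(chroma_a, chroma_b)
print(S.shape)  # (180, 200); fixed-size crop/resize would follow
```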

CSWA: Aggregation-Free Spatial-Temporal Community Sensing

Title CSWA: Aggregation-Free Spatial-Temporal Community Sensing
Authors Jiang Bian, Haoyi Xiong, Yanjie Fu, Sajal K. Das
Abstract In this paper, we present a novel community sensing paradigm: Community Sensing Without Aggregation (CSWA). CSWA is designed to obtain environment information (e.g., air pollution or temperature) in each subarea of the target area without aggregating sensor and location data collected by community members. CSWA operates on top of a secured peer-to-peer network over the community members and proposes a novel Decentralized Spatial-Temporal Compressive Sensing framework based on Parallelized Stochastic Gradient Descent. By learning the low-rank structure via distributed optimization, CSWA approximates the value of the sensor data in each subarea (both covered and uncovered) for each sensing cycle using the sensor data locally stored in each member’s mobile device. Simulation experiments based on real-world datasets demonstrate that CSWA exhibits low approximation error (i.e., less than 0.2°C in a city-wide temperature sensing task and 10 units of PM2.5 index in urban air pollution sensing) and performs comparably to (sometimes better than) state-of-the-art algorithms based on data aggregation and centralized computation.
Tasks Compressive Sensing, Distributed Optimization
Published 2017-11-15
URL http://arxiv.org/abs/1711.05712v1
PDF http://arxiv.org/pdf/1711.05712v1.pdf
PWC https://paperswithcode.com/paper/cswa-aggregation-free-spatial-temporal
Repo
Framework
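
The low-rank approximation at the heart of the framework can be illustrated with a plain (non-distributed) sketch: factor the subarea-by-cycle sensor matrix as a product of two thin factors and fit them by SGD on observed readings only, so that the product also yields estimates for uncovered subareas. This is a simplification of the paper's decentralized, parallelized protocol.

```python
# Hedged sketch: approximate the spatial-temporal sensor matrix
# (subareas x sensing cycles) by a rank-k product U @ V.T, fitted by
# SGD on locally observed entries. The decentralized protocol of the
# paper is not reproduced here.
import numpy as np

def sgd_low_rank(observed, shape, rank=5, lr=0.01, lam=0.1, epochs=200):
    """observed: list of (subarea i, cycle t, value) readings."""
    n, m = shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    for _ in range(epochs):
        for i, t, x in observed:
            err = x - U[i] @ V[t]
            ui = U[i].copy()  # use pre-update U[i] in both gradients
            U[i] += lr * (err * V[t] - lam * ui)
            V[t] += lr * (err * ui - lam * V[t])
    return U @ V.T  # estimates for covered AND uncovered subareas

readings = [(0, 0, 21.5), (1, 0, 22.1), (3, 1, 20.9), (2, 2, 23.0)]
estimate = sgd_low_rank(readings, shape=(5, 4))
```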

Towards a new paradigm for assistive technology at home: research challenges, design issues and performance assessment

Title Towards a new paradigm for assistive technology at home: research challenges, design issues and performance assessment
Authors Luca Buoncompagni, Barbara Bruno, Antonella Giuni, Fulvio Mastrogiovanni, Renato Zaccaria
Abstract Providing elderly people and people with special needs, including those suffering from physical disabilities and chronic diseases, with the possibility of retaining as much independence as possible is one of the most important challenges our society is expected to face. Assistance models based on the home care paradigm are being adopted rapidly in almost all industrialized and emerging countries. Such paradigms hypothesize that it is necessary to ensure that the so-called Activities of Daily Living (ADL) are correctly and regularly performed by the assisted person in order to increase the perception of an improved quality of life. This chapter describes the computational inference engine at the core of Arianna, a system able to understand whether an assisted person performs a given set of ADL and to motivate him/her to perform them through a speech-mediated motivational dialogue, using a set of nearables installed in an apartment plus a wearable worn or fitted into garments.
Tasks
Published 2017-10-27
URL http://arxiv.org/abs/1710.10164v1
PDF http://arxiv.org/pdf/1710.10164v1.pdf
PWC https://paperswithcode.com/paper/towards-a-new-paradigm-for-assistive
Repo
Framework

On Communication Complexity of Classification Problems

Title On Communication Complexity of Classification Problems
Authors Daniel M. Kane, Roi Livni, Shay Moran, Amir Yehudayoff
Abstract This work studies distributed learning in the spirit of Yao’s model of communication complexity: consider a two-party setting, where each of the players gets a list of labelled examples and they communicate in order to jointly perform some learning task. To fit naturally into the framework of learning theory, the players can send each other examples (as well as bits), where each example/bit costs one unit of communication. This enables a uniform treatment of infinite classes such as half-spaces in $\mathbb{R}^d$, which are ubiquitous in machine learning. We study several fundamental questions in this model. For example, we provide combinatorial characterizations of the classes that can be learned with efficient communication in the proper case as well as in the improper case. These findings imply unconditional separations between various learning contexts, e.g., realizable versus agnostic learning, proper versus improper learning, etc. The derivation of these results hinges on a type of decision problem we term “realizability problems”, where the goal is to decide whether a distributed input sample is consistent with a hypothesis from a pre-specified class. From a technical perspective, the protocols we use are based on ideas from machine learning theory and the impossibility results are based on ideas from communication complexity theory.
Tasks
Published 2017-11-16
URL http://arxiv.org/abs/1711.05893v3
PDF http://arxiv.org/pdf/1711.05893v3.pdf
PWC https://paperswithcode.com/paper/on-communication-complexity-of-classification
Repo
Framework
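
The realizability problems mentioned above can be stated compactly (formalization ours, following the abstract):

$$ \text{Realizability}_{\mathcal{H}}: \text{ given Alice's sample } S_A \text{ and Bob's sample } S_B, \text{ decide whether } \exists\, h \in \mathcal{H} \text{ consistent with } S_A \cup S_B. $$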

Affect-LM: A Neural Language Model for Customizable Affective Text Generation

Title Affect-LM: A Neural Language Model for Customizable Affective Text Generation
Authors Sayan Ghosh, Mathieu Chollet, Eugene Laksana, Louis-Philippe Morency, Stefan Scherer
Abstract Human verbal communication includes affective messages which are conveyed through the use of emotionally colored words. There has been a lot of research in this direction, but the problem of integrating state-of-the-art neural language models with affective information remains an area ripe for exploration. In this paper, we propose an extension to an LSTM (Long Short-Term Memory) language model for generating conversational text conditioned on affect categories. Our proposed model, Affect-LM, enables us to customize the degree of emotional content in generated sentences through an additional design parameter. Perception studies conducted using Amazon Mechanical Turk show that Affect-LM generates natural-looking emotional sentences without sacrificing grammatical correctness. Affect-LM also learns affect-discriminative word representations, and perplexity experiments show that additional affective information in conversational text can improve language model prediction.
Tasks Language Modelling, Text Generation
Published 2017-04-22
URL http://arxiv.org/abs/1704.06851v1
PDF http://arxiv.org/pdf/1704.06851v1.pdf
PWC https://paperswithcode.com/paper/affect-lm-a-neural-language-model-for
Repo
Framework
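
The "additional design parameter" controlling emotional content suggests an additive, affect-dependent bias on the language model's output logits. The following PyTorch sketch shows one plausible parameterization; it is an assumption for illustration, not necessarily the paper's exact formulation.

```python
# Illustrative sketch: an LSTM language model whose output logits get
# an additive affect-dependent bias scaled by a strength parameter
# beta (the assumed analogue of the paper's design parameter).
import torch
import torch.nn as nn

class AffectLM(nn.Module):
    def __init__(self, vocab=10000, emb=128, hidden=256, n_affect=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)       # standard LM term
        self.affect = nn.Linear(n_affect, vocab)  # affect bias term

    def forward(self, tokens, affect_onehot, beta=1.0):
        h, _ = self.lstm(self.embed(tokens))
        logits = self.out(h) + beta * self.affect(affect_onehot).unsqueeze(1)
        return logits  # beta scales the degree of emotional coloring

model = AffectLM()
tokens = torch.randint(0, 10000, (2, 12))
affect = torch.eye(5)[torch.tensor([1, 3])]  # one-hot affect categories
print(model(tokens, affect, beta=1.5).shape)  # (2, 12, 10000)
```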

Transcribing Against Time

Title Transcribing Against Time
Authors Matthias Sperber, Graham Neubig, Jan Niehues, Satoshi Nakamura, Alex Waibel
Abstract We investigate the problem of manually correcting errors from an automatic speech transcript in a cost-sensitive fashion. This is done by specifying a fixed time budget, and then automatically choosing location and size of segments for correction such that the number of corrected errors is maximized. The core components, as suggested by previous research [1], are a utility model that estimates the number of errors in a particular segment, and a cost model that estimates annotation effort for the segment. In this work we propose a dynamic updating framework that allows for the training of cost models during the ongoing transcription process. This removes the need for transcriber enrollment prior to the actual transcription, and improves correction efficiency by allowing highly transcriber-adaptive cost modeling. We first confirm and analyze the improvements afforded by this method in a simulated study. We then conduct a realistic user study, observing efficiency improvements of 15% relative on average, and 42% for the participants who deviated most strongly from our initial, transcriber-agnostic cost model. Moreover, we find that our updating framework can capture dynamically changing factors, such as transcriber fatigue and topic familiarity, which we observe to have a large influence on the transcriber’s working behavior.
Tasks
Published 2017-09-15
URL http://arxiv.org/abs/1709.05227v1
PDF http://arxiv.org/pdf/1709.05227v1.pdf
PWC https://paperswithcode.com/paper/transcribing-against-time
Repo
Framework
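
The budget-constrained selection step (choose segments so that the estimated number of corrected errors is maximized within a fixed time budget) admits a simple greedy sketch, ranking segments by estimated errors per second of annotation time. The paper's utility and cost models are learned and updated online; the numbers below are placeholders.

```python
# Greedy sketch of budget-constrained segment selection: rank segments
# by estimated errors per second of annotation cost, take them in that
# order until the time budget is exhausted.
def select_segments(segments, budget_s):
    """segments: list of (id, est_errors, est_cost_s)."""
    ranked = sorted(segments, key=lambda s: s[1] / s[2], reverse=True)
    chosen, spent = [], 0.0
    for seg_id, errors, cost in ranked:
        if spent + cost <= budget_s:
            chosen.append(seg_id)
            spent += cost
    return chosen

segs = [("s1", 4.0, 30.0), ("s2", 1.5, 8.0), ("s3", 6.0, 70.0)]
print(select_segments(segs, budget_s=60.0))  # ['s2', 's1'], ratio order
```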

Exploring Geometric Property Thresholds For Filtering Non-Text Regions In A Connected Component Based Text Detection Application

Title Exploring Geometric Property Thresholds For Filtering Non-Text Regions In A Connected Component Based Text Detection Application
Authors Teresa Nicole Brooks
Abstract Automated text detection is a difficult computer vision task. In order to accurately detect and identify text in an image or video, two major problems must be addressed. The primary problem is implementing a robust and reliable method for distinguishing text vs. non-text regions in images and videos. Part of the difficulty stems from the almost unlimited combinations of fonts, lighting conditions, distortions, and other variations that can be found in images and videos. This paper explores key properties of two popular and proven methods for implementing text detection: maximally stable extremal regions (MSER) and stroke width variation.
Tasks
Published 2017-09-11
URL http://arxiv.org/abs/1709.03548v1
PDF http://arxiv.org/pdf/1709.03548v1.pdf
PWC https://paperswithcode.com/paper/exploring-geometric-property-thresholds-for
Repo
Framework
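
A minimal OpenCV sketch of the filtering idea: detect MSER candidate regions, then keep only those whose geometric properties fall inside text-like thresholds. The threshold values below are illustrative assumptions, not the thresholds explored in the paper.

```python
# Detect MSER regions and filter non-text candidates by simple
# geometric properties (aspect ratio, extent). Thresholds are
# illustrative assumptions.
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
mser = cv2.MSER_create()
regions, _ = mser.detectRegions(img)

text_candidates = []
for pts in regions:
    x, y, w, h = cv2.boundingRect(pts)
    aspect = w / float(h)
    extent = len(pts) / float(w * h)  # region pixels / bbox area
    if 0.1 < aspect < 10 and extent > 0.2:
        text_candidates.append((x, y, w, h))
```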

DropRegion Training of Inception Font Network for High-Performance Chinese Font Recognition

Title DropRegion Training of Inception Font Network for High-Performance Chinese Font Recognition
Authors Shuangping Huang, Zhuoyao Zhong, Lianwen Jin, Shuye Zhang, Haobin Wang
Abstract Chinese font recognition (CFR) has gained significant attention in recent years. However, due to the sparsity of labeled font samples and the structural complexity of Chinese characters, CFR is still a challenging task. In this paper, a DropRegion method is proposed to generate a large number of stochastic variant font samples whose local regions are selectively disrupted, and an inception font network (IFN) with two additional convolutional neural network (CNN) structural elements, i.e., cascaded cross-channel parametric pooling (CCCP) and global average pooling, is designed. Because the distribution of strokes in a font image is non-stationary, an elastic meshing technique that adaptively constructs a set of local regions with equalized information is developed. Thus, DropRegion is seamlessly embedded in the IFN, which enables end-to-end training; the proposed DropRegion-IFN can be used for high-performance CFR. Experimental results have confirmed the effectiveness of our new approach to CFR.
Tasks
Published 2017-03-17
URL http://arxiv.org/abs/1703.05870v2
PDF http://arxiv.org/pdf/1703.05870v2.pdf
PWC https://paperswithcode.com/paper/dropregion-training-of-inception-font-network
Repo
Framework
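
Region-level disruption of font samples can be sketched with a plain uniform grid; note the paper uses an elastic mesh that equalizes stroke information across regions, which this simplification omits.

```python
# Hedged sketch of region-level dropout on a character image: split
# the image into a mesh of local regions and zero a random subset,
# producing stochastic variant font samples. Uniform grid used here
# for brevity instead of the paper's elastic mesh.
import numpy as np

def drop_region(img, grid=(4, 4), drop_prob=0.2, rng=None):
    rng = rng or np.random.default_rng()
    out = img.copy()
    h, w = img.shape
    gh, gw = h // grid[0], w // grid[1]
    for r in range(grid[0]):
        for c in range(grid[1]):
            if rng.random() < drop_prob:
                out[r * gh:(r + 1) * gh, c * gw:(c + 1) * gw] = 0
    return out

font_img = np.random.rand(64, 64)  # stand-in for a character image
augmented = drop_region(font_img)
```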

An innovative solution for breast cancer textual big data analysis

Title An innovative solution for breast cancer textual big data analysis
Authors Nicolas Thiebaut, Antoine Simoulin, Karl Neuberger, Issam Ibnouhsein, Nicolas Bousquet, Nathalie Reix, Sébastien Molière, Carole Mathelin
Abstract The digitalization of stored information in hospitals now allows for the exploitation of medical data in text format, such as electronic health records (EHRs) initially gathered for purposes other than epidemiology. Manual search and analysis operations on such data become tedious. In recent years, the use of natural language processing (NLP) tools has been highlighted as a way to automate the extraction of information contained in EHRs, structure it, and perform statistical analysis on the structured information. The main difficulty with existing approaches is the requirement of synonym or ontology dictionaries, which are mostly available in English only and do not include local or custom notations. In this work, a team composed of oncologists as domain experts and data scientists developed a custom NLP-based system to process and structure textual clinical reports of patients suffering from breast cancer. The tool relies on the combination of standard text mining techniques and an advanced synonym detection method. It allows for a global analysis by retrieving indicators such as medical history, tumor characteristics, therapeutic responses, recurrences and prognosis. The versatility of the method makes it easy to obtain new indicators, thus opening up the way for retrospective studies with a substantial reduction in the amount of manual work. With no need for biomedical annotators or pre-defined ontologies, this language-agnostic method reached good extraction accuracy for several concepts of interest, according to a comparison with a manually structured file, without requiring any existing corpus with local or new notations.
Tasks Epidemiology
Published 2017-12-06
URL http://arxiv.org/abs/1712.02259v1
PDF http://arxiv.org/pdf/1712.02259v1.pdf
PWC https://paperswithcode.com/paper/an-innovative-solution-for-breast-cancer
Repo
Framework
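
The synonym-aware indicator extraction can be sketched with a small lookup table and simple patterns; the terms, patterns and French variants below are illustrative assumptions, while the actual system couples standard text mining with a learned synonym detector.

```python
# Minimal sketch: map local/custom notations onto canonical concepts
# with a synonym table, then pull numeric indicators from free-text
# reports with simple patterns. Terms and patterns are illustrative.
import re

SYNONYMS = {
    "tumor_size": ["tumor size", "lesion size", "taille tumorale"],
    "recurrence": ["recurrence", "relapse", "recidive"],
}

def extract_indicators(report):
    found = {}
    text = report.lower()
    for concept, variants in SYNONYMS.items():
        for v in variants:
            m = re.search(rf"{re.escape(v)}\D*(\d+(?:[.,]\d+)?)", text)
            if m:
                found[concept] = m.group(1)
                break
    return found

print(extract_indicators("Taille tumorale : 23 mm, no relapse noted."))
# {'tumor_size': '23'}
```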