Paper Group ANR 178
Acceleration and Averaging in Stochastic Mirror Descent Dynamics. Beyond Grand Theft Auto V for Training, Testing and Enhancing Deep Learning in Self Driving Cars. Attacking Automatic Video Analysis Algorithms: A Case Study of Google Cloud Video Intelligence API. A Comment on “Analysis of Video Image Sequences Using Point and Line Correspondences”. …
Acceleration and Averaging in Stochastic Mirror Descent Dynamics
Title | Acceleration and Averaging in Stochastic Mirror Descent Dynamics |
Authors | Walid Krichene, Peter L. Bartlett |
Abstract | We formulate and study a general family of (continuous-time) stochastic dynamics for accelerated first-order minimization of smooth convex functions. Building on an averaging formulation of accelerated mirror descent, we propose a stochastic variant in which the gradient is contaminated by noise, and study the resulting stochastic differential equation. We prove a bound on the rate of change of an energy function associated with the problem, then use it to derive estimates of the convergence rates of the function values (a.s. and in expectation), both for persistent and for asymptotically vanishing noise. We discuss the interaction between the parameters of the dynamics (learning rate and averaging weights) and the covariation of the noise process, and show, in particular, how the asymptotic rate of covariation affects the choice of parameters and, ultimately, the convergence rate. |
Tasks | |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.06219v1 |
http://arxiv.org/pdf/1707.06219v1.pdf | |
PWC | https://paperswithcode.com/paper/acceleration-and-averaging-in-stochastic-1 |
Repo | |
Framework | |
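The paper works in continuous time; the toy sketch below (my own illustrative discretization, not code from the paper) uses a Euclidean mirror map, an Euler–Maruyama step, and simple schedules for the learning rate and averaging weights to show how gradient noise enters the averaged accelerated dynamics on a small quadratic. With persistent noise the iterate stalls at a noise floor, which is the regime where the choice of schedules matters most.

```python
import numpy as np

# Quadratic test objective f(x) = 0.5 * ||A x - b||^2 (illustrative choice).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)

# Euler-Maruyama discretization of the averaged accelerated dynamics
# (Euclidean mirror map, so the dual variable z lives in the same space as x).
dt, T = 1e-3, 20.0
x = np.zeros(10)          # primal (averaged) variable
z = np.zeros(10)          # dual variable driven by noisy gradients
sigma = 0.05              # persistent gradient-noise level (assumed)

for k in range(int(T / dt)):
    t = (k + 1) * dt
    eta = t                    # learning-rate schedule eta(t) ~ t (illustrative)
    a = 2.0 / t                # averaging weight a(t) ~ 2/t (illustrative)
    noise = sigma * rng.standard_normal(10) * np.sqrt(dt)
    z -= eta * grad(x) * dt + eta * noise   # noisy gradient flow in the dual
    x += a * (z - x) * dt                   # averaging step toward the dual iterate

print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```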
Beyond Grand Theft Auto V for Training, Testing and Enhancing Deep Learning in Self Driving Cars
Title | Beyond Grand Theft Auto V for Training, Testing and Enhancing Deep Learning in Self Driving Cars |
Authors | Mark Martinez, Chawin Sitawarin, Kevin Finch, Lennart Meincke, Alex Yablonski, Alain Kornhauser |
Abstract | As an initial assessment, over 480,000 labeled virtual images of normal highway driving were readily generated in Grand Theft Auto V’s virtual environment. Using these images, a CNN was trained to detect following distance to cars/objects ahead, lane markings, and driving angle (angular heading relative to lane centerline): all variables necessary for basic autonomous driving. Encouraging results were obtained when tested on over 50,000 labeled virtual images from substantially different GTA-V driving environments. This initial assessment begins to define both the range and scope of the labeled images needed for training as well as the range and scope of labeled images needed for testing the definition of boundaries and limitations of trained networks. It is the efficacy and flexibility of a “GTA-V”-like virtual environment that is expected to provide an efficient well-defined foundation for the training and testing of Convolutional Neural Networks for safe driving. Additionally, described is the Princeton Virtual Environment (PVE) for the training, testing and enhancement of safe driving AI, which is being developed using the video-game engine Unity. PVE is being developed to recreate rare but critical corner cases that can be used in re-training and enhancing machine learning models and understanding the limitations of current self driving models. The Florida Tesla crash is being used as an initial reference. |
Tasks | Autonomous Driving, Self-Driving Cars |
Published | 2017-12-04 |
URL | http://arxiv.org/abs/1712.01397v1 |
http://arxiv.org/pdf/1712.01397v1.pdf | |
PWC | https://paperswithcode.com/paper/beyond-grand-theft-auto-v-for-training |
Repo | |
Framework | |
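For a concrete picture of the kind of network the abstract describes, here is a minimal PyTorch sketch of a multi-output regression CNN; the architecture, input resolution, and output names are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class DrivingCNN(nn.Module):
    """Toy CNN regressing the three quantities named in the abstract:
    following distance, lane-marking offset, and heading angle.
    Architecture and input size are illustrative, not the paper's."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 3)  # [distance, lane_offset, heading_angle]

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = DrivingCNN()
frames = torch.randn(8, 3, 120, 160)   # stand-in for labeled GTA-V frames
targets = torch.randn(8, 3)            # stand-in labels
loss = nn.functional.mse_loss(model(frames), targets)
loss.backward()
```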
Attacking Automatic Video Analysis Algorithms: A Case Study of Google Cloud Video Intelligence API
Title | Attacking Automatic Video Analysis Algorithms: A Case Study of Google Cloud Video Intelligence API |
Authors | Hossein Hosseini, Baicen Xiao, Andrew Clark, Radha Poovendran |
Abstract | Due to the growth of video data on the Internet, automatic video analysis has gained a lot of attention from academia as well as companies such as Facebook, Twitter and Google. In this paper, we examine the robustness of video analysis algorithms in adversarial settings. Specifically, we propose targeted attacks on two fundamental classes of video analysis algorithms, namely video classification and shot detection. We show that an adversary can subtly manipulate a video in such a way that a human observer would perceive the content of the original video, but the video analysis algorithm will return the adversary’s desired outputs. We then apply the attacks on the recently released Google Cloud Video Intelligence API. The API takes a video file and returns the video labels (objects within the video), shot changes (scene changes within the video) and shot labels (description of video events over time). Through experiments, we show that the API generates video and shot labels by processing only the first frame of every second of the video. Hence, an adversary can deceive the API into outputting only her desired video and shot labels by periodically inserting an image into the video at the rate of one frame per second. We also show that the pattern of shot changes returned by the API can be mostly recovered by an algorithm that compares the histograms of consecutive frames. Based on our equivalent model, we develop a method for slightly modifying the video frames, in order to deceive the API into generating our desired pattern of shot changes. We perform extensive experiments with different videos and show that our attacks are consistently successful across videos with different characteristics. Finally, we propose introducing randomness to video analysis algorithms as a countermeasure to our attacks. |
Tasks | Video Classification |
Published | 2017-08-14 |
URL | http://arxiv.org/abs/1708.04301v1 |
http://arxiv.org/pdf/1708.04301v1.pdf | |
PWC | https://paperswithcode.com/paper/attacking-automatic-video-analysis-algorithms |
Repo | |
Framework | |
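The abstract states that the API's shot-change output can be largely reproduced by comparing histograms of consecutive frames; the sketch below is a minimal version of such an equivalent model (the bin count and threshold are assumed values, not the paper's).

```python
import numpy as np

def frame_histogram(frame, bins=32):
    """Per-channel intensity histogram of an HxWx3 uint8 frame, L1-normalized."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def detect_shot_changes(frames, threshold=0.4):
    """Flag a shot change whenever the L1 distance between consecutive
    frame histograms exceeds a threshold (illustrative value)."""
    changes = []
    prev = frame_histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = frame_histogram(frame)
        if np.abs(cur - prev).sum() > threshold:
            changes.append(i)
        prev = cur
    return changes

# Toy video: 60 dark frames followed by 60 bright frames -> one detected change.
video = np.concatenate([np.full((60, 72, 128, 3), 60, np.uint8),
                        np.full((60, 72, 128, 3), 200, np.uint8)])
print(detect_shot_changes(video))   # -> [60]
```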
A Comment on “Analysis of Video Image Sequences Using Point and Line Correspondences”
Title | A Comment on “Analysis of Video Image Sequences Using Point and Line Correspondences” |
Authors | Mieczysław A. Kłopotek |
Abstract | In this paper we refute the results of Wang et al. by raising two fundamental claims: (1) a line does not contribute anything to the recognition of motion parameters from two images; (2) four traceable points are not sufficient to recover motion parameters from two perspective views. To be constructive, however, we show that four traceable points are sufficient to recover motion parameters from two frames under orthogonal projection, and that five points suffice to reduce the two-frame problem under orthogonal projection to solving a linear equation system. |
Tasks | |
Published | 2017-04-18 |
URL | http://arxiv.org/abs/1704.05267v1 |
http://arxiv.org/pdf/1704.05267v1.pdf | |
PWC | https://paperswithcode.com/paper/a-comment-on-analysis-of-video-image |
Repo | |
Framework | |
Functional Decision Theory: A New Theory of Instrumental Rationality
Title | Functional Decision Theory: A New Theory of Instrumental Rationality |
Authors | Eliezer Yudkowsky, Nate Soares |
Abstract | This paper describes and motivates a new decision theory known as functional decision theory (FDT), as distinct from causal decision theory and evidential decision theory. Functional decision theorists hold that the normative principle for action is to treat one’s decision as the output of a fixed mathematical function that answers the question, “Which output of this very function would yield the best outcome?” Adhering to this principle delivers a number of benefits, including the ability to maximize wealth in an array of traditional decision-theoretic and game-theoretic problems where CDT and EDT perform poorly. Using one simple and coherent decision rule, functional decision theorists (for example) achieve more utility than CDT on Newcomb’s problem, more utility than EDT on the smoking lesion problem, and more utility than both in Parfit’s hitchhiker problem. In this paper, we define FDT, explore its prescriptions in a number of different decision problems, compare it to CDT and EDT, and give philosophical justifications for FDT as a normative theory of decision-making. |
Tasks | Decision Making |
Published | 2017-10-13 |
URL | http://arxiv.org/abs/1710.05060v2 |
http://arxiv.org/pdf/1710.05060v2.pdf | |
PWC | https://paperswithcode.com/paper/functional-decision-theory-a-new-theory-of |
Repo | |
Framework | |
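As a concrete anchor for the utility comparison mentioned in the abstract, here is the standard expected-value arithmetic for Newcomb's problem (a textbook rendering with an assumed predictor accuracy, not material from the paper): one-boxing, which FDT and EDT both prescribe here, collects far more expected utility than CDT's two-boxing whenever the predictor is reliable.

```python
# Newcomb's problem payoffs: opaque box holds $1,000,000 iff the predictor
# forecast one-boxing; the transparent box always holds $1,000.
p = 0.99   # assumed predictor accuracy

ev_one_box = p * 1_000_000 + (1 - p) * 0
ev_two_box = p * 1_000 + (1 - p) * (1_000_000 + 1_000)

print(f"one-box : ${ev_one_box:,.0f}")   # $990,000
print(f"two-box : ${ev_two_box:,.0f}")   # $11,000
```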
Learning 3D Object Categories by Looking Around Them
Title | Learning 3D Object Categories by Looking Around Them |
Authors | David Novotny, Diane Larlus, Andrea Vedaldi |
Abstract | Traditional approaches for learning 3D object categories use either synthetic data or manual supervision. In this paper, we propose a method which does not require manual annotations and is instead cued by observing objects from a moving vantage point. Our system builds on two innovations: a Siamese viewpoint factorization network that robustly aligns different videos together without explicitly comparing 3D shapes; and a 3D shape completion network that can extract the full shape of an object from partial observations. We also demonstrate the benefits of configuring networks to perform probabilistic predictions as well as of geometry-aware data augmentation schemes. We obtain state-of-the-art results on publicly-available benchmarks. |
Tasks | Data Augmentation |
Published | 2017-05-10 |
URL | http://arxiv.org/abs/1705.03951v2 |
http://arxiv.org/pdf/1705.03951v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-3d-object-categories-by-looking |
Repo | |
Framework | |
Audio Cover Song Identification using Convolutional Neural Network
Title | Audio Cover Song Identification using Convolutional Neural Network |
Authors | Sungkyun Chang, Juheon Lee, Sang Keun Choe, Kyogu Lee |
Abstract | In this paper, we propose a new approach to cover song identification using a CNN (convolutional neural network). Most previous studies extract feature vectors that characterize the cover song relation from a pair of songs and use them to compute the (dis)similarity between the two songs. Based on the observation that there is a meaningful pattern between cover songs and that this can be learned, we have reformulated the cover song identification problem in a machine learning framework. To do this, we first build the CNN using as an input a cross-similarity matrix generated from a pair of songs. We then construct a data set composed of cover song pairs and non-cover song pairs, which are used as positive and negative training samples, respectively. The trained CNN outputs the probability of being in the cover song relation given a cross-similarity matrix generated from any two pieces of music, and identifies the cover song by ranking on this probability. Experimental results show that the proposed algorithm achieves performance better than or comparable to the state-of-the-art. |
Tasks | |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00166v1 |
http://arxiv.org/pdf/1712.00166v1.pdf | |
PWC | https://paperswithcode.com/paper/audio-cover-song-identification-using |
Repo | |
Framework | |
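A minimal sketch of the cross-similarity matrix that the abstract describes as the CNN's input; the per-frame features and the distance metric are illustrative choices (the abstract does not specify them).

```python
import numpy as np

def cross_similarity_matrix(feats_a, feats_b):
    """Pairwise Euclidean distance between per-frame feature vectors of two songs.
    feats_a: (Ta, d), feats_b: (Tb, d). A cover pair tends to show a
    low-distance diagonal-like pattern that the CNN can learn to recognize."""
    diff = feats_a[:, None, :] - feats_b[None, :, :]
    return np.linalg.norm(diff, axis=-1)          # shape (Ta, Tb)

# Toy example: song_b is a time-shifted copy of song_a (a crude "cover").
rng = np.random.default_rng(0)
song_a = rng.standard_normal((100, 12))           # e.g. 12-d chroma per frame
song_b = np.roll(song_a, 10, axis=0)
csm = cross_similarity_matrix(song_a, song_b)     # fed to the CNN as one input "image"
print(csm.shape)                                   # (100, 100)
```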
CSWA: Aggregation-Free Spatial-Temporal Community Sensing
Title | CSWA: Aggregation-Free Spatial-Temporal Community Sensing |
Authors | Jiang Bian, Haoyi Xiong, Yanjie Fu, Sajal K. Das |
Abstract | In this paper, we present a novel community sensing paradigm – Community Sensing Without Aggregation (CSWA). CSWA is designed to obtain environment information (e.g., air pollution or temperature) in each subarea of the target area, without aggregating sensor and location data collected by community members. CSWA operates on top of a secured peer-to-peer network over the community members and proposes a novel Decentralized Spatial-Temporal Compressive Sensing framework based on Parallelized Stochastic Gradient Descent. Through learning the low-rank structure via distributed optimization, CSWA approximates the value of the sensor data in each subarea (both covered and uncovered) for each sensing cycle using the sensor data locally stored in each member’s mobile device. Simulation experiments based on real-world datasets demonstrate that CSWA exhibits low approximation error (i.e., less than 0.2 °C in a city-wide temperature sensing task and 10 units of PM2.5 index in urban air pollution sensing) and performs comparably to (sometimes better than) state-of-the-art algorithms based on data aggregation and centralized computation. |
Tasks | Compressive Sensing, Distributed Optimization |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05712v1 |
http://arxiv.org/pdf/1711.05712v1.pdf | |
PWC | https://paperswithcode.com/paper/cswa-aggregation-free-spatial-temporal |
Repo | |
Framework | |
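A minimal, centralized sketch of the low-rank recovery idea underlying CSWA: fit a factorized subarea-by-cycle matrix by SGD on the observed entries only, then read off the unobserved ones. The secure peer-to-peer parallelization is omitted, and all sizes and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_subareas, n_cycles, rank = 50, 40, 3

# Ground-truth low-rank spatial-temporal field and a sparse observation mask
# (each member only senses the subareas it happens to visit).
truth = rng.standard_normal((n_subareas, rank)) @ rng.standard_normal((rank, n_cycles))
observed = rng.random((n_subareas, n_cycles)) < 0.3

# Factorized model M ~ U @ V, fitted by plain SGD on observed entries only.
U = 0.1 * rng.standard_normal((n_subareas, rank))
V = 0.1 * rng.standard_normal((rank, n_cycles))
lr = 0.02
entries = np.argwhere(observed)

for epoch in range(300):
    rng.shuffle(entries)
    for i, j in entries:
        ui, vj = U[i].copy(), V[:, j].copy()
        err = ui @ vj - truth[i, j]
        U[i] -= lr * err * vj
        V[:, j] -= lr * err * ui

rmse = np.sqrt(np.mean((U @ V - truth)[~observed] ** 2))
print("RMSE on unsensed subarea/cycle entries:", round(rmse, 3))
```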
Towards a new paradigm for assistive technology at home: research challenges, design issues and performance assessment
Title | Towards a new paradigm for assistive technology at home: research challenges, design issues and performance assessment |
Authors | Luca Buoncompagni, Barbara Bruno, Antonella Giuni, Fulvio Mastrogiovanni, Renato Zaccaria |
Abstract | Providing the elderly and people with special needs, including those suffering from physical disabilities and chronic diseases, with the possibility of retaining their independence as far as possible is one of the most important challenges our society is expected to face. Assistance models based on the home care paradigm are being adopted rapidly in almost all industrialized and emerging countries. Such paradigms hypothesize that it is necessary to ensure that the so-called Activities of Daily Living (ADL) are correctly and regularly performed by the assisted person to increase the perception of an improved quality of life. This chapter describes the computational inference engine at the core of Arianna, a system able to understand whether an assisted person performs a given set of ADL and to motivate him/her to perform them through a speech-mediated motivational dialogue, using a set of nearables to be installed in an apartment, plus a wearable to be worn or fit in garments. |
Tasks | |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.10164v1 |
http://arxiv.org/pdf/1710.10164v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-a-new-paradigm-for-assistive |
Repo | |
Framework | |
On Communication Complexity of Classification Problems
Title | On Communication Complexity of Classification Problems |
Authors | Daniel M. Kane, Roi Livni, Shay Moran, Amir Yehudayoff |
Abstract | This work studies distributed learning in the spirit of Yao’s model of communication complexity: consider a two-party setting, where each of the players gets a list of labelled examples and they communicate in order to jointly perform some learning task. To fit naturally into the framework of learning theory, the players can send each other examples (as well as bits), where each example/bit costs one unit of communication. This enables a uniform treatment of infinite classes such as half-spaces in $\mathbb{R}^d$, which are ubiquitous in machine learning. We study several fundamental questions in this model. For example, we provide combinatorial characterizations of the classes that can be learned with efficient communication in the proper case as well as in the improper case. These findings imply unconditional separations between various learning contexts, e.g. realizable versus agnostic learning, proper versus improper learning, etc. The derivation of these results hinges on a type of decision problem we term “realizability problems”, where the goal is to decide whether a distributed input sample is consistent with a hypothesis from a pre-specified class. From a technical perspective, the protocols we use are based on ideas from machine learning theory and the impossibility results are based on ideas from communication complexity theory. |
Tasks | |
Published | 2017-11-16 |
URL | http://arxiv.org/abs/1711.05893v3 |
http://arxiv.org/pdf/1711.05893v3.pdf | |
PWC | https://paperswithcode.com/paper/on-communication-complexity-of-classification |
Repo | |
Framework | |
Affect-LM: A Neural Language Model for Customizable Affective Text Generation
Title | Affect-LM: A Neural Language Model for Customizable Affective Text Generation |
Authors | Sayan Ghosh, Mathieu Chollet, Eugene Laksana, Louis-Philippe Morency, Stefan Scherer |
Abstract | Human verbal communication includes affective messages which are conveyed through the use of emotionally colored words. There has been a lot of research in this direction, but the problem of integrating state-of-the-art neural language models with affective information remains an area ripe for exploration. In this paper, we propose an extension to an LSTM (Long Short-Term Memory) language model for generating conversational text, conditioned on affect categories. Our proposed model, Affect-LM, enables us to customize the degree of emotional content in generated sentences through an additional design parameter. Perception studies conducted using Amazon Mechanical Turk show that Affect-LM generates natural-looking emotional sentences without sacrificing grammatical correctness. Affect-LM also learns affect-discriminative word representations, and perplexity experiments show that additional affective information in conversational text can improve language model prediction. |
Tasks | Language Modelling, Text Generation |
Published | 2017-04-22 |
URL | http://arxiv.org/abs/1704.06851v1 |
http://arxiv.org/pdf/1704.06851v1.pdf | |
PWC | https://paperswithcode.com/paper/affect-lm-a-neural-language-model-for |
Repo | |
Framework | |
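A minimal sketch of the conditioning mechanism as it reads from the abstract: an affect-dependent term is added to the base language model's word scores and scaled by a design parameter that dials the emotional content up or down. The additive form and all numbers below are assumptions for illustration, not the paper's equations.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    return np.exp(z) / np.exp(z).sum()

vocab = ["the", "day", "was", "wonderful", "terrible", "ok"]
base_logits = np.array([2.0, 1.5, 1.5, 0.2, 0.2, 0.5])      # stand-in LSTM LM scores

# Affect energy: words associated with the active affect category get a bonus.
# The active category here is "positive emotion"; scores are made up for the demo.
affect_energy = np.array([0.0, 0.0, 0.0, 2.0, -2.0, 0.3])

for beta in (0.0, 0.5, 1.5):          # design parameter controlling affect strength
    probs = softmax(base_logits + beta * affect_energy)
    print(f"beta={beta}: P('wonderful')={probs[vocab.index('wonderful')]:.3f}  "
          f"P('terrible')={probs[vocab.index('terrible')]:.3f}")
```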
Transcribing Against Time
Title | Transcribing Against Time |
Authors | Matthias Sperber, Graham Neubig, Jan Niehues, Satoshi Nakamura, Alex Waibel |
Abstract | We investigate the problem of manually correcting errors from an automatic speech transcript in a cost-sensitive fashion. This is done by specifying a fixed time budget, and then automatically choosing location and size of segments for correction such that the number of corrected errors is maximized. The core components, as suggested by previous research [1], are a utility model that estimates the number of errors in a particular segment, and a cost model that estimates annotation effort for the segment. In this work we propose a dynamic updating framework that allows for the training of cost models during the ongoing transcription process. This removes the need for transcriber enrollment prior to the actual transcription, and improves correction efficiency by allowing highly transcriber-adaptive cost modeling. We first confirm and analyze the improvements afforded by this method in a simulated study. We then conduct a realistic user study, observing efficiency improvements of 15% relative on average, and 42% for the participants who deviated most strongly from our initial, transcriber-agnostic cost model. Moreover, we find that our updating framework can capture dynamically changing factors, such as transcriber fatigue and topic familiarity, which we observe to have a large influence on the transcriber’s working behavior. |
Tasks | |
Published | 2017-09-15 |
URL | http://arxiv.org/abs/1709.05227v1 |
http://arxiv.org/pdf/1709.05227v1.pdf | |
PWC | https://paperswithcode.com/paper/transcribing-against-time |
Repo | |
Framework | |
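A minimal sketch of the selection step implied by the abstract: given per-segment utility (estimated errors) and cost (estimated annotation time), pick segments greedily by utility per unit cost until the time budget is exhausted. The greedy rule and numbers are illustrative; in the paper both models are learned and the cost model is updated during the ongoing transcription.

```python
def select_segments(segments, budget):
    """segments: list of (segment_id, est_errors, est_cost_seconds).
    Greedy knapsack heuristic: highest estimated errors per second first."""
    ranked = sorted(segments, key=lambda s: s[1] / s[2], reverse=True)
    chosen, spent = [], 0.0
    for seg_id, errors, cost in ranked:
        if spent + cost <= budget:
            chosen.append(seg_id)
            spent += cost
    return chosen, spent

segments = [("s1", 4.0, 30.0), ("s2", 1.0, 5.0), ("s3", 6.0, 90.0), ("s4", 2.0, 12.0)]
print(select_segments(segments, budget=60.0))   # -> (['s2', 's4', 's1'], 47.0)
```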
Exploring Geometric Property Thresholds For Filtering Non-Text Regions In A Connected Component Based Text Detection Application
Title | Exploring Geometric Property Thresholds For Filtering Non-Text Regions In A Connected Component Based Text Detection Application |
Authors | Teresa Nicole Brooks |
Abstract | Automated text detection is a difficult computer vision task. In order to accurately detect and identify text in an image or video, two major problems must be addressed. The primary problem is implementing a robust and reliable method for distinguishing text from non-text regions in images and videos. Part of the difficulty stems from the almost unlimited combinations of fonts, lighting conditions, distortions, and other variations that can be found in images and videos. This paper explores key properties of two popular and proven methods for implementing text detection: maximally stable extremal regions (MSER) and stroke width variation. |
Tasks | |
Published | 2017-09-11 |
URL | http://arxiv.org/abs/1709.03548v1 |
http://arxiv.org/pdf/1709.03548v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-geometric-property-thresholds-for |
Repo | |
Framework | |
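A minimal sketch of the kind of geometric filtering the paper explores: connected components whose geometric properties fall outside text-like ranges are discarded as non-text. The specific thresholds below are illustrative placeholders, not the paper's tuned values.

```python
def looks_like_text(region):
    """region: dict with the component's bounding box and pixel statistics.
    Thresholds are illustrative; the paper studies how to choose such values."""
    w, h = region["width"], region["height"]
    aspect = w / h
    extent = region["area"] / (w * h)            # filled fraction of the bounding box
    stroke_cv = region["stroke_width_std"] / region["stroke_width_mean"]
    return (0.1 < aspect < 10.0 and              # not an extreme sliver
            0.15 < extent < 0.95 and             # neither hollow frame nor solid blob
            stroke_cv < 0.5)                     # fairly uniform stroke width

candidate = {"width": 18, "height": 30, "area": 260,
             "stroke_width_mean": 3.1, "stroke_width_std": 0.8}
print(looks_like_text(candidate))                # -> True
```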
DropRegion Training of Inception Font Network for High-Performance Chinese Font Recognition
Title | DropRegion Training of Inception Font Network for High-Performance Chinese Font Recognition |
Authors | Shuangping Huang, Zhuoyao Zhong, Lianwen Jin, Shuye Zhang, Haobin Wang |
Abstract | Chinese font recognition (CFR) has gained significant attention in recent years. However, due to the sparsity of labeled font samples and the structural complexity of Chinese characters, CFR is still a challenging task. In this paper, a DropRegion method is proposed to generate a large number of stochastic variant font samples whose local regions are selectively disrupted and an inception font network (IFN) with two additional convolutional neural network (CNN) structure elements, i.e., a cascaded cross-channel parametric pooling (CCCP) and global average pooling, is designed. Because the distribution of strokes in a font image is non-stationary, an elastic meshing technique that adaptively constructs a set of local regions with equalized information is developed. Thus, DropRegion is seamlessly embedded in the IFN, which enables end-to-end training; the proposed DropRegion-IFN can be used for high performance CFR. Experimental results have confirmed the effectiveness of our new approach for CFR. |
Tasks | |
Published | 2017-03-17 |
URL | http://arxiv.org/abs/1703.05870v2 |
http://arxiv.org/pdf/1703.05870v2.pdf | |
PWC | https://paperswithcode.com/paper/dropregion-training-of-inception-font-network |
Repo | |
Framework | |
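A minimal sketch of a DropRegion-style augmentation: partition a font image into a mesh of local regions and erase a random subset to create stochastic variants. A uniform grid stands in here for the paper's information-equalized elastic mesh.

```python
import numpy as np

def drop_region(img, grid=(4, 4), n_drop=3, rng=None):
    """Return a copy of a grayscale font image with n_drop random grid cells erased.
    A uniform grid is used here; the paper builds an elastic mesh whose cells
    carry roughly equal stroke information."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    rows = np.array_split(np.arange(img.shape[0]), grid[0])
    cols = np.array_split(np.arange(img.shape[1]), grid[1])
    cells = [(r, c) for r in range(grid[0]) for c in range(grid[1])]
    for idx in rng.choice(len(cells), size=n_drop, replace=False):
        r, c = cells[idx]
        out[np.ix_(rows[r], cols[c])] = 0       # erase the selected local region
    return out

font_img = np.random.default_rng(0).integers(0, 256, size=(64, 64), dtype=np.uint8)
variant = drop_region(font_img, rng=np.random.default_rng(1))
print((variant != font_img).any())              # True: some regions were erased
```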
An innovative solution for breast cancer textual big data analysis
Title | An innovative solution for breast cancer textual big data analysis |
Authors | Nicolas Thiebaut, Antoine Simoulin, Karl Neuberger, Issam Ibnouhsein, Nicolas Bousquet, Nathalie Reix, Sébastien Molière, Carole Mathelin |
Abstract | The digitalization of stored information in hospitals now allows for the exploitation of medical data in text format, such as electronic health records (EHRs) initially gathered for purposes other than epidemiology. Manual search and analysis operations on such data become tedious. In recent years, the use of natural language processing (NLP) tools has been highlighted as a way to automate the extraction of information contained in EHRs, structure it and perform statistical analysis on this structured information. The main difficulty with existing approaches is the requirement of synonym or ontology dictionaries, which are mostly available in English only and do not include local or custom notations. In this work, a team composed of oncologists as domain experts and data scientists developed a custom NLP-based system to process and structure textual clinical reports of patients suffering from breast cancer. The tool relies on the combination of standard text mining techniques and an advanced synonym detection method. It allows for a global analysis through the retrieval of indicators such as medical history, tumor characteristics, therapeutic responses, recurrences and prognosis. The versatility of the method makes it easy to obtain new indicators, thus opening up the way for retrospective studies with a substantial reduction of the amount of manual work. With no need for biomedical annotators or pre-defined ontologies, this language-agnostic method reached a good extraction accuracy for several concepts of interest, according to a comparison with a manually structured file, without requiring any existing corpus with local or new notations. |
Tasks | Epidemiology |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02259v1 |
http://arxiv.org/pdf/1712.02259v1.pdf | |
PWC | https://paperswithcode.com/paper/an-innovative-solution-for-breast-cancer |
Repo | |
Framework | |
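A minimal sketch of the structuring step described in the abstract: scan free-text reports for concept variants and emit one structured row of indicators per report. The concept names, regular expressions, and example report are invented for the demo; in the paper the variants come from an automatic synonym-detection method rather than a hand-written dictionary.

```python
import re

# Each indicator maps to the surface variants found for it; in the paper these
# variants are discovered automatically rather than listed by hand.
CONCEPTS = {
    "tumor_grade": [r"grade\s*(I{1,3}|[1-3])", r"sbr\s*[1-3]"],
    "her2_status": [r"her-?2\s*(positive|negative|\+|-)"],
    "recurrence":  [r"recurrence", r"relapse"],
}

def structure_report(text):
    """Return one structured row (indicator -> first matching span or None)."""
    row = {}
    for indicator, patterns in CONCEPTS.items():
        match = next((m.group(0) for p in patterns
                      for m in [re.search(p, text, re.IGNORECASE)] if m), None)
        row[indicator] = match
    return row

report = "Invasive carcinoma, SBR 2, HER2 negative. No sign of recurrence."
print(structure_report(report))
# {'tumor_grade': 'SBR 2', 'her2_status': 'HER2 negative', 'recurrence': 'recurrence'}
```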