January 27, 2020

Paper Group ANR 1201

A Camera That CNNs: Towards Embedded Neural Networks on Pixel Processor Arrays. Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators. From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories. On the Legal Compatibility of Fairness Definitions. Connecting Lyapunov Control …

A Camera That CNNs: Towards Embedded Neural Networks on Pixel Processor Arrays


Title	A Camera That CNNs: Towards Embedded Neural Networks on Pixel Processor Arrays
Authors	Laurie Bose, Jianing Chen, Stephen J. Carey, Piotr Dudek, Walterio Mayol-Cuevas
Abstract	We present a convolutional neural network implementation for pixel processor array (PPA) sensors. PPA hardware consists of a fine-grained array of general-purpose processing elements, each capable of light capture, data storage, program execution, and communication with neighboring elements. This allows images to be stored and manipulated directly at the point of light capture, rather than having to transfer images to external processing hardware. Our CNN approach divides this array up into 4x4 blocks of processing elements, essentially trading off image resolution for increased local memory capacity per 4x4 “pixel”. We implement parallel operations for image addition, subtraction, and bit-shifting in this 4x4 block format. Using these components we formulate how to perform ternary weight convolutions upon these images, compactly store results of such convolutions, perform max-pooling, and transfer the resulting sub-sampled data to an attached micro-controller. We train ternary weight filter CNNs for digit recognition and a simple tracking task, and demonstrate inference of these networks upon the SCAMP5 PPA system. This work represents a first step towards embedding neural network processing capability directly onto the focal plane of a sensor.
Tasks
Published	2019-09-12
URL	https://arxiv.org/abs/1909.05647v2
PDF	https://arxiv.org/pdf/1909.05647v2.pdf
PWC	https://paperswithcode.com/paper/a-camera-that-cnns-towards-embedded-neural
Repo
Framework
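
The abstract above describes ternary weight convolutions built from image addition and subtraction, followed by max-pooling. A minimal sketch of that idea in plain Python follows. It runs on nested lists rather than on PPA hardware, ignores the 4x4 block layout, and the function names `ternary_conv2d` and `max_pool2x2` are illustrative assumptions, not the authors' implementation.

```python
def ternary_conv2d(image, kernel):
    """Valid 2-D correlation with weights restricted to {-1, 0, +1}.

    Because every weight is -1, 0 or +1, the inner loop needs only
    additions and subtractions, never a multiplication.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    w = kernel[dy][dx]
                    if w == 1:
                        acc += image[y + dy][x + dx]
                    elif w == -1:
                        acc -= image[y + dy][x + dx]
                    # w == 0 contributes nothing
            out[y][x] = acc
    return out


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max-pooling; a trailing odd row or column is dropped."""
    return [
        [max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
         for x in range(0, len(fmap[0]) - 1, 2)]
        for y in range(0, len(fmap) - 1, 2)
    ]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
edge_kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]  # toy vertical-edge filter
pooled = max_pool2x2(ternary_conv2d(image, edge_kernel))
```

Restricting weights to three values is what makes the add/subtract-only formulation possible; the pooled map is the kind of sub-sampled result that would then be handed to a downstream controller.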

Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators


Title	Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators
Authors	Sanghyun Yi, Rahul Goel, Chandra Khatri, Alessandra Cervone, Tagyoung Chung, Behnam Hedayatnia, Anu Venkatesh, Raefer Gabriel, Dilek Hakkani-Tur
Abstract	Encoder-decoder based neural architectures serve as the basis of state-of-the-art approaches in end-to-end open domain dialog systems. Since most of such systems are trained with a maximum likelihood (MLE) objective, they suffer from issues such as lack of generalizability and the generic response problem, i.e., a system response that can be an answer to a large number of user utterances, e.g., “Maybe, I don’t know.” Having explicit feedback on the relevance and interestingness of a system response at each turn can be a useful signal for mitigating such issues and improving system quality by selecting responses from different approaches. Towards this goal, we present a system that evaluates chatbot responses at each dialog turn for coherence and engagement. Our system provides explicit turn-level dialog quality feedback, which we show to be highly correlated with human evaluation. To show that incorporating this feedback in the neural response generation models improves dialog quality, we present two different and complementary mechanisms to incorporate explicit feedback into a neural response generation model: reranking and direct modification of the loss function during training. Our studies show that a response generation model that incorporates these combined feedback mechanisms produces more engaging and coherent responses in an open-domain spoken dialog setting, significantly improving the response quality using both automatic and human evaluation.
Tasks	Chatbot
Published	2019-04-30
URL	https://arxiv.org/abs/1904.13015v4
PDF	https://arxiv.org/pdf/1904.13015v4.pdf
PWC	https://paperswithcode.com/paper/towards-coherent-and-engaging-spoken-dialog
Repo
Framework
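
Of the two feedback mechanisms named in the abstract above, reranking is the easier one to illustrate. The sketch below assumes each candidate response comes with a generator log-probability and that an evaluator returns a scalar quality score. `toy_evaluator`, `GENERIC_RESPONSES`, and the `weight` parameter are hypothetical stand-ins, not the learned evaluators from the paper.

```python
GENERIC_RESPONSES = {"maybe, i don't know.", "i don't know."}


def toy_evaluator(response):
    """Hypothetical stand-in for a learned coherence/engagement scorer.

    It simply penalizes a small set of known generic replies.
    """
    return -5.0 if response.lower() in GENERIC_RESPONSES else 0.0


def rerank(candidates, evaluator, weight=1.0):
    """Order (text, logprob) candidates by logprob + weight * evaluator(text)."""
    scored = [(logprob + weight * evaluator(text), text)
              for text, logprob in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


candidates = [
    ("Maybe, I don't know.", -1.0),           # likely under MLE, but generic
    ("I loved that film's soundtrack.", -3.0),
]
ranked = rerank(candidates, toy_evaluator)
```

With `weight=0` the ordering falls back to the generator's own likelihood, which is how the generic reply would win without evaluator feedback.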

From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories


Title	From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories
Authors	Songwei Ge, Curtis Xuan, Ruihua Song, Chao Zou, Wei Liu, Jin Zhou
Abstract	Sound effects play an essential role in producing high-quality radio stories but require enormous labor cost to add. In this paper, we address the problem of automatically adding sound effects to radio stories with a retrieval-based model. However, directly implementing a tag-based retrieval model leads to high false positives due to the ambiguity of story contents. To solve this problem, we introduce a retrieval-based framework hybridized with a semantic inference model which helps to achieve robust retrieval results. Our model relies on carefully designed features extracted from the context of candidate triggers. We collect two story dubbing datasets through crowdsourcing to analyze the setting of adding sound effects and to train and test our proposed methods. We further discuss the importance of each feature and introduce several heuristic rules for the trade-off between precision and recall. Together with the text-to-speech technology, our results reveal a promising automatic pipeline for producing high-quality radio stories.
Tasks
Published	2019-08-20
URL	https://arxiv.org/abs/1908.07590v1
PDF	https://arxiv.org/pdf/1908.07590v1.pdf
PWC	https://paperswithcode.com/paper/190807590
Repo
Framework

On the Legal Compatibility of Fairness Definitions


Title	On the Legal Compatibility of Fairness Definitions
Authors	Alice Xiang, Inioluwa Deborah Raji
Abstract	Past literature has been effective in demonstrating ideological gaps in machine learning (ML) fairness definitions when considering their use in complex socio-technical systems. However, we go further to demonstrate that these definitions often misunderstand the legal concepts from which they purport to be inspired, and consequently inappropriately co-opt legal language. In this paper, we demonstrate examples of this misalignment and discuss the differences in ML terminology and their legal counterparts, as well as what both the legal and ML fairness communities can learn from these tensions. We focus this paper on U.S. anti-discrimination law since the ML fairness research community regularly references terms from this body of law.
Tasks
Published	2019-11-25
URL	https://arxiv.org/abs/1912.00761v1
PDF	https://arxiv.org/pdf/1912.00761v1.pdf
PWC	https://paperswithcode.com/paper/on-the-legal-compatibility-of-fairness
Repo
Framework

Connecting Lyapunov Control Theory to Adversarial Attacks


Title	Connecting Lyapunov Control Theory to Adversarial Attacks
Authors	Arash Rahnama, Andre T. Nguyen, Edward Raff
Abstract	Significant work is being done to develop the math and tools necessary to build provable defenses, or at least bounds, against adversarial attacks of neural networks. In this work, we argue that tools from control theory could be leveraged to aid in defending against such attacks. We do this by example, building a provable defense against a weaker adversary. This is done so we can focus on the mechanisms of control theory, and illuminate its intrinsic value.
Tasks
Published	2019-07-17
URL	https://arxiv.org/abs/1907.07732v1
PDF	https://arxiv.org/pdf/1907.07732v1.pdf
PWC	https://paperswithcode.com/paper/connecting-lyapunov-control-theory-to
Repo
Framework

CoachAI: A Conversational Agent Assisted Health Coaching Platform


Title	CoachAI: A Conversational Agent Assisted Health Coaching Platform
Authors	Ahmed Fadhil, Gianluca Schiavo, Yunlong Wang
Abstract	Poor lifestyle represents a health risk factor and is the leading cause of morbidity and chronic conditions. The impact of poor lifestyle can be significantly altered by individual behavior change. Although healthcare is currently shifting towards long-lasting modifiable behavior, increasing caregiver workload and individuals’ continuous need for care create a need to ease the caregiver’s work while ensuring continuous interaction with users. This paper describes the design and validation of CoachAI, a conversational agent assisted health coaching system to support health intervention delivery to individuals and groups. CoachAI instantiates a text based healthcare chatbot system that bridges the remote human coach and the users. This research provides three main contributions to preventive healthcare and healthy lifestyle promotion: (1) it presents the conversational agent to aid the caregiver; (2) it aims to decrease the caregiver’s workload and enhance care given to users, by handling (automating) repetitive caregiver tasks; and (3) it presents a domain independent mobile health conversational agent for health intervention delivery. We will discuss our approach and analyze the results of a one month validation study on physical activity, healthy diet and stress management.
Tasks	Chatbot
Published	2019-04-26
URL	http://arxiv.org/abs/1904.11961v1
PDF	http://arxiv.org/pdf/1904.11961v1.pdf
PWC	https://paperswithcode.com/paper/coachai-a-conversational-agent-assisted
Repo
Framework

RACE: Sub-Linear Memory Sketches for Approximate Near-Neighbor Search on Streaming Data


Title	RACE: Sub-Linear Memory Sketches for Approximate Near-Neighbor Search on Streaming Data
Authors	Benjamin Coleman, Anshumali Shrivastava, Richard G. Baraniuk
Abstract	We present the first sublinear memory sketch which can be queried to find the $v$ nearest neighbors in a dataset. Our online sketching algorithm can compress an $N$-element dataset to a sketch of size $O(N^b \log^3{N})$ in $O(N^{b+1} \log^3{N})$ time, where $b < 1$ when the query satisfies a data-dependent near-neighbor stability condition. We achieve data-dependent sublinear space by combining recent advances in locality sensitive hashing (LSH)-based estimators with compressed sensing. Our results shed new light on the memory-accuracy tradeoff for near-neighbor search. The techniques presented reveal a deep connection between the fundamental compressed sensing (or heavy hitters) recovery problem and near-neighbor search, leading to new insight for geometric search problems and implications for sketching algorithms.
Tasks
Published	2019-02-18
URL	http://arxiv.org/abs/1902.06687v2
PDF	http://arxiv.org/pdf/1902.06687v2.pdf
PWC	https://paperswithcode.com/paper/race-sub-linear-memory-sketches-for
Repo
Framework
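
The abstract above builds on LSH-based estimators stored in arrays of counters. The sketch below illustrates only that counting component, using signed random projections as the hash family. The compressed sensing recovery step that the paper combines it with is not shown, and the class name `RaceSketch` and its parameters are assumptions made for illustration.

```python
import random


class RaceSketch:
    """Rows of counters indexed by an LSH function (one function per row).

    Memory is rows * 2**bits counters, independent of how many points
    are streamed in.
    """

    def __init__(self, dim, rows=20, bits=4, seed=0):
        rng = random.Random(seed)
        # One set of `bits` random hyperplanes per row (signed random projections).
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
                        for _ in range(bits)] for _ in range(rows)]
        self.counts = [[0] * (1 << bits) for _ in range(rows)]

    def _bucket(self, row, x):
        idx = 0
        for plane in self.planes[row]:
            dot = sum(p * v for p, v in zip(plane, x))
            idx = (idx << 1) | (1 if dot > 0 else 0)
        return idx

    def add(self, x):
        """Stream one point into the sketch."""
        for r in range(len(self.counts)):
            self.counts[r][self._bucket(r, x)] += 1

    def query(self, q):
        """Average count of the buckets q falls into, across rows."""
        rows = len(self.counts)
        return sum(self.counts[r][self._bucket(r, q)] for r in range(rows)) / rows


sketch = RaceSketch(dim=3, rows=10, bits=4, seed=1)
point = [1.0, 2.0, -0.5]
for _ in range(50):
    sketch.add(point)
near = sketch.query(point)               # counts mass close to the query
far = sketch.query([-v for v in point])  # the opposite direction collides nowhere
```

A query that collides often with stored points sees large counts, so the averaged count acts as a rough density estimate around the query.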

Deep Convolutional Neural Networks for Imaging Data Based Survival Analysis of Rectal Cancer


Title	Deep Convolutional Neural Networks for Imaging Data Based Survival Analysis of Rectal Cancer
Authors	Hongming Li, Pamela Boimel, James Janopaul-Naylor, Haoyu Zhong, Ying Xiao, Edgar Ben-Josef, Yong Fan
Abstract	Recent radiomic studies have witnessed promising performance of deep learning techniques in learning radiomic features and fusing multimodal imaging data. Most existing deep learning based radiomic studies build predictive models in a setting of pattern classification, not appropriate for survival analysis studies where some data samples have incomplete observations. To improve existing survival analysis techniques whose performance hinges on imaging features, we propose a deep learning method to build survival regression models by optimizing imaging features with deep convolutional neural networks (CNNs) in a proportional hazards model. To make the CNNs applicable to tumors with varied sizes, a spatial pyramid pooling strategy is adopted. Our method has been validated based on a simulated imaging dataset and an FDG-PET/CT dataset of rectal cancer patients treated for locally advanced rectal cancer. Compared with survival prediction models built upon hand-crafted radiomic features using Cox proportional hazards model and random survival forests, our method achieved competitive prediction performance.
Tasks	Survival Analysis
Published	2019-01-05
URL	http://arxiv.org/abs/1901.01449v1
PDF	http://arxiv.org/pdf/1901.01449v1.pdf
PWC	https://paperswithcode.com/paper/deep-convolutional-neural-networks-for
Repo
Framework
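
Training a network inside a proportional hazards model, as the abstract above describes, usually means minimizing the negative Cox partial log-likelihood of the network's risk scores. The sketch below shows that loss in plain Python, assuming Breslow-style handling of tied times. The function name and the averaging over events are illustrative choices and are not taken from the paper.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk_scores: model outputs (log hazard ratios), one per patient
    times:       follow-up time per patient
    events:      1 if the event was observed, 0 if censored
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored samples only appear inside risk sets
        # Risk set: everyone still under observation at time t_i.
        log_denom = math.log(sum(math.exp(risk_scores[j])
                                 for j, t_j in enumerate(times) if t_j >= t_i))
        total += risk_scores[i] - log_denom
        n_events += 1
    return -total / max(n_events, 1)


times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
good = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
bad = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
```

Censored patients contribute through the risk sets only, which is how this loss accommodates the incomplete observations that a plain classification setup cannot. Here `good` is smaller than `bad` because higher risk scores are assigned to the earlier events.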

Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework


Title	Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework
Authors	Mingbo Ma, Baigong Zheng, Kaibo Liu, Renjie Zheng, Hairong Liu, Kainan Peng, Kenneth Church, Liang Huang
Abstract	Text-to-speech synthesis (TTS) has witnessed rapid progress in recent years, where neural methods became capable of producing audio with near human-level naturalness. However, these efforts still suffer from two types of latencies: (a) the computational latency (synthesize time), which grows linearly with the sentence length even with parallel approaches, and (b) the input latency in scenarios where the input text is incrementally generated (such as in simultaneous translation, dialog generation, and assistive technologies). To reduce these latencies, we devise the first neural incremental TTS approach based on the recently proposed prefix-to-prefix framework. We synthesize speech in an online fashion, playing a segment of audio while generating the next, resulting in an O(1) rather than O(n) latency. Experiments on English TTS show that our approach achieves similar speech naturalness compared to full sentence methods, but only using a fraction of time and a constant (1 - 2 words) latency.
Tasks	Speech Synthesis, Text-To-Speech Synthesis
Published	2019-11-07
URL	https://arxiv.org/abs/1911.02750v1
PDF	https://arxiv.org/pdf/1911.02750v1.pdf
PWC	https://paperswithcode.com/paper/incremental-text-to-speech-synthesis-with
Repo
Framework
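
The prefix-to-prefix idea in the abstract above can be illustrated with a fixed lookahead policy: audio for word t is produced as soon as word t + k has arrived, rather than after the full sentence. The sketch below assumes a hypothetical callback `synthesize_word(prefix, index)` standing in for the neural synthesizer; it is a scheduling illustration only, not the paper's model.

```python
def incremental_synthesize(words, synthesize_word, lookahead=1):
    """Yield one audio segment per word using a lookahead-k policy.

    `words` may be any iterable, including a stream that is still being
    generated. `synthesize_word(prefix, index)` is a hypothetical callback
    that renders word `index` given only the visible prefix.
    """
    buffer = []
    for word in words:
        buffer.append(word)
        if len(buffer) > lookahead:
            idx = len(buffer) - lookahead - 1
            yield synthesize_word(list(buffer), idx)
    # Input has ended: flush the words that were waiting for lookahead.
    for idx in range(max(len(buffer) - lookahead, 0), len(buffer)):
        yield synthesize_word(list(buffer), idx)


def fake_synth(prefix, index):
    """Toy synthesizer: reports the word and how much input was visible."""
    return (prefix[index], len(prefix))


segments = list(incremental_synthesize(["a", "b", "c", "d"], fake_synth, lookahead=1))
```

Each segment depends on a bounded amount of future input, so the delay before the first audio is set by `lookahead` rather than by sentence length.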

A Hardware-Oriented and Memory-Efficient Method for CTC Decoding


Title	A Hardware-Oriented and Memory-Efficient Method for CTC Decoding
Authors	Siyuan Lu, Jinming Lu, Jun Lin, Zhongfeng Wang
Abstract	The Connectionist Temporal Classification (CTC) has achieved great success in sequence to sequence analysis tasks such as automatic speech recognition (ASR) and scene text recognition (STR). These applications can use the CTC objective function to train the recurrent neural networks (RNNs), and decode the outputs of RNNs during inference. While hardware architectures for RNNs have been studied, hardware-based CTC decoders are desired for high-speed CTC-based inference systems. This paper, for the first time, provides a low-complexity and memory-efficient approach to build a CTC decoder based on the beam search decoding. Firstly, we improve the beam search decoding algorithm to save the storage space. Secondly, we compress a dictionary (reduced from 26.02MB to 1.12MB) and use it as the language model. Meanwhile, searching this dictionary is trivial. Finally, a fixed-point CTC decoder for an English ASR and an STR task using the proposed method is implemented in C++. It is shown that the proposed method has little precision loss compared with its floating-point counterpart. Our experiments demonstrate that the compression ratios of the storage required by the proposed beam search decoding algorithm are 29.49 (ASR) and 17.95 (STR).
Tasks	Language Modelling, Scene Text Recognition, Speech Recognition
Published	2019-05-08
URL	https://arxiv.org/abs/1905.03175v1
PDF	https://arxiv.org/pdf/1905.03175v1.pdf
PWC	https://paperswithcode.com/paper/a-hardware-oriented-and-memory-efficient
Repo
Framework
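
For readers unfamiliar with the baseline that the abstract above optimizes, here is a sketch of standard CTC prefix beam search in floating point, without a dictionary or language model. It is the textbook algorithm, not the memory-efficient fixed-point variant the paper proposes, and the function name and arguments are illustrative assumptions.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, alphabet, beam_width=4, blank=0):
    """Decode per-frame probabilities (T x V nested lists) with prefix beam search.

    Each prefix tracks two probabilities: paths ending in blank (p_b) and
    paths ending in a non-blank symbol (p_nb).
    """
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for c, p in enumerate(frame):
                if c == blank:
                    # A blank keeps the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and prefix[-1] == c:
                    # Repeated symbol: extends the prefix only after a blank,
                    # otherwise it collapses into the same prefix.
                    next_beams[prefix + (c,)][1] += p_b * p
                    next_beams[prefix][1] += p_nb * p
                else:
                    next_beams[prefix + (c,)][1] += (p_b + p_nb) * p
        best = sorted(next_beams.items(), key=lambda kv: sum(kv[1]),
                      reverse=True)[:beam_width]
        beams = {prefix: tuple(p) for prefix, p in best}
    best_prefix = max(beams.items(), key=lambda kv: sum(kv[1]))[0]
    return "".join(alphabet[c] for c in best_prefix)


# Two frames over {blank, 'a'}: the single best path is blank-blank,
# but the summed probability of all paths spelling "a" is higher.
frames = [[0.6, 0.4], [0.6, 0.4]]
decoded = ctc_prefix_beam_search(frames, alphabet=["-", "a"])
```

The beam dictionary is the storage that grows with beam width and sequence length, which is the cost a hardware-oriented decoder has to keep small.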

A context sensitive real-time Spell Checker with language adaptability


Title	A context sensitive real-time Spell Checker with language adaptability
Authors	Prabhakar Gupta
Abstract	We present a novel language adaptable spell checking system which detects spelling errors and suggests context sensitive corrections in real-time. We show that our system can be extended to new languages with minimal language-specific processing. Available literature mostly discusses spell checkers for English, but there are no publicly available systems which can be extended to work for other languages out of the box. Most of the systems do not work in real-time. We explain the process of generating a language’s word dictionary and n-gram probability dictionaries using Wikipedia-articles data and manually curated video subtitles. We present the results of generating a list of suggestions for a misspelled word. We also propose three approaches to create noisy channel datasets of real-world typographic errors. We compare our system with industry-accepted spell checker tools for 11 languages. Finally, we show the performance of our system on synthetic datasets for 24 languages.
Tasks
Published	2019-10-23
URL	https://arxiv.org/abs/1910.11242v1
PDF	https://arxiv.org/pdf/1910.11242v1.pdf
PWC	https://paperswithcode.com/paper/a-context-sensitive-real-time-spell-checker
Repo
Framework
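
The pipeline in the abstract above (a word dictionary plus n-gram statistics used to rank suggestions by context) can be sketched with a small edit-distance-1 candidate generator and a bigram lookup. This is a simplified illustration under assumed data structures. The `unigrams` and `bigrams` dictionaries and the scoring rule are hypothetical, not the paper's system.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def suggest(prev_word, word, unigrams, bigrams, top_k=3):
    """Rank in-dictionary candidates by bigram count with the previous word,
    breaking ties by unigram count."""
    if word in unigrams:
        return [word]  # treated as correctly spelled
    candidates = [w for w in edits1(word) if w in unigrams]

    def score(w):
        return (bigrams.get((prev_word, w), 0), unigrams[w])

    return sorted(candidates, key=score, reverse=True)[:top_k]


unigrams = {"the": 100, "cat": 10, "car": 20, "cut": 5}
bigrams = {("black", "cat"): 7, ("fast", "car"): 9}
after_black = suggest("black", "caz", unigrams, bigrams)
after_fast = suggest("fast", "caz", unigrams, bigrams)
```

The same misspelling gets a different top suggestion depending on the preceding word, which is the context sensitivity the abstract refers to. Swapping the alphabet and the count dictionaries is all this sketch needs to target another language.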

Algebraic Characterization of Essential Matrices and Their Averaging in Multiview Settings


Title	Algebraic Characterization of Essential Matrices and Their Averaging in Multiview Settings
Authors	Yoni Kasten, Amnon Geifman, Meirav Galun, Ronen Basri
Abstract	Essential matrix averaging, i.e., the task of recovering camera locations and orientations in calibrated, multiview settings, is a first step in global approaches to Euclidean structure from motion. A common approach to essential matrix averaging is to separately solve for camera orientations and subsequently for camera positions. This paper presents a novel approach that solves simultaneously for both camera orientations and positions. We offer a complete characterization of the algebraic conditions that enable a unique Euclidean reconstruction of $n$ cameras from a collection of $\binom{n}{2}$ essential matrices. We next use these conditions to formulate essential matrix averaging as a constrained optimization problem, allowing us to recover a consistent set of essential matrices given a (possibly partial) set of measured essential matrices computed independently for pairs of images. We finally use the recovered essential matrices to determine the global positions and orientations of the $n$ cameras. We test our method on common SfM datasets, demonstrating high accuracy while maintaining efficiency and robustness, compared to existing methods.
Tasks
Published	2019-04-04
URL	https://arxiv.org/abs/1904.02663v2
PDF	https://arxiv.org/pdf/1904.02663v2.pdf
PWC	https://paperswithcode.com/paper/algebraic-characterization-of-essential
Repo
Framework
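
As background for the abstract above, a single essential matrix ties one pair of calibrated cameras together through the epipolar constraint. The sketch below builds E from a relative rotation R and translation t under the convention X2 = R X1 + t and checks that constraint for one 3-D point. It covers only this pairwise definition, not the paper's multiview averaging, and all helper names are illustrative.

```python
import math


def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def essential_from_pose(R, t):
    """E = [t]_x R for the camera-2-from-camera-1 transform X2 = R X1 + t."""
    return matmul(skew(t), R)


def epipolar_residual(E, x1, x2):
    """x2^T E x1, which is zero for a correct correspondence."""
    Ex1 = matvec(E, x1)
    return sum(x2[i] * Ex1[i] for i in range(3))


theta = 0.3  # rotation about the y axis, in radians
R = [[math.cos(theta), 0.0, math.sin(theta)],
     [0.0, 1.0, 0.0],
     [-math.sin(theta), 0.0, math.cos(theta)]]
t = [1.0, 0.2, -0.1]
X1 = [0.5, -0.3, 4.0]                              # point in camera-1 coordinates
X2 = [r + ti for r, ti in zip(matvec(R, X1), t)]   # same point in camera 2
x1 = [v / X1[2] for v in X1]                       # normalized image coordinates
x2 = [v / X2[2] for v in X2]
residual = epipolar_residual(essential_from_pose(R, t), x1, x2)
```

The residual vanishes up to floating-point error for the true correspondence. In a multiview setting, one such matrix exists for every camera pair, which is where the consistency conditions studied in the paper come in.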

Detection of Face Recognition Adversarial Attacks


Title	Detection of Face Recognition Adversarial Attacks
Authors	Fabio Valerio Massoli, Fabio Carrara, Giuseppe Amato, Fabrizio Falchi
Abstract	Deep Learning methods have become state-of-the-art for solving tasks such as Face Recognition (FR). Unfortunately, despite their success, it has been pointed out that these learning models are exposed to adversarial inputs - images to which an amount of noise imperceptible to humans is added to maliciously fool a neural network - thus limiting their adoption in real-world applications. While it is true that an enormous effort has been spent in order to train robust models against this type of threat, adversarial detection techniques have recently started to draw attention within the scientific community. A detection approach has the advantage that it does not require re-training any model, so it can be added on top of any system. In this context, we present our work on adversarial sample detection in forensics, mainly focused on detecting attacks against FR systems in which the learning model is typically used only as a feature extractor. Thus, in these cases, training a more robust classifier might not be enough to defend an FR system. In this frame, the contribution of our work is four-fold: i) we tested our recently proposed adversarial detection approach against classifier attacks, i.e. adversarial samples crafted to fool an FR neural network acting as a classifier; ii) using a k-Nearest Neighbor (kNN) algorithm as guidance, we generated deep features attacks against an FR system based on a DL model acting as a feature extractor, followed by a kNN which gives back the query identity based on feature similarity; iii) we used the deep features attacks to fool an FR system on the 1:1 Face Verification task and we showed their superior effectiveness with respect to classifier attacks in fooling this type of system; iv) we used the detectors trained on classifier attacks to detect deep features attacks, thus showing that this approach generalizes to different types of attacks.
Tasks	Face Recognition, Face Verification
Published	2019-12-05
URL	https://arxiv.org/abs/1912.02918v1
PDF	https://arxiv.org/pdf/1912.02918v1.pdf
PWC	https://paperswithcode.com/paper/detection-of-face-recognition-adversarial
Repo
Framework

Learning Markov models via low-rank optimization


Title	Learning Markov models via low-rank optimization
Authors	Ziwei Zhu, Xudong Li, Mengdi Wang, Anru Zhang
Abstract	Modeling unknown systems from data is a precursor of system optimization and sequential decision making. In this paper, we focus on learning a Markov model from a single trajectory of states. Suppose that the transition model has a small rank despite a large state space, meaning that the system admits a low-dimensional latent structure. We show that one can estimate the full transition model accurately using a trajectory of length that is proportional to the total number of states. We propose two maximum likelihood estimation methods: a convex approach with nuclear-norm regularization and a nonconvex approach with rank constraint. We show that both estimators enjoy optimal statistical rates in terms of the Kullback-Leibler divergence and the $\ell_2$ error. For computing the nonconvex estimator, we develop a novel DC (difference of convex function) programming algorithm that starts with the convex M-estimator and then successively refines the solution till convergence. Empirical experiments demonstrate consistent superiority of the nonconvex estimator over the convex one.
Tasks	Decision Making
Published	2019-06-28
URL	https://arxiv.org/abs/1907.00113v1
PDF	https://arxiv.org/pdf/1907.00113v1.pdf
PWC	https://paperswithcode.com/paper/learning-markov-models-via-low-rank
Repo
Framework

Forest structure in epigenetic landscapes


Title	Forest structure in epigenetic landscapes
Authors	Yuriria Cortes-Poza, J. Rogelio Perez-Buendia
Abstract	Morphogenesis is the biological process that causes the emergence and changes of patterns (tissues and organs) in living organisms. It is a robust, self-organising mechanism, governed by Genetic Regulatory Networks (GRN), that is not yet thoroughly understood. In this work we propose Epigenetic Forests as a tool to study morphogenesis and to extract valuable information from GRN. Our method unfolds the richness and structure within the GRN. As a case study, we analyze the GRN involved in cell fate determination during the early stages of development of the flower Arabidopsis thaliana and its spatial dynamics. By using a genetic algorithm we optimize cell differentiation in our model and correctly recover the architecture of the flower.
Tasks
Published	2019-03-22
URL	http://arxiv.org/abs/1903.09386v1
PDF	http://arxiv.org/pdf/1903.09386v1.pdf
PWC	https://paperswithcode.com/paper/forest-structure-in-epigenetic-landscapes
Repo
Framework