January 27, 2020

Paper Group ANR 1335

REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning


Title	REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning
Authors	Ming Jiang, Junjie Hu, Qiuyuan Huang, Lei Zhang, Jana Diesner, Jianfeng Gao
Abstract	Popular metrics used for evaluating image captioning systems, such as BLEU and CIDEr, provide a single score to gauge the system’s overall effectiveness. This score is often not informative enough to indicate what specific errors are made by a given system. In this study, we present a fine-grained evaluation method REO for automatically measuring the performance of image captioning systems. REO assesses the quality of captions from three perspectives: 1) Relevance to the ground truth, 2) Extraness of the content that is irrelevant to the ground truth, and 3) Omission of the elements in the images and human references. Experiments on three benchmark datasets demonstrate that our method achieves a higher consistency with human judgments and provides more intuitive evaluation results than alternative metrics.
Tasks	Image Captioning
Published	2019-09-05
URL	https://arxiv.org/abs/1909.02217v1
PDF	https://arxiv.org/pdf/1909.02217v1.pdf
PWC	https://paperswithcode.com/paper/reo-relevance-extraness-omission-a-fine
Repo
Framework

Full-stack Optimization for Accelerating CNNs with FPGA Validation


Title	Full-stack Optimization for Accelerating CNNs with FPGA Validation
Authors	Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong
Abstract	We present a full-stack optimization framework for accelerating inference of CNNs (Convolutional Neural Networks) and validate the approach with field-programmable gate array (FPGA) implementations. By jointly optimizing CNN models, computing architectures, and hardware implementations, our full-stack approach achieves unprecedented performance in the trade-off space characterized by inference latency, energy efficiency, hardware utilization and inference accuracy. As a validation vehicle, we have implemented a 170MHz FPGA inference chip achieving 2.28ms latency for the ImageNet benchmark. The achieved latency is among the lowest reported in the literature while achieving comparable accuracy. However, our chip shines in that it has 9x higher energy efficiency compared to other implementations achieving comparable latency. A highlight of our full-stack approach which contributes to the achieved high energy efficiency is an efficient Selector-Accumulator (SAC) architecture for implementing the multiplier-accumulator (MAC) operation present in any digital CNN hardware. For instance, compared to an FPGA implementation for a traditional 8-bit MAC, SAC substantially reduces required hardware resources (4.85x fewer Look-up Tables) and power consumption (2.48x).
Tasks
Published	2019-05-01
URL	http://arxiv.org/abs/1905.00462v1
PDF	http://arxiv.org/pdf/1905.00462v1.pdf
PWC	https://paperswithcode.com/paper/full-stack-optimization-for-accelerating-cnns
Repo
Framework

Beyond Tracking: Selecting Memory and Refining Poses for Deep Visual Odometry


Title	Beyond Tracking: Selecting Memory and Refining Poses for Deep Visual Odometry
Authors	Fei Xue, Xin Wang, Shunkai Li, Qiuyuan Wang, Junqiu Wang, Hongbin Zha
Abstract	Most previous learning-based visual odometry (VO) methods take VO as a pure tracking problem. In contrast, we present a VO framework by incorporating two additional components called Memory and Refining. The Memory component preserves global information by employing an adaptive and efficient selection strategy. The Refining component ameliorates previous results with the contexts stored in the Memory by adopting a spatial-temporal attention mechanism for feature distilling. Experiments on the KITTI and TUM-RGBD benchmark datasets demonstrate that our method outperforms state-of-the-art learning-based methods by a large margin and produces competitive results against classic monocular VO approaches. Especially, our model achieves outstanding performance in challenging scenarios such as texture-less regions and abrupt motions, where classic VO algorithms tend to fail.
Tasks	Visual Odometry
Published	2019-04-03
URL	http://arxiv.org/abs/1904.01892v2
PDF	http://arxiv.org/pdf/1904.01892v2.pdf
PWC	https://paperswithcode.com/paper/beyond-tracking-selecting-memory-and-refining
Repo
Framework

Deep Network for Capacitive ECG Denoising


Title	Deep Network for Capacitive ECG Denoising
Authors	Vignesh Ravichandran, Balamurali Murugesan, Sharath M Shankaranarayana, Keerthi Ram, Preejith S. P, Jayaraj Joseph, Mohanasankar Sivaprakasam
Abstract	Continuous monitoring of cardiac health under free-living conditions is crucial to provide effective care for patients undergoing postoperative recovery and individuals with high cardiac risk, such as the elderly. Capacitive Electrocardiogram (cECG) is one such technology which allows comfortable and long-term monitoring through its ability to measure biopotential without skin contact. cECG monitoring can be done using many household objects like chairs, beds and even car seats, allowing for seamless monitoring of individuals. This method is unfortunately highly susceptible to motion artifacts, which greatly limits its usage in clinical practice. The current use of cECG systems has been limited to performing rhythmic analysis. In this paper we propose a novel end-to-end deep learning architecture to perform the task of denoising capacitive ECG. The proposed network is trained using motion-corrupted three-channel cECG and a reference LEAD I ECG collected on individuals while driving a car. Further, we also propose a novel joint loss function to apply loss on both the signal and frequency domains. We conduct extensive rhythmic analysis on the model predictions and the ground truth. We further evaluate the signal denoising using Mean Square Error (MSE) and Cross Correlation between model predictions and ground truth. We report MSE of 0.167 and Cross Correlation of 0.476. The reported results highlight the feasibility of performing morphological analysis using the filtered cECG. The proposed approach can allow for continuous and comprehensive monitoring of individuals in free-living conditions.
Tasks	Denoising, ECG Denoising, Electrocardiography (ECG), Morphological Analysis
Published	2019-03-29
URL	http://arxiv.org/abs/1903.12536v1
PDF	http://arxiv.org/pdf/1903.12536v1.pdf
PWC	https://paperswithcode.com/paper/deep-network-for-capacitive-ecg-denoising
Repo
Framework
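
The abstract above mentions a joint loss applied in both the signal and frequency domains, evaluated with MSE and cross correlation. The sketch below is only an illustration of that general idea, not the authors' loss: the function names, the `freq_weight` parameter, the use of DFT magnitudes, and the zero-lag normalized definition of cross correlation are all assumptions made here.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitude spectrum via a naive O(n^2) DFT (fine for short examples)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(pred, target, freq_weight=0.5):
    """Hypothetical joint loss: time-domain MSE plus weighted spectral MSE."""
    time_term = mse(pred, target)
    freq_term = mse(dft_magnitudes(pred), dft_magnitudes(target))
    return time_term + freq_weight * freq_term


def cross_correlation(a, b):
    """Zero-lag normalized cross correlation (an assumed definition)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


if __name__ == "__main__":
    clean = [math.sin(2 * math.pi * t / 16) for t in range(32)]
    noisy = [x + 0.1 * math.cos(2 * math.pi * 5 * t / 32)
             for t, x in enumerate(clean)]
    print("joint loss:", joint_loss(noisy, clean))
    print("cross correlation:", cross_correlation(noisy, clean))
```

Comparing magnitude spectra ignores phase, so in this sketch the time-domain term is what penalizes misaligned waveforms; how the two terms should be weighted is not stated in the abstract.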

Multi-agent Reinforcement Learning Embedded Game for the Optimization of Building Energy Control and Power System Planning


Title	Multi-agent Reinforcement Learning Embedded Game for the Optimization of Building Energy Control and Power System Planning
Authors	Jun Hao
Abstract	Most of the current game-theoretic demand-side management methods focus primarily on the scheduling of home appliances, and the related numerical experiments are analyzed under various scenarios to achieve the corresponding Nash-equilibrium (NE) and optimal results. However, not much work has been conducted for academic or commercial buildings. The methods for optimizing academic buildings are distinct from the optimal methods for home appliances. In this study, we present a novel methodology to control the operation of heating, ventilation, and air conditioning (HVAC) systems. With the development of Artificial Intelligence and computer technologies, reinforcement learning (RL) can be implemented in multiple realistic scenarios and help people to solve thousands of real-world problems. Reinforcement Learning, which is considered the art of future AI, builds the bridge between agents and environments through Markov Decision Chain or Neural Network and has seldom been used in power systems. The art of RL is that once the simulator for a specific environment is built, the algorithm can keep learning from the environment. Therefore, RL is capable of dealing with constantly changing simulator inputs such as power demand, the condition of the power system, and outdoor temperature. Compared with the existing distribution power system planning mechanisms and the related game-theoretical methodologies, our proposed algorithm can plan and optimize the hourly energy usage, and can work with even shorter time windows if needed.
Tasks	Multi-agent Reinforcement Learning
Published	2019-01-17
URL	http://arxiv.org/abs/1901.07333v1
PDF	http://arxiv.org/pdf/1901.07333v1.pdf
PWC	https://paperswithcode.com/paper/multi-agent-reinforcement-learning-embedded
Repo
Framework

CoCoNet: A Collaborative Convolutional Network


Title	CoCoNet: A Collaborative Convolutional Network
Authors	Tapabrata Chakraborti, Brendan McCane, Steven Mills, Umapada Pal
Abstract	We present an end-to-end CNN architecture for fine-grained visual recognition called Collaborative Convolutional Network (CoCoNet). The network uses a collaborative filter after the convolutional layers to represent an image as an optimal weighted collaboration of features learned from training samples as a whole rather than one at a time. This gives CoCoNet more power to encode the fine-grained nature of the data with limited samples in an end-to-end fashion. We perform a detailed study of the performance with 1-stage and 2-stage transfer learning and different configurations with benchmark architectures like AlexNet and VggNet. The ablation study shows that the proposed method outperforms its constituent parts considerably and consistently. CoCoNet also outperforms the baseline popular deep learning based fine-grained recognition method, namely Bilinear-CNN (BCNN) with statistical significance. Experiments have been performed on the fine-grained species recognition problem, but the method is general enough to be applied to other similar tasks. Lastly, we also introduce a new public dataset for fine-grained species recognition, that of Indian endemic birds and have reported initial results on it. The training metadata and new dataset are available through the corresponding author.
Tasks	Fine-Grained Visual Recognition, Transfer Learning
Published	2019-01-28
URL	https://arxiv.org/abs/1901.09886v3
PDF	https://arxiv.org/pdf/1901.09886v3.pdf
PWC	https://paperswithcode.com/paper/coconet-a-collaborative-convolutional-network
Repo
Framework

TCD-NPE: A Re-configurable and Efficient Neural Processing Engine, Powered by Novel Temporal-Carry-deferring MACs


Title	TCD-NPE: A Re-configurable and Efficient Neural Processing Engine, Powered by Novel Temporal-Carry-deferring MACs
Authors	Ali Mirzaeian, Houman Homayoun, Avesta Sasan
Abstract	In this paper, we first propose the design of Temporal-Carry-deferring MAC (TCD-MAC) and illustrate how our proposed solution can gain significant energy and performance benefits when utilized to process a stream of input data. We then propose using the TCD-MAC to build a reconfigurable, high-speed, and low-power Neural Processing Engine (TCD-NPE). We further propose a novel scheduler that lists the sequence of needed processing events to process an MLP model in the least number of computational rounds in our proposed TCD-NPE. We illustrate that our proposed TCD-NPE significantly outperforms similar neural processing solutions that use conventional MACs in terms of both energy consumption and execution time.
Tasks
Published	2019-10-14
URL	https://arxiv.org/abs/1910.06458v1
PDF	https://arxiv.org/pdf/1910.06458v1.pdf
PWC	https://paperswithcode.com/paper/tcd-npe-a-re-configurable-and-efficient
Repo
Framework

Are Disentangled Representations Helpful for Abstract Visual Reasoning?


Title	Are Disentangled Representations Helpful for Abstract Visual Reasoning?
Authors	Sjoerd van Steenkiste, Francesco Locatello, Jürgen Schmidhuber, Olivier Bachem
Abstract	A disentangled representation encodes information about the salient factors of variation in the data independently. Although it is often argued that this representational format is useful in learning to solve many real-world down-stream tasks, there is little empirical evidence that supports this claim. In this paper, we conduct a large-scale study that investigates whether disentangled representations are more suitable for abstract reasoning tasks. Using two new tasks similar to Raven’s Progressive Matrices, we evaluate the usefulness of the representations learned by 360 state-of-the-art unsupervised disentanglement models. Based on these representations, we train 3600 abstract reasoning models and observe that disentangled representations do in fact lead to better down-stream performance. In particular, they enable quicker learning using fewer samples.
Tasks	Visual Reasoning
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12506v3
PDF	https://arxiv.org/pdf/1905.12506v3.pdf
PWC	https://paperswithcode.com/paper/are-disentangled-representations-helpful-for
Repo
Framework

Exploiting Cross-Lingual Speaker and Phonetic Diversity for Unsupervised Subword Modeling


Title	Exploiting Cross-Lingual Speaker and Phonetic Diversity for Unsupervised Subword Modeling
Authors	Siyuan Feng, Tan Lee
Abstract	This research addresses the problem of acoustic modeling of low-resource languages for which transcribed training data is absent. The goal is to learn robust frame-level feature representations that can be used to identify and distinguish subword-level speech units. The proposed feature representations comprise various types of multilingual bottleneck features (BNFs) that are obtained via multi-task learning of deep neural networks (MTL-DNN). One of the key problems is how to acquire high-quality frame labels for untranscribed training data to facilitate supervised DNN training. It is shown that learning of robust BNF representations can be achieved by effectively leveraging transcribed speech data and well-trained automatic speech recognition (ASR) systems from one or more out-of-domain (resource-rich) languages. Out-of-domain ASR systems can be applied to perform speaker adaptation with untranscribed training data of the target language, and to decode the training speech into frame-level labels for DNN training. It is also found that better frame labels can be generated by considering temporal dependency in speech when performing frame clustering. The proposed methods of feature learning are evaluated on the standard task of unsupervised subword modeling in Track 1 of the ZeroSpeech 2017 Challenge. The best performance achieved by our system is 9.7% in terms of across-speaker triphone minimal-pair ABX error rate, which is comparable to the best systems reported recently. Lastly, our investigation reveals that the closeness between target languages and out-of-domain languages and the amount of available training data for individual target languages could have significant impact on the goodness of learned features.
Tasks	Multi-Task Learning, Speech Recognition
Published	2019-08-09
URL	https://arxiv.org/abs/1908.03538v2
PDF	https://arxiv.org/pdf/1908.03538v2.pdf
PWC	https://paperswithcode.com/paper/exploiting-cross-lingual-speaker-and-phonetic
Repo
Framework

Accelerating DNN Training in Wireless Federated Edge Learning System


Title	Accelerating DNN Training in Wireless Federated Edge Learning System
Authors	Jinke Ren, Guanding Yu, Guangyao Ding
Abstract	Training of classical machine learning models, such as deep neural networks (DNN), is generally implemented at a remote, computationally adequate cloud center for centralized learning, which is typically time-consuming and resource-hungry. It also incurs serious privacy issues and long communication latency since massive data are transmitted to the centralized node. To overcome these shortcomings, we consider a newly emerged framework, namely federated edge learning (FEEL), to aggregate the local learning updates at the edge server instead of users’ raw data. Aiming at accelerating the training process while guaranteeing the learning accuracy, we first define a novel performance evaluation criterion, called learning efficiency, and formulate a training acceleration optimization problem in the CPU scenario, where each user device is equipped with a CPU. The closed-form expressions for joint batchsize selection and communication resource allocation are developed and some insightful results are also highlighted. Further, we extend our learning framework into the GPU scenario and propose a novel training function to characterize the learning property of general GPU modules. The optimal solution in this case is shown to have a structure similar to that of the CPU scenario, suggesting that our proposed algorithm is applicable in more general systems. Finally, extensive experiments validate our theoretical analysis and demonstrate that our proposal can reduce the training time and improve the learning accuracy simultaneously.
Tasks
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09712v2
PDF	https://arxiv.org/pdf/1905.09712v2.pdf
PWC	https://paperswithcode.com/paper/accelerating-dnn-training-in-wireless
Repo
Framework

Studying Topology of Time Lines Graph leads to an alternative approach to the Newcomb’s Paradox


Title	Studying Topology of Time Lines Graph leads to an alternative approach to the Newcomb’s Paradox
Authors	Giuseppe Giacopelli
Abstract	Newcomb’s paradox is one of the best-known paradoxes in game theory concerning oracles. We define the graph associated with the time lines of the game. Then, by studying its topology and using only the Expected Utility Principle, we formulate a solution of the paradox that explains all the classical cases.
Tasks
Published	2019-10-15
URL	https://arxiv.org/abs/1910.09311v1
PDF	https://arxiv.org/pdf/1910.09311v1.pdf
PWC	https://paperswithcode.com/paper/studying-topology-of-time-lines-graph-leads
Repo
Framework
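
The abstract above relies on the Expected Utility Principle. As a rough illustration of that principle only (not the paper's time-line graph construction), the sketch below compares the expected utility of taking one box versus two boxes when the predictor is right with probability `accuracy`. The payoff values are the ones conventionally used when the paradox is stated and are assumptions here, as are the function and parameter names.

```python
def expected_utilities(accuracy, big=1_000_000, small=1_000):
    """Return (one_box, two_box) expected utilities.

    accuracy: probability that the predictor foresees the choice correctly.
    big:      assumed content of the opaque box if one-boxing was predicted.
    small:    assumed content of the transparent box.
    """
    # One-boxing pays `big` only when the predictor was right.
    one_box = accuracy * big
    # Two-boxing always pays `small`, plus `big` when the predictor was wrong.
    two_box = small + (1 - accuracy) * big
    return one_box, two_box


def break_even_accuracy(big=1_000_000, small=1_000):
    """Accuracy at which both choices have the same expected utility."""
    return (big + small) / (2 * big)


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.9, 0.99):
        one, two = expected_utilities(p)
        print(f"accuracy={p}: one-box={one:.0f}, two-box={two:.0f}")
    print("break-even accuracy:", break_even_accuracy())
```

Under these assumed payoffs, one-boxing has the higher expected utility once the predictor is only slightly better than chance, which is the tension with the dominance argument that makes the problem a paradox.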

Online Allocation and Pricing: Constant Regret via Bellman Inequalities


Title	Online Allocation and Pricing: Constant Regret via Bellman Inequalities
Authors	Alberto Vera, Siddhartha Banerjee, Itai Gurvich
Abstract	We develop a framework for designing tractable heuristics for Markov Decision Processes (MDP), and use it to obtain constant regret policies for a variety of online allocation problems, including online packing, budget-constrained probing, dynamic pricing, and online contextual bandits with knapsacks. Our approach is based on adaptively constructing a benchmark for the value function, which we then use to select our actions. The centerpiece of our framework is the Bellman Inequalities, which allow us to create benchmarks that both have access to future information and can violate the one-step optimality equations (i.e., Bellman equations). The flexibility of balancing these allows us to get policies which are both tractable and have strong performance guarantees; in particular, our constant-regret policies only require solving an LP for selecting each action.
Tasks	Multi-Armed Bandits
Published	2019-06-14
URL	https://arxiv.org/abs/1906.06361v1
PDF	https://arxiv.org/pdf/1906.06361v1.pdf
PWC	https://paperswithcode.com/paper/online-allocation-and-pricing-constant-regret
Repo
Framework

Bootstrapping Upper Confidence Bound


Title	Bootstrapping Upper Confidence Bound
Authors	Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng
Abstract	Upper Confidence Bound (UCB) method is arguably the most celebrated one used in online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration inequalities, which thus lead to over-exploration. In this paper, we propose a non-parametric and data-dependent UCB algorithm based on the multiplier bootstrap. To improve its finite sample performance, we further incorporate second-order correction into the above construction. In theory, we derive both problem-dependent and problem-independent regret bounds for multi-armed bandits under a much weaker tail assumption than the standard sub-Gaussianity. Numerical results demonstrate significant regret reductions by our method, in comparison with several baselines in a range of multi-armed and linear bandit problems.
Tasks	Decision Making, Multi-Armed Bandits
Published	2019-06-12
URL	https://arxiv.org/abs/1906.05247v3
PDF	https://arxiv.org/pdf/1906.05247v3.pdf
PWC	https://paperswithcode.com/paper/bootstrapping-upper-confidence-bound
Repo
Framework
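
For context on the abstract above, the sketch below shows the classic UCB1 rule, whose exploration bonus comes from a concentration inequality. It is the kind of baseline the abstract contrasts against, not the multiplier-bootstrap algorithm the paper proposes. The Bernoulli reward model, the function name `ucb1`, and its parameters are assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run classic UCB1 on Bernoulli arms and return pull counts per arm.

    arm_means: true success probabilities (unknown to the algorithm).
    horizon:   total number of rounds to play.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Empirical mean plus a concentration-based exploration bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts


if __name__ == "__main__":
    pulls = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print("pulls per arm:", pulls)
```

The bonus term here depends only on pull counts, not on the observed reward distribution; a data-dependent bound such as the one the abstract describes would replace that term.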

Detecting anthropogenic cloud perturbations with deep learning


Title	Detecting anthropogenic cloud perturbations with deep learning
Authors	Duncan Watson-Parris, Samuel Sutherland, Matthew Christensen, Anthony Caterini, Dino Sejdinovic, Philip Stier
Abstract	One of the most pressing questions in climate science is that of the effect of anthropogenic aerosol on the Earth’s energy balance. Aerosols provide the ‘seeds’ on which cloud droplets form, and changes in the amount of aerosol available to a cloud can change its brightness and other physical properties such as optical thickness and spatial extent. Clouds play a critical role in moderating global temperatures and small perturbations can lead to significant amounts of cooling or warming. Uncertainty in this effect is so large it is not currently known if it is negligible, or provides a large enough cooling to largely negate present-day warming by CO2. This work uses deep convolutional neural networks to look for two particular perturbations in clouds due to anthropogenic aerosol and assess their properties and prevalence, providing valuable insights into their climatic effects.
Tasks
Published	2019-11-29
URL	https://arxiv.org/abs/1911.13061v1
PDF	https://arxiv.org/pdf/1911.13061v1.pdf
PWC	https://paperswithcode.com/paper/detecting-anthropogenic-cloud-perturbations
Repo
Framework

Understanding Bias in Machine Learning


Title	Understanding Bias in Machine Learning
Authors	Jindong Gu, Daniela Oelke
Abstract	Bias is known to be an impediment to fair decisions in many domains such as human resources, the public sector, and health care. Recently, hope has been expressed that the use of machine learning methods for making such decisions would diminish or even resolve the problem. At the same time, machine learning experts warn that machine learning models can be biased as well. In this article, our goal is to explain the issue of bias in machine learning from a technical perspective and to illustrate the impact that biased data can have on a machine learning model. To reach this goal, we develop interactive plots to visualize the bias learned from synthetic data.
Tasks
Published	2019-09-02
URL	https://arxiv.org/abs/1909.01866v1
PDF	https://arxiv.org/pdf/1909.01866v1.pdf
PWC	https://paperswithcode.com/paper/understanding-bias-in-machine-learning
Repo
Framework