January 27, 2020

Paper Group ANR 1190

LED2Net: Deep Illumination-aware Dehazing with Low-light and Detail Enhancement. Deep Scale-spaces: Equivariance Over Scale. Learning Invariant Representations for Sentiment Analysis: The Missing Material is Datasets. A Distraction Score for Watermarks. Learning to Sit: Synthesizing Human-Chair Interactions via Hierarchical Control. Proceedings 7th …

LED2Net: Deep Illumination-aware Dehazing with Low-light and Detail Enhancement


Title	LED2Net: Deep Illumination-aware Dehazing with Low-light and Detail Enhancement
Authors	Guisik Kim, Junseok Kwon
Abstract	We present a novel dehazing and low-light enhancement method based on an illumination map that is accurately estimated by a convolutional neural network (CNN). In this paper, the illumination map is used as a component for three different tasks, namely, atmospheric light estimation, transmission map estimation, and low-light enhancement. To train CNNs for dehazing and low-light enhancement simultaneously based on the retinex theory, we synthesize numerous low-light and hazy images from normal hazy images from the FADE data set. In addition, we further improve the network using detail enhancement. Experimental results demonstrate that our method surpasses recent state-of-the-art algorithms quantitatively and qualitatively. In particular, our haze-free images present vivid colors and enhance visibility without a halo effect or color distortion.
Tasks
Published	2019-06-12
URL	https://arxiv.org/abs/1906.05119v2
PDF	https://arxiv.org/pdf/1906.05119v2.pdf
PWC	https://paperswithcode.com/paper/led2net-deep-illumination-aware-dehazing-with
Repo
Framework
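
The abstract above names atmospheric light estimation and transmission map estimation as two of its tasks. These are the two unknowns of the standard atmospheric scattering model, I = J * t + A * (1 - t), which is widely used in dehazing. The sketch below only inverts that model for a list of pixel intensities; it is not the authors' network. The function name, the `t_min` floor, and the assumption that `airlight` and `transmission` are already known are all illustrative.

```python
def recover_scene_radiance(hazy, transmission, airlight, t_min=0.1):
    """Invert I = J * t + A * (1 - t) per pixel to estimate J.

    hazy:         observed intensities I, floats in [0, 1]
    transmission: per-pixel transmission t, floats in (0, 1]
    airlight:     global atmospheric light A (assumed already estimated)
    t_min:        hypothetical lower bound on t, so that pixels with very
                  low transmission do not amplify noise
    """
    if len(hazy) != len(transmission):
        raise ValueError("hazy and transmission must have the same length")

    restored = []
    for i_val, t_val in zip(hazy, transmission):
        t_safe = max(t_val, t_min)
        # Rearranging the model gives J = (I - A) / t + A.
        j_val = (i_val - airlight) / t_safe + airlight
        # Clip to the valid intensity range.
        restored.append(min(1.0, max(0.0, j_val)))
    return restored


if __name__ == "__main__":
    # Synthesize hazy pixels from known values, then invert the model.
    clean = [0.2, 0.5, 0.8]
    t_map = [0.5, 0.8, 0.3]
    a_light = 0.9
    hazy_pixels = [j * t + a_light * (1 - t) for j, t in zip(clean, t_map)]
    print(recover_scene_radiance(hazy_pixels, t_map, a_light))
```

In a learned pipeline the hard part is estimating `transmission` and `airlight`; once they are available, the inversion itself is this one line of arithmetic.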

Deep Scale-spaces: Equivariance Over Scale


Title	Deep Scale-spaces: Equivariance Over Scale
Authors	Daniel E. Worrall, Max Welling
Abstract	We introduce deep scale-spaces (DSS), a generalization of convolutional neural networks, exploiting the scale symmetry structure of conventional image recognition tasks. Put plainly, the class of an image is invariant to the scale at which it is viewed. We construct scale equivariant cross-correlations based on a principled extension of convolutions, grounded in the theory of scale-spaces and semigroups. As a very basic operation, these cross-correlations can be used in almost any modern deep learning architecture in a plug-and-play manner. We demonstrate our networks on the Patch Camelyon and Cityscapes datasets, to prove their utility and perform introspective studies to further understand their properties.
Tasks
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11697v1
PDF	https://arxiv.org/pdf/1905.11697v1.pdf
PWC	https://paperswithcode.com/paper/deep-scale-spaces-equivariance-over-scale
Repo
Framework

Learning Invariant Representations for Sentiment Analysis: The Missing Material is Datasets


Title	Learning Invariant Representations for Sentiment Analysis: The Missing Material is Datasets
Authors	Victor Bouvier, Philippe Very, Céline Hudelot, Clément Chastagnol
Abstract	Learning representations that remain invariant to a nuisance factor is of great interest in Domain Adaptation, Transfer Learning, and Fair Machine Learning. Finding such representations becomes highly challenging in NLP tasks since the nuisance factor is entangled in raw text. To our knowledge, a major issue is also that only a few NLP datasets allow assessing the impact of such a factor. In this paper, we introduce two generalization metrics to assess model robustness to a nuisance factor: generalization under target bias and generalization onto unknown. We combine those metrics with a simple data filtering approach to control the impact of the nuisance factor on the data and thus to build experimental biased datasets. We apply our method to standard datasets of the literature (Amazon and Yelp). Our work shows that a simple text classification baseline (i.e., sentiment analysis on reviews) may be badly affected by the product ID (considered as a nuisance factor) when learning the polarity of a review. The proposed method is generic and applicable as soon as the nuisance variable is annotated in the dataset.
Tasks	Domain Adaptation, Sentiment Analysis, Text Classification, Transfer Learning
Published	2019-07-29
URL	https://arxiv.org/abs/1907.12305v1
PDF	https://arxiv.org/pdf/1907.12305v1.pdf
PWC	https://paperswithcode.com/paper/learning-invariant-representations-for
Repo
Framework

A Distraction Score for Watermarks


Title	A Distraction Score for Watermarks
Authors	Aurelia Guy, Sema Berkiten
Abstract	In this work we propose a novel technique to quantify how distracting watermarks are on an image. We begin with watermark detection using a two-tower CNN model composed of a binary classification task and a semantic segmentation prediction. With this model, we demonstrate significant improvement in image precision while maintaining per-pixel accuracy, especially for our real-world dataset with sparse positive examples. We fit a nonlinear function to represent detected watermarks by a single score correlated with human perception based on their size, location, and visual obstructiveness. Finally, we validate our method in an image ranking setup, which is the main application of our watermark scoring algorithm.
Tasks	Semantic Segmentation
Published	2019-08-09
URL	https://arxiv.org/abs/1908.03651v1
PDF	https://arxiv.org/pdf/1908.03651v1.pdf
PWC	https://paperswithcode.com/paper/a-distraction-score-for-watermarks
Repo
Framework

Learning to Sit: Synthesizing Human-Chair Interactions via Hierarchical Control


Title	Learning to Sit: Synthesizing Human-Chair Interactions via Hierarchical Control
Authors	Yu-Wei Chao, Jimei Yang, Weifeng Chen, Jia Deng
Abstract	Recent progress on physics-based character animation has shown impressive breakthroughs on human motion synthesis, through the imitation of motion capture data via deep reinforcement learning. However, results have mostly been demonstrated on imitating a single distinct motion pattern, and do not generalize to interactive tasks that require flexible motion patterns due to varying human-object spatial configurations. In this paper, we focus on one class of interactive task—sitting onto a chair. We propose a hierarchical reinforcement learning framework which relies on a collection of subtask controllers trained to imitate simple, reusable mocap motions, and a meta controller trained to execute the subtasks properly to complete the main task. We experimentally demonstrate the strength of our approach over different single level and hierarchical baselines. We also show that our approach can be applied to motion prediction given an image input. A video highlight can be found at https://youtu.be/3CeN0OGz2cA.
Tasks	Hierarchical Reinforcement Learning, Motion Capture, Motion Prediction
Published	2019-08-20
URL	https://arxiv.org/abs/1908.07423v1
PDF	https://arxiv.org/pdf/1908.07423v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-sit-synthesizing-human-chair
Repo
Framework

Proceedings 7th International Workshop on Theorem proving components for Educational software


Title	Proceedings 7th International Workshop on Theorem proving components for Educational software
Authors	Pedro Quaresma, Walther Neuper
Abstract	The 7th International Workshop on Theorem proving components for Educational software (ThEdu’18) was held in Oxford, United Kingdom, on 18 July 2018. It was associated to the conference, Federated Logic Conference 2018 (FLoC2018). The major aim of the ThEdu workshop series was to link developers interested in adapting Computer Theorem Proving (TP) to the needs of education and to inform mathematicians and mathematics educators about TP’s potential for educational software. Topics of interest include: methods of automated deduction applied to checking students’ input; methods of automated deduction applied to prove post-conditions for particular problem solutions; combinations of deduction and computation enabling systems to propose next steps; automated provers specific for dynamic geometry systems; proof and proving in mathematics education. ThEdu’18 was a vibrant workshop, with one invited talk and six contributions. It triggered the post-proceedings at hand.
Tasks	Automated Theorem Proving
Published	2019-03-29
URL	http://arxiv.org/abs/1903.12402v1
PDF	http://arxiv.org/pdf/1903.12402v1.pdf
PWC	https://paperswithcode.com/paper/proceedings-7th-international-workshop-on
Repo
Framework

Sensor fusion using EMG and vision for hand gesture classification in mobile applications


Title	Sensor fusion using EMG and vision for hand gesture classification in mobile applications
Authors	Enea Ceolini, Gemma Taverni, Lyes Khacef, Melika Payvand, Elisa Donati
Abstract	The discrimination of human gestures using wearable solutions is extremely important as a supporting technique for assisted living, healthcare of the elderly and neurorehabilitation. This paper presents a mobile electromyography (EMG) analysis framework to be an auxiliary component in physiotherapy sessions or as a feedback for neuroprosthesis calibration. We implemented a framework that allows the integration of multisensors, EMG and visual information, to perform sensor fusion and to improve the accuracy of hand gesture recognition tasks. In particular, we used an event-based camera adapted to run on the limited computational resources of mobile phones. We introduced a new publicly available dataset of sensor fusion for hand gesture recognition recorded from 10 subjects and used it to train the recognition models offline. We compare the online results of the hand gesture recognition using the fusion approach with the individual sensors with an improvement in the accuracy of 13% and 11%, for EMG and vision respectively, reaching 85%.
Tasks	Calibration, Electromyography (EMG), Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition, Sensor Fusion
Published	2019-10-19
URL	https://arxiv.org/abs/1910.11126v1
PDF	https://arxiv.org/pdf/1910.11126v1.pdf
PWC	https://paperswithcode.com/paper/sensor-fusion-using-emg-and-vision-for-hand
Repo
Framework
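
The abstract above reports that fusing EMG and vision improves gesture recognition accuracy but does not say how the two modalities are combined. One common option is late fusion, where each sensor's classifier outputs class probabilities and the two vectors are blended. The sketch below illustrates only that generic idea; the function names, the equal default weight, and the example probabilities are assumptions and are not taken from the paper.

```python
def fuse_probabilities(emg_probs, vision_probs, emg_weight=0.5):
    """Late fusion: weighted average of two class-probability vectors.

    emg_weight is a hypothetical tuning knob in [0, 1]; the vision
    classifier receives the remaining weight.
    """
    if len(emg_probs) != len(vision_probs):
        raise ValueError("both classifiers must score the same classes")
    if not 0.0 <= emg_weight <= 1.0:
        raise ValueError("emg_weight must lie in [0, 1]")

    fused = [
        emg_weight * e + (1.0 - emg_weight) * v
        for e, v in zip(emg_probs, vision_probs)
    ]
    total = sum(fused)
    if total == 0:
        raise ValueError("fused scores sum to zero")
    # Renormalize so the result is again a probability vector.
    return [f / total for f in fused]


def predict(probs):
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    emg = [0.6, 0.3, 0.1]     # made-up EMG classifier output
    vision = [0.2, 0.7, 0.1]  # made-up vision classifier output
    fused = fuse_probabilities(emg, vision)
    print(fused, predict(fused))
```

With these made-up inputs the EMG classifier alone would pick class 0, while the fused vector picks class 1, which shows how a confident second sensor can overturn a weaker single-sensor decision.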

A New GNG Graph-Based Hand Gesture Recognition Approach


Title	A New GNG Graph-Based Hand Gesture Recognition Approach
Authors	Narges Mirehi, Maryam Tahmasbi
Abstract	Hand Gesture Recognition (HGR) is of major importance for Human-Computer Interaction (HCI) applications. In this paper, we present a new hand gesture recognition approach called GNG-IEMD. In this approach, first, we use a Growing Neural Gas (GNG) graph to model the image. Then we extract features from this graph. These features are not geometric or pixel-based, so do not depend on scale, rotation, and articulation. The dissimilarity between hand gestures is measured with a novel Improved Earth Mover's Distance (IEMD) metric. We evaluate the performance of the proposed approach on challenging public datasets including NTU Hand Digits, HKU, HKU multi-angle, and UESTC-ASL and compare the results with state-of-the-art approaches. The experimental results demonstrate the performance of the proposed approach.
Tasks	Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition
Published	2019-09-08
URL	https://arxiv.org/abs/1909.03534v1
PDF	https://arxiv.org/pdf/1909.03534v1.pdf
PWC	https://paperswithcode.com/paper/a-new-gng-graph-based-hand-gesture
Repo
Framework
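
The abstract above measures gesture dissimilarity with an improved Earth Mover's Distance. The improved variant is not described there, so the sketch below shows only the classic EMD in its simplest setting: two histograms with equal total mass over unit-spaced bins, where the distance reduces to a sum of absolute cumulative differences. The function name and the example histograms are illustrative assumptions.

```python
def emd_1d(p, q):
    """Classic Earth Mover's Distance between two 1D histograms.

    Assumes equal total mass and bins spaced one unit apart. The mass
    that must cross each bin boundary is the running difference of the
    two histograms, so the total cost is the sum of its absolute values.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")

    total_cost = 0.0
    carried = 0.0  # surplus mass carried over to the next bin
    for p_i, q_i in zip(p, q):
        carried += p_i - q_i
        total_cost += abs(carried)
    return total_cost


if __name__ == "__main__":
    # Moving one unit of mass across two bins costs 2.
    print(emd_1d([1, 0, 0], [0, 0, 1]))
```

Unlike a bin-by-bin comparison, this distance grows with how far the mass has to travel, which is why EMD-style metrics are popular for comparing shape descriptors.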

Explainable Text Classification in Legal Document Review A Case Study of Explainable Predictive Coding


Title	Explainable Text Classification in Legal Document Review A Case Study of Explainable Predictive Coding
Authors	Rishi Chhatwal, Peter Gronvall, Nathaniel Huber-Fliflet, Robert Keeling, Jianping Zhang, Haozhen Zhao
Abstract	In today’s legal environment, lawsuits and regulatory investigations require companies to embark upon increasingly intensive data-focused engagements to identify, collect and analyze large quantities of data. When documents are staged for review the process can require companies to dedicate an extraordinary level of resources, both with respect to human resources, but also with respect to the use of technology-based techniques to intelligently sift through data. For several years, attorneys have been using a variety of tools to conduct this exercise, and most recently, they are accepting the use of machine learning techniques like text classification to efficiently cull massive volumes of data to identify responsive documents for use in these matters. In recent years, a group of AI and Machine Learning researchers have been actively researching Explainable AI. In an explainable AI system, actions or decisions are human understandable. In typical legal ‘document review’ scenarios, a document can be identified as responsive, as long as one or more of the text snippets in a document are deemed responsive. In these scenarios, if predictive coding can be used to locate these responsive snippets, then attorneys could easily evaluate the model’s document classification decision. When deployed with defined and explainable results, predictive coding can drastically enhance the overall quality and speed of the document review process by reducing the time it takes to review documents. The authors of this paper propose the concept of explainable predictive coding and simple explainable predictive coding methods to locate responsive snippets within responsive documents. We also report our preliminary experimental results using the data from an actual legal matter that entailed this type of document review.
Tasks	Document Classification, Text Classification
Published	2019-04-03
URL	http://arxiv.org/abs/1904.01721v1
PDF	http://arxiv.org/pdf/1904.01721v1.pdf
PWC	https://paperswithcode.com/paper/explainable-text-classification-in-legal
Repo
Framework

Legal Judgment Prediction via Multi-Perspective Bi-Feedback Network


Title	Legal Judgment Prediction via Multi-Perspective Bi-Feedback Network
Authors	Wenmian Yang, Weijia Jia, Xiaojie Zhou, Yutao Luo
Abstract	The Legal Judgment Prediction (LJP) is to determine judgment results based on the fact descriptions of the cases. LJP usually consists of multiple subtasks, such as applicable law articles prediction, charges prediction, and the term of the penalty prediction. These multiple subtasks have topological dependencies, the results of which affect and verify each other. However, existing methods use dependencies of results among multiple subtasks inefficiently. Moreover, for cases with similar descriptions but different penalties, current methods cannot predict accurately because the word collocation information is ignored. In this paper, we propose a Multi-Perspective Bi-Feedback Network with the Word Collocation Attention mechanism based on the topology structure among subtasks. Specifically, we design a multi-perspective forward prediction and backward verification framework to utilize result dependencies among multiple subtasks effectively. To distinguish cases with similar descriptions but different penalties, we integrate word collocations features of fact descriptions into the network via an attention mechanism. The experimental results show our model achieves significant improvements over baselines on all prediction tasks.
Tasks
Published	2019-05-10
URL	https://arxiv.org/abs/1905.03969v2
PDF	https://arxiv.org/pdf/1905.03969v2.pdf
PWC	https://paperswithcode.com/paper/legal-judgment-prediction-via-multi
Repo
Framework

Adversarial Attacks on Graph Neural Networks via Meta Learning


Title	Adversarial Attacks on Graph Neural Networks via Meta Learning
Authors	Daniel Zügner, Stephan Günnemann
Abstract	Deep learning models for graphs have advanced the state of the art on many tasks. Despite their recent success, little is known about their robustness. We investigate training time attacks on graph neural networks for node classification that perturb the discrete graph structure. Our core principle is to use meta-gradients to solve the bilevel problem underlying training-time attacks, essentially treating the graph as a hyperparameter to optimize. Our experiments show that small graph perturbations consistently lead to a strong decrease in performance for graph convolutional networks, and even transfer to unsupervised embeddings. Remarkably, the perturbations created by our algorithm can misguide the graph neural networks such that they perform worse than a simple baseline that ignores all relational information. Our attacks do not assume any knowledge about or access to the target classifiers.
Tasks	Meta-Learning, Node Classification
Published	2019-02-22
URL	http://arxiv.org/abs/1902.08412v1
PDF	http://arxiv.org/pdf/1902.08412v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-attacks-on-graph-neural-networks
Repo
Framework

Leveraging Deep Graph-Based Text Representation for Sentiment Polarity Applications


Title	Leveraging Deep Graph-Based Text Representation for Sentiment Polarity Applications
Authors	Kayvan Bijari, Hadi Zare, Emad Kebriaei, Hadi Veisi
Abstract	Over the last few years, machine learning over graph structures has manifested a significant enhancement in text mining applications such as event detection, opinion mining, and news recommendation. One of the primary challenges in this regard is structuring a graph that encodes and encompasses the features of textual data for an effective machine learning algorithm. Besides, exploring and exploiting semantic relations is regarded as a principal step in text mining applications. However, most of the traditional text mining methods perform somewhat poorly in terms of employing such relations. In this paper, we propose a sentence-level graph-based text representation which includes stop words to consider semantic and term relations. Then, we employ a representation learning approach on the combined graphs of sentences to extract the latent and continuous features of the documents. Eventually, the learned features of the documents are fed into a deep neural network for the sentiment classification task. The experimental results demonstrate that the proposed method substantially outperforms the related sentiment analysis approaches based on several benchmark datasets. Furthermore, our method can be generalized on different datasets without any dependency on pre-trained word embeddings.
Tasks	Document Classification, Information Retrieval, Opinion Mining, Representation Learning, Sentiment Analysis, Word Embeddings
Published	2019-02-23
URL	https://arxiv.org/abs/1902.10247v3
PDF	https://arxiv.org/pdf/1902.10247v3.pdf
PWC	https://paperswithcode.com/paper/deep-sentiment-analysis-using-a-graph-based
Repo
Framework

Unsupervised Pose Flow Learning for Pose Guided Synthesis


Title	Unsupervised Pose Flow Learning for Pose Guided Synthesis
Authors	Haitian Zheng, Lele Chen, Chenliang Xu, Jiebo Luo
Abstract	Pose guided synthesis aims to generate a new image in an arbitrary target pose while preserving the appearance details from the source image. Existing approaches rely on either hard-coded spatial transformations or 3D body modeling. They often overlook complex non-rigid pose deformation or unmatched occluded regions, thus fail to effectively preserve appearance information. In this paper, we propose an unsupervised pose flow learning scheme that learns to transfer the appearance details from the source image. Based on such learned pose flow, we proposed GarmentNet and SynthesisNet, both of which use multi-scale feature-domain alignment for coarse-to-fine synthesis. Experiments on the DeepFashion, MVC dataset and additional real-world datasets demonstrate that our approach compares favorably with the state-of-the-art methods and generalizes to unseen poses and clothing styles.
Tasks
Published	2019-09-30
URL	https://arxiv.org/abs/1909.13819v1
PDF	https://arxiv.org/pdf/1909.13819v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-pose-flow-learning-for-pose
Repo
Framework

A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition


Title	A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition
Authors	Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny
Abstract	Modern Automatic Speech Recognition (ASR) systems rely on distributed deep learning for quick training completion. To enable efficient distributed training, it is imperative that the training algorithms can converge with a large mini-batch size. In this work, we discovered that Asynchronous Decentralized Parallel Stochastic Gradient Descent (ADPSGD) can work with a much larger batch size than the commonly used Synchronous SGD (SSGD) algorithm. On the commonly used public SWB-300 and SWB-2000 ASR datasets, ADPSGD can converge with a batch size 3X as large as the one used in SSGD, thus enabling training at a much larger scale. Further, we proposed a Hierarchical-ADPSGD (H-ADPSGD) system in which learners on the same computing node construct a super learner via a fast allreduce implementation, and super learners deploy the ADPSGD algorithm among themselves. On a 64 Nvidia V100 GPU cluster connected via a 100Gb/s Ethernet network, our system is able to train SWB-2000 to reach a 7.6% WER on the Hub5-2000 Switchboard (SWB) test-set and a 13.2% WER on the Call-home (CH) test-set in 5.2 hours. To the best of our knowledge, this is the fastest ASR training system that attains this level of model accuracy for the SWB-2000 task ever reported in the literature.
Tasks	Speech Recognition
Published	2019-07-10
URL	https://arxiv.org/abs/1907.05701v1
PDF	https://arxiv.org/pdf/1907.05701v1.pdf
PWC	https://paperswithcode.com/paper/a-highly-efficient-distributed-deep-learning
Repo
Framework
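
The abstract above reports its results as word error rates (WER). As a reminder of what that number measures, the sketch below computes WER as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name, whitespace tokenization, and example sentences are illustrative assumptions; scoring tools used for benchmarks such as Switchboard apply additional text normalization that is not reproduced here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length.

    Both arguments are plain strings, split on whitespace. The edit
    distance is computed with the usual dynamic program, keeping only
    the previous row to save memory.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.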

A Feature Learning Siamese Model for Intelligent Control of the Dynamic Range Compressor


Title	A Feature Learning Siamese Model for Intelligent Control of the Dynamic Range Compressor
Authors	Di Sheng, György Fazekas
Abstract	In this paper, a siamese DNN model is proposed to learn the characteristics of the audio dynamic range compressor (DRC). This facilitates an intelligent control system that uses audio examples to configure the DRC, a widely used non-linear audio signal conditioning technique in the areas of music production, speech communication and broadcasting. Several alternative siamese DNN architectures are proposed to learn feature embeddings that can characterise subtle effects due to dynamic range compression. These models are compared with each other as well as with handcrafted features proposed in previous work. An evaluation of the relations between the hyperparameters of the DNN and the DRC parameters is also provided. The best model is able to produce a universal feature embedding that is capable of predicting multiple DRC parameters simultaneously, which is a significant improvement over our previous research. The feature embedding shows better performance than handcrafted audio features when predicting DRC parameters for both mono-instrument audio loops and polyphonic music pieces.
Tasks
Published	2019-05-01
URL	http://arxiv.org/abs/1905.01022v1
PDF	http://arxiv.org/pdf/1905.01022v1.pdf
PWC	https://paperswithcode.com/paper/a-feature-learning-siamese-model-for
Repo
Framework