April 3, 2020

Paper Group AWR 56

ARAE: Adversarially Robust Training of Autoencoders Improves Novelty Detection


Title	ARAE: Adversarially Robust Training of Autoencoders Improves Novelty Detection
Authors	Mohammadreza Salehi, Atrin Arya, Barbod Pajoum, Mohammad Otoofi, Amirreza Shaeiri, Mohammad Hossein Rohban, Hamid R. Rabiee
Abstract	Autoencoders (AE) have recently been widely employed to approach the novelty detection problem. Trained only on the normal data, the AE is expected to reconstruct the normal data effectively while failing to regenerate the anomalous data, which could be utilized for novelty detection. However, in this paper, it is demonstrated that this does not always hold. The AE often generalizes so well that it can also reconstruct the anomalous data well. To address this problem, we propose a novel AE that can learn more semantically meaningful features. Specifically, we exploit the fact that adversarial robustness promotes learning of meaningful features. Therefore, we force the AE to learn such features by penalizing networks with a bottleneck layer that is unstable against adversarial perturbations. We show that despite using a much simpler architecture in comparison to the prior methods, the proposed AE outperforms or is competitive with the state of the art on three benchmark datasets.
Tasks
Published	2020-03-12
URL	https://arxiv.org/abs/2003.05669v1
PDF	https://arxiv.org/pdf/2003.05669v1.pdf
PWC	https://paperswithcode.com/paper/arae-adversarially-robust-training-of
Repo	https://github.com/rohban-lab/Salehi_submitted_2020
Framework	pytorch

Entropy Minimization vs. Diversity Maximization for Domain Adaptation


Title	Entropy Minimization vs. Diversity Maximization for Domain Adaptation
Authors	Xiaofu Wu, Suofei Zhang, Quan Zhou, Zhen Yang, Chunming Zhao, Longin Jan Latecki
Abstract	Entropy minimization has been widely used in unsupervised domain adaptation (UDA). However, existing works reveal that entropy minimization alone may result in collapsed trivial solutions. In this paper, we propose to avoid trivial solutions by further introducing diversity maximization. In order to achieve the possible minimum target risk for UDA, we show that diversity maximization should be elaborately balanced with entropy minimization, the degree of which can be finely controlled with the use of deep embedded validation in an unsupervised manner. The proposed minimal-entropy diversity maximization (MEDM) can be directly implemented by stochastic gradient descent without the use of adversarial learning. Empirical evidence demonstrates that MEDM outperforms the state-of-the-art methods on four popular domain adaptation datasets.
Tasks	Domain Adaptation, Unsupervised Domain Adaptation
Published	2020-02-05
URL	https://arxiv.org/abs/2002.01690v1
PDF	https://arxiv.org/pdf/2002.01690v1.pdf
PWC	https://paperswithcode.com/paper/entropy-minimization-vs-diversity
Repo	https://github.com/AI-NERC-NUPT/MEDM
Framework	pytorch
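
The MEDM abstract above balances two terms: low entropy for each individual prediction, and high entropy (diversity) for the prediction averaged over a batch. The following is a minimal sketch of that trade-off in plain Python, not the authors' implementation. The function name `medm_style_loss` and the weight `beta` are hypothetical, and the sketch omits the supervised source loss and the deep embedded validation step that the abstract mentions.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def medm_style_loss(probs, beta=1.0):
    """Mean per-sample entropy minus beta times the entropy of the mean prediction.

    probs: list of per-sample class-probability vectors (each sums to 1).
    beta:  hypothetical weight on the diversity term (an assumption).
    """
    n = len(probs)
    k = len(probs[0])
    # Entropy minimization term: each prediction should be confident.
    mean_entropy = sum(entropy(p) for p in probs) / n
    # Diversity term: the batch-averaged prediction should spread over classes.
    mean_pred = [sum(p[j] for p in probs) / n for j in range(k)]
    diversity = entropy(mean_pred)
    return mean_entropy - beta * diversity


# Every sample confidently assigned to class 0: the collapsed trivial solution.
collapsed = [[1.0, 0.0], [1.0, 0.0]]
# Confident predictions that still cover both classes.
balanced = [[1.0, 0.0], [0.0, 1.0]]

print(medm_style_loss(collapsed))
print(medm_style_loss(balanced))
```

Both batches have zero per-sample entropy, so entropy minimization alone cannot tell them apart. Only the diversity term gives the balanced batch the lower loss.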

Supporting Interoperability Between Open-Source Search Engines with the Common Index File Format


Title	Supporting Interoperability Between Open-Source Search Engines with the Common Index File Format
Authors	Jimmy Lin, Joel Mackenzie, Chris Kamphuis, Craig Macdonald, Antonio Mallia, Michał Siedlaczek, Andrew Trotman, Arjen de Vries
Abstract	There exists a natural tension between encouraging a diverse ecosystem of open-source search engines and supporting fair, replicable comparisons across those systems. To balance these two goals, we examine two approaches to providing interoperability between the inverted indexes of several systems. The first takes advantage of internal abstractions around index structures and building wrappers that allow one system to directly read the indexes of another. The second involves sharing indexes across systems via a data exchange specification that we have developed, called the Common Index File Format (CIFF). We demonstrate the first approach with the Java systems Anserini and Terrier, and the second approach with Anserini, JASSv2, OldDog, PISA, and Terrier. Together, these systems provide a wide range of implementations and features, with different research goals. Overall, we recommend CIFF as a low-effort approach to support independent innovation while enabling the types of fair evaluations that are critical for driving the field forward.
Tasks
Published	2020-03-18
URL	https://arxiv.org/abs/2003.08276v1
PDF	https://arxiv.org/pdf/2003.08276v1.pdf
PWC	https://paperswithcode.com/paper/supporting-interoperability-between-open
Repo	https://github.com/osirrc/ciff
Framework	none

Social navigation with human empowerment driven reinforcement learning


Title	Social navigation with human empowerment driven reinforcement learning
Authors	Tessa van der Heiden, Christian Weiss, Naveen Nagaraja Shankar, Herke van Hoof
Abstract	The next generation of mobile robots needs to be socially compliant to be accepted by humans. As simple as this task may seem, defining compliance formally is not trivial. Yet, classical reinforcement learning (RL) relies upon hard-coded reward signals. In this work, we go beyond this approach and provide the agent with intrinsic motivation using empowerment. Empowerment maximizes the influence of an agent on its near future and has been shown to be a good model for biological behaviors. It also has been used for artificial agents to learn complicated and generalized actions. Self-empowerment maximizes the influence of an agent on its future. In contrast, our robot strives for the empowerment of people in its environment, so they are not disturbed by the robot when pursuing their goals. We show that our robot has a positive influence on humans, as it minimizes the travel time and distance of humans while moving efficiently to its own goal. The method can be used in any multi-agent system that requires a robot to solve a particular task involving human interactions.
Tasks
Published	2020-03-18
URL	https://arxiv.org/abs/2003.08158v2
PDF	https://arxiv.org/pdf/2003.08158v2.pdf
PWC	https://paperswithcode.com/paper/social-navigation-with-human-empowerment
Repo	https://github.com/tessavdheiden/SCR
Framework	none

Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network


Title	Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network
Authors	Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
Abstract	We propose a tensor-to-vector regression approach to multi-channel speech enhancement in order to address the issue of input size explosion and hidden-layer size expansion. The key idea is to cast the conventional deep neural network (DNN) based vector-to-vector regression formulation under a tensor-train network (TTN) framework. TTN is a recently emerged solution for compact representation of deep models with fully connected hidden layers. Thus TTN maintains DNN's expressive power yet involves a much smaller number of trainable parameters. Furthermore, TTN can handle a multi-dimensional tensor input by design, which exactly matches the desired setting in multi-channel speech enhancement. We first provide a theoretical extension from DNN to TTN based regression. Next, we show that TTN can attain speech enhancement quality comparable with that of DNN but with far fewer parameters, e.g., a reduction from 27 million to only 5 million parameters is observed in a single-channel scenario. TTN also improves PESQ over DNN from 2.86 to 2.96 by slightly increasing the number of trainable parameters. Finally, in 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3.06. Our implementation is available online at https://github.com/uwjunqi/Tensor-Train-Neural-Network.
Tasks	Speech Enhancement
Published	2020-02-03
URL	https://arxiv.org/abs/2002.00544v1
PDF	https://arxiv.org/pdf/2002.00544v1.pdf
PWC	https://paperswithcode.com/paper/tensor-to-vector-regression-for-multi-channel
Repo	https://github.com/uwjunqi/Tensor-Train-Neural-Network
Framework	pytorch
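
The parameter savings claimed in the abstract above come from replacing a dense weight matrix with a chain of small tensor-train cores. As a rough illustration, the sketch below counts weights for a dense layer and for the standard tensor-train matrix format, where core k has shape (r_{k-1}, m_k, n_k, r_k) and the boundary ranks are 1. The mode sizes and ranks are hypothetical values chosen for the example, not the paper's configuration, and biases are ignored.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Weights in a dense layer whose input and output sizes are the products of the modes."""
    return prod(in_modes) * prod(out_modes)


def tt_params(in_modes, out_modes, ranks):
    """Weights in a tensor-train layer: sum over cores of r_{k-1} * m_k * n_k * r_k."""
    if len(in_modes) != len(out_modes):
        raise ValueError("in_modes and out_modes must have the same length")
    if len(ranks) != len(in_modes) + 1:
        raise ValueError("ranks must have one more entry than the number of cores")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary ranks must be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


# Hypothetical layer: 1024 inputs and 1024 outputs, each factored as 4 * 16 * 16.
in_modes = [4, 16, 16]
out_modes = [4, 16, 16]
ranks = [1, 8, 8, 1]

print(dense_params(in_modes, out_modes))
print(tt_params(in_modes, out_modes, ranks))
```

With these made-up shapes the dense layer holds 1,048,576 weights and the tensor-train layer 18,560. Higher ranks move the count back toward the dense figure.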

Neural Architecture Search for Deep Image Prior


Title	Neural Architecture Search for Deep Image Prior
Authors	Kary Ho, Andrew Gilbert, Hailin Jin, John Collomosse
Abstract	We present a neural architecture search (NAS) technique to enhance the performance of unsupervised image de-noising, in-painting and super-resolution under the recently proposed Deep Image Prior (DIP). We show that evolutionary search can automatically optimize the encoder-decoder (E-D) structure and meta-parameters of the DIP network, which serves as a content-specific prior to regularize these single image restoration tasks. Our binary representation encodes the design space for an asymmetric E-D network that typically converges to yield a content-specific DIP within 10-20 generations using a population size of 500. The optimized architectures consistently improve upon the visual quality of classical DIP for a diverse range of photographic and artistic content.
Tasks	Image Restoration, Neural Architecture Search, Super-Resolution
Published	2020-01-14
URL	https://arxiv.org/abs/2001.04776v1
PDF	https://arxiv.org/pdf/2001.04776v1.pdf
PWC	https://paperswithcode.com/paper/neural-architecture-search-for-deep-image
Repo	https://github.com/Pol22/NAS_DIP
Framework	tf

Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning


Title	Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning
Authors	Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau
Abstract	Accurate reporting of energy and carbon usage is essential for understanding the potential climate impacts of machine learning research. We introduce a framework that makes this easier by providing a simple interface for tracking realtime energy consumption and carbon emissions, as well as generating standardized online appendices. Utilizing this framework, we create a leaderboard for energy efficient reinforcement learning algorithms to incentivize responsible research in this area as an example for other areas of machine learning. Finally, based on case studies using our framework, we propose strategies for mitigation of carbon emissions and reduction of energy consumption. By making accounting easier, we hope to further the sustainable development of machine learning experiments and spur more research into energy efficient algorithms.
Tasks
Published	2020-01-31
URL	https://arxiv.org/abs/2002.05651v1
PDF	https://arxiv.org/pdf/2002.05651v1.pdf
PWC	https://paperswithcode.com/paper/towards-the-systematic-reporting-of-the
Repo	https://github.com/Breakend/experiment-impact-tracker
Framework	none

From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting


Title	From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting
Authors	Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Chunhua Shen, Zhiguo Cao
Abstract	Visual counting, a task that aims to estimate the number of objects from an image/video, is an open-set problem by nature, i.e., the population count can vary in [0, inf) in theory. However, collected data and labeled instances are limited in reality, which means that only a small closed set is observed. Existing methods typically model this task in a regression manner, but they are prone to fail on an unseen scene with counts outside the scope of the closed set. In fact, counting has an interesting and exclusive property: it is spatially decomposable. A dense region can always be divided until sub-region counts are within the previously observed closed set. We therefore introduce the idea of spatial divide-and-conquer (S-DC) that transforms open-set counting into a closed-set problem. This idea is implemented by a novel Supervised Spatial Divide-and-Conquer Network (SS-DCNet). Thus, SS-DCNet learns only from a closed set yet generalizes well to open-set scenarios via S-DC. SS-DCNet is also efficient. To avoid repeatedly computing sub-region convolutional features, S-DC is executed on the feature map instead of on the input image. We provide theoretical analyses as well as a controlled experiment on toy data, demonstrating why closed-set modeling makes sense. Extensive experiments show that SS-DCNet achieves state-of-the-art performance. Code and models are available at: https://tinyurl.com/SS-DCNet.
Tasks	Object Counting
Published	2020-01-07
URL	https://arxiv.org/abs/2001.01886v1
PDF	https://arxiv.org/pdf/2001.01886v1.pdf
PWC	https://paperswithcode.com/paper/from-open-set-to-closed-set-supervised
Repo	https://github.com/xhp-hust-2018-2011/S-DCNet
Framework	pytorch
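
The spatial divide-and-conquer idea in the abstract above can be illustrated with a toy example, which is not the SS-DCNet architecture. Here the learned counter is replaced by a hypothetical `closed_set_counter` that returns the exact count but saturates at `c_max`, and the input is a small grid of per-cell object counts rather than a feature map. A saturated estimate is treated as the signal to split the region into quadrants; that splitting rule is an assumption of this sketch.

```python
def closed_set_counter(grid, r0, r1, c0, c1, c_max):
    """Toy stand-in for a learned counter: the exact count, clipped at c_max."""
    total = sum(grid[r][c] for r in range(r0, r1) for c in range(c0, c1))
    return min(total, c_max)


def sdc_count(grid, c_max, r0=0, r1=None, c0=0, c1=None):
    """Count objects in grid[r0:r1][c0:c1], dividing regions whose estimate saturates."""
    r1 = len(grid) if r1 is None else r1
    c1 = len(grid[0]) if c1 is None else c1
    estimate = closed_set_counter(grid, r0, r1, c0, c1, c_max)
    single_cell = (r1 - r0 <= 1) and (c1 - c0 <= 1)
    # Accept the estimate if it is inside the closed set or no further split is possible.
    if estimate < c_max or single_cell:
        return estimate
    # Otherwise divide into up to four sub-regions and sum their counts.
    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    total = 0
    for ra, rb in ((r0, rm), (rm, r1)):
        for ca, cb in ((c0, cm), (cm, c1)):
            if ra < rb and ca < cb:  # skip empty halves of one-cell-wide regions
                total += sdc_count(grid, c_max, ra, rb, ca, cb)
    return total


# Hypothetical 4 x 4 grid of per-cell counts; the true total is 18.
grid = [
    [3, 0, 1, 2],
    [0, 5, 0, 0],
    [1, 0, 0, 4],
    [0, 2, 0, 0],
]

print(closed_set_counter(grid, 0, 4, 0, 4, c_max=5))
print(sdc_count(grid, c_max=5))
```

The clipped counter alone reports 5 for the whole grid. The divide-and-conquer count recovers 18 because every sub-region it finally accepts has a count within the closed set [0, 5].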

Channel Pruning via Automatic Structure Search


Title	Channel Pruning via Automatic Structure Search
Authors	Mingbao Lin, Rongrong Ji, Yuxin Zhang, Baochang Zhang, Yongjian Wu, Yonghong Tian
Abstract	Channel pruning is among the predominant approaches to compress deep neural networks. To this end, most existing pruning methods focus on selecting channels (filters) by importance/optimization or regularization based on rule-of-thumb designs, which results in sub-optimal pruning. In this paper, we propose a new channel pruning method based on the artificial bee colony algorithm (ABC), dubbed ABCPruner, which aims to efficiently find the optimal pruned structure, i.e., the channel number in each layer, rather than selecting "important" channels as previous works did. To handle the intractably large number of pruned-structure combinations for deep networks, we first propose to shrink the combinations by limiting the preserved channels to a specific space, so that the number of combinations is significantly reduced. We then formulate the search for the optimal pruned structure as an optimization problem and integrate the ABC algorithm to solve it in an automatic manner to lessen human interference. ABCPruner has been demonstrated to be more effective, and it also enables fine-tuning to be conducted efficiently in an end-to-end manner. Experiments on CIFAR-10 show that ABCPruner reduces 73.68% of FLOPs and 88.68% of parameters with even a 0.06% accuracy improvement for VGGNet-16. On ILSVRC-2012, it achieves a reduction of 62.87% of FLOPs and removes 60.01% of parameters with negligible accuracy cost for ResNet-152. The source code is available at https://github.com/lmbxmu/ABCPruner.
Tasks
Published	2020-01-23
URL	https://arxiv.org/abs/2001.08565v2
PDF	https://arxiv.org/pdf/2001.08565v2.pdf
PWC	https://paperswithcode.com/paper/channel-pruning-via-automatic-structure
Repo	https://github.com/lmbxmu/ABCPruner
Framework	pytorch

ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection


Title	ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection
Authors	Zhenbo Xu, Wei Zhang, Xiaoqing Ye, Xiao Tan, Wei Yang, Shilei Wen, Errui Ding, Ajin Meng, Liusheng Huang
Abstract	3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating 3D pose for distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming, which simultaneously resizes 2D instance bounding boxes to a unified resolution and adjusts the camera intrinsic parameters accordingly. In this way, we are able to estimate higher-quality disparity maps from the resized box images and then construct dense point clouds for both nearby and distant objects. Moreover, we propose learning part locations as complementary features to improve the resistance against occlusion and put forward the 3D fitting score to better estimate the 3D detection quality. Extensive experiments on the popular KITTI 3D detection dataset indicate ZoomNet surpasses all previous state-of-the-art methods by large margins (improved by 9.4% on APbv (IoU=0.7) over pseudo-LiDAR). An ablation study also demonstrates that our adaptive zooming strategy brings an improvement of over 10% on AP3d (IoU=0.7). In addition, since the official KITTI benchmark lacks fine-grained annotations like pixel-wise part locations, we also present our KFG dataset by augmenting KITTI with detailed instance-wise annotations including pixel-wise part location, pixel-wise disparity, etc. Both the KFG dataset and our code will be publicly available at https://github.com/detectRecog/ZoomNet.
Tasks	3D Object Detection, Autonomous Driving, Disparity Estimation, Object Detection
Published	2020-03-01
URL	https://arxiv.org/abs/2003.00529v1
PDF	https://arxiv.org/pdf/2003.00529v1.pdf
PWC	https://paperswithcode.com/paper/zoomnet-part-aware-adaptive-zooming-neural
Repo	https://github.com/detectRecog/ZoomNet
Framework	none
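
The adaptive zooming step in the abstract above resizes each bounding box to a unified resolution and adjusts the camera intrinsics to match. Under a plain pinhole model, cropping shifts the principal point and resizing scales both the focal lengths and the principal point. The sketch below shows that geometry only and is not taken from the ZoomNet code. The function names, the camera values and the box are hypothetical, and half-pixel sampling offsets are ignored.

```python
def zoom_intrinsics(fx, fy, cx, cy, box, out_w, out_h):
    """Intrinsics of the virtual camera that sees `box` resized to out_w x out_h.

    box is (x0, y0, x1, y1) in pixels of the original image.
    """
    x0, y0, x1, y1 = box
    sx = out_w / (x1 - x0)  # horizontal zoom factor
    sy = out_h / (y1 - y0)  # vertical zoom factor
    # Cropping moves the principal point; resizing scales everything.
    return fx * sx, fy * sy, (cx - x0) * sx, (cy - y0) * sy


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (X, Y, Z) in camera coordinates."""
    X, Y, Z = point
    return fx * X / Z + cx, fy * Y / Z + cy


# Hypothetical camera and detection box.
K = (700.0, 700.0, 600.0, 180.0)      # fx, fy, cx, cy
box = (500.0, 150.0, 600.0, 200.0)    # x0, y0, x1, y1
out_w, out_h = 256, 128

K_zoom = zoom_intrinsics(*K, box, out_w, out_h)

# A 3D point must land on the same zoomed pixel either way:
point = (0.5, 0.1, 20.0)
u, v = project(point, *K)
u_resized = (u - box[0]) * out_w / (box[2] - box[0])
v_resized = (v - box[1]) * out_h / (box[3] - box[1])
u_zoom, v_zoom = project(point, *K_zoom)

print((u_resized, v_resized), (u_zoom, v_zoom))
```

Projecting with the adjusted intrinsics matches cropping and resizing the original projection, so depth recovered from the zoomed crop stays consistent with the original camera.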

Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach


Title	Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach
Authors	Zhe Zhang, Chunyu Wang, Wenhu Qin, Wenjun Zeng
Abstract	We propose to estimate 3D human pose from multi-view images and a few IMUs attached to a person's limbs. It operates by first detecting 2D poses from the two signals, and then lifting them to the 3D space. We present a geometric approach to reinforce the visual features of each pair of joints based on the IMUs. This notably improves 2D pose estimation accuracy, especially when one joint is occluded. We call this approach Orientation Regularized Network (ORN). Then we lift the multi-view 2D poses to the 3D space by an Orientation Regularized Pictorial Structure Model (ORPSM) which jointly minimizes the projection error between the 3D and 2D poses, along with the discrepancy between the 3D pose and IMU orientations. The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset. Our code will be released at https://github.com/microsoft/imu-human-pose-estimation-pytorch.
Tasks	3D Absolute Human Pose Estimation, 3D Human Pose Estimation, Pose Estimation
Published	2020-03-25
URL	https://arxiv.org/abs/2003.11163v1
PDF	https://arxiv.org/pdf/2003.11163v1.pdf
PWC	https://paperswithcode.com/paper/fusing-wearable-imus-with-multi-view-images
Repo	https://github.com/CHUNYUWANG/imu-human-pose-pytorch
Framework	pytorch

Convolutional Neural Networks with Intermediate Loss for 3D Super-Resolution of CT and MRI Scans


Title	Convolutional Neural Networks with Intermediate Loss for 3D Super-Resolution of CT and MRI Scans
Authors	Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Nicolae Verga
Abstract	CT scanners that are commonly used in hospitals nowadays produce low-resolution images, up to 512 pixels in size. One pixel in the image corresponds to a one millimeter piece of tissue. In order to accurately segment tumors and make treatment plans, doctors need CT scans of higher resolution. The same problem appears in MRI. In this paper, we propose an approach for the single-image super-resolution of 3D CT or MRI scans. Our method is based on deep convolutional neural networks (CNNs) composed of 10 convolutional layers and an intermediate upscaling layer that is placed after the first 6 convolutional layers. Our first CNN, which increases the resolution on two axes (width and height), is followed by a second CNN, which increases the resolution on the third axis (depth). Different from other methods, we compute the loss with respect to the ground-truth high-resolution output right after the upscaling layer, in addition to computing the loss after the last convolutional layer. The intermediate loss forces our network to produce a better output, closer to the ground-truth. A widely used approach to obtain sharp results is to add Gaussian blur using a fixed standard deviation. In order to avoid overfitting to a fixed standard deviation, we apply Gaussian smoothing with various standard deviations, unlike other approaches. We evaluate our method in the context of 2D and 3D super-resolution of CT and MRI scans from two databases, comparing it to relevant related works from the literature and baselines based on various interpolation schemes, using 2x and 4x scaling factors. The empirical results show that our approach attains superior results to all other methods. Moreover, our human annotation study reveals that both doctors and regular annotators preferred our method over Lanczos interpolation in 97.55% of cases for the 2x upscaling factor and in 96.69% of cases for the 4x upscaling factor.
Tasks	Image Super-Resolution, Super-Resolution
Published	2020-01-05
URL	https://arxiv.org/abs/2001.01330v2
PDF	https://arxiv.org/pdf/2001.01330v2.pdf
PWC	https://paperswithcode.com/paper/convolutional-neural-networks-with-3
Repo	https://github.com/lilygeorgescu/3d-super-res-cnn
Framework	tf

Pix2Pix-based Stain-to-Stain Translation: A Solution for Robust Stain Normalization in Histopathology Images Analysis


Title	Pix2Pix-based Stain-to-Stain Translation: A Solution for Robust Stain Normalization in Histopathology Images Analysis
Authors	Pegah Salehi, Abdolah Chalechale
Abstract	The diagnosis of cancer is mainly performed through visual analysis by pathologists, who examine the morphology of the tissue slices and the spatial arrangement of the cells. If the microscopic image of a specimen is not stained, it will look colorless and textured. Therefore, chemical staining is required to create contrast and help identify specific tissue components. During tissue preparation, due to differences in chemicals, scanners, cutting thicknesses, and laboratory protocols, similar tissues usually vary significantly in appearance. This diversity in staining, in addition to interpretive disparity among pathologists, is one of the main challenges in designing robust and flexible systems for automated analysis. To address the staining color variations, several methods for normalizing stain have been proposed. In our proposed method, a Stain-to-Stain Translation (STST) approach is used for stain normalization of Hematoxylin and Eosin (H&E) stained histopathology images, which not only learns the specific color distribution but also preserves the corresponding histopathological pattern. We perform the process of translation based on the pix2pix framework, which uses conditional generative adversarial networks (cGANs). Our approach showed excellent results, both mathematically and experimentally, against the state-of-the-art methods. We have made the source code publicly available.
Tasks
Published	2020-02-03
URL	https://arxiv.org/abs/2002.00647v1
PDF	https://arxiv.org/pdf/2002.00647v1.pdf
PWC	https://paperswithcode.com/paper/pix2pix-based-stain-to-stain-translation-a
Repo	https://github.com/pegahsalehi/Stain-to-Stain-Translation
Framework	none

Giving Up Control: Neurons as Reinforcement Learning Agents


Title	Giving Up Control: Neurons as Reinforcement Learning Agents
Authors	Jordan Ott
Abstract	Artificial Intelligence has historically relied on planning, heuristics, and handcrafted approaches designed by experts, all the while claiming to pursue the creation of intelligence. This approach fails to acknowledge that intelligence emerges from the dynamics within a complex system. Neurons in the brain are governed by local rules, where no single neuron, or group of neurons, coordinates or controls the others. This local structure gives rise to the appropriate dynamics in which intelligence can emerge. Populations of neurons must compete with their neighbors for resources, inhibition, and activity representation. At the same time, they must cooperate, so the population and organism can perform high-level functions. To this end, we introduce modeling neurons as reinforcement learning agents, where each neuron may be viewed as an independent actor, trying to maximize its own self-interest. By framing learning in this way, we open the door to an entirely new approach to building intelligent systems.
Tasks
Published	2020-03-17
URL	https://arxiv.org/abs/2003.11642v1
PDF	https://arxiv.org/pdf/2003.11642v1.pdf
PWC	https://paperswithcode.com/paper/giving-up-control-neurons-as-reinforcement
Repo	https://github.com/Multi-Agent-Networks/NaRLA
Framework	pytorch

Express Wavenet – a low parameter optical neural network with random shift wavelet pattern


Title	Express Wavenet – a low parameter optical neural network with random shift wavelet pattern
Authors	Yingshi Chen
Abstract	Express Wavenet is an improved optical diffractive neural network. At each layer, it uses a wavelet-like pattern to modulate the phase of optical waves. For an input image with n^2 pixels, Express Wavenet reduces the number of parameters from O(n^2) to O(n). It needs only about one percent of the parameters, and the accuracy is still very high. On the MNIST dataset, it needs only 1229 parameters to reach an accuracy of 92%, while the standard optical network needs 125440 parameters. The random shift wavelets show the characteristics of the optical network more vividly, especially the vanishing gradient phenomenon in the training process. We present a modified expressway structure for this problem. Experiments verified the effect of the random shift wavelet and the expressway structure. Our work shows that an optical diffractive network can use far fewer parameters than other neural networks. The source code is available at https://github.com/closest-git/ONNet.
Tasks
Published	2020-01-06
URL	https://arxiv.org/abs/2001.01458v1
PDF	https://arxiv.org/pdf/2001.01458v1.pdf
PWC	https://paperswithcode.com/paper/express-wavenet-a-low-parameter-optical
Repo	https://github.com/closest-git/ONNet
Framework	pytorch