Paper Group AWR 125
Rethinking the Evaluation of Video Summaries. Corrigibility with Utility Preservation. Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. Oktoberfest Food Dataset. Assessing the Ability of Self-Attention Networks to Learn Word Order. Beyond the Self: Using Grounded Affordances to Interpret and Describe Others’ Actions …
Rethinking the Evaluation of Video Summaries
Title | Rethinking the Evaluation of Video Summaries |
Authors | Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä |
Abstract | Video summarization is a technique to create a short skim of the original video while preserving the main stories/content. There is substantial interest in automating this process due to the rapid growth of the available material. Recent progress has been facilitated by public benchmark datasets, which enable easy and fair comparison of methods. The currently established evaluation protocol is to compare the generated summary against a set of reference summaries provided by the dataset. In this paper, we provide an in-depth assessment of this pipeline using two popular benchmark datasets. Surprisingly, we observe that randomly generated summaries achieve performance comparable to or better than the state-of-the-art. In some cases, the random summaries outperform even the human-generated summaries in leave-one-out experiments. Moreover, it turns out that the video segmentation, which is often considered a fixed pre-processing step, has the most significant impact on the performance measure. Based on our observations, we propose alternative approaches for assessing the importance scores, as well as an intuitive visualization of the correlation between the estimated scoring and human annotations. |
Tasks | Video Semantic Segmentation, Video Summarization |
Published | 2019-03-27 |
URL | http://arxiv.org/abs/1903.11328v2 |
PDF | http://arxiv.org/pdf/1903.11328v2.pdf |
PWC | https://paperswithcode.com/paper/rethinking-the-evaluation-of-video-summaries |
Repo | https://github.com/mayu-ot/rethinking-evs |
Framework | none |
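The failure mode above is easy to reproduce. Below is a minimal sketch (Python, with invented helper names and placeholder reference summaries, independent of the authors' repo) of the criticized protocol: score a randomly generated summary against several reference summaries with the usual F-measure.

```python
# Sketch of the standard video-summary F-score protocol the paper scrutinizes,
# plus the random-summary baseline it shows is surprisingly competitive.
import numpy as np

def f_score(pred, ref):
    """F-measure between two binary frame-selection vectors."""
    overlap = np.logical_and(pred, ref).sum()
    if overlap == 0:
        return 0.0
    precision = overlap / pred.sum()
    recall = overlap / ref.sum()
    return 2 * precision * recall / (precision + recall)

def random_summary(n_frames, budget=0.15, rng=None):
    """Select a random ~15% of frames, mimicking the random baseline."""
    rng = rng or np.random.default_rng()
    pred = np.zeros(n_frames, dtype=bool)
    pred[rng.choice(n_frames, int(budget * n_frames), replace=False)] = True
    return pred

# Evaluate a random summary against each (placeholder) human reference and
# average - one variant of the protocol the paper shows is easy to match.
n_frames = 3000
refs = [np.random.rand(n_frames) > 0.85 for _ in range(5)]  # fake references
pred = random_summary(n_frames)
print(np.mean([f_score(pred, r) for r in refs]))
```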
Corrigibility with Utility Preservation
Title | Corrigibility with Utility Preservation |
Authors | Koen Holtman |
Abstract | Corrigibility is a safety property for artificially intelligent agents. A corrigible agent will not resist attempts by authorized parties to alter the goals and constraints that were encoded in the agent when it was first started. This paper shows how to construct a safety layer that adds corrigibility to arbitrarily advanced utility-maximizing agents, including possible future agents with Artificial General Intelligence (AGI). The layer counteracts the emergent incentive of advanced agents to resist such alteration. A detailed model for agents which can reason about preserving their utility function is developed and used to prove that the corrigibility layer works as intended in a large set of non-hostile universes. The corrigible agents have an emergent incentive to protect key elements of their corrigibility layer. However, hostile universes may contain forces strong enough to break safety features. Some open problems related to graceful degradation when an agent is successfully attacked are identified. The results in this paper were obtained by concurrently developing an AGI agent simulator, an agent model, and proofs. The simulator is available under an open source license. The paper contains simulation results which illustrate the safety-related properties of corrigible AGI agents in detail. |
Tasks | |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01695v1 |
PDF | https://arxiv.org/pdf/1908.01695v1.pdf |
PWC | https://paperswithcode.com/paper/corrigibility-with-utility-preservation |
Repo | https://github.com/kholtman/agisim |
Framework | none |
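To make the mechanism concrete, here is a heavily simplified toy (plain Python, not the agisim simulator's API): a utility-maximizing agent whose utility function is swapped when an authorized stop button is pressed, with a compensation constant chosen so the agent gains nothing by blocking or forcing the press. The paper's actual correction is more subtle; this only illustrates the shape of the idea.

```python
# Toy corrigibility-by-utility-substitution sketch (invented numbers).
def utility(action, pressed, u_orig, u_new, compensation):
    # Before the press the agent pursues its original goal; afterwards the
    # new goal plus a constant compensation term applies.
    return (u_new[action] + compensation) if pressed else u_orig[action]

u_orig = {"work": 10.0, "block_button": 9.0}  # blocking costs a little utility
u_new  = {"work": 2.0,  "block_button": 1.0}
# Choose compensation so the best attainable value is equal on both branches;
# then the agent is indifferent to whether the button gets pressed.
compensation = max(u_orig.values()) - max(u_new.values())
for pressed in (False, True):
    best = max(u_orig, key=lambda a: utility(a, pressed, u_orig, u_new, compensation))
    print(pressed, best, utility(best, pressed, u_orig, u_new, compensation))
```

In both branches the best action is "work" with equal value, so the toy agent has no emergent incentive to interfere with the button.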
Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them
Title | Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them |
Authors | Hila Gonen, Yoav Goldberg |
Abstract | Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society. This phenomenon is pervasive and consistent across different word embedding models, causing serious concern. Several recent works tackle this problem, and propose methods for significantly reducing this gender bias in word embeddings, demonstrating convincing results. However, we argue that this removal is superficial. While the bias is indeed substantially reduced according to the provided bias definition, the actual effect is mostly hiding the bias, not removing it. The gender bias information is still reflected in the distances between “gender-neutralized” words in the debiased embeddings, and can be recovered from them. We present a series of experiments to support this claim, for two debiasing methods. We conclude that existing bias removal techniques are insufficient, and should not be trusted for providing gender-neutral modeling. |
Tasks | Word Embeddings |
Published | 2019-03-09 |
URL | https://arxiv.org/abs/1903.03862v2 |
PDF | https://arxiv.org/pdf/1903.03862v2.pdf |
PWC | https://paperswithcode.com/paper/lipstick-on-a-pig-debiasing-methods-cover-up |
Repo | https://github.com/TManzini/DebiasMulticlassWordEmbedding |
Framework | pytorch |
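One of the paper's diagnostics can be sketched in a few lines: take the words that were most gendered in the original space, cluster them in the debiased space, and measure how well the clusters recover the original grouping. The snippet below (Python with scikit-learn; placeholder vectors and labels stand in for real embeddings) shows the structure of that experiment.

```python
# Sketch of the cluster-recovery diagnostic: if bias were truly removed,
# clustering debiased vectors should not recover the original gender split.
import numpy as np
from sklearn.cluster import KMeans

def bias_recoverable(debiased_vecs, orig_gender_labels):
    """Cluster debiased vectors into 2 groups; return alignment accuracy."""
    pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(debiased_vecs)
    acc = (pred == orig_gender_labels).mean()
    return max(acc, 1 - acc)  # cluster ids are arbitrary, take the best match

vecs = np.random.randn(1000, 300)        # placeholder debiased embeddings
labels = np.random.randint(0, 2, 1000)   # placeholder original-bias labels
print(bias_recoverable(vecs, labels))    # near 0.5 here; high on real data
```

On real debiased embeddings the paper reports high alignment, i.e. the bias is still encoded in the geometry even after "removal".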
Oktoberfest Food Dataset
Title | Oktoberfest Food Dataset |
Authors | Alexander Ziller, Julius Hansjakob, Vitalii Rusinov, Daniel Zügner, Peter Vogel, Stephan Günnemann |
Abstract | We release a realistic, diverse, and challenging dataset for object detection on images. The data was recorded at a beer tent in Germany and consists of 15 different categories of food and drink items. We created more than 2,500 object annotations by hand for 1,110 images captured by a video camera above the checkout. We further make available the remaining 600GB of (unlabeled) data containing days of footage. Additionally, we provide our trained models as a benchmark. Possible applications include automated checkout systems which could significantly speed up the process. |
Tasks | Object Detection |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1912.05007v1 |
PDF | https://arxiv.org/pdf/1912.05007v1.pdf |
PWC | https://paperswithcode.com/paper/oktoberfest-food-dataset |
Repo | https://github.com/a1302z/OktoberfestFoodDataset |
Framework | pytorch |
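A plausible starting point for the benchmark task is fine-tuning an off-the-shelf detector on the 15 categories. The sketch below uses torchvision's Faster R-CNN (weights API of torchvision ≥ 0.13); the annotation tensors are invented placeholders, since the repo's exact data format is not reproduced here.

```python
# Hypothetical fine-tuning step for the 15 food/drink classes.
import torch
import torchvision

num_classes = 15 + 1  # 15 categories plus background
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, num_classes=num_classes)
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

# Detection models take a list of images and a list of target dicts with
# "boxes" (FloatTensor[N, 4]) and "labels" (Int64Tensor[N]).
images = [torch.rand(3, 600, 800)]
targets = [{"boxes": torch.tensor([[100.0, 120.0, 300.0, 400.0]]),
            "labels": torch.tensor([3])}]
loss_dict = model(images, targets)  # classification/box-regression losses
total_loss = sum(loss_dict.values())
total_loss.backward()
optimizer.step()
```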
Assessing the Ability of Self-Attention Networks to Learn Word Order
Title | Assessing the Ability of Self-Attention Networks to Learn Word Order |
Authors | Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu |
Abstract | Self-attention networks (SAN) have attracted a lot of interest due to their high parallelization and strong performance on a variety of NLP tasks, e.g. machine translation. Because they lack a recurrence structure such as that of recurrent neural networks (RNN), SANs are assumed to be weak at learning positional information of words for sequence modeling. However, this speculation has neither been empirically confirmed, nor have explanations for their strong performance on machine translation tasks when “lacking positional information” been explored. To this end, we propose a novel word reordering detection task to quantify how well word order information is learned by SAN and RNN. Specifically, we randomly move one word to another position, and examine whether a trained model can detect both the original and inserted positions. Experimental results reveal that: 1) SAN trained on word reordering detection indeed has difficulty learning positional information, even with position embeddings; and 2) SAN trained on machine translation learns better positional information than its RNN counterpart, in which case position embeddings play a critical role. Although a recurrence structure makes the model more universally effective at learning word order, learning objectives matter more in downstream tasks such as machine translation. |
Tasks | Machine Translation |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00592v1 |
PDF | https://arxiv.org/pdf/1906.00592v1.pdf |
PWC | https://paperswithcode.com/paper/190600592 |
Repo | https://github.com/baosongyang/WRD |
Framework | tf |
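The proposed word reordering detection task is simple to implement as data generation: pop one word and re-insert it at a random position, keeping both indices as labels. A minimal sketch (function name is illustrative):

```python
# Generate one word-reordering-detection example: the model must later
# predict both the original position and the inserted position.
import random

def make_wrd_example(tokens, rng=random):
    tokens = list(tokens)
    src = rng.randrange(len(tokens))       # original position of the word
    word = tokens.pop(src)
    tgt = rng.randrange(len(tokens) + 1)   # position it is re-inserted at
    tokens.insert(tgt, word)
    return tokens, src, tgt                # (reordered sentence, labels)

sent = "the quick brown fox jumps over the lazy dog".split()
print(make_wrd_example(sent))
```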
Beyond the Self: Using Grounded Affordances to Interpret and Describe Others’ Actions
Title | Beyond the Self: Using Grounded Affordances to Interpret and Describe Others’ Actions |
Authors | Giovanni Saponaro, Lorenzo Jamone, Alexandre Bernardino, Giampiero Salvi |
Abstract | We propose a developmental approach that allows a robot to interpret and describe the actions of human agents by reusing previous experience. The robot first learns the association between words and object affordances by manipulating the objects in its environment. It then uses this information to learn a mapping between its own actions and those performed by a human in a shared environment. It finally fuses the information from these two models to interpret and describe human actions in light of its own experience. In our experiments, we show that the model can be used flexibly to do inference on different aspects of the scene. We can predict the effects of an action on the basis of object properties. We can revise the belief that a certain action occurred, given the observed effects of the human action. In an early action recognition fashion, we can anticipate the effects when the action has only been partially observed. By estimating the probability of words given the evidence and feeding them into a pre-defined grammar, we can generate relevant descriptions of the scene. We believe that this is a step towards providing robots with the fundamental skills to engage in social collaboration with humans. |
Tasks | Temporal Action Localization |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09705v1 |
PDF | http://arxiv.org/pdf/1902.09705v1.pdf |
PWC | https://paperswithcode.com/paper/beyond-the-self-using-grounded-affordances-to |
Repo | https://github.com/gsaponaro/tcds-gestures |
Framework | none |
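A toy, discrete rendering of the kind of inference the model supports (the probability tables below are invented for illustration; the paper learns them from the robot's own sensorimotor experience):

```python
# Predict effects from (action, object) and revise the action belief from
# observed effects via Bayes' rule, over an invented affordance table.
P_effect = {  # P(effect | action, object_size)
    ("tap",   "small"): {"moves": 0.8, "stays": 0.2},
    ("tap",   "big"):   {"moves": 0.3, "stays": 0.7},
    ("grasp", "small"): {"moves": 0.9, "stays": 0.1},
    ("grasp", "big"):   {"moves": 0.4, "stays": 0.6},
}
P_action = {"tap": 0.5, "grasp": 0.5}  # prior over the human's action

def posterior_action(observed_effect, obj):
    """P(action | effect, object) by Bayes' rule over the small table."""
    joint = {a: P_action[a] * P_effect[(a, obj)][observed_effect] for a in P_action}
    z = sum(joint.values())
    return {a: p / z for a, p in joint.items()}

print(posterior_action("moves", "big"))  # belief revision after observation
```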
DBSN: Measuring Uncertainty through Bayesian Learning of Deep Neural Network Structures
Title | DBSN: Measuring Uncertainty through Bayesian Learning of Deep Neural Network Structures |
Authors | Zhijie Deng, Yucen Luo, Jun Zhu, Bo Zhang |
Abstract | Bayesian neural networks (BNNs) introduce uncertainty estimation to deep networks by performing Bayesian inference on network weights. However, such models pose inference challenges, and BNNs with weight uncertainty rarely achieve performance superior to standard models. In this paper, we investigate a new line of Bayesian deep learning by performing Bayesian reasoning on the structure of deep neural networks. Drawing inspiration from neural architecture search, we define the network structure as gating weights on the redundant operations between computational nodes, and apply stochastic variational inference techniques to learn the structure distributions of networks. Empirically, the proposed method substantially surpasses advanced deep neural networks across a range of classification and segmentation tasks. More importantly, our approach also preserves the benefits of Bayesian principles, producing better uncertainty estimates than strong baselines including MC dropout and variational BNN algorithms (e.g. noisy EK-FAC). |
Tasks | Bayesian Inference, Neural Architecture Search |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.09804v1 |
PDF | https://arxiv.org/pdf/1911.09804v1.pdf |
PWC | https://paperswithcode.com/paper/dbsn-measuring-uncertainty-through-bayesian |
Repo | https://github.com/anonymousest/DBSN |
Framework | tf |
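The gating idea can be sketched compactly. Below is an illustrative PyTorch module (the repo itself is TensorFlow): each edge mixes candidate operations through relaxed stochastic gates whose logits act as the learned structure parameters.

```python
# Minimal sketch of structure gating with a Gumbel-softmax relaxation; the
# candidate op menu and single-edge scope are simplifications.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedEdge(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 1),
            nn.Identity(),
        ])
        # Logits of the structure distribution over candidate operations.
        self.logits = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        # Sample relaxed one-hot gates (reparameterized) and mix the ops;
        # variational training updates `logits` through these samples.
        gates = F.gumbel_softmax(self.logits, tau=1.0, hard=False)
        return sum(g * op(x) for g, op in zip(gates, self.ops))

edge = GatedEdge(16)
print(edge(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 16, 8, 8])
```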
AutoShrink: A Topology-aware NAS for Discovering Efficient Neural Architecture
Title | AutoShrink: A Topology-aware NAS for Discovering Efficient Neural Architecture |
Authors | Tunhou Zhang, Hsin-Pai Cheng, Zhenwen Li, Feng Yan, Chengyu Huang, Hai Li, Yiran Chen |
Abstract | Resource is an important constraint when deploying Deep Neural Networks (DNNs) on mobile and edge devices. Existing works commonly adopt the cell-based search approach, which limits the flexibility of network patterns in learned cell structures. Moreover, due to the topology-agnostic nature of existing works, including both cell-based and node-based approaches, the search process is time-consuming and the performance of the found architecture may be sub-optimal. To address these problems, we propose AutoShrink, a topology-aware Neural Architecture Search (NAS) for searching efficient building blocks of neural architectures. Our method is node-based and thus can learn flexible network patterns in cell structures within a topological search space. Directed Acyclic Graphs (DAGs) are used to abstract DNN architectures and progressively optimize the cell structure through edge shrinking. Because the search space intrinsically shrinks as edges are progressively removed, AutoShrink explores a more flexible search space with even less search time. We evaluate AutoShrink on image classification and language tasks by crafting ShrinkCNN and ShrinkRNN models. ShrinkCNN is able to achieve up to 48% parameter reduction and save 34% Multiply-Accumulates (MACs) on ImageNet-1K with accuracy comparable to state-of-the-art (SOTA) models. Specifically, both ShrinkCNN and ShrinkRNN are crafted within 1.5 GPU hours, which is 7.2x and 6.7x faster than the crafting time of SOTA CNN and RNN models, respectively. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09251v1 |
PDF | https://arxiv.org/pdf/1911.09251v1.pdf |
PWC | https://paperswithcode.com/paper/autoshrink-a-topology-aware-nas-for |
Repo | https://github.com/lordzth666/AutoShrink |
Framework | tf |
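Stripped of training details, the search loop is a greedy edge-removal procedure over a DAG. A skeletal Python version with a placeholder scoring function (the real method trains and evaluates candidate cells, and handles both CNN and RNN spaces):

```python
# Skeleton of edge shrinking: start dense, drop the edge whose removal
# hurts a proxy score least; the search space shrinks with each step.
import itertools

def evaluate(edges):
    # Placeholder proxy: prefer fewer edges (stands in for accuracy/MACs).
    return -len(edges)

def autoshrink(n_nodes, steps):
    edges = {(i, j) for i, j in itertools.combinations(range(n_nodes), 2)}
    for _ in range(steps):
        # Try removing each remaining edge; keep the removal that scores best.
        best_edge = max(edges, key=lambda e: evaluate(edges - {e}))
        edges = edges - {best_edge}
    return edges

print(sorted(autoshrink(n_nodes=4, steps=3)))
```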
Real-time Hand Gesture Detection and Classification Using Convolutional Neural Networks
Title | Real-time Hand Gesture Detection and Classification Using Convolutional Neural Networks |
Authors | Okan Köpüklü, Ahmet Gunduz, Neslihan Kose, Gerhard Rigoll |
Abstract | Real-time recognition of dynamic hand gestures from video streams is a challenging task since (i) there is no indication when a gesture starts and ends in the video, (ii) performed gestures should only be recognized once, and (iii) the entire architecture should be designed considering the memory and power budget. In this work, we address these challenges by proposing a hierarchical structure enabling offline-working convolutional neural network (CNN) architectures to operate online efficiently using a sliding-window approach. The proposed architecture consists of two models: (1) a detector, which is a lightweight CNN architecture to detect gestures, and (2) a classifier, which is a deep CNN to classify the detected gestures. In order to evaluate the single-time activations of the detected gestures, we propose to use Levenshtein distance as an evaluation metric since it can measure misclassifications, multiple detections, and missing detections at the same time. We evaluate our architecture on two publicly available datasets - EgoGesture and NVIDIA Dynamic Hand Gesture Datasets - which require temporal detection and classification of the performed hand gestures. The ResNeXt-101 model, which is used as the classifier, achieves state-of-the-art offline classification accuracy of 94.04% and 83.82% for the depth modality on the EgoGesture and NVIDIA benchmarks, respectively. In real-time detection and classification, we obtain considerable early detections while achieving performances close to offline operation. The codes and pretrained models used in this work are publicly available. |
Tasks | Action Recognition In Videos, Hand Gesture Recognition |
Published | 2019-01-29 |
URL | https://arxiv.org/abs/1901.10323v3 |
PDF | https://arxiv.org/pdf/1901.10323v3.pdf |
PWC | https://paperswithcode.com/paper/real-time-hand-gesture-detection-and |
Repo | https://github.com/LEChaney/Real-time-SSAR |
Framework | pytorch |
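The proposed evaluation metric is plain Levenshtein distance between the predicted and ground-truth gesture sequences, which jointly penalizes misclassifications, duplicate detections, and misses. A standard dynamic-programming implementation:

```python
# Edit distance between predicted and ground-truth gesture label sequences.
def levenshtein(pred, truth):
    m, n = len(pred), len(truth)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # deletions (spurious detections)
    for j in range(n + 1):
        d[0][j] = j                      # insertions (missed gestures)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (pred[i - 1] != truth[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, sub)
    return d[m][n]

# e.g. one duplicate detection plus one misclassification -> distance 2
print(levenshtein([3, 3, 7, 1], [3, 7, 2]))
```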
Learning to reinforcement learn for Neural Architecture Search
Title | Learning to reinforcement learn for Neural Architecture Search |
Authors | J. Gomez Robles, J. Vanschoren |
Abstract | Reinforcement learning (RL) is a goal-oriented learning solution that has proven to be successful for Neural Architecture Search (NAS) on the CIFAR and ImageNet datasets. However, a limitation of this approach is its high computational cost, making it infeasible to replay it on other datasets. Through meta-learning, we could bring this cost down by adapting previously learned policies instead of learning them from scratch. In this work, we propose a deep meta-RL algorithm that learns an adaptive policy over a set of environments, making it possible to transfer it to previously unseen tasks. The algorithm had previously been applied to various proof-of-concept environments; here we adapt it to the NAS problem. We empirically investigate the agent’s behavior during training when challenged to design chain-structured neural architectures for three datasets of increasing difficulty, and then fix the policy and evaluate it on two unseen datasets of different difficulty. Our results show that, under resource constraints, the agent effectively adapts its strategy during training to design better architectures than the ones designed by a standard RL algorithm, and can design good architectures during the evaluation on previously unseen environments. We also provide guidelines on the applicability of our framework in a more complex NAS setting by studying the progress of the agent when challenged to design multi-branch architectures. |
Tasks | Meta-Learning, Neural Architecture Search |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.03769v2 |
PDF | https://arxiv.org/pdf/1911.03769v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-reinforcement-learn-for-neural |
Repo | https://github.com/gomerudo/nas-dmrl |
Framework | tf |
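The chain-structured setting can be pictured as a tiny episodic environment: each step appends one layer, and the episode's reward is the trained network's validation score. The sketch below is a placeholder environment with an invented layer menu and a random reward, not the paper's actual setup:

```python
# Skeletal chain-structured NAS environment for an RL/meta-RL agent.
import random

ACTIONS = ["conv3x3", "conv5x5", "maxpool", "dense", "terminate"]

class ChainNASEnv:
    def __init__(self, max_depth=6):
        self.max_depth = max_depth

    def reset(self):
        self.arch = []
        return tuple(self.arch)

    def step(self, action):
        if action == "terminate" or len(self.arch) >= self.max_depth:
            # Placeholder reward; the real environment trains and
            # validates the designed architecture here.
            return tuple(self.arch), random.random(), True
        self.arch.append(action)
        return tuple(self.arch), 0.0, False

env = ChainNASEnv()
state, done = env.reset(), False
while not done:
    state, reward, done = env.step(random.choice(ACTIONS))
print(state, reward)
```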
NAT: Neural Architecture Transformer for Accurate and Compact Architectures
Title | NAT: Neural Architecture Transformer for Accurate and Compact Architectures |
Authors | Yong Guo, Yin Zheng, Mingkui Tan, Qi Chen, Jian Chen, Peilin Zhao, Junzhou Huang |
Abstract | Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-searched architecture may still contain many non-significant or redundant modules or operations (e.g., convolution or pooling), which may not only incur substantial memory consumption and computation cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computation cost. Unfortunately, such a constrained optimization problem is NP-hard. To make the problem feasible, we cast the optimization problem into a Markov decision process (MDP) and seek to learn a Neural Architecture Transformer (NAT) to replace redundant operations with more computationally efficient ones (e.g., a skip connection or direct removal of the connection). Based on the MDP, we learn NAT by exploiting reinforcement learning to obtain the optimization policies w.r.t. different architectures. To verify the effectiveness of the proposed strategies, we apply NAT to both hand-crafted architectures and NAS-based architectures. Extensive experiments on two benchmark datasets, i.e., CIFAR-10 and ImageNet, demonstrate that the architectures transformed by NAT significantly outperform both their original forms and the architectures optimized by existing methods. |
Tasks | Neural Architecture Search |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14488v5 |
PDF | https://arxiv.org/pdf/1910.14488v5.pdf |
PWC | https://paperswithcode.com/paper/nat-neural-architecture-transformer-for |
Repo | https://github.com/guoyongcs/NAT |
Framework | pytorch |
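NAT's action space is small: for each operation, keep it, replace it with a skip connection, or remove the connection. A toy transformation with a random stand-in for the learned policy (architecture encoding and names are illustrative):

```python
# Toy rendering of NAT's per-operation action space.
import random

def transform(architecture, policy):
    out = []
    for op in architecture:
        choice = policy(op)              # one of: keep / skip / remove
        if choice == "keep":
            out.append(op)
        elif choice == "skip":
            out.append("skip_connect")   # cheaper replacement operation
        # "remove" drops the connection, so nothing is appended
    return out

arch = ["conv3x3", "maxpool", "conv5x5", "conv3x3"]
policy = lambda op: random.choice(["keep", "skip", "remove"])  # stand-in
print(transform(arch, policy))
```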
Deep Generative Model for Sparse Graphs using Text-Based Learning with Augmentation in Generative Examination Networks
Title | Deep Generative Model for Sparse Graphs using Text-Based Learning with Augmentation in Generative Examination Networks |
Authors | Ruud van Deursen, Guillaume Godin |
Abstract | Graphs and networks are a key research tool for a variety of science fields, most notably chemistry, biology, engineering and the social sciences. Modeling and generating graphs with efficient sampling is a key challenge. In particular, the non-uniqueness, the high dimensionality of the vertices and the local dependencies of the edges may render the task challenging. We apply our recently introduced method, Generative Examination Networks (GENs), to create the first text-based generative graph models using one-line text formats as the graph representation. In our GEN, an RNN generative model for a one-line text format learns autonomously to predict the next available character. Training is stopped by an examination mechanism that checks the percentage of valid graphs generated. We achieved moderate to high validity using dense g6 strings (random 67.8 +/- 0.6, canonical 99.1 +/- 0.2). Based on these results, we adapted the widely used SMILES representation for molecules to a new input format, which we call linear graph input (LGI). Apart from the benefits of a short, compressible text format, a major advantage is the possibility to randomize and augment the format. The generative models are evaluated for overall performance and for reconstruction of the property space. The results show that LGI strings are very well suited for machine learning and that augmentation is essential for the performance of the model in terms of validity, uniqueness and novelty. Lastly, the format can address both smaller and larger datasets of graphs, it can easily be adapted to redefine the meaning of the characters in the LGI string, and it can address sparse-graph problems in other fields of science. |
Tasks | |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.11472v1 |
PDF | https://arxiv.org/pdf/1909.11472v1.pdf |
PWC | https://paperswithcode.com/paper/deep-generative-model-for-sparse-graphs-using |
Repo | https://github.com/RuudFirsa/Graph-GEN |
Framework | tf |
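The examination mechanism amounts to early stopping on a sampled-validity estimate. The sketch below checks g6-string validity with networkx (which can parse graph6); the training step and sampler are placeholders, as the paper's exact loop is not reproduced:

```python
# Examination-style early stopping: stop training once the fraction of
# sampled strings that parse into valid graphs clears a threshold.
import networkx as nx

def valid_fraction(samples):
    ok = 0
    for s in samples:
        try:
            nx.from_graph6_bytes(s.encode())
            ok += 1
        except (nx.NetworkXError, ValueError):
            pass
    return ok / len(samples)

def train_with_examination(model, sample_strings, threshold=0.95, max_epochs=100):
    for epoch in range(max_epochs):
        model.fit_one_epoch()                          # placeholder train step
        if valid_fraction(sample_strings(model, n=1000)) >= threshold:
            break                                      # examination passed
    return model

# "A_" is a valid graph6 string (the single edge K2); "!!" is not.
print(valid_fraction(["A_", "!!"]))  # -> 0.5
```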
Probabilistic Watershed: Sampling all spanning forests for seeded segmentation and semi-supervised learning
Title | Probabilistic Watershed: Sampling all spanning forests for seeded segmentation and semi-supervised learning |
Authors | Enrique Fita Sanmartin, Sebastian Damrich, Fred A. Hamprecht |
Abstract | The seeded Watershed algorithm / minimax semi-supervised learning on a graph computes a minimum spanning forest which connects every pixel / unlabeled node to a seed / labeled node. We propose instead to consider all possible spanning forests and calculate, for every node, the probability of sampling a forest connecting a certain seed with that node. We dub this approach “Probabilistic Watershed”. Leo Grady (2006) already noted its equivalence to the Random Walker / Harmonic energy minimization. We here give a simpler proof of this equivalence and establish the computational feasibility of the Probabilistic Watershed with Kirchhoff’s matrix tree theorem. Furthermore, we show a new connection between the Random Walker probabilities and the triangle inequality of the effective resistance. Finally, we derive a new and intuitive interpretation of the Power Watershed. |
Tasks | |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02921v1 |
PDF | https://arxiv.org/pdf/1911.02921v1.pdf |
PWC | https://paperswithcode.com/paper/probabilistic-watershed-sampling-all-spanning |
Repo | https://github.com/hci-unihd/Probabilistic_Watershed |
Framework | none |
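Because the Probabilistic Watershed's forest-sampling probabilities coincide with the Random Walker potentials, they can be computed by a single Laplacian solve with the seeds as boundary conditions. A minimal numpy version on a toy chain graph:

```python
# Harmonic (Random Walker) potentials: solve L_UU x_U = -L_US x_S with the
# foreground seed clamped to 1 and the background seed to 0.
import numpy as np

def random_walker_probs(W, seed_fg, seed_bg):
    """P(node is assigned to the foreground seed) on weighted adjacency W."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian
    unlabeled = [i for i in range(n) if i not in (seed_fg, seed_bg)]
    b = -L[np.ix_(unlabeled, [seed_fg])].ravel()   # boundary term, x_fg = 1
    x = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)], b)
    probs = np.zeros(n)
    probs[seed_fg] = 1.0
    probs[unlabeled] = x
    return probs

# 4-node chain 0-1-2-3 with unit weights, seeds at the two ends.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
print(random_walker_probs(W, seed_fg=0, seed_bg=3))  # ~ [1, 2/3, 1/3, 0]
```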
BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search
Title | BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search |
Authors | Colin White, Willie Neiswanger, Yash Savani |
Abstract | Neural Architecture Search (NAS) has seen an explosion of research in the past few years, with techniques spanning reinforcement learning, evolutionary search, Gaussian process (GP) Bayesian optimization (BO), and gradient descent. While BO with GPs has seen great success in hyperparameter optimization, there are many challenges applying BO to NAS, such as the requirement of a distance function between neural networks. In this work, we develop a suite of techniques for high-performance BO applied to NAS that allows us to achieve state-of-the-art NAS results. We develop a BO procedure that leverages a novel architecture representation (which we term the path encoding) and a neural network-based predictive uncertainty model on this representation. On popular search spaces, we can predict the validation accuracy of a new architecture to within one percent of its true value using only 200 training points. This may be of independent interest beyond NAS. We also show experimentally and theoretically that our method scales far better than existing techniques. We test our algorithm on the NASBench (Ying et al. 2019) and DARTS (Liu et al. 2018) search spaces and show that our algorithm outperforms a variety of NAS methods including regularized evolution, reinforcement learning, BOHB, and DARTS. Our method achieves state-of-the-art performance on the NASBench dataset and is over 100x more efficient than random search. We adhere to the recent NAS research checklist (Lindauer and Hutter 2019) to facilitate NAS research. In particular, our implementation is publicly available and includes all details needed to fully reproduce our results. |
Tasks | Hyperparameter Optimization, Neural Architecture Search |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11858v2 |
PDF | https://arxiv.org/pdf/1910.11858v2.pdf |
PWC | https://paperswithcode.com/paper/bananas-bayesian-optimization-with-neural |
Repo | https://github.com/naszilla/bananas |
Framework | tf |
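The path encoding can be reproduced in a few lines: represent a cell by the set of input-to-output paths through its DAG, one binary feature per possible op sequence. The sketch below assumes a NAS-Bench-101-style cell (adjacency matrix plus per-node op list); the op menu and maximum path length are illustrative:

```python
# Sketch of the BANANAS path encoding for a small cell DAG.
from itertools import product

def paths(adj, ops):
    """All op-label sequences along input(node 0) -> output(last node) paths."""
    n = len(ops)
    out = []
    def dfs(node, trail):
        if node == n - 1:
            out.append(tuple(trail))
            return
        for nxt in range(n):
            if adj[node][nxt]:
                dfs(nxt, trail + ([ops[nxt]] if nxt != n - 1 else []))
    dfs(0, [])
    return out

def path_encoding(adj, ops, op_menu=("conv3x3", "conv1x1", "maxpool"), max_len=3):
    # One bit per possible path (op sequence of length 0..max_len).
    universe = [()] + [p for L in range(1, max_len + 1)
                       for p in product(op_menu, repeat=L)]
    present = set(paths(adj, ops))
    return [int(p in present) for p in universe]

adj = [[0, 1, 1, 0],   # input -> op1, input -> op2
       [0, 0, 0, 1],   # op1 -> output
       [0, 0, 0, 1],   # op2 -> output
       [0, 0, 0, 0]]
ops = ["input", "conv3x3", "maxpool", "output"]
print(path_encoding(adj, ops))
```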
Training Image Estimators without Image Ground-Truth
Title | Training Image Estimators without Image Ground-Truth |
Authors | Zhihao Xia, Ayan Chakrabarti |
Abstract | Deep neural networks have been very successful in image estimation applications such as compressive-sensing and image restoration, as a means to estimate images from partial, blurry, or otherwise degraded measurements. These networks are trained on a large number of corresponding pairs of measurements and ground-truth images, and thus implicitly learn to exploit domain-specific image statistics. But unlike measurement data, it is often expensive or impractical to collect a large training set of ground-truth images in many application settings. In this paper, we introduce an unsupervised framework for training image estimation networks, from a training set that contains only measurements—with two varied measurements per image—but no ground-truth for the full images desired as output. We demonstrate that our framework can be applied for both regular and blind image estimation tasks, where in the latter case parameters of the measurement model (e.g., the blur kernel) are unknown: during inference, and potentially, also during training. We evaluate our method for training networks for compressive-sensing and blind deconvolution, considering both non-blind and blind training for the latter. Our unsupervised framework yields models that are nearly as accurate as those from fully supervised training, despite not having access to any ground-truth images. |
Tasks | Compressive Sensing, Image Restoration |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05775v2 |
PDF | https://arxiv.org/pdf/1906.05775v2.pdf |
PWC | https://paperswithcode.com/paper/training-image-estimators-without-image |
Repo | https://github.com/likesum/unsupimg |
Framework | tf |
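The unsupervised objective can be sketched as a "swap" loss, assuming PyTorch and toy linear measurement operators: reconstruct the image from one measurement, re-measure the estimate with the other operator, and compare against the second measurement. The one-layer "network" below is a placeholder:

```python
# Swap-measurement training signal: each image has two measurements
# y1 = x A1^T and y2 = x A2^T but no ground-truth x; each estimate is
# supervised only through the *other* measurement.
import torch

def swap_loss(net, y1, A1, y2, A2):
    x1 = net(y1, A1)                    # estimate image from measurement 1
    x2 = net(y2, A2)                    # estimate image from measurement 2
    return (torch.mean((x1 @ A2.T - y2) ** 2) +
            torch.mean((x2 @ A1.T - y1) ** 2))

d, m = 64, 16                           # image and measurement dimensions
A1, A2 = torch.randn(m, d), torch.randn(m, d)
net = lambda y, A: y @ A                # placeholder decoder, x_hat = y A
x = torch.randn(8, d)                   # unknown images (batch of 8)
print(swap_loss(net, x @ A1.T, A1, x @ A2.T, A2))
```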