Paper Group AWR 125
Rethinking the Evaluation of Video Summaries. Corrigibility with Utility Preservation. Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. Oktoberfest Food Dataset. Assessing the Ability of Self-Attention Networks to Learn Word Order. Beyond the Self: Using Grounded Affordances to Interpret and Describe Others’ Actions …
Rethinking the Evaluation of Video Summaries
Title | Rethinking the Evaluation of Video Summaries |
Authors | Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä |
Abstract | Video summarization is a technique to create a short skim of the original video while preserving the main stories/content. There is substantial interest in automating this process due to the rapid growth of the available material. Recent progress has been facilitated by public benchmark datasets, which enable easy and fair comparison of methods. The currently established evaluation protocol is to compare the generated summary against a set of reference summaries provided by the dataset. In this paper, we provide an in-depth assessment of this pipeline using two popular benchmark datasets. Surprisingly, we observe that randomly generated summaries achieve performance comparable to or better than the state-of-the-art. In some cases, the random summaries outperform even the human-generated summaries in leave-one-out experiments. Moreover, it turns out that the video segmentation, which is often considered a fixed pre-processing step, has the most significant impact on the performance measure. Based on our observations, we propose alternative approaches for assessing the importance scores, as well as an intuitive visualization of the correlation between the estimated scoring and human annotations. |
Tasks | Video Semantic Segmentation, Video Summarization |
Published | 2019-03-27 |
URL | http://arxiv.org/abs/1903.11328v2 |
PDF | http://arxiv.org/pdf/1903.11328v2.pdf |
PWC | https://paperswithcode.com/paper/rethinking-the-evaluation-of-video-summaries |
Repo | https://github.com/mayu-ot/rethinking-evs |
Framework | none |
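The failure mode above is easy to reproduce. Below is a minimal sketch (Python, with invented helper names and placeholder reference summaries, independent of the authors' repo) of the criticized protocol: score a randomly generated summary against several reference summaries with the usual F-measure.

```python
# Sketch of the standard video-summary F-score protocol the paper scrutinizes,
# plus the random-summary baseline it shows is surprisingly competitive.
import numpy as np

def f_score(pred, ref):
    """F-measure between two binary frame-selection vectors."""
    overlap = np.logical_and(pred, ref).sum()
    if overlap == 0:
        return 0.0
    precision = overlap / pred.sum()
    recall = overlap / ref.sum()
    return 2 * precision * recall / (precision + recall)

def random_summary(n_frames, budget=0.15, rng=None):
    """Select a random ~15% of frames, mimicking the random baseline."""
    rng = rng or np.random.default_rng()
    pred = np.zeros(n_frames, dtype=bool)
    pred[rng.choice(n_frames, int(budget * n_frames), replace=False)] = True
    return pred

# Evaluate a random summary against each (placeholder) human reference and
# average - one variant of the protocol the paper shows is easy to match.
n_frames = 3000
refs = [np.random.rand(n_frames) > 0.85 for _ in range(5)]  # fake references
pred = random_summary(n_frames)
print(np.mean([f_score(pred, r) for r in refs]))
```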
Corrigibility with Utility Preservation
Title | Corrigibility with Utility Preservation |
Authors | Koen Holtman |
Abstract | Corrigibility is a safety property for artificially intelligent agents. A corrigible agent will not resist attempts by authorized parties to alter the goals and constraints that were encoded in the agent when it was first started. This paper shows how to construct a safety layer that adds corrigibility to arbitrarily advanced utility-maximizing agents, including possible future agents with Artificial General Intelligence (AGI). The layer counteracts the emergent incentive of advanced agents to resist such alteration. A detailed model for agents which can reason about preserving their utility function is developed and used to prove that the corrigibility layer works as intended in a large set of non-hostile universes. The corrigible agents have an emergent incentive to protect key elements of their corrigibility layer. However, hostile universes may contain forces strong enough to break safety features. Some open problems related to graceful degradation when an agent is successfully attacked are identified. The results in this paper were obtained by concurrently developing an AGI agent simulator, an agent model, and proofs. The simulator is available under an open source license. The paper contains simulation results which illustrate the safety-related properties of corrigible AGI agents in detail. |
Tasks | |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01695v1 |
PDF | https://arxiv.org/pdf/1908.01695v1.pdf |
PWC | https://paperswithcode.com/paper/corrigibility-with-utility-preservation |
Repo | https://github.com/kholtman/agisim |
Framework | none |
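To make the mechanism concrete, here is a heavily simplified toy (plain Python, not the agisim simulator's API): a utility-maximizing agent whose utility function is swapped when an authorized stop button is pressed, with a compensation constant chosen so the agent gains nothing by blocking or forcing the press. The paper's actual correction is more subtle; this only illustrates the shape of the idea.

```python
# Toy corrigibility-by-utility-substitution sketch (invented numbers).
def utility(action, pressed, u_orig, u_new, compensation):
    # Before the press the agent pursues its original goal; afterwards the
    # new goal plus a constant compensation term applies.
    return (u_new[action] + compensation) if pressed else u_orig[action]

u_orig = {"work": 10.0, "block_button": 9.0}  # blocking costs a little utility
u_new  = {"work": 2.0,  "block_button": 1.0}
# Choose compensation so the best attainable value is equal on both branches;
# then the agent is indifferent to whether the button gets pressed.
compensation = max(u_orig.values()) - max(u_new.values())
for pressed in (False, True):
    best = max(u_orig, key=lambda a: utility(a, pressed, u_orig, u_new, compensation))
    print(pressed, best, utility(best, pressed, u_orig, u_new, compensation))
```

In both branches the best action is "work" with equal value, so the toy agent has no emergent incentive to interfere with the button.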
Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them
Title | Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them |
Authors | Hila Gonen, Yoav Goldberg |
Abstract | Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society. This phenomenon is pervasive and consistent across different word embedding models, causing serious concern. Several recent works tackle this problem, and propose methods for significantly reducing this gender bias in word embeddings, demonstrating convincing results. However, we argue that this removal is superficial. While the bias is indeed substantially reduced according to the provided bias definition, the actual effect is mostly hiding the bias, not removing it. The gender bias information is still reflected in the distances between “gender-neutralized” words in the debiased embeddings, and can be recovered from them. We present a series of experiments to support this claim, for two debiasing methods. We conclude that existing bias removal techniques are insufficient, and should not be trusted for providing gender-neutral modeling. |
Tasks | Word Embeddings |
Published | 2019-03-09 |
URL | https://arxiv.org/abs/1903.03862v2 |
PDF | https://arxiv.org/pdf/1903.03862v2.pdf |
PWC | https://paperswithcode.com/paper/lipstick-on-a-pig-debiasing-methods-cover-up |
Repo | https://github.com/TManzini/DebiasMulticlassWordEmbedding |
Framework | pytorch |
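One of the paper's diagnostics can be sketched in a few lines: take the words that were most gendered in the original space, cluster them in the debiased space, and measure how well the clusters recover the original grouping. The snippet below (Python with scikit-learn; placeholder vectors and labels stand in for real embeddings) shows the structure of that experiment.

```python
# Sketch of the cluster-recovery diagnostic: if bias were truly removed,
# clustering debiased vectors should not recover the original gender split.
import numpy as np
from sklearn.cluster import KMeans

def bias_recoverable(debiased_vecs, orig_gender_labels):
    """Cluster debiased vectors into 2 groups; return alignment accuracy."""
    pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(debiased_vecs)
    acc = (pred == orig_gender_labels).mean()
    return max(acc, 1 - acc)  # cluster ids are arbitrary, take the best match

vecs = np.random.randn(1000, 300)        # placeholder debiased embeddings
labels = np.random.randint(0, 2, 1000)   # placeholder original-bias labels
print(bias_recoverable(vecs, labels))    # near 0.5 here; high on real data
```

On real debiased embeddings the paper reports high alignment, i.e. the bias is still encoded in the geometry even after "removal".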
Oktoberfest Food Dataset
Title | Oktoberfest Food Dataset |
Authors | Alexander Ziller, Julius Hansjakob, Vitalii Rusinov, Daniel Zügner, Peter Vogel, Stephan Günnemann |
Abstract | We release a realistic, diverse, and challenging dataset for object detection on images. The data was recorded at a beer tent in Germany and consists of 15 different categories of food and drink items. We created more than 2,500 object annotations by hand for 1,110 images captured by a video camera above the checkout. We further make available the remaining 600GB of (unlabeled) data containing days of footage. Additionally, we provide our trained models as a benchmark. Possible applications include automated checkout systems which could significantly speed up the process. |
Tasks | Object Detection |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1912.05007v1 |
PDF | https://arxiv.org/pdf/1912.05007v1.pdf |
PWC | https://paperswithcode.com/paper/oktoberfest-food-dataset |
Repo | https://github.com/a1302z/OktoberfestFoodDataset |
Framework | pytorch |
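A plausible starting point for the benchmark task is fine-tuning an off-the-shelf detector on the 15 categories. The sketch below uses torchvision's Faster R-CNN (weights API of torchvision ≥ 0.13); the annotation tensors are invented placeholders, since the repo's exact data format is not reproduced here.

```python
# Hypothetical fine-tuning step for the 15 food/drink classes.
import torch
import torchvision

num_classes = 15 + 1  # 15 categories plus background
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, num_classes=num_classes)
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

# Detection models take a list of images and a list of target dicts with
# "boxes" (FloatTensor[N, 4]) and "labels" (Int64Tensor[N]).
images = [torch.rand(3, 600, 800)]
targets = [{"boxes": torch.tensor([[100.0, 120.0, 300.0, 400.0]]),
            "labels": torch.tensor([3])}]
loss_dict = model(images, targets)  # classification/box-regression losses
total_loss = sum(loss_dict.values())
total_loss.backward()
optimizer.step()
```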
Assessing the Ability of Self-Attention Networks to Learn Word Order
Title | Assessing the Ability of Self-Attention Networks to Learn Word Order |
Authors | Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu |
Abstract | Self-attention networks (SAN) have attracted a lot of interest due to their high parallelization and strong performance on a variety of NLP tasks, e.g. machine translation. Because they lack a recurrence structure such as that of recurrent neural networks (RNN), SANs are assumed to be weak at learning positional information of words for sequence modeling. However, this speculation has neither been empirically confirmed, nor have explanations for their strong performance on machine translation tasks when “lacking positional information” been explored. To this end, we propose a novel word reordering detection task to quantify how well word order information is learned by SAN and RNN. Specifically, we randomly move one word to another position, and examine whether a trained model can detect both the original and inserted positions. Experimental results reveal that: 1) SAN trained on word reordering detection indeed has difficulty learning positional information, even with position embeddings; and 2) SAN trained on machine translation learns better positional information than its RNN counterpart, in which case position embeddings play a critical role. Although a recurrence structure makes the model more universally effective at learning word order, learning objectives matter more in downstream tasks such as machine translation. |
Tasks | Machine Translation |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00592v1 |
PDF | https://arxiv.org/pdf/1906.00592v1.pdf |
PWC | https://paperswithcode.com/paper/190600592 |
Repo | https://github.com/baosongyang/WRD |
Framework | tf |
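The proposed word reordering detection task is simple to implement as data generation: pop one word and re-insert it at a random position, keeping both indices as labels. A minimal sketch (function name is illustrative):

```python
# Generate one word-reordering-detection example: the model must later
# predict both the original position and the inserted position.
import random

def make_wrd_example(tokens, rng=random):
    tokens = list(tokens)
    src = rng.randrange(len(tokens))       # original position of the word
    word = tokens.pop(src)
    tgt = rng.randrange(len(tokens) + 1)   # position it is re-inserted at
    tokens.insert(tgt, word)
    return tokens, src, tgt                # (reordered sentence, labels)

sent = "the quick brown fox jumps over the lazy dog".split()
print(make_wrd_example(sent))
```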
Beyond the Self: Using Grounded Affordances to Interpret and Describe Others’ Actions
Title | Beyond the Self: Using Grounded Affordances to Interpret and Describe Others’ Actions |
Authors | Giovanni Saponaro, Lorenzo Jamone, Alexandre Bernardino, Giampiero Salvi |
Abstract | We propose a developmental approach that allows a robot to interpret and describe the actions of human agents by reusing previous experience. The robot first learns the association between words and object affordances by manipulating the objects in its environment. It then uses this information to learn a mapping between its own actions and those performed by a human in a shared environment. It finally fuses the information from these two models to interpret and describe human actions in light of its own experience. In our experiments, we show that the model can be used flexibly to do inference on different aspects of the scene. We can predict the effects of an action on the basis of object properties. We can revise the belief that a certain action occurred, given the observed effects of the human action. In an early action recognition fashion, we can anticipate the effects when the action has only been partially observed. By estimating the probability of words given the evidence and feeding them into a pre-defined grammar, we can generate relevant descriptions of the scene. We believe that this is a step towards providing robots with the fundamental skills to engage in social collaboration with humans. |
Tasks | Temporal Action Localization |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09705v1 |
PDF | http://arxiv.org/pdf/1902.09705v1.pdf |
PWC | https://paperswithcode.com/paper/beyond-the-self-using-grounded-affordances-to |
Repo | https://github.com/gsaponaro/tcds-gestures |
Framework | none |
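A toy, discrete rendering of the kind of inference the model supports (the probability tables below are invented for illustration; the paper learns them from the robot's own sensorimotor experience):

```python
# Predict effects from (action, object) and revise the action belief from
# observed effects via Bayes' rule, over an invented affordance table.
P_effect = {  # P(effect | action, object_size)
    ("tap",   "small"): {"moves": 0.8, "stays": 0.2},
    ("tap",   "big"):   {"moves": 0.3, "stays": 0.7},
    ("grasp", "small"): {"moves": 0.9, "stays": 0.1},
    ("grasp", "big"):   {"moves": 0.4, "stays": 0.6},
}
P_action = {"tap": 0.5, "grasp": 0.5}  # prior over the human's action

def posterior_action(observed_effect, obj):
    """P(action | effect, object) by Bayes' rule over the small table."""
    joint = {a: P_action[a] * P_effect[(a, obj)][observed_effect] for a in P_action}
    z = sum(joint.values())
    return {a: p / z for a, p in joint.items()}

print(posterior_action("moves", "big"))  # belief revision after observation
```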
DBSN: Measuring Uncertainty through Bayesian Learning of Deep Neural Network Structures
Title | DBSN: Measuring Uncertainty through Bayesian Learning of Deep Neural Network Structures |
Authors | Zhijie Deng, Yucen Luo, Jun Zhu, Bo Zhang |
Abstract | Bayesian neural networks (BNNs) introduce uncertainty estimation to deep networks by performing Bayesian inference on network weights. However, such models pose inference challenges, and BNNs with weight uncertainty rarely achieve performance superior to standard models. In this paper, we investigate a new line of Bayesian deep learning by performing Bayesian reasoning on the structure of deep neural networks. Drawing inspiration from neural architecture search, we define the network structure as gating weights on the redundant operations between computational nodes, and apply stochastic variational inference techniques to learn the structure distributions of networks. Empirically, the proposed method substantially surpasses advanced deep neural networks across a range of classification and segmentation tasks. More importantly, our approach also preserves the benefits of Bayesian principles, producing better uncertainty estimates than strong baselines including MC dropout and variational BNN algorithms (e.g. noisy EK-FAC). |
Tasks | Bayesian Inference, Neural Architecture Search |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.09804v1 |
PDF | https://arxiv.org/pdf/1911.09804v1.pdf |
PWC | https://paperswithcode.com/paper/dbsn-measuring-uncertainty-through-bayesian |
Repo | https://github.com/anonymousest/DBSN |
Framework | tf |
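The gating idea can be sketched compactly. Below is an illustrative PyTorch module (the repo itself is TensorFlow): each edge mixes candidate operations through relaxed stochastic gates whose logits act as the learned structure parameters.

```python
# Minimal sketch of structure gating with a Gumbel-softmax relaxation; the
# candidate op menu and single-edge scope are simplifications.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedEdge(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 1),
            nn.Identity(),
        ])
        # Logits of the structure distribution over candidate operations.
        self.logits = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        # Sample relaxed one-hot gates (reparameterized) and mix the ops;
        # variational training updates `logits` through these samples.
        gates = F.gumbel_softmax(self.logits, tau=1.0, hard=False)
        return sum(g * op(x) for g, op in zip(gates, self.ops))

edge = GatedEdge(16)
print(edge(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 16, 8, 8])
```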
AutoShrink: A Topology-aware NAS for Discovering Efficient Neural Architecture
Title | AutoShrink: A Topology-aware NAS for Discovering Efficient Neural Architecture |
Authors | Tunhou Zhang, Hsin-Pai Cheng, Zhenwen Li, Feng Yan, Chengyu Huang, Hai Li, Yiran Chen |
Abstract | Resource is an important constraint when deploying Deep Neural Networks (DNNs) on mobile and edge devices. Existing works commonly adopt the cell-based search approach, which limits the flexibility of network patterns in learned cell structures. Moreover, due to the topology-agnostic nature of existing works, including both cell-based and node-based approaches, the search process is time-consuming and the performance of the found architecture may be sub-optimal. To address these problems, we propose AutoShrink, a topology-aware Neural Architecture Search (NAS) for searching efficient building blocks of neural architectures. Our method is node-based and thus can learn flexible network patterns in cell structures within a topological search space. Directed Acyclic Graphs (DAGs) are used to abstract DNN architectures and progressively optimize the cell structure through edge shrinking. Because the search space intrinsically shrinks as edges are progressively removed, AutoShrink explores a more flexible search space with even less search time. We evaluate AutoShrink on image classification and language tasks by crafting ShrinkCNN and ShrinkRNN models. ShrinkCNN is able to achieve up to 48% parameter reduction and save 34% Multiply-Accumulates (MACs) on ImageNet-1K with accuracy comparable to state-of-the-art (SOTA) models. Specifically, both ShrinkCNN and ShrinkRNN are crafted within 1.5 GPU hours, which is 7.2x and 6.7x faster than the crafting time of SOTA CNN and RNN models, respectively. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09251v1 |
PDF | https://arxiv.org/pdf/1911.09251v1.pdf |
PWC | https://paperswithcode.com/paper/autoshrink-a-topology-aware-nas-for |
Repo | https://github.com/lordzth666/AutoShrink |
Framework | tf |
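Stripped of training details, the search loop is a greedy edge-removal procedure over a DAG. A skeletal Python version with a placeholder scoring function (the real method trains and evaluates candidate cells, and handles both CNN and RNN spaces):

```python
# Skeleton of edge shrinking: start dense, drop the edge whose removal
# hurts a proxy score least; the search space shrinks with each step.
import itertools

def evaluate(edges):
    # Placeholder proxy: prefer fewer edges (stands in for accuracy/MACs).
    return -len(edges)

def autoshrink(n_nodes, steps):
    edges = {(i, j) for i, j in itertools.combinations(range(n_nodes), 2)}
    for _ in range(steps):
        # Try removing each remaining edge; keep the removal that scores best.
        best_edge = max(edges, key=lambda e: evaluate(edges - {e}))
        edges = edges - {best_edge}
    return edges

print(sorted(autoshrink(n_nodes=4, steps=3)))
```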
Real-time Hand Gesture Detection and Classification Using Convolutional Neural Networks
Title | Real-time Hand Gesture Detection and Classification Using Convolutional Neural Networks |
Authors | Okan Köpüklü, Ahmet Gunduz, Neslihan Kose, Gerhard Rigoll |
Abstract | Real-time recognition of dynamic hand gestures from video streams is a challenging task since (i) there is no indication when a gesture starts and ends in the video, (ii) performed gestures should only be recognized once, and (iii) the entire architecture should be designed considering the memory and power budget. In this work, we address these challenges by proposing a hierarchical structure enabling offline-working convolutional neural network (CNN) architectures to operate online efficiently using a sliding-window approach. The proposed architecture consists of two models: (1) a detector, which is a lightweight CNN architecture to detect gestures, and (2) a classifier, which is a deep CNN to classify the detected gestures. In order to evaluate the single-time activations of the detected gestures, we propose to use Levenshtein distance as an evaluation metric since it can measure misclassifications, multiple detections, and missing detections at the same time. We evaluate our architecture on two publicly available datasets - EgoGesture and NVIDIA Dynamic Hand Gesture Datasets - which require temporal detection and classification of the performed hand gestures. The ResNeXt-101 model, which is used as the classifier, achieves state-of-the-art offline classification accuracy of 94.04% and 83.82% for the depth modality on the EgoGesture and NVIDIA benchmarks, respectively. In real-time detection and classification, we obtain considerable early detections while achieving performances close to offline operation. The codes and pretrained models used in this work are publicly available. |
Tasks | Action Recognition In Videos, Hand Gesture Recognition |
Published | 2019-01-29 |
URL | https://arxiv.org/abs/1901.10323v3 |
PDF | https://arxiv.org/pdf/1901.10323v3.pdf |
PWC | https://paperswithcode.com/paper/real-time-hand-gesture-detection-and |
Repo | https://github.com/LEChaney/Real-time-SSAR |
Framework | pytorch |
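The proposed evaluation metric is plain Levenshtein distance between the predicted and ground-truth gesture sequences, which jointly penalizes misclassifications, duplicate detections, and misses. A standard dynamic-programming implementation:

```python
# Edit distance between predicted and ground-truth gesture label sequences.
def levenshtein(pred, truth):
    m, n = len(pred), len(truth)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # deletions (spurious detections)
    for j in range(n + 1):
        d[0][j] = j                      # insertions (missed gestures)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (pred[i - 1] != truth[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, sub)
    return d[m][n]

# e.g. one duplicate detection plus one misclassification -> distance 2
print(levenshtein([3, 3, 7, 1], [3, 7, 2]))
```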
Learning to reinforcement learn for Neural Architecture Search
Title | Learning to reinforcement learn for Neural Architecture Search |
Authors | J. Gomez Robles, J. Vanschoren |
Abstract | Reinforcement learning (RL) is a goal-oriented learning solution that has proven to be successful for Neural Architecture Search (NAS) on the CIFAR and ImageNet datasets. However, a limitation of this approach is its high computational cost, making it infeasible to replay it on other datasets. Through meta-learning, we could bring this cost down by adapting previously learned policies instead of learning them from scratch. In this work, we propose a deep meta-RL algorithm that learns an adaptive policy over a set of environments, making it possible to transfer it to previously unseen tasks. The algorithm had previously been applied to various proof-of-concept environments; here we adapt it to the NAS problem. We empirically investigate the agent’s behavior during training when challenged to design chain-structured neural architectures for three datasets of increasing difficulty, and then fix the policy and evaluate it on two unseen datasets of different difficulty. Our results show that, under resource constraints, the agent effectively adapts its strategy during training to design better architectures than the ones designed by a standard RL algorithm, and can design good architectures during the evaluation on previously unseen environments. We also provide guidelines on the applicability of our framework in a more complex NAS setting by studying the progress of the agent when challenged to design multi-branch architectures. |
Tasks | Meta-Learning, Neural Architecture Search |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.03769v2 |
PDF | https://arxiv.org/pdf/1911.03769v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-reinforcement-learn-for-neural |
Repo | https://github.com/gomerudo/nas-dmrl |
Framework | tf |
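The chain-structured setting can be pictured as a tiny episodic environment: each step appends one layer, and the episode's reward is the trained network's validation score. The sketch below is a placeholder environment with an invented layer menu and a random reward, not the paper's actual setup:

```python
# Skeletal chain-structured NAS environment for an RL/meta-RL agent.
import random

ACTIONS = ["conv3x3", "conv5x5", "maxpool", "dense", "terminate"]

class ChainNASEnv:
    def __init__(self, max_depth=6):
        self.max_depth = max_depth

    def reset(self):
        self.arch = []
        return tuple(self.arch)

    def step(self, action):
        if action == "terminate" or len(self.arch) >= self.max_depth:
            # Placeholder reward; the real environment trains and
            # validates the designed architecture here.
            return tuple(self.arch), random.random(), True
        self.arch.append(action)
        return tuple(self.arch), 0.0, False

env = ChainNASEnv()
state, done = env.reset(), False
while not done:
    state, reward, done = env.step(random.choice(ACTIONS))
print(state, reward)
```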
NAT: Neural Architecture Transformer for Accurate and Compact Architectures
Title | NAT: Neural Architecture Transformer for Accurate and Compact Architectures |
Authors | Yong Guo, Yin Zheng, Mingkui Tan, Qi Chen, Jian Chen, Peilin Zhao, Junzhou Huang |
Abstract | Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-searched architecture may still contain many non-significant or redundant modules or operations (e.g., convolution or pooling), which may not only incur substantial memory consumption and computation cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computation cost. Unfortunately, such a constrained optimization problem is NP-hard. To make the problem feasible, we cast the optimization problem into a Markov decision process (MDP) and seek to learn a Neural Architecture Transformer (NAT) to replace redundant operations with more computationally efficient ones (e.g., a skip connection or direct removal of the connection). Based on the MDP, we learn NAT by exploiting reinforcement learning to obtain the optimization policies w.r.t. different architectures. To verify the effectiveness of the proposed strategies, we apply NAT to both hand-crafted architectures and NAS-based architectures. Extensive experiments on two benchmark datasets, i.e., CIFAR-10 and ImageNet, demonstrate that the architectures transformed by NAT significantly outperform both their original forms and the architectures optimized by existing methods. |
Tasks | Neural Architecture Search |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14488v5 |
PDF | https://arxiv.org/pdf/1910.14488v5.pdf |
PWC | https://paperswithcode.com/paper/nat-neural-architecture-transformer-for |
Repo | https://github.com/guoyongcs/NAT |
Framework | pytorch |
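NAT's action space is small: for each operation, keep it, replace it with a skip connection, or remove the connection. A toy transformation with a random stand-in for the learned policy (architecture encoding and names are illustrative):

```python
# Toy rendering of NAT's per-operation action space.
import random

def transform(architecture, policy):
    out = []
    for op in architecture:
        choice = policy(op)              # one of: keep / skip / remove
        if choice == "keep":
            out.append(op)
        elif choice == "skip":
            out.append("skip_connect")   # cheaper replacement operation
        # "remove" drops the connection, so nothing is appended
    return out

arch = ["conv3x3", "maxpool", "conv5x5", "conv3x3"]
policy = lambda op: random.choice(["keep", "skip", "remove"])  # stand-in
print(transform(arch, policy))
```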
Deep Generative Model for Sparse Graphs using Text-Based Learning with Augmentation in Generative Examination Networks
Title | Deep Generative Model for Sparse Graphs using Text-Based Learning with Augmentation in Generative Examination Networks |
Authors | Ruud van Deursen, Guillaume Godin |
Abstract | Graphs and networks are a key research tool for a variety of science fields, most notably chemistry, biology, engineering and the social sciences. Modeling and generating graphs with efficient sampling is a key challenge. In particular, the non-uniqueness, the high dimensionality of the vertices and the local dependencies of the edges may render the task challenging. We apply our recently introduced method, Generative Examination Networks (GENs), to create the first text-based generative graph models using one-line text formats as the graph representation. In our GEN, an RNN generative model for a one-line text format learns autonomously to predict the next available character. Training is stopped by an examination mechanism that checks the percentage of valid graphs generated. We achieved moderate to high validity using dense g6 strings (random 67.8 +/- 0.6, canonical 99.1 +/- 0.2). Based on these results, we adapted the widely used SMILES representation for molecules to a new input format, which we call linear graph input (LGI). Apart from the benefits of a short, compressible text format, a major advantage is the possibility to randomize and augment the format. The generative models are evaluated for overall performance and for reconstruction of the property space. The results show that LGI strings are very well suited for machine learning and that augmentation is essential for the performance of the model in terms of validity, uniqueness and novelty. Lastly, the format can address both smaller and larger datasets of graphs, it can easily be adapted to redefine the meaning of the characters in the LGI string, and it can address sparse-graph problems in other fields of science. |
Tasks | |
Published | 2019-09-24 |
URL | https://arxiv.org/abs/1909.11472v1 |
PDF | https://arxiv.org/pdf/1909.11472v1.pdf |
PWC | https://paperswithcode.com/paper/deep-generative-model-for-sparse-graphs-using |
Repo | https://github.com/RuudFirsa/Graph-GEN |
Framework | tf |
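The examination mechanism amounts to early stopping on a sampled-validity estimate. The sketch below checks g6-string validity with networkx (which can parse graph6); the training step and sampler are placeholders, as the paper's exact loop is not reproduced:

```python
# Examination-style early stopping: stop training once the fraction of
# sampled strings that parse into valid graphs clears a threshold.
import networkx as nx

def valid_fraction(samples):
    ok = 0
    for s in samples:
        try:
            nx.from_graph6_bytes(s.encode())
            ok += 1
        except (nx.NetworkXError, ValueError):
            pass
    return ok / len(samples)

def train_with_examination(model, sample_strings, threshold=0.95, max_epochs=100):
    for epoch in range(max_epochs):
        model.fit_one_epoch()                          # placeholder train step
        if valid_fraction(sample_strings(model, n=1000)) >= threshold:
            break                                      # examination passed
    return model

# "A_" is a valid graph6 string (the single edge K2); "!!" is not.
print(valid_fraction(["A_", "!!"]))  # -> 0.5
```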
Probabilistic Watershed: Sampling all spanning forests for seeded segmentation and semi-supervised learning
Title | Probabilistic Watershed: Sampling all spanning forests for seeded segmentation and semi-supervised learning |
Authors | Enrique Fita Sanmartin, Sebastian Damrich, Fred A. Hamprecht |
Abstract | The seeded Watershed algorithm / minimax semi-supervised learning on a graph computes a minimum spanning forest which connects every pixel / unlabeled node to a seed / labeled node. We propose instead to consider all possible spanning forests and calculate, for every node, the probability of sampling a forest connecting a certain seed with that node. We dub this approach “Probabilistic Watershed”. Leo Grady (2006) already noted its equivalence to the Random Walker / Harmonic energy minimization. We here give a simpler proof of this equivalence and establish the computational feasibility of the Probabilistic Watershed with Kirchhoff’s matrix tree theorem. Furthermore, we show a new connection between the Random Walker probabilities and the triangle inequality of the effective resistance. Finally, we derive a new and intuitive interpretation of the Power Watershed. |
Tasks | |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02921v1 |
PDF | https://arxiv.org/pdf/1911.02921v1.pdf |
PWC | https://paperswithcode.com/paper/probabilistic-watershed-sampling-all-spanning |
Repo | https://github.com/hci-unihd/Probabilistic_Watershed |
Framework | none |
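Because the Probabilistic Watershed's forest-sampling probabilities coincide with the Random Walker potentials, they can be computed by a single Laplacian solve with the seeds as boundary conditions. A minimal numpy version on a toy chain graph:

```python
# Harmonic (Random Walker) potentials: solve L_UU x_U = -L_US x_S with the
# foreground seed clamped to 1 and the background seed to 0.
import numpy as np

def random_walker_probs(W, seed_fg, seed_bg):
    """P(node is assigned to the foreground seed) on weighted adjacency W."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian
    unlabeled = [i for i in range(n) if i not in (seed_fg, seed_bg)]
    b = -L[np.ix_(unlabeled, [seed_fg])].ravel()   # boundary term, x_fg = 1
    x = np.linalg.solve(L[np.ix_(unlabeled, unlabeled)], b)
    probs = np.zeros(n)
    probs[seed_fg] = 1.0
    probs[unlabeled] = x
    return probs

# 4-node chain 0-1-2-3 with unit weights, seeds at the two ends.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
print(random_walker_probs(W, seed_fg=0, seed_bg=3))  # ~ [1, 2/3, 1/3, 0]
```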
BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search
Title | BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search |
Authors | Colin White, Willie Neiswanger, Yash Savani |
Abstract | Neural Architecture Search (NAS) has seen an explosion of research in the past few years, with techniques spanning reinforcement learning, evolutionary search, Gaussian process (GP) Bayesian optimization (BO), and gradient descent. While BO with GPs has seen great success in hyperparameter optimization, there are many challenges applying BO to NAS, such as the requirement of a distance function between neural networks. In this work, we develop a suite of techniques for high-performance BO applied to NAS that allows us to achieve state-of-the-art NAS results. We develop a BO procedure that leverages a novel architecture representation (which we term the path encoding) and a neural network-based predictive uncertainty model on this representation. On popular search spaces, we can predict the validation accuracy of a new architecture to within one percent of its true value using only 200 training points. This may be of independent interest beyond NAS. We also show experimentally and theoretically that our method scales far better than existing techniques. We test our algorithm on the NASBench (Ying et al. 2019) and DARTS (Liu et al. 2018) search spaces and show that our algorithm outperforms a variety of NAS methods including regularized evolution, reinforcement learning, BOHB, and DARTS. Our method achieves state-of-the-art performance on the NASBench dataset and is over 100x more efficient than random search. We adhere to the recent NAS research checklist (Lindauer and Hutter 2019) to facilitate NAS research. In particular, our implementation is publicly available and includes all details needed to fully reproduce our results. |
Tasks | Hyperparameter Optimization, Neural Architecture Search |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11858v2 |
PDF | https://arxiv.org/pdf/1910.11858v2.pdf |
PWC | https://paperswithcode.com/paper/bananas-bayesian-optimization-with-neural |
Repo | https://github.com/naszilla/bananas |
Framework | tf |
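The path encoding can be reproduced in a few lines: represent a cell by the set of input-to-output paths through its DAG, one binary feature per possible op sequence. The sketch below assumes a NAS-Bench-101-style cell (adjacency matrix plus per-node op list); the op menu and maximum path length are illustrative:

```python
# Sketch of the BANANAS path encoding for a small cell DAG.
from itertools import product

def paths(adj, ops):
    """All op-label sequences along input(node 0) -> output(last node) paths."""
    n = len(ops)
    out = []
    def dfs(node, trail):
        if node == n - 1:
            out.append(tuple(trail))
            return
        for nxt in range(n):
            if adj[node][nxt]:
                dfs(nxt, trail + ([ops[nxt]] if nxt != n - 1 else []))
    dfs(0, [])
    return out

def path_encoding(adj, ops, op_menu=("conv3x3", "conv1x1", "maxpool"), max_len=3):
    # One bit per possible path (op sequence of length 0..max_len).
    universe = [()] + [p for L in range(1, max_len + 1)
                       for p in product(op_menu, repeat=L)]
    present = set(paths(adj, ops))
    return [int(p in present) for p in universe]

adj = [[0, 1, 1, 0],   # input -> op1, input -> op2
       [0, 0, 0, 1],   # op1 -> output
       [0, 0, 0, 1],   # op2 -> output
       [0, 0, 0, 0]]
ops = ["input", "conv3x3", "maxpool", "output"]
print(path_encoding(adj, ops))
```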
Training Image Estimators without Image Ground-Truth
Title | Training Image Estimators without Image Ground-Truth |
Authors | Zhihao Xia, Ayan Chakrabarti |
Abstract | Deep neural networks have been very successful in image estimation applications such as compressive-sensing and image restoration, as a means to estimate images from partial, blurry, or otherwise degraded measurements. These networks are trained on a large number of corresponding pairs of measurements and ground-truth images, and thus implicitly learn to exploit domain-specific image statistics. But unlike measurement data, it is often expensive or impractical to collect a large training set of ground-truth images in many application settings. In this paper, we introduce an unsupervised framework for training image estimation networks, from a training set that contains only measurements—with two varied measurements per image—but no ground-truth for the full images desired as output. We demonstrate that our framework can be applied for both regular and blind image estimation tasks, where in the latter case parameters of the measurement model (e.g., the blur kernel) are unknown: during inference, and potentially, also during training. We evaluate our method for training networks for compressive-sensing and blind deconvolution, considering both non-blind and blind training for the latter. Our unsupervised framework yields models that are nearly as accurate as those from fully supervised training, despite not having access to any ground-truth images. |
Tasks | Compressive Sensing, Image Restoration |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05775v2 |
PDF | https://arxiv.org/pdf/1906.05775v2.pdf |
PWC | https://paperswithcode.com/paper/training-image-estimators-without-image |
Repo | https://github.com/likesum/unsupimg |
Framework | tf |
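The unsupervised objective can be sketched as a "swap" loss, assuming PyTorch and toy linear measurement operators: reconstruct the image from one measurement, re-measure the estimate with the other operator, and compare against the second measurement. The one-layer "network" below is a placeholder:

```python
# Swap-measurement training signal: each image has two measurements
# y1 = x A1^T and y2 = x A2^T but no ground-truth x; each estimate is
# supervised only through the *other* measurement.
import torch

def swap_loss(net, y1, A1, y2, A2):
    x1 = net(y1, A1)                    # estimate image from measurement 1
    x2 = net(y2, A2)                    # estimate image from measurement 2
    return (torch.mean((x1 @ A2.T - y2) ** 2) +
            torch.mean((x2 @ A1.T - y1) ** 2))

d, m = 64, 16                           # image and measurement dimensions
A1, A2 = torch.randn(m, d), torch.randn(m, d)
net = lambda y, A: y @ A                # placeholder decoder, x_hat = y A
x = torch.randn(8, d)                   # unknown images (batch of 8)
print(swap_loss(net, x @ A1.T, A1, x @ A2.T, A2))
```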