Paper Group ANR 1126
Temporal Reasoning Graph for Activity Recognition
Title | Temporal Reasoning Graph for Activity Recognition |
Authors | Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen |
Abstract | Despite the great success achieved in activity analysis, many challenges remain. Most existing work on activity recognition focuses on designing efficient architectures or video sampling strategies. However, because of the fine-grained actions and long-term structure in video, activity recognition also requires reasoning about temporal relations between video sequences. In this paper, we propose an efficient temporal reasoning graph (TRG) to simultaneously capture the appearance features and the temporal relations between video sequences at multiple time scales. Specifically, we construct learnable temporal relation graphs to explore temporal relations over multi-scale ranges. Additionally, to facilitate multi-scale temporal relation extraction, we design a multi-head temporal adjacency matrix to represent multiple kinds of temporal relations. Finally, a multi-head temporal relation aggregator is proposed to extract the semantic meaning of the features convolved through the graphs. Extensive experiments are performed on widely used large-scale datasets, such as Something-Something and Charades, and the results show that our model achieves state-of-the-art performance. Further analysis shows that temporal relation reasoning with our TRG extracts discriminative features for activity recognition. |
Tasks | Action Recognition In Videos, Activity Recognition, Relation Extraction |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.09995v1 |
https://arxiv.org/pdf/1908.09995v1.pdf | |
PWC | https://paperswithcode.com/paper/temporal-reasoning-graph-for-activity |
Repo | |
Framework | |
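Below is a minimal PyTorch sketch of the core idea the abstract describes: a learnable temporal adjacency matrix per head, a graph convolution step over frame features, and multi-head aggregation. All layer choices, shapes, and the affinity function are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of multi-head temporal relation reasoning over frame features.
# Names, shapes, and the affinity function are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalRelationHead(nn.Module):
    """One learnable temporal adjacency over T frame features."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                          # x: (B, T, D)
        # Learnable temporal adjacency: affinity between every pair of frames.
        adj = torch.softmax(self.query(x) @ self.key(x).transpose(1, 2)
                            / x.size(-1) ** 0.5, dim=-1)    # (B, T, T)
        return F.relu(self.proj(adj @ x))           # graph convolution step

class MultiHeadTemporalReasoning(nn.Module):
    """Aggregates several relation heads, multi-scale in spirit."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads = nn.ModuleList(TemporalRelationHead(dim) for _ in range(heads))
        self.fuse = nn.Linear(heads * dim, dim)

    def forward(self, x):
        return self.fuse(torch.cat([h(x) for h in self.heads], dim=-1))

feats = torch.randn(2, 8, 256)                     # 2 clips, 8 frames, 256-d features
out = MultiHeadTemporalReasoning(256)(feats)       # (2, 8, 256) relation-enhanced
```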
Invisible Marker: Automatic Annotation of Segmentation Masks for Object Manipulation
Title | Invisible Marker: Automatic Annotation of Segmentation Masks for Object Manipulation |
Authors | Kuniyuki Takahashi, Kenta Yonekura |
Abstract | We propose a method to annotate segmentation masks accurately and automatically using an invisible marker for object manipulation. The marker is invisible under visible (regular) light but becomes visible under invisible light, such as ultraviolet (UV) light. By painting objects with the invisible marker and capturing images while alternately switching between regular and UV light at high speed, massive annotated datasets can be created quickly and inexpensively. We show a comparison between our proposed method and manual annotation. We demonstrate semantic segmentation for deformable objects including clothes, liquids, and powders under controlled environmental light conditions. In addition, we demonstrate liquid pouring tasks under uncontrolled environmental light conditions in complex environments such as inside an office, a house, and outdoors. Furthermore, it is possible to capture data while the camera is in motion, making it easier to collect large datasets, as shown in our demonstration. |
Tasks | Semantic Segmentation |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1909.12493v2 |
https://arxiv.org/pdf/1909.12493v2.pdf | |
PWC | https://paperswithcode.com/paper/invisible-marker-automatic-annotation-for |
Repo | |
Framework | |
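The annotation loop lends itself to a short sketch: capture a frame under regular light as the training image, switch to UV, and threshold the fluorescent marker to produce the mask. The camera and light interfaces (`grab_frame`, `toggle_light`) and the HSV threshold below are hypothetical placeholders, not the authors' hardware setup.

```python
# Sketch of the capture-and-annotate loop, assuming a camera synchronized with
# alternating regular/UV lighting. toggle_light and grab_frame are hypothetical.
import cv2
import numpy as np

def mask_from_uv_frame(uv_frame_bgr, hue_range=(140, 170)):
    """Segment the fluorescent marker in the UV-lit frame by color threshold."""
    hsv = cv2.cvtColor(uv_frame_bgr, cv2.COLOR_BGR2HSV)
    lo = np.array([hue_range[0], 80, 80], dtype=np.uint8)
    hi = np.array([hue_range[1], 255, 255], dtype=np.uint8)
    return cv2.inRange(hsv, lo, hi)                # binary mask, 0 or 255

def capture_pair(grab_frame, toggle_light):
    """One dataset sample: RGB image under regular light + mask from UV light."""
    toggle_light("regular")
    rgb = grab_frame()                             # training image
    toggle_light("uv")
    mask = mask_from_uv_frame(grab_frame())        # its segmentation label
    return rgb, mask
```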
A Surprisingly Effective Fix for Deep Latent Variable Modeling of Text
Title | A Surprisingly Effective Fix for Deep Latent Variable Modeling of Text |
Authors | Bohan Li, Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick, Yiming Yang |
Abstract | When trained effectively, the Variational Autoencoder (VAE) is both a powerful language model and an effective representation learning framework. In practice, however, VAEs are trained with the evidence lower bound (ELBO) as a surrogate objective to the intractable marginal data likelihood. This approach to training yields unstable results, frequently leading to a disastrous local optimum known as posterior collapse. In this paper, we investigate a simple fix for posterior collapse which yields surprisingly effective results. The combination of two known heuristics, previously considered only in isolation, substantially improves held-out likelihood, reconstruction, and latent representation learning when compared with previous state-of-the-art methods. More interestingly, while our experiments demonstrate superiority on these principal evaluations, our method obtains a worse ELBO. We use these results to argue that the typical surrogate objective for VAEs may not be sufficient or necessarily appropriate for balancing the goals of representation learning and data distribution modeling. |
Tasks | Language Modelling, Representation Learning |
Published | 2019-09-02 |
URL | https://arxiv.org/abs/1909.00868v1 |
https://arxiv.org/pdf/1909.00868v1.pdf | |
PWC | https://paperswithcode.com/paper/a-surprisingly-effective-fix-for-deep-latent |
Repo | |
Framework | |
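The abstract names "two known heuristics" without spelling them out. As a hedged illustration only, the sketch below combines two widely used posterior-collapse remedies, KL-weight annealing and a free-bits floor on the KL term; this is a generic example, not necessarily the paper's exact pair.

```python
# Negative-ELBO variant with two common anti-collapse heuristics combined:
# linear KL annealing plus a per-dimension free-bits floor. Illustrative only.
import torch

def vae_loss(recon_nll, mu, logvar, step, anneal_steps=10000, free_bits=0.5):
    # KL(q(z|x) || N(0, I)) per latent dimension, averaged over the batch.
    kl_per_dim = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).mean(dim=0)
    # Free bits: each dimension contributes at least `free_bits` nats, removing
    # the gradient incentive to collapse that dimension to the prior.
    kl = torch.clamp(kl_per_dim, min=free_bits).sum()
    beta = min(1.0, step / anneal_steps)           # linear KL annealing
    return recon_nll + beta * kl

loss = vae_loss(recon_nll=torch.tensor(80.0),
                mu=torch.randn(16, 32), logvar=torch.zeros(16, 32), step=2500)
```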
Deep Representation Learning Characterized by Inter-class Separation for Image Clustering
Title | Deep Representation Learning Characterized by Inter-class Separation for Image Clustering |
Authors | Dipanjan Das, Ratul Ghosh, Brojeshwar Bhowmick |
Abstract | Despite significant advances in clustering methods in recent years, the outcome of clustering on natural image datasets is still unsatisfactory due to two important drawbacks. Firstly, image clustering needs a good feature representation of each image; secondly, it needs a robust method that can discriminate these features so that images are assigned to different clusters with low intra-class variance and high inter-class variance. Often these two aspects are dealt with independently, and thus the features are not sufficient to partition the data meaningfully. In this paper, we propose a method that discovers the features required for separating the images using a deep autoencoder. Our method learns the image representation features automatically for the purpose of clustering, and it also selects a coherent image and an incoherent image for each given image, so that representation learning can produce more discriminative features that group similar images within a cluster while separating dissimilar images across clusters. Experimental results show that our method produces significantly better results than state-of-the-art methods, and we also show that it generalizes better across different datasets without using any pre-trained model, unlike other existing methods. |
Tasks | Image Clustering, Representation Learning |
Published | 2019-01-19 |
URL | http://arxiv.org/abs/1901.06474v1 |
http://arxiv.org/pdf/1901.06474v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-representation-learning-characterized-by |
Repo | |
Framework | |
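A minimal sketch of the described training signal: an autoencoder reconstruction term plus a triplet-style term that pulls a coherent image toward the anchor and pushes an incoherent one away. The architecture, the margin, and the way coherent/incoherent images are chosen here are illustrative assumptions, not the authors' code.

```python
# Autoencoder reconstruction + triplet-style separation, as a hedged sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                        nn.Linear(256, 32))
decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784))

def clustering_loss(anchor, coherent, incoherent, margin=1.0):
    za, zp, zn = encoder(anchor), encoder(coherent), encoder(incoherent)
    recon = F.mse_loss(decoder(za), anchor.flatten(1))        # representation term
    triplet = F.triplet_margin_loss(za, zp, zn, margin=margin)  # separation term
    return recon + triplet

x = torch.rand(8, 1, 28, 28)                       # anchor batch (MNIST-sized)
# Here "coherent" is a lightly perturbed copy and "incoherent" a random image.
loss = clustering_loss(x, x + 0.05 * torch.randn_like(x), torch.rand_like(x))
```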
On Using Retrained and Incremental Machine Learning for Modeling Performance of Adaptable Software: An Empirical Comparison
Title | On Using Retrained and Incremental Machine Learning for Modeling Performance of Adaptable Software: An Empirical Comparison |
Authors | Tao Chen |
Abstract | Given the ever-increasing complexity of adaptable software systems and their commonly hidden internal information (e.g., software running in the public cloud), machine-learning-based performance modeling has gained momentum for evaluating, understanding, and predicting software performance, which facilitates better-informed self-adaptation. As performance data accumulates during the run of the software, updating the performance models becomes necessary. To this end, there are two conventional modeling methods: retrained modeling, which always discards the old model and retrains a new one using all available data; and incremental modeling, which retains the existing model and tunes it with each newly arriving data sample. The literature on machine-learning-based performance modeling for adaptable software generally chooses one of these methods according to a general belief, but provides insufficient evidence or references to justify the choice. This paper is the first to report on a comprehensive empirical study that examines both modeling methods under distinct domains of adaptable software, 5 performance indicators, and 8 learning algorithms and settings, covering a total of 1,360 different conditions. Our findings challenge the general belief, which is shown to be only partially correct, and reveal some of the important, statistically significant factors that are often overlooked in existing work, providing evidence-based insights on the choice. |
Tasks | |
Published | 2019-03-25 |
URL | http://arxiv.org/abs/1903.10614v1 |
http://arxiv.org/pdf/1903.10614v1.pdf | |
PWC | https://paperswithcode.com/paper/on-using-retrained-and-incremental-machine |
Repo | |
Framework | |
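The two updating schemes compare directly in code. Below is a sketch using scikit-learn's SGDRegressor as a stand-in learner on a synthetic stream; the paper's learners, data, and performance indicators are not reproduced.

```python
# Retrained vs. incremental modeling on a synthetic data stream.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
stream = [(rng.random((1, 4)), rng.random(1)) for _ in range(100)]

# Retrained modeling: discard the old model, refit on all data seen so far.
seen_X, seen_y = [], []
for X, y in stream:
    seen_X.append(X); seen_y.append(y)
    retrained = SGDRegressor(max_iter=1000, tol=1e-3)
    retrained.fit(np.vstack(seen_X), np.concatenate(seen_y))

# Incremental modeling: keep the model, tune it with each new sample.
incremental = SGDRegressor()
for X, y in stream:
    incremental.partial_fit(X, y)
```

The trade-off the paper studies is visible even here: the retrained loop grows more expensive with every sample, while the incremental loop stays constant-cost but never revisits old data.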
Attentional Feature-Pair Relation Networks for Accurate Face Recognition
Title | Attentional Feature-Pair Relation Networks for Accurate Face Recognition |
Authors | Bong-Nam Kang, Yonghyun Kim, Bongjin Jun, Daijin Kim |
Abstract | Human face recognition is one of the most important research areas in biometrics. However, robust face recognition under drastic changes in facial pose, expression, and illumination remains a major challenge for practical application. Such variations make face recognition considerably more difficult. In this paper, we propose a novel face recognition method, called the Attentional Feature-pair Relation Network (AFRN), which represents the face by relevant pairs of local appearance block features together with their attention scores. The AFRN represents the face by all possible pairs of the 9x9 local appearance block features; the importance of each pair is captured by an attention map obtained from low-rank bilinear pooling, and each pair is weighted by its corresponding attention score. To increase accuracy, we select the top-K pairs of local appearance block features as relevant facial information and drop the remaining irrelevant ones. The weighted top-K pairs are propagated to extract the joint feature-pair relations using a bilinear attention network. In experiments, we show the effectiveness of the proposed AFRN and achieve outstanding performance in the 1:1 face verification and 1:N face identification tasks compared to existing state-of-the-art methods on the challenging LFW, YTF, CALFW, CPLFW, CFP, AgeDB, IJB-A, IJB-B, and IJB-C datasets. |
Tasks | Face Identification, Face Recognition, Face Verification, Robust Face Recognition |
Published | 2019-08-17 |
URL | https://arxiv.org/abs/1908.06255v1 |
https://arxiv.org/pdf/1908.06255v1.pdf | |
PWC | https://paperswithcode.com/paper/attentional-feature-pair-relation-networks |
Repo | |
Framework | |
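A minimal sketch of the feature-pair mechanism: score all pairs of local block features, keep the top-K by attention score, and aggregate the weighted pairs into a face descriptor. Shapes, the bilinear scorer, and the pooling are illustrative assumptions, not the authors' AFRN code.

```python
# Attention-scored pairs of local block features with top-K selection.
import torch
import torch.nn as nn

class PairRelationSketch(nn.Module):
    def __init__(self, n_blocks=81, dim=64, k=100):   # 9x9 = 81 local blocks
        super().__init__()
        self.k = k
        self.score = nn.Bilinear(dim, dim, 1)          # bilinear pair scorer
        self.relate = nn.Linear(2 * dim, dim)

    def forward(self, blocks):                         # blocks: (B, 81, D)
        B, N, D = blocks.shape
        i, j = torch.meshgrid(torch.arange(N), torch.arange(N), indexing="ij")
        a, b = blocks[:, i.reshape(-1)], blocks[:, j.reshape(-1)]  # (B, N*N, D)
        attn = self.score(a, b).squeeze(-1)            # attention score per pair
        top = attn.topk(self.k, dim=1).indices         # keep top-K relevant pairs
        pairs = torch.cat([a, b], dim=-1)              # (B, N*N, 2D)
        chosen = pairs.gather(1, top.unsqueeze(-1).expand(-1, -1, 2 * D))
        w = attn.gather(1, top).softmax(dim=1).unsqueeze(-1)
        return (w * self.relate(chosen)).sum(dim=1)    # weighted pair aggregation

feat = torch.randn(2, 81, 64)
desc = PairRelationSketch()(feat)                      # (2, 64) face descriptor
```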
Primate Face Identification in the Wild
Title | Primate Face Identification in the Wild |
Authors | Ankita Shukla, Gullal Singh Cheema, Saket Anand, Qamar Qureshi, Yadvendradev Jhala |
Abstract | Ecological imbalance owing to rapid urbanization and deforestation has adversely affected the populations of several wild animals. This loss of habitat has skewed the populations of several non-human primate species like chimpanzees and macaques and has constrained them to co-exist in close proximity to human settlements, often leading to human-wildlife conflicts while competing for resources. For effective wildlife conservation and conflict management, regular monitoring of populations and of conflict-prone regions is necessary. However, existing approaches, such as field visits for data collection and manual analysis by experts, are resource intensive, tedious, and time consuming, necessitating an automated, non-invasive, more efficient alternative like image-based facial recognition. The challenge in individual identification arises from nuisance factors like pose, lighting variation, and occlusion in uncontrolled environments, further exacerbated by limited training data. Inspired by human perception, we propose to learn representations that are robust to such nuisance factors and capture the notion of similarity over the individual identity sub-manifolds. The proposed approach, Primate Face Identification (PFID), achieves this by training the network to distinguish between positive and negative pairs of images. The PFID loss augments the standard cross entropy loss with a pairwise loss to learn more discriminative and generalizable features, making it appropriate for related identification tasks such as open-set identification, closed-set identification, and verification. We report state-of-the-art accuracy on facial recognition of two primate species, rhesus macaques and chimpanzees, under the four protocols of classification, verification, closed-set identification, and open-set recognition. |
Tasks | Face Identification, Open Set Learning |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.02642v1 |
https://arxiv.org/pdf/1907.02642v1.pdf | |
PWC | https://paperswithcode.com/paper/primate-face-identification-in-the-wild |
Repo | |
Framework | |
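The loss the abstract describes, standard cross entropy augmented with a pairwise term over positive/negative image pairs, admits a short sketch. The contrastive form of the pairwise term, the margin, and the weighting below are illustrative assumptions.

```python
# Cross entropy + contrastive pairwise loss on embedding pairs, PFID-style.
import torch
import torch.nn.functional as F

def pfid_style_loss(logits, labels, emb_a, emb_b, same_id,
                    pair_weight=0.5, margin=1.0):
    ce = F.cross_entropy(logits, labels)
    d = F.pairwise_distance(emb_a, emb_b)
    # Pull same-identity pairs together, push different ones past the margin.
    pair = torch.where(same_id.bool(), d.pow(2),
                       F.relu(margin - d).pow(2)).mean()
    return ce + pair_weight * pair

logits = torch.randn(8, 20)                        # 20 individuals
labels = torch.randint(0, 20, (8,))
ea, eb = torch.randn(8, 128), torch.randn(8, 128)  # embedding pairs
loss = pfid_style_loss(logits, labels, ea, eb,
                       same_id=torch.randint(0, 2, (8,)))
```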
Mirroring to Build Trust in Digital Assistants
Title | Mirroring to Build Trust in Digital Assistants |
Authors | Katherine Metcalf, Barry-John Theobald, Garrett Weinberg, Robert Lee, Ing-Marie Jonsson, Russ Webb, Nicholas Apostoloff |
Abstract | We describe experiments towards building a conversational digital assistant that considers the preferred conversational style of the user. In particular, these experiments are designed to measure whether users prefer and trust an assistant whose conversational style matches their own. To this end we conducted a user study where subjects interacted with a digital assistant that responded in a way that either matched their conversational style, or did not. Using self-reported personality attributes and subjects’ feedback on the interactions, we built models that can reliably predict a user’s preferred conversational style. |
Tasks | |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01664v1 |
http://arxiv.org/pdf/1904.01664v1.pdf | |
PWC | https://paperswithcode.com/paper/mirroring-to-build-trust-in-digital |
Repo | |
Framework | |
TAN: Temporal Affine Network for Real-Time Left Ventricle Anatomical Structure Analysis Based on 2D Ultrasound Videos
Title | TAN: Temporal Affine Network for Real-Time Left Ventricle Anatomical Structure Analysis Based on 2D Ultrasound Videos |
Authors | Sihong Chen, Kai Ma, Yefeng Zheng |
Abstract | With its low cost, portability, and freedom from radiation, echocardiography is a widely used imaging modality for left ventricle (LV) function quantification. However, automatic LV segmentation and motion tracking are still challenging tasks. In addition to fuzzy border definition, low contrast, and abundant artifacts in typical ultrasound images, the shape and size of the LV change significantly over a cardiac cycle. In this work, we propose a temporal affine network (TAN) to perform image analysis in a warped image space, where the shape and size variations due to cardiac motion as well as other artifacts are largely compensated. Furthermore, we perform three frequent echocardiogram interpretation tasks simultaneously: standard cardiac plane recognition, LV landmark detection, and LV segmentation. Instead of using three networks, one dedicated to each task, we use a single multi-task network to perform all three simultaneously. Since the three tasks share the same encoder, the compact network improves segmentation accuracy through additional supervision. The network is further fine-tuned with optical-flow-adjusted annotations to enhance motion coherence in the segmentation result. Experiments on 1,714 2D echocardiographic sequences demonstrate that the proposed method achieves state-of-the-art segmentation accuracy with real-time efficiency. |
Tasks | Optical Flow Estimation |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.00631v1 |
http://arxiv.org/pdf/1904.00631v1.pdf | |
PWC | https://paperswithcode.com/paper/tan-temporal-affine-network-for-real-time |
Repo | |
Framework | |
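A minimal sketch of the shared-encoder, three-head multi-task layout the abstract describes (plane recognition, landmark heatmaps, LV segmentation). Layer sizes and head designs are illustrative assumptions; the temporal affine warping itself is not sketched here.

```python
# Shared encoder with three task heads, as the abstract's multi-task layout.
import torch
import torch.nn as nn

class MultiTaskEchoNet(nn.Module):
    def __init__(self, n_planes=3, n_landmarks=5):
        super().__init__()
        self.encoder = nn.Sequential(                  # shared by all three tasks
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.plane_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(32, n_planes))
        self.landmark_head = nn.Conv2d(32, n_landmarks, 1)  # heatmap per landmark
        self.seg_head = nn.Conv2d(32, 1, 1)                 # LV mask logits

    def forward(self, x):
        h = self.encoder(x)
        return self.plane_head(h), self.landmark_head(h), self.seg_head(h)

frame = torch.randn(1, 1, 128, 128)                # one ultrasound frame
plane, landmarks, mask = MultiTaskEchoNet()(frame)
```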
Fairness in representation: quantifying stereotyping as a representational harm
Title | Fairness in representation: quantifying stereotyping as a representational harm |
Authors | Mohsen Abbasi, Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian |
Abstract | While harms of allocation have been increasingly studied as part of the subfield of algorithmic fairness, harms of representation have received considerably less attention. In this paper, we formalize two notions of stereotyping and show how they manifest in later allocative harms within the machine learning pipeline. We also propose mitigation strategies and demonstrate their effectiveness on synthetic datasets. |
Tasks | |
Published | 2019-01-28 |
URL | http://arxiv.org/abs/1901.09565v1 |
http://arxiv.org/pdf/1901.09565v1.pdf | |
PWC | https://paperswithcode.com/paper/fairness-in-representation-quantifying |
Repo | |
Framework | |
Efficient Pipeline for Camera Trap Image Review
Title | Efficient Pipeline for Camera Trap Image Review |
Authors | Sara Beery, Dan Morris, Siyu Yang |
Abstract | Biologists all over the world use camera traps to monitor biodiversity and wildlife population density. The computer vision community has been making strides towards automating the species classification challenge in camera traps, but it has proven difficult to apply models trained in one region to images collected in different geographic areas. In some cases, accuracy falls off catastrophically in a new region, due to both changes in background and the presence of previously unseen species. We propose a pipeline that takes advantage of a pre-trained general animal detector and a smaller set of labeled images to train a classification model that can efficiently achieve accurate results in a new region. |
Tasks | |
Published | 2019-07-15 |
URL | https://arxiv.org/abs/1907.06772v1 |
https://arxiv.org/pdf/1907.06772v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-pipeline-for-camera-trap-image |
Repo | |
Framework | |
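The pipeline reduces to a short sketch: a pre-trained general animal detector proposes boxes, and a small region-tuned classifier labels the species in each confident crop. The `detector` and `classifier` callables below are hypothetical stand-ins, not the authors' models.

```python
# Two-stage camera-trap pipeline: generic detector, region-tuned classifier.
def classify_camera_trap_image(image, detector, classifier, det_threshold=0.8):
    """Return (box, species) for each confident animal detection."""
    results = []
    for box, score in detector(image):             # generic animal detector
        if score < det_threshold:
            continue                               # skip empty/uncertain regions
        x0, y0, x1, y1 = box
        crop = image[y0:y1, x0:x1]                 # assumes a NumPy-style image
        results.append((box, classifier(crop)))    # small species classifier
    return results
```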
An Empirical Evaluation of Multi-task Learning in Deep Neural Networks for Natural Language Processing
Title | An Empirical Evaluation of Multi-task Learning in Deep Neural Networks for Natural Language Processing |
Authors | Jianquan Li, Xiaokang Liu, Wenpeng Yin, Min Yang, Liqun Ma |
Abstract | Multi-Task Learning (MTL) aims to boost the overall performance of each individual task by leveraging useful information contained in multiple related tasks. It has shown great success in natural language processing (NLP). Currently, a number of MTL architectures and learning mechanisms have been proposed for various NLP tasks, but there has been no systematic, in-depth exploration and comparison of their strengths. In this paper, we conduct a thorough examination of typical MTL methods on a broad range of representative NLP tasks. Our primary goal is to understand the merits and demerits of existing MTL methods in NLP tasks, thus devising new hybrid architectures intended to combine their strengths. |
Tasks | Multi-Task Learning |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.07820v1 |
https://arxiv.org/pdf/1908.07820v1.pdf | |
PWC | https://paperswithcode.com/paper/190807820 |
Repo | |
Framework | |
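As a reference point for the kind of architecture such a study compares, below is a minimal sketch of the most common MTL pattern, hard parameter sharing: one shared sentence encoder with one output head per task. Dimensions and tasks are illustrative assumptions, not the paper's setup.

```python
# Hard parameter sharing: shared encoder, per-task output heads.
import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    def __init__(self, vocab=10000, dim=128, task_classes=(2, 5)):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)   # shared layers
        self.heads = nn.ModuleList(nn.Linear(dim, c) for c in task_classes)

    def forward(self, tokens, task_id):
        h, _ = self.encoder(self.embed(tokens))
        return self.heads[task_id](h[:, -1])       # task head on last hidden state

model = HardSharingMTL()
logits = model(torch.randint(0, 10000, (4, 12)), task_id=0)  # batch for task 0
```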
SuperNCN: Neighbourhood consensus network for robust outdoor scenes matching
Title | SuperNCN: Neighbourhood consensus network for robust outdoor scenes matching |
Authors | Grzegorz Kurzejamski, Jacek Komorowski, Lukasz Dabala, Konrad Czarnota, Simon Lynen, Tomasz Trzcinski |
Abstract | In this paper, we present a framework for computing dense keypoint correspondences between images under strong scene appearance changes. Traditional methods, based on nearest neighbour search in the feature descriptor space, perform poorly when environmental conditions vary, e.g. when images are taken at different times of day or in different seasons. Our method improves keypoint matching in such difficult conditions. First, we use Neighbourhood Consensus Networks to build a spatially consistent matching grid between two images at a coarse scale. Then, we apply a Superpoint-like corner detector to achieve pixel-level accuracy. Both parts use features learned with domain adaptation to increase robustness against strong scene appearance variations. The framework has been tested on the RobotCar Seasons dataset, showing a large improvement on the pose estimation task under challenging environmental conditions. |
Tasks | Domain Adaptation, Pose Estimation |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.04627v1 |
https://arxiv.org/pdf/1912.04627v1.pdf | |
PWC | https://paperswithcode.com/paper/superncn-neighbourhood-consensus-network-for |
Repo | |
Framework | |
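The coarse-to-fine flow fits in a short sketch: neighbourhood-consensus matches on a coarse grid, each refined by snapping to the nearest corner found by a Superpoint-like detector. Both stage functions are hypothetical placeholders; the learned, domain-adapted features are not reproduced.

```python
# Coarse NCN-style grid matches refined to pixel-level corner locations.
def match_images(img_a, img_b, coarse_matcher, corner_detector, radius=8):
    """Coarse grid correspondences refined to nearby keypoint locations."""
    matches = []
    for (xa, ya), (xb, yb) in coarse_matcher(img_a, img_b):   # NCN-style grid
        corners_b = corner_detector(img_b, center=(xb, yb), radius=radius)
        if corners_b:                              # snap to the closest corner
            best = min(corners_b,
                       key=lambda c: (c[0] - xb) ** 2 + (c[1] - yb) ** 2)
            matches.append(((xa, ya), best))
    return matches
```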
Anomalous Communications Detection in IoT Networks Using Sparse Autoencoders
Title | Anomalous Communications Detection in IoT Networks Using Sparse Autoencoders |
Authors | Mustafizur Rahman Shahid, Gregory Blanc, Zonghua Zhang, Hervé Debar |
Abstract | Nowadays, IoT devices are widely deployed to enable various smart services, such as smart homes or e-healthcare. However, security remains one of the paramount concerns, as many IoT devices are vulnerable, and IoT malware is constantly evolving and getting more sophisticated. IoT devices are intended to perform very specific tasks, so their networking behavior is expected to be reasonably stable and predictable; any significant behavioral deviation from the normal patterns indicates an anomalous event. In this paper, we present a method to detect anomalous network communications in IoT networks using a set of sparse autoencoders. The proposed approach differentiates malicious communications from legitimate ones, so that if a device is compromised, only the malicious communications are dropped while the service provided by the device is not totally interrupted. To characterize network behavior, bidirectional TCP flows are extracted and described using statistics on the size of the first N packets sent and received, along with statistics on the corresponding inter-arrival times between packets. A set of sparse autoencoders is then trained to learn the profile of the legitimate communications generated by an experimental smart home network. Depending on the value of N, the developed model achieves attack detection rates ranging from 86.9% to 91.2%, and false positive rates ranging from 0.1% to 0.5%. |
Tasks | |
Published | 2019-12-26 |
URL | https://arxiv.org/abs/1912.11831v1 |
https://arxiv.org/pdf/1912.11831v1.pdf | |
PWC | https://paperswithcode.com/paper/anomalous-communications-detection-in-iot |
Repo | |
Framework | |
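A minimal sketch of the detection scheme: train an autoencoder with an L1 sparsity penalty on legitimate flow features only, then flag flows whose reconstruction error exceeds a threshold set from the normal-error distribution. Feature layout, sizes, and the threshold rule are illustrative assumptions.

```python
# Sparse-autoencoder anomaly detection on flow features (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

n_features = 20                                    # e.g. sizes and inter-arrival
                                                   # times of the first N packets
encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU())
decoder = nn.Linear(16, n_features)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-3)

legit = torch.rand(512, n_features)                # legitimate flows only
for _ in range(200):                               # learn the "normal" profile
    opt.zero_grad()
    code = encoder(legit)
    loss = F.mse_loss(decoder(code), legit) + 1e-3 * code.abs().mean()  # L1 sparsity
    loss.backward()
    opt.step()

# Set the threshold from the training-error distribution, e.g. a high quantile.
errs = ((decoder(encoder(legit)) - legit) ** 2).mean(dim=1)
threshold = errs.quantile(0.99).item()

def is_anomalous(flow):                            # flow: (n_features,) tensor
    err = F.mse_loss(decoder(encoder(flow)), flow)
    return err.item() > threshold                  # poor reconstruction => anomaly
```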
Privacy-preserving Active Learning on Sensitive Data for User Intent Classification
Title | Privacy-preserving Active Learning on Sensitive Data for User Intent Classification |
Authors | Oluwaseyi Feyisetan, Thomas Drake, Borja Balle, Tom Diethe |
Abstract | Active learning holds the promise of significantly reducing data annotation costs while maintaining reasonable model performance. However, it requires sending data to annotators for labeling, which presents a possible privacy leak when the training set includes sensitive user data. In this paper, we describe an approach for carrying out privacy-preserving active learning with quantifiable guarantees. We evaluate our approach by showing the tradeoff between privacy, utility, and annotation budget on a binary classification task in an active learning setting. |
Tasks | Active Learning, Intent Classification |
Published | 2019-03-26 |
URL | http://arxiv.org/abs/1903.11112v1 |
http://arxiv.org/pdf/1903.11112v1.pdf | |
PWC | https://paperswithcode.com/paper/privacy-preserving-active-learning-on |
Repo | |
Framework | |
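A sketch of the overall loop: select the most uncertain pool examples, perturb them before they reach annotators, and retrain under a fixed budget. The `privatize` step below uses generic Laplace noise as a stand-in; the paper's mechanism and its quantifiable guarantees are not reproduced here.

```python
# Privacy-aware active learning loop (illustrative; privatize is a stand-in).
import numpy as np
from sklearn.linear_model import LogisticRegression

def privatize(x, epsilon=1.0):
    """Generic Laplace perturbation as an illustrative privacy stand-in."""
    return x + np.random.laplace(scale=1.0 / epsilon, size=x.shape)

rng = np.random.default_rng(0)
X_pool = rng.random((500, 10))                     # unlabeled sensitive pool
X_lab = rng.random((20, 10))                       # small labeled seed set
y_lab = np.tile([0, 1], 10)

model = LogisticRegression().fit(X_lab, y_lab)
for _ in range(5):                                 # annotation budget: 5 rounds
    probs = model.predict_proba(X_pool)[:, 1]
    pick = np.argsort(np.abs(probs - 0.5))[:10]    # most uncertain examples
    X_sent = privatize(X_pool[pick])               # what the annotator would see
    y_new = (X_sent.sum(axis=1) > 5).astype(int)   # stand-in for human labels
    X_lab = np.vstack([X_lab, X_sent])
    y_lab = np.concatenate([y_lab, y_new])
    X_pool = np.delete(X_pool, pick, axis=0)
    model = LogisticRegression().fit(X_lab, y_lab)
```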