Paper Group ANR 304
Automatic Recognition of Mammal Genera on Camera-Trap Images using Multi-Layer Robust Principal Component Analysis and Mixture Neural Networks
Title | Automatic Recognition of Mammal Genera on Camera-Trap Images using Multi-Layer Robust Principal Component Analysis and Mixture Neural Networks |
Authors | Jhony-Heriberto Giraldo-Zuluaga, Augusto Salazar, Alexander Gomez, Angélica Diaz-Pulido |
Abstract | The segmentation and classification of animals from camera-trap images is, due to the conditions under which the images are taken, a difficult task. This work presents a method for classifying and segmenting mammal genera from camera-trap images. Our method uses Multi-Layer Robust Principal Component Analysis (RPCA) for segmenting, Convolutional Neural Networks (CNNs) for extracting features, Least Absolute Shrinkage and Selection Operator (LASSO) for selecting features, and Artificial Neural Networks (ANNs) or Support Vector Machines (SVM) for classifying mammal genera present in the Colombian forest. We evaluated our method with the camera-trap images from the Alexander von Humboldt Biological Resources Research Institute. We obtained an accuracy of 92.65% classifying 8 mammal genera and a False Positive (FP) class, using automatically segmented images. On the other hand, we reached an accuracy of 90.32% classifying 10 mammal genera, using ground-truth images only. Unlike almost all previous works, we confront both animal segmentation and genera classification in camera-trap recognition. This method shows a new approach toward fully automatic detection of animals from camera-trap images. |
Tasks | |
Published | 2017-05-08 |
URL | http://arxiv.org/abs/1705.02727v1 |
http://arxiv.org/pdf/1705.02727v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-recognition-of-mammal-genera-on |
Repo | |
Framework | |
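As a rough illustration of the feature-selection-plus-classifier stage described in the abstract above, the sketch below runs LASSO over hypothetical CNN descriptors and feeds the surviving dimensions to an SVM. The feature matrix, labels, and hyperparameters are placeholders, not the authors' setup.

```python
# Sketch of the CNN features -> LASSO selection -> SVM stage. `features`
# stands in for CNN activations extracted from segmented camera-trap crops;
# `labels` stands in for genus ids (8 genera + a false-positive class).
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 2048))   # hypothetical CNN descriptors
labels = rng.integers(0, 9, size=500)     # hypothetical genus labels

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)

# LASSO as a crude feature selector: keep dimensions with non-zero coefficients.
lasso = Lasso(alpha=0.01).fit(X_tr, y_tr)
keep = np.flatnonzero(lasso.coef_)
if keep.size == 0:                        # fall back to all features if nothing survives
    keep = np.arange(X_tr.shape[1])

clf = SVC(kernel="rbf").fit(X_tr[:, keep], y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te[:, keep])))
```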
Graphcut Texture Synthesis for Single-Image Superresolution
Title | Graphcut Texture Synthesis for Single-Image Superresolution |
Authors | Douglas Summers-Stay |
Abstract | Texture synthesis has proven successful at imitating a wide variety of textures. Adding additional constraints (in the form of a low-resolution version of the texture to be synthesized) makes it possible to use texture synthesis methods for texture superresolution. |
Tasks | Texture Synthesis |
Published | 2017-06-21 |
URL | http://arxiv.org/abs/1706.06942v1 |
http://arxiv.org/pdf/1706.06942v1.pdf | |
PWC | https://paperswithcode.com/paper/graphcut-texture-synthesis-for-single-image |
Repo | |
Framework | |
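The entry above adds a low-resolution constraint to example-based texture synthesis. The toy sketch below is one hedged reading of that constraint: candidate high-resolution patches are scored by how well they downsample to the low-res target plus how well they agree with already-placed neighbors; the paper's graphcut seam optimization is not reproduced.

```python
# Toy illustration of the low-resolution data term for texture-based
# superresolution. `downsample` is a simple box filter; the smoothness term
# compares a 2-pixel overlap strip. All sizes and the candidate pool are
# illustrative placeholders.
import numpy as np

def downsample(patch, factor=2):
    h, w = patch.shape
    return patch.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def patch_cost(candidate, lowres_target, existing_overlap, lam=1.0):
    data = np.sum((downsample(candidate) - lowres_target) ** 2)    # low-res fidelity
    smooth = np.sum((existing_overlap - candidate[:, :2]) ** 2)    # neighbor agreement
    return data + lam * smooth

rng = np.random.default_rng(1)
lowres_target = rng.random((4, 4))              # low-res constraint at this location
existing_overlap = rng.random((8, 2))           # already-synthesized overlap strip
candidates = [rng.random((8, 8)) for _ in range(20)]
best = min(candidates, key=lambda c: patch_cost(c, lowres_target, existing_overlap))
```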
DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs
Title | DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs |
Authors | K. Ram Prabhakar, V. Sai Srikar, R. Venkatesh Babu |
Abstract | We present a novel deep learning architecture for fusing static multi-exposure images. Current multi-exposure fusion (MEF) approaches use hand-crafted features to fuse the input sequence. However, the weak hand-crafted representations are not robust to varying input conditions. Moreover, they perform poorly for extreme exposure image pairs. Thus, it is highly desirable to have a method that is robust to varying input conditions and capable of handling extreme exposure without artifacts. Deep representations are known to be robust to input conditions and have shown phenomenal performance in a supervised setting. However, the stumbling block in using deep learning for MEF has been the lack of sufficient training data and an oracle to provide the ground truth for supervision. To address these issues, we have gathered a large dataset of multi-exposure image stacks for training, and to circumvent the need for ground-truth images, we propose an unsupervised deep learning framework for MEF utilizing a no-reference quality metric as the loss function. The proposed approach uses a novel CNN architecture trained to learn the fusion operation without a reference ground-truth image. The model fuses a set of common low-level features extracted from each image to generate artifact-free, perceptually pleasing results. We perform extensive quantitative and qualitative evaluation and show that the proposed technique outperforms existing state-of-the-art approaches for a variety of natural images. |
Tasks | |
Published | 2017-12-20 |
URL | http://arxiv.org/abs/1712.07384v1 |
http://arxiv.org/pdf/1712.07384v1.pdf | |
PWC | https://paperswithcode.com/paper/deepfuse-a-deep-unsupervised-approach-for |
Repo | |
Framework | |
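A minimal PyTorch sketch of the unsupervised setup described above: a small CNN fuses two exposures and is trained with a no-reference loss. The variance-weighted loss below is only a stand-in for the MEF-SSIM metric used in the paper, and the network is far smaller than the authors'.

```python
# Toy unsupervised exposure-fusion setup in the spirit of DeepFuse.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFuseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 16, 5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 1, 5, padding=2), nn.Sigmoid(),
        )

    def forward(self, under, over):
        return self.net(torch.cat([under, over], dim=1))

def no_reference_loss(fused, under, over, ksize=7):
    """Weight each input by its local variance (a crude well-exposedness proxy)
    and pull the fused image toward the weighted combination. This is only a
    placeholder for the MEF-SSIM no-reference metric in the paper."""
    def local_var(x):
        mu = F.avg_pool2d(x, ksize, stride=1, padding=ksize // 2)
        return F.avg_pool2d(x * x, ksize, stride=1, padding=ksize // 2) - mu * mu
    w_u, w_o = local_var(under) + 1e-6, local_var(over) + 1e-6
    target = (w_u * under + w_o * over) / (w_u + w_o)
    return F.mse_loss(fused, target)

# One illustrative optimization step on random luminance patches.
net = TinyFuseNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
under, over = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
opt.zero_grad()
loss = no_reference_loss(net(under, over), under, over)
loss.backward()
opt.step()
```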
Deep Projective 3D Semantic Segmentation
Title | Deep Projective 3D Semantic Segmentation |
Authors | Felix Järemo Lawin, Martin Danelljan, Patrik Tosteberg, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg |
Abstract | Semantic segmentation of 3D point clouds is a challenging problem with numerous real-world applications. While deep learning has revolutionized the field of image semantic segmentation, its impact on point cloud data has been limited so far. Recent attempts, based on 3D deep learning approaches (3D-CNNs), have achieved below-expected results. Such methods require voxelizations of the underlying point cloud data, leading to decreased spatial resolution and increased memory consumption. Additionally, 3D-CNNs greatly suffer from the limited availability of annotated datasets. In this paper, we propose an alternative framework that avoids the limitations of 3D-CNNs. Instead of directly solving the problem in 3D, we first project the point cloud onto a set of synthetic 2D-images. These images are then used as input to a 2D-CNN, designed for semantic segmentation. Finally, the obtained prediction scores are re-projected to the point cloud to obtain the segmentation results. We further investigate the impact of multiple modalities, such as color, depth and surface normals, in a multi-stream network architecture. Experiments are performed on the recent Semantic3D dataset. Our approach sets a new state-of-the-art by achieving a relative gain of 7.9 %, compared to the previous best approach. |
Tasks | Semantic Segmentation |
Published | 2017-05-09 |
URL | http://arxiv.org/abs/1705.03428v1 |
http://arxiv.org/pdf/1705.03428v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-projective-3d-semantic-segmentation |
Repo | |
Framework | |
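The sketch below illustrates the project / predict / re-project loop from the abstract above: points are projected into a synthetic pinhole image, per-pixel class scores (mocked here with random values) are looked up, and each point inherits the score of its pixel. Camera parameters and the stand-in "CNN output" are placeholders, not the paper's rendering setup.

```python
# Project a point cloud to a synthetic image, score pixels, re-project scores.
import numpy as np

def project(points, f=300.0, cx=128.0, cy=128.0):
    """Pinhole projection of Nx3 points (camera at origin, looking down +z)."""
    z = np.clip(points[:, 2], 1e-3, None)
    u = (f * points[:, 0] / z + cx).astype(int)
    v = (f * points[:, 1] / z + cy).astype(int)
    return u, v

def segment_point_cloud(points, num_classes=3, size=256):
    u, v = project(points)
    valid = (u >= 0) & (u < size) & (v >= 0) & (v < size)
    # Stand-in for a 2D semantic segmentation CNN run on the rendered image.
    pixel_scores = np.random.rand(size, size, num_classes)
    point_scores = np.zeros((points.shape[0], num_classes))
    point_scores[valid] = pixel_scores[v[valid], u[valid]]
    return point_scores.argmax(axis=1)

points = np.random.rand(1000, 3) + np.array([0.0, 0.0, 2.0])
labels = segment_point_cloud(points)
```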
SEGCloud: Semantic Segmentation of 3D Point Clouds
Title | SEGCloud: Semantic Segmentation of 3D Point Clouds |
Authors | Lyne P. Tchapmi, Christopher B. Choy, Iro Armeni, JunYoung Gwak, Silvio Savarese |
Abstract | 3D semantic scene labeling is fundamental to agents operating in the real world. In particular, labeling raw 3D point sets from sensors provides fine-grained semantics. Recent works leverage the capabilities of Neural Networks (NNs), but are limited to coarse voxel predictions and do not explicitly enforce global consistency. We present SEGCloud, an end-to-end framework to obtain 3D point-level segmentation that combines the advantages of NNs, trilinear interpolation (TI) and fully connected Conditional Random Fields (FC-CRF). Coarse voxel predictions from a 3D Fully Convolutional NN are transferred back to the raw 3D points via trilinear interpolation. Then the FC-CRF enforces global consistency and provides fine-grained semantics on the points. We implement the latter as a differentiable Recurrent NN to allow joint optimization. We evaluate the framework on two indoor and two outdoor 3D datasets (NYU V2, S3DIS, KITTI, Semantic3D.net), and show performance comparable or superior to the state-of-the-art on all datasets. |
Tasks | Semantic Segmentation |
Published | 2017-10-20 |
URL | http://arxiv.org/abs/1710.07563v1 |
http://arxiv.org/pdf/1710.07563v1.pdf | |
PWC | https://paperswithcode.com/paper/segcloud-semantic-segmentation-of-3d-point |
Repo | |
Framework | |
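A small sketch of the trilinear-interpolation transfer step described above: coarse per-voxel class scores are interpolated to each raw point from the eight surrounding voxel centers. Grid size, voxel size, and class count are illustrative, and the FC-CRF refinement is not included.

```python
# Transfer coarse voxel scores to raw points by trilinear interpolation.
import numpy as np

def trilinear_transfer(points, voxel_scores, voxel_size=0.5):
    """points: Nx3, voxel_scores: (X, Y, Z, C) class scores at voxel centers."""
    coords = points / voxel_size - 0.5              # continuous voxel coordinates
    base = np.floor(coords).astype(int)
    frac = coords - base
    X, Y, Z, C = voxel_scores.shape
    out = np.zeros((points.shape[0], C))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                idx = np.clip(base + [dx, dy, dz], 0, [X - 1, Y - 1, Z - 1])
                w = (np.where(dx, frac[:, 0], 1 - frac[:, 0])
                     * np.where(dy, frac[:, 1], 1 - frac[:, 1])
                     * np.where(dz, frac[:, 2], 1 - frac[:, 2]))
                out += w[:, None] * voxel_scores[idx[:, 0], idx[:, 1], idx[:, 2]]
    return out

points = np.random.rand(100, 3) * 4.0
voxel_scores = np.random.rand(8, 8, 8, 5)           # 5 classes on an 8^3 grid
point_scores = trilinear_transfer(points, voxel_scores)
```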
Scientific Information Extraction with Semi-supervised Neural Tagging
Title | Scientific Information Extraction with Semi-supervised Neural Tagging |
Authors | Yi Luan, Mari Ostendorf, Hannaneh Hajishirzi |
Abstract | This paper addresses the problem of extracting keyphrases from scientific articles and categorizing them as corresponding to a task, process, or material. We cast the problem as sequence tagging and introduce semi-supervised methods to a neural tagging model, which builds on recent advances in named entity recognition. Since annotated training data is scarce in this domain, we introduce a graph-based semi-supervised algorithm together with a data selection scheme to leverage unannotated articles. Both inductive and transductive semi-supervised learning strategies outperform state-of-the-art information extraction performance on the 2017 SemEval Task 10 ScienceIE task. |
Tasks | Named Entity Recognition |
Published | 2017-08-21 |
URL | http://arxiv.org/abs/1708.06075v1 |
http://arxiv.org/pdf/1708.06075v1.pdf | |
PWC | https://paperswithcode.com/paper/scientific-information-extraction-with-semi |
Repo | |
Framework | |
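As a hedged stand-in for the graph-based semi-supervised step described above, the sketch below propagates a handful of gold keyphrase-type labels (task / process / material) over a similarity graph of candidate embeddings using scikit-learn's LabelSpreading. The embeddings are random placeholders, and the paper's own graph algorithm, data selection scheme, and neural tagger are not reproduced.

```python
# Graph-based label propagation over candidate-phrase embeddings.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(300, 50))      # placeholder candidate-phrase embeddings
labels = np.full(300, -1)                    # -1 marks unannotated examples
labels[:40] = rng.integers(0, 3, size=40)    # few gold labels: task / process / material

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(embeddings, labels)
pseudo_labels = model.transduction_          # propagated labels for all candidates
```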
3D Camouflaging Object using RGB-D Sensors
Title | 3D Camouflaging Object using RGB-D Sensors |
Authors | Ahmed M. Siddek, Mohsen A. Rashwan, Islam A. Eshrah |
Abstract | This paper proposes a new optical camouflage system that uses RGB-D cameras for acquiring the point cloud of the background scene and for tracking the observer's eyes. This system enables a user to conceal an object located behind a display that is surrounded by 3D objects. If the tracked point of the observer's eyes is considered a light source, the system works on estimating the shadow shape of the display device that falls on the objects in the background. The system uses the 3D position of the observer's eyes and the locations of the display corners to predict their shadow points, which have nearest neighbors in the constructed point cloud of the background scene. |
Tasks | |
Published | 2017-09-24 |
URL | http://arxiv.org/abs/1709.08271v1 |
http://arxiv.org/pdf/1709.08271v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-camouflaging-object-using-rgb-d-sensors |
Repo | |
Framework | |
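A geometric sketch of the shadow-point prediction described above: rays from the tracked eye position through the display corners are sampled behind the display, and the background-cloud point nearest to each ray is taken as the corresponding shadow point. The eye position, corner coordinates, and cloud are synthetic placeholders.

```python
# Treat the eye as a light source and find shadow points in the background cloud.
import numpy as np

def shadow_points(eye, display_corners, cloud, t_samples=np.linspace(1.0, 3.0, 50)):
    """eye: (3,), display_corners: 4x3, cloud: Nx3 background points."""
    results = []
    for corner in display_corners:
        ray = corner - eye
        samples = eye + t_samples[:, None] * ray          # positions behind the display
        # Nearest cloud point to any sample along the ray becomes the shadow point.
        d = np.linalg.norm(cloud[None, :, :] - samples[:, None, :], axis=2)
        results.append(cloud[d.min(axis=0).argmin()])
    return np.array(results)

eye = np.array([0.0, 0.0, 0.0])
corners = np.array([[0.3, 0.2, 1.0], [-0.3, 0.2, 1.0],
                    [-0.3, -0.2, 1.0], [0.3, -0.2, 1.0]])
cloud = np.random.rand(2000, 3) * np.array([2.0, 2.0, 1.0]) + np.array([-1.0, -1.0, 2.0])
print(shadow_points(eye, corners, cloud))
```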
Differential Performance Debugging with Discriminant Regression Trees
Title | Differential Performance Debugging with Discriminant Regression Trees |
Authors | Saeid Tizpaz-Niari, Pavol Cerny, Bor-Yuh Evan Chang, Ashutosh Trivedi |
Abstract | Differential performance debugging is a technique to find performance problems. It applies in situations where the performance of a program is (unexpectedly) different for different classes of inputs. The task is to explain the differences in asymptotic performance among various input classes in terms of program internals. We propose a data-driven technique based on the discriminant regression tree (DRT) learning problem, where the goal is to discriminate among different classes of inputs. We propose a new algorithm for DRT learning that first clusters the data into functional clusters, capturing different asymptotic performance classes, and then invokes off-the-shelf decision tree learning algorithms to explain these clusters. We focus on linear functional clusters and adapt classical clustering algorithms (K-means and spectral) to produce them. For the K-means algorithm, we generalize the notion of the cluster centroid from a point to a linear function. We adapt spectral clustering by defining a novel kernel function to capture the notion of linear similarity between two data points. We evaluate our approach on benchmarks consisting of Java programs where we are interested in debugging performance. We show that our algorithm significantly outperforms other well-known regression tree learning algorithms in terms of running time and accuracy of classification. |
Tasks | |
Published | 2017-11-11 |
URL | http://arxiv.org/abs/1711.04076v2 |
http://arxiv.org/pdf/1711.04076v2.pdf | |
PWC | https://paperswithcode.com/paper/differential-performance-debugging-with |
Repo | |
Framework | |
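The sketch below illustrates the generalized K-means step from the abstract above, where each cluster centroid is a linear function of input size: a line is fit per cluster and points are reassigned to the line with the smallest residual. A decision-tree learner over program features would then explain the resulting clusters. Data and the number of clusters are synthetic.

```python
# K-means with linear-function centroids over (input size, running time) pairs.
import numpy as np

def linear_kmeans(x, y, k=2, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, k, size=len(x))
    coeffs = np.zeros((k, 2))
    for _ in range(iters):
        for j in range(k):
            mask = assign == j
            if mask.sum() >= 2:
                coeffs[j] = np.polyfit(x[mask], y[mask], deg=1)    # slope, intercept
        residuals = np.abs(y[:, None] - (np.outer(x, coeffs[:, 0]) + coeffs[:, 1]))
        assign = residuals.argmin(axis=1)                          # nearest linear centroid
    return assign, coeffs

# Two synthetic performance classes with different linear behavior.
x = np.random.rand(200) * 100
y = np.where(np.random.rand(200) < 0.5, 2.0 * x, 10.0 * x) + np.random.randn(200)
assign, coeffs = linear_kmeans(x, y)
```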
Generative Temporal Models with Memory
Title | Generative Temporal Models with Memory |
Authors | Mevlana Gemici, Chia-Chun Hung, Adam Santoro, Greg Wayne, Shakir Mohamed, Danilo J. Rezende, David Amos, Timothy Lillicrap |
Abstract | We consider the general problem of modeling temporal data with long-range dependencies, wherein new observations are fully or partially predictable based on temporally-distant, past observations. A sufficiently powerful temporal model should separate predictable elements of the sequence from unpredictable elements, express uncertainty about those unpredictable elements, and rapidly identify novel elements that may help to predict the future. To create such models, we introduce Generative Temporal Models augmented with external memory systems. They are developed within the variational inference framework, which provides both a practical training methodology and methods to gain insight into the models’ operation. We show, on a range of problems with sparse, long-term temporal dependencies, that these models store information from early in a sequence, and reuse this stored information efficiently. This allows them to perform substantially better than existing models based on well-known recurrent neural networks, like LSTMs. |
Tasks | |
Published | 2017-02-15 |
URL | http://arxiv.org/abs/1702.04649v2 |
http://arxiv.org/pdf/1702.04649v2.pdf | |
PWC | https://paperswithcode.com/paper/generative-temporal-models-with-memory |
Repo | |
Framework | |
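A toy sketch of the memory idea above: the model attends over a buffer of past latent codes and uses the read vector to parameterize the prior over the next latent state. This is only a content-based-read illustration, not the paper's full variational architecture or training objective.

```python
# Memory-conditioned prior: attend over stored codes, output prior parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryPrior(nn.Module):
    def __init__(self, z_dim=16, mem_slots=32):
        super().__init__()
        self.query = nn.Linear(z_dim, z_dim)
        self.to_params = nn.Linear(z_dim, 2 * z_dim)   # prior mean and log-variance
        self.memory = torch.zeros(mem_slots, z_dim)    # buffer of past latent codes
        self.ptr = 0

    def forward(self, z_prev):
        attn = F.softmax(self.memory @ self.query(z_prev), dim=0)   # content-based read
        read = attn @ self.memory
        mean, logvar = self.to_params(read).chunk(2, dim=-1)
        return mean, logvar

    def write(self, z):
        self.memory[self.ptr % self.memory.shape[0]] = z.detach()
        self.ptr += 1

prior = MemoryPrior()
z = torch.zeros(16)
mean, logvar = prior(z)
prior.write(mean + torch.exp(0.5 * logvar) * torch.randn(16))   # store a sampled code
```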
Making “fetch” happen: The influence of social and linguistic context on nonstandard word growth and decline
Title | Making “fetch” happen: The influence of social and linguistic context on nonstandard word growth and decline |
Authors | Ian Stewart, Jacob Eisenstein |
Abstract | In an online community, new words come and go: today’s “haha” may be replaced by tomorrow’s “lol.” Changes in online writing are usually studied as a social process, with innovations diffusing through a network of individuals in a speech community. But unlike other types of innovation, language change is shaped and constrained by the system in which it takes part. To investigate the links between social and structural factors in language change, we undertake a large-scale analysis of nonstandard word growth in the online community Reddit. We find that dissemination across many linguistic contexts is a sign of growth: words that appear in more linguistic contexts grow faster and survive longer. We also find that social dissemination likely plays a less important role in explaining word growth and decline than previously hypothesized. |
Tasks | |
Published | 2017-09-01 |
URL | http://arxiv.org/abs/1709.00345v4 |
http://arxiv.org/pdf/1709.00345v4.pdf | |
PWC | https://paperswithcode.com/paper/making-fetch-happen-the-influence-of-social |
Repo | |
Framework | |
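Since the key quantity in the entry above is dissemination across linguistic contexts, the toy sketch below counts, for each nonstandard word, the number of distinct neighboring-word contexts it appears in alongside its raw frequency. The tiny corpus and the bigram context definition are placeholders for the paper's Reddit data and context measures.

```python
# Count distinct (left, right) word contexts per token as a dissemination proxy.
from collections import defaultdict

posts = [
    "that was lol funny", "lol i cannot even", "so funny lol honestly",
    "haha that was funny", "haha i know right",
]

contexts = defaultdict(set)
frequency = defaultdict(int)
for post in posts:
    tokens = post.split()
    for i, tok in enumerate(tokens):
        frequency[tok] += 1
        left = tokens[i - 1] if i > 0 else "<s>"
        right = tokens[i + 1] if i < len(tokens) - 1 else "</s>"
        contexts[tok].add((left, right))

for word in ("lol", "haha"):
    print(word, "frequency:", frequency[word], "distinct contexts:", len(contexts[word]))
```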
Social Media-based Substance Use Prediction
Title | Social Media-based Substance Use Prediction |
Authors | Tao Ding, Warren K. Bickel, Shimei Pan |
Abstract | In this paper, we demonstrate how state-of-the-art machine learning and text mining techniques can be used to build effective social media-based substance use detection systems. Since a substance use ground truth is difficult to obtain on a large scale, to maximize system performance, we explore different feature learning methods to take advantage of a large amount of unsupervised social media data. We also demonstrate the benefit of using multi-view unsupervised feature learning to combine heterogeneous user information such as Facebook “likes” and “status updates” to enhance system performance. Based on our evaluation, our best models achieved 86% AUC for predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which significantly outperformed existing methods. Our investigation has also uncovered interesting relations between a user’s social media behavior (e.g., word usage) and substance use. |
Tasks | |
Published | 2017-05-16 |
URL | http://arxiv.org/abs/1705.05633v2 |
http://arxiv.org/pdf/1705.05633v2.pdf | |
PWC | https://paperswithcode.com/paper/social-media-based-substance-use-prediction |
Repo | |
Framework | |
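A hedged sketch of the multi-view idea above: each user view (page likes and status-update text) is reduced separately and the reduced representations are concatenated before a supervised classifier, evaluated by AUC. Truncated SVD stands in for the paper's unsupervised multi-view feature learning, and all data are synthetic.

```python
# Combine two user views into one representation before supervised prediction.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
likes_view = rng.random((400, 1000))      # stand-in for binary page-like indicators
text_view = rng.random((400, 5000))       # stand-in for tf-idf of status updates
labels = rng.integers(0, 2, size=400)     # hypothetical substance-use labels

likes_z = TruncatedSVD(n_components=20, random_state=0).fit_transform(likes_view)
text_z = TruncatedSVD(n_components=20, random_state=0).fit_transform(text_view)
X = np.hstack([likes_z, text_z])          # concatenated multi-view representation

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```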
Zero-Shot Learning via Latent Space Encoding
Title | Zero-Shot Learning via Latent Space Encoding |
Authors | Yunlong Yu, Zhong Ji, Jichang Guo, Zhongfei Zhang |
Abstract | Zero-Shot Learning (ZSL) is typically achieved by resorting to a class semantic embedding space to transfer the knowledge from the seen classes to unseen ones. Capturing the common semantic characteristics between the visual modality and the class semantic modality (e.g., attributes or word vector) is a key to the success of ZSL. In this paper, we propose a novel encoder-decoder approach, namely Latent Space Encoding (LSE), to connect the semantic relations of different modalities. Instead of requiring a projection function to transfer information across different modalities like most previous work, LSE performs the interactions of different modalities via a feature aware latent space, which is learned in an implicit way. Specifically, different modalities are modeled separately but optimized jointly. For each modality, an encoder-decoder framework is performed to learn a feature aware latent space via jointly maximizing the recoverability of the original space from the latent space and the predictability of the latent space from the original space. To relate different modalities together, their features referring to the same concept are enforced to share the same latent codings. In this way, the common semantic characteristics of different modalities are generalized with the latent representations. Another property of the proposed approach is that it is easily extended to more modalities. Extensive experimental results on four benchmark datasets (AwA, CUB, aPY, and ImageNet) clearly demonstrate the superiority of the proposed approach on several ZSL tasks, including traditional ZSL, generalized ZSL, and zero-shot retrieval (ZSR). |
Tasks | Zero-Shot Learning |
Published | 2017-12-26 |
URL | http://arxiv.org/abs/1712.09300v2 |
http://arxiv.org/pdf/1712.09300v2.pdf | |
PWC | https://paperswithcode.com/paper/zero-shot-learning-via-latent-space-encoding |
Repo | |
Framework | |
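A minimal PyTorch sketch of the shared-latent-coding idea described above: each modality gets its own encoder-decoder, and matched visual/attribute pairs are trained with reconstruction losses plus a term pulling their latent codes together. Dimensions and loss weighting are illustrative, not the paper's exact objective.

```python
# Per-modality autoencoders with a shared latent code for matched pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAE(nn.Module):
    def __init__(self, dim, latent=32):
        super().__init__()
        self.enc = nn.Linear(dim, latent)
        self.dec = nn.Linear(latent, dim)

    def forward(self, x):
        z = torch.relu(self.enc(x))
        return z, self.dec(z)

visual_ae, attr_ae = ModalityAE(2048), ModalityAE(85)
params = list(visual_ae.parameters()) + list(attr_ae.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

visual = torch.randn(64, 2048)               # image features of seen classes (placeholder)
attributes = torch.randn(64, 85)             # matching class-attribute vectors (placeholder)

opt.zero_grad()
z_v, rec_v = visual_ae(visual)
z_a, rec_a = attr_ae(attributes)
loss = (F.mse_loss(rec_v, visual) + F.mse_loss(rec_a, attributes)
        + F.mse_loss(z_v, z_a))              # enforce a shared latent coding
loss.backward()
opt.step()
# At test time, an unseen class can be predicted by encoding its attribute
# vector and matching image codes to class codes in the latent space.
```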
Detecting Statistical Interactions from Neural Network Weights
Title | Detecting Statistical Interactions from Neural Network Weights |
Authors | Michael Tsang, Dehua Cheng, Yan Liu |
Abstract | Interpreting neural networks is a crucial and challenging task in machine learning. In this paper, we develop a novel framework for detecting statistical interactions captured by a feedforward multilayer neural network by directly interpreting its learned weights. Depending on the desired interactions, our method can achieve significantly better or similar interaction detection performance compared to the state-of-the-art without searching an exponential solution space of possible interactions. We obtain this accuracy and efficiency by observing that interactions between input features are created by the non-additive effect of nonlinear activation functions, and that interacting paths are encoded in weight matrices. We demonstrate the performance of our method and the importance of discovered interactions via experimental results on both synthetic datasets and real-world application datasets. |
Tasks | |
Published | 2017-05-14 |
URL | http://arxiv.org/abs/1705.04977v4 |
http://arxiv.org/pdf/1705.04977v4.pdf | |
PWC | https://paperswithcode.com/paper/detecting-statistical-interactions-from |
Repo | |
Framework | |
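The sketch below reads pairwise interaction strengths directly off weight matrices in the spirit of the entry above: for each first-layer unit, a feature pair contributes the smaller of its two absolute input weights, scaled by that unit's aggregated influence on the output through later layers. The weight matrices here are random stand-ins for a trained network.

```python
# Rank pairwise feature interactions from a feedforward net's weights.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 5))     # first hidden layer: 16 units, 5 input features
W2 = rng.normal(size=(8, 16))     # second hidden layer
w_out = rng.normal(size=(1, 8))   # output layer

# Aggregated influence of each first-layer unit on the output.
influence = (np.abs(w_out) @ np.abs(W2)).ravel()          # shape (16,)

strength = {}
for i, j in combinations(range(W1.shape[1]), 2):
    per_unit = np.minimum(np.abs(W1[:, i]), np.abs(W1[:, j]))
    strength[(i, j)] = float(np.sum(influence * per_unit))

print(sorted(strength.items(), key=lambda kv: -kv[1])[:3])   # top-ranked pairs
```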
Tactics of Adversarial Attack on Deep Reinforcement Learning Agents
Title | Tactics of Adversarial Attack on Deep Reinforcement Learning Agents |
Authors | Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun |
Abstract | We introduce two tactics to attack agents trained by deep reinforcement learning algorithms using adversarial examples, namely the strategically-timed attack and the enchanting attack. In the strategically-timed attack, the adversary aims at minimizing the agent’s reward by only attacking the agent at a small subset of time steps in an episode. Limiting the attack activity to this subset helps prevent detection of the attack by the agent. We propose a novel method to determine when an adversarial example should be crafted and applied. In the enchanting attack, the adversary aims at luring the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: while the generative model predicts the future states, the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to lure the agent to take the preferred sequence of actions. We apply the two tactics to agents trained by state-of-the-art deep reinforcement learning algorithms, including DQN and A3C. In 5 Atari games, our strategically timed attack reduces as much reward as the uniform attack (i.e., attacking at every time step) does by attacking the agent 4 times less often. Our enchanting attack lures the agent toward designated target states with a more than 70% success rate. Videos are available at http://yenchenlin.me/adversarial_attack_RL/ |
Tasks | Adversarial Attack, Atari Games |
Published | 2017-03-08 |
URL | https://arxiv.org/abs/1703.06748v4 |
https://arxiv.org/pdf/1703.06748v4.pdf | |
PWC | https://paperswithcode.com/paper/tactics-of-adversarial-attack-on-deep |
Repo | |
Framework | |
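A hedged sketch of the strategically-timed criterion described above: the observation is perturbed only at steps where the policy's action preference is strongly peaked, using a generic FGSM-style step. The policy network, threshold, and perturbation are placeholders rather than the paper's exact attack.

```python
# Attack only when the policy strongly prefers one action.
import torch
import torch.nn.functional as F

def should_attack(policy, obs, threshold=0.8):
    probs = F.softmax(policy(obs), dim=-1)
    return (probs.max() - probs.min()).item() > threshold   # action-preference gap

def fgsm_perturb(policy, obs, eps=0.01):
    obs = obs.clone().requires_grad_(True)
    probs = F.softmax(policy(obs), dim=-1)
    loss = probs.max()                     # probability of the preferred action
    loss.backward()
    # Step that decreases the preferred action's probability.
    return (obs - eps * obs.grad.sign()).detach()

policy = torch.nn.Linear(4, 3)             # stand-in for a DQN/A3C policy head
obs = torch.randn(4)
adv_obs = fgsm_perturb(policy, obs) if should_attack(policy, obs, 0.5) else obs
```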
A breakthrough in Speech emotion recognition using Deep Retinal Convolution Neural Networks
Title | A breakthrough in Speech emotion recognition using Deep Retinal Convolution Neural Networks |
Authors | Yafeng Niu, Dongsheng Zou, Yadong Niu, Zhongshi He, Hua Tan |
Abstract | Speech emotion recognition (SER) studies the formation and change of a speaker’s emotional state from the speech signal perspective, so as to make the interaction between human and computer more intelligent. SER is a challenging task that faces the problems of limited training data and low prediction accuracy. Here we propose a data augmentation algorithm based on the imaging principle of the retina and convex lens, which acquires spectrograms of different sizes and increases the amount of training data by changing the distance between the spectrogram and the convex lens. Meanwhile, with the help of deep learning to obtain high-level features, we propose Deep Retinal Convolution Neural Networks (DRCNNs) for SER and achieve an average accuracy of over 99%. The experimental results indicate that DRCNNs outperform previous studies in terms of both the number of emotions and recognition accuracy. Predictably, our results will dramatically improve human-computer interaction. |
Tasks | Data Augmentation, Emotion Recognition, Speech Emotion Recognition |
Published | 2017-07-12 |
URL | http://arxiv.org/abs/1707.09917v1 |
http://arxiv.org/pdf/1707.09917v1.pdf | |
PWC | https://paperswithcode.com/paper/a-breakthrough-in-speech-emotion-recognition |
Repo | |
Framework | |
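As a rough reading of the retina/convex-lens augmentation described above, the sketch below produces the same spectrogram at several magnifications and resamples each back to a fixed CNN input size. The spectrogram, scale factors, and target size are placeholders, not the paper's exact procedure.

```python
# Spectrogram-rescaling data augmentation at several magnifications.
import numpy as np
from scipy.ndimage import zoom

def augment_spectrogram(spec, scales=(0.8, 0.9, 1.1, 1.25), target=(128, 128)):
    """Rescale a spectrogram at several magnifications, then resample each
    result back to a fixed input size for the CNN."""
    augmented = []
    for s in scales:
        scaled = zoom(spec, s)                              # change apparent size
        fit = zoom(scaled, (target[0] / scaled.shape[0],    # resample to CNN input
                            target[1] / scaled.shape[1]))
        augmented.append(fit)
    return np.stack(augmented)

spec = np.random.rand(128, 128)          # stand-in for a log-mel spectrogram
batch = augment_spectrogram(spec)        # shape: (len(scales), 128, 128)
print(batch.shape)
```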