Paper Group AWR 46
Multi-Multi-View Learning: Multilingual and Multi-Representation Entity Typing. 3D-PhysNet: Learning the Intuitive Physics of Non-Rigid Object Deformations. Synthetic Occlusion Augmentation with Volumetric Heatmaps for the 2018 ECCV PoseTrack Challenge on 3D Human Pose Estimation. ShuffleSeg: Real-time Semantic Segmentation Network. Generative Prob …
Multi-Multi-View Learning: Multilingual and Multi-Representation Entity Typing
Title | Multi-Multi-View Learning: Multilingual and Multi-Representation Entity Typing |
Authors | Yadollah Yaghoobzadeh, Hinrich Schütze |
Abstract | Knowledge bases (KBs) are paramount in NLP. We employ multiview learning for increasing accuracy and coverage of entity type information in KBs. We rely on two metaviews: language and representation. For language, we consider high-resource and low-resource languages from Wikipedia. For representation, we consider representations based on the context distribution of the entity (i.e., on its embedding), on the entity’s name (i.e., on its surface form) and on its description in Wikipedia. The two metaviews, language and representation, can be freely combined: each pair of language and representation (e.g., German embedding, English description, Spanish name) is a distinct view. Our experiments on entity typing with fine-grained classes demonstrate the effectiveness of multiview learning. We release MVET, a large multiview - and, in particular, multilingual - entity typing dataset we created. Mono- and multilingual fine-grained entity typing systems can be evaluated on this dataset. |
Tasks | Entity Typing, Multi-View Learning |
Published | 2018-10-24 |
URL | http://arxiv.org/abs/1810.10499v1 |
http://arxiv.org/pdf/1810.10499v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-multi-view-learning-multilingual-and |
Repo | https://github.com/yyaghoobzadeh/MVET |
Framework | none |
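The entry above describes combining several views of an entity (per-language embeddings, names, descriptions) into one typing model. Below is a minimal, hypothetical sketch of that idea: each view is projected into a shared space, the projections are averaged, and a multi-label classifier predicts fine-grained types. Dimensions, the fusion-by-averaging choice, and the class count are assumptions, not the paper's architecture.

```python
# Hypothetical multi-view fusion for entity typing: one projection per view,
# average the projected views, then predict fine-grained types (multi-label).
import torch
import torch.nn as nn

class MultiViewTyper(nn.Module):
    def __init__(self, view_dims, num_types, hidden=256):
        super().__init__()
        # one projection per view so views of different sizes share a space
        self.projections = nn.ModuleList(nn.Linear(d, hidden) for d in view_dims)
        self.classifier = nn.Linear(hidden, num_types)

    def forward(self, views):
        # views: list of tensors, one (batch, dim_i) tensor per available view
        projected = [torch.relu(p(v)) for p, v in zip(self.projections, views)]
        fused = torch.stack(projected, dim=0).mean(dim=0)   # simple view averaging
        return self.classifier(fused)                        # logits over entity types

# toy usage: three views (embedding, name, description) with different sizes
model = MultiViewTyper(view_dims=[300, 100, 768], num_types=50)
views = [torch.randn(4, 300), torch.randn(4, 100), torch.randn(4, 768)]
logits = model(views)
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(4, 50))
```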
3D-PhysNet: Learning the Intuitive Physics of Non-Rigid Object Deformations
Title | 3D-PhysNet: Learning the Intuitive Physics of Non-Rigid Object Deformations |
Authors | Zhihua Wang, Stefano Rosa, Bo Yang, Sen Wang, Niki Trigoni, Andrew Markham |
Abstract | The ability to interact with and understand the environment is a fundamental prerequisite for a wide range of applications from robotics to augmented reality. In particular, predicting how deformable objects will react to applied forces in real time is a significant challenge. This is further confounded by the fact that shape information about encountered objects in the real world is often impaired by occlusions, noise and missing regions, e.g., a robot manipulating an object will only be able to observe a partial view of the entire solid. In this work we present a framework, 3D-PhysNet, which is able to predict how a three-dimensional solid will deform under an applied force using intuitive physics modelling. In particular, we propose a new method to encode the physical properties of the material and the applied force, enabling generalisation over materials. The key is to combine deep variational autoencoders with adversarial training, conditioned on the applied force and the material properties. We further propose a cascaded architecture that takes a single 2.5D depth view of the object and predicts its deformation. Training data is provided by a physics simulator. The network is fast enough to be used in real-time applications from partial views. Experimental results show the viability and the generalisation properties of the proposed architecture. |
Tasks | |
Published | 2018-04-25 |
URL | http://arxiv.org/abs/1805.00328v2 |
http://arxiv.org/pdf/1805.00328v2.pdf | |
PWC | https://paperswithcode.com/paper/3d-physnet-learning-the-intuitive-physics-of |
Repo | https://github.com/vividda/3D-PhysNet |
Framework | tf |
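The core idea above is a variational autoencoder whose decoder is conditioned on the applied force and the material properties. The sketch below shows only that conditioning pattern in a flattened-voxel form; layer sizes, the conditioning vector layout, and the omission of the adversarial branch are all simplifying assumptions.

```python
# Minimal conditional-VAE sketch: the latent code is decoded together with the
# applied force and material parameters to predict the deformed shape.
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    def __init__(self, voxels=32**3, cond_dim=4, latent=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(voxels, 512), nn.ReLU())
        self.to_mu, self.to_logvar = nn.Linear(512, latent), nn.Linear(512, latent)
        self.decoder = nn.Sequential(
            nn.Linear(latent + cond_dim, 512), nn.ReLU(),
            nn.Linear(512, voxels), nn.Sigmoid())            # predicted deformed occupancy

    def forward(self, shape, cond):
        h = self.encoder(shape)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        recon = self.decoder(torch.cat([z, cond], dim=1))
        return recon, mu, logvar

# cond = [force magnitude, force direction x/y, stiffness] -- purely illustrative
model = ConditionalVAE()
shape, cond = torch.rand(2, 32**3), torch.rand(2, 4)
recon, mu, logvar = model(shape, cond)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.binary_cross_entropy(recon, shape) + 1e-3 * kl
```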
Synthetic Occlusion Augmentation with Volumetric Heatmaps for the 2018 ECCV PoseTrack Challenge on 3D Human Pose Estimation
Title | Synthetic Occlusion Augmentation with Volumetric Heatmaps for the 2018 ECCV PoseTrack Challenge on 3D Human Pose Estimation |
Authors | István Sárándi, Timm Linder, Kai O. Arras, Bastian Leibe |
Abstract | In this paper we present our winning entry at the 2018 ECCV PoseTrack Challenge on 3D human pose estimation. Using a fully-convolutional backbone architecture, we obtain volumetric heatmaps per body joint, which we convert to coordinates using soft-argmax. Absolute person center depth is estimated by a 1D heatmap prediction head. The coordinates are back-projected to 3D camera space, where we minimize the L1 loss. Key to our good results is the training data augmentation with randomly placed occluders from the Pascal VOC dataset. In addition to reaching first place in the Challenge, our method also surpasses the state-of-the-art on the full Human3.6M benchmark among methods that use no additional pose datasets in training. Code for applying synthetic occlusions is available at https://github.com/isarandi/synthetic-occlusion. |
Tasks | 3D Human Pose Estimation, Data Augmentation, Pose Estimation |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.04987v3 |
http://arxiv.org/pdf/1809.04987v3.pdf | |
PWC | https://paperswithcode.com/paper/synthetic-occlusion-augmentation-with |
Repo | https://github.com/isarandi/synthetic-occlusion |
Framework | none |
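The abstract mentions converting volumetric heatmaps to coordinates with soft-argmax. A small illustration of that operation is sketched below: softmax the per-joint heatmap over all voxels and take the expected coordinate along each axis. The heatmap resolution, tensor layout, and normalised coordinate range are assumptions, not the authors' exact implementation.

```python
# Illustrative soft-argmax over a volumetric heatmap.
import torch

def soft_argmax_3d(heatmaps):
    # heatmaps: (batch, joints, D, H, W) unnormalised scores
    b, j, d, h, w = heatmaps.shape
    probs = torch.softmax(heatmaps.reshape(b, j, -1), dim=-1).reshape(b, j, d, h, w)
    zs = torch.linspace(0, 1, d).view(1, 1, d, 1, 1)
    ys = torch.linspace(0, 1, h).view(1, 1, 1, h, 1)
    xs = torch.linspace(0, 1, w).view(1, 1, 1, 1, w)
    # expected coordinate along each axis, in normalised [0, 1] volume units
    z = (probs * zs).sum(dim=(2, 3, 4))
    y = (probs * ys).sum(dim=(2, 3, 4))
    x = (probs * xs).sum(dim=(2, 3, 4))
    return torch.stack([x, y, z], dim=-1)   # (batch, joints, 3)

coords = soft_argmax_3d(torch.randn(2, 17, 16, 64, 64))
```

Unlike a hard argmax, this expectation is differentiable, which is what allows the coordinate loss to be back-propagated through the heatmaps.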
ShuffleSeg: Real-time Semantic Segmentation Network
Title | ShuffleSeg: Real-time Semantic Segmentation Network |
Authors | Mostafa Gamal, Mennatullah Siam, Moemen Abdel-Razek |
Abstract | Real-time semantic segmentation is of significant importance for mobile and robotics related applications. We propose a computationally efficient segmentation network which we term ShuffleSeg. The proposed architecture is based on grouped convolution and channel shuffling in its encoder to improve performance. An ablation study compares different decoding methods, including the Skip architecture, UNet, and Dilation Frontend, and interesting insights on the speed and accuracy tradeoff are discussed. It is shown that the skip architecture in the decoding method provides the best compromise for real-time performance, while providing adequate accuracy by utilizing higher-resolution feature maps for a more accurate segmentation. ShuffleSeg is evaluated on CityScapes and compared against state-of-the-art real-time segmentation networks. It achieves a 2x reduction in GFLOPs while providing an on-par mean intersection over union of 58.3% on the CityScapes test set. ShuffleSeg runs at 15.7 frames per second on NVIDIA Jetson TX2, which makes it of great potential for real-time applications. |
Tasks | Real-Time Semantic Segmentation, Semantic Segmentation |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03816v2 |
http://arxiv.org/pdf/1803.03816v2.pdf | |
PWC | https://paperswithcode.com/paper/shuffleseg-real-time-semantic-segmentation |
Repo | https://github.com/Davidnet/TFSegmentation |
Framework | tf |
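The encoder described above relies on grouped convolutions followed by channel shuffling, the standard ShuffleNet-style operation that lets information mix across groups. A minimal sketch of that shuffle is below (PyTorch used for brevity; the paper's repository is TensorFlow).

```python
# Standard channel-shuffle: split channels into groups, transpose, flatten back.
import torch

def channel_shuffle(x, groups):
    b, c, h, w = x.shape
    assert c % groups == 0
    x = x.view(b, groups, c // groups, h, w)   # split channels into groups
    x = x.transpose(1, 2).contiguous()         # interleave the groups
    return x.view(b, c, h, w)

out = channel_shuffle(torch.randn(1, 8, 4, 4), groups=2)
```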
Generative Probabilistic Novelty Detection with Adversarial Autoencoders
Title | Generative Probabilistic Novelty Detection with Adversarial Autoencoders |
Authors | Stanislav Pidhorskyi, Ranya Almohsen, Donald A Adjeroh, Gianfranco Doretto |
Abstract | Novelty detection is the problem of identifying whether a new data point is considered to be an inlier or an outlier. We assume that training data is available to describe only the inlier distribution. Recent approaches primarily leverage deep encoder-decoder network architectures to compute a reconstruction error that is used to either compute a novelty score or to train a one-class classifier. While we too leverage a novel network of that kind, we take a probabilistic approach and effectively compute how likely it is that a sample was generated by the inlier distribution. We achieve this with two main contributions. First, we make the computation of the novelty probability feasible because we linearize the parameterized manifold capturing the underlying structure of the inlier distribution, and show how the probability factorizes and can be computed with respect to local coordinates of the manifold tangent space. Second, we improve the training of the autoencoder network. An extensive set of results shows that the approach achieves state-of-the-art results on several benchmark datasets. |
Tasks | One-class classifier |
Published | 2018-07-06 |
URL | http://arxiv.org/abs/1807.02588v2 |
http://arxiv.org/pdf/1807.02588v2.pdf | |
PWC | https://paperswithcode.com/paper/generative-probabilistic-novelty-detection |
Repo | https://github.com/podgorskiy/GPND |
Framework | none |
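As context for the entry above, the sketch below shows only the simplest baseline the paper builds on: an autoencoder trained on inliers, with reconstruction error used as a novelty score. The paper's actual contribution (the probability that factorizes over the linearized manifold and its tangent space) is not reproduced here; the architecture and score are illustrative assumptions.

```python
# Heavily simplified reconstruction-error novelty score (inlier-trained autoencoder).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, dim=784, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x):
        return self.dec(self.enc(x))

def novelty_score(model, x):
    # larger reconstruction error -> more likely an outlier
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)

scores = novelty_score(Autoencoder(), torch.rand(5, 784))
```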
Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts
Title | Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts |
Authors | Shafin Rahman, Salman Khan, Fatih Porikli |
Abstract | Current Zero-Shot Learning (ZSL) approaches are restricted to recognition of a single dominant unseen object category in a test image. We hypothesize that this setting is ill-suited for real-world applications where unseen objects appear only as a part of a complex scene, warranting both the ‘recognition’ and ‘localization’ of an unseen category. To address this limitation, we introduce a new ‘Zero-Shot Detection’ (ZSD) problem setting, which aims at simultaneously recognizing and locating object instances belonging to novel categories without any training examples. We also propose a new experimental protocol for ZSD based on the highly challenging ILSVRC dataset, adhering to practical issues, e.g., the rarity of unseen objects. To the best of our knowledge, this is the first end-to-end deep network for ZSD that jointly models the interplay between visual and semantic domain information. To overcome the noise in the automatically derived semantic descriptions, we utilize the concept of meta-classes to design an original loss function that achieves synergy between max-margin class separation and semantic space clustering. Furthermore, we present a baseline approach extended from the recognition to the detection setting. Our extensive experiments show a significant performance boost over the baseline on the imperative yet difficult ZSD problem. |
Tasks | Object Detection, Zero-Shot Learning, Zero-Shot Object Detection |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.06049v1 |
http://arxiv.org/pdf/1803.06049v1.pdf | |
PWC | https://paperswithcode.com/paper/zero-shot-object-detection-learning-to |
Repo | https://github.com/salman-h-khan/ZSD_Release |
Framework | tf |
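The entry above mentions a max-margin loss aligning visual features with semantic class embeddings. The sketch below shows one generic way such a hinge could look: project a region feature into the word-embedding space and require its similarity with the true class to beat all other classes by a margin. The projection, margin value, and the omission of the paper's meta-class clustering term are all assumptions.

```python
# Hedged sketch of a max-margin visual-semantic alignment loss for detection regions.
import torch
import torch.nn.functional as F

def max_margin_loss(region_feats, proj, class_embeds, labels, margin=0.2):
    # region_feats: (N, d_vis), class_embeds: (C, d_sem), labels: (N,)
    sims = F.normalize(region_feats @ proj, dim=1) @ F.normalize(class_embeds, dim=1).t()
    true = sims.gather(1, labels.unsqueeze(1))              # similarity to the true class
    mask = torch.ones_like(sims).scatter_(1, labels.unsqueeze(1), 0.0)
    violation = (margin + sims - true).clamp(min=0) * mask  # hinge on every other class
    return violation.sum(dim=1).mean()

proj = torch.randn(512, 300, requires_grad=True)
loss = max_margin_loss(torch.randn(8, 512), proj, torch.randn(20, 300),
                       torch.randint(0, 20, (8,)))
```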
Efficient Misalignment-Robust Multi-Focus Microscopical Images Fusion
Title | Efficient Misalignment-Robust Multi-Focus Microscopical Images Fusion |
Authors | Yixiong Liang, Yuan Mao, Zhihong Tang, Meng Yan, Yuqian Zhao, Jianfeng Liu |
Abstract | In this paper we propose a very efficient method to fuse unregistered multi-focus microscopical images based on speeded-up robust features (SURF). Our method follows the pipeline of first registration and then fusion. However, instead of treating registration and fusion as two completely independent stages, we propose to reuse the determinant of the approximate Hessian generated in the SURF detection stage as the corresponding salient response for the final image fusion, which enables nearly cost-free saliency map generation. In addition, due to the adoption of the SURF scale space representation, our method can generate scale-invariant saliency maps, which is desired for scale-invariant image fusion. We present an extensive evaluation on a dataset consisting of several groups of unregistered multi-focus 4K ultra HD microscopic images with a size of 4112 x 3008. Compared with state-of-the-art multi-focus image fusion methods, our method is much faster and achieves better visual results. Our method provides a flexible and efficient way to integrate complementary and redundant information from multiple multi-focus ultra HD unregistered images into a fused image that contains a better description than any of the individual input images. Code is available at https://github.com/yiqingmy/JointRF. |
Tasks | Multi-Focus Microscopical Images Fusion |
Published | 2018-12-21 |
URL | http://arxiv.org/abs/1812.08915v1 |
http://arxiv.org/pdf/1812.08915v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-misalignment-robust-multi-focus |
Repo | https://github.com/yiqingmy/JointRF |
Framework | none |
Bipedal Walking Robot using Deep Deterministic Policy Gradient
Title | Bipedal Walking Robot using Deep Deterministic Policy Gradient |
Authors | Arun Kumar, Navneet Paul, S N Omkar |
Abstract | Machine learning algorithms have found several applications in the field of robotics and control systems. The control systems community has started to show interest in several machine learning algorithms from sub-domains such as supervised learning, imitation learning and reinforcement learning to achieve autonomous control and intelligent decision making. Amongst many complex control problems, stable bipedal walking has been the most challenging. In this paper, we present an architecture to design and simulate a planar bipedal walking robot (BWR) using a realistic robotics simulator, Gazebo. The robot demonstrates successful walking behaviour by learning through trial and error, without any prior knowledge of itself or the world dynamics. The autonomous walking of the BWR is achieved using a reinforcement learning algorithm called Deep Deterministic Policy Gradient (DDPG). DDPG is one of the algorithms for learning controls in continuous action spaces. After training the model in simulation, it was observed that, with a properly shaped reward function, the robot achieved faster walking or even rendered a running gait with an average speed of 0.83 m/s. The gait pattern of the bipedal walker was compared with the actual human walking pattern. The results show that the bipedal walking pattern had similar characteristics to that of a human walking pattern. The video presenting our experiment is available at https://goo.gl/NHXKqR. |
Tasks | Decision Making, Imitation Learning |
Published | 2018-07-16 |
URL | http://arxiv.org/abs/1807.05924v2 |
http://arxiv.org/pdf/1807.05924v2.pdf | |
PWC | https://paperswithcode.com/paper/bipedal-walking-robot-using-deep |
Repo | https://github.com/nav74neet/ddpg_biped |
Framework | tf |
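For readers unfamiliar with DDPG, the compact sketch below shows the standard update it uses for continuous-action control: a critic regressed onto a bootstrapped target, an actor that ascends the critic's value, and Polyak-averaged target networks. Network sizes, learning rates, and the observation/action dimensions are assumptions; the replay buffer, exploration noise, and the Gazebo interface from the paper are omitted.

```python
# Compact DDPG update step (core losses and target-network smoothing only).
import copy
import torch
import torch.nn as nn

obs_dim, act_dim, gamma, tau = 24, 6, 0.99, 0.005
actor  = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_targ, critic_targ = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt  = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(obs, act, rew, next_obs, done):
    # critic: regress Q(s, a) onto the bootstrapped target r + gamma * Q'(s', pi'(s'))
    with torch.no_grad():
        next_q = critic_targ(torch.cat([next_obs, actor_targ(next_obs)], dim=1))
        target = rew + gamma * (1 - done) * next_q
    q = critic(torch.cat([obs, act], dim=1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # actor: ascend the critic's value of the actor's own actions
    actor_loss = -critic(torch.cat([obs, actor(obs)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak averaging of the target networks
    for net, targ in ((actor, actor_targ), (critic, critic_targ)):
        for p, pt in zip(net.parameters(), targ.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)

batch = 32
ddpg_update(torch.randn(batch, obs_dim), torch.rand(batch, act_dim) * 2 - 1,
            torch.randn(batch, 1), torch.randn(batch, obs_dim), torch.zeros(batch, 1))
```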
A Corpus for Multilingual Document Classification in Eight Languages
Title | A Corpus for Multilingual Document Classification in Eight Languages |
Authors | Holger Schwenk, Xian Li |
Abstract | Cross-lingual document classification aims at training a document classifier on resources in one language and transferring it to a different language without any additional resources. Several approaches have been proposed in the literature and the current best practice is to evaluate them on a subset of the Reuters Corpus Volume 2. However, this subset covers only a few languages (English, German, French and Spanish) and almost all published works focus on the transfer between English and German. In addition, we have observed that the class prior distributions differ significantly between the languages. We argue that this complicates the evaluation of multilinguality. In this paper, we propose a new subset of the Reuters corpus with balanced class priors for eight languages. By adding Italian, Russian, Japanese and Chinese, we cover languages which are very different with respect to syntax, morphology, etc. We provide strong baselines for all language transfer directions using multilingual word and sentence embeddings respectively. Our goal is to offer a freely available framework to evaluate cross-lingual document classification, and we hope by these means to foster research in this important area. |
Tasks | Cross-Lingual Document Classification, Document Classification, Sentence Embeddings |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09821v1 |
http://arxiv.org/pdf/1805.09821v1.pdf | |
PWC | https://paperswithcode.com/paper/a-corpus-for-multilingual-document |
Repo | https://github.com/n-waves/multifit |
Framework | none |
A Unified Batch Online Learning Framework for Click Prediction
Title | A Unified Batch Online Learning Framework for Click Prediction |
Authors | Rishabh Iyer, Nimit Acharya, Tanuja Bompada, Denis Charles, Eren Manavoglu |
Abstract | We present a unified framework for Batch Online Learning (OL) for Click Prediction in Search Advertisement. Machine learning models, once deployed, show non-trivial accuracy and calibration degradation over time due to model staleness. It is therefore necessary to regularly update models, and to do so automatically. This paper presents two paradigms of Batch Online Learning, one which incrementally updates the model parameters via an early stopping mechanism, and another which does so through a proximal regularization. We argue how both these schemes naturally trade off between old and new data. We then theoretically and empirically show that these two seemingly different schemes are closely related. Through extensive experiments, we demonstrate the utility of our OL framework; how the two OL schemes relate to each other and how they trade off between the new and historical data. We then compare batch OL to full model retrains, and show how online learning is more robust to data issues. We also demonstrate the long-term impact of Online Learning, the role of the initial models in OL, the impact of delays in the update, and finally conclude with some implementation details and challenges in deploying a real-world online learning system in production. While this paper mostly focuses on the application of click prediction for search advertisement, we hope that the lessons learned here can be carried over to other problem domains. |
Tasks | Calibration |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04673v1 |
http://arxiv.org/pdf/1809.04673v1.pdf | |
PWC | https://paperswithcode.com/paper/a-unified-batch-online-learning-framework-for |
Repo | https://github.com/rishabhk108/jensen-ol |
Framework | none |
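The proximal-regularization paradigm mentioned above fits each new batch while pulling the weights back towards the previously deployed model. A toy illustration of that idea on a logistic click model is sketched below; the model form, lambda value, and plain gradient descent are illustrative assumptions rather than the paper's production setup.

```python
# Toy proximal batch-online update: fit the new batch, stay close to the old weights.
import numpy as np

def proximal_update(w_old, X, y, lam=1.0, lr=0.1, steps=200):
    w = w_old.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted click probability
        grad = X.T @ (p - y) / len(y)         # logistic-loss gradient on the new data
        grad += lam * (w - w_old)             # proximal pull towards the deployed model
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
w_old = rng.normal(size=10)
X, y = rng.normal(size=(500, 10)), rng.integers(0, 2, size=500).astype(float)
w_new = proximal_update(w_old, X, y)
```

A larger lambda keeps the model closer to its previous state (more weight on historical data); a smaller lambda lets the new batch dominate, which is exactly the old-versus-new trade-off the paper analyses.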
Multi-time-horizon Solar Forecasting Using Recurrent Neural Network
Title | Multi-time-horizon Solar Forecasting Using Recurrent Neural Network |
Authors | Sakshi Mishra, Praveen Palanisamy |
Abstract | The non-stationary characteristic of solar power renders traditional point forecasting methods less useful due to large prediction errors. This results in increased uncertainty in grid operation, thereby negatively affecting reliability and increasing the cost of operation. This research paper proposes a unified architecture for multi-time-horizon predictions for short- and long-term solar forecasting using Recurrent Neural Networks (RNN). The paper describes an end-to-end pipeline to implement the architecture along with methods to test and validate the performance of the prediction model. The results demonstrate that the proposed method based on the unified architecture is effective for multi-horizon solar forecasting and achieves a lower root-mean-squared prediction error compared to the previous best-performing methods which use one model for each time-horizon. The proposed method enables multi-horizon forecasts with real-time inputs, which have a high potential for practical applications in the evolving smart grid. |
Tasks | |
Published | 2018-07-14 |
URL | http://arxiv.org/abs/1807.05459v1 |
http://arxiv.org/pdf/1807.05459v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-time-horizon-solar-forecasting-using |
Repo | https://github.com/sakshi-mishra/solar-forecasting-RNN |
Framework | pytorch |
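The distinguishing point above is a single recurrent model emitting all forecast horizons at once rather than one model per horizon. A minimal sketch of that output shape is below; the feature count, hidden size, number of horizons, and the plain LSTM-plus-linear head are assumptions, not the paper's exact network.

```python
# One RNN, several forecast horizons from a single forward pass.
import torch
import torch.nn as nn

class MultiHorizonRNN(nn.Module):
    def __init__(self, n_features=6, hidden=64, horizons=4):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizons)   # one output per forecast horizon

    def forward(self, x):
        # x: (batch, time, features) of weather/irradiance measurements
        _, (h, _) = self.rnn(x)
        return self.head(h[-1])                    # (batch, horizons)

model = MultiHorizonRNN()
pred = model(torch.randn(8, 24, 6))                # e.g. 24 past hourly steps
loss = nn.functional.mse_loss(pred, torch.randn(8, 4))
```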
Dual Attention Network for Scene Segmentation
Title | Dual Attention Network for Scene Segmentation |
Authors | Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang, Hanqing Lu |
Abstract | In this paper, we address the scene segmentation task by capturing rich contextual dependencies based on the self-attention mechanism. Unlike previous works that capture contexts by multi-scale feature fusion, we propose a Dual Attention Network (DANet) to adaptively integrate local features with their global dependencies. Specifically, we append two types of attention modules on top of a traditional dilated FCN, which model the semantic interdependencies in the spatial and channel dimensions respectively. The position attention module selectively aggregates the features at each position by a weighted sum of the features at all positions. Similar features would be related to each other regardless of their distances. Meanwhile, the channel attention module selectively emphasizes interdependent channel maps by integrating associated features among all channel maps. We sum the outputs of the two attention modules to further improve feature representation, which contributes to more precise segmentation results. We achieve new state-of-the-art segmentation performance on three challenging scene segmentation datasets, i.e., the Cityscapes, PASCAL Context and COCO Stuff datasets. In particular, a Mean IoU score of 81.5% on the Cityscapes test set is achieved without using coarse data. We make the code and trained model publicly available at https://github.com/junfu1115/DANet |
Tasks | Scene Segmentation, Semantic Segmentation |
Published | 2018-09-09 |
URL | http://arxiv.org/abs/1809.02983v4 |
http://arxiv.org/pdf/1809.02983v4.pdf | |
PWC | https://paperswithcode.com/paper/dual-attention-network-for-scene-segmentation |
Repo | https://github.com/yougoforward/hlzhu_DANet_git |
Framework | pytorch |
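The position attention module described above updates every spatial location with a similarity-weighted sum of the features at all locations. A minimal sketch of that module is below; the channel-reduction factor and the zero-initialised residual weight are common choices assumed here rather than taken from the repository.

```python
# Position attention: aggregate features over all spatial positions by similarity.
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key   = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))          # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)       # (b, hw, c')
        k = self.key(x).flatten(2)                         # (b, c', hw)
        attn = torch.softmax(q @ k, dim=-1)                # (b, hw, hw) pairwise similarities
        v = self.value(x).flatten(2)                       # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)  # weighted sum over all positions
        return self.gamma * out + x

y = PositionAttention(64)(torch.randn(1, 64, 16, 16))
```

The channel attention module follows the same pattern with the attention map computed over channels instead of positions, and the two outputs are summed.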
Dynamic-Net: Tuning the Objective Without Re-training for Synthesis Tasks
Title | Dynamic-Net: Tuning the Objective Without Re-training for Synthesis Tasks |
Authors | Alon Shoshan, Roey Mechrez, Lihi Zelnik-Manor |
Abstract | One of the key ingredients for successful optimization of modern CNNs is identifying a suitable objective. To date, the objective is fixed a-priori at training time, and any variation to it requires re-training a new network. In this paper we present a first attempt at alleviating the need for re-training. Rather than fixing the network at training time, we train a “Dynamic-Net” that can be modified at inference time. Our approach considers an “objective-space” as the space of all linear combinations of two objectives, and the Dynamic-Net emulates traversal of this objective-space at test time, without any further training. We show that this upgrades pre-trained networks by providing an out-of-learning extension, while maintaining the performance quality. The solution we propose is fast and allows a user to interactively modify the network, in real time, in order to obtain the desired result. We show the benefits of such an approach via several different applications. |
Tasks | |
Published | 2018-11-21 |
URL | https://arxiv.org/abs/1811.08760v2 |
https://arxiv.org/pdf/1811.08760v2.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-net-tuning-the-objective-without-re |
Repo | https://github.com/AlonShoshan10/dynamic_net |
Framework | pytorch |
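A hedged sketch of the inference-time interpolation idea above: a block trained for one objective is augmented with a residual "tuning" branch trained for a second objective, and a user-chosen alpha scales that branch's contribution at test time, so the objective-space can be traversed without retraining. The block shapes and the simplified two-branch layout are assumptions, not the paper's exact architecture or training schedule.

```python
# Inference-time interpolation between two objectives via a scaled residual branch.
import torch
import torch.nn as nn

class DynamicBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.main = nn.Conv2d(channels, channels, 3, padding=1)     # trained on objective A
        self.tuning = nn.Conv2d(channels, channels, 3, padding=1)   # trained on objective B

    def forward(self, x, alpha):
        base = torch.relu(self.main(x))
        return base + alpha * self.tuning(base)   # alpha=0 -> objective A, alpha=1 -> B

block = DynamicBlock()
x = torch.randn(1, 64, 32, 32)
for alpha in (0.0, 0.5, 1.0):                     # traverse the objective space, no retraining
    y = block(x, alpha)
```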
Image Forensics: Detecting duplication of scientific images with manipulation-invariant image similarity
Title | Image Forensics: Detecting duplication of scientific images with manipulation-invariant image similarity |
Authors | M. Cicconet, H. Elliott, D. L. Richmond, D. Wainstock, M. Walsh |
Abstract | Manipulation and re-use of images in scientific publications is a concerning problem that currently lacks a scalable solution. Current tools for detecting image duplication are mostly manual or semi-automated, despite the availability of an overwhelming target dataset for a learning-based approach. This paper addresses the problem of determining if, given two images, one is a manipulated version of the other by means of copy, rotation, translation, scale, perspective transform, histogram adjustment, or partial erasing. We propose a data-driven solution based on a 3-branch Siamese Convolutional Neural Network. The ConvNet model is trained to map images into a 128-dimensional space, where the Euclidean distance between duplicate images is smaller than or equal to 1, and the distance between unique images is greater than 1. Our results suggest that such an approach has the potential to improve surveillance of the published and in-peer-review literature for image manipulation. |
Tasks | |
Published | 2018-02-19 |
URL | https://arxiv.org/abs/1802.06515v3 |
https://arxiv.org/pdf/1802.06515v3.pdf | |
PWC | https://paperswithcode.com/paper/image-forensics-detecting-duplication-of |
Repo | https://github.com/teddykoker/image-forensics |
Framework | pytorch |
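The abstract above specifies an embedding in which duplicate images end up within Euclidean distance 1 and unique images beyond it, learned with a 3-branch Siamese network. The sketch below shows a contrastive-style objective enforcing exactly that threshold on anchor/duplicate/unrelated triplets; the tiny fully-connected encoder stands in for the paper's ConvNet and is purely a placeholder.

```python
# Triplet-style objective: duplicates within distance 1, unrelated images beyond it.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
                      nn.Linear(256, 128))        # placeholder for the 128-D ConvNet embedding

def duplicate_triplet_loss(anchor, duplicate, unrelated, margin=1.0):
    a, p, n = embed(anchor), embed(duplicate), embed(unrelated)
    d_pos = torch.norm(a - p, dim=1)   # manipulated copy: should be <= margin
    d_neg = torch.norm(a - n, dim=1)   # different image: should be > margin
    return (torch.relu(d_pos - margin) + torch.relu(margin - d_neg)).mean()

loss = duplicate_triplet_loss(torch.rand(4, 3, 64, 64),
                              torch.rand(4, 3, 64, 64),
                              torch.rand(4, 3, 64, 64))
```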
Attributes as Operators: Factorizing Unseen Attribute-Object Compositions
Title | Attributes as Operators: Factorizing Unseen Attribute-Object Compositions |
Authors | Tushar Nagarajan, Kristen Grauman |
Abstract | We present a new approach to modeling visual attributes. Prior work casts attributes in a similar role as objects, learning a latent representation where properties (e.g., sliced) are recognized by classifiers much in the way objects (e.g., apple) are. However, this common approach fails to separate the attributes observed during training from the objects with which they are composed, making it ineffectual when encountering new attribute-object compositions. Instead, we propose to model attributes as operators. Our approach learns a semantic embedding that explicitly factors out attributes from their accompanying objects, and also benefits from novel regularizers expressing attribute operators’ effects (e.g., blunt should undo the effects of sharp). Not only does our approach align conceptually with the linguistic role of attributes as modifiers, but it also generalizes to recognize unseen compositions of objects and attributes. We validate our approach on two challenging datasets and demonstrate significant improvements over the state-of-the-art. In addition, we show that not only can our model recognize unseen compositions robustly in an open-world setting, it can also generalize to compositions where objects themselves were unseen during training. |
Tasks | |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.09851v2 |
http://arxiv.org/pdf/1803.09851v2.pdf | |
PWC | https://paperswithcode.com/paper/attributes-as-operators-factorizing-unseen |
Repo | https://github.com/Tushar-N/attributes-as-operators |
Framework | pytorch |
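A small sketch of the attribute-as-operator idea from the entry above: each attribute is a learned matrix that transforms an object embedding, and an attribute-object composition is scored by its similarity to an image feature. The dimensions, identity initialisation, and cosine scoring are assumptions; the paper's specific regularizers (e.g., blunt undoing sharp) are omitted.

```python
# Attributes as operators: a matrix per attribute transforms object embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, n_attrs, n_objs = 300, 10, 20
attr_ops = nn.Parameter(torch.stack([torch.eye(dim) for _ in range(n_attrs)]))  # one matrix per attribute
obj_embed = nn.Parameter(torch.randn(n_objs, dim))

def compose(attr_idx, obj_idx):
    # applying the attribute operator to the object vector yields the composition embedding
    return attr_ops[attr_idx] @ obj_embed[obj_idx]

def score(image_feat, attr_idx, obj_idx):
    # higher similarity means the image is more likely to show this attribute-object pair
    return F.cosine_similarity(image_feat, compose(attr_idx, obj_idx), dim=0)

s = score(torch.randn(dim), attr_idx=3, obj_idx=7)
```

Because unseen compositions reuse operators and object vectors seen separately during training, the same scoring works for attribute-object pairs never observed together.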