Paper Group ANR 616
Multi-level Memory for Task Oriented Dialogs
Title | Multi-level Memory for Task Oriented Dialogs |
Authors | Revanth Reddy, Danish Contractor, Dinesh Raghu, Sachindra Joshi |
Abstract | Recent end-to-end task oriented dialog systems use memory architectures to incorporate external knowledge in their dialogs. Current work makes simplifying assumptions about the structure of the knowledge base, such as the use of triples to represent knowledge, and combines dialog utterances (context) as well as knowledge base (KB) results as part of the same memory. This causes an explosion in the memory size, and makes the reasoning over memory harder. In addition, such a memory design forces hierarchical properties of the data to be fit into a triple structure of memory. This requires the memory reader to infer relationships across otherwise connected attributes. In this paper we relax the strong assumptions made by existing architectures and separate memories used for modeling dialog context and KB results. Instead of using triples to store KB results, we introduce a novel multi-level memory architecture consisting of cells for each query and their corresponding results. The multi-level memory first addresses queries, followed by results and finally each key-value pair within a result. We conduct detailed experiments on three publicly available task oriented dialog data sets and we find that our method conclusively outperforms current state-of-the-art models. We report a 15-25% increase in both entity F1 and BLEU scores. |
Tasks | |
Published | 2018-10-24 |
URL | https://arxiv.org/abs/1810.10647v2 |
https://arxiv.org/pdf/1810.10647v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-level-memory-for-task-oriented-dialogs |
Repo | |
Framework | |
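The abstract describes a memory that is addressed in three stages: first queries, then the results of each query, and finally the key-value pairs inside a result. The sketch below is one hypothetical way to read that description, not the authors' model: it multiplies a softmax attention at each level to get a distribution over KB values. The dot-product scoring, the nested dictionary layout, the function names, and the toy vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def multi_level_attention(state, memory):
    """Return a probability for every KB value in a three-level memory.

    memory is a list of query cells (assumed layout):
      {"query": vec, "results": [{"result": vec, "cells": [(key_vec, value), ...]}]}
    The probability of a value is the product of the attention weights on
    its query, its result, and its key.
    """
    probs = {}
    query_att = softmax([dot(state, q["query"]) for q in memory])
    for qa, q in zip(query_att, memory):
        result_att = softmax([dot(state, r["result"]) for r in q["results"]])
        for ra, r in zip(result_att, q["results"]):
            key_att = softmax([dot(state, key) for key, _ in r["cells"]])
            for ka, (_, value) in zip(key_att, r["cells"]):
                probs[value] = probs.get(value, 0.0) + qa * ra * ka
    return probs


# Toy memory with two queries; all vectors and values are made up.
memory = [
    {"query": [1.0, 0.0],
     "results": [
         {"result": [0.5, 0.5],
          "cells": [([1.0, 0.0], "pizza_hut"), ([0.0, 1.0], "cheap")]},
         {"result": [0.2, 0.1],
          "cells": [([1.0, 0.0], "dominos"), ([0.0, 1.0], "moderate")]},
     ]},
    {"query": [0.0, 1.0],
     "results": [
         {"result": [0.1, 0.9],
          "cells": [([1.0, 0.0], "hilton"), ([0.0, 1.0], "expensive")]},
     ]},
]
probs = multi_level_attention([1.0, 0.2], memory)
```

Because each level is normalised separately, the product still sums to one over all values, so the decoder state only has to compare items within a level rather than across a single flat memory of triples.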
Learning to Generate and Reconstruct 3D Meshes with only 2D Supervision
Title | Learning to Generate and Reconstruct 3D Meshes with only 2D Supervision |
Authors | Paul Henderson, Vittorio Ferrari |
Abstract | We present a unified framework tackling two problems: class-specific 3D reconstruction from a single image, and generation of new 3D shape samples. These tasks have received considerable attention recently; however, existing approaches rely on 3D supervision, annotation of 2D images with keypoints or poses, and/or training with multiple views of each object instance. Our framework is very general: it can be trained in similar settings to these existing approaches, while also supporting weaker supervision scenarios. Importantly, it can be trained purely from 2D images, without ground-truth pose annotations, and with a single view per instance. We employ meshes as an output representation, instead of voxels used in most prior work. This allows us to exploit shading information during training, which previous 2D-supervised methods cannot. Thus, our method can learn to generate and reconstruct concave object classes. We evaluate our approach on synthetic data in various settings, showing that (i) it learns to disentangle shape from pose; (ii) using shading in the loss improves performance; (iii) our model is comparable or superior to state-of-the-art voxel-based approaches on quantitative metrics, while producing results that are visually more pleasing; (iv) it still performs well when given supervision weaker than in prior works. |
Tasks | 3D Reconstruction |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.09259v3 |
http://arxiv.org/pdf/1807.09259v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-generate-and-reconstruct-3d |
Repo | |
Framework | |
Semi-Dense 3D Reconstruction with a Stereo Event Camera
Title | Semi-Dense 3D Reconstruction with a Stereo Event Camera |
Authors | Yi Zhou, Guillermo Gallego, Henri Rebecq, Laurent Kneip, Hongdong Li, Davide Scaramuzza |
Abstract | Event cameras are bio-inspired sensors that offer several advantages, such as low latency, high-speed and high dynamic range, to tackle challenging scenarios in computer vision. This paper presents a solution to the problem of 3D reconstruction from data captured by a stereo event-camera rig moving in a static scene, such as in the context of stereo Simultaneous Localization and Mapping. The proposed method consists of the optimization of an energy function designed to exploit small-baseline spatio-temporal consistency of events triggered across both stereo image planes. To improve the density of the reconstruction and to reduce the uncertainty of the estimation, a probabilistic depth-fusion strategy is also developed. The resulting method has no special requirements on either the motion of the stereo event-camera rig or on prior knowledge about the scene. Experiments demonstrate our method can deal with both texture-rich scenes as well as sparse scenes, outperforming state-of-the-art stereo methods based on event data image representations. |
Tasks | 3D Reconstruction, Simultaneous Localization and Mapping |
Published | 2018-07-19 |
URL | http://arxiv.org/abs/1807.07429v1 |
http://arxiv.org/pdf/1807.07429v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-dense-3d-reconstruction-with-a-stereo |
Repo | |
Framework | |
Specular-to-Diffuse Translation for Multi-View Reconstruction
Title | Specular-to-Diffuse Translation for Multi-View Reconstruction |
Authors | Shihao Wu, Hui Huang, Tiziano Portenier, Matan Sela, Danny Cohen-Or, Ron Kimmel, Matthias Zwicker |
Abstract | Most multi-view 3D reconstruction algorithms, especially when shape-from-shading cues are used, assume that object appearance is predominantly diffuse. To alleviate this restriction, we introduce S2Dnet, a generative adversarial network for transferring multiple views of objects with specular reflection into diffuse ones, so that multi-view reconstruction methods can be applied more effectively. Our network extends unsupervised image-to-image translation to multi-view “specular to diffuse” translation. To preserve object appearance across multiple views, we introduce a Multi-View Coherence loss (MVC) that evaluates the similarity and faithfulness of local patches after the view-transformation. Our MVC loss ensures that the similarity of local correspondences among multi-view images is preserved under the image-to-image translation. As a result, our network yields significantly better results than several single-view baseline techniques. In addition, we carefully design and generate a large synthetic training data set using physically-based rendering. During testing, our network takes only the raw glossy images as input, without extra information such as segmentation masks or lighting estimation. Results demonstrate that multi-view reconstruction can be significantly improved using the images filtered by our network. We also show promising performance on real world training and testing data. |
Tasks | 3D Reconstruction, Image-to-Image Translation, Unsupervised Image-To-Image Translation |
Published | 2018-07-14 |
URL | http://arxiv.org/abs/1807.05439v3 |
http://arxiv.org/pdf/1807.05439v3.pdf | |
PWC | https://paperswithcode.com/paper/specular-to-diffuse-translation-for-multi |
Repo | |
Framework | |
Combining SLAM with muti-spectral photometric stereo for real-time dense 3D reconstruction
Title | Combining SLAM with muti-spectral photometric stereo for real-time dense 3D reconstruction |
Authors | Yuanhong Xu, Pei Dong, Junyu Dong, Lin Qi |
Abstract | Obtaining dense 3D reconstruction with low computational cost is one of the important goals in the field of SLAM. In this paper we propose a dense 3D reconstruction framework for monocular multispectral video sequences that jointly uses semi-dense SLAM and multispectral photometric stereo. Starting from multispectral video, SLAM (a) reconstructs a semi-dense 3D shape that will be densified; (b) recovers a relatively sparse depth map that is then fed as a prior into optimization-based multispectral photometric stereo for more accurate dense surface normal recovery; (c) obtains the camera pose that is subsequently used for view conversion during fusion, where we combine the relatively sparse point cloud with the dense surface normals using the automated cross-scale fusion method proposed in this paper to obtain a dense point cloud with subtle texture information. Experiments show that our method can effectively obtain denser 3D reconstructions. |
Tasks | 3D Reconstruction |
Published | 2018-07-06 |
URL | http://arxiv.org/abs/1807.02294v1 |
http://arxiv.org/pdf/1807.02294v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-slam-with-muti-spectral-photometric |
Repo | |
Framework | |
Boundary-guided Feature Aggregation Network for Salient Object Detection
Title | Boundary-guided Feature Aggregation Network for Salient Object Detection |
Authors | Yunzhi Zhuge, Pingping Zhang, Huchuan Lu |
Abstract | Fully convolutional networks (FCNs) have significantly improved the performance of many pixel-labeling tasks, such as semantic segmentation and depth estimation. However, it remains non-trivial to fully utilize the multi-level convolutional feature maps and boundary information for salient object detection. In this paper, we propose a novel FCN framework to integrate multi-level convolutional features recurrently with the guidance of object boundary information. First, a deep convolutional network is used to extract multi-level feature maps and separately aggregate them into multiple resolutions, which can be used to generate coarse saliency maps. Meanwhile, another boundary information extraction branch is proposed to generate boundary features. Finally, an attention-based feature fusion module is designed to fuse boundary information into salient regions to achieve accurate boundary inference and semantic enhancement. The final saliency maps are the combination of the predicted boundary maps and integrated saliency maps, which are closer to the ground truths. Experiments and analysis on four large-scale benchmarks verify that our framework achieves new state-of-the-art results. |
Tasks | Depth Estimation, Object Detection, Salient Object Detection, Semantic Segmentation |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.10821v1 |
http://arxiv.org/pdf/1809.10821v1.pdf | |
PWC | https://paperswithcode.com/paper/boundary-guided-feature-aggregation-network |
Repo | |
Framework | |
Road surface 3d reconstruction based on dense subpixel disparity map estimation
Title | Road surface 3d reconstruction based on dense subpixel disparity map estimation |
Authors | Rui Fan, Xiao Ai, Naim Dahnoun |
Abstract | Various 3D reconstruction methods have enabled civil engineers to detect damage on a road surface. To achieve the millimetre accuracy required for road condition assessment, a disparity map with subpixel resolution needs to be used. However, none of the existing stereo matching algorithms are especially suitable for the reconstruction of the road surface. Hence in this paper, we propose a novel dense subpixel disparity estimation algorithm with high computational efficiency and robustness. This is achieved by first transforming the perspective view of the target frame into the reference view, which not only increases the accuracy of the block matching for the road surface but also improves the processing speed. The disparities are then estimated iteratively using our previously published algorithm where the search range is propagated from three estimated neighbouring disparities. Since the search range is obtained from the previous iteration, errors may occur when the propagated search range is not sufficient. Therefore, a correlation maxima verification is performed to rectify this issue, and the subpixel resolution is achieved by conducting a parabola interpolation enhancement. Furthermore, a novel disparity global refinement approach developed from the Markov Random Fields and Fast Bilateral Stereo is introduced to further improve the accuracy of the estimated disparity map, where disparities are updated iteratively by minimising the energy function that is related to their interpolated correlation polynomials. The algorithm is implemented in C with near real-time performance. The experimental results illustrate that the absolute error of the reconstruction varies from 0.1 mm to 3 mm. |
Tasks | 3D Reconstruction, Disparity Estimation, Stereo Matching, Stereo Matching Hand |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.01874v1 |
http://arxiv.org/pdf/1807.01874v1.pdf | |
PWC | https://paperswithcode.com/paper/road-surface-3d-reconstruction-based-on-dense |
Repo | |
Framework | |
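The abstract mentions that subpixel resolution is obtained through parabola interpolation. As a generic illustration of that step (not the authors' C implementation), the sketch below fits a parabola through the correlation at the best integer disparity and its two neighbours and returns the location of the vertex. The function names and the `d_min` parameter are assumptions introduced here.

```python
def subpixel_offset(c_prev, c_peak, c_next):
    """Vertex of the parabola through (-1, c_prev), (0, c_peak), (1, c_next).

    Returns an offset relative to the integer peak position.
    """
    denom = c_prev - 2.0 * c_peak + c_next
    if denom == 0.0:
        # The three samples are collinear, so there is no vertex to find.
        return 0.0
    return (c_prev - c_next) / (2.0 * denom)


def refine_disparity(correlations, d_min=0):
    """Pick the integer disparity with maximum correlation, then refine it.

    correlations[i] is the matching score for disparity d_min + i.
    """
    best = max(range(len(correlations)), key=correlations.__getitem__)
    if best == 0 or best == len(correlations) - 1:
        # No neighbour on one side, so keep the integer estimate.
        return float(d_min + best)
    offset = subpixel_offset(
        correlations[best - 1], correlations[best], correlations[best + 1]
    )
    return d_min + best + offset


# Scores sampled from a parabola whose true peak lies at disparity 3.25.
scores = [-((d - 3.25) ** 2) for d in range(7)]
estimate = refine_disparity(scores)
```

When the scores really are quadratic near the peak, as in this toy input, the vertex is recovered exactly; with real correlation curves the result is only an approximation.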
Deep Genetic Network
Title | Deep Genetic Network |
Authors | Siddhartha Dhar Choudhury, Shashank Pandey, Kunal Mehrotra |
Abstract | Optimizing a neural network’s performance is a tedious and time-consuming process. This iterative process has no single defined solution that works for all problems. Optimization can be roughly categorized into architecture optimization and hyperparameter optimization. Many algorithms have been devised to address this problem. In this paper we introduce a neural network architecture (Deep Genetic Network) which optimizes its parameters during training based on its fitness. Deep Genetic Net uses genetic algorithms along with deep neural networks to address the hyperparameter optimization problem. This approach uses ideas such as mating and mutation, which are key to genetic algorithms, to help the neural net architecture learn to optimize its hyperparameters by itself rather than depending on a person to explicitly set the values. Using genetic algorithms for this problem proved to work exceptionally well when given enough time to train the network. The proposed architecture is found to work well in optimizing hyperparameters in affine, convolutional and recurrent layers, proving to be a good choice for conventional supervised learning tasks. |
Tasks | Hyperparameter Optimization |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.01845v2 |
http://arxiv.org/pdf/1811.01845v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-genetic-network |
Repo | |
Framework | |
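The abstract names mating and mutation as the genetic operators used to search hyperparameters. The following is a minimal, generic genetic-algorithm sketch under stated assumptions, not the Deep Genetic Network itself: the search space, the `fitness` stand-in (a made-up function replacing a real validation score), the uniform crossover, and the Gaussian mutation are all hypothetical choices for illustration.

```python
import random

# Hypothetical search space: name -> (low, high).
SEARCH_SPACE = {"learning_rate": (1e-4, 1e-1), "dropout": (0.0, 0.9)}


def random_individual(rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}


def fitness(ind):
    """Stand-in for a validation score; peaks at lr=0.01, dropout=0.3."""
    return -1e3 * (ind["learning_rate"] - 0.01) ** 2 - (ind["dropout"] - 0.3) ** 2


def mate(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(ind, rng, rate=0.2):
    """Perturb each gene with probability `rate`, clipped to its bounds."""
    child = dict(ind)
    for k, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            step = rng.gauss(0.0, 0.1 * (hi - lo))
            child[k] = min(hi, max(lo, child[k] + step))
    return child


def evolve(generations=20, pop_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # the fitter half survives
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(mate(mother, father, rng), rng))
        population = parents + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
```

Keeping the fitter half of each generation unchanged means the best score never decreases from one generation to the next. In a real setting, each call to `fitness` would involve training and validating a network, which is why the abstract notes that the approach needs enough training time.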
Preprocessor Selection for Machine Learning Pipelines
Title | Preprocessor Selection for Machine Learning Pipelines |
Authors | Brandon Schoenfeld, Christophe Giraud-Carrier, Mason Poggemann, Jarom Christensen, Kevin Seppi |
Abstract | Much of the work in metalearning has focused on classifier selection, combined more recently with hyperparameter optimization, with little concern for data preprocessing. Yet, it is generally well accepted that machine learning applications require not only model building, but also data preprocessing. In other words, practical solutions consist of pipelines of machine learning operators rather than single algorithms. Interestingly, our experiments suggest that, on average, data preprocessing hinders accuracy, while the best performing pipelines do actually make use of preprocessors. Here, we conduct an extensive empirical study over a wide range of learning algorithms and preprocessors, and use metalearning to determine when one should make use of preprocessors in ML pipeline design. |
Tasks | Hyperparameter Optimization |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.09942v1 |
http://arxiv.org/pdf/1810.09942v1.pdf | |
PWC | https://paperswithcode.com/paper/preprocessor-selection-for-machine-learning |
Repo | |
Framework | |
Interpretable Textual Neuron Representations for NLP
Title | Interpretable Textual Neuron Representations for NLP |
Authors | Nina Poerner, Benjamin Roth, Hinrich Schütze |
Abstract | Input optimization methods, such as Google Deep Dream, create interpretable representations of neurons for computer vision DNNs. We propose and evaluate ways of transferring this technology to NLP. Our results suggest that gradient ascent with a gumbel softmax layer produces n-gram representations that outperform naive corpus search in terms of target neuron activation. The representations highlight differences in syntax awareness between the language and visual models of the Imaginet architecture. |
Tasks | |
Published | 2018-09-19 |
URL | http://arxiv.org/abs/1809.07291v1 |
http://arxiv.org/pdf/1809.07291v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-textual-neuron-representations |
Repo | |
Framework | |
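The abstract refers to gradient ascent through a Gumbel softmax layer. The sketch below shows only the Gumbel-softmax relaxation itself in plain Python, as a generic illustration rather than the authors' code. The function name, the `tau` temperature parameter, and the `rng` argument are assumptions introduced here.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw a relaxed one-hot sample over len(logits) categories.

    Gumbel(0, 1) noise is added to each logit and the result is passed
    through a softmax with temperature tau. Lower tau gives samples
    closer to one-hot; higher tau gives smoother samples.
    """
    perturbed = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel) / tau)
    m = max(perturbed)
    exps = [math.exp(p - m) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


# A relaxed sample over a three-word toy vocabulary.
sample = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5, rng=random.Random(0))
```

Because the output is a smooth function of the logits, an autodiff framework can backpropagate a neuron's activation to the logits, which is what makes gradient-based search over discrete n-grams possible.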
Feature extraction without learning in an analog Spatial Pooler memristive-CMOS circuit design of Hierarchical Temporal Memory
Title | Feature extraction without learning in an analog Spatial Pooler memristive-CMOS circuit design of Hierarchical Temporal Memory |
Authors | Olga Krestinskaya, Alex Pappachen James |
Abstract | Hierarchical Temporal Memory (HTM) is a neuromorphic algorithm that emulates sparsity, hierarchy and modularity resembling the working principles of the neocortex. Feature encoding is an important step to create sparse binary patterns. This sparsity is introduced by the binary weights and random weight assignment in the initialization stage of the HTM. We propose an alternative deterministic method for the HTM initialization stage, which connects the HTM weights to the input data and preserves the natural sparsity of the input information. Further, we introduce the hardware implementation of the deterministic approach and compare it to the traditional HTM and existing hardware implementation. We test the proposed approach on the face recognition problem and show that it outperforms the conventional HTM approach. |
Tasks | Face Recognition |
Published | 2018-03-14 |
URL | http://arxiv.org/abs/1803.05131v1 |
http://arxiv.org/pdf/1803.05131v1.pdf | |
PWC | https://paperswithcode.com/paper/feature-extraction-without-learning-in-an |
Repo | |
Framework | |
CHOPT : Automated Hyperparameter Optimization Framework for Cloud-Based Machine Learning Platforms
Title | CHOPT : Automated Hyperparameter Optimization Framework for Cloud-Based Machine Learning Platforms |
Authors | Jinwoong Kim, Minkyu Kim, Heungseok Park, Ernar Kusdavletov, Dongjun Lee, Adrian Kim, Ji-Hoon Kim, Jung-Woo Ha, Nako Sung |
Abstract | Many hyperparameter optimization (HyperOpt) methods assume restricted computing resources and mainly focus on enhancing performance. Here we propose a novel cloud-based HyperOpt (CHOPT) framework which can efficiently utilize shared computing resources while supporting various HyperOpt algorithms. We incorporate convenient web-based user interfaces, visualization, and analysis tools, enabling users to easily control optimization procedures and build up valuable insights with an iterative analysis procedure. Furthermore, our framework can be incorporated with any cloud platform, thus complementarily increasing the efficiency of conventional deep learning frameworks. We demonstrate applications of CHOPT with tasks such as image recognition and question-answering, showing that our framework can find hyperparameter configurations competitive with previous work. We also show CHOPT is capable of providing interesting observations through its analysis tools. |
Tasks | Hyperparameter Optimization, Question Answering |
Published | 2018-10-08 |
URL | http://arxiv.org/abs/1810.03527v2 |
http://arxiv.org/pdf/1810.03527v2.pdf | |
PWC | https://paperswithcode.com/paper/chopt-automated-hyperparameter-optimization |
Repo | |
Framework | |
Large Scale Automated Forecasting for Monitoring Network Safety and Security
Title | Large Scale Automated Forecasting for Monitoring Network Safety and Security |
Authors | Roi Naveiro, Simón Rodríguez, David Ríos Insua |
Abstract | Real-time, large-scale streaming data pose major challenges to forecasting, in particular defying the presence of human experts to perform the corresponding analysis. We present here a class of models and methods used to develop an automated, scalable and versatile system for large-scale forecasting oriented towards safety and security monitoring. Our system provides short- and long-term forecasts and uses them to detect safety and security issues relating to multiple internet-connected devices well before they might take place. |
Tasks | |
Published | 2018-02-19 |
URL | http://arxiv.org/abs/1802.06678v2 |
http://arxiv.org/pdf/1802.06678v2.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-automated-forecasting-for |
Repo | |
Framework | |
Gradient Reversal Against Discrimination
Title | Gradient Reversal Against Discrimination |
Authors | Edward Raff, Jared Sylvester |
Abstract | No methods currently exist for making arbitrary neural networks fair. In this work we introduce GRAD, a new and simplified method for producing fair neural networks that can be used for auto-encoding fair representations or directly with predictive networks. It is easy to implement and add to existing architectures, has only one (insensitive) hyper-parameter, and provides improved individual and group fairness. We use the flexibility of GRAD to demonstrate multi-attribute protection. |
Tasks | |
Published | 2018-07-01 |
URL | http://arxiv.org/abs/1807.00392v1 |
http://arxiv.org/pdf/1807.00392v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-reversal-against-discrimination |
Repo | |
Framework | |
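The title refers to gradient reversal. As a generic illustration of that idea (not the GRAD architecture, which the abstract does not detail), the sketch below hand-codes a gradient reversal layer: it is the identity in the forward pass and multiplies the gradient by a negative factor in the backward pass. The scalar encoder `w`, the adversary `v`, the squared-error loss, and the `lam` scale are hypothetical choices made to keep the example checkable by hand.

```python
class GradientReversal:
    """Identity on the forward pass; negates and scales gradients going back."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output


def encoder_and_adversary_grads(w, v, x, a, lam=1.0):
    """One manual forward/backward pass through a toy scalar model.

    z = w * x is the learned representation. An adversary predicts the
    protected attribute a from z as a_hat = v * z, trained with squared error.
    """
    grl = GradientReversal(lam)
    z = w * x
    z_rev = grl.forward(z)
    a_hat = v * z_rev
    loss = (a_hat - a) ** 2

    d_a_hat = 2.0 * (a_hat - a)         # dL/da_hat
    grad_v = d_a_hat * z_rev            # adversary gets the ordinary gradient
    grad_z = grl.backward(d_a_hat * v)  # encoder gets the reversed gradient
    grad_w = grad_z * x
    return loss, grad_w, grad_v


loss, grad_w, grad_v = encoder_and_adversary_grads(w=0.5, v=2.0, x=1.0, a=0.0)
```

With these inputs the adversary's gradient has its usual sign, while the encoder's gradient is flipped. A gradient-descent step therefore improves the adversary and pushes the representation to make the protected attribute harder to predict.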
Leveraging Unlabeled Whole-Slide-Images for Mitosis Detection
Title | Leveraging Unlabeled Whole-Slide-Images for Mitosis Detection |
Authors | Saad Ullah Akram, Talha Qaiser, Simon Graham, Juho Kannala, Janne Heikkilä, Nasir Rajpoot |
Abstract | Mitosis count is an important biomarker for prognosis of various cancers. At present, pathologists typically perform manual counting on a few selected regions of interest in breast whole-slide-images (WSIs) of patient biopsies. This task is very time-consuming, tedious and subjective. Automated mitosis detection methods have made great advances in recent years. However, these methods require exhaustive labeling of a large number of selected regions of interest. This task is very expensive because expert pathologists are needed for reliable and accurate annotations. In this paper, we present a semi-supervised mitosis detection method which is designed to leverage a large number of unlabeled breast cancer WSIs. As a result, our method capitalizes on the growing number of digitized histology images, without relying on exhaustive annotations, subsequently improving mitosis detection. Our method first learns a mitosis detector from labeled data, uses this detector to mine additional mitosis samples from unlabeled WSIs, and then trains the final model using this larger and diverse set of mitosis samples. The use of unlabeled data improves F1-score by ~5% compared to our best performing fully-supervised model on the TUPAC validation set. Our submission (single model) to TUPAC challenge ranks highly on the leaderboard with an F1-score of 0.64. |
Tasks | Mitosis Detection |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1807.11677v1 |
http://arxiv.org/pdf/1807.11677v1.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-unlabeled-whole-slide-images-for |
Repo | |
Framework | |