Paper Group ANR 46
Talking Face Generation by Conditional Recurrent Adversarial Network
Title | Talking Face Generation by Conditional Recurrent Adversarial Network |
Authors | Yang Song, Jingwen Zhu, Dawei Li, Xiaolong Wang, Hairong Qi |
Abstract | Given an arbitrary face image and an arbitrary speech clip, the proposed work attempts to generate the talking face video with accurate lip synchronization while maintaining smooth transition of both lip and facial movement over the entire video clip. Existing works either do not consider temporal dependency on face images across different video frames, thus easily yielding noticeable/abrupt facial and lip movement, or are limited to the generation of talking face video for a specific person, thus lacking generalization capacity. We propose a novel conditional video generation network where the audio input is treated as a condition for the recurrent adversarial network such that temporal dependency is incorporated to realize smooth transition for the lip and facial movement. In addition, we deploy a multi-task adversarial training scheme in the context of video generation to improve both photo-realism and the accuracy of lip synchronization. Finally, based on the phoneme distribution information extracted from the audio clip, we develop a sample selection method that effectively reduces the size of the training dataset without sacrificing the quality of the generated video. Extensive experiments on both controlled and uncontrolled datasets demonstrate the superiority of the proposed approach in terms of visual quality, lip sync accuracy, and smooth transition of lip and facial movement, as compared to the state-of-the-art. |
Tasks | Face Generation, Talking Face Generation, Video Generation |
Published | 2018-04-13 |
URL | https://arxiv.org/abs/1804.04786v3 |
https://arxiv.org/pdf/1804.04786v3.pdf | |
PWC | https://paperswithcode.com/paper/talking-face-generation-by-conditional |
Repo | |
Framework | |
Approximate k-space models and Deep Learning for fast photoacoustic reconstruction
Title | Approximate k-space models and Deep Learning for fast photoacoustic reconstruction |
Authors | Andreas Hauptmann, Ben Cox, Felix Lucka, Nam Huynh, Marta Betcke, Paul Beard, Simon Arridge |
Abstract | We present a framework for accelerated iterative reconstructions using a fast and approximate forward model that is based on k-space methods for photoacoustic tomography. The approximate model introduces aliasing artefacts in the gradient information for the iterative reconstruction, but these artefacts are highly structured and we can train a CNN that can use the approximate information to perform an iterative reconstruction. We show feasibility of the method for human in-vivo measurements in a limited-view geometry. The proposed method is able to produce superior results to total variation reconstructions with a speed-up of 32 times. |
Tasks | |
Published | 2018-07-09 |
URL | http://arxiv.org/abs/1807.03191v1 |
http://arxiv.org/pdf/1807.03191v1.pdf | |
PWC | https://paperswithcode.com/paper/approximate-k-space-models-and-deep-learning |
Repo | |
Framework | |
BIN-CT: Urban Waste Collection based in Predicting the Container Fill Level
Title | BIN-CT: Urban Waste Collection based in Predicting the Container Fill Level |
Authors | Javier Ferrer, Enrique Alba |
Abstract | The fast demographic growth, together with the concentration of the population in cities and the increasing amount of daily waste, are factors that push to the limit the ability of waste assimilation by Nature. Therefore, we need technological means to optimally manage the waste collection process, which represents 70% of the operational cost in waste treatment. In this article, we present a free intelligent software system, based on computational learning algorithms, which plans the best routes for waste collection supported by past (historical) and future (predictions) data. The objective of the system is the cost reduction of the waste collection service by means of the minimization of the distance traveled by any truck to collect a container, and hence of the fuel consumption. At the same time, the quality of service to the citizen is increased by avoiding the annoying overflows of containers thanks to the accurate fill level predictions performed by BIN-CT. In this article we show the features of our software system, illustrating its operation with a real case study of a Spanish city. We conclude that the use of BIN-CT avoids unnecessary visits to containers and reduces the distance traveled to collect a container, and therefore we obtain a reduction of total costs and of harmful emissions released into the atmosphere. |
Tasks | |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.01603v2 |
http://arxiv.org/pdf/1807.01603v2.pdf | |
PWC | https://paperswithcode.com/paper/bin-ct-urban-waste-collection-based-in |
Repo | |
Framework | |
Photometric Stereo in Participating Media Considering Shape-Dependent Forward Scatter
Title | Photometric Stereo in Participating Media Considering Shape-Dependent Forward Scatter |
Authors | Yuki Fujimura, Masaaki Iiyama, Atsushi Hashimoto, Michihiko Minoh |
Abstract | Images captured in participating media such as murky water, fog, or smoke are degraded by scattered light. Thus, the use of traditional three-dimensional (3D) reconstruction techniques in such environments is difficult. In this paper, we propose a photometric stereo method for participating media. The proposed method differs from previous studies with respect to modeling shape-dependent forward scatter. In the proposed model, forward scatter is described as an analytical form using lookup tables and is represented by spatially-variant kernels. We also propose an approximation of a large-scale dense matrix as a sparse matrix, which enables the removal of forward scatter. Experiments with real and synthesized data demonstrate that the proposed method improves 3D reconstruction in participating media. |
Tasks | 3D Reconstruction |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.02836v2 |
http://arxiv.org/pdf/1804.02836v2.pdf | |
PWC | https://paperswithcode.com/paper/photometric-stereo-in-participating-media |
Repo | |
Framework | |
Marrying up Regular Expressions with Neural Networks: A Case Study for Spoken Language Understanding
Title | Marrying up Regular Expressions with Neural Networks: A Case Study for Spoken Language Understanding |
Authors | Bingfeng Luo, Yansong Feng, Zheng Wang, Songfang Huang, Rui Yan, Dongyan Zhao |
Abstract | The success of many natural language processing (NLP) tasks is bound by the number and quality of annotated data, but there is often a shortage of such training data. In this paper, we ask the question: “Can we combine a neural network (NN) with regular expressions (RE) to improve supervised learning for NLP?” In answer, we develop novel methods to exploit the rich expressiveness of REs at different levels within a NN, showing that the combination significantly enhances the learning effectiveness when a small number of training examples are available. We evaluate our approach by applying it to spoken language understanding for intent detection and slot filling. Experimental results show that our approach is highly effective in exploiting the available training data, giving a clear boost to the RE-unaware NN. |
Tasks | Intent Detection, Slot Filling, Spoken Language Understanding |
Published | 2018-05-15 |
URL | http://arxiv.org/abs/1805.05588v1 |
http://arxiv.org/pdf/1805.05588v1.pdf | |
PWC | https://paperswithcode.com/paper/marrying-up-regular-expressions-with-neural |
Repo | |
Framework | |
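The abstract above describes exploiting REs "at different levels within a NN". As a rough illustration of the simplest, input-level idea only, the sketch below turns RE matches into 0/1 indicators that could be concatenated with learned features before an intent classifier. The patterns, intent names, and the helpers `re_features` and `combine` are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical intent patterns (assumed for illustration; not from the paper).
INTENT_PATTERNS = {
    "flight": re.compile(r"\bflights?\b.*\bfrom\b.*\bto\b", re.IGNORECASE),
    "airfare": re.compile(r"\b(fare|price|cost)s?\b", re.IGNORECASE),
    "ground_service": re.compile(r"\b(taxi|limousine|car rental)\b", re.IGNORECASE),
}


def re_features(utterance):
    """Return one 0/1 indicator per intent pattern, in sorted-name order."""
    return [1 if INTENT_PATTERNS[name].search(utterance) else 0
            for name in sorted(INTENT_PATTERNS)]


def combine(nn_features, utterance):
    """Concatenate learned features with the RE-match indicators."""
    return list(nn_features) + re_features(utterance)
```

With the sorted order (`airfare`, `flight`, `ground_service`), an utterance such as "show me flights from Boston to Denver" sets only the `flight` indicator. How the paper actually injects REs at the network and output levels is not covered by this sketch.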
PADDIT: Probabilistic Augmentation of Data using Diffeomorphic Image Transformation
Title | PADDIT: Probabilistic Augmentation of Data using Diffeomorphic Image Transformation |
Authors | Mauricio Orbes Arteaga, Lauge Sørensen, M. Jorge Cardoso, Marc Modat, Sebastien Ourselin, Stefan Sommer, Mads Nielsen, Christian Igel, Akshay Pai |
Abstract | For proper generalization performance of convolutional neural networks (CNNs) in medical image segmentation, the learnt features should be invariant under particular non-linear shape variations of the input. To induce invariance in CNNs to such transformations, we propose Probabilistic Augmentation of Data using Diffeomorphic Image Transformation (PADDIT) – a systematic framework for generating realistic transformations that can be used to augment data for training CNNs. We show that CNNs trained with PADDIT outperform CNNs trained without augmentation and with generic augmentation in segmenting white matter hyperintensities from T1 and FLAIR brain MRI scans. |
Tasks | Medical Image Segmentation, Semantic Segmentation |
Published | 2018-10-03 |
URL | https://arxiv.org/abs/1810.01928v2 |
https://arxiv.org/pdf/1810.01928v2.pdf | |
PWC | https://paperswithcode.com/paper/paddit-probabilistic-augmentation-of-data |
Repo | |
Framework | |
DeeSIL: Deep-Shallow Incremental Learning
Title | DeeSIL: Deep-Shallow Incremental Learning |
Authors | Eden Belouadah, Adrian Popescu |
Abstract | Incremental Learning (IL) is an interesting AI problem when the algorithm is assumed to work on a budget. This is especially true when IL is modeled using a deep learning approach, where two complex challenges arise due to limited memory, which induces catastrophic forgetting and delays related to the retraining needed in order to incorporate new classes. Here we introduce DeeSIL, an adaptation of a known transfer learning scheme that combines a fixed deep representation used as feature extractor and learning independent shallow classifiers to increase recognition capacity. This scheme tackles the two aforementioned challenges since it works well with a limited memory budget and each new concept can be added within a minute. Moreover, since no deep retraining is needed when the model is incremented, DeeSIL can integrate larger amounts of initial data that provide more transferable features. Performance is evaluated on ImageNet LSVRC 2012 against three state of the art algorithms. Results show that, at scale, DeeSIL performance is 23 and 33 points higher than the best baseline when using the same and more initial data respectively. |
Tasks | Transfer Learning |
Published | 2018-08-20 |
URL | http://arxiv.org/abs/1808.06396v1 |
http://arxiv.org/pdf/1808.06396v1.pdf | |
PWC | https://paperswithcode.com/paper/deesil-deep-shallow-incremental-learning |
Repo | |
Framework | |
PRIL: Perceptron Ranking Using Interval Labeled Data
Title | PRIL: Perceptron Ranking Using Interval Labeled Data |
Authors | Naresh Manwani |
Abstract | In this paper, we propose an online learning algorithm, PRIL, for learning ranking classifiers using interval labeled data and show its correctness. We show its convergence in a finite number of steps if there exists an ideal classifier such that the rank given by it for an example always lies in its label interval. We then generalize this mistake bound result to the general case. We also provide a regret bound for the proposed algorithm. We propose a multiplicative update algorithm for PRIL called M-PRIL. We provide its correctness and convergence results. We show the effectiveness of PRIL by showing its performance on various datasets. |
Tasks | |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.03873v1 |
http://arxiv.org/pdf/1802.03873v1.pdf | |
PWC | https://paperswithcode.com/paper/pril-perceptron-ranking-using-interval |
Repo | |
Framework | |
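To make the idea of perceptron ranking with interval labels more concrete, here is a minimal sketch of one online step. It assumes a PRank-style model (a weight vector plus ordered thresholds) and adapts the threshold targets so that only thresholds outside the label interval are constrained. The exact PRIL update rule is defined in the paper; the functions `predict_rank` and `interval_update` below are illustrative assumptions, not the author's implementation.

```python
def predict_rank(w, b, x):
    """Smallest 1-based rank whose threshold exceeds the score; K if none does."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    for r, threshold in enumerate(b, start=1):
        if score - threshold < 0:
            return r
    return len(b) + 1


def interval_update(w, b, x, lo, hi):
    """One online step for an example labeled with the rank interval [lo, hi].

    Assumed PRank-style rule: thresholds below the interval should sit under
    the score, thresholds at or above `hi` should sit over it, and thresholds
    strictly inside the interval are left unconstrained.
    """
    score = sum(wi * xi for wi, xi in zip(w, x))
    taus = []
    for r, threshold in enumerate(b, start=1):
        if r < lo:
            target = 1
        elif r >= hi:
            target = -1
        else:
            target = 0
        violated = target != 0 and target * (score - threshold) <= 0
        taus.append(target if violated else 0)
    step = sum(taus)
    new_w = [wi + step * xi for wi, xi in zip(w, x)]
    new_b = [bi - t for bi, t in zip(b, taus)]
    return new_w, new_b
```

For example, with three ranks, zero-initialized weights and thresholds, and an example `x = [1.0]` labeled `[1, 2]`, the initial prediction is rank 3; a single call to `interval_update` moves the second threshold up and the weight down, after which the predicted rank falls inside the interval.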
Modern Convex Optimization to Medical Image Analysis
Title | Modern Convex Optimization to Medical Image Analysis |
Authors | Jing Yuan, Aaron Fenster |
Abstract | Recently, diagnosis, therapy and monitoring of human diseases have come to involve a variety of imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), ultrasound (US) and positron-emission tomography (PET), as well as a variety of modern optical techniques. Over the past two decades, it has been recognized that advanced image processing techniques provide valuable information to physicians for diagnosis, image-guided therapy and surgery, and monitoring of the treated organ's response to therapy. Many researchers and companies have invested significant effort in the development of advanced medical image analysis methods, especially in the two core areas of medical image segmentation and registration: segmentations of organs and lesions are used to quantify volumes and shapes for diagnosis and treatment monitoring; registration of multimodality images of organs improves detection, diagnosis and staging of diseases as well as image-guided surgery and therapy; and registration of images obtained from the same modality is used to monitor progression of therapy. These challenging, clinically motivated applications introduce novel and sophisticated mathematical problems which stimulate the development of advanced optimization and computing methods, especially convex optimization attaining an optimum in a global sense, and hence bring an enormous spread of research topics to recent computational medical image analysis. In particular, unlike usual image processing, most medical images involve a large volume of acquired data, often in 3D or 4D (3D + t), along with heavy noise or incomplete image information, and lead to challenging large-scale optimization problems; how to process such poor ‘big data’ of medical images efficiently and solve the corresponding optimization problems robustly are the key factors of modern medical image analysis. |
Tasks | Computed Tomography (CT), Medical Image Segmentation, Semantic Segmentation |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.08734v1 |
http://arxiv.org/pdf/1809.08734v1.pdf | |
PWC | https://paperswithcode.com/paper/modern-convex-optimization-to-medical-image |
Repo | |
Framework | |
Computation of Total Kidney Volume from CT images in Autosomal Dominant Polycystic Kidney Disease using Multi-Task 3D Convolutional Neural Networks
Title | Computation of Total Kidney Volume from CT images in Autosomal Dominant Polycystic Kidney Disease using Multi-Task 3D Convolutional Neural Networks |
Authors | Deepak Keshwani, Yoshiro Kitamura, Yuanzhong Li |
Abstract | Autosomal Dominant Polycystic Kidney Disease (ADPKD), characterized by progressive growth of renal cysts, is the most prevalent and potentially lethal monogenic renal disease, affecting one in every 500-100 people. Total Kidney Volume (TKV) and its growth, computed from Computed Tomography images, have been accepted as an essential prognostic marker for renal function loss. Due to the large variation in the shape and size of the kidney in ADPKD, existing methods to compute TKV (i.e. to segment ADPK), including those based on 2D convolutional neural networks, are not accurate enough to be directly useful in clinical practice. In this work, we propose multi-task 3D Convolutional Neural Networks to segment ADPK and achieve a mean DICE score of 0.95 and a mean absolute percentage TKV error of 3.86. Additionally, to address the challenge of class imbalance, we propose to simply bootstrap the cross entropy loss and compare the results with the recently prevalent dice loss in the medical image segmentation community. |
Tasks | Medical Image Segmentation, Semantic Segmentation |
Published | 2018-09-07 |
URL | http://arxiv.org/abs/1809.02268v1 |
http://arxiv.org/pdf/1809.02268v1.pdf | |
PWC | https://paperswithcode.com/paper/computation-of-total-kidney-volume-from-ct |
Repo | |
Framework | |
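The two evaluation quantities named in this abstract, the Dice score and the absolute percentage error of the kidney volume, follow standard definitions. The sketch below shows how they are typically computed from binary masks. The flat-list mask representation, the voxel volume argument, and the function names are assumptions made for illustration; the paper's own evaluation code is not shown in this listing.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def volume_ml(mask, voxel_volume_mm3):
    """Volume covered by a binary mask, in millilitres (1 ml = 1000 mm^3)."""
    return sum(mask) * voxel_volume_mm3 / 1000.0


def abs_percentage_error(pred_volume, true_volume):
    """Absolute percentage error of a predicted volume against the reference."""
    return abs(pred_volume - true_volume) / true_volume * 100.0
```

A mean absolute percentage TKV error would then be the average of `abs_percentage_error` over all test scans, with each volume obtained from the predicted and reference segmentation masks.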
Segmentation of Microscopy Data for finding Nuclei in Divergent Images
Title | Segmentation of Microscopy Data for finding Nuclei in Divergent Images |
Authors | Shivam Singh, Stuti Pathak |
Abstract | Every year millions of people die of cancer. Due to its invasive nature, it is very difficult to cure even in its primary stages. Hence, the only way to survive this disease completely is through forecasting, by analyzing early mutations in the cells of the patient biopsy. Cell segmentation can be used to find cells which have left their nuclei. This enables faster cure and a higher rate of survival. Cell counting is a hard and tedious task that would greatly benefit from automation. To accomplish this task, segmentation of cells needs to be accurate. In this paper, we have improved how our network learns from the training data, so that it can annotate precise masks on test data. We examine the strength of activation functions in the medical image segmentation task by improving learning rates with our proposed Carving Technique. Identifying the cells' nuclei is the starting point for most analyses: identifying nuclei allows researchers to identify each individual cell in a sample, and by measuring how cells react to various treatments, the researcher can understand the underlying biological processes at work. Experimental results show the efficiency of the proposed work. |
Tasks | Cell Segmentation, Medical Image Segmentation, Semantic Segmentation |
Published | 2018-08-19 |
URL | http://arxiv.org/abs/1808.06914v2 |
http://arxiv.org/pdf/1808.06914v2.pdf | |
PWC | https://paperswithcode.com/paper/segmentation-of-microscopy-data-for-finding |
Repo | |
Framework | |
Multi-Agent Actor-Critic with Generative Cooperative Policy Network
Title | Multi-Agent Actor-Critic with Generative Cooperative Policy Network |
Authors | Heechang Ryu, Hayong Shin, Jinkyoo Park |
Abstract | We propose an efficient multi-agent reinforcement learning approach to derive equilibrium strategies for multi-agents who are participating in a Markov game. Mainly, we are focused on obtaining decentralized policies for agents to maximize the performance of a collaborative task by all the agents, which is similar to solving a decentralized Markov decision process. We propose to use two different policy networks: (1) decentralized greedy policy network used to generate greedy action during training and execution period and (2) generative cooperative policy network (GCPN) used to generate action samples to make other agents improve their objectives during training period. We show that the samples generated by GCPN enable other agents to explore the policy space more effectively and favorably to reach a better policy in terms of achieving the collaborative tasks. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09206v1 |
http://arxiv.org/pdf/1810.09206v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-agent-actor-critic-with-generative |
Repo | |
Framework | |
An Analytic Solution to the Inverse Ising Problem in the Tree-reweighted Approximation
Title | An Analytic Solution to the Inverse Ising Problem in the Tree-reweighted Approximation |
Authors | Takashi Sano |
Abstract | Many iterative and non-iterative methods have been developed for inverse problems associated with Ising models. Aiming to derive an accurate non-iterative method for the inverse problems, we employ the tree-reweighted approximation. Using the tree-reweighted approximation, we can optimize the rigorous lower bound of the objective function. By solving the moment-matching and self-consistency conditions analytically, we can derive the interaction matrix as a function of the given data statistics. With this solution, we can obtain the optimal interaction matrix without iterative computation. To evaluate the accuracy of the proposed inverse formula, we compared our results to those obtained by existing inverse formulae derived with other approximations. In an experiment to reconstruct the interaction matrix, we found that the proposed formula returns the best estimates in strongly-attractive regions for various graph structures. We also performed an experiment using real-world biological data. When applied to finding the connectivity of neurons from spike train data, the proposed formula gave the closest result to that obtained by a gradient ascent algorithm, which typically requires thousands of iterations. |
Tasks | |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11452v1 |
http://arxiv.org/pdf/1805.11452v1.pdf | |
PWC | https://paperswithcode.com/paper/an-analytic-solution-to-the-inverse-ising |
Repo | |
Framework | |
Interest point detectors stability evaluation on ApolloScape dataset
Title | Interest point detectors stability evaluation on ApolloScape dataset |
Authors | Jacek Komorowski, Konrad Czarnota, Tomasz Trzcinski, Lukasz Dabala, Simon Lynen |
Abstract | In recent years, a number of novel, deep-learning based interest point detectors, such as LIFT, DELF, Superpoint or LF-Net, have been proposed. However, there is a lack of a standard benchmark to evaluate the suitability of these novel keypoint detectors for real-life applications such as autonomous driving. Traditional benchmarks (e.g. Oxford VGG) are rather limited, as they consist of relatively few images of mostly planar scenes taken in favourable conditions. In this paper we verify whether the recent, deep-learning based interest point detectors have an advantage over the traditional, hand-crafted keypoint detectors. To this end, we evaluate the stability of a number of hand-crafted and recent, learning-based interest point detectors on the street-level view ApolloScape dataset. |
Tasks | Autonomous Driving |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.11039v1 |
http://arxiv.org/pdf/1809.11039v1.pdf | |
PWC | https://paperswithcode.com/paper/interest-point-detectors-stability-evaluation |
Repo | |
Framework | |
Not All Ops Are Created Equal!
Title | Not All Ops Are Created Equal! |
Authors | Liangzhen Lai, Naveen Suda, Vikas Chandra |
Abstract | Efficient and compact neural network models are essential for enabling deployment on mobile and embedded devices. In this work, we point out that typical design metrics for gauging the efficiency of neural network architectures – total number of operations and parameters – are not sufficient. These metrics may not accurately correlate with the actual deployment metrics such as energy and memory footprint. We show that throughput and energy vary by up to 5X across different neural network operation types on an off-the-shelf Arm Cortex-M7 microcontroller. Furthermore, we show that the memory required for activation data also needs to be considered, apart from the model parameters, for network architecture exploration studies. |
Tasks | |
Published | 2018-01-12 |
URL | http://arxiv.org/abs/1801.04326v2 |
http://arxiv.org/pdf/1801.04326v2.pdf | |
PWC | https://paperswithcode.com/paper/not-all-ops-are-created-equal |
Repo | |
Framework | |
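The point about activation memory can be illustrated with back-of-the-envelope counting. The sketch below assumes a stride-1, same-padded 2D convolution with no bias and counts parameters, multiply-accumulate operations (MACs), and activation elements for standard and depthwise variants. The function `conv2d_cost` and its simplifications are illustrative assumptions; it does not reproduce the throughput or energy measurements reported in the paper.

```python
def conv2d_cost(h, w, c_in, c_out, k, depthwise=False):
    """Count params, MACs, and activation elements for a stride-1, same-padded conv.

    Biases are ignored. For a depthwise layer, c_out is forced to equal c_in.
    """
    if depthwise:
        params = k * k * c_in            # one k x k filter per input channel
        c_out = c_in
    else:
        params = k * k * c_in * c_out    # each output channel sees every input channel
    macs = params * h * w                # every weight is applied at each output position
    activations = h * w * (c_in + c_out) # input feature map plus output feature map
    return {"params": params, "macs": macs, "activations": activations}
```

For a 32x32 feature map with 16 channels and a 3x3 depthwise kernel, this gives 144 parameters but 32,768 activation elements, so a parameter count alone would badly underestimate the memory the layer needs at run time.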