7/24/2019

Mapping roads through deep learning and weakly supervised training

Saikat Basu, Derrick Bonafilia, James Gill, Danil Kirsanov, and David Yang, Mapping roads through deep learning and weakly supervised training, Facebook, July 23, 2019.
We collected our training data as a set of 2,048-by-2,048-pixel tiles, with a resolution of approximately 24 inches per pixel. We discarded tiles where fewer than 25 roads had been mapped, because we found that such tiles often included only major roads (with no examples of the smaller roads that are more challenging to label correctly). For each remaining tile, we rasterized the road vectors and used the resulting mask as our training label. To work at the same resolution as the DeepGlobe data set, we randomly cropped each image to 1,024 by 1,024 pixels, producing roughly 1.8 million tiles covering more than 700,000 square miles of terrain, more than 1,000x the roughly 630 square miles covered by the DeepGlobe data set.

To create segmentation masks from these road vectors, we simply rasterized each road vector at a width of five pixels. Semantic segmentation labels are typically pixel-perfect, but the labels this heuristic creates are not: roads vary in width and contour in ways that the rasterized vectors cannot capture perfectly. Furthermore, roads in different regions around the globe are mapped from different satellite imagery sources, so they do not always align completely with the imagery we use for our training data.
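The rasterization heuristic can be sketched as follows. This is a minimal illustration, not the production pipeline: it assumes road vectors have already been converted to polylines in pixel coordinates, and the function name and five-pixel width handling are our own simplification (every pixel within 2.5 pixels of a segment is marked as road).

```python
import numpy as np

def rasterize_roads(polylines, size=1024, width=5):
    """Rasterize road polylines (lists of (x, y) pixel coordinates) into a
    binary road/not-road mask, drawing each road `width` pixels wide by
    marking every pixel within width/2 of a road segment."""
    ys, xs = np.mgrid[0:size, 0:size]
    pts = np.stack([xs, ys], axis=-1).astype(float)   # (size, size, 2)
    mask = np.zeros((size, size), dtype=bool)
    for line in polylines:
        for (x0, y0), (x1, y1) in zip(line, line[1:]):
            a = np.array([x0, y0], dtype=float)
            d = np.array([x1 - x0, y1 - y0], dtype=float)
            denom = (d @ d) or 1.0
            # Project each pixel onto the segment and clamp to its endpoints.
            t = np.clip(((pts - a) @ d) / denom, 0.0, 1.0)
            closest = a + t[..., None] * d
            dist = np.linalg.norm(pts - closest, axis=-1)
            mask |= dist <= width / 2
    return mask.astype(np.uint8)

# A single horizontal road across a small tile.
label = rasterize_roads([[(0, 32), (63, 32)]], size=64)
```

Because the mask is derived purely from the vector geometry, any misalignment between the vectors and the imagery, or any variation in true road width, is baked directly into the label noise described above.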
Using only the noisy labels that our data collection process generated, we were able to produce results competitive with many entrants in the DeepGlobe challenge. After fine-tuning the training data in the DeepGlobe challenge data set, our model achieved state-of-the-art results. 
What is more noteworthy than these fine-tuned results is that the model performs well on a global scale, even when trained only on OSM data. Most data sets available for training road segmentation models are heavily biased toward particular regions or levels of development. For example, the DeepGlobe roads data set contains data only from India, Indonesia, and Thailand, and the SpaceNet Road Extraction Challenge data set focuses only on major cities. The data set we created spans six continents and all levels of development, providing far more data to train on than other available alternatives. To evaluate how larger, more diverse data sets affect the generalizability of our model, we compared our OSM-trained model against the DeepGlobe model (trained on DeepGlobe data). We evaluated both models on several other data sets (Las Vegas, Paris, Shanghai, etc. — see our paper for details) that lie outside the geographic distribution of the DeepGlobe data set. Across these test sets, the mean Intersection over Union (IoU) score of the DeepGlobe model is 0.218, and the mean IoU score of the OSM-trained model is 0.355: a 13.7-point absolute improvement and a 62 percent relative improvement.
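The IoU metric behind these numbers, and the two improvement figures, can be computed as follows. This is a minimal sketch rather than the paper's evaluation code, and it assumes predictions and ground truth are binary masks of the same shape.

```python
import numpy as np

def iou(pred, target):
    """Intersection over Union between two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0

# The reported scores: absolute gain is the raw difference in mean IoU,
# relative gain is that difference as a fraction of the baseline.
deepglobe_miou, osm_miou = 0.218, 0.355
absolute = osm_miou - deepglobe_miou          # 0.137, i.e. 13.7 points
relative = absolute / deepglobe_miou          # ~0.628, i.e. ~62 percent
```

Mean IoU over a test set is simply the average of this per-tile score, which is why a handful of poorly generalizing regions can pull the DeepGlobe model's mean down sharply.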
