<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>GeoSpatial ML</title>
<link>https://geospatialml.com/</link>
<atom:link href="https://geospatialml.com/index.xml" rel="self" type="application/rss+xml"/>
<description>A blog about geospatial machine learning — remote sensing, earth observation, foundation models, and applied ML for understanding our planet.</description>
<generator>quarto-1.9.36</generator>
<lastBuildDate>Tue, 07 Apr 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>Compressing Earth Embeddings, pt. 2 – TerraBit</title>
  <dc:creator>Isaac Corley</dc:creator>
  <dc:creator>Caleb Robinson</dc:creator>
  <link>https://geospatialml.com/posts/terrabit/</link>
  <description><![CDATA[ 





<section id="unfinished-business" class="level2">
<h2 class="anchored" data-anchor-id="unfinished-business">Unfinished business</h2>
<p><a href="../../posts/compressing-earth-embeddings/">Last time</a>, we compressed earth embeddings 64× with less than 2% loss on patch classification. We found int8 was statistically indistinguishable from float32 and that PCA(64)+int8 was the sweet spot. Binary quantization — reducing each dimension to its sign bit — achieved 16.5× end-to-end compression on disk (32× on the raw embedding payload alone), but we hadn’t yet measured retrieval quality at scale.</p>
<p>We were clear about what we didn’t test. From our limitations section:</p>
<blockquote class="blockquote">
<p>We have not tested: semantic segmentation, pixel regression, object detection, <strong>change detection</strong>, or <strong>retrieval</strong> — ranking quality over large databases may be more sensitive to distance distortion than top-1 classification.</p>
</blockquote>
<p>In other words, patch classification on <a href="https://github.com/phelber/EuroSAT">EuroSAT</a> is a controlled benchmark, not a real workflow. <strong>Can you actually do useful things with aggressively compressed embeddings?</strong> This time we work with <a href="https://clay-foundation.github.io/model/release-notes/specification.html">Clay v1.5</a> — a foundation model trained on multi-sensor satellite imagery — at global scale. <a href="https://lgnd.ai/">LGND</a> made the <a href="https://source.coop/clay/lgnd-clay-v1-5-sentinel-2-l2a">full global corpus available</a> in float32 on Source Cooperative, which gave us the raw material to test compression at scale.</p>
</section>
<section id="terrabit" class="level2">
<h2 class="anchored" data-anchor-id="terrabit">TerraBit</h2>
<video autoplay="" muted="" loop="" playsinline="" style="width:100%; border-radius:6px;">
<source src="terrabit-demo.mp4" type="video/mp4">
</video>
<p>To test this, we built <a href="https://isaac.earth/terrabit/">TerraBit</a> — a global retrieval demo that runs entirely in the browser with no backend or server-side computation. We binary-quantize the full Clay v1.5 corpus into packed bit vectors, store them as spatially-partitioned cloud-native <a href="https://parquet.apache.org/">Parquet</a> on public object storage, and let the browser handle shard discovery, data fetching, and in-memory Hamming scoring. The entire “backend” is a static S3 bucket; all compute happens on your machine.</p>
<p><strong>How it works:</strong></p>
<ol type="1">
<li>You draw one or more regions of interest (ROIs) anywhere — each is loaded independently; regions can be rectangles or freehand polygons</li>
<li>You click to create exemplar patches on the map (one or many); positive exemplars outside the AOI have their embeddings fetched on the fly; negatives work anywhere on the globe for contrastive scoring (<code>pos_dist − neg_dist</code>); you can also invert search (bitwise NOT) to find the opposite of a reference!</li>
<li>DuckDB-WASM queries a manifest for intersecting geohash shards; only those shards are fetched via HTTP range requests — no full-corpus scan</li>
<li>A Web Worker scores all candidates with brute-force Hamming distance and returns ranked results</li>
<li>The results render via MapLibre GL across several view modes (<em>top-k, heatmap, threshold, outlier, surprise, gradient</em>)</li>
</ol>
<p>Multiple exemplars can be combined via mean distance, or by applying bitwise <code>AND</code> / <code>OR</code> / <code>XOR</code> directly on the packed binary vectors before scoring — exact, lossless ops that compose semantically because binary embeddings have nice arithmetic properties.</p>
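<p>As a rough illustration of the scoring path (the demo runs this in a Web Worker in the browser, not in Python), here is a minimal NumPy sketch of combining packed exemplars with bitwise ops and ranking by contrastive Hamming distance. The corpus size, indices, and the choice of <code>AND</code> for combination are placeholders.</p>
<pre class="sourceCode python"><code class="sourceCode python">import numpy as np

def hamming(db, query):
    # XOR the packed bytes, then count the differing bits per row.
    xor = np.bitwise_xor(db, query)
    return np.unpackbits(xor, axis=1).sum(axis=1, dtype=np.int64)

rng = np.random.default_rng(0)
db = rng.integers(0, 256, size=(100_000, 128), dtype=np.uint8)  # 1024-bit vectors, packed
pos_a, pos_b = db[10], db[42]   # two positive exemplars (placeholder indices)
neg = db[7]                     # a negative exemplar

combined = np.bitwise_and(pos_a, pos_b)              # exact, lossless bitwise combination
scores = hamming(db, combined) - hamming(db, neg)    # contrastive: pos_dist - neg_dist
top_k = np.argsort(scores)[:10]                      # lowest score = most similar to positives

inverted = np.bitwise_not(pos_a)                     # bitwise NOT: search for the "opposite"</code></pre>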
<video autoplay="" muted="" loop="" playsinline="" style="width:100%; border-radius:6px;">
<source src="terrabit-across-the-world-demo.mp4" type="video/mp4">
</video>
<p>The 50M embeddings are partitioned into geohash-aligned Parquet shards and published on <a href="https://source.coop/geospatialml/terrabit">Source Cooperative</a>, which serves them cloud-natively out of S3 — public HTTP with byte-range support, no egress fees, no intermediate server. A single manifest file records the path, row count, and spatial extent of every shard.</p>
<p>When you draw an ROI, <a href="https://duckdb.org/docs/api/wasm/overview.html">DuckDB-WASM</a> queries the manifest with a bounding-box predicate — manifest-based shard pruning: the manifest acts as a coarse spatial index so the browser never opens metadata on shards outside the ROI. Once the intersecting shard list is resolved, DuckDB streams those shard files over HTTP (via <a href="https://duckdb.org/docs/extensions/httpfs/overview.html"><code>httpfs</code></a> range requests) and applies a second filter at the row level — a bbox predicate for rectangles, or <code>ST_Intersects</code> for freehand polygons — to extract only patches within the drawn region. Ranking over the candidate slice is exact brute-force Hamming: binary embeddings arrive as packed <code>Uint8Array</code> columns (128 bytes per 1024-dim vector) and are scored in a Web Worker via XOR+<a href="https://nimrod.blog/posts/algorithms-behind-popcount/">popcount</a>, which maps directly to hardware-accelerated popcount instructions and completes in milliseconds for a typical AOI partition.</p>
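<p>To make the two-stage filter concrete, here is a hedged sketch of the same idea in desktop Python DuckDB (the demo itself uses DuckDB-WASM in the browser). The manifest URL and column names (<code>path</code>, <code>minx</code>/<code>miny</code>/<code>maxx</code>/<code>maxy</code>, <code>lon</code>/<code>lat</code>) are hypothetical, not the published schema.</p>
<pre class="sourceCode python"><code class="sourceCode python">import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

roi = (-101.0, 37.5, -100.0, 38.5)  # user-drawn bounding box: (minlon, minlat, maxlon, maxlat)

# Step 1: coarse pruning. Only shards whose extent intersects the ROI are considered.
shards = con.sql(f"""
    SELECT path
    FROM read_parquet('https://example.com/terrabit/manifest.parquet')
    WHERE maxx &gt;= {roi[0]} AND maxy &gt;= {roi[1]}
      AND {roi[2]} &gt;= minx AND {roi[3]} &gt;= miny
""").fetchall()

# Step 2: row-level filter. Stream only the intersecting shards over HTTP range requests.
urls = [row[0] for row in shards]
candidates = con.sql(f"""
    SELECT id, lon, lat, embedding
    FROM read_parquet({urls!r})
    WHERE lon BETWEEN {roi[0]} AND {roi[2]}
      AND lat BETWEEN {roi[1]} AND {roi[3]}
""").df()</code></pre>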
<p>The binary embeddings are lossy, though — we find they have ~65% <a href="https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Recall">recall@10</a> (the fraction of true float32 nearest neighbors recovered by the binary representation), which means roughly a third of true neighbors are missed (Figure&nbsp;2). That’s good enough for coarse exploration; it is not a claim about downstream curation or labeling productivity. How coarse is too coarse, though? 65% recall goes further than you’d expect — <a href="https://isaac.earth/terrabit/">try the demo</a> on your own region!</p>
<p>A few examples of what this enables: a) click a center-pivot irrigation field in Kansas and separate it from rectangular fields across the state, b) pick a greenhouse cluster in Rotterdam and highlight dense greenhouse and vineyard complexes across the region or c) select a solar installation in northwest India and find others at similar scale. None of these queries require labeled data, a trained classifier, or even a definition of what you’re looking for beyond a single click. This is useful for data exploration, bootstrapping training datasets for supervised models, and narrowing the search space before running expensive high-resolution models over targeted areas. The demo also supports exporting ranked candidates as GeoParquet!</p>
<video autoplay="" muted="" loop="" playsinline="" style="width:100%; border-radius:6px;">
<source src="terrabit-places.mp4" type="video/mp4">
</video>
</section>
<section id="binary-earth-embedding-retrieval-at-planet-scale" class="level2">
<h2 class="anchored" data-anchor-id="binary-earth-embedding-retrieval-at-planet-scale">Binary Earth Embedding Retrieval at Planet Scale</h2>
<p>Clay v1.5 produces 1024-dimensional embeddings from <a href="https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-2">Sentinel-2</a> imagery. The global corpus spans two years of observations — roughly 50 million embeddings covering Earth’s land surface — and is <strong>183 GiB</strong> on disk in ZSTD-compressed Parquet (≈190 GiB as raw float32 – float32s don’t compress well even if they come from a GeoFM). Serving float32 vectors at this scale to a browser isn’t viable; the question we ask is <strong>how aggressively you can compress without destroying retrieval quality.</strong></p>
<p>Binary quantization reduces each dimension to a single sign bit. 1024 floats (4,096 bytes) become 128 bytes — a 32× reduction on the raw payload. End-to-end on disk (Parquet with ZSTD, geometry and STAC metadata columns, row-group overhead), the full 49.8M-row corpus drops from 182.9 GiB to <strong>11.1 GiB</strong> — <strong>16.5× compression</strong>. The on-disk number is what you pay for on object storage (32× is the raw payload reduction). The web demo corpus is smaller still (~7 GiB) because several columns were dropped and the compression level was increased — a demo-specific optimization on top of the 16.5× quantization win.</p>
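<p>The payload arithmetic in one short, illustrative NumPy sketch:</p>
<pre class="sourceCode python"><code class="sourceCode python">import numpy as np

emb = np.random.randn(1024).astype(np.float32)  # one 1024-d float32 embedding
bits = (emb &gt; 0).astype(np.uint8)               # keep only the sign of each dimension
packed = np.packbits(bits)                      # pack 8 bits per byte

print(emb.nbytes, packed.nbytes)                # 4096 vs 128 bytes: a 32x payload reduction</code></pre>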
<div id="fig-storage" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-storage-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://geospatialml.com/posts/terrabit/storage.png" class="img-fluid figure-img" style="width:100.0%">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-storage-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: On-disk storage of the full 49.8M Clay v1.5 corpus across quantization levels. fp32 → binary gives 16.5× end-to-end compression.
</figcaption>
</figure>
</div>
<div id="fig-knn-recall" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-knn-recall-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://geospatialml.com/posts/terrabit/knn_recall_k.png" class="img-fluid figure-img" style="width:100.0%">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-knn-recall-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: kNN recall@k vs.&nbsp;on-disk compression ratio across quantization methods. int8 is near-lossless; binary hits 65% recall at 16.5×.
</figcaption>
</figure>
</div>
<p>Why does aggressive quantization work at all on 1024-dimensional vectors? One diagnostic is the <a href="https://en.wikipedia.org/wiki/Intrinsic_dimension">intrinsic dimension</a> (ID) — the degrees of freedom the data actually uses, regardless of ambient dimensionality [<a href="https://www.nature.com/articles/s41598-017-11873-y">Facco et al., 2017</a>; <a href="https://proceedings.neurips.cc/paper/2004/hash/74934548253bcab8490ebd74afed7031-Abstract.html">Levina &amp; Bickel, 2004</a>]. This framing is directly motivated by <a href="https://arxiv.org/abs/2511.02101">Rao et al., 2025</a>, who find that geographic representations — despite operating in 256–512 dimensional spaces — compress to just 2–10 intrinsic dimensions, and that ID correlates with downstream task performance. <strong>For Clay v1.5 we estimate ID ≈ 13–17 (MLE: 17.0, TwoNN: 12.6, Local PCA: 17.0, on a 10k sample subset).</strong> Three estimators with different assumptions agree on a narrow range. Low ID is why aggressive compression is worth attempting — the data simply isn’t using most of its dimensions.</p>
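<p>For intuition, here is a from-scratch sketch of the TwoNN estimator in its maximum-likelihood form; the numbers above come from a fuller analysis, and the sample file name is a placeholder.</p>
<pre class="sourceCode python"><code class="sourceCode python">import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(x):
    # Distances to each point's two nearest neighbors (column 0 is the point itself).
    dists, _ = NearestNeighbors(n_neighbors=3).fit(x).kneighbors(x)
    r1, r2 = dists[:, 1], dists[:, 2]
    mask = r1 &gt; 0                        # guard against duplicate points
    mu = r2[mask] / r1[mask]
    # Under the TwoNN model, mu follows a Pareto(d) law; the MLE of d is:
    return len(mu) / np.log(mu).sum()

sample = np.load("clay_v15_sample_10k.npy")  # hypothetical 10k-row sample of embeddings
print(twonn_id(sample))                      # expect a value far below the ambient 1024</code></pre>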
</section>
<section id="turboquant-aka-rotate-before-you-quantize" class="level2">
<h2 class="anchored" data-anchor-id="turboquant-aka-rotate-before-you-quantize">TurboQuant aka rotate before you quantize</h2>
<p>Binary is the extreme end of the compression spectrum, and the retrieval demo uses it — but what if you need more recall than binary while keeping storage well below float32?</p>
<p>Standard affine quantization at low bit-widths (int2–int4) suffers from high variance disparity across embedding dimensions: some dimensions carry far more signal than others, and a uniform quantization grid wastes bits on low-variance dimensions while clipping high-variance ones. <a href="https://arxiv.org/abs/2504.19874">TurboQuant</a> fixes this by applying a fixed random orthogonal rotation <img src="https://latex.codecogs.com/png.latex?R%20%5Cin%20%5Cmathbb%7BR%7D%5E%7Bd%20%5Ctimes%20d%7D"> (sampled once from a Haar-distributed ensemble via QR decomposition) before symmetric affine quantization: <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bx%7D%20=%20R%5E%5Ctop%20Q_b(Rx)">. The rotation spreads variance across dimensions so no channel dominates the bit budget. <img src="https://latex.codecogs.com/png.latex?R"> is generated once, stored with the quantized embeddings, and reused for all queries — one matrix multiply at encode/decode, no retraining.</p>
<p>Earth embeddings have the same property: ID ≈ 13–17 in a 1024-d space leaves a lot of variance to redistribute.</p>
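<p>A simplified sketch of the rotate-then-quantize idea (not the full TurboQuant implementation) might look like this; the bit-width and per-vector scaling choice are illustrative.</p>
<pre class="sourceCode python"><code class="sourceCode python">import numpy as np

d, bits = 1024, 4
rng = np.random.default_rng(0)

# Sample the rotation once via QR of a Gaussian matrix (we keep the orthogonal factor),
# store it alongside the quantized embeddings, and reuse it for every vector.
R, _ = np.linalg.qr(rng.standard_normal((d, d)))

def encode(x):
    z = R @ x
    scale = np.abs(z).max() / (2 ** (bits - 1) - 1)   # symmetric affine grid
    return np.round(z / scale).astype(np.int8), scale

def decode(q, scale):
    return R.T @ (q.astype(np.float32) * scale)

x = rng.standard_normal(d).astype(np.float32)
q, s = encode(x)
x_hat = decode(q, s)   # approximately recovers x; the error shrinks as bits grow</code></pre>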
<p>We ran TurboQuant across bit-widths on the Clay v1.5 embeddings. The gains are largest at low precision and vanish at high precision:</p>
<div id="fig-turbo-vs-int" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-turbo-vs-int-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://geospatialml.com/posts/terrabit/turbo_vs_int.png" class="img-fluid figure-img" style="width:100.0%">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-turbo-vs-int-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: TurboQuant vs.&nbsp;standard scalar quantization across bit-widths. The rotation provides the largest recall improvement at 2–3 bits, where inter-channel variance disparity hurts most. By int8, affine quantization is already near-lossless and the rotation adds nothing.
</figcaption>
</figure>
</div>
<div id="fig-pareto" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-pareto-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://geospatialml.com/posts/terrabit/pareto.png" class="img-fluid figure-img" style="width:100.0%">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-pareto-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: Quality–compression Pareto front across standard quantization methods, under both cosine and Euclidean ground truth (the two overlap almost exactly). Standard int4 is the sweet spot at ~6× on-disk compression and 91% recall@10; int2 is dominated by binary, which reaches higher recall at even greater (16.5×) compression.
</figcaption>
</figure>
</div>
<p>Practical takeaway: if binary recall is too coarse but you still want aggressive compression, TurboQuant at int2–int4 is worth trying first. <strong>TurboQuant int4 reaches 95% recall at ~6× on-disk compression (8× on the raw payload).</strong> By int8, the affine grid is fine enough on its own and the rotation adds nothing.</p>
<section id="search-throughput" class="level3">
<h3 class="anchored" data-anchor-id="search-throughput">Search throughput</h3>
<p>We also benchmarked brute-force kNN on a 1M-vector subset (1K queries, k=10) using <a href="https://github.com/facebookresearch/faiss">FAISS</a> on CPU and PyTorch’s <a href="https://docs.pytorch.org/docs/stable/generated/torch.cdist.html">torch.cdist</a> on an RTX 3090. While other bit-widths benefit from GPU acceleration, binary search is unreasonably fast on CPU thanks to SIMD acceleration.</p>
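<p>For reference, a minimal sketch of the CPU binary-search path with FAISS; the random data and shapes are illustrative, not the exact benchmark configuration.</p>
<pre class="sourceCode python"><code class="sourceCode python">import faiss
import numpy as np

d_bits = 1024                                    # dimensions = bits per vector
db = np.random.randint(0, 256, (1_000_000, d_bits // 8), dtype=np.uint8)
queries = db[:1000]

index = faiss.IndexBinaryFlat(d_bits)            # brute-force Hamming search
index.add(db)
distances, ids = index.search(queries, 10)       # Hamming distances and neighbor ids</code></pre>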
<div id="fig-search-benchmark" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-search-benchmark-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://geospatialml.com/posts/terrabit/search_benchmark.png" class="img-fluid figure-img" style="width:90.0%">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-search-benchmark-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;5: Search throughput benchmark: brute-force kNN on 1M vectors (1K queries, k=10). Binary Hamming search dominates on CPU (800 QPS) thanks to hardware-accelerated popcount — roughly 16× faster than the dequantize-then-search path on CPU. Dequantized methods benefit from GPU acceleration (&gt;1,100 QPS on an RTX 3090, a 23–29× speedup over CPU). Recall tracks quantization fidelity independently of hardware.
</figcaption>
</figure>
</div>
</section>
</section>
<section id="why-not-just-build-a-backend" class="level2">
<h2 class="anchored" data-anchor-id="why-not-just-build-a-backend">Why not just build a backend?</h2>
<p>Reasonable reaction: “Cool demo, but real systems need a database and an API.” Maybe — but in geospatial ML, the gap between a working prototype and a deployed tool is almost all infrastructure: vector databases, REST APIs, auth, scaling, monitoring. Each layer is individually reasonable, but together they form a barrier large enough to keep someone from shipping and maintaining a useful tool.</p>
<p>Furthermore, existing vector DBs primarily partition by embedding similarity; a small AOI query still touches shards scattered across the index, with geospatial filtering applied <em>after</em> the expensive approximate nearest neighbor (ANN) step. Getting geo-first partitioning right takes careful co-design, and no existing system targets zero-ops, browser-native serving of a static corpus. Our approach sidesteps that: embeddings partitioned spatially by geohash, a manifest for shard pruning, and a throwaway Hamming scan.</p>
<p>To be clear, backends still have their place. Full-corpus ANN, multi-user serving, auth, and strict SLAs are backend territory. But <strong>for exploration and dataset curation, the barrier to useful interaction with embeddings should be as close to zero as possible, and for a lot of real problems, client-side is enough.</strong></p>
<p><strong>Links:</strong> <a href="https://isaac.earth/terrabit/">TerraBit retrieval demo</a> · <a href="https://source.coop/geospatialml/terrabit">binarized embedding corpus</a> · <a href="../../posts/compressing-earth-embeddings/">pt.&nbsp;1: Compressing Earth Embeddings</a></p>
<div style="font-size: 0.85em; color: gray;">
<p><strong>Acknowledgments.</strong> Thanks to <a href="https://www.linkedin.com/in/jeff-albrecht-5a2b86148">Jeff Albrecht</a> for his review and feedback on this post.</p>
</div>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{corley2026,
  author = {Corley, Isaac and Robinson, Caleb},
  title = {Compressing {Earth} {Embeddings,} Pt. 2 -\/- {TerraBit}},
  date = {2026-04-07},
  url = {https://geospatialml.com/posts/terrabit/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-corley2026" class="csl-entry quarto-appendix-citeas">
Corley, Isaac, and Caleb Robinson. 2026. <span>“Compressing Earth
Embeddings, Pt. 2 -- TerraBit.”</span> April 7. <a href="https://geospatialml.com/posts/terrabit/">https://geospatialml.com/posts/terrabit/</a>.
</div></div></section></div> ]]></description>
  <category>embeddings</category>
  <category>quantization</category>
  <category>compression</category>
  <category>retrieval</category>
  <category>foundation-models</category>
  <category>sentinel-2</category>
  <category>clay</category>
  <category>browser</category>
  <category>demo</category>
  <guid>https://geospatialml.com/posts/terrabit/</guid>
  <pubDate>Tue, 07 Apr 2026 00:00:00 GMT</pubDate>
  <media:content url="https://geospatialml.com/posts/terrabit/thumbnail.png" medium="image" type="image/png"/>
</item>
<item>
  <title>Compressing Earth Embeddings</title>
  <dc:creator>Caleb Robinson</dc:creator>
  <dc:creator>Isaac Corley</dc:creator>
  <link>https://geospatialml.com/posts/compressing-earth-embeddings/</link>
  <description><![CDATA[ 





<blockquote class="blockquote">
<p><strong>Update (2026-03-26):</strong> OlmoEarth-nano results throughout have been recomputed with properly normalized inputs. The initial version we released used unnormalized inputs, which significantly underestimated OlmoEarth-nano’s performance. Thanks to Gabriel Tseng for flagging this issue!</p>
</blockquote>
<p>Foundation models like Tessera [1], OlmoEarth [2], and AlphaEarth [3] produce dense per-pixel embeddings from satellite imagery. With a kNN classifier or linear probe, you can do classification, change detection, or similarity search — no fine-tuning needed. The appeal here is that you can skip expensive image preprocessing and model inference, download some embeddings, then plug into your task. But the cost of actually storing these embedding products can get out of hand fast.</p>
<p><a href="https://isaac.earth/earth-embedding-products">Isaac’s recent survey</a> of earth embedding products [4] catalogued this growing ecosystem — AlphaEarth, Tessera, Clay, Major-TOM, MOSAIKS — and identified a common problem: <strong>at continental or global scale, embedding storage costs dwarf the compute savings that motivated precomputation in the first place.</strong> Distribution is fragmented across incompatible formats (COG, GeoParquet, raw NumPy), and there are no shared standards for tiling, CRS, or provenance. But the most fundamental issue is size.</p>
<section id="the-storage-problem" class="level2">
<h2 class="anchored" data-anchor-id="the-storage-problem">The storage problem</h2>
<p>Earth’s land surface covers about 150 million km<sup>2</sup>. At Sentinel-2’s 10m resolution, that’s <a href="https://www.wolframalpha.com/input?i=land+area+of+the+world+%2F+%2810+meters+*+10+meters%29"><strong>1.5 trillion pixels</strong></a>. Multiply by embedding dimension and bytes per element:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Dims</th>
<th>Bytes/embedding</th>
<th>1 year (global)</th>
<th>S3 cost/year</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>DINOv3 ViT-L (float32)</td>
<td>1,024</td>
<td>4,096</td>
<td><strong>6.1 PB</strong></td>
<td><strong>$1.7M</strong></td>
</tr>
<tr class="even">
<td>DINOv3 ViT-L (int8)</td>
<td>1,024</td>
<td>1,024</td>
<td>1.5 PB</td>
<td>$424K</td>
</tr>
<tr class="odd">
<td>Tessera encoder (float32)</td>
<td>512</td>
<td>2,048</td>
<td>3.1 PB</td>
<td>$847K</td>
</tr>
<tr class="even">
<td>Tessera product (int8)</td>
<td>128</td>
<td>128</td>
<td>192 TB</td>
<td>$53K</td>
</tr>
<tr class="odd">
<td>OlmoEarth-nano (float32)</td>
<td>128</td>
<td>512</td>
<td>768 TB</td>
<td>$212K</td>
</tr>
<tr class="even">
<td>AEF (int8)</td>
<td>64</td>
<td>64</td>
<td>96 TB</td>
<td>$26K</td>
</tr>
</tbody>
</table>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://geospatialml.com/posts/compressing-earth-embeddings/storage_costs.png" class="img-fluid figure-img" style="width:75.0%"></p>
<figcaption>Global storage for 1024-dimensional embeddings at 10m resolution under different compression schemes. The dashed line marks the annual Sentinel-2 archive volume (~3-4 PB/year). Float32 baseline exceeds the Sentinel-2 annual output; PCA(64)+int8 brings it under 100 TB.</figcaption>
</figure>
</div>
<p>For context, the entire Sentinel-2 archive — every L1C and L2A product collected since 2015 — was roughly 22 PB in 2022 [5] and exceeded 50 PB by mid-2025 [6]. The archive grows by 3-4 PB per year. <strong>A single year of 1024d float32 embeddings (6.1 PB) would exceed the annual Sentinel-2 data volume that produced them.</strong> The embeddings are larger than the source imagery.</p>
<blockquote class="blockquote">
<p>The embeddings are larger than the source imagery.</p>
</blockquote>
<p>And these are per-year numbers. AlphaEarth covers 2017-2025 (9 years). <a href="https://anil.recoil.org/notes/geotessera-python">Tessera plans the same</a>. Multi-year archives at these scales reach tens of petabytes even for compact models. So how much can you compress before the embeddings stop being useful?</p>
</section>
<section id="eo-representations-are-redundant" class="level2">
<h2 class="anchored" data-anchor-id="eo-representations-are-redundant">EO representations are redundant</h2>
<p>Two recent papers provide evidence that earth observation representations carry substantial redundancy.</p>
<p><strong>Model-level redundancy.</strong> Hackel et al.&nbsp;[7] applied post-hoc “slimming” to remote sensing foundation models — uniformly reducing the width of transformer layers after training. At just 1% of the original FLOPs, these models retained over 71% of their full-scale accuracy (relative retention). An ImageNet-trained MAE dropped below 10% relative retention under the same treatment. Intermediate model sizes sometimes <em>outperformed</em> the full model, suggesting the extra capacity adds noise rather than signal. If the intermediate representations are this redundant, the output embeddings are too.</p>
<p><strong>Image-level redundancy.</strong> Papazafeiropoulos et al.&nbsp;[8] applied patch-level masking during training and inference of a ViT model, retaining only a fraction of image patches. On BigEarthNet, 15% patch retention achieved 99.4% of baseline accuracy. Even segmentation tolerated 50% patch removal while recovering ~97% of full performance.</p>
<p>These results suggest that standard embedding compression methods — including quantization and dimensionality reduction — may be effective for remotely sensed data as well. So we tested it!</p>
</section>
<section id="experimental-setup" class="level2">
<h2 class="anchored" data-anchor-id="experimental-setup">Experimental setup</h2>
<p>We evaluate combinations of quantization (float32, int8, int4, int2, binary, ternary, product quantization) and dimensionality reduction (PCA, truncated SVD, random projection, feature selection) across 5 embedding models and 6 classification datasets.</p>
<p>All experiments in this section use <a href="https://github.com/phelber/eurosat">EuroSAT</a> [9] — a 10-class Sentinel-2 land cover dataset with 21,600 images — as the primary benchmark. We use the precomputed embeddings for AEF, OlmoEarth, and Tessera from Isaac’s <a href="https://github.com/isaaccorley/geopool">geopool</a> repository, and we compute DINOv3 and ResNet50 embeddings separately. Then, we validate our findings on 5 additional datasets (RESISC45 [10] and 4 GeoBench [11] benchmarks) with the DINOv3 and ResNet50 based embeddings in the cross-dataset section below.</p>
<section id="models" class="level3">
<h3 class="anchored" data-anchor-id="models">Models</h3>
<table class="caption-top table">
<colgroup>
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
</colgroup>
<thead>
<tr class="header">
<th>Model</th>
<th>Architecture</th>
<th>Dims</th>
<th>Bytes/emb</th>
<th>Pretraining</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>AlphaEarth (AEF)</td>
<td>STP Encoder</td>
<td>64</td>
<td>256</td>
<td>3B+ multi-source EO obs.</td>
</tr>
<tr class="even">
<td>OlmoEarth-nano</td>
<td>Transformer</td>
<td>128</td>
<td>512</td>
<td>S1/S2/Landsat self-supervised</td>
</tr>
<tr class="odd">
<td>Tessera</td>
<td>Transformer</td>
<td>512</td>
<td>2,048</td>
<td>S1/S2 self-supervised</td>
</tr>
<tr class="even">
<td>DINOv3 ViT-L/16</td>
<td>Vision Transformer</td>
<td>1,024</td>
<td>4,096</td>
<td>SAT-493M (0.6m Maxar RGB)</td>
</tr>
<tr class="odd">
<td>ResNet50</td>
<td>CNN</td>
<td>2,048</td>
<td>8,192</td>
<td>ImageNet supervised</td>
</tr>
</tbody>
</table>
<p>DINOv3 ViT-L/16 uses Meta’s SAT-493M checkpoint [12] — a ViT-L distilled from the DINOv3 ViT-7B, trained on 493 million 0.6m Maxar RGB tiles. Tessera’s encoder outputs 512-dim embeddings, but the <a href="https://geotessera.readthedocs.io/">distributed product</a> compresses these to 128-dim int8 with per-pixel scale factors. Our experiments use the 512-dim encoder output via <a href="https://github.com/isaaccorley/geopool">geopool</a>.</p>
<p>We evaluate with <strong>kNN</strong> (k=5, cosine distance) and <strong>linear probes</strong> (logistic regression with tuned regularization). All quantization and reduction parameters are fit on training data only. DINOv3 results use mean-pooled patch tokens throughout unless noted.</p>
</section>
<section id="baselines" class="level3">
<h3 class="anchored" data-anchor-id="baselines">Baselines</h3>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Dims</th>
<th>B/emb</th>
<th>EuroSAT kNN</th>
<th>EuroSAT Linear</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>AEF</td>
<td>64</td>
<td>256</td>
<td>94.5%</td>
<td>95.4%</td>
</tr>
<tr class="even">
<td>OlmoEarth-nano</td>
<td>128</td>
<td>512</td>
<td><strong>94.8%</strong></td>
<td>96.5%</td>
</tr>
<tr class="odd">
<td>Tessera</td>
<td>512</td>
<td>2,048</td>
<td>87.6%</td>
<td>94.2%</td>
</tr>
<tr class="even">
<td>DINOv3</td>
<td>1,024</td>
<td>4,096</td>
<td>94.5%</td>
<td><strong>98.0%</strong></td>
</tr>
<tr class="odd">
<td>ResNet50</td>
<td>2,048</td>
<td>8,192</td>
<td>92.6%</td>
<td>95.8%</td>
</tr>
</tbody>
</table>
</section>
</section>
<section id="int8-is-always-free" class="level2">
<h2 class="anchored" data-anchor-id="int8-is-always-free">int8 is always free</h2>
<p>The simplest compression: reduce each float32 value to int8. For each dimension, compute the min and max across the training set, then linearly map the range into 256 integer levels (4x compression).</p>
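<p>A minimal sketch of this per-dimension affine quantizer (our experimental code may differ in detail; <code>train_embeddings</code> and <code>test_embeddings</code> are placeholders):</p>
<pre class="sourceCode python"><code class="sourceCode python">import numpy as np

def fit_int8(train):
    lo, hi = train.min(axis=0), train.max(axis=0)
    scale = (hi - lo) / 255.0                      # 256 integer levels per dimension
    return lo, np.where(scale == 0, 1.0, scale)    # guard constant dimensions

def quantize(x, lo, scale):
    return np.clip(np.round((x - lo) / scale), 0, 255).astype(np.uint8)

def dequantize(q, lo, scale):
    return q.astype(np.float32) * scale + lo

lo, scale = fit_int8(train_embeddings)             # fit on training data only
q_test = quantize(test_embeddings, lo, scale)      # 1 byte per dimension</code></pre>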
<table class="caption-top table">
<colgroup>
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
</colgroup>
<thead>
<tr class="header">
<th>Method</th>
<th>Bits</th>
<th>B/emb (1024d)</th>
<th>AEF</th>
<th>OlmoEarth-nano</th>
<th>Tessera</th>
<th>DINOv3</th>
<th>ResNet50</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>float32</td>
<td>32</td>
<td>4,096</td>
<td>94.5%</td>
<td>94.8%</td>
<td>87.6%</td>
<td>94.5%</td>
<td>92.6%</td>
</tr>
<tr class="even">
<td><strong>int8</strong></td>
<td><strong>8</strong></td>
<td><strong>1,024</strong></td>
<td><strong>94.6%</strong></td>
<td><strong>94.8%</strong></td>
<td><strong>87.8%</strong></td>
<td><strong>94.5%</strong></td>
<td><strong>92.5%</strong></td>
</tr>
<tr class="odd">
<td>int4</td>
<td>4</td>
<td>512</td>
<td>94.2%</td>
<td>94.4%</td>
<td>86.5%</td>
<td>94.4%</td>
<td>92.4%</td>
</tr>
<tr class="even">
<td>int2</td>
<td>2</td>
<td>256</td>
<td>91.7%</td>
<td>91.0%</td>
<td>84.7%</td>
<td>92.3%</td>
<td>—</td>
</tr>
<tr class="odd">
<td>binary</td>
<td>1</td>
<td>128</td>
<td>88.8%</td>
<td>90.8%</td>
<td>81.4%</td>
<td>91.8%</td>
<td>86.8%</td>
</tr>
</tbody>
</table>
<p><em>EuroSAT kNN accuracy. Bold row = statistically indistinguishable from float32 baseline.</em></p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://geospatialml.com/posts/compressing-earth-embeddings/quant_comparison.png" class="img-fluid figure-img" style="width:75.0%"></p>
<figcaption>EuroSAT kNN accuracy under different quantization levels for each model. int8 is visually indistinguishable from float32 across all five models.</figcaption>
</figure>
</div>
<p>We find <strong>int8 is never statistically distinguishable from float32.</strong> McNemar’s test gives p ≥ 0.12 for every model-dataset pair (smallest p = 0.12). The 95% bootstrap confidence interval for the accuracy difference is within ±0.2% everywhere. <strong>There is no reason to store float32 embeddings.</strong></p>
<blockquote class="blockquote">
<p>There is no reason to store float32 embeddings.</p>
</blockquote>
<p>We also find int4 loses less than 1% for AEF and DINOv3. Binary quantization (1 bit per dimension, 32x compression) is worth a closer look — DINOv3 at 128 bytes still hits 91.8%! More on this below.</p>
</section>
<section id="most-embedding-dimensions-are-redundant" class="level2">
<h2 class="anchored" data-anchor-id="most-embedding-dimensions-are-redundant">Most embedding dimensions are redundant</h2>
<p>PCA variance analysis reveals different spectral structures across models:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>4d</th>
<th>8d</th>
<th>16d</th>
<th>32d</th>
<th>64d</th>
<th>256d</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>AEF (64d)</td>
<td>57%</td>
<td>80%</td>
<td>91%</td>
<td>97%</td>
<td>100%</td>
<td>—</td>
</tr>
<tr class="even">
<td>OlmoEarth-nano (128d)</td>
<td>77%</td>
<td>88%</td>
<td>95%</td>
<td>98%</td>
<td>100%</td>
<td>—</td>
</tr>
<tr class="odd">
<td>Tessera (512d)</td>
<td>94%</td>
<td>98%</td>
<td>99%</td>
<td>100%</td>
<td>100%</td>
<td>100%</td>
</tr>
<tr class="even">
<td>DINOv3 mean (1024d)</td>
<td>52%</td>
<td>64%</td>
<td>74%</td>
<td>82%</td>
<td>88%</td>
<td>97%</td>
</tr>
<tr class="odd">
<td>DINOv3 cls (1024d)</td>
<td>35%</td>
<td>46%</td>
<td>57%</td>
<td>67%</td>
<td>76%</td>
<td>91%</td>
</tr>
</tbody>
</table>
<p><em>Cumulative variance explained by top-k PCA components, fitted on EuroSAT training embeddings.</em></p>
<p>OlmoEarth-nano spreads its variance more broadly than Tessera, with 77% in 4 dimensions and needing 32 dimensions for 98%. DINOv3 distributes variance more evenly still, needing 256 dimensions for 97%.</p>
<p>DINOv3 spreads its variance across many dimensions, so you might expect it to compress poorly — if no dimension is dispensable, PCA can’t help. But DINOv3 at PCA(64)+int8 (6% of its original dimensions) still hits 93.1% kNN accuracy, only 1.4% below baseline. The dimensions PCA discards carry variance but apparently not much task-relevant information.</p>
</section>
<section id="combined-compression-the-pareto-frontier" class="level2">
<h2 class="anchored" data-anchor-id="combined-compression-the-pareto-frontier">Combined compression: the Pareto frontier</h2>
<p>The best configurations combine PCA with quantization — reduce dimensions first, then quantize:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Config</th>
<th>B/emb</th>
<th>EuroSAT kNN</th>
<th>EuroSAT Linear</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>AEF</td>
<td>int8</td>
<td>64</td>
<td>94.6%</td>
<td>95.5%</td>
</tr>
<tr class="even">
<td>AEF</td>
<td>int4</td>
<td>32</td>
<td>94.2%</td>
<td>95.0%</td>
</tr>
<tr class="odd">
<td>DINOv3</td>
<td>int8</td>
<td>1,024</td>
<td>94.5%</td>
<td>98.0%</td>
</tr>
<tr class="even">
<td>DINOv3</td>
<td>int4</td>
<td>512</td>
<td>94.4%</td>
<td>97.9%</td>
</tr>
<tr class="odd">
<td>DINOv3</td>
<td>PCA(128)+int8</td>
<td>128</td>
<td>93.6%</td>
<td>97.3%</td>
</tr>
<tr class="even">
<td>DINOv3</td>
<td>PCA(64)+int8</td>
<td>64</td>
<td>93.1%</td>
<td>96.4%</td>
</tr>
<tr class="odd">
<td>DINOv3</td>
<td>PCA(32)+int8</td>
<td>32</td>
<td>92.4%</td>
<td>94.1%</td>
</tr>
<tr class="even">
<td>DINOv3</td>
<td>PCA(16)+int4</td>
<td>8</td>
<td>89.3%</td>
<td>90.5%</td>
</tr>
</tbody>
</table>
<p><em>EuroSAT accuracy. kNN: k=5, cosine. Linear: logistic regression, C tuned.</em></p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://geospatialml.com/posts/compressing-earth-embeddings/pareto_knn.png" class="img-fluid figure-img" style="width:75.0%"></p>
<figcaption>Pareto frontiers on EuroSAT: storage cost vs.&nbsp;kNN accuracy (left) and linear probe accuracy (right). Each point is one compression configuration; lines trace the best accuracy at each storage budget.</figcaption>
</figure>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://geospatialml.com/posts/compressing-earth-embeddings/pareto_linear.png" class="img-fluid figure-img" style="width:75.0%"></p>
<figcaption>Pareto frontier: storage cost vs.&nbsp;linear probe accuracy on EuroSAT.</figcaption>
</figure>
</div>
<p>The two plots tell different stories. For kNN, AEF dominates at every storage budget — its 64 dimensions are compact enough that int8 at 64 bytes is nearly unbeatable, and larger models can’t overcome the dimensionality tax. For linear probes, DINOv3 pulls ahead once budgets exceed ~16 bytes, because a trained classifier can exploit the richer representation even after PCA compression.</p>
<p><strong>PCA(64)+int8 at 64 bytes/embedding is the sweet spot for DINOv3</strong>: 64x compression with only 1.4% kNN loss and 96.4% linear accuracy. That brings a year of global DINOv3 embeddings from 6.1 PB down to 96 TB — the same footprint as AlphaEarth’s native int8 representation. Which model to choose depends on your task: kNN retrieval favors AEF, classification with a trained head favors DINOv3.</p>
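<p>For completeness, a hedged sketch of the PCA(64)+int8 recipe with scikit-learn; <code>train_emb</code> and <code>test_emb</code> are placeholders, and the quantizer mirrors the per-dimension scheme from the int8 section above.</p>
<pre class="sourceCode python"><code class="sourceCode python">import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=64).fit(train_emb)          # fit the projection on train only
z_train, z_test = pca.transform(train_emb), pca.transform(test_emb)

lo, hi = z_train.min(axis=0), z_train.max(axis=0)
scale = (hi - lo) / 255.0
scale = np.where(scale == 0, 1.0, scale)
q_test = np.clip(np.round((z_test - lo) / scale), 0, 255).astype(np.uint8)

print(q_test.shape[1], "bytes per embedding")      # 64</code></pre>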
</section>
<section id="binary-quantization-on-dinov3" class="level2">
<h2 class="anchored" data-anchor-id="binary-quantization-on-dinov3">Binary quantization on DINOv3</h2>
<p>DINOv3 loses only 2.7% kNN accuracy under binary quantization (1 bit per dimension, 32x compression), while AEF and Tessera lose 5.7-6.2%. We hypothesize that this might be due to:</p>
<ol type="1">
<li><p><strong>High dimensionality.</strong> 1,024 binary dimensions give 2^1024 possible codes — enormous capacity for separating 10 classes.</p></li>
<li><p><strong>Balanced dimensions.</strong> DINOv3’s dimensions are nearly symmetric around their means (average imbalance = 0.018). Each threshold bit carries close to 1 bit of entropy. OlmoEarth-nano is also well-balanced (0.052), while AEF’s higher imbalance (0.082) means many bits are nearly constant.</p></li>
</ol>
<p>A related finding with binary quantization: <strong>Hamming distance on raw bits outperforms reconstructing float32 vectors and computing cosine distance.</strong> The reconstruction step replaces each bit with a centroid value (the mean of all above-threshold or below-threshold values for that dimension). kNN with Hamming distance (counting the differing bits between two vectors) outperforms cosine distance on these reconstructed vectors, and it seems to preserve the ranking of neighbor distances better:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
<col style="width: 20%">
</colgroup>
<thead>
<tr class="header">
<th>Model</th>
<th>Dims</th>
<th>Baseline</th>
<th>Reconstructed + cosine</th>
<th>Hamming on raw bits</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>DINOv3</td>
<td>1,024</td>
<td>94.5%</td>
<td>91.8%</td>
<td><strong>93.2%</strong></td>
</tr>
<tr class="even">
<td>Tessera</td>
<td>512</td>
<td>87.6%</td>
<td>81.4%</td>
<td><strong>87.0%</strong></td>
</tr>
<tr class="odd">
<td>OlmoEarth-nano</td>
<td>128</td>
<td>94.8%</td>
<td>90.8%</td>
<td><strong>92.7%</strong></td>
</tr>
<tr class="even">
<td>AEF</td>
<td>64</td>
<td>94.5%</td>
<td>88.8%</td>
<td><strong>89.1%</strong></td>
</tr>
</tbody>
</table>
<p>Hamming distance is also significantly faster to compute than cosine distance on reconstructed vectors — it reduces to a popcount on XOR’d bit vectors.</p>
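<p>A small sketch of kNN with Hamming distance on the packed bits (the right-most column above); the binarization threshold and variable names are illustrative.</p>
<pre class="sourceCode python"><code class="sourceCode python">import numpy as np

def pack(emb, thresholds):
    # Binarize each dimension against its threshold, then pack 8 bits per byte.
    return np.packbits((emb &gt; thresholds).astype(np.uint8), axis=1)

def knn_hamming(train_bits, train_y, test_bits, k=5):
    preds = []
    for q in test_bits:
        dists = np.unpackbits(np.bitwise_xor(train_bits, q), axis=1).sum(axis=1)
        nearest = train_y[np.argsort(dists)[:k]]
        preds.append(np.bincount(nearest).argmax())  # majority vote
    return np.array(preds)

thresholds = train_emb.mean(axis=0)  # per-dimension threshold (placeholder choice)
y_pred = knn_hamming(pack(train_emb, thresholds), train_y, pack(test_emb, thresholds))</code></pre>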
</section>
<section id="cross-dataset-consistency" class="level2">
<h2 class="anchored" data-anchor-id="cross-dataset-consistency">Cross-dataset consistency</h2>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://geospatialml.com/posts/compressing-earth-embeddings/pareto_all_datasets.png" class="img-fluid figure-img" style="width:75.0%"></p>
<figcaption>Pareto frontiers for kNN accuracy vs.&nbsp;storage cost across all 6 datasets. The relative ordering of models is consistent, though absolute accuracy varies with task difficulty.</figcaption>
</figure>
</div>
<p>The compression patterns hold across all 6 datasets:</p>
<ul>
<li><strong>int8 is effectively lossless on all datasets</strong>, including the 45-class RESISC45.</li>
<li><strong>PCA(64)+int8 at 64 bytes</strong> gives 93.1% on EuroSAT (10 classes) and 82.0% on RESISC45 (45 classes) — proportionally similar retention.</li>
<li><strong>m-forestnet</strong> (deforestation driver classification) is the hardest task at ~40% kNN for DINOv3 and ~36% for ResNet50 — likely because RGB-only embeddings lose the spectral bands needed for this task.</li>
</ul>
</section>
<section id="per-class-failure-modes" class="level2">
<h2 class="anchored" data-anchor-id="per-class-failure-modes">Per-class failure modes</h2>
<p>Under aggressive compression, <strong>specific classes are disproportionately affected</strong>:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Config (B/emb)</th>
<th>Worst class</th>
<th>F1 drop</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>AEF</td>
<td>int4 (32 B)</td>
<td>Highway</td>
<td>-0.012</td>
</tr>
<tr class="even">
<td>AEF</td>
<td>binary (8 B)</td>
<td>Highway</td>
<td>-0.170</td>
</tr>
<tr class="odd">
<td>AEF</td>
<td>PCA(8)+binary (1 B)</td>
<td>Highway</td>
<td>-0.486</td>
</tr>
<tr class="even">
<td>OlmoEarth-nano</td>
<td>PCA(8)+binary (1 B)</td>
<td>PermanentCrop</td>
<td>-0.570</td>
</tr>
<tr class="odd">
<td>Tessera</td>
<td>PCA(8)+binary (1 B)</td>
<td>PermanentCrop</td>
<td>-0.479</td>
</tr>
</tbody>
</table>
<p>Highway and PermanentCrop are consistently the most affected — narrow categories relying on fine-grained spectral or spatial features that aggressive quantization destroys. If your application needs balanced per-class performance (e.g., rare category detection), avoid extreme compression and verify per-class metrics.</p>
</section>
<section id="limitations" class="level2">
<h2 class="anchored" data-anchor-id="limitations">Limitations</h2>
<p>The main experiments here use EuroSAT — a 10-class patch classification dataset that most models find relatively easy (baselines already at 94-98%). We have some evidence from DINOv3 and ResNet50 results on RESISC45 (45 classes) and 4 GeoBench benchmarks that the core findings generalize across patch classification tasks — int8 is effectively lossless on all of them. But all 6 datasets are patch classification. We have not tested:</p>
<ul>
<li><strong>Semantic segmentation</strong> — pixel-level predictions may be more sensitive to per-dimension quantization error</li>
<li><strong>Pixel regression</strong> (e.g., canopy height, biomass estimation) — continuous targets could amplify small reconstruction errors that classification absorbs</li>
<li><strong>Object detection</strong> — localization accuracy may degrade differently than classification accuracy</li>
<li><strong>Change detection</strong> — differencing compressed embeddings across time steps could compound quantization noise</li>
<li><strong>Retrieval</strong> — ranking quality over large databases may be more sensitive to distance distortion than top-1 classification</li>
</ul>
<p>If you are saving embeddings for one of these tasks, we recommend validating compression effects on a representative sample before committing to a storage format.</p>
<p>We also test only OlmoEarth-nano (1.4M params, 128d) — the smallest model in the OlmoEarth family. The larger variants (Tiny at 192d, Base at 768d, Large at 1024d) may have different compression characteristics. And input normalization and patch size play a role in downstream performance that we haven’t disentangled from the compression effects here.</p>
</section>
<section id="takeaways" class="level2">
<h2 class="anchored" data-anchor-id="takeaways">Takeaways</h2>
<p>Some takeaways from these experiments (given the above caveat about patch classification):</p>
<ol type="1">
<li><p><strong>Always use int8.</strong> It is statistically indistinguishable from float32 across every model and dataset we tested (p &gt; 0.12). 4x compression, zero engineering effort, no reason not to.</p></li>
<li><p><strong>Check intrinsic dimensionality before storing.</strong> Many geospatial embeddings carry redundant dimensions. Tessera packs 94% of its variance into 4 dimensions; even DINOv3 can be PCA-reduced to 64d with only 1.4% kNN loss.</p></li>
<li><p><strong>PCA(64)+int8 is the sweet spot for DINOv3.</strong> 64 bytes/embedding, 64x compression, 1.4% kNN loss, 96.4% linear accuracy.</p></li>
<li><p><strong>For binary search indices, use Hamming distance directly on binary embeddings.</strong> Skip dequantization — it introduces correlated noise that hurts more than it helps.</p></li>
<li><p><strong>Don’t use ternary quantization.</strong> Binary is simpler, uses fewer bits, and performs better in every configuration we tested.</p></li>
<li><p><strong>Tune regularization (C) for linear probes.</strong> The default C=1.0 leaves performance on the table: Tessera gains 0.9% from C=10, DINOv3 gains 0.4% from C=0.1.</p></li>
<li><p><strong>Verify per-class metrics under compression.</strong> Highway and PermanentCrop degrade disproportionately — aggregate accuracy can mask category-level failures.</p></li>
</ol>
</section>
<section id="bibliography" class="level2">
<h2 class="anchored" data-anchor-id="bibliography">Bibliography</h2>
<p><a id="ref-tessera"></a> <strong>[1]</strong> Feng, Z., et al.&nbsp;“Tessera: Global-Scale Pixel Embeddings from Sentinel-2.” arXiv:2506.20380, 2025. <a href="https://arxiv.org/abs/2506.20380">[paper]</a> <a href="https://github.com/ucam-eo/tessera">[code]</a></p>
<p><a id="ref-olmoearth"></a> <strong>[2]</strong> Herzog, H., et al.&nbsp;“OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation.” arXiv:2511.13655, 2025. <a href="https://arxiv.org/abs/2511.13655">[paper]</a> <a href="https://github.com/allenai/olmoearth_pretrain">[code]</a></p>
<p><a id="ref-alphaearth"></a> <strong>[3]</strong> Brown, C.F., et al.&nbsp;“AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data.” arXiv:2507.22291, 2025. <a href="https://arxiv.org/abs/2507.22291">[paper]</a> <a href="https://developers.google.com/earth-engine/datasets/catalog/GOOGLE_SATELLITE_EMBEDDING_V1_ANNUAL">[GEE catalog]</a></p>
<p><a id="ref-embproducts"></a> <strong>[4]</strong> Fang, H., et al.&nbsp;“Earth Embeddings as Products.” arXiv:2601.13134, 2026. <a href="https://arxiv.org/abs/2601.13134">[paper]</a> <a href="https://isaac.earth/earth-embedding-products">[blog]</a></p>
<p><a id="ref-wastingpb"></a> <strong>[5]</strong> Bauer-Marschallinger, B. and Falkner, K. “Wasting Petabytes: A Survey of the Sentinel-2 UTM Tiling Grid and its Spatial Overhead.” <em>ISPRS Journal of Photogrammetry and Remote Sensing</em>, 2023. <a href="https://doi.org/10.1016/j.isprsjprs.2023.07.015">[paper]</a></p>
<p><a id="ref-lps25"></a> <strong>[6]</strong> ESA. “Copernicus Sentinels Mission and Data Management.” Living Planet Symposium, 2025. <a href="https://lps25.esa.int/lps25-presentations/presentations/2505/_2505.pdf">[slides]</a></p>
<p><a id="ref-hackel"></a> <strong>[7]</strong> Hackel, L., Burgert, T., and Demir, B. “How Much of a Model Do We Need? Redundancy and Slimmability in Remote Sensing Foundation Models.” arXiv:2601.22841, 2026. <a href="https://arxiv.org/abs/2601.22841">[paper]</a></p>
<p><a id="ref-hideseek"></a> <strong>[8]</strong> Papazafeiropoulos, T., et al.&nbsp;“Hide and Seek: Investigating Redundancy in Earth Observation Imagery.” arXiv:2603.13524, 2026. <a href="https://arxiv.org/abs/2603.13524">[paper]</a></p>
<p><a id="ref-eurosat"></a> <strong>[9]</strong> Helber, P., et al.&nbsp;“EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification.” <em>IEEE JSTARS</em>, 2019. <a href="https://doi.org/10.1109/JSTARS.2019.2918242">[paper]</a></p>
<p><a id="ref-resisc"></a> <strong>[10]</strong> Cheng, G., Han, J., and Lu, X. “Remote Sensing Image Scene Classification: Benchmark and State of the Art.” <em>Proceedings of the IEEE</em>, 2017. <a href="https://doi.org/10.1109/JPROC.2017.2675998">[paper]</a></p>
<p><a id="ref-geobench"></a> <strong>[11]</strong> Lacoste, A., et al.&nbsp;“GEO-Bench: Toward Foundation Models for Earth Monitoring.” <em>NeurIPS</em>, 2023. <a href="https://arxiv.org/abs/2306.03831">[paper]</a></p>
<p><a id="ref-dinov3"></a> <strong>[12]</strong> Simeoni, O., et al.&nbsp;“DINOv3” arXiv:2508.10104, 2025. <a href="https://arxiv.org/abs/2508.10104">[paper]</a> <a href="https://github.com/facebookresearch/dinov3">[code]</a></p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{robinson2026,
  author = {Robinson, Caleb and Corley, Isaac},
  title = {Compressing {Earth} {Embeddings}},
  date = {2026-03-24},
  url = {https://geospatialml.com/posts/compressing-earth-embeddings/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-robinson2026" class="csl-entry quarto-appendix-citeas">
Robinson, Caleb, and Isaac Corley. 2026. <span>“Compressing Earth
Embeddings.”</span> March 24. <a href="https://geospatialml.com/posts/compressing-earth-embeddings/">https://geospatialml.com/posts/compressing-earth-embeddings/</a>.
</div></div></section></div> ]]></description>
  <category>embeddings</category>
  <category>quantization</category>
  <category>compression</category>
  <category>foundation-models</category>
  <guid>https://geospatialml.com/posts/compressing-earth-embeddings/</guid>
  <pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate>
  <media:content url="https://geospatialml.com/posts/compressing-earth-embeddings/storage_costs.png" medium="image" type="image/png" height="95" width="144"/>
</item>
<item>
  <title>Seeing the Roads Through the Trees: Do Segmentation Models Actually Use Long-Range Context?</title>
  <dc:creator>Caleb Robinson</dc:creator>
  <dc:creator>Isaac Corley</dc:creator>
  <link>https://geospatialml.com/posts/long-range-dependencies/</link>
  <description><![CDATA[ 





<p>How well do segmentation models actually use long-range spatial information to make decisions? No existing benchmark directly measures this, especially in remote sensing where most datasets can be solved with relatively local texture and color cues. This matters beyond any single task — remote sensing is full of cases where local appearance is ambiguous and the correct label depends on spatial context, from mapping flooded areas under tree canopy during disaster response to identifying informal settlements where the signal is the neighborhood-level pattern rather than any individual structure. In <a href="https://arxiv.org/abs/2401.06762">Seeing the Roads Through the Trees</a> we designed a dataset and metric to measure spatial reasoning directly, and found that standard CNN encoder-decoder models are generally bad at it. In this post we revisit the problem with transformer-based architectures and gradient-based receptive field analysis to understand <em>why</em>.</p>
<section id="the-dataset" class="level2">
<h2 class="anchored" data-anchor-id="the-dataset">The dataset</h2>
<p><a href="https://huggingface.co/datasets/torchgeo/ChesapeakeRSC">Chesapeake Roads Spatial Context (RSC)</a> contains 30,000 512x512 NAIP patches from Maryland with 4-band imagery (RGB + near-infrared) and labels for three classes: background, road, and tree canopy over road. The class balance is extreme — 96.3% background, 3.0% road, 0.7% tree canopy over road.</p>
<div style="display: flex; gap: 0.75rem; justify-content: center; max-width: 80%; margin: 0 auto;">
  <img src="https://geospatialml.com/posts/long-range-dependencies/example1.png" alt="Example patch from the dataset" style="width: 48%; height: auto;">
  <img src="https://geospatialml.com/posts/long-range-dependencies/example2.png" alt="Second example patch" style="width: 48%; height: auto;">
</div>
<figcaption style="text-align: center; font-size: 0.9em; color: #666; margin-top: 0.5rem;">Example patches from the dataset. <span style="color: #4a90d9; font-weight: bold;">Blue</span> = visible road pixels, <span style="color: #d94a4a; font-weight: bold;">red</span> = tree canopy over road. The model must classify both as "road," but the red pixels have no local evidence of being road.</figcaption>
<p>The idea is simple: roads pass under tree canopy, and when they do, the local appearance at those pixels looks like trees, not road. A model can only classify those pixels correctly by looking at nearby visible road segments and inferring that the road continues underneath. The distance from each tree-canopy-over-road pixel to the nearest visible road pixel has a median of 4 pixels but a 95th percentile of 107 pixels, so some of these inferences require connecting evidence across a large spatial span.</p>
<figure class="figure">
<img src="https://geospatialml.com/posts/long-range-dependencies/dataset_map.png" alt="Map of Maryland showing the distribution of 30,000 train, validation, and test patches" style="width: 80%;" class="figure-img">
<figcaption style="text-align: center; font-size: 0.9em; color: #666;">
Distribution of 30,000 train, validation, and test patches across Maryland.
</figcaption>
</figure>
<p>Other remote sensing datasets with roads (ISPRS Vaihingen/Potsdam, LandCover.ai, DeepGlobe, SpaceNet, RoadTracer) are strong tests of segmentation quality, topology, or connectivity, but none explicitly separate easy road pixels from locally ambiguous ones. Chesapeake RSC partitions the road class by spatial difficulty, which makes it possible to ask not just “how well does this model segment roads?” but “how far away can the model look to make a correct decision?”</p>
</section>
<section id="distance-weighted-recall" class="level2">
<h2 class="anchored" data-anchor-id="distance-weighted-recall">Distance-weighted recall</h2>
<p>To quantify how well a model uses spatial context, we introduced <strong>distance-weighted recall (DWR)</strong>. For each tree-canopy-over-road pixel, we measure its distance to the nearest visible road pixel, then weight the pixel’s contribution to recall by that distance. A model that only gets the easy nearby pixels right will have a high unweighted recall but a low DWR; a model that correctly classifies tree canopy far from any visible road will score much higher.</p>
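<p>A short sketch of how DWR can be computed with a Euclidean distance transform; the mask and prediction arrays are placeholders.</p>
<pre class="sourceCode python"><code class="sourceCode python">import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_weighted_recall(visible_road, canopy_over_road, pred_road):
    # Distance from every pixel to the nearest visible road pixel.
    dist = distance_transform_edt(np.logical_not(visible_road))
    w = dist[canopy_over_road]             # weight = distance, for each canopy-over-road pixel
    correct = pred_road[canopy_over_road]  # did the model label the pixel as road?
    return (w * correct).sum() / w.sum()</code></pre>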
</section>
<section id="theoretical-vs.-effective-receptive-fields" class="level2">
<h2 class="anchored" data-anchor-id="theoretical-vs.-effective-receptive-fields">Theoretical vs.&nbsp;effective receptive fields</h2>
<p>Every segmentation architecture has a <strong>theoretical receptive field (TRF)</strong> — the maximum region of the input that could influence a given output pixel, determined purely by kernel sizes, strides, and network depth. <a href="https://distill.pub/2019/computing-receptive-fields">Araujo et al.&nbsp;(2019)</a> give a clear treatment of how to compute this for convolutional networks.</p>
<p>The <strong>effective receptive field (ERF)</strong> is what the model actually uses. <a href="https://papers.nips.cc/paper/6203-understanding-the-effective-receptive-field-in-deep-convolutional-neural-networks">Luo et al.&nbsp;(2016)</a> showed that in deep CNNs the ERF is typically much smaller than the TRF and has a Gaussian-like concentration around the center pixel. A model can have a 527-pixel theoretical receptive field and still behave as though it only looks at a small local neighborhood. For transformers, self-attention gives a global TRF by construction, but global access does not automatically mean global use.</p>
</section>
<section id="models-and-results" class="level2">
<h2 class="anchored" data-anchor-id="models-and-results">Models and results</h2>
<p>We trained a U-Net with a ResNet-18 backbone (14M params, TRF of 527 pixels) and two SegFormer variants: MiT-B0 (3.7M params) and MiT-B2 (25M params), both with global TRFs via self-attention. <strong>All models were trained on a binary task</strong> (road vs.&nbsp;background, with canopy-over-road grouped into road) using AdamW, cosine annealing, and cross-entropy loss for 150 epochs.</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Model</th>
<th>Params</th>
<th>Road R</th>
<th>Road P</th>
<th>TC/Road R</th>
<th>DWR</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>U-Net (ResNet-18)</td>
<td>14M</td>
<td>83.6</td>
<td>71.8</td>
<td>62.4</td>
<td><strong>44.0</strong></td>
</tr>
<tr class="even">
<td>U-Net (ResNet-18) + Cutout</td>
<td>14M</td>
<td>83.4</td>
<td>71.7</td>
<td>61.8</td>
<td>43.4</td>
</tr>
<tr class="odd">
<td>SegFormer (MiT-B0)</td>
<td>3.7M</td>
<td>83.1</td>
<td>71.7</td>
<td>58.9</td>
<td>37.9</td>
</tr>
<tr class="even">
<td>SegFormer (MiT-B2)</td>
<td>25M</td>
<td><strong>84.6</strong></td>
<td><strong>72.2</strong></td>
<td><strong>63.2</strong></td>
<td>42.3</td>
</tr>
</tbody>
</table>
<p><em>R = recall, P = precision, TC/Road R = recall on tree canopy over road subgroup. Background metrics omitted (all ~99.5%).</em></p>
<p>SegFormer MiT-B2 leads on overall metrics — best road recall (84.6%) and best tree canopy recall (63.2%). But the U-Net wins on DWR (44.0 vs 42.3), meaning it’s better at classifying tree canopy pixels that are far from visible road. The SegFormers’ ability to attend to distant tokens doesn’t translate into better performance on the spatially hardest pixels. This isn’t to say the U-Net is <em>good</em> at spatial reasoning (62.4% tree canopy recall is still a 21-point drop from visible road recall) — it’s that the ViT’s global attention doesn’t magically help here.</p>
<p>We also tested cutout augmentation — randomly masking 64x64 patches of the input during training so that the model must rely on surrounding context to label the hidden regions — in the hope of improving spatial reasoning. We tried several cutout sizes and the story was the same: it doesn’t help. The variant shown here achieves 61.8% tree canopy recall, comparable to the baseline’s 62.4%.</p>
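<p>For completeness, a minimal sketch of the cutout operation (the fill value and the single-square-per-image choice are illustrative; our experiments varied the size):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import torch

def cutout(images, size=64, fill=0.0):
    """Mask one random size x size square per image in a (B, C, H, W) batch."""
    b, _, h, w = images.shape
    out = images.clone()
    ys = torch.randint(0, h - size + 1, (b,))
    xs = torch.randint(0, w - size + 1, (b,))
    for i in range(b):
        out[i, :, ys[i]:ys[i] + size, xs[i]:xs[i] + size] = fill
    return out
</code></pre></div>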
</section>
<section id="performance-degrades-with-distance" class="level2">
<h2 class="anchored" data-anchor-id="performance-degrades-with-distance">Performance degrades with distance</h2>
<p>For each tree-canopy-over-road pixel in the test set, we measure the distance to the nearest visible road pixel, bin into log-spaced groups, and compute recall within each bin.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://geospatialml.com/posts/long-range-dependencies/recall_vs_distance.png" class="img-fluid figure-img" style="width:65.0%"></p>
<figcaption>Recall on tree canopy over road pixels as a function of distance from the nearest visible road pixel (log scale). Both models start at ~74-76% recall for adjacent pixels and decay monotonically.</figcaption>
</figure>
</div>
<p>Both models show monotonic performance degradation. At distance ~1 pixel, the U-Net achieves ~76% recall and the SegFormer ~73%. By ~100 pixels, both are in the 36-43% range. At 400+ pixels, recall falls to 20-28%. The U-Net outperforms the SegFormer MiT-B0 at every distance despite having a narrower effective receptive field.</p>
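<p>The binning behind this plot is straightforward; a sketch (the exact log-spaced bin edges used in the figure are an implementation detail):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np

def recall_by_distance(correct, dist, n_bins=10):
    """Bin tree-canopy-over-road pixels by distance to the nearest visible road
    (log-spaced bins) and return recall within each bin."""
    edges = np.logspace(0, np.log10(dist.max() + 1), n_bins + 1)
    bins = np.digitize(dist, edges) - 1
    return np.array([
        correct[bins == i].mean() if (bins == i).any() else np.nan
        for i in range(n_bins)
    ])
</code></pre></div>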
</section>
<section id="measuring-the-effective-receptive-field" class="level2">
<h2 class="anchored" data-anchor-id="measuring-the-effective-receptive-field">Measuring the effective receptive field</h2>
<p>The distance-stratified recall shows <em>that</em> models fail to use long-range context. Gradient-based ERF analysis shows <em>why</em>.</p>
<p>We computed gradient attributions by backpropagating from pre-softmax road logits to the input for 200 test images, then measured how gradient mass distributes as a function of radius from the output pixel. The effective diameter at a given percentile is the smallest circle centered on the output pixel that encloses that fraction of total gradient mass.</p>
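<p>A hedged sketch of the attribution step for a single output pixel (the model is assumed to return per-class logits of shape (1, num_classes, H, W); the radial accumulation is then a sum of this map over concentric rings):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import torch

def road_logit_gradient(model, image, pixel_yx, road_class=1):
    """Gradient of one pre-softmax road logit with respect to the input.
    image: (1, C, H, W) tensor; pixel_yx: (row, col) of the output pixel."""
    image = image.clone().requires_grad_(True)
    logits = model(image)                      # (1, num_classes, H, W) assumed
    y, x = pixel_yx
    logits[0, road_class, y, x].backward()
    # per-pixel attribution: gradient magnitude summed over input channels
    return image.grad.abs().sum(dim=1)[0]      # (H, W)
</code></pre></div>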
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://geospatialml.com/posts/long-range-dependencies/erf_cumulative_simple.png" class="img-fluid figure-img" style="width:65.0%"></p>
<figcaption>Cumulative gradient mass as a function of radius from the output pixel. The U-Net reaches 50% of its gradient mass within ~92 pixel radius; the SegFormer reaches 50% at ~146 pixels. Dashed lines mark 50th, 90th, and 99th percentile radii.</figcaption>
</figure>
</div>
<p>The U-Net concentrates half its gradient mass within a 184-pixel diameter circle despite having a 527-pixel theoretical receptive field. The SegFormer reaches 50% at 292 pixels — 1.6x wider, but 90% of its gradient mass still falls within a 542-pixel diameter.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://geospatialml.com/posts/long-range-dependencies/erf_effective_diameter_road.png" class="img-fluid figure-img" style="width:65.0%"></p>
<figcaption>Effective receptive field diameter (in pixels) for road-class predictions at three percentile cutoffs. The SegFormer is 1.4-1.6x wider than the U-Net depending on percentile.</figcaption>
</figure>
</div>
<p>The road class has the widest ERF across all models, which suggests models do allocate more spatial attention for road-related predictions. But tree canopy ERFs are approximately equal to background ERFs — when a model needs to look farther to identify a canopy-covered road pixel, it doesn’t.</p>
</section>
<section id="gradient-attribution-interactive-explorer" class="level2">
<h2 class="anchored" data-anchor-id="gradient-attribution-interactive-explorer">Gradient attribution: interactive explorer</h2>
<p>The aggregate ERF statistics above summarize behavior across many pixels. The visualizer below lets you explore gradient attributions for individual predictions — hover over any 8x8 block to see which input pixels the model relies on for that block’s prediction. Toggle the mask overlay to see ground truth labels.</p>
<div id="gv-root">
    <div class="gv-controls">
        <div class="gv-sample-picker">
            <span class="gv-label">Sample</span>
            <div class="gv-thumbs" id="gv-thumbs"></div>
        </div>
        <div class="gv-model-picker">
            <span class="gv-label">Model</span>
            <div class="gv-radios">
                <label class="gv-radio">
                    <input type="radio" name="gv-model" value="unet" checked="">
                    U-Net (ResNet-18)
                </label>
                <label class="gv-radio">
                    <input type="radio" name="gv-model" value="segformer">
                    SegFormer (MIT-B0)
                </label>
            </div>
        </div>
    </div>
    <div class="gv-viewer">
        <div class="gv-image-container" id="gv-hover-area">
            <img id="gv-base-image" class="gv-base-image" alt="Input image" width="512" height="512">
            <img id="gv-mask-overlay" class="gv-mask-overlay" alt="Mask overlay" width="512" height="512">
            <canvas id="gv-gradient-overlay" class="gv-gradient-overlay" width="512" height="512"></canvas>
            <div id="gv-block-highlight" class="gv-block-highlight"></div>
            <div id="gv-loading" class="gv-loading-indicator">Loading…</div>
        </div>
        <div class="gv-info-panel" id="gv-info-panel">
            <div class="gv-info-header">Block Info</div>
            <div id="gv-info-rows">
                <p class="gv-placeholder">Hover over the image to see gradient attribution</p>
            </div>
            <div class="gv-slider-group">
                <label>
                    Gradient opacity: <span id="gv-opacity-val">75%</span>
                </label>
                <input type="range" id="gv-opacity" min="0" max="100" value="75">
            </div>
            <div class="gv-toggle-group">
                <input type="checkbox" id="gv-show-mask">
                <label for="gv-show-mask">Show mask overlay</label>
            </div>
            <div class="gv-toggle-group gv-mask-source-hidden" id="gv-mask-source-group">
                <label class="gv-radio">
                    <input type="radio" name="gv-mask-src" value="pred" checked="">
                    Prediction
                </label>
                <label class="gv-radio">
                    <input type="radio" name="gv-mask-src" value="gt">
                    Ground Truth
                </label>
            </div>
            <div class="gv-legend">
                <div class="gv-legend-title">Legend</div>
                <div class="gv-legend-item">
                    <span class="gv-swatch gv-swatch-bg"></span> Background
                </div>
                <div class="gv-legend-item">
                    <span class="gv-swatch gv-swatch-road"></span> Road
                </div>
                <div class="gv-legend-item">
                    <span class="gv-swatch gv-swatch-canopy"></span> Tree Canopy Over Road
                </div>
                <div class="gv-legend-item">
                    <span class="gv-swatch gv-swatch-gradient"></span> Gradient (inferno)
                </div>
            </div>
        </div>
    </div>
</div>
<style>
#gv-root img {
  margin: 0;
  border-radius: 0;
  max-width: 100%;
}

#gv-root {
  background: #111827;
  border-radius: 8px;
  padding: 20px;
  margin: 1.5rem 0;
  color: #e5e7eb;
  font-family: system-ui, -apple-system, sans-serif;
  font-size: 14px;
  width: 100vw;
  position: relative;
  left: 50%;
  transform: translateX(-50%);
  max-width: 1100px;
  box-sizing: border-box;
}

.gv-controls {
  display: flex;
  gap: 24px;
  margin-bottom: 16px;
  flex-wrap: wrap;
  align-items: flex-start;
}

.gv-label {
  display: block;
  font-size: 11px;
  color: #9ca3af;
  text-transform: uppercase;
  letter-spacing: 0.08em;
  margin-bottom: 6px;
}

.gv-thumbs {
  display: flex;
  gap: 6px;
  flex-wrap: wrap;
}

.gv-thumb {
  width: 64px;
  height: 64px;
  border-radius: 4px;
  border: 2px solid #374151;
  overflow: hidden;
  cursor: pointer;
  transition: border-color 0.15s;
  flex-shrink: 0;
}
.gv-thumb:hover { border-color: #6b7280; }
.gv-thumb.selected { border-color: #22d3ee; box-shadow: 0 0 8px rgba(34,211,238,0.3); }
.gv-thumb img {
  width: 100%;
  height: 100%;
  object-fit: cover;
  image-rendering: pixelated;
  display: block;
}

.gv-radios {
  display: flex;
  gap: 16px;
  flex-wrap: wrap;
}
.gv-radio {
  font-size: 13px;
  color: #9ca3af;
  cursor: pointer;
  display: flex;
  align-items: center;
  gap: 5px;
}
.gv-radio input { accent-color: #22d3ee; cursor: pointer; }

.gv-viewer {
  display: flex;
  gap: 20px;
  align-items: flex-start;
  flex-wrap: wrap;
}

.gv-image-container {
  position: relative;
  width: 512px;
  max-width: 100%;
  aspect-ratio: 1;
  border-radius: 6px;
  overflow: hidden;
  border: 1px solid #374151;
  cursor: crosshair;
  flex-shrink: 0;
  background: #000;
}
.gv-image-container img,
.gv-image-container canvas {
  position: absolute;
  top: 0; left: 0;
  width: 100%; height: 100%;
}
.gv-base-image { z-index: 1; image-rendering: pixelated; }
.gv-mask-overlay {
  z-index: 2; opacity: 0; pointer-events: none;
  image-rendering: pixelated; transition: opacity 0.2s;
}
.gv-gradient-overlay { z-index: 3; opacity: 0; pointer-events: none; }
.gv-block-highlight {
  position: absolute; z-index: 4;
  border: 1.5px solid #22d3ee; border-radius: 1px;
  pointer-events: none; display: none;
  box-shadow: 0 0 6px rgba(34,211,238,0.45);
}
.gv-loading-indicator {
  position: absolute; z-index: 5;
  top: 50%; left: 50%;
  transform: translate(-50%, -50%);
  color: #9ca3af; font-size: 14px;
  display: none;
}

.gv-info-panel {
  background: #1f2937;
  border-radius: 8px;
  padding: 16px 18px;
  min-width: 200px;
  flex: 1;
  max-width: 300px;
}
.gv-info-header {
  font-size: 11px;
  color: #9ca3af;
  text-transform: uppercase;
  letter-spacing: 0.08em;
  margin-bottom: 12px;
}
.gv-info-row {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: 5px 0;
  border-bottom: 1px solid #111827;
  font-size: 13px;
}
.gv-info-row:last-child { border-bottom: none; }
.gv-info-label { color: #9ca3af; }
.gv-info-value { color: #fff; font-family: 'SF Mono', Consolas, monospace; font-size: 12px; }

.gv-placeholder { color: #6b7280; font-style: italic; font-size: 13px; padding: 10px 0; }

.gv-slider-group { margin-top: 14px; }
.gv-slider-group label { display: block; font-size: 12px; color: #9ca3af; margin-bottom: 4px; }
.gv-slider-group input[type=range] { width: 100%; accent-color: #22d3ee; cursor: pointer; }

.gv-toggle-group { margin-top: 10px; display: flex; align-items: center; gap: 8px; flex-wrap: wrap; }
.gv-toggle-group label { font-size: 13px; color: #9ca3af; cursor: pointer; }
.gv-toggle-group input[type=checkbox] { accent-color: #22d3ee; cursor: pointer; }

.gv-legend { margin-top: 14px; padding-top: 12px; border-top: 1px solid #374151; }
.gv-legend-title {
  font-size: 11px; color: #9ca3af; text-transform: uppercase;
  letter-spacing: 0.08em; margin-bottom: 8px;
}
.gv-legend-item { display: flex; align-items: center; gap: 8px; font-size: 13px; margin-bottom: 3px; color: #d1d5db; }
.gv-swatch { width: 14px; height: 14px; border-radius: 2px; border: 1px solid #4b5563; flex-shrink: 0; }
.gv-swatch-bg { background: #000; }
.gv-swatch-road { background: #22d3ee; }
.gv-swatch-canopy { background: #f59e0b; }
.gv-swatch-gradient { background: linear-gradient(90deg,#000004,#420a68,#932667,#dd513a,#fca50a,#fcffa4); }
.gv-mask-source-hidden { display: none; }

@media (max-width: 800px) {
  #gv-root { padding: 12px; }
  .gv-image-container { width: 100%; }
  .gv-info-panel { max-width: 100%; }
  .gv-viewer { flex-direction: column; }
}
</style>
<script>
(function() {
  var SAMPLES = ['1717', '2056', '2762', '6212', '8180', '8782'];
  var MODELS = {
    unet:      { suffix: '',                   label: 'U-Net (ResNet-18)' },
    segformer: { suffix: '_segformer-mit-b0',  label: 'SegFormer (MIT-B0)' }
  };
  var STORAGE = 'https://geospatialml.z5.web.core.windows.net/gradients';

  var currentSample = SAMPLES[0];
  var currentModel = 'unet';
  var meta = null;
  var gradCache = new Map();
  var metaCache = {};
  var inflight = new Set();
  var wantedPath = null;
  var currentKey = null;
  var preloadQueue = [];
  var gradOpacity = 0.75;
  var PRELOAD_RADIUS = 3;
  var MAX_PRELOADS = 4;

  var container   = document.getElementById('gv-hover-area');
  var baseImage   = document.getElementById('gv-base-image');
  var maskOvl     = document.getElementById('gv-mask-overlay');
  var gradCanvas  = document.getElementById('gv-gradient-overlay');
  var gradCtx     = gradCanvas.getContext('2d');
  var highlight   = document.getElementById('gv-block-highlight');
  var infoRows    = document.getElementById('gv-info-rows');
  var opacityEl   = document.getElementById('gv-opacity');
  var opacityVal  = document.getElementById('gv-opacity-val');
  var showMaskCb  = document.getElementById('gv-show-mask');
  var loadingEl   = document.getElementById('gv-loading');
  var thumbsEl    = document.getElementById('gv-thumbs');

  function getDir() {
    return STORAGE + '/' + currentSample + MODELS[currentModel].suffix + '/';
  }
  function pad3(n) { return String(n).padStart(3, '0'); }

  SAMPLES.forEach(function(id, i) {
    var div = document.createElement('div');
    div.className = 'gv-thumb' + (i === 0 ? ' selected' : '');
    div.dataset.id = id;
    var img = document.createElement('img');
    img.src = STORAGE + '/' + id + '/image.png';
    img.alt = 'Sample ' + id;
    img.loading = 'lazy';
    div.appendChild(img);
    div.addEventListener('click', function() {
      if (currentSample === id) return;
      currentSample = id;
      document.querySelectorAll('.gv-thumb').forEach(function(t) { t.classList.remove('selected'); });
      div.classList.add('selected');
      loadSample();
    });
    thumbsEl.appendChild(div);
  });

  document.querySelectorAll('input[name="gv-model"]').forEach(function(radio) {
    radio.addEventListener('change', function() {
      if (currentModel === this.value) return;
      currentModel = this.value;
      loadSample();
    });
  });

  document.querySelectorAll('input[name="gv-mask-src"]').forEach(function(radio) {
    radio.addEventListener('change', function() {
      var dir = getDir();
      maskOvl.src = this.value === 'gt' ? dir + 'gt_mask.png' : dir + 'mask.png';
    });
  });

  async function loadSample() {
    var dir = getDir();
    var cacheKey = currentSample + '|' + currentModel;
    gradCache.clear();
    inflight.clear();
    preloadQueue = [];
    currentKey = null;
    wantedPath = null;
    gradCtx.clearRect(0, 0, gradCanvas.width, gradCanvas.height);
    gradCanvas.style.opacity = '0';
    highlight.style.display = 'none';
    infoRows.innerHTML = '<p class="gv-placeholder">Hover over the image to see gradient attribution</p>';
    loadingEl.style.display = 'block';
    if (metaCache[cacheKey]) {
      meta = metaCache[cacheKey];
    } else {
      try {
        var resp = await fetch(dir + 'metadata.json');
        meta = await resp.json();
        metaCache[cacheKey] = meta;
      } catch(e) {
        loadingEl.textContent = 'Failed to load';
        return;
      }
    }
    gradCanvas.width = meta.image_shape[2];
    gradCanvas.height = meta.image_shape[1];
    baseImage.src = dir + 'image.png';
    maskOvl.src = dir + 'mask.png';
    var predRadio = document.querySelector('input[name="gv-mask-src"][value="pred"]');
    if (predRadio) predRadio.checked = true;
    var maskSrcGroup = document.getElementById('gv-mask-source-group');
    if (meta.has_gt_mask) {
      maskSrcGroup.classList.remove('gv-mask-source-hidden');
    } else {
      maskSrcGroup.classList.add('gv-mask-source-hidden');
    }
    baseImage.onload = function() { loadingEl.style.display = 'none'; };
  }

  function gradPath(r, c) {
    var ext = meta.gradient_format || 'png';
    return getDir() + pad3(r) + '/' + pad3(c) + '.' + ext;
  }

  function drawGradient(bitmap) {
    gradCtx.clearRect(0, 0, gradCanvas.width, gradCanvas.height);
    gradCtx.drawImage(bitmap, 0, 0, gradCanvas.width, gradCanvas.height);
    gradCanvas.style.opacity = gradOpacity;
  }

  function loadBitmap(path) {
    if (gradCache.has(path) || inflight.has(path)) return;
    inflight.add(path);
    fetch(path)
      .then(function(r) { return r.blob(); })
      .then(function(b) { return createImageBitmap(b); })
      .then(function(bmp) {
        gradCache.set(path, bmp);
        inflight.delete(path);
        if (path === wantedPath) drawGradient(bmp);
        pumpPreloads();
      })
      .catch(function() { inflight.delete(path); pumpPreloads(); });
  }

  function requestGradient(r, c) {
    var key = r + ',' + c;
    if (key === currentKey) return;
    currentKey = key;
    var path = gradPath(r, c);
    wantedPath = path;
    if (gradCache.has(path)) {
      drawGradient(gradCache.get(path));
      schedulePreload(r, c);
      return;
    }
    loadBitmap(path);
    schedulePreload(r, c);
  }

  function schedulePreload(r, c) {
    if (!meta) return;
    var GRID_R = meta.grid[0], GRID_C = meta.grid[1];
    var items = [];
    for (var dr = -PRELOAD_RADIUS; dr <= PRELOAD_RADIUS; dr++) {
      for (var dc = -PRELOAD_RADIUS; dc <= PRELOAD_RADIUS; dc++) {
        if (dr === 0 && dc === 0) continue;
        var nr = r + dr, nc = c + dc;
        if (nr >= 0 && nr < GRID_R && nc >= 0 && nc < GRID_C) {
          var p = gradPath(nr, nc);
          if (!gradCache.has(p) && !inflight.has(p)) {
            items.push({ path: p, dist: dr*dr + dc*dc });
          }
        }
      }
    }
    items.sort(function(a, b) { return a.dist - b.dist; });
    preloadQueue = items.map(function(i) { return i.path; });
    pumpPreloads();
  }

  function pumpPreloads() {
    while (inflight.size < MAX_PRELOADS && preloadQueue.length > 0) {
      var p = preloadQueue.shift();
      if (!gradCache.has(p) && !inflight.has(p)) loadBitmap(p);
    }
  }

  function updateInfo(r, c) {
    if (!meta) return;
    var BLOCK_SIZE = meta.block_size;
    var road = meta.block_pred[r][c];
    var cls = road > 0.5 ? 'Road' : 'Background';
    infoRows.innerHTML =
      '<div class="gv-info-row"><span class="gv-info-label">Row, Col</span><span class="gv-info-value">' + r + ', ' + c + '</span></div>' +
      '<div class="gv-info-row"><span class="gv-info-label">Pixel range</span><span class="gv-info-value">' +
        (r*BLOCK_SIZE) + '\u2013' + ((r+1)*BLOCK_SIZE-1) + ', ' +
        (c*BLOCK_SIZE) + '\u2013' + ((c+1)*BLOCK_SIZE-1) + '</span></div>' +
      '<div class="gv-info-row"><span class="gv-info-label">Pred class</span><span class="gv-info-value">' + cls + '</span></div>' +
      '<div class="gv-info-row"><span class="gv-info-label">Road prob</span><span class="gv-info-value">' + (road*100).toFixed(1) + '%</span></div>';
  }

  var raf = null;
  container.addEventListener('mousemove', function(e) {
    if (raf || !meta) return;
    raf = requestAnimationFrame(function() {
      raf = null;
      var rect = container.getBoundingClientRect();
      var x = (e.clientX - rect.left) / rect.width;
      var y = (e.clientY - rect.top) / rect.height;
      var GRID_R = meta.grid[0], GRID_C = meta.grid[1];
      var c = Math.max(0, Math.min(GRID_C-1, Math.floor(x * GRID_C)));
      var r = Math.max(0, Math.min(GRID_R-1, Math.floor(y * GRID_R)));
      var bw = rect.width / GRID_C;
      var bh = rect.height / GRID_R;
      highlight.style.left   = (c * bw) + 'px';
      highlight.style.top    = (r * bh) + 'px';
      highlight.style.width  = bw + 'px';
      highlight.style.height = bh + 'px';
      requestGradient(r, c);
      updateInfo(r, c);
    });
  });

  container.addEventListener('mouseenter', function() {
    highlight.style.display = 'block';
    if (currentKey) gradCanvas.style.opacity = gradOpacity;
  });

  container.addEventListener('mouseleave', function() {
    highlight.style.display = 'none';
    gradCanvas.style.opacity = '0';
    currentKey = null;
    wantedPath = null;
    preloadQueue = [];
    infoRows.innerHTML = '<p class="gv-placeholder">Hover over the image to see gradient attribution</p>';
  });

  container.addEventListener('touchmove', function(e) {
    e.preventDefault();
    if (!meta) return;
    var touch = e.touches[0];
    var rect = container.getBoundingClientRect();
    var x = (touch.clientX - rect.left) / rect.width;
    var y = (touch.clientY - rect.top) / rect.height;
    var GRID_R = meta.grid[0], GRID_C = meta.grid[1];
    var c = Math.max(0, Math.min(GRID_C-1, Math.floor(x * GRID_C)));
    var r = Math.max(0, Math.min(GRID_R-1, Math.floor(y * GRID_R)));
    var bw = rect.width / GRID_C;
    var bh = rect.height / GRID_R;
    highlight.style.left   = (c * bw) + 'px';
    highlight.style.top    = (r * bh) + 'px';
    highlight.style.width  = bw + 'px';
    highlight.style.height = bh + 'px';
    highlight.style.display = 'block';
    requestGradient(r, c);
    updateInfo(r, c);
  }, { passive: false });

  container.addEventListener('touchend', function() {
    highlight.style.display = 'none';
    gradCanvas.style.opacity = '0';
    currentKey = null;
    wantedPath = null;
    preloadQueue = [];
    infoRows.innerHTML = '<p class="gv-placeholder">Hover over the image to see gradient attribution</p>';
  });

  opacityEl.addEventListener('input', function() {
    gradOpacity = this.value / 100;
    opacityVal.textContent = this.value + '%';
    if (currentKey) gradCanvas.style.opacity = gradOpacity;
  });

  showMaskCb.addEventListener('change', function() {
    maskOvl.style.opacity = this.checked ? '0.55' : '0';
  });

  loadSample();
})();
</script>
</section>
<section id="whats-next" class="level2">
<h2 class="anchored" data-anchor-id="whats-next">What’s next</h2>
<p>Across the models we tested, the bottleneck appears to be the training signal, not the architecture. Switching from CNN to transformer, increasing model capacity (MiT-B0 -&gt; B2), and adding cutout augmentation all fail to substantially improve spatial reasoning on the hardest pixels. The binary cross-entropy loss treats all road pixels equally — it doesn’t reward the model for propagating information from distant visible road segments to occluded ones. Distance-aware loss functions or auxiliary connectivity tasks might provide a stronger learning signal.</p>
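<p>As one concrete example of what we mean, a distance-aware loss could scale each pixel’s cross-entropy term by its distance to the nearest visible road, so that occluded pixels far from any evidence dominate the gradient. This is a sketch of the idea, not something we have validated:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import torch.nn.functional as F

def distance_weighted_ce(logits, targets, dist, alpha=0.1):
    """Cross-entropy where each pixel is weighted by 1 + alpha * distance to the
    nearest visible road (dist is 0 for pixels that are not occluded)."""
    per_pixel = F.cross_entropy(logits, targets, reduction="none")  # (B, H, W)
    weights = 1.0 + alpha * dist
    return (weights * per_pixel).sum() / weights.sum()
</code></pre></div>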
<p>Chesapeake RSC is a controlled version of a broader challenge in remote sensing, and the effective receptive field tools we use here apply directly to any task where local appearance is ambiguous and the correct label depends on spatial context.</p>
</section>
<section id="links" class="level2">
<h2 class="anchored" data-anchor-id="links">Links</h2>
<ul>
<li><strong>Paper</strong>: <a href="https://arxiv.org/abs/2401.06762">Seeing the roads through the trees (arXiv)</a></li>
<li><strong>Code</strong>: <a href="https://github.com/isaaccorley/ChesapeakeRSC">github.com/isaaccorley/ChesapeakeRSC</a></li>
<li><strong>Dataset</strong>: <a href="https://huggingface.co/datasets/torchgeo/ChesapeakeRSC">torchgeo/ChesapeakeRSC on HuggingFace</a></li>
</ul>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{robinson2026,
  author = {Robinson, Caleb and Corley, Isaac},
  title = {Seeing the {Roads} {Through} the {Trees:} {Do} {Segmentation}
    {Models} {Actually} {Use} {Long-Range} {Context?}},
  date = {2026-03-17},
  url = {https://geospatialml.com/posts/long-range-dependencies/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-robinson2026" class="csl-entry quarto-appendix-citeas">
Robinson, Caleb, and Isaac Corley. 2026. <span>“Seeing the Roads Through
the Trees: Do Segmentation Models Actually Use Long-Range
Context?”</span> March 17. <a href="https://geospatialml.com/posts/long-range-dependencies/">https://geospatialml.com/posts/long-range-dependencies/</a>.
</div></div></section></div> ]]></description>
  <category>receptive-fields</category>
  <category>spatial-context</category>
  <category>road-segmentation</category>
  <category>chesapeake-rsc</category>
  <category>semantic-segmentation</category>
  <guid>https://geospatialml.com/posts/long-range-dependencies/</guid>
  <pubDate>Tue, 17 Mar 2026 00:00:00 GMT</pubDate>
  <media:content url="https://geospatialml.com/posts/long-range-dependencies/example1.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Characterizing Census Blocks with Satellite Embedding Statistics</title>
  <dc:creator>Caleb Robinson</dc:creator>
  <dc:creator>Isaac Corley</dc:creator>
  <link>https://geospatialml.com/posts/aef-census-block-embeddings/</link>
  <description><![CDATA[ 





<p>How can you join AEF embeddings to census blocks, and how well do they predict different variables? We wrote a <a href="https://gist.github.com/calebrob6/e71adbc64a94e362ec7c251e4fbc5223">script</a> for doing this! We find, for example, that statistics of AEF embeddings can differentiate between urban and rural blocks in Washington with <strong>92.5% accuracy</strong> using a simple logistic regression.</p>
<p>There’s a growing ecosystem of <a href="https://isaac.earth/earth-embedding-products">pixel-level embedding products</a> covering the entire planet — AEF, Clay, Prithvi, and others. These are potentially powerful features for research well beyond remote sensing: sociology, demography, public health, economics — any field that works with administrative boundaries. But there’s still a high technical barrier to actually <em>using</em> them. Going from a wall of raster tiles to a clean feature table keyed by census tract or district requires spatial joins, CRS wrangling, and careful aggregation.</p>
<p>This post is a practical, end-to-end example of how to do exactly that. We take <a href="https://source.coop/tge-labs/aef">AlphaEarth Foundations (AEF)</a> embeddings from <a href="https://source.coop">Source Cooperative</a>, summarize them across the ~149K census blocks in Washington State, and see how well these purely satellite-derived features predict census variables like population density and urban/rural classification.</p>
<section id="the-data" class="level2">
<h2 class="anchored" data-anchor-id="the-data">The data</h2>
<p><strong>AEF embeddings</strong> are 64-dimensional vectors produced by a <a href="https://arxiv.org/abs/2507.22291">geospatial foundation model</a> for every 10m pixel on Earth. See the Google Earth Engine catalog page <a href="https://developers.google.com/earth-engine/datasets/catalog/GOOGLE_SATELLITE_EMBEDDING_V1_ANNUAL">here</a>. They capture land use, land cover, vegetation, and built environment characteristics from satellite imagery. The data is distributed as cloud-optimized GeoTIFFs tiled at 8192x8192 pixels in UTM projection, with a <a href="https://data.source.coop/tge-labs/aef/v1/annual/aef_index.gpkg">GeoPackage spatial index</a> mapping tile footprints to file paths across years 2018-2025.</p>
<p><strong>Census block boundaries</strong> come from the 2020 US Census <a href="https://www.census.gov/cgi-bin/geo/shapefiles/index.php">TIGER/Line shapefiles</a> — the finest-grained census geography, with attributes like population (<code>POP20</code>), housing units (<code>HOUSING20</code>), land/water area, and an urban/rural flag (<code>UR20</code>).</p>
</section>
<section id="method" class="level2">
<h2 class="anchored" data-anchor-id="method">Method</h2>
<section id="step-1-compute-per-block-embedding-statistics" class="level3">
<h3 class="anchored" data-anchor-id="step-1-compute-per-block-embedding-statistics">Step 1: Compute per-block embedding statistics</h3>
<p>For each census block, we compute the <strong>mean</strong> and <strong>standard deviation</strong> of each of the 64 AEF embedding dimensions across all valid 10m pixels within the block for 2020. This produces a 128-dimensional feature vector per block (64 means + 64 stdevs).</p>
<p>Before processing, we filter out blocks that would be uninformative or expensive:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 33%">
<col style="width: 33%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th>Filter</th>
<th>Blocks removed</th>
<th>Reason</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>All-water (<code>ALAND20 == 0</code>)</td>
<td>8,272 (5.2%)</td>
<td>No land pixels to sample</td>
</tr>
<tr class="even">
<td>Oversized (<code>ALAND20 &gt; 25 km^2</code>)</td>
<td>1,138 (0.7%)</td>
<td>Too expensive to mask at 10m resolution</td>
</tr>
<tr class="odd">
<td><strong>Total kept</strong></td>
<td><strong>148,683 (94.0%)</strong></td>
<td>Covers 99.6% of WA’s population</td>
</tr>
</tbody>
</table>
<p>The pipeline spatial-joins blocks with the AEF tile index, downloads the needed tiles (44 tiles, ~34 GB for Washington in 2020), then masks and aggregates pixel values per block using multithreaded I/O. See <a href="https://gist.github.com/calebrob6/e71adbc64a94e362ec7c251e4fbc5223#file-compute_aef_block_stats-py"><code>compute_aef_block_stats.py</code></a> for the full script.</p>
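<p>The core mask-and-aggregate step per block looks roughly like this (a sketch only; tile selection, CRS handling, and multithreading are in the linked script, and the function name and arguments here are illustrative):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np
import rasterio
from rasterio.mask import mask as rio_mask

def block_embedding_stats(tile_path, block_geom):
    """Mean and std of each AEF band over the pixels inside one block polygon.
    block_geom must already be in the tile's CRS."""
    with rasterio.open(tile_path) as src:
        data, _ = rio_mask(src, [block_geom], crop=True, filled=False)
    pixels = data.reshape(data.shape[0], -1)        # (64, n_pixels), masked outside the block
    means = pixels.mean(axis=1).filled(np.nan)
    stds = pixels.std(axis=1).filled(np.nan)
    return np.concatenate([means, stds])            # 128-dim feature vector
</code></pre></div>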
</section>
<section id="step-2-pca-visualization" class="level3">
<h3 class="anchored" data-anchor-id="step-2-pca-visualization">Step 2: PCA visualization</h3>
<p>To visualize the embedding space geographically, we fit a 3-component PCA on the 128-dimensional block feature vectors, scale each component to uint8, and rasterize block polygons at 10m resolution into a 3-band GeoTIFF. The resulting RGB composite shows blocks with similar land use in similar colors. See <a href="https://gist.github.com/calebrob6/e71adbc64a94e362ec7c251e4fbc5223#file-pca_rasterize-py"><code>pca_rasterize.py</code></a> for the full script.</p>
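<p>The projection-and-rescale part of that step is small; a sketch (the rasterization itself, e.g.&nbsp;burning each uint8 column into a 10m grid with <code>rasterio.features.rasterize</code>, is in the linked script):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np
from sklearn.decomposition import PCA

def pca_rgb(features):
    """Project 128-dim block features to 3 components and rescale each to 0-255."""
    comps = PCA(n_components=3).fit_transform(features)   # (n_blocks, 3)
    lo, hi = comps.min(axis=0), comps.max(axis=0)
    return ((comps - lo) / (hi - lo) * 255).astype(np.uint8)
</code></pre></div>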
<p>Here’s what the PCA-3 RGB rendering looks like across all of Washington State:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://geospatialml.com/posts/aef-census-block-embeddings/washington_state_block_pca.png" class="img-fluid figure-img" style="width:85.0%"></p>
<figcaption>PCA-3 RGB rendering of AEF embedding statistics across Washington State census blocks. Color differences reflect differences in the embedding space — similar land use appears in similar colors.</figcaption>
</figure>
</div>
<p>And zoomed into the Seattle/Bellevue metro area, where urban structure is clearly visible at the block level:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://geospatialml.com/posts/aef-census-block-embeddings/seattle_block_pca.png" class="img-fluid figure-img" style="width:80.0%"></p>
<figcaption>PCA-3 RGB rendering zoomed into Seattle/Bellevue. The dense urban core, suburban neighborhoods, and surrounding forests are clearly differentiated by color.</figcaption>
</figure>
</div>
</section>
<section id="step-3-correlation-with-census-variables" class="level3">
<h3 class="anchored" data-anchor-id="step-3-correlation-with-census-variables">Step 3: Correlation with census variables</h3>
<p>We joined the embedding statistics with census block attributes and tested predictiveness using simple linear models.</p>
<p>We tested two feature representations:</p>
<ul>
<li><strong>128-dim</strong>: The raw 64 per-band means + 64 per-band standard deviations</li>
<li><strong>PCA-10</strong>: A 10-component PCA capturing 79.0% of total variance</li>
</ul>
</section>
</section>
<section id="results" class="level2">
<h2 class="anchored" data-anchor-id="results">Results</h2>
<section id="linear-regression-r2" class="level3">
<h3 class="anchored" data-anchor-id="linear-regression-r2">Linear regression R^2</h3>
<p>We fit ordinary least squares on all 148,683 blocks and report in-sample R^2. No train/test split here — these numbers are upper bounds on what a linear model can extract, meant to gauge the information content of the features rather than predict on held-out data:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Census Variable</th>
<th>R^2 (128-dim)</th>
<th>R^2 (PCA-10)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Log(land area)</td>
<td><strong>0.843</strong></td>
<td>0.665</td>
</tr>
<tr class="even">
<td>Urban/Rural</td>
<td><strong>0.740</strong></td>
<td>0.676</td>
</tr>
<tr class="odd">
<td>Log(population)</td>
<td>0.575</td>
<td>0.366</td>
</tr>
<tr class="even">
<td>Water fraction</td>
<td>0.414</td>
<td>0.073</td>
</tr>
<tr class="odd">
<td>Pop. density</td>
<td>0.380</td>
<td>0.260</td>
</tr>
<tr class="even">
<td>Housing density</td>
<td>0.299</td>
<td>0.165</td>
</tr>
<tr class="odd">
<td>Raw population</td>
<td>0.252</td>
<td>0.105</td>
</tr>
<tr class="even">
<td>Raw housing units</td>
<td>0.229</td>
<td>0.094</td>
</tr>
</tbody>
</table>
<p><strong>Log(land area)</strong> is the most predictable (R^2 = 0.84), which makes intuitive sense — block size directly corresponds to land cover homogeneity, and the embeddings capture this. <strong>Urban/rural</strong> is next at R^2 = 0.74, confirming that AEF embeddings strongly encode built environment characteristics. The gap between 128-dim and PCA-10 is also telling: compressing to 10 components loses a lot of signal for some variables (water fraction drops from 0.41 to 0.07), suggesting the tail dimensions carry real information.</p>
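<p>The fits themselves are one-liners; a sketch for a single target (the feature matrix and variable names here are placeholders for the joined table):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np
from sklearn.linear_model import LinearRegression

# In-sample R^2 for one census variable; X is the (n_blocks, 128) feature matrix
y = np.log(blocks["ALAND20"] + 1)        # e.g. log land area
print(LinearRegression().fit(X, y).score(X, y))
</code></pre></div>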
</section>
<section id="classification-accuracy" class="level3">
<h3 class="anchored" data-anchor-id="classification-accuracy">Classification accuracy</h3>
<p>Using <code>LogisticRegression</code>, now with stratified 5-fold cross-validation (no block sees its own label during training in any given fold), we predict Urban vs.&nbsp;Rural:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Task</th>
<th>Baseline</th>
<th>Accuracy (128-dim)</th>
<th>Accuracy (PCA-10)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Urban vs.&nbsp;Rural</td>
<td>59.4%</td>
<td><strong>92.5% +/- 0.1%</strong></td>
<td>90.1% +/- 0.2%</td>
</tr>
</tbody>
</table>
<p>A logistic regression on AEF embedding statistics achieves <strong>92.5% accuracy</strong> at distinguishing urban from rural blocks — a 33 percentage point improvement over the majority-class baseline — using nothing more than summary statistics of pixel embeddings.</p>
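<p>A minimal sketch of that evaluation (the feature matrix and label names are placeholders):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: (n_blocks, 128) embedding stats; urban: binary urban/rural flag per block
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, urban,
                         cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print(scores.mean(), scores.std())
</code></pre></div>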
</section>
<section id="pca-component-correlations" class="level3">
<h3 class="anchored" data-anchor-id="pca-component-correlations">PCA component correlations</h3>
<p>To interpret what the PCA components actually capture, we compute the Pearson correlation between each component’s per-block score and several census variables. Values near +1 or -1 indicate a strong linear relationship:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Variable</th>
<th>PC1</th>
<th>PC2</th>
<th>PC3</th>
<th>PC4</th>
<th>PC5</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Urban/Rural</td>
<td><strong>-0.71</strong></td>
<td>-0.34</td>
<td>-0.07</td>
<td>+0.13</td>
<td>+0.00</td>
</tr>
<tr class="even">
<td>Log(land area)</td>
<td><strong>+0.58</strong></td>
<td>+0.39</td>
<td>+0.09</td>
<td>-0.21</td>
<td>+0.01</td>
</tr>
<tr class="odd">
<td>Pop. density</td>
<td><strong>-0.42</strong></td>
<td>-0.21</td>
<td>-0.12</td>
<td>+0.03</td>
<td>-0.07</td>
</tr>
<tr class="even">
<td>Log(population)</td>
<td><strong>-0.40</strong></td>
<td>-0.01</td>
<td>-0.06</td>
<td>+0.29</td>
<td>+0.17</td>
</tr>
<tr class="odd">
<td>Housing density</td>
<td>-0.30</td>
<td>-0.16</td>
<td>-0.08</td>
<td>-0.01</td>
<td>-0.10</td>
</tr>
<tr class="even">
<td>Water fraction</td>
<td>+0.05</td>
<td>+0.06</td>
<td>+0.07</td>
<td>-0.08</td>
<td>-0.09</td>
</tr>
</tbody>
</table>
<p><strong>PC1</strong> is clearly an urban-to-rural axis (r = -0.71 with the urban flag), while <strong>PC2</strong> adds spatial scale information. The higher-order components pick up more nuanced variation — but even PC5 barely correlates with anything in the census, suggesting those dimensions capture land use patterns that don’t map neatly onto sociodemographic variables.</p>
</section>
</section>
<section id="takeaways" class="level2">
<h2 class="anchored" data-anchor-id="takeaways">Takeaways</h2>
<ol type="1">
<li><p><strong>AEF embeddings encode urban/rural character.</strong> While not very surprising, a simple linear model on these statistics alone reaches 92.5% classification accuracy.</p></li>
<li><p><strong>The full 128-dim representation is substantially richer than PCA-10.</strong> For log population, R^2 jumps from 0.37 to 0.58 — the higher-order embedding dimensions contain useful information.</p></li>
<li><p><strong>Block-level embedding statistics are a practical feature engineering approach.</strong> Mean and stdev per block compresses many 64-dim pixel vectors into a fixed 128-dim representation per geographic unit — simple enough to throw into any tabular ML pipeline. How to aggregate these was also the question of our recent paper, “From Pixels to Patches: Pooling Strategies for Earth Embeddings” (preprint <a href="https://arxiv.org/abs/2603.02080">here</a>).</p></li>
</ol>
</section>
<section id="reproduction" class="level2">
<h2 class="anchored" data-anchor-id="reproduction">Reproduction</h2>
<p>Both scripts are available in <a href="https://gist.github.com/calebrob6/e71adbc64a94e362ec7c251e4fbc5223">this gist</a>. The precomputed Washington State block-level embedding statistics are available as a GeoParquet file on <a href="https://huggingface.co/datasets/calebrob6/wa-block-aef-stats">Hugging Face</a> if you want to skip the ~34 GB download and play with the data instead.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Download census blocks from:</span></span>
<span id="cb1-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># https://www.census.gov/cgi-bin/geo/shapefiles/index.php</span></span>
<span id="cb1-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># (year 2025, layer "Blocks (2020)", state "Washington")</span></span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Download AEF index</span></span>
<span id="cb1-6"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">curl</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-O</span> https://data.source.coop/tge-labs/aef/v1/annual/aef_index.gpkg</span>
<span id="cb1-7"></span>
<span id="cb1-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute block statistics (downloads ~34 GB of tiles)</span></span>
<span id="cb1-9"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">python3</span> compute_aef_block_stats.py <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb1-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--year</span> 2020 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--max-land-km2</span> 25 <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb1-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--output</span> wa_block_aef_stats.geoparquet</span>
<span id="cb1-12"></span>
<span id="cb1-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Quick test on Seattle/Bellevue area (~2 tiles)</span></span>
<span id="cb1-14"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">python3</span> compute_aef_block_stats.py <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb1-15">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--bbox</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-122.44</span> 47.49 <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-122.07</span> 47.73 <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb1-16">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--output</span> seattle_bellevue_aef_stats.geoparquet</span>
<span id="cb1-17"></span>
<span id="cb1-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generate PCA visualization</span></span>
<span id="cb1-19"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">python3</span> pca_rasterize.py wa_block_aef_stats.geoparquet <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-o</span> wa_pca.tif</span></code></pre></div></div>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{robinson2026,
  author = {Robinson, Caleb and Corley, Isaac},
  title = {Characterizing {Census} {Blocks} with {Satellite} {Embedding}
    {Statistics}},
  date = {2026-03-10},
  url = {https://geospatialml.com/posts/aef-census-block-embeddings/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-robinson2026" class="csl-entry quarto-appendix-citeas">
Robinson, Caleb, and Isaac Corley. 2026. <span>“Characterizing Census
Blocks with Satellite Embedding Statistics.”</span> March 10. <a href="https://geospatialml.com/posts/aef-census-block-embeddings/">https://geospatialml.com/posts/aef-census-block-embeddings/</a>.
</div></div></section></div> ]]></description>
  <category>embeddings</category>
  <category>census</category>
  <category>foundation-models</category>
  <category>pca</category>
  <guid>https://geospatialml.com/posts/aef-census-block-embeddings/</guid>
  <pubDate>Tue, 10 Mar 2026 00:00:00 GMT</pubDate>
  <media:content url="https://geospatialml.com/posts/aef-census-block-embeddings/washington_state_block_pca.png" medium="image" type="image/png" height="93" width="144"/>
</item>
<item>
  <title>Training a Water Segmentation Model with TorchGeo</title>
  <dc:creator>Caleb Robinson</dc:creator>
  <dc:creator>Isaac Corley</dc:creator>
  <link>https://geospatialml.com/posts/torchgeo-iclr-tutorial/</link>
  <description><![CDATA[ 





<p>One notebook, a few hundred lines of Python, and you go from raw Sentinel-2 imagery to a georeferenced water map you can open in QGIS. That’s the premise of the <a href="https://torchgeo.readthedocs.io/en/stable/tutorials/earth_surface_water.html">TorchGeo tutorial</a> we put together for the <a href="https://ml-for-rs.github.io/iclr2026/">ICLR 2026 ML4RS Workshop</a> (<a href="https://arxiv.org/abs/2603.02386">paper</a>). It walks through the full earth observation (EO) ML workflow: loading multispectral data, training a semantic segmentation model on the <a href="https://zenodo.org/records/5205674">Earth Surface Water dataset</a>, and running gridded inference on a Sentinel-2 scene over Rio de Janeiro.</p>
<section id="why-satellite-imagery-isnt-just-big-computer-vision" class="level2">
<h2 class="anchored" data-anchor-id="why-satellite-imagery-isnt-just-big-computer-vision">Why satellite imagery isn’t just “big computer vision”</h2>
<p>If you’ve tried to plug satellite imagery into a standard computer vision pipeline, you’ve probably run into the friction. Imagery arrives as large georeferenced scenes (often with more than three bands), labels live in separate files with different coordinate reference systems (CRSs) and resolutions, and you can’t just <code>resize</code> and <code>normalize</code> your way to a training loop. Further, once you have a model you need to run inference across entire scenes, which requires stitching together predictions from overlapping tiles and saving the output as a georeferenced raster.</p>
<p>TorchGeo handles this by providing geospatial-aware datasets, samplers, and transforms that slot into standard PyTorch workflows. The key components are:</p>
<ul>
<li><strong>Composable datasets</strong> — use <code>|</code> (union) to mosaic tiles and <code>&amp;</code> (intersection) to pair imagery with labels, all lazily evaluated</li>
<li><strong>Geographic samplers</strong> — <code>RandomGeoSampler</code> for training and <code>GridGeoSampler</code> for inference, sampling in projected coordinates rather than pixel indices (see the sketch after this list)</li>
<li><strong>Windowed reads</strong> — no pre-tiling (assuming you have data in Cloud Optimized GeoTIFFs or other cloud native formats); TorchGeo reads only the pixels it needs from large rasters on demand</li>
</ul>
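<p>A minimal sketch of how the samplers slot into a standard <code>DataLoader</code> (chip sizes and stride are illustrative; <code>dataset</code> stands for the imagery/label intersection built below and <code>scene</code> for a full Sentinel-2 scene at inference time):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">from torch.utils.data import DataLoader
from torchgeo.datasets import stack_samples
from torchgeo.samplers import GridGeoSampler, RandomGeoSampler

# Training: random geographic chips from the imagery/label intersection
train_sampler = RandomGeoSampler(dataset, size=512, length=1000)
train_loader = DataLoader(dataset, sampler=train_sampler, batch_size=8,
                          collate_fn=stack_samples)

# Inference: an overlapping grid over a full scene
test_sampler = GridGeoSampler(scene, size=512, stride=480)
test_loader = DataLoader(scene, sampler=test_sampler, batch_size=8,
                         collate_fn=stack_samples)
</code></pre></div>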
</section>
<section id="the-earth-surface-water-dataset" class="level2">
<h2 class="anchored" data-anchor-id="the-earth-surface-water-dataset">The Earth Surface Water dataset</h2>
<p>The <a href="https://zenodo.org/records/5205674">Earth Surface Water dataset</a> contains Sentinel-2 patches paired with binary water masks from diverse geographic regions. It’s a good fit for a tutorial because it’s small enough to train on quickly but realistic enough to show the full complexity of an EO workflow: patches span multiple UTM zones, the labels are raster masks in separate files, and the task (water vs.&nbsp;non-water) is easy to interpret visually.</p>
</section>
<section id="pairing-imagery-and-labels-across-utm-zones" class="level2">
<h2 class="anchored" data-anchor-id="pairing-imagery-and-labels-across-utm-zones">Pairing imagery and labels across UTM zones</h2>
<p>The tutorial constructs paired <code>RasterDataset</code> objects for imagery and masks, then combines them with TorchGeo’s intersection operator:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> torchgeo.datasets <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> RasterDataset</span>
<span id="cb1-2"></span>
<span id="cb1-3">images <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> RasterDataset(paths<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>image_dir, crs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"EPSG:3395"</span>, res<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, transforms<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>scale)</span>
<span id="cb1-4">masks <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> RasterDataset(paths<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>mask_dir, crs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"EPSG:3395"</span>, res<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb1-5">masks.is_image <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># use nearest-neighbor resampling for discrete labels</span></span>
<span id="cb1-6">dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> images <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> masks</span></code></pre></div></div>
<p>Because the patches are distributed globally (often falling in different UTM zones), the notebook specifies a global CRS (World Mercator, EPSG:3395) so that all samples are consistently aligned during sampling and loading.</p>
</section>
<section id="from-6-bands-to-9-channels-with-spectral-indices" class="level2">
<h2 class="anchored" data-anchor-id="from-6-bands-to-9-channels-with-spectral-indices">From 6 bands to 9 channels with spectral indices</h2>
<p>Satellite data typically has more than three bands, which breaks standard vision preprocessing pipelines. The Earth Surface Water tutorial uses six Sentinel-2 bands — B02 (blue), B03 (green), B04 (red), B08 (NIR) at 10 m resolution, plus B11 and B12 (SWIR) at 20 m. Raw Sentinel-2 digital numbers are divided by 10,000 to convert to surface reflectance (a small detail that’s easy to forget and will silently wreck your training if you skip it).</p>
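<p>The <code>scale</code> transform passed to the imagery <code>RasterDataset</code> in the previous section is essentially this division; a sketch (the tutorial’s exact function may differ slightly):</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def scale(sample):
    """Convert Sentinel-2 digital numbers to surface reflectance in [0, 1]."""
    sample["image"] = sample["image"].float() / 10000.0
    return sample
</code></pre></div>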
<p>From those 6 reflectance bands, the notebook computes three spectral indices using TorchGeo’s built-in transforms: NDWI (Normalized Difference Water Index, using green and NIR), MNDWI (Modified NDWI, using green and SWIR2), and NDVI (Normalized Difference Vegetation Index). The full preprocessing pipeline chains index computation and normalization in a single <code>Sequential</code>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> kornia.augmentation <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> K</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> torchgeo.transforms <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> indices</span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compute mean/std over training images for z-score normalization,</span></span>
<span id="cb2-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># then pad with 0s/1s so the 3 index channels pass through unchanged</span></span>
<span id="cb2-6">mean <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.concatenate([band_mean, [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]])</span>
<span id="cb2-7">std <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.concatenate([band_std, [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]])</span>
<span id="cb2-8"></span>
<span id="cb2-9">tfms <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.nn.Sequential(</span>
<span id="cb2-10">    indices.AppendNDWI(index_green<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, index_nir<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>),   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># NDWI: (Green - NIR) / (Green + NIR)</span></span>
<span id="cb2-11">    indices.AppendNDWI(index_green<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, index_nir<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>),   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># MNDWI: (Green - SWIR2) / (Green + SWIR2)</span></span>
<span id="cb2-12">    indices.AppendNDVI(index_nir<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, index_red<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>),      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># NDVI: (NIR - Red) / (NIR + Red)</span></span>
<span id="cb2-13">    K.Normalize(mean<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>mean, std<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>std),</span>
<span id="cb2-14">)</span>
<span id="cb2-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Input: 6 bands,  Output: 9 channels (6 normalized bands + 3 indices)</span></span></code></pre></div></div>
<p>We pad the mean/std vectors with <code>[0, 0, 0]</code> and <code>[1, 1, 1]</code>, so that z-score normalization becomes a no-op for the index channels, which are already bounded in [-1, 1] by construction.</p>
</section>
<section id="adapting-an-rgb-architecture-to-9-channels" class="level2">
<h2 class="anchored" data-anchor-id="adapting-an-rgb-architecture-to-9-channels">Adapting an RGB architecture to 9 channels</h2>
<p>The model is a DeepLabV3 with a ResNet-50 backbone from torchvision, trained from scratch — ImageNet-pretrained weights expect 3-channel RGB input, so they’re not useful here. The key adaptation is reinitializing the first convolutional layer to accept our 9 input channels:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> torchvision.models.segmentation <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> deeplabv3_resnet50</span>
<span id="cb3-2"></span>
<span id="cb3-3">model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> deeplabv3_resnet50(weights<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>, num_classes<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb3-4">backbone <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> model.get_submodule(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"backbone"</span>)</span>
<span id="cb3-5">conv <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> torch.nn.Conv2d(</span>
<span id="cb3-6">    in_channels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span>, out_channels<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">64</span>,</span>
<span id="cb3-7">    kernel_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>), stride<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), padding<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>), bias<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,</span>
<span id="cb3-8">)</span>
<span id="cb3-9">backbone.register_module(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"conv1"</span>, conv)</span></code></pre></div></div>
<p>The dataset ships with a pre-defined, geographically separated train/validation split — important for avoiding the over-optimistic metrics that spatial autocorrelation can cause in EO. Within each split, <code>RandomGeoSampler</code> draws 512x512 chips in geographic coordinate space, handling CRS alignment and resolution matching automatically. After 10 epochs with Adam (lr=1e-4, weight_decay=0.01) and a batch size of 4, the model reaches <strong>0.977 overall accuracy</strong> and <strong>0.824 IoU</strong> on the validation set. Training takes a few minutes on a single GPU.</p>
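<p>For readers who want to see the moving parts together, here is a rough sketch of that training setup (the dataset name, mask key, and loop structure are illustrative assumptions, not the notebook's exact code):</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">import torch
from torch.utils.data import DataLoader
from torchgeo.datasets import stack_samples
from torchgeo.samplers import RandomGeoSampler

# Illustrative training loop; assumes `train_dataset`, `model`, and `tfms`
# from the cells above, and omits device placement and metric tracking.
train_sampler = RandomGeoSampler(train_dataset, size=512, length=1000)
train_loader = DataLoader(
    train_dataset, sampler=train_sampler, batch_size=4, collate_fn=stack_samples
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for batch in train_loader:
        x = tfms(batch["image"].float())      # 6 bands in, 9 channels out
        y = batch["mask"].squeeze(1).long()   # (B, H, W) class indices
        loss = loss_fn(model(x)["out"], y)    # DeepLabV3 returns a dict of outputs
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()</code></pre></div>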
</section>
<section id="inference-on-a-sentinel-2-scene" class="level2">
<h2 class="anchored" data-anchor-id="inference-on-a-sentinel-2-scene">Inference on a Sentinel-2 scene</h2>
<p>This is the part of the tutorial where the model stops being a number on a leaderboard and starts being a useful tool! After training, the notebook downloads a Sentinel-2 scene over Rio de Janeiro, Brazil from the <a href="https://planetarycomputer.microsoft.com/">Microsoft Planetary Computer</a>, runs gridded inference across the entire tile, and finally saves the resulting predictions as a georeferenced GeoTIFF.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> torchgeo.datasets <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> Sentinel2</span>
<span id="cb4-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> torchgeo.samplers <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> GridGeoSampler</span>
<span id="cb4-3"></span>
<span id="cb4-4">s2_dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Sentinel2(paths<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>scene_dir, bands<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>bands, res<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, transforms<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>scale)</span>
<span id="cb4-5">grid_sampler <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> GridGeoSampler(s2_dataset, size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>, stride<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">448</span>, units<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>Units.PIXELS)</span>
<span id="cb4-6">s2_dataloader <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> DataLoader(</span>
<span id="cb4-7">    s2_dataset, sampler<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>grid_sampler, batch_size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>, collate_fn<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>stack_samples</span>
<span id="cb4-8">)</span></code></pre></div></div>
<p>The <code>GridGeoSampler</code> tiles the scene into overlapping 512x512 patches (stride=448, so adjacent patches overlap by 64 pixels along each shared edge), and the per-patch predictions are stitched back into a single full-scene array.</p>
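<p>As an illustration of what that stitching involves, a loop along these lines would work (a sketch only: the variable names, the <code>"bounds"</code> sample key, and the bounding-box handling are assumptions, not the notebook's exact code):</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">import numpy as np
import torch
from rasterio.transform import rowcol

# Write each patch's argmax prediction into a full-scene array at the pixel
# offset implied by its bounding box. Assumes `model`, `tfms`, the scene
# `transform`, and the image dimensions from the cells above.
prediction = np.zeros((img_height, img_width), dtype=np.uint8)

model.eval()
with torch.no_grad():
    for batch in s2_dataloader:
        x = tfms(batch["image"].float())
        preds = model(x)["out"].argmax(dim=1).cpu().numpy()     # (B, 512, 512)
        for patch, bbox in zip(preds, batch["bounds"]):
            row, col = rowcol(transform, bbox.minx, bbox.maxy)  # top-left pixel
            prediction[row:row + patch.shape[0], col:col + patch.shape[1]] = patch</code></pre></div>
<p>The stitched prediction is then saved as a GeoTIFF — tiled, compressed, with overviews — that is pixel-aligned with the input scene:</p>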
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rasterio <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> rio</span>
<span id="cb5-2"></span>
<span id="cb5-3">profile <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {</span>
<span id="cb5-4">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"driver"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"GTiff"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dtype"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"uint8"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"count"</span>: <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb5-5">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"width"</span>: img_width, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"height"</span>: img_height,</span>
<span id="cb5-6">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"crs"</span>: crs, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"transform"</span>: transform,</span>
<span id="cb5-7">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"compress"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"deflate"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"tiled"</span>: <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,</span>
<span id="cb5-8">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blockxsize"</span>: <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blockysize"</span>: <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">512</span>,</span>
<span id="cb5-9">}</span>
<span id="cb5-10"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">with</span> rio.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">open</span>(output_path, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"w"</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>profile) <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> dst:</span>
<span id="cb5-11">    dst.write(prediction, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb5-12">    dst.build_overviews([<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>], rio.enums.Resampling.nearest)</span></code></pre></div></div>
<p>The result is a georeferenced water mask that you can open in QGIS, load into a GIS pipeline, or overlay on the original scene.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://geospatialml.com/posts/torchgeo-iclr-tutorial/rio_sentinel2.png" class="img-fluid figure-img" style="width:90.0%"></p>
<figcaption>Sentinel-2 true-color composite of Rio de Janeiro</figcaption>
</figure>
</div>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://geospatialml.com/posts/torchgeo-iclr-tutorial/rio_prediction.png" class="img-fluid figure-img" style="width:90.0%"></p>
<figcaption>Water segmentation predictions (blue) on the same scene</figcaption>
</figure>
</div>
<p>This step bridges the gap between “model that scores well on a test set” and “model that produces a useful geospatial product.” It also lets you explore the model’s behavior beyond aggregate metrics: How sharp are the predictions along coastlines? What’s the smallest water feature it can detect? Where does it fail?</p>
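<p>A quick way to start answering those questions is to read the mask back and summarize it. A minimal sketch (it assumes water is encoded as class 1 in the saved mask):</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode python"><code class="sourceCode python">import rasterio as rio

# Read the saved mask and report how much of the scene was classified as water.
with rio.open(output_path) as src:
    mask = src.read(1)
    pixel_area_m2 = abs(src.res[0] * src.res[1])  # 10 m x 10 m Sentinel-2 pixels

water_pixels = int((mask == 1).sum())
print(f"water fraction: {water_pixels / mask.size:.1%}")
print(f"water area: {water_pixels * pixel_area_m2 / 1e6:.0f} km^2")</code></pre></div>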
</section>
<section id="try-it-yourself" class="level2">
<h2 class="anchored" data-anchor-id="try-it-yourself">Try it yourself</h2>
<p>The tutorial is distributed as two executable notebooks, and all you need is a machine with a GPU (a Colab T4 works fine):</p>
<ul>
<li><a href="https://torchgeo.readthedocs.io/en/stable/tutorials/torchgeo.html">Introduction to TorchGeo</a> — core abstractions (dataset composition, spatiotemporal indexing, geographic samplers)</li>
<li><a href="https://torchgeo.readthedocs.io/en/stable/tutorials/earth_surface_water.html">Earth Surface Water</a> — the end-to-end case study described in this post</li>
</ul>
<p>For more detail on the design choices and motivation, see our <a href="https://arxiv.org/abs/2603.02386">ICLR 2026 ML4RS Workshop paper</a>. The tutorial also builds on Mauricio Cordeiro’s <a href="https://medium.com/towards-data-science/artificial-intelligence-for-geospatial-analysis-with-pytorchs-torchgeo-part-1-52d17e409f09">3-part Medium series</a> on geospatial analysis with TorchGeo. If you have questions or want to discuss, come find us in the <a href="https://torchgeo.slack.com/join/shared_invite/zt-22rse667m-eqtCeNW0yI000Tl4B~2PIw">TorchGeo Slack</a>.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{robinson2026,
  author = {Robinson, Caleb and Corley, Isaac},
  title = {Training a {Water} {Segmentation} {Model} with {TorchGeo}},
  date = {2026-03-02},
  url = {https://geospatialml.com/posts/torchgeo-iclr-tutorial/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-robinson2026" class="csl-entry quarto-appendix-citeas">
Robinson, Caleb, and Isaac Corley. 2026. <span>“Training a Water
Segmentation Model with TorchGeo.”</span> March 2. <a href="https://geospatialml.com/posts/torchgeo-iclr-tutorial/">https://geospatialml.com/posts/torchgeo-iclr-tutorial/</a>.
</div></div></section></div> ]]></description>
  <category>torchgeo</category>
  <category>tutorial</category>
  <category>semantic-segmentation</category>
  <category>sentinel-2</category>
  <category>iclr</category>
  <guid>https://geospatialml.com/posts/torchgeo-iclr-tutorial/</guid>
  <pubDate>Mon, 02 Mar 2026 00:00:00 GMT</pubDate>
  <media:content url="https://geospatialml.com/posts/torchgeo-iclr-tutorial/rio_sentinel2.png" medium="image" type="image/png" height="69" width="144"/>
</item>
<item>
  <title>Welcome to GeoSpatial ML</title>
  <dc:creator>Caleb Robinson</dc:creator>
  <dc:creator>Isaac Corley</dc:creator>
  <link>https://geospatialml.com/posts/welcome/</link>
  <description><![CDATA[ 





<p>Welcome to <strong>GeoSpatial ML</strong> — a place to share what we’re exploring, building, and reading at the intersection of geospatial data and machine learning.</p>
<p>Many of us already swap papers, datasets, and half-baked experiments in the <a href="https://torchgeo.slack.com/join/shared_invite/zt-22rse667m-eqtCeNW0yI000Tl4B~2PIw">TorchGeo Slack</a>. This blog is an extension of those conversations — a more permanent home for the things we find interesting each week.</p>
<section id="what-to-expect" class="level2">
<h2 class="anchored" data-anchor-id="what-to-expect">What to expect</h2>
<ul>
<li><strong>Paper highlights</strong> — summaries and takes on new GeoAI / GeoML research we’re reading</li>
<li><strong>Code demos</strong> — small, reproducible experiments with <a href="https://github.com/microsoft/torchgeo">TorchGeo</a> and the broader geospatial ML ecosystem</li>
<li><strong>New models &amp; datasets</strong> — quick tours of recently released foundation models, benchmarks, and datasets worth trying</li>
<li><strong>Geospatial explorations</strong> — anything from satellite imagery tricks to fun visualizations to workflow tips</li>
</ul>
<p>Posts will be short and practical. If something is interesting enough to share in Slack, it’s interesting enough to write up here.</p>
<p>Stay tuned, and come hang out in <a href="https://torchgeo.slack.com/join/shared_invite/zt-22rse667m-eqtCeNW0yI000Tl4B~2PIw">TorchGeo Slack</a> if you haven’t already.</p>


</section>

 ]]></description>
  <category>meta</category>
  <guid>https://geospatialml.com/posts/welcome/</guid>
  <pubDate>Fri, 27 Feb 2026 00:00:00 GMT</pubDate>
</item>
</channel>
</rss>
