SoilStats demo

With soilstats, you can easily retrieve datasets from the SoilGrids REST API, and use them to calculate soil properties. This notebook shows how to do this.

Set up a collection

In this example, we want to collect 50 data points from within the grid (56.225297, 8.662215), (55.958103, 9.354390). In other words: the latitude boundaries are 55.958103 and 56.225297, and the longitude boundaries are 8.662215 and 9.354390.

The data we want to collect is for clay, sand, silt, and ocs, and we want to collect it for the top layers of the soil (0-30cm). There are various depths available in the SoilGrids API that meet that range: 0-5cm, 5-15cm, 15-30cm, and 0-30cm. The value we are interested in is the mean.

To set up the collection, we use the SoilCollect class from soilstats:

[8]:
from soilstats import SoilCollect as sc

# Create a SoilCollect object
collect = sc(
   lat_bounds = [56.225297, 55.958103],
   lon_bounds = [8.662215, 9.354390],
   properties = ['clay', 'sand', 'silt', 'ocs'],
   depths = ['0-5cm', '5-15cm', '15-30cm', '0-30cm'],
   values = 'mean',
   n = 50
)

This setup prepares the collection. We can manually verify the setup by looking at the URLs for each data point:

[9]:
[point.url for point in collect.soildatapoints[:10]]
[9]:
['https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.26880426534182&lat=56.04566881134826&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.008384609307548&lat=56.08287481810572&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.027193413968114&lat=56.16974421031949&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.0528787153327&lat=55.98670043562756&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=8.957115353878395&lat=55.95960560408361&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=8.693895094893287&lat=56.140225173641056&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=8.683851659390681&lat=56.21412249681987&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=8.863307945787344&lat=56.06065809848109&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=8.915427625733173&lat=55.965385739588434&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
 'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.023234399447304&lat=56.17977586161952&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean']

To make the call to the API, and retrieve the data, we use the get_data() method:

[ ]:
df = collect.get_data()

For several reasons, the API may not return data for a particular data point. In this case, we can add more data points to the collection, to make it complete. Afterwards, we can call get_data() again to make sure data from the new points is also retrieved:

[ ]:
collect.add_points(n=10)
df = collect.get_data()

Note that get_data() can be used to generate a dataframe, but also stores the data in the object collect. You can retrieve it with collect.df.

Calculate soil properties

We want to check for each point what the dominant soil type is. To do this, we check for the three properties sand, clay, and silt, which has the highest value for each point.

[23]:
top = collect.top_property(['clay', 'sand', 'silt'])

top
Data from 41 points. 9 out of 50 locations returned no data.
Run .add_points(n) to add more points to the SoilCollect object.
[23]:
lat lon units property values.mean
0 55.959606 8.957115 g/kg sand 884.0
1 55.965386 8.915428 g/kg sand 844.0
2 55.965386 8.933367 g/kg sand 843.0
3 55.986700 9.052879 g/kg sand 864.0
4 55.987445 9.331280 g/kg sand 783.0
5 55.995999 8.752333 g/kg sand 853.0
6 56.001327 9.216497 g/kg sand 856.0
7 56.022299 8.804207 g/kg sand 770.0
8 56.037555 8.891811 g/kg sand 823.0
9 56.043104 8.666003 g/kg sand 827.0
10 56.045669 9.268804 g/kg sand 849.0
11 56.047958 9.314837 g/kg sand 822.0
12 56.056133 9.218608 g/kg sand 847.0
13 56.060658 8.863308 g/kg sand 638.0
14 56.061782 9.333592 g/kg sand 813.0
15 56.069915 8.995612 g/kg sand 871.0
16 56.074669 8.940235 g/kg sand 867.0
17 56.077143 9.226066 g/kg sand 817.0
18 56.081898 8.954777 g/kg sand 861.0
19 56.082875 9.008385 g/kg sand 844.0
20 56.096027 9.278650 g/kg sand 810.0
21 56.106124 8.913891 g/kg sand 650.0
22 56.116758 8.938777 g/kg sand 742.0
23 56.140225 8.693895 g/kg sand 839.0
24 56.155668 8.748060 g/kg sand 858.0
25 56.158798 9.161999 g/kg sand 810.0
26 56.159170 8.870258 g/kg sand 830.0
27 56.165322 8.887687 g/kg sand 844.0
28 56.168373 9.008047 g/kg sand 883.0
29 56.169744 9.027193 g/kg sand 883.0
30 56.178598 9.309166 g/kg sand 756.0
31 56.179776 9.023234 g/kg sand 851.0
32 56.180252 8.770693 g/kg sand 784.0
33 56.187660 9.347266 g/kg sand 741.0
34 56.195798 9.289321 g/kg sand 787.0
35 56.202788 9.140720 g/kg sand 855.0
36 56.214122 8.683852 g/kg sand 787.0
37 56.214425 8.847385 g/kg sand 815.0
38 56.217701 9.247869 g/kg sand 818.0

Correlate soil properties

To investigate how soil properties correlate to each other, we set up a linear regression model.

The formula we use is clay + sand + silt ~ ocs. We use the regression method to set up the model.

[24]:
model = collect.regression(formula = "clay + sand + silt ~ ocs")
Data from 41 points. 9 out of 50 locations returned no data.
Run .add_points(n) to add more points to the SoilCollect object.

Call:
lm(formula = formula, data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-14.859  -4.236  -1.505   4.303  28.218

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.016e+03  1.728e+01  58.785   <2e-16 ***
ocs         9.228e-02  2.842e-01   0.325    0.747
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.309 on 37 degrees of freedom
Multiple R-squared:  0.002842,  Adjusted R-squared:  -0.02411
F-statistic: 0.1055 on 1 and 37 DF,  p-value: 0.7472

Running the model should print summary statistics to the screen. However, the main statistics are also stored in the model’s stats attribute:

[25]:
model.stats
[25]:
{'r_squared': 0.002842255479948067,
 'intercept': 1015.9681202015358,
 'slope': 0.09228047024951902}