SoilStats demo
With soilstats, you can easily retrieve datasets from the SoilGrids REST API, and use them to calculate soil properties. This notebook shows how to do this.
Set up a collection
In this example, we want to collect 50 data points from within the grid (56.225297, 8.662215), (55.958103, 9.354390). In other words: the latitude boundaries are 55.958103 and 56.225297, and the longitude boundaries are 8.662215 and 9.354390.
The data we want to collect is for clay, sand, silt, and ocs, and we want to collect it for the top layers of the soil (0-30cm). There are various depths available in the SoilGrids API that meet that range: 0-5cm, 5-15cm, 15-30cm, and 0-30cm. The value we are interested in is the mean.
To set up the collection, we use the SoilCollect class from soilstats:
[8]:
from soilstats import SoilCollect as sc
# Create a SoilCollect object
collect = sc(
lat_bounds = [56.225297, 55.958103],
lon_bounds = [8.662215, 9.354390],
properties = ['clay', 'sand', 'silt', 'ocs'],
depths = ['0-5cm', '5-15cm', '15-30cm', '0-30cm'],
values = 'mean',
n = 50
)
This setup prepares the collection. We can manually verify the setup by looking at the URLs for each data point:
[9]:
[point.url for point in collect.soildatapoints[:10]]
[9]:
['https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.26880426534182&lat=56.04566881134826&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.008384609307548&lat=56.08287481810572&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.027193413968114&lat=56.16974421031949&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.0528787153327&lat=55.98670043562756&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=8.957115353878395&lat=55.95960560408361&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=8.693895094893287&lat=56.140225173641056&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=8.683851659390681&lat=56.21412249681987&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=8.863307945787344&lat=56.06065809848109&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=8.915427625733173&lat=55.965385739588434&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean',
'https://rest.isric.org/soilgrids/v2.0/properties/query?lon=9.023234399447304&lat=56.17977586161952&property=clay&property=sand&property=silt&property=ocs&depth=0-5cm&depth=5-15cm&depth=15-30cm&depth=0-30cm&value=mean']
To make the call to the API, and retrieve the data, we use the get_data() method:
[ ]:
df = collect.get_data()
For several reasons, the API may not return data for a particular data point. In this case, we can add more data points to the collection, to make it complete. Afterwards, we can call get_data() again to make sure data from the new points is also retrieved:
[ ]:
collect.add_points(n=10)
df = collect.get_data()
Note that get_data() can be used to generate a dataframe, but also stores the data in the object collect. You can retrieve it with collect.df.
Calculate soil properties
We want to check for each point what the dominant soil type is. To do this, we check for the three properties sand, clay, and silt, which has the highest value for each point.
[23]:
top = collect.top_property(['clay', 'sand', 'silt'])
top
Data from 41 points. 9 out of 50 locations returned no data.
Run .add_points(n) to add more points to the SoilCollect object.
[23]:
| lat | lon | units | property | values.mean | |
|---|---|---|---|---|---|
| 0 | 55.959606 | 8.957115 | g/kg | sand | 884.0 |
| 1 | 55.965386 | 8.915428 | g/kg | sand | 844.0 |
| 2 | 55.965386 | 8.933367 | g/kg | sand | 843.0 |
| 3 | 55.986700 | 9.052879 | g/kg | sand | 864.0 |
| 4 | 55.987445 | 9.331280 | g/kg | sand | 783.0 |
| 5 | 55.995999 | 8.752333 | g/kg | sand | 853.0 |
| 6 | 56.001327 | 9.216497 | g/kg | sand | 856.0 |
| 7 | 56.022299 | 8.804207 | g/kg | sand | 770.0 |
| 8 | 56.037555 | 8.891811 | g/kg | sand | 823.0 |
| 9 | 56.043104 | 8.666003 | g/kg | sand | 827.0 |
| 10 | 56.045669 | 9.268804 | g/kg | sand | 849.0 |
| 11 | 56.047958 | 9.314837 | g/kg | sand | 822.0 |
| 12 | 56.056133 | 9.218608 | g/kg | sand | 847.0 |
| 13 | 56.060658 | 8.863308 | g/kg | sand | 638.0 |
| 14 | 56.061782 | 9.333592 | g/kg | sand | 813.0 |
| 15 | 56.069915 | 8.995612 | g/kg | sand | 871.0 |
| 16 | 56.074669 | 8.940235 | g/kg | sand | 867.0 |
| 17 | 56.077143 | 9.226066 | g/kg | sand | 817.0 |
| 18 | 56.081898 | 8.954777 | g/kg | sand | 861.0 |
| 19 | 56.082875 | 9.008385 | g/kg | sand | 844.0 |
| 20 | 56.096027 | 9.278650 | g/kg | sand | 810.0 |
| 21 | 56.106124 | 8.913891 | g/kg | sand | 650.0 |
| 22 | 56.116758 | 8.938777 | g/kg | sand | 742.0 |
| 23 | 56.140225 | 8.693895 | g/kg | sand | 839.0 |
| 24 | 56.155668 | 8.748060 | g/kg | sand | 858.0 |
| 25 | 56.158798 | 9.161999 | g/kg | sand | 810.0 |
| 26 | 56.159170 | 8.870258 | g/kg | sand | 830.0 |
| 27 | 56.165322 | 8.887687 | g/kg | sand | 844.0 |
| 28 | 56.168373 | 9.008047 | g/kg | sand | 883.0 |
| 29 | 56.169744 | 9.027193 | g/kg | sand | 883.0 |
| 30 | 56.178598 | 9.309166 | g/kg | sand | 756.0 |
| 31 | 56.179776 | 9.023234 | g/kg | sand | 851.0 |
| 32 | 56.180252 | 8.770693 | g/kg | sand | 784.0 |
| 33 | 56.187660 | 9.347266 | g/kg | sand | 741.0 |
| 34 | 56.195798 | 9.289321 | g/kg | sand | 787.0 |
| 35 | 56.202788 | 9.140720 | g/kg | sand | 855.0 |
| 36 | 56.214122 | 8.683852 | g/kg | sand | 787.0 |
| 37 | 56.214425 | 8.847385 | g/kg | sand | 815.0 |
| 38 | 56.217701 | 9.247869 | g/kg | sand | 818.0 |
Correlate soil properties
To investigate how soil properties correlate to each other, we set up a linear regression model.
The formula we use is clay + sand + silt ~ ocs. We use the regression method to set up the model.
[24]:
model = collect.regression(formula = "clay + sand + silt ~ ocs")
Data from 41 points. 9 out of 50 locations returned no data.
Run .add_points(n) to add more points to the SoilCollect object.
Call:
lm(formula = formula, data = df)
Residuals:
Min 1Q Median 3Q Max
-14.859 -4.236 -1.505 4.303 28.218
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.016e+03 1.728e+01 58.785 <2e-16 ***
ocs 9.228e-02 2.842e-01 0.325 0.747
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.309 on 37 degrees of freedom
Multiple R-squared: 0.002842, Adjusted R-squared: -0.02411
F-statistic: 0.1055 on 1 and 37 DF, p-value: 0.7472
Running the model should print summary statistics to the screen. However, the main statistics are also stored in the model’s stats attribute:
[25]:
model.stats
[25]:
{'r_squared': 0.002842255479948067,
'intercept': 1015.9681202015358,
'slope': 0.09228047024951902}