Использование xarray для получения среднемесячного значения
Я работаю с GLDAS
данные реанализа, период 1 year
, Файлы .nc4
, Я могу открыть файлы, но я не могу дать groupby
за 1 month
, Я не хочу делать руку или для, и я обнаружил, что xarray
делает groupby
, Мой сценарий:
In[16]:import xarray as xr
In[17]:gldas = xr.open_mfdataset('./GLDAS_2010/*.nc4', chunks=None, concat_dim='time', preprocess=None, engine='netcdf4', lock=None,compat='minimal',coords='minimal',data_vars='minimal')
In[18]: gldas
Out[18]:
<xarray.Dataset>
Dimensions: (bnds: 2, lat: 40, lon: 48, time: 365)
Coordinates:
* lat (lat) float32 -34.875 -34.625 -34.375 -34.125 ...
* lon (lon) float32 -59.875 -59.625 -59.375 -59.125 ...
* time (time) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
Dimensions without coordinates: bnds
Data variables:
Albedo_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
AvgSurfT_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
CanopInt_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
ECanop_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
ESoil_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Evap_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
LWdown_f_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Lwnet_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
PotEvap_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Psurf_f_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Qair_f_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Qg_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Qh_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Qle_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Qs_acc (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Qsb_acc (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Qsm_acc (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Rainf_f_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Rainf_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
RootMoist_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
SWE_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
SWdown_f_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
SnowDepth_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Snowf_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
SoilMoi0_10cm_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
SoilMoi100_200cm_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
SoilMoi40_100cm_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
SoilTMP0_10cm_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
SoilTMP100_200cm_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
SoilTMP10_40cm_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
SoilTMP40_100cm_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Swnet_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Tair_f_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Tveg_tavg (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
Wind_f_inst (time, lat, lon) float32 dask.array<shape=(365, 40, 48), chunksize=(1, 40, 48)>
time_bnds (time, bnds) float64 dask.array<shape=(365, 2), chunksize=(1, 2)>
Attributes:
CDI: Climate Data Interface version 1.6.9 (http://mpim...
Conventions: CF-1.4
history: Mon Feb 5 20:07:25 2018: /usr/bin/ncks -O -L 1 /...
source: Noah_v3.3
institution: NASA GSFC
missing_value: -9999.0
tavg definision:: past 3-hour average
acc definision:: past 3-hour accumulation
inst definision:: instantaneous
title: GLDAS2.1 LIS land surface model output
references: Rodell_etal_BAMS_2004, Kumar_etal_EMS_2006, Peter...
conventions: CF-1.6
comment: website: http://ldas.gsfc.nasa.gov/gldas, http://...
MAP_PROJECTION: EQUIDISTANT CYLINDRICAL
SOUTH_WEST_CORNER_LAT: -59.875
SOUTH_WEST_CORNER_LON: -179.875
DX: 0.25
DY: 0.25
CDO: Climate Data Operators version 1.6.9 (http://mpim...
NCO: 20180205
когда я пытаюсь:
In[19]:pp.set_index('time').groupby(freq='1M').mean('time')
выдает эту ошибку:
TypeError: groupby() got an unexpected keyword argument 'freq'
хорошо, зная, что есть этот аргумент, я пойду к следующей попытке:
In[20]:import pandas as pd
In[21]:pp.groupby(['Albedo_inst',pd.Grouper(key='time', freq='M')]).mean()
Ошибка:
TypeError: `group` must be an xarray.DataArray or the name of an xarray variable or dimension
Последняя попытка была без частоты:
In[22]:pp.groupby('Albedo_inst').mean('time')
И ошибка:
ValueError: Level values must be unique: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] on level 0
Я попробую:
gldas.resample('1MS', dim='time', how='mean')
ошибка:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Float64Index'
Я попробую:
gldas.groupby('time.month').mean('time')
ошибка:
AttributeError: 'IndexVariable' object has no attribute 'month'
Кто-нибудь поможет мне? Спасибо, парни!
Редактировать:
In [24]: pp['time']
Out[24]:
<xarray.DataArray 'time' (time: 365)>
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
* time (time) float64 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
Attributes:
standard_name: time
bounds: time_bnds
units: day as %Y%m%d.%f
calendar: proleptic_gregorian
РЕДАКТИРОВАТЬ:
Если я открою файл:
In[108]:gldas
Out[108]:
<xarray.Dataset>
Dimensions: (bnds: 2, lat: 40, lon: 48, time: 1)
Coordinates:
* lat (lat) float32 -34.875 -34.625 -34.375 -34.125 ...
* lon (lon) float32 -59.875 -59.625 -59.375 -59.125 ...
* time (time) float64 0.0
* bnds (bnds) int64 0 1
Data variables:
Albedo_inst (time, lat, lon) float64 15.66 16.0 16.23 15.98 ...
AvgSurfT_inst (time, lat, lon) float64 302.8 302.3 302.3 302.4 ...
CanopInt_inst (time, lat, lon) float64 0.001 0.0 0.0 0.0 0.0 ...
ECanop_tavg (time, lat, lon) float64 2.48 0.89 0.16 0.0 0.02 ...
ESoil_tavg (time, lat, lon) float64 46.39 46.58 49.18 55.22 ...
Evap_tavg (time, lat, lon) float64 0.0001668 0.0001683 ...
LWdown_f_tavg (time, lat, lon) float64 369.5 368.3 366.8 365.6 ...
Lwnet_tavg (time, lat, lon) float64 -88.05 -87.24 -87.68 ...
PotEvap_tavg (time, lat, lon) float64 622.3 610.4 601.1 599.6 ...
Psurf_f_inst (time, lat, lon) float64 1.004e+05 1.006e+05 ...
Qair_f_inst (time, lat, lon) float64 0.01117 0.01144 0.01172 ...
Qg_tavg (time, lat, lon) float64 54.2 53.65 55.13 58.52 ...
Qh_tavg (time, lat, lon) float64 110.4 101.9 104.5 114.7 ...
Qle_tavg (time, lat, lon) float64 417.2 420.8 416.1 405.6 ...
Qs_acc (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 ...
Qsb_acc (time, lat, lon) float64 0.00253 0.00335 0.00528 ...
Qsm_acc (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 ...
Rainf_f_tavg (time, lat, lon) float64 1.4e-06 5e-07 1e-07 0.0 ...
Rainf_tavg (time, lat, lon) float64 1.4e-06 5e-07 1e-07 0.0 ...
RootMoist_inst (time, lat, lon) float64 255.3 259.6 263.9 266.1 ...
SWE_inst (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 ...
SWdown_f_tavg (time, lat, lon) float64 793.9 789.7 791.5 794.6 ...
SnowDepth_inst (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 ...
Snowf_tavg (time, lat, lon) float64 0.0 0.0 0.0 0.0 0.0 0.0 ...
SoilMoi0_10cm_inst (time, lat, lon) float64 24.86 25.17 25.47 25.51 ...
SoilMoi100_200cm_inst (time, lat, lon) float64 254.4 258.8 266.1 274.1 ...
SoilMoi40_100cm_inst (time, lat, lon) float64 153.7 156.7 159.8 161.6 ...
SoilTMP0_10cm_inst (time, lat, lon) float64 293.8 293.5 293.4 293.7 ...
SoilTMP100_200cm_inst (time, lat, lon) float64 291.3 291.0 290.9 290.9 ...
SoilTMP10_40cm_inst (time, lat, lon) float64 292.4 292.2 292.1 292.2 ...
SoilTMP40_100cm_inst (time, lat, lon) float64 292.2 292.0 291.8 291.9 ...
Swnet_tavg (time, lat, lon) float64 669.5 663.3 663.0 667.7 ...
Tair_f_inst (time, lat, lon) float64 300.2 300.0 299.8 299.5 ...
Tveg_tavg (time, lat, lon) float64 368.3 373.3 366.8 350.4 ...
Wind_f_inst (time, lat, lon) float64 3.704 3.704 3.704 3.704 ...
time_bnds (time, bnds) float64 0.0 0.0
Attributes:
CDI: Climate Data Interface version 1.6.9 (http://mpimet.mpg.de/cdi)
Conventions: CF-1.4
history: Mon Feb 5 20:07:25 2018: /usr/bin/ncks -O -L 1 /tmpdata/regridder/services_88661/cdoGLDAS_NOAH025_3H.A20100101.1500.021.SUB.nc4 /tmpdata/regridder/services_88661/deflatecdoGLDAS_NOAH025_3H.A20100101.1500.021.SUB.nc4
Mon Feb 05 20:07:21 2018: cdo -s -L -f nc4 -sellonlatbox,-60.0,-48.0,-35.0,-25.0 -selname,Albedo_inst,AvgSurfT_inst,CanopInt_inst,ECanop_tavg,ESoil_tavg,Evap_tavg,LWdown_f_tavg,Lwnet_tavg,PotEvap_tavg,Psurf_f_inst,Qair_f_inst,Qg_tavg,Qh_tavg,Qle_tavg,Qs_acc,Qsb_acc,Qsm_acc,Rainf_...
source: Noah_v3.3
institution: NASA GSFC
missing_value: -9999.0
tavg definision:: past 3-hour average
acc definision:: past 3-hour accumulation
inst definision:: instantaneous
title: GLDAS2.1 LIS land surface model output
references: Rodell_etal_BAMS_2004, Kumar_etal_EMS_2006, Peters-Lidard_etal_ISSE_2007
conventions: CF-1.6
comment: website: http://ldas.gsfc.nasa.gov/gldas, http://lis.gsfc.nasa.gov/
MAP_PROJECTION: EQUIDISTANT CYLINDRICAL
SOUTH_WEST_CORNER_LAT: -59.875
SOUTH_WEST_CORNER_LON: -179.875
DX: 0.25
DY: 0.25
CDO: Climate Data Operators version 1.6.9 (http://mpimet.mpg.de/cdo)
NCO: 20180205/usr/local/lib/python3.6/dist-packages/spyder/widgets/variableexplorer/utils.py:414: FutureWarning: 'summary' is deprecated and will be removed in a future version.
display = value.summary()
Вывод: я понял. Если кому-то нужна помощь, отправьте мне сообщение.
1 ответ
Я не мог получить xr.open_mfdataset
, поэтому я должен был создать новый ds
и пойти на это.
files = glob('./GLDAS_2010/*.nc4')
data = Dataset(files[0], mode='r')
albedo=0
for file in files:
dados = Dataset(file, mode='r')
lons = dados.variables['lon'][:]
lats = dados.variables['lat'][:]
Albed = dados.variables['Albedo_inst'][0,:,:]
time=datetime.strptime(file[-29:-16], '%Y%m%d.%H%M')
ds = xr.DataArray(Albed, coords={'lat':lats, 'lon':lons,'time':time}, dims=['lat','lon']).to_dataset(name='Albedo')
if type(albedo)==int:
albedo=ds
else:
albedo = xr.concat([albedo, ds], 'time')
albedo = albedo.sortby('time')
mean_month = albedo.Albedo.resample(freq='M',dim='time')
Если у вас нет действительной переменной времени (время в вашем примере выше 0), вам придется это исправить. Но как только это будет сделано, я думаю, что строка, которую вы ищете:
gldas['Albedo_inst'].resample(time="1MS").mean(dim="time")