Developmental changes in the reflectance spectra of temperate deciduous tree leaves and implications for thermal emissivity and leaf temperature

Summary Leaf optical properties impact leaf energy balance and thus leaf temperature. The effect of leaf development on mid‐infrared (MIR) reflectance, and hence thermal emissivity, has not been investigated in detail. We measured a suite of morphological characteristics, as well as directional‐hemispherical reflectance from ultraviolet to thermal infrared wavelengths (250 nm to 20 µm) of leaves from five temperate deciduous tree species over the 8 wk following spring leaf emergence. By contrast to reflectance at shorter wavelengths, the shape and magnitude of MIR reflectance spectra changed markedly with development. MIR spectral differences among species became more pronounced and unique as leaves matured. Comparison of reflectance spectra of intact vs dried and ground leaves points to cuticular development – and not internal structural or biochemical changes – as the main driving factor. Accompanying the observed spectral changes was a drop in thermal emissivity from about 0.99 to 0.95 over the 8 wk following leaf emergence. Emissivity changes were not large enough to substantially influence leaf temperature, but they could potentially lead to a bias in radiometrically measured temperatures of up to 3 K. Our results also pointed to the potential for using MIR spectroscopy to better understand species‐level differences in cuticular development and composition.


Fig. S1
Baseline FTIR measurements used to calculate corrected sample reflectance and for quality assurance.

Table S4
Mean total reflectance (2 µm to 14 µm), partitioned to diffuse and specular components, for mature sun and shade leaves of five temperate deciduous species.

Methods S2 Determination of diffuse and specular reflectance components.
Notes S1 Differences between sun and shade leaves.
Notes S2 Diffuse and specular components of total reflectance.

Fig. S1
Baseline FTIR measurements used to calculate corrected sample reflectance and for quality assurance. These measurements were repeated at the beginning of each measurement session. Red lines indicate the mean, and grey shading the variability (95 % confidence interval around the mean) across n = 7 independent spectra, each measured on a different day. Note that the y-axis range is 15 % in panels (a) and (b), but only 5 % in panels (c) and (d). (a) Reflectance spectra of a roughened gold reference standard (Pike Technologies, Madison, WI, USA). The NIST (National Institute of Standards and Technology) measurement (rst) is taken as the "true" reflectance of the reference standard, while the NAU (Northern Arizona University) measurement (Rst) is the reference standard reflectance measured on the Nicolet iS10 FTIR spectrometer used in the present study; (b) reflectance spectra of a second roughened gold reference standard (Middleton Spectral Vision, Middleton, WI, USA); (c) reflectance spectra measured with no sample on the open port of the integrating sphere (R0), to quantify port overfilling; (d) reflectance spectra measured with aluminum foil blocking the illumination beam before it enters the sphere and detector assembly. Note that for all measurements on the NAU FTIR, the comparison method was used and reference measurements were made off the wall of the instrument's gold integrating sphere. See Table S1 for a full list of symbols used.

Fig. S3 Comparison of leaf reflectance measured by two different instruments with overlapping spectral ranges. The PerkinElmer Lambda 750s has a range from 0.25 µm to 2.5 µm, while the Nicolet iS10 has a range from 2.0 µm to 20 µm. (a) Agreement within the overlap region between the two instruments, for lower-reflectance immature red maple leaves (collected May 1) and higher-reflectance mature red maple leaves (collected June 26); and (b) reflectance at the 2.2 µm reflectance peak, for leaves of 5 deciduous species and multiple collection dates, as measured on the Nicolet (x-axis) vs. PerkinElmer (y-axis) spectrometers. Shading around the Nicolet line in (a) and error bars in (b) indicate ± 1 standard deviation across n = 3 replicate leaf samples collected for each species-date combination. In (b), n is the number of independent species-date combinations, and r is Pearson's correlation.

We filled the external cavity with a piece of ESLI Velvet and measured much lower reflectance across the entire spectrum from 2 µm to 14 µm.

[...] the shaded grey band shows Elvidge's (1988) ligno-cellulose spectrum, redrawn (mean ± 1 standard deviation) based on spectra presented in that paper's Figure 4 (Arctostaphylos glauca: gray wood, brown wood, and grey seed spectra).

Vrst signal from reference measurement, with standard

* Note: the angle of incidence of input light on directing mirror is different between sample and reference measurements.

For each species and each crown position, one leaf from each of three individual trees was sampled on June 26, approximately 6 to 8 weeks after leaf-out. Reported values are mean ± 1 standard deviation, across n = 3 leaves.

Calculations and assumptions
Here we describe our procedure for calculating a corrected sample reflectance measurement (rs) for the Nicolet iS10 FTIR † scans used in this analysis. The symbols used are listed in Table S1. The corrections account for over-filling of the sample port (see also Hecker et al., 2011), assessed through measurement of V0 and Vr0, as well as the "true" (not equal to 100 %) reflectance of our roughened gold reflectance standard, which was quantified through independent measurement of rst by the Optical Technology Division at the National Institute of Standards and Technology (NIST).

Rs = Vs/Vrs

Cancelling k: [equation lost in extraction]

And: [equation lost in extraction]

We note that in the present analysis, both the leaf samples and the reference standard are predominantly diffuse reflectors (Notes S2). The corrected sample reflectance (rs) is therefore calculated as:

$$ r_s = r_{st} \, \frac{R_s - R_0}{R_{st} - R_0} $$

Where Rs is the measured sample reflectance, R0 is the measured empty port reflectance, Rst is the measured reference standard reflectance, and rst is the true reference standard reflectance.

† Certain commercial equipment, instruments, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.
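The overfill correction described above can be sketched in a few lines of Python. This is a minimal illustration, not the processing code used in the study: it assumes the corrected reflectance takes the form rs = rst × (Rs − R0) / (Rst − R0), which is inferred from the symbol definitions given here and from the overfill correction cited from Hecker et al. (2011). The function names and the numerical values are hypothetical.

```python
def measured_reflectance(v_sample, v_reference):
    """Comparison-method ratio for one wavelength, e.g. Rs = Vs / Vrs."""
    return v_sample / v_reference


def corrected_reflectance(R_s, R_0, R_st, r_st):
    """Overfill-corrected sample reflectance at one wavelength.

    All arguments are fractions (0.05 means 5 %). Assumed form:
    r_s = r_st * (R_s - R_0) / (R_st - R_0).
    """
    if R_st == R_0:
        raise ValueError("standard and empty-port reflectance must differ")
    return r_st * (R_s - R_0) / (R_st - R_0)


def correct_spectrum(R_s, R_0, R_st, r_st):
    """Apply the correction wavelength by wavelength to equal-length lists."""
    return [
        corrected_reflectance(a, b, c, d)
        for a, b, c, d in zip(R_s, R_0, R_st, r_st)
    ]


# Hypothetical single-wavelength values, chosen only for illustration.
example = corrected_reflectance(R_s=0.06, R_0=0.02, R_st=1.02, r_st=0.98)
print(f"corrected reflectance: {example:.4f}")
```

A quick sanity check on any implementation of this form: with no overfilling (R0 = 0) and a measured standard reflectance equal to its true value (Rst = rst), the corrected reflectance reduces to the measured value Rs.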

Baseline measurements
At the start of measurements each day, we conducted a series of baseline FTIR measurements to enable the above calculation, and for quality assurance. We used the comparison method (Hanssen & Snail, 2001; also referred to as the comparative method, see Hecker et al., 2011), whereby the sample is placed on the open port of the integrating sphere, a reference measurement is made off the wall of the sphere using the built-in flipper mirror, and then the sample measurement is made with the flipper mirror returned to the sample position (see also Blake et al., 2018). In this way, the sphere throughput is the same for both sample and reference measurements (cf. substitution method measurements, see: Hanssen & Snail, 2001; Hecker et al., 2011).
The spectra from these baseline measurements are illustrated in Figure S1. We begin by comparing the measured reference standard reflectance (Rst) against its "true" reflectance (rst) as measured by NIST (Figure S1a). We measured Rst > 100 % because the reflectance standard, a roughened gold puck, has higher reflectance than the wall of the integrating sphere. The fact that Rst > rst is not cause for concern; this is accounted for in the correction calculation described above. Importantly, the repeatability (on different days) of measurements was very high: expressed as a 95 % confidence interval, the variation in reflectance (grey shading around the red line) was only about 0.5 %, and consistent in magnitude across the entire spectrum from 2 µm to 14 µm. We note that a similar reflectance spectrum was also obtained for a second roughened gold reference standard, which was regularly measured (Figure S1b). The empty sample port (R0) measurement (Figure S1c) suggested a wavelength-dependent overfilling signal that varied in magnitude from ≈ 1 % to 3 % but which was also consistently measured on different days, as the width of the 95 % confidence interval was again about 0.5 %. To verify that the empty sample port measurement was not associated with other factors (e.g. stray light, electrical noise, etc.), we blocked the illuminating beam as it entered the sphere using a double layer of aluminum foil (Figure S1d). The resulting spectrum was essentially zero, with no obvious structure, across the entire range from 2 µm to 14 µm. This measurement was again highly repeatable, with very little variation across different days: the width of the 95 % confidence interval was only about 0.2 %, on average across the spectrum.
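As an illustration of how repeatability figures of this kind could be derived, the following sketch computes a per-wavelength mean and 95 % confidence half-width from repeated baseline scans. It assumes a Student's t interval around the mean (mean ± t × s / √n); the text does not state which interval formula was used, and the function names and sample numbers are hypothetical.

```python
import math
import statistics

# Two-sided 95 % critical values of Student's t, indexed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_and_ci95(values):
    """Return (mean, half-width of the 95 % CI of the mean) for 2 to 11 values."""
    df = len(values) - 1
    if df not in T_CRIT_95:
        raise ValueError("need between 2 and 11 repeat measurements")
    mean = statistics.fmean(values)
    half_width = T_CRIT_95[df] * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


def spectrum_ci95(scans):
    """Apply mean_and_ci95 at each wavelength; scans is a list of equal-length spectra."""
    return [mean_and_ci95(list(column)) for column in zip(*scans)]


# Hypothetical empty-port reflectance (%) at two wavelengths on seven days.
scans = [[1.9, 2.8], [2.1, 3.0], [2.0, 3.1], [2.2, 2.9],
         [1.8, 3.0], [2.0, 3.2], [2.0, 3.0]]
for mean, half_width in spectrum_ci95(scans):
    print(f"mean = {mean:.2f} %, 95 % CI = +/- {half_width:.2f} %")
```

The full width of the interval is twice the half-width returned here, which matters when comparing against figures quoted as a confidence interval "width".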

Evaluation of measured reflectance for a range of reference materials
To evaluate our measurement protocol and the correction procedure described above, we compared the reflectance spectra of a variety of reference materials (both high- and low-reflectance) measured on the Nicolet iS10 FTIR spectrometer used in this study with reflectance spectra of the same samples measured by NIST. Although we measured reflectance on the FTIR from 2 µm to 20 µm, we focus here on the region from 2 µm to 14 µm, as beyond 14 µm the quality of our measurements degraded rapidly, with substantial random error (assessed by the variation across repeat, independent measurements) relative to the magnitude of the reflectance signal for low-reflectance samples.
There are several important differences to note between our setup and the NIST setup, which may help to explain the observed (but generally minor) differences between the spectra we measured and the spectra measured by NIST. The NIST setup uses a 152.4 mm (6 in) diameter integrating sphere, compared with our 76.2 mm (3 in) sphere, and the internal baffling is different. At NIST, the illumination beam is at 8° incidence, compared with 12° for our setup, which could be important if there are strong BRDF (bidirectional reflectance distribution function) effects for particular samples, or a strong specular component. At NIST, 16 to 24 repeated reflectance measurements are averaged, and the total measurement time takes up to 6 h per sample. We averaged 64 scans, but measurement of each sample was completed in under 5 min. Thus, our setup is designed to minimize per sample measurement time and prioritize convenience, at the potential expense of increased uncertainty. In the analysis below, we characterize the random and systematic errors in our measurements, and use these to develop an estimate of the expanded uncertainty.
For a roughened gold reference standard provided by NIST (Figure S2a), our corrected reflectance spectra were generally about 1 % higher (99 % reflectance vs 98 % reflectance) than the spectrum measured by NIST. Repeat measurements were somewhat noisier for this sample (95 % confidence interval width ≈ 1 %) compared to other high-reflectance samples (e.g. Figures S1a, b), but this noise was small relative to the overall magnitude of the reflectance (≈ 1 % in relative terms).
For a sample of Aeroglaze Z306 (Figure S2b), our corrected reflectance spectra were generally in very close agreement with the spectrum measured by NIST. The prominent spectral features at 2.6 µm, 4.8 µm, 8.1 µm, and 9.4 µm were seen in both spectra. A notable difference, however, was that the reflectance peak from 3.6 µm to 5.7 µm was 50 % higher in the NIST spectrum (peaking at almost 7.5 % reflectance) than in our spectrum (peaking at 5.0 % reflectance). We note that NIST measurements of the specular reflectance from this sample indicate a strong peak, approximately 2.5 % reflectance, in this region, and very little specular reflectance outside of this region, which may partially explain this discrepancy. At longer wavelengths (> 5.7 µm), our spectra compare well with the NIST spectra, with a maximum difference of less than 0.4 % (10 % in relative terms) from 10 µm to 14 µm.
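Because reflectance is itself expressed in percent, the absolute and relative differences quoted here are easy to confuse. The short sketch below works through the Aeroglaze Z306 peak using the approximate values given in the text (almost 7.5 % for NIST, 5.0 % for our spectrum); the function names are illustrative only.

```python
def absolute_difference(reference, measured):
    """Difference in reflectance, in percentage points."""
    return reference - measured


def relative_difference(reference, measured):
    """Difference expressed as a fraction of the measured value."""
    return (reference - measured) / measured


nist_peak = 7.5  # % reflectance, approximate value quoted in the text
our_peak = 5.0   # % reflectance, value quoted in the text

print(absolute_difference(nist_peak, our_peak))        # 2.5 percentage points
print(100 * relative_difference(nist_peak, our_peak))  # 50.0 % higher
```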
For a sample of D25C16 (Figure S2c), the agreement between our corrected reflectance spectra and the spectrum measured by NIST was again very good. Similar spectral features at 3.0 µm, 3.4 µm, 5.7 µm, 6.8 µm, and 10.0 µm were seen in both spectra. Our corrected reflectance values were consistently about 0.4 % (10 % in relative terms) higher, across the entire spectrum, than the NIST measured reflectance. Repeatability of our measurements for this sample was again very high (95 % confidence interval width ≈ 0.5 %).
For a sample of ESLI Velvet (Figure S2d), which is composed of aligned carbon needles, there were no prominent reflectance features to compare between our corrected reflectance spectra and the spectrum measured by NIST. As was the case for the preceding three reference materials, we measured a somewhat higher reflectance than NIST. This bias was exhibited across the entire spectrum, and tended to increase in size with increasing wavelength from an offset of about 0.1 % at 2 µm to 0.3 % at 7 µm (up to 40 % in relative terms).
To put the above comparisons in context, we note that the expanded uncertainty on the NIST measurements is estimated to be about 2.8 % (mean across 2 µm to 14 µm) for the NIST roughened gold sample (a high-reflectance reference material), and about 0.1 % (mean across 2 µm to 14 µm) for the ESLI Velvet sample (a low-reflectance reference material). We conduct a more formal uncertainty analysis below.
Finally, we conducted an analysis leveraging the overlapping spectral range (from 2.0 µm to 2.5 µm) of our two instruments, the PerkinElmer Lambda 750s (full range, 0.25 µm to 2.5 µm) and Nicolet iS10 (full range, 2.0 µm to 20 µm). Within this zone of overlap, there is a prominent peak in leaf reflectance at 2.2 µm. Despite differences in measurement technology, illumination sources, reflectance standards, and the fact that across this range both instruments are at the extreme edges of their spectral sensitivity, our analyses show that for two samples of varying reflectance (red maple leaves collected May 1 and June 26, respectively), the measured spectra are highly consistent between the two instruments (Figure S3a). Indeed, a comparison between 2.2 µm reflectance measured with the PerkinElmer instrument and 2.2 µm reflectance measured with the Nicolet instrument shows minimal bias and very high correlation (Figure S3b). In this comparison, we averaged the n = 3 samples collected for different species and crown positions; the error bars denote the standard deviation across these three replicates. This analysis gives us further confidence, not only in the spectral patterns but also in the overall magnitude of reflectance measured with the FTIR (Nicolet iS10) instrument.
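The two summary statistics used in this instrument comparison, Pearson's correlation and the mean bias, can be computed as in the sketch below. The paired reflectance values are hypothetical placeholders, not data from the study, and the sign convention for the bias (PerkinElmer minus Nicolet) is an assumption made here for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for constant input")
    return s_xy / math.sqrt(s_xx * s_yy)


def mean_bias(x, y):
    """Mean of (y - x): positive when y reads higher than x on average."""
    return sum(b - a for a, b in zip(x, y)) / len(x)


# Hypothetical 2.2 µm reflectance (%) for four species-date combinations.
nicolet = [8.1, 10.4, 12.9, 15.2]       # x-axis instrument
perkin_elmer = [8.3, 10.2, 13.1, 15.0]  # y-axis instrument

print(f"r = {pearson_r(nicolet, perkin_elmer):.3f}")
print(f"mean bias = {mean_bias(nicolet, perkin_elmer):+.2f} % reflectance")
```

A high correlation alone does not rule out a constant offset between instruments, which is why the bias is reported alongside r.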

Determination of expanded uncertainty
To quantitatively assess the uncertainties in our FTIR measurements, we followed the general approach described by Blake et al. (2018). We used the repeated measurements of three of our standard materials, spanning a range of reflectance from highly reflective (roughened gold reference standard; Figure S2a) to minimally reflective (D25C16 and ESLI velvet; Figures S2c and d, respectively), to quantify random measurement uncertainties. We used the difference between the mean of our measurements and the NIST measurement of the same material to quantify systematic uncertainties. We included the uncertainty in the NIST measurement of the same material as an additional source of uncertainty. We then combined these three uncertainties in quadrature and multiplied by a coverage factor of k = 2 to obtain the expanded uncertainty. We report here the mean expanded uncertainty, calculated from 2 µm to 14 µm. These results are presented in Table S2. For all three standard materials, the systematic error was substantially larger than the random error. For the high-reflectance roughened gold reference standard, our expanded uncertainty was 3.83 %. For the two low-reflectance standard materials, D25C16 and ESLI velvet, our expanded uncertainties were 0.60 % and 0.51 %, respectively.
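The combination step described above (three components added in quadrature, then multiplied by k = 2) can be written out as a short sketch. The function names are hypothetical, the inputs are assumed to be standard (k = 1) uncertainties in % reflectance, and the example values are placeholders rather than entries from Table S2.

```python
import math


def expanded_uncertainty(u_random, u_systematic, u_nist, k=2.0):
    """Combine three uncertainty components in quadrature and apply coverage factor k."""
    combined = math.sqrt(u_random ** 2 + u_systematic ** 2 + u_nist ** 2)
    return k * combined


def mean_expanded_uncertainty(u_random, u_systematic, u_nist, k=2.0):
    """Average the expanded uncertainty over wavelengths (equal-length lists)."""
    per_wavelength = [
        expanded_uncertainty(a, b, c, k)
        for a, b, c in zip(u_random, u_systematic, u_nist)
    ]
    return sum(per_wavelength) / len(per_wavelength)


# Hypothetical components (% reflectance) at a single wavelength.
print(f"{expanded_uncertainty(0.10, 0.25, 0.05):.2f} %")
```

Adding in quadrature treats the three components as independent; because the terms are squared, the largest component dominates the result, consistent with the observation that the systematic error outweighed the random error for all three materials.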
We then compared D25C16 and ESLI velvet expanded uncertainties, and their components, to the variability across replicate samples, and in relation to observed changes in reflectance over time. The purpose of this analysis was to assess the relative magnitudes of biological variability and developmental change in relation to the overall uncertainty of our measurement system for materials that are superficially similar in reflectance to tree leaves. These results are shown in Table S3. We defined "biological variability" as the standard deviation across the n = 3 replicate samples collected for each species × crown position × sampling date combination. For the five species studied here, the mean biological variability (1 standard deviation) ranged from 0.39 % to 0.58 %, which is three- to four-fold larger than the random uncertainty (1 standard deviation) estimated for the low-reflectance standard materials, and comparable in magnitude to the expanded uncertainty for both D25C16 and ESLI velvet. We defined "developmental change" as the standard deviation across the mean reflectance measured on each of the different sun leaf sampling dates (n = 7 dates for red maple, paper birch, and trembling aspen; n = 6 dates for American beech; n = 5 dates for red oak). For the five tree species studied here, the developmental change (1 standard deviation) ranged from 3.2 % to 4.2 %, and is thus almost an order of magnitude larger than the biological variability across replicate samples.
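The two quantities defined above can be illustrated with a small sketch. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and uses hypothetical reflectance values for a single species; the function names are likewise illustrative.

```python
import statistics


def biological_variability(replicates_by_date):
    """Mean, over sampling dates, of the standard deviation across replicate leaves."""
    return statistics.fmean(statistics.stdev(reps) for reps in replicates_by_date)


def developmental_change(replicates_by_date):
    """Standard deviation across the per-date mean reflectance values."""
    date_means = [statistics.fmean(reps) for reps in replicates_by_date]
    return statistics.stdev(date_means)


# Hypothetical mean reflectance (%) of n = 3 sun leaves on four sampling dates.
sun_leaves = [
    [1.2, 1.6, 1.9],
    [3.8, 4.4, 4.1],
    [7.0, 7.9, 7.4],
    [9.6, 10.1, 10.6],
]

print(f"biological variability: {biological_variability(sun_leaves):.2f} %")
print(f"developmental change:   {developmental_change(sun_leaves):.2f} %")
```

Comparing the two numbers shows whether the change over time stands out from the leaf-to-leaf scatter on any one date.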

Conclusion
Taken together, the above results give encouraging evidence for the robust repeatability of reflectance measurements with our setup, and for its ability to produce reflectance measurements of diffuse low-reflectance (< 10 % reflectance) samples that are in good agreement (in the context of random and systematic uncertainties) with measurements of the same samples at NIST. These findings are also consistent with results of our Nicolet vs. PerkinElmer comparison. The major difference between the spectra we measured and those measured by NIST appears, for the most part, to be a small upward bias in our spectra, even after correcting for overfilling of the sample port. With additional correction, this bias could be minimized or even eliminated.