Data Descriptions

This page describes the ESSP IV standardized data products, highlights important things to note, and links to example code.

We ask that all methods submit results using the provided standardized data products. If you would also like to test your methods on custom generated data products, please submit those results in addition to results using the standardized data products.

The data can be accessed via the following link (updated October 9, 2025).

Results can be submitted using the following link. Both links use the same password.


Example Code

  1. ESSP IV standardized spectra
    • Python Script
    • Jupyter Notebook
  2. ESSP IV order-by-order and combined CCF
    • Python Script
    • Jupyter Notebook
  3. Time series of RVs/indicators
    • Python Script
    • Jupyter Notebook

Please reach out if there is any additional code you would find helpful or examples in additional coding languages.


Instrument Properties

These data products are produced from solar observations taken by HARPS, HARPS-N, EXPRES, and NEID. For more details about each instrument, please see Table 1/Section 2 of the ESSP III paper.

Intrinsic Offset

There is an intrinsic offset between the four instruments, meaning the spectral lines and CCFs from the different instruments will not exactly line up. The HARPS and HARPS-N observations fall close to one another, and the EXPRES and NEID data fall close together. These two groups, however, are separated by about 650 m/s.

If alignment is needed, a single global offset for each instrument is given here. These offsets were derived by aligning binned CCF RVs from the full set of observations (i.e. as opposed to the subsets of data given in each released data set). An example of how to read in these offsets and use them to align CCFs is given in the example CCF code.
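As a rough illustration of what such an alignment involves (this is a sketch, not the provided example code), a CCF can be shifted by a global offset and re-interpolated onto the shared velocity grid. The function name, the sign convention for the offset, and the toy Gaussian CCF below are assumptions; check the sign against the published offsets before use.

```python
import numpy as np

def align_ccf(v_grid_kms, ccf, offset_ms):
    """Shift a CCF by a global instrument offset and resample it on the same grid.

    Assumed convention: `offset_ms` is the velocity in m/s that must be
    subtracted to bring this instrument onto the common zero point.
    """
    shifted_grid = v_grid_kms - offset_ms / 1000.0  # m/s -> km/s
    # Linear interpolation back onto the shared grid; samples that fall
    # outside the shifted range are set to NaN rather than extrapolated.
    return np.interp(v_grid_kms, shifted_grid, ccf, left=np.nan, right=np.nan)

# Toy CCF: inverted Gaussian centred at +0.65 km/s on a 0.4 km/s grid
v_grid = np.linspace(-20.0, 20.0, 101)
toy_ccf = 1.0 - 0.5 * np.exp(-0.5 * ((v_grid - 0.65) / 3.0) ** 2)

aligned = align_ccf(v_grid, toy_ccf, offset_ms=650.0)
v_min = v_grid[np.nanargmin(aligned)]  # location of the aligned CCF minimum
```

Linear interpolation is used here only for brevity; a spline or flux-conserving resampling may be preferable for precise work.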

Heliocentric Corrections

One of the benefits of Solar data is the ability to correct for the reflex motion of the Sun induced by the Solar System planets. For the ESSP data, these corrections are applied to the wavelengths at the spectral level. The HARPS and HARPS-N data were not corrected in data downloaded before October 9, 2025.

EXPRES and NEID use barycorrpy to shift wavelengths into the heliocentric frame. Corrections for HARPS-N were obtained through DACE as prescribed by the read me. Corrections for HARPS were obtained through custom code that, like barycorrpy, uses the ephemeris from JPL Horizons.


Spectra

Data from all four instruments have been rewritten to a standard FITS format. The spectral data are stored in separate HDUs, which are named by the keywords below. All data have dimensions of [number of orders, number of pixels].

Keyword        Description
WAVELENGTH     Barycentric corrected wavelengths in angstroms
FLUX           Intensity at the given wavelength; includes instrument blaze response
UNCERTAINTY    Standard deviation for each flux value as assigned by each instrument’s data reduction pipeline
CONTINUUM      Continuum model
BLAZE          Blaze model, which can be used to remove the general blaze shape from the FLUX array
COMMON_MASK    A binary mask that is True for wavelengths that are present for all instruments
TELLURIC_MASK  A binary mask that is True where telluric features have been masked and spectral values are replaced with NaNs
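A minimal sketch of how these arrays might be combined once loaded: the blaze is divided out of FLUX and UNCERTAINTY, and COMMON_MASK is used to keep only pixels shared by all instruments. The function name and the toy arrays are assumptions for illustration; in practice the arrays would come from the HDUs of the same name, and mask arrays may need casting to boolean depending on how they are read.

```python
import numpy as np

def deblaze(flux, uncertainty, blaze):
    """Divide out the blaze model; pixels with a non-positive or NaN blaze become NaN."""
    safe_blaze = np.where(blaze > 0, blaze, np.nan)
    return flux / safe_blaze, uncertainty / safe_blaze

# Toy data with shape [number of orders, number of pixels] = [2, 5]
flux = np.array([[2.0, 4.0, 6.0, 4.0, 2.0],
                 [1.0, 2.0, np.nan, 2.0, 1.0]])
uncertainty = 0.1 * np.abs(flux)
blaze = np.array([[1.0, 2.0, 3.0, 2.0, 1.0],
                  [0.5, 1.0, 1.5, 1.0, 0.0]])
common_mask = np.array([[1, 1, 1, 1, 0],
                        [0, 1, 1, 1, 1]])

flat_flux, flat_unc = deblaze(flux, uncertainty, blaze)

# Keep pixels that are common to all instruments and hold finite values
keep = common_mask.astype(bool) & np.isfinite(flat_flux)
```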

Important to Note

  1. Some orders will have only NaN values (read more)
  2. Redder orders will have NaNs sprinkled throughout due to telluric masking (read more)
  3. A subset of headers have been included (read more); please reach out if any headers required by your method are missing
  4. Masking of the low-SNR EXPRES order edges may be required (read more)
  5. Redder NEID orders (i.e. echelle orders redder than 69) are not included due to significant telluric contamination and low signal (read more)

Relative Orders Standardized to Echelle Order

The data within each file have been re-organized such that the relative indexing of orders always maps to the same echelle order. In other words, the first order in every standardized FITS file corresponds to echelle order 161. Calling the Nth order of this standardized data will return an echelle order of a comparable wavelength regardless of the source instrument (e.g., H-alpha appears in the 69th entry of the standardized data for all instruments).

Data has been padded as necessary with NaNs. For instance, NEID orders start at echelle order 161 while EXPRES orders start at echelle order 160. Therefore, the first entry for every EXPRES-sourced ESSP file contains only NaNs. Of particular note is that the HARPS data does not contain echelle order 115 (relative order 47 in the standardized data format).
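The mapping between array position and echelle order can be written as a pair of small helpers. This sketch assumes the entry counts quoted above are 1-based (so echelle order 115 is the 47th entry, at zero-based array index 46); the helper names are hypothetical.

```python
FIRST_ECHELLE_ORDER = 161  # echelle order held in the first entry of every standardized file

def echelle_to_index(echelle_order):
    """Zero-based array index of a given echelle order."""
    return FIRST_ECHELLE_ORDER - echelle_order

def index_to_echelle(index):
    """Echelle order stored at a given zero-based array index."""
    return FIRST_ECHELLE_ORDER - index

# Echelle order 115 (absent from HARPS) is the 47th entry, i.e. index 46
missing_harps_index = echelle_to_index(115)
```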

The provided example code includes a simple function to generate a mask that will exclude orders that are all NaNs.
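For reference, a mask of this kind can be built in a couple of lines with NumPy. This is a sketch rather than the function shipped with the example code, and the function name is an assumption.

```python
import numpy as np

def usable_order_mask(flux):
    """Return a boolean array that is True for orders with at least one finite value."""
    return np.any(np.isfinite(flux), axis=1)

# Toy array with shape [number of orders, number of pixels]; the first order is all NaN
toy_flux = np.array([[np.nan, np.nan, np.nan],
                     [1.0, np.nan, 3.0],
                     [4.0, 5.0, 6.0]])

order_mask = usable_order_mask(toy_flux)
good_orders = toy_flux[order_mask]  # drops the all-NaN order
```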

Telluric Masking

In addition to the all NaN orders described above, there will also be NaNs throughout orders where telluric features have been masked out. We choose to mask out tellurics rather than remove and re-inject these lines with a potentially flawed telluric model. We believe introducing NaNs is preferable to potentially adding spurious features due to model inaccuracies, especially as such features could lead to systematics that are difficult to diagnose.

The requested detailed questions for each method will ask how this masking affects the performance of your method. Please reach out if the introduction of NaNs significantly hinders the implementation of your method.

Standardized Headers

Header keywords have been standardized across instruments (largely following the recommendations of Burt et al. 2024). The included keywords and their counterparts in the original FITS files are listed in this table.

Please reach out if required header information for your method is not included in the standardized files.

EXPRES Order Edges

EXPRES order edges, especially in the blue, will exhibit NaNs and very low SNR values.

The extracted EXPRES spectra encompass the full detector, resulting in 7920 pixels in each order. In practice, however, the signal drops off sharply at the edge of each order as expected due to the blaze response. Pixels with too little signal to be extracted are replaced with NaNs.

Very low SNR extracted values will persist on the edges of EXPRES orders, especially in the blue. This can result in noisy, even negative extracted values. These noisy values are accompanied by corresponding large uncertainties, but we find that in practice it is easiest to implement an uncertainty cut on extracted values. We recommend masking out all values with a large fractional uncertainty in addition to excluding the NaN values in each order.
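One way to implement such a cut is sketched below. The function name and the threshold of 0.1 are purely illustrative assumptions; an appropriate limit on the fractional uncertainty will depend on the method being applied.

```python
import numpy as np

def good_pixel_mask(flux, uncertainty, max_frac_unc=0.1):
    """True where the value is finite and its fractional uncertainty is below the limit.

    `max_frac_unc` is a hypothetical threshold chosen for illustration only.
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        frac_unc = uncertainty / np.abs(flux)
    # NaN pixels fail the finiteness test; noisy or negative edge values fail the cut
    return np.isfinite(frac_unc) & (frac_unc <= max_frac_unc)

# Toy order: NaN edges, noisy low-signal pixels, and a well-exposed centre
toy_flux = np.array([np.nan, -0.5, 2.0, 100.0, 120.0, 3.0, np.nan])
toy_unc = np.array([np.nan, 1.5, 1.4, 5.0, 6.0, 1.7, np.nan])

good = good_pixel_mask(toy_flux, toy_unc)
```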

NEID Redder Orders

For the NEID data, we include only echelle orders 161 to 69, which span approximately 3770 to 8968 Å. Spectra redder than echelle order 69 are not likely to be useful due to the significant telluric masking required in this region. Over 50% of pixels in these orders are masked. There are also few CCF lines for these orders (on the order of 5 per order).

Merged Spectra

Spectra merged across orders are also available. These files have the same headers as the un-merged files as well as arrays for the merged wavelength, flux, and uncertainty (with the same keywords).


CCFs

Standardized spectra have been run through the same CCF pipeline, which makes use of the iCCF code. All CCFs use the same line mask and are sampled on the same velocity grid. The CCF from each order is reweighted before being combined into a global CCF; the specific weighting is different for each instrument. The resultant CCF files contain the following HDUs.

We use Nv to indicate the length of the velocity grid and No to indicate the number of echelle orders.

Keyword         Shape    Description
V_GRID          Nv       Velocity grid in km/s for all CCFs in file
CCF             Nv       Global CCF value (i.e. all orders combined)
E_CCF           Nv       Global CCF errors
ECHELLE_ORDERS  No       Corresponding echelle order for each order-by-order (obo) entry
OBO_CCF         No x Nv  Order-by-order CCFs
OBO_E_CCF       No x Nv  Order-by-order CCF errors
OBO_CCF_RV      No       Best-fit CCF RV (mean of a Gaussian fit) for each order in m/s
OBO_CCF_E_RV    No       Standard deviation of the mean of the Gaussian fit to each order's CCF in m/s

Important to Note

  1. Some order specific CCFs will have only NaN values (read more)
  2. The standard velocity grid is oversampled for HARPS and HARPS-N (read more)
  3. Because of intrinsic instrument offsets, the CCFs from different instruments will be offset (see discussion above); the provided example code gives a function for aligning and re-interpolating the provided CCFs.
  4. Only a subset of headers have been included; please reach out if any headers required by your method are missing.

HARPS/HARPS-N Velocity Sampling

The provided standardized CCFs all share the same velocity grid with a spacing of 400 m/s, which oversamples the HARPS/HARPS-N pixel sampling. For the HARPS/HARPS-N pipeline, a spacing of 820 m/s is used.

If your method does not require a uniform velocity grid across all instruments or treats each instrument separately, we recommend downsampling the velocity grid by half. The provided example code contains a function that implements this.
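The simplest form of this downsampling keeps every other velocity sample, taking the grid from 400 m/s to 800 m/s spacing. The sketch below assumes plain decimation (the provided example code may do something different, such as binning) and uses a hypothetical function name and toy arrays.

```python
import numpy as np

def downsample_by_two(v_grid, ccf, e_ccf):
    """Keep every other velocity sample along the last axis.

    Works for both the global CCF (shape Nv) and order-by-order CCFs (shape No x Nv).
    """
    return v_grid[::2], ccf[..., ::2], e_ccf[..., ::2]

# Toy grid with 0.4 km/s spacing and a toy order-by-order CCF array (No = 3)
v_grid = np.linspace(-20.0, 20.0, 101)
obo_ccf = np.ones((3, v_grid.size))
obo_e_ccf = np.full((3, v_grid.size), 0.01)

v_half, ccf_half, e_half = downsample_by_two(v_grid, obo_ccf, obo_e_ccf)
```

Decimation leaves the per-sample errors unchanged, whereas binning adjacent samples would also require combining their errors.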


Time Series

A CSV file containing time series information (e.g. RVs, indicators, etc.) is given for each data set. We recommend using pandas to interact with these files, though another option is given in the provided example code.

Column Name           Description
Standard File Name    Name of associated standard spectrum file
Time [eMJD]           Photon-weighted midpoint in eMJD (i.e. extra modified JD, can be used like BJD)
RV [m/s]              Zero-centered radial velocity in m/s
RV Err. [m/s]         Radial velocity error in m/s (analytic, from CCF fit)
Exp. Time [s]         Exposure time of observation in seconds
Airmass               Airmass of observation
BERV [km/s]           Barycentric velocity of observation in km/s
Instrument            Source instrument for observation
CCF FWHM [km/s]       FWHM of Gaussian fit to global CCF in km/s
CCF FWHM Err. [km/s]  Standard deviation of the FWHM of the Gaussian fit to the global CCF in km/s
CCF Contrast          Amplitude of Gaussian fit to the global CCF
CCF Contrast Err.     Standard deviation of the amplitude of the Gaussian fit to the global CCF
BIS [m/s]             CCF bisector skew as defined in Queloz et al. 2001 in m/s
H-alpha Emission      Minimum value in the H-alpha line with continuum normalized spectra
CaII Emission         Emission in core of Ca II HK lines
(The provided example code includes a demonstration of how to rename the columns.)
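As a minimal sketch of working with these files in pandas, the snippet below reads a small in-memory CSV and renames a few columns. The short column names, file names, instrument labels, and numeric values are all made up for illustration; only the original column headers come from the table above.

```python
import io
import pandas as pd

# Toy stand-in for a time series CSV (values are invented for illustration)
toy_csv = io.StringIO(
    "Standard File Name,Time [eMJD],RV [m/s],RV Err. [m/s],Instrument\n"
    "toy_file_1.fits,100.0,0.5,0.3,harps\n"
    "toy_file_2.fits,100.1,-0.2,0.4,neid\n"
)
df = pd.read_csv(toy_csv)

# Hypothetical short names that are easier to type than the bracketed headers
short_names = {
    "Standard File Name": "file",
    "Time [eMJD]": "time",
    "RV [m/s]": "rv",
    "RV Err. [m/s]": "rv_err",
    "Instrument": "inst",
}
df = df.rename(columns=short_names)
```

With a real file, `io.StringIO(...)` would be replaced by the path to the CSV.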

Important to Note

  1. Time stamps (throughout all data products) feature an additional, random offset from their original time stamps (the "e" in "eMJD" is for extra modified JD)
  2. The RVs in this table have been centered in order to align values from all four instruments (see above); these RVs will therefore differ from what is given in the headers of the individual CCF files.
  3. Some analytical errors for activity indicators are provided, but there is also the option to use empirical errors (read more)
  4. We suggest that derived indicators from each instrument be treated independently (read more)
  5. (Oct 9, 2025) The CCF FWHM and contrast have now been corrected for the apparent change in Solar vsini (read more)
  6. Please reach out if any indicators required by your method are missing.

Empirical Errors

The derived analytical errors appear to overestimate the error for CCF FWHM/Contrast and are not given for BIS, H-alpha emission, and CaII emission. To derive an empirical error, we found the spread in indicator values across all provided data. This should be an appropriate error estimate as the Sun was relatively inactive during these observations.

We therefore suggest using the following empirical errors:

  • CCF FWHM [km/s]: 0.005
  • CCF Contrast: 130
  • BIS [m/s]: 0.95
  • H-alpha Emission: 0.001
  • CaII Emission: 0.003

Indicator Offsets/Scaling

We suggest treating the indicators from each instrument independently. It is obvious that at least an offset is needed between indicators from different instruments. At this time, we are still working to establish whether a scaling will also be needed at the level of EPRV work. For instance, CCF FWHM shows an obvious need for both an offset and scaling factor.

We therefore opt to present the indicators directly as they are derived from the spectra/CCFs, but advise that different model parameters will likely be needed for each instrument.
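If only a per-instrument offset is wanted as a first step, one simple option is to subtract each instrument's median from an indicator. This is a sketch under that assumption; it does not address any scaling factor, and the function name, instrument labels, and values are hypothetical.

```python
import pandas as pd

def center_per_instrument(df, column, group_col="Instrument"):
    """Subtract the per-instrument median from an indicator column."""
    return df[column] - df.groupby(group_col)[column].transform("median")

# Toy table with invented values for two instruments
toy = pd.DataFrame({
    "Instrument": ["harps", "harps", "neid", "neid"],
    "CCF FWHM [km/s]": [7.10, 7.12, 6.50, 6.54],
})

toy["CCF FWHM [km/s] Centered"] = center_per_instrument(toy, "CCF FWHM [km/s]")
```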

CCF FWHM/Contrast Corrections

The slight eccentricity of Earth’s orbit causes variations to the Sun’s vsini, requiring corrections to the measured CCF FWHM and contrast (see Collier Cameron et al. 2019 Section 3). These corrections were not applied in data downloaded before October 9, 2025.

The columns “CCF FWHM [km/s]” and “CCF Contrast” in each DS#_timeSeries.csv file now contain the corrected values. The uncorrected values are still included in columns that end in “Raw” (e.g. “CCF FWHM [km/s] Raw”). The corrections that must be applied are given in the column “CCF FWHM [km/s] Correction”.

If you choose to rederive your own CCF FWHM/contrast values, the corrections should be applied as follows:

  • "CCF FWHM [km/s]" = √("CCF FWHM [km/s] Raw"² + "CCF FWHM [km/s] Correction"²)
  • "CCF Contrast" = "CCF Contrast Raw" × "CCF FWHM [km/s] Raw" ÷ "CCF FWHM [km/s]"
The provided example code contains a cell that implements this.
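For readers working outside the notebook, the two expressions above translate directly into NumPy. The function name and the toy input values are assumptions chosen so the arithmetic is easy to check; they are not real measurements.

```python
import numpy as np

def correct_fwhm_contrast(fwhm_raw, contrast_raw, fwhm_correction):
    """Apply the FWHM and contrast corrections as written in the two expressions above."""
    fwhm = np.sqrt(fwhm_raw**2 + fwhm_correction**2)
    contrast = contrast_raw * fwhm_raw / fwhm
    return fwhm, contrast

# Toy values (not real data): a 3-4-5 triangle makes the result easy to verify
fwhm_raw = np.array([3.0])
fwhm_correction = np.array([4.0])
contrast_raw = np.array([0.5])

fwhm, contrast = correct_fwhm_contrast(fwhm_raw, contrast_raw, fwhm_correction)
```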