Time-Series Files¶
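read_data!(S, fmt::String, filepat [, KWs])
S = read_data(fmt::String, filepat [, KWs])
- fmt: string that names the file format; see Supported File Formats.
- filepat: name or pattern of the file(s) to read. Supports wildcards.
- KWs: keyword arguments; type ?SeisIO.KW for help. See table below for the list.
Supported File Formats¶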
File Format | String | Strict Match |
---|---|---|
AH-1 | ah1 | id, fs, gain, loc, resp, units |
AH-2 | ah2 | id, fs, gain, loc, resp |
Bottle (UNAVCO) | bottle | id, fs, gain |
GeoCSV, time-sample pair | geocsv | id |
GeoCSV, sample list | geocsv.slist | id |
Lennartz ASCII | lennartz | id, fs |
Mini-SEED | mseed | id, fs |
PASSCAL SEG Y | passcal | id, fs, gain, loc |
SAC | sac | id, fs, gain |
SEG Y (rev 0 or rev 1) | segy | id, fs, gain, loc |
SEISIO | seisio | id, fs, gain, loc, resp, units |
SLIST (ASCII sample list) | slist | id, fs |
SUDS | suds | id |
UW data file | uw | id, fs, gain, units |
Win32 | win32 | id, fs, gain, loc, resp, units |
Format strings are case-sensitive; this avoids the performance cost of pattern matching and/or calls to lowercase().
Note that read_data with file format “seisio” largely exists as a convenience wrapper; it reads only the first SeisIO object from each file that can be converted to a SeisData structure. For more complicated read operations, rseis should be used.
Warning: GeoCSV files must be Unix text files; DOS text files, whose lines end in “\r\n”, will not be read properly. Convert them with dos2unix or equivalent Windows PowerShell commands.
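If neither tool is at hand, the conversion can also be done in Julia before calling read_data. The following is a minimal sketch; dos_to_unix is a hypothetical helper (not part of SeisIO), and the file contents are made up for the demonstration.

```julia
# Convert a DOS text file (CRLF line endings) to Unix (LF) line endings.
# `dos_to_unix` is a hypothetical helper, not a SeisIO function.
function dos_to_unix(src::AbstractString, dst::AbstractString)
    text = read(src, String)                    # load the whole file as one string
    write(dst, replace(text, "\r\n" => "\n"))   # write it back with LF-only line endings
    return dst
end

# Demonstration on a temporary file with made-up contents
src = tempname()
dst = tempname()
write(src, "a,b\r\n1,2\r\n")
dos_to_unix(src, dst)
unix_text = read(dst, String)
```

This reads the whole file into memory, so it suits files of modest size; for very large files, a line-by-line approach may be preferable.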
Supported Keywords¶
Keyword | Used By | Type | Default | Meaning |
---|---|---|---|---|
cf | win32 | String | “” | win32 channel info filestr |
full | [1] | Bool | false | read full header into :misc? |
ll | segy | UInt8 | 0x00 | set loc in :id? [2] |
memmap | * | Bool | false | use Mmap.mmap to buffer file? |
nx_add | [3] | Int64 | 360000 | minimum size increase of x |
nx_new | [4] | Int64 | 86400000 | length(x) for new channels |
jst | win32 | Bool | true | are sample times JST (UTC+9)? |
swap | [5] | Bool | true | byte swap? |
strict | * | Bool | true | use strict match? |
v | * | Integer | 0 | verbosity |
vl | * | Bool | false | verbose source logging? [6] |
Table Footnotes
[1] | used by ah1, ah2, sac, segy, suds, uw; information read into :misc varies by file format. |
[2] | see the table under “Setting the Location Subfield” below. |
[3] | used by bottle, mseed, suds, win32 |
[4] | used by bottle, mseed, suds, win32 |
[5] | used by mseed, passcal, segy; swap is automatic for sac. |
[6] | adds one line to :notes per file read. It is not guaranteed that files listed in S.notes[i] contain data for channel i; rather, all files listed are from the read operation(s) that populated i. |
Performance Tips¶
1. memmap=true improves read speed for some formats, particularly ASCII readers, but requires caution. In our benchmarks, the following significant (>3%) speed changes are observed:
- Significant speedup: ASCII formats, including metadata formats
- Slight speedup: mini-SEED
- Significant slowdown: SAC
2. With mseed or win32 data, adjust nx_new and nx_add based on the sizes of the data vectors that you expect to read. If the largest has Nmax samples, and the smallest has Nmin, we recommend nx_new=Nmin and nx_add=Nmax-Nmin.
Default values can be changed in SeisIO keywords, e.g.,
SeisIO.KW.nx_new = 60000
SeisIO.KW.nx_add = 360000
The system-wide defaults are nx_new=86400000 and nx_add=360000. Using these values with very small jobs will greatly decrease performance.
3. strict=true may slow read_data based on the fields matched as part of the file format. In general, any file format that can match on more than id and fs will read slightly slower with this option.
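The recommendation in tip 2 (nx_new=Nmin, nx_add=Nmax-Nmin) can be sketched as a small calculation. The helper suggest_nx is hypothetical (not part of SeisIO), and the channel lengths are made-up values chosen only to show the arithmetic.

```julia
# Hypothetical helper (not part of SeisIO): suggest nx_new and nx_add from the
# expected number of samples per channel, following tip 2 above.
function suggest_nx(expected_lengths::AbstractVector{<:Integer})
    nmin, nmax = extrema(expected_lengths)   # smallest and largest expected vectors
    return (nx_new = Int64(nmin), nx_add = Int64(nmax - nmin))
end

# Made-up example: one hour of data sampled at 1, 40, and 100 Hz
kw = suggest_nx([3_600, 144_000, 360_000])
# kw.nx_new is 3600 and kw.nx_add is 356400
```

The results could then be passed as keywords, e.g. read_data("mseed", filepat, nx_new=kw.nx_new, nx_add=kw.nx_add). If all channels have the same length this yields nx_add=0; whether that is a sensible value is not stated here, so treat that case with care.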
Channel Matching¶
By default, read_data continues a channel if data read from file matches the channel id (field :id). In some cases this is not enough to guarantee a good match. With strict=true, read_data matches against fields :id, :fs, :gain, :loc, :resp, and :units. However, not all of these fields are stored natively in all file formats. Column “Strict Match” in the first table lists which fields are stored (and can be logically matched) in each format with strict=true.
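As an illustration only, the effect of the “Strict Match” column can be modeled with a few lines of plain Julia. This is a simplified sketch, not SeisIO's internal logic: the named tuples, the STRICT_FIELDS dictionary, and fields_match are hypothetical, and plain equality is assumed for every field.

```julia
# Simplified model of strict channel matching (hypothetical; not SeisIO internals).
# Field lists are copied from the "Strict Match" column for two formats.
const STRICT_FIELDS = Dict(
    "mseed" => (:id, :fs),
    "sac"   => (:id, :fs, :gain),
)

# True if every listed field is equal in both channel descriptions
fields_match(a, b, fields) = all(getfield(a, f) == getfield(b, f) for f in fields)

# Two made-up channel descriptions that differ only in gain
existing = (id = "UW.MSH..EHZ", fs = 100.0, gain = 1.0)
incoming = (id = "UW.MSH..EHZ", fs = 100.0, gain = 2.0)

same_mseed = fields_match(existing, incoming, STRICT_FIELDS["mseed"])  # gain not compared
same_sac   = fields_match(existing, incoming, STRICT_FIELDS["sac"])    # gain compared
```

In this model the two descriptions match under the mseed field list but not under the sac field list, because only the latter includes :gain.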
Examples¶
S = read_data("uw", "99011116541W", full=true)
- Read UW-format data file 99011116541W
- Store full header information in :misc
read_data!(S, "sac", "MSH80*.SAC")
- Read SAC-format files matching string pattern MSH80*.SAC
- Read into existing SeisData object S
S = read_data("win32", "20140927*.cnt", cf="20140927*ch", nx_new=360000)
- Read win32-format data files with names matching pattern 20140927*.cnt
- Use ASCII channel information filenames that match pattern 20140927*ch
- Assign new channels an initial size of nx_new samples
Memory Mapping¶
memmap=true is considered unsafe because Julia language handling of SIGBUS/SIGSEGV and associated risks is undocumented as of SeisIO v1.0.0. Thus, for example, we don’t know what a connection failure during memory-mapped file I/O does. In some languages, this situation without additional signal handling was notorious for corrupting files.
Under no circumstances should memmap=true be used to read files directly from a drive whose host device power management is independent of the destination computer’s. This includes all workflows that involve reading files directly into memory from a connected data logger. It is not a sufficient workaround to set a data logger to “always on”.
Format Descriptions and Notes¶
Additional format information can be accessed from the command line by typing SeisIO.formats("FMT"), where FMT is the format name; type keys(SeisIO.formats) for a list.
- AH (Ad-Hoc) was developed as a machine-independent seismic data format based on External Data Representation (XDR).
- Bottle is a single-channel format maintained by UNAVCO (USA).
- GeoCSV: an extension of the “human-readable” tabular file format Comma-Separated Values (CSV).
- Lennartz: a variant of sample list (SLIST) used by Lennartz portable digitizers.
- PASSCAL: a single-channel variant of SEG Y with no file header, developed by PASSCAL/New Mexico Tech and used with PASSCAL field equipment.
- SAC: the Seismic Analysis Code data format, originally developed by LLNL for the eponymous command-line interpreter.
- SEED: adopted by the International Federation of Digital Seismograph Networks (FDSN) as an omnibus seismic data standard. mini-SEED is a data-only variant that uses only data blockettes.
- SEG Y: Society of Exploration Geophysicists data format. Common in the energy industry. Developed and maintained by SEG.
- SLIST: An ASCII file with a one-line header and data written to file in ASCII string format.
- SUDS: A similar format to SEED, developed by the US Geological Survey (USGS) in the late 1980s.
- UW: created in the 1970s by the Pacific Northwest Seismic Network (PNSN), USA, for event archival; used until the early 2000s.
- Win32: maintained by the National Research Institute for Earth Science and Disaster Prevention (NIED), Japan. Continuous data are divided into files that contain a minute of data from multiple channels stored in one-second segments.
Format-Specific Information¶
SEG Y¶
Only SEG Y rev 0 and rev 1 with standard headers are supported. The following are known support limitations:
- A few SEG Y headers are partially implemented or unused. These will be refined as we obtain more test data with standardized SEG Y headers and known results.
- Not all SEG Y files use the gain formula in the SEG Y rev 1 manual. Users are urged to consult equipment manufacturers and/or coders whose software converts proprietary data formats to SEG Y.
- SeisIO does not use the Textual File Header (file bytes 1-3200) or Extended Textual File Header records, as these were never standardized. Specify full=true to read the raw bytes into vectors in :misc. These byte vectors can be parsed manually by the user after file read.
Setting the Location Subfield¶
The location subfield within :id (“LL” in NN.SSSS.LL.CC) is normally blank, but can be set from an arbitrary Int32 quantity in SEG Y. The reason for this behavior is that SEG Y has at least six “recommended” quantities that can indicate a unique channel. Use one by passing the corresponding value from the table below to keyword “ll”:
Code | U | Bytes | :misc | Usual trace header quantity |
---|---|---|---|---|
0x00 | | | | None (Default); don’t set LL |
0x01 | Y | 001-004 | trace_seq_line | Trace sequence number within line |
0x02 | Y | 005-008 | trace_seq_file | Trace sequence number within SEG Y file |
0x03 | | 009-012 | rec_no | Original field record number |
0x04 | Y | 013-016 | channel_no | Trace number within original field record |
0x05 | | 017-020 | energy_src_pt | Energy source point number |
0x06 | | 021-024 | cdp | Ensemble number |
0x07 | ? | 025-028 | trace_in_ensemble | Trace number within the ensemble |
0x08 | | 037-040 | src-rec_dist | Distance from center of source point |
0x09 | | 041-044 | rec_ele | Receiver group elevation |
0x0a | | 045-048 | src_ele | Surface elevation at source |
0x0b | | 049-052 | src_dep | Source depth below surface (positive) |
0x0c | | 053-056 | rec_datum_ele | Datum elevation at receiver group |
0x0d | | 057-060 | src_datum_ele | Datum elevation at source |
0x0e | | 061-064 | src_water_dep | Water depth at source |
0x0f | | 065-068 | rec_water_dep | Water depth at group |
0x10 | | 073-076 | src_x | Source coordinate - X |
0x11 | | 077-080 | src_y | Source coordinate - Y |
0x12 | | 081-084 | rec_x | Group coordinate - X |
0x13 | | 085-088 | rec_y | Group coordinate - Y |
0x14 | | 181-184 | cdp_x | X coordinate of ensemble (CDP) position |
0x15 | | 185-188 | cdp_y | Y coordinate of ensemble (CDP) position |
0x16 | | 189-192 | inline_3d | For 3-D poststack data, in-line number |
0x17 | | 193-196 | crossline_3d | For 3-D poststack data, cross-line number |
0x18 | | 197-200 | shot_point | Shotpoint number (2-D post-stack data) |
0x19 | | 205-208 | trans_mant | Transduction Constant (mantissa) |
0x1a | ? | 233-236 | unassigned_1 | Unassigned (for optional information) |
0x1b | ? | 237-240 | unassigned_2 | Unassigned (for optional information) |
A SEG Y file usually increments one (or more) of 0x01, 0x02, or 0x04 for each trace. Unfortunately, we can’t imagine any way to use all three, or even two, in a SEED-compliant channel ID.
Warning: for any quantity above,
- Numeric values >1296 lead to nonstandard characters in the LL subfield
- Numeric values >7200 lead to non-unique :id fields, with undefined results
- Numeric values >9216 cause read_data to throw an InexactError
UW¶
Only UW v2 (UW-2) data files are supported. We have no reason to believe that any UW-1 data files are in circulation, and external converters to UW-2 exist.
Win32¶
Use older channel files with caution. They were not controlled by any central authority until the late 2010s. Inconsistencies between different versions of the same channel file were found by SeisIO developers as recently as 2015.