Tips
Impact of memory argument
By default, when the MDF object is created, all data is loaded into RAM (memory='full'). This gives the best performance from asammdf.
However, if you reach the physical memory limit, asammdf offers two other options:
- memory='low': only the metadata is loaded into RAM; the raw channel data is loaded when needed
- memory='minimum': only minimal data is loaded into RAM
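As a rough illustration of how the three options relate to file size, the sketch below picks a memory value from the file size and the available RAM. The function name, the `headroom` and `huge_factor` thresholds, and the decision rule itself are assumptions made for this example; they are not part of asammdf and should be tuned against your own measurements.

```python
def pick_memory_option(file_size, available_ram, headroom=2.0, huge_factor=4.0):
    """Suggest a value for the MDF `memory` argument (hypothetical helper).

    file_size and available_ram are in bytes. The thresholds are assumptions,
    not values taken from asammdf.
    """
    # 'full' loads everything, so require some headroom on top of the file size.
    if file_size * headroom <= available_ram:
        return "full"
    # 'low' keeps only metadata in RAM and reads raw channel data on demand.
    if file_size <= available_ram * huge_factor:
        return "low"
    # 'minimum' keeps the least in RAM and is meant for the largest files.
    return "minimum"
```

The result could then be passed on as `MDF(path, memory=pick_memory_option(os.path.getsize(path), available_ram))`, assuming an asammdf version that accepts the memory argument described on this page.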
MDF created with memory='full'
Advantages
- best performance if all channels are used (for example cut, convert, export or merge methods)
Disadvantages
- higher RAM usage, with the chance of a MemoryError for large files
- data is not accessed in chunks
- time can be wasted if only a small number of channels is retrieved from the file (for example filter, get or select methods)
Use case
- when data fits inside the system RAM
MDF created with memory='low'
Advantages
- lower RAM usage than memory='full'
- can handle files that do not fit in the available physical memory
- middle ground between 'full' speed and 'minimum' memory usage
Disadvantages
- slower performance for retrieving channel data
- the close method must be called to release the temporary file that is used when appending
Note
It is advised to use the MDF context manager in this case.
Use case
- when 'full' data exceeds available RAM
- it is advised to avoid getting individual channels when using this option
- best performance / memory usage ratio when using cut, convert, filter, merge or select methods
Note
See benchmarks for the effects of using the flag
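The advice to use the MDF context manager with memory='low' can be sketched as a small wrapper. The helper name `with_low_memory` and its `mdf_class` parameter are hypothetical and exist only so the pattern can be tried without a measurement file; the memory argument is used as described on this page and may not be accepted by other asammdf versions.

```python
def with_low_memory(path, work, mdf_class=None):
    """Open `path` with memory='low', run `work(mdf)`, and always close the file."""
    if mdf_class is None:
        # Imported lazily so the helper can be exercised with a stand-in class.
        from asammdf import MDF as mdf_class
    # Leaving the `with` block closes the MDF object, which releases any
    # temporary file it created, even if `work` raises.
    with mdf_class(path, memory="low") as mdf:
        return work(mdf)
```

With this pattern there is no separate close call to forget, which is the main reason the context manager is recommended here.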
MDF created with memory='minimum'
Advantages
- lowest RAM usage
- the only choice when dealing with huge files (tens of thousands of channels and GB of sample data)
- can handle big files on 32 bit Python
Disadvantages
- slightly slower performance compared to memory='low'
- the close method must be called to release the temporary file that is used when appending
Note
See benchmarks for the effects of using the flag
Chunked data access
When the MDF is created with the option “full”, all the samples are loaded into RAM and are processed as a single block. For large files this can lead to MemoryError exceptions (for example when trying to merge several GB-sized files).
asammdf optimizes memory usage for the options “low” and “minimum” by processing samples in fragments. The default read fragment size was tuned based on experimental measurements and should give a good compromise between execution time and memory usage.
You can further tune the read fragment size using the configure method, to favor execution speed (larger fragment sizes) or memory usage (smaller fragment sizes).
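The trade-off behind the fragment size can be illustrated with plain Python. The sketch below is a conceptual example only and is not how asammdf reads its data blocks; `iter_fragments` is a hypothetical helper. It shows that only one fragment is held in memory at a time, so a smaller fragment size lowers peak memory while increasing the number of read calls.

```python
import io


def iter_fragments(stream, fragment_size):
    """Yield successive chunks of at most `fragment_size` bytes from `stream`."""
    while True:
        fragment = stream.read(fragment_size)
        if not fragment:
            break
        yield fragment


# 1024 bytes of sample data, processed 100 bytes at a time.
data = io.BytesIO(bytes(range(256)) * 4)
# Only one fragment is alive per iteration; the running result stays small.
peak = max(max(fragment) for fragment in iter_fragments(data, 100))
```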
Optimized methods
The MDF methods (cut, filter, select) are optimized and should be used instead of calling get for several channels. For “low” and “minimum” options the time savings can be dramatic.
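A minimal sketch of this recommendation: request all the needed channels with one select call instead of looping over get. The helper name `read_channels` is hypothetical, and `mdf` is assumed to be an already opened MDF object.

```python
def read_channels(mdf, names):
    """Return the signals for `names` using a single select call."""
    # Slower alternative for the "low" and "minimum" options:
    #     [mdf.get(name) for name in names]
    return mdf.select(list(names))
```

Passing the whole list at once lets the library read the file in one pass rather than once per channel, which is where the savings for “low” and “minimum” come from.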
Faster file loading
BytesIO and memory='full'
For files with a high block count (a large number of channels or a large number of data blocks), you can speed up loading with the full memory option, at the expense of higher RAM usage, by reading the file into a BytesIO object and passing it to the MDF class.
Using a 3.2 GB test file that contained ~580000 channels, the loading time and RAM usage were measured on the following system:
- Python 3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)]
- Windows-10-10.0.15063-SP0
- Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
- 16GB installed RAM
Open file | Time [ms] | RAM [MB] |
---|---|---|
asammdf 3.5.1.dev mdfv4 | 62219 | 4335 |
asammdf w BytesIO 3.5.1.dev mdfv4 | 31232 | 7409 |
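The BytesIO approach measured above can be sketched as follows. The helper name `load_via_bytesio` and its `mdf_class` parameter are hypothetical; the memory argument is used as described on this page and may not be accepted by other asammdf versions.

```python
from io import BytesIO


def load_via_bytesio(path, mdf_class=None):
    """Read the whole file into RAM first, then hand the buffer to MDF."""
    if mdf_class is None:
        # Imported lazily so the helper can be exercised with a stand-in class.
        from asammdf import MDF as mdf_class
    with open(path, "rb") as source:
        # One large sequential read replaces many small seeks and reads on disk.
        buffer = BytesIO(source.read())
    return mdf_class(buffer, memory="full")
```

The file content is held in the buffer in addition to whatever the MDF object loads, which matches the higher RAM figure in the table.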
Skip XML parsing for MDF4 files
MDF4 uses the XML channel comment to define the channel's display name (this acts as an alias for the channel name). XML parsing is an expensive operation that can have a big impact on the loading performance of measurements with a high channel count.
You can pass the keyword-only argument use_display_names=False when creating MDF objects to skip the XML parsing. The display names will then not be available when calling the get method.
Using a test file that contained ~36000 channels, the loading times and RAM usage were:
Open file | Time [ms] | RAM [MB] |
---|---|---|
asammdf 3.5.1.dev full mdfv4 use_display_names=True | 6086 | 335 |
asammdf 3.5.1.dev low mdfv4 use_display_names=True | 5590 | 170 |
asammdf 3.5.1.dev minimum mdfv4 use_display_names=True | 4694 | 61 |
asammdf 3.5.1.dev full mdfv4 use_display_names=False | 2020 | 328 |
asammdf 3.5.1.dev low mdfv4 use_display_names=False | 1912 | 163 |
asammdf 3.5.1.dev minimum mdfv4 use_display_names=False | 966 | 59 |
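Opening a file with display names disabled, as in the faster rows of the table, might look like the sketch below. The helper name `open_without_display_names` and its `mdf_class` parameter are hypothetical; the memory and use_display_names arguments are used as described on this page.

```python
def open_without_display_names(path, memory="full", mdf_class=None):
    """Create an MDF object that skips XML parsing of channel comments."""
    if mdf_class is None:
        # Imported lazily so the helper can be exercised with a stand-in class.
        from asammdf import MDF as mdf_class
    # use_display_names is keyword-only, so it must be passed by name.
    return mdf_class(path, memory=memory, use_display_names=False)
```

Channels opened this way can still be fetched by their regular names; only the display-name aliases are unavailable.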