Tips

Impact of memory argument

By default when the MDF object is created all data is loaded into RAM (memory=’full’). This will give you the best performance from asammdf.

However if you reach the physical memory limit asammdf gives you two options:

  • memory=’low’ : only the metadata is loaded into RAM, the raw channel data is loaded when needed
  • memory=’minimum’ : only minimal data is loaded into RAM.

MDF created with memory=’full’

Advantages

  • best performance if all channels are used (for example cut, convert, export or merge methods)

Disadvantages

  • higher RAM usage, there is the chance of MemoryError for large files
  • data is not accessed in chunks
  • time can be wasted if only a small number of channels is retreived from the file (for example filter, get or select methods)

Use case

  • when data fits inside the system RAM

MDF created with memory=’low’

Advantages

  • lower RAM usage than memory=’full’
  • can handle files that do not fit in the available physical memory
  • middle ground between ‘full’ speed and ‘minimum’ memory usage

Disadvantages

  • slower performance for retrieving channel data
  • must call close method to release the temporary file used in case of appending.

Note

it is advised to use the MDF context manager in this case

Use case

  • when ‘full’ data exceeds available RAM
  • it is advised to avoid getting individual channels when using this option
  • best performance / memory usage ratio when using cut, convert, flter, merge or select methods

Note

See benchmarks for the effects of using the flag

MDF created with memory=’minimum’

Advantages

  • lowest RAM usage
  • the only choise when dealing with huge files (10’s of thousands of channels and GB of sample data)
  • handle big files on 32 bit Python ()

Disadvantages

  • slightly slower performance compared to momeory=’low’
  • must call close method to release the temporary file used in case of appending.

Note

See benchmarks for the effects of using the flag

Chunked data access

When the MDF is created with the option “full” all the samples are loaded into RAM and are processed as a single block. For large files this can lead to MemoryError exceptions (for example trying to merge several GB sized files).

asammdf optimizes memory usage for options “low” and “minimum” by processing samples in fragments. The read fragment size was tuned based on experimental measurements and should give a good compromise between execution time and memory usage.

You can further tune the read fragment size using the configure method, to favor execution speed (using larger fragment sizes) or memory usage (using lower fragment sizes).

Optimized methods

The MDF methods (cut, filter, select) are optimized and should be used instead of calling get for several channels. For “low” and “minimum” options the time savings can be dramatic.

Faster file loading

BytesIO and memory=’full’

In case of files with high block count (large number of channels, or large number of data blocks) you cand speed up the loading in case of full memory option, at the expense of higher RAM usage by reading the file into a BytesIO object and feeding it to the MDF class

Using a test file with the size of 3.2GB that contained ~580000 channels the loading time and RAM usage were

  • Python 3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)]
  • Windows-10-10.0.15063-SP0
  • Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
  • 16GB installed RAM
Open file Time [ms] RAM [MB]
asammdf 3.5.1.dev mdfv4 62219 4335
asammdf w BytesIO 3.5.1.dev mdfv4 31232 7409

Skip XML parsing for MDF4 files

MDF4 uses the XML channel comment to define the channel’s display name (this acts as an alias for the channel name). XML pasring is an expensive operation that can have a big impact on the loading performance of measurements with hihg channel count.

You can use the keyword only argument use_display_names when creating MDF objects to skip the XML parsing. This means that the display names will not be available when calling the get method.

Using a test file that contained ~36000 channels the loading times were

Open file Time [ms] RAM [MB]
asammdf 3.5.1.dev full mdfv4 use_display_names=True 6086 335
asammdf 3.5.1.dev low mdfv4 use_display_names=True 5590 170
asammdf 3.5.1.dev minimum mdfv4 use_display_names=True 4694 61
asammdf 3.5.1.dev full mdfv4 use_display_names=False 2020 328
asammdf 3.5.1.dev low mdfv4 use_display_names=False 1912 163
asammdf 3.5.1.dev minimum mdfv4 use_display_names=False 966 59