Discussion:
reading tables in matlab (big data)
filipa
2018-03-23 17:11:46 UTC
Permalink
Hello

We have multiple MATLAB tables stored in .mat files of approximately 10 GB each.
Our problem on MATLAB R2017a is that it takes about 30 minutes to load each of these tables into the workspace, which is a lot considering that we only need the data from a few rows of those tables.
Our goal would be to extract those variables without loading the complete files.
What are our options?
thank you all
dpb
2018-03-23 18:09:24 UTC
Permalink
Post by filipa
Hello
We have multiple MATLAB tables stored in .mat files of approximately 10 GB each.
Our problem on MATLAB R2017a is that it takes about 30 minutes to load each of these tables into the workspace, which is a lot considering that we only need the data from a few rows of those tables.
Our goal would be to extract those variables without loading the complete files.
What are our options?
thank you all
A) doc matfile % not 100% flexible, but worth investigating
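A minimal sketch of option A. This assumes the file was saved with -v7.3 (matfile requires that for partial reads); the file name and variable names are placeholders. Note the flexibility caveat: matfile supports partial ()-indexing for ordinary arrays, but a table variable must still be loaded in full, which is why options B and C below can matter more.

```matlab
% Sketch only: 'bigdata.mat', X, and T are placeholder names.
m = matfile('bigdata.mat');    % opens the file; no data is read yet
whos(m)                        % list variables and sizes without loading

rows = m.X(1:100, :);          % partial read: only rows 1-100 of array X
T = m.T;                       % a table variable still loads in full
```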

B) redefine the data storage so it is not "all in one" -- of course, if
your access needs are random across the whole file, that may not be of as
much benefit as if you can partition it somewhat to prevent that. Still,
it may be much faster to read multiple smaller datasets than one large one.
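One way option B might look in practice: a one-time pass that splits the big table into per-group files, so later sessions load only the chunk they need. The grouping variable ('Year') and the file names here are placeholders for whatever partitioning matches your access pattern.

```matlab
% One-time repartitioning pass (slow once, fast ever after).
load('bigdata.mat', 'T');                       % placeholder names
years = unique(T.Year);                         % 'Year' is an assumed column
for k = 1:numel(years)
    Tk = T(T.Year == years(k), :);              % rows for one partition
    save(sprintf('data_%d.mat', years(k)), 'Tk', '-v7.3');
end

% Later, load only the partition you actually need:
S = load('data_2017.mat');
rowsOfInterest = S.Tk;
```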

C) rethink what your data structure is; there's not enough info here to
even guess at specifics, but complex data objects, while convenient for
manipulation, have much more overhead than the native double array, and
it may be more efficient overall to recreate the structure from the data
as needed rather than save it.
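A sketch of option C, assuming the table's payload is numeric: store the raw data as a plain double matrix plus the metadata needed to rebuild the table, rather than the table object itself. Plain arrays save and load with much less overhead, and they also work with matfile partial indexing. All names here are illustrative.

```matlab
% One-time conversion: strip the table down to a matrix + column names.
X     = table2array(T);                  % works if all columns are numeric
names = T.Properties.VariableNames;
save('bigdata_raw.mat', 'X', 'names', '-v7.3');

% On a later load, read just the rows you need, then rebuild a small table:
m   = matfile('bigdata_raw.mat');
sub = m.X(500:600, :);                   % partial read works on plain arrays
T2  = array2table(sub, 'VariableNames', m.names);
```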

--
