Discussion:
very slow write to text file
Fredrik Kronhamn
2008-05-19 14:01:03 UTC
I am using a fairly simple m-file to process numerical data
from a 200 MB+ (3M-line) text file. Values (text) are read
from a source file, some calculations are done, and the
results are written to a target text file. I have
experimented with different alternatives for reading the
in-data and found a quite comfortable and speedy (20,000
lines/sec) solution using the textscan function.
Unfortunately, writing the out-data is utterly slow
regardless of which function I use (fprintf, dlmwrite,
etc.). Process Monitor (SysInternals) reveals that MATLAB is
writing chunks that are only between 9 and 13 bytes long,
which is disastrous for the performance of my script. The
input file, however, is read in 512-byte chunks. Does anyone
know a way of boosting file I/O performance? I am using
MATLAB 7.1.
Ken Campbell
2008-05-19 16:13:01 UTC
"Fredrik Kronhamn" <***@ioq.uni-jena.de.removethis> wrote in message
Post by Fredrik Kronhamn
I am using a quite simple m-file to process numerical data
from a 200Mb+ (3M lines) text file. Values (text) are being
read from a source file, some calculations are done, and the
results written to a target text file. I have been
experimenting with different alternatives for reading the
in-data, and found a quite comfortable and speedy (20000
lines/sec) solution using the textscan function.
Unfortunately, writing the out-data is utterly slow
regardless of what function I am using (fprintf, dlmwrite,
etc). Process Monitor (SysInternals) reveals that MatLab is
writing chunks that are only between 9 and 13 bytes long,
which is disastrous for the performance of my script. The
infile is however read in 512 bytes chunks. Does anyone know
a way of boosting performance for file i/o? I am using
MatLab 7.1
I probably won't be of much help, but may I try to clarify
the problem? Are you trying to write the entire output file
in one go, or are you continually appending to it as you
work through the input file?

Ken
Fredrik Kronhamn
2008-05-20 13:56:02 UTC
I have tried several approaches, i.e. dumping an entire
array with dlmwrite, line-wise output with fprintf, and
buffering the output with fprintf. MATLAB, however, refuses
to write larger chunks of data regardless of which approach
I use.

Maybe it is possible to do some low-level hacking (e.g.
registry settings) to change this behavior?
Derek
2008-08-05 19:32:02 UTC
"Fredrik Kronhamn"
Post by Fredrik Kronhamn
I have elaborated different solutions, i.e. dumping an
entire array with dlmwrite, linewise output with fprintf,
and buffering the output with fprintf. Matlab is however
refusing to write larger chunks of data independent of what
approach I am using.
Maybe it is possible to do some low-level hacking (i.e.
registry settings) to change this behavior?
I had the same problem. It's ridiculous, but my solution was
to write a mex file, which cut writing a 3x900000 matrix
from 30 seconds to 2 seconds. The time hog in MATLAB's
fprintf seems to be the string formatting, since sprintf is
about as slow as fprintf.

I'll include my mex below, but it is not very general. You
supply the filename, the mode to open it in ('a' or 'w'),
the data, and the number of elements per text line. It
writes each element in '%g' format. Revise as needed (and if
you make a more general one, I'd be happy to get it:
d+hoiem+at+uiuc+.edu (without the +)).

- Derek

----------

#include <stdio.h>
#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const
mxArray *prhs[]){

int buflen;
char* filename;
char* opentype;
double* data;
int ndata;
int n;
FILE* fid;
int linelen;
int k;

/* check for the proper number of inputs and outputs */
if (nrhs != 4)
mexErrMsgTxt("4 input arguments are required: filename
fopen_mode data linelen");
if (nlhs>0)
mexErrMsgTxt("Too many outputs");

/* read inputs: filename and fopen mode as C strings */
buflen = mxGetN(prhs[0])+1;
filename = (char*)mxMalloc(buflen);
mxGetString(prhs[0], filename, buflen);

buflen = mxGetN(prhs[1])+1;
opentype = (char*)mxMalloc(buflen);
mxGetString(prhs[1], opentype, buflen);

ndata = mxGetNumberOfElements(prhs[2]);
data = mxGetPr(prhs[2]);

linelen = (int)mxGetScalar(prhs[3]);
fid = fopen(filename, opentype);

if (fid==NULL)
mexPrintf("Failed to open %s\n", filename);
else {

/* write linelen values per line; the n+k bound guards the
   case where ndata is not a multiple of linelen */
for (n=0; n<ndata; n+=linelen) {
for (k=0; k<linelen && n+k<ndata; k++) {
fprintf(fid, "%g ", data[n+k]);
}
fprintf(fid, "\n");
}
fclose(fid);
}

mxFree(filename);
mxFree(opentype);

}
Balint
2011-12-08 15:20:08 UTC
Post by Fredrik Kronhamn
I am using a quite simple m-file to process numerical data
from a 200Mb+ (3M lines) text file. Values (text) are being
read from a source file, some calculations are done, and the
results written to a target text file. I have been
experimenting with different alternatives for reading the
in-data, and found a quite comfortable and speedy (20000
lines/sec) solution using the textscan function.
Unfortunately, writing the out-data is utterly slow
regardless of what function I am using (fprintf, dlmwrite,
etc). Process Monitor (SysInternals) reveals that MatLab is
writing chunks that are only between 9 and 13 bytes long,
which is disastrous for the performance of my script. The
infile is however read in 512 bytes chunks. Does anyone know
a way of boosting performance for file i/o? I am using
MatLab 7.1
I have had the same problem but found a different cause, and thus a different solution: for me, dlmwrite took ages to _start_ writing. So I simply used fprintf, which does not have that start-up delay and therefore runs about ten times faster.
Just in case others have the same issue..
Balint
2011-12-08 15:51:08 UTC
Post by Balint
I have had the same problem and found a different cause and thus different solution: for me dlmwrite took ages to _start_ writing. Thus I simply used fprintf which does not wait and therefore runs about ten times faster.
Just in case others have the same issue..
Have to correct myself: the problem was actually opening the file, so fopen took just as long as dlmwrite. The cause was some Windows security nonsense; I simply decided to write everything into one file, and that solved it.