rkoziol7

Distinguished
Sep 2, 2010
3
0
18,510
Hello all,

I have some python scripts which reads some large txt files and for each line of these files do some DB table read (MySQL database). The scripts are added to crontab and there are several of them and the files are really huge so the DB has a really great number of reads each minute. I discovered that the performance of DB is much worse lately.

My question is: would it be more efficient to make only one DB read per each file? Read few thousands of rows at one time and then find with python language the appropriate row for each line? Would is be faster than make one read per one file's line? Is this a kind of a good practice or not?

Kind regards,
Pawel
 

randomizer

Distinguished
What sort of queries are you trying to run? It's generally more efficient to run a single complex query that returns all of the data you need than lots of small queries. It may also be worthwhile writing a stored procedure that does the heavy lifting for you. These can often (but not always) end up faster once they've been run once because the DBMS has a precompiled execution plan for each subsequent call and it only needs to substitute the parameters.
 

rkoziol7

Distinguished
Sep 2, 2010
3
0
18,510
Hello,

Thanks for your reply. My queries are just a simple dictionary queries like 'select id, name from table_name where id=something' but the problem is that I need a lot of them. I think I will invoke some query like 'select id, name from table_name' without any conditions and map the result to some Map object. Then I will look for the particular value in this Map object using id as the key.

Kind regards,
Pawel
 

Ijack

Distinguished

That would, I think, be far more inefficient than letting the database do the work. Database engines are optimised to do queries like this (apart from the fact that you would be transferring large quantities of data that you don't need. Look at stored procedures and make sure that you have the appropriate indexes on your tables.
 

randomizer

Distinguished


I'd definitely make a stored procedure then. You will simply need to pass it a single parameter which is the id that you are filtering on. SQL queries are prone to syntax errors, so it's best to have one rather than hundreds of them. And, as I mentioned above, it allows the DB engine to build an optimised execution plan that it can reuse. Make sure that your id column has an index applied to it as ijack said as well. Indexed tables are a bit slower to insert into and update but much faster to filter and read from.

Let the DB do the heavy lifting and let you application concentrate on using the resulting data and not how to get it.
 

TRENDING THREADS