User login

MetaTraffic 2.0 Boxshot

Automatic Truncation of Old Records

Posted by pbyron [June 20, 2003]

Since the current version of the product create an infinitely growing database, reaching hosting quotas for high-traffic sites is a problem.

I would recommend adding the ability to set a truncation point based on X months of data. This would allow the admin to set the tracking to keep the last X months of data only, and truncate the rest.

In turn, I wouldn't have to keep so close an eye on the size of the traffic databases for my clients. Currently this process is a manual one (assuming you know how to write SQL or manipulate databases).

Thanks!

Parker Byron
www.ISGWebHosting.com

Automatic Truncation of Old Records

Good idea Parker,

I agree with you that this would be a nice feature.

Knowing enough VBscript and ASP to have some "fun", I was successfull at adding functionalities to the script I have.

However, automating the deletion is something a bit difficult to do, at least to my point of view. The scheduling aspect of it is the turing point.

The main question is : "When will this cleanup be done and how will it be invoked ?". Setting the scope to delete is not much of a problem, however automatically delete records is not easy.

Also, when the records will be deleted, a compaction of the database is mandatory to regain the disk space. Even if the compaction doesn't take long to do, you simply can't take for granted that it can be done blindly, therefore the logging should be disabled first, then go ahead with the compaction. Once done, the logging can be resumed.

If you don't play safe, you may end up with damaged databases, or unhappy users that got a "mysterious" error because the tracker failed to access the database.

With some thoughts and ideas from the different users of that script, we sure will come up with something interesting, meantime we have to do it pretty much the "old fashioned" way, unless of course you're running it against an MsSql database.

Regards.

Automatic Truncation of Old Records

Another method for saving space is to compress some of the data upon creation. (Ok, not a compression as you would normally think... but more like a conversion)

For example, it typically takes 12-15 characters in a database to store an IP address as a string. Yet it only takes 4 bytes to store it as a long integer. That's a savings of about 10 bytes (about 70%) per database record. If you are storing 5,000,000 IPs in your database, that would be a savings in disk space of about 50MB. In addition, it is faster and more efficient for your database to work with integers than strings. The same could be said for the storage of the Date.

There is an aweful lot of wasted space in the database at the moment. With storing the information more efficiently and a few lines of code to convert it to/from the database... people might not have as much of a space issue with their ISP.

TechStud,
www.TechStud.com

Automatic Truncation of Old Records

One of the goals of MetaTraffic v2.0 will be to optimize the database's size and performance. This will include optimizing the database fields in the pagelog table and moving the detection data to separate summary tables. This should significantly reduce the size of the database.

Regards,
~Chad

Automatic Truncation of Old Records

quote:Originally posted by cdegroot
One of the goals of MetaTraffic v2.0 will be to optimize the database's size and performance. This will include optimizing the database fields in the pagelog table and moving the detection data to separate summary tables. This should significantly reduce the size of the database.

Regards,
~Chad

Chad, I didn't mean to make my post sound like you weren't working towards that. It came out a bit more harsh than it was intended to. :oops:

BTW, you have done an Excellent job on this project... Keep up the great work.

TechStud,
www.TechStud.com

Automatic Truncation of Old Records

quote:Originally posted by cdegroot
One of the goals of MetaTraffic v2.0 will be to optimize the database's size and performance. This will include optimizing the database fields in the pagelog table and moving the detection data to separate summary tables. This should significantly reduce the size of the database.

Regards,
~Chad

Chad,

At first glance, there isn't much more data to gather, or is there ?

As far as the storage, it is a "big" eater, but with some fields re-definition, that may help.

I personally customized my v 1.2 and the IP addresses are now stored as numbers

On the other hand, there is a bit of redundancy in the information stored in the database, but that is part of the development process.

"Compressing" string might be good, however I'm a bit suspicious about the overhead this would induce at the server level when time will come to "de-compress" the data and render the report/reports.

Time will tell I guess.

Finally, I aggree 200% with TechStud. You did a fantastic job with this project/product and we know for a fact that you have the potential to create high quality utilities.

Best regards.

Automatic Truncation of Old Records

Hi,

OK, I will keep this in mind. I have some code that I have written for country detection which uses IP numbers instead of IP addresses so I will incorporate that for the v1.3 release. (Not the country detection, just the space optimization.)

I am investigating more space saving options without radically redesigning the db schema for v1.3 and will try and do all I can to optimize space for the next minor release.

Thanks for the feedback all. Keep it coming.

Regards,
~Chad

Automatic Truncation of Old Records

Hi,

Actually a minor change to what I said previously. The 1.3 version will not have the IP Address change in it. Although I have made some other more simple modifications to the database which can result in a much more significant reduction in database size.

Regards,
~Chad

Automatic Truncation of Old Records

quote:Originally posted by cdegroot
Hi,

Actually a minor change to what I said previously. The 1.3 version will not have the IP Address change in it. Although I have made some other more simple modifications to the database which can result in a much more significant reduction in database size.

Regards,
~Chad

Good show Chad!

Btw, you mentioned in another post that few bugs were fixed in version 1.3. Have you kept a log of the changes you did ?

I know that I shouldn't have modified/customized version 1.2 to the extent I did, but I couldn't help it.:D

If the changes between the releases and/or versions are clearly identified, this could minimize the application of the fixes to a customized version if/and when applicable.

For instance, I created 2 class files, one of which is used for the reporting, and the other is for the logging. I trimmed down the logging version to the bare minimal, thus improving substantially the time required to load the pages being tracked.

Having to "#include" over 200KB worth of code on every single asp/htm page being tracked was simply overkilling the system where I'm hosting the site.

Can't wait to see version 1.3 and the improvements on the storage. This will be a plus for sure.

Regards.

Automatic Truncation of Old Records

Unfortunately I don't have a detailed change log. Although I guess it would be possible to do a diff of the class and other files from version 1.2 to 1.3, indicating where the changes were.

The database optimizations are quite simple. I dropped the pl_languageactual table and now have the language reports keyed off just the pl_language table. Also, it no longer populates the pl_referrerdomain, extension, or url fields with referrer data that matches hosts listed in the strSiteAliases configuration variable. The upgrade data process removes this data as well from past records.

Regards,
~Chad

Automatic Truncation of Old Records

quote:Originally posted by cdegroot
Unfortunately I don't have a detailed change log. Although I guess it would be possible to do a diff of the class and other files from version 1.2 to 1.3, indicating where the changes were.

The database optimizations are quite simple. I dropped the pl_languageactual table and now have the language reports keyed off just the pl_language table. Also, it no longer populates the pl_referrerdomain, extension, or url fields with referrer data that matches hosts listed in the strSiteAliases configuration variable. The upgrade data process removes this data as well from past records.

Regards,
~Chad

No problems Chad,

I'll take some time to look at the code and compare the things.

Thanks for pointing it out.

Regards