I am doing some research into tools for a new startup company. I know that we need a good relational database and what it needs to be able to do, however, I do NOT know what database(s) will fit the criteria. I did some research and know that SQL does at least some of what we need. Could I get some advice? Here are my criteria:
1. Our product is a standalone, end user, desktop product. Therefore, we need a database with a desktop engine that we can install with the software.
1.1 Ideally, this desktop engine wouldn't be TOO big.
2. We need to be able to run very fast full text boolean or heuristic boolean queries on a large number of fields.
2.1 Ideally, these full text queries have, or could use, a thesaurus to do fuzzy searching.
2.2 Ideally, new entries could be added to the thesaurus at runtime.
3. The product may be used collaboratively by large organizations and may index a large amount of data per user; therefore, it needs to have no functional upper limit on its file size.
4. I would like to be able to run a query that says, in effect, "For column x, what are the most frequent values in column y for each unique value in column x." OR "Given this value for column x, what is the probability for this value in column y"
5. If 4 isn't possible, I need to be able to query the counts of values in column y for each unique value in column x, so I can calculate the information anyway (see the SQL sketch after this list).
6. The desktop engine needs to be able to either work with two databases at the same time, or have a subset inside the database disconnected from the rest.
7. I need to be able to add new columns to the database at runtime so the user can store a new kind of data if they want.
8. With the exception of Full Text queries, the database doesn't need to be very fast.
9. It is important that the database NEVER EVER lose the user's data.
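To make criteria 4 and 5 concrete, here is my rough guess at what the counting query would look like in SQL, using made-up table and column names (items, x, y); I don't know the exact syntax yet:
    -- counts of each y value within each x value (criterion 5)
    SELECT x, y, COUNT(*) AS cnt
    FROM items
    GROUP BY x, y
    ORDER BY x, cnt DESC;
From that result the application could pick the most frequent y for each x, or divide by the per-x total to get the probability from criterion 4.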
I know that this is a really weird set of criteria. I don't know much of anything about relational databases (I'm guessing that this is obvious by now), but I do know what I need to be able to do.
Anyway, does SQL fit the bill? Is there a better database to use for this?
John|||Sounds like a job for SQL Server Express
http://www.microsoft.com/sql/editions/express/default.mspx|||Thanks for the reply. I took a look at the link. The 4GB limit on the database size is a serious issue. One of the important functions of the database will be to retain a history, so we won't be deleting old items. Information will also be getting extracted from a large number of sources, so I don't think this is enough. For many users, yes, but not for enough of them.
Does SQL match the other necessary requirements? For example, does SQL do fuzzy searching? Can I run the type of multi-field statistical queries that I was talking about?|||The free version has a 4 GB limit; there are a number of other choices that Microsoft's marketing team can discuss that take the limit to 16 petabytes... I don't think any vendor supports any way to create a drive that large yet.
SQL Server can do full text indexing, which supports several kinds of fuzzy search. This isn't trivial for the user to query manually, but it is easy for a developer to write code that will construct the necessary queries.
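As a rough illustration only (the table, columns, and search terms below are made up), the generated full text queries could look something like this:
    -- match rows containing thesaurus expansions of the search term
    SELECT doc_id, title
    FROM documents
    WHERE CONTAINS(body, 'FORMSOF(THESAURUS, "automobile")');
    -- FREETEXT is the looser, more "heuristic" style of matching
    SELECT doc_id, title
    FROM documents
    WHERE FREETEXT(body, 'fast car repair');
The thesaurus itself is an XML file that your installer or application can maintain, though how and when new entries get picked up depends on the SQL Server version.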
The statistical analysis is easy to code, but not built into the product itself (but think about it, I don't know of any way for a product to support that kind of query without a relatively sophisticated insight into what the user wants to know).
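For example, the "given this value of x, what is the probability of this value of y" question reduces to dividing each group's count by its per-x total. With made-up names, and assuming a version new enough to have window functions (a join against per-x totals works otherwise):
    -- relative frequency of each y value within its x group
    SELECT x, y,
           COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY x) AS p_y_given_x
    FROM items
    GROUP BY x, y;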
In short, yes, with the help of a moderately talented programmer SQL Server can do everything you need and more. I don't know of any database product that can do what you want "out of the box", but most of the major players (Microsoft, Oracle, IBM DB/2) can get there with a little help from a programmer.
-PatP|||Did I miss something here?
Standalone desktop product...
Never deletes data...
4 GB might be a problem...
Must never, ever lose user data...
These sound like conflicts in requirements. Either:
a) you want a client/server design; or
b) you want a standalone desktop product
If a), you're not going to meet requirement 1 (standalone desktop product).
If b), you're not going to meet the other 3 requirements (plus I would think that performance would be a pig in a poke)
Just my $.02.
Regards,
hmscott|||If you don't want to pay, but still want all the features you need, go for Postgres. It has a small footprint, so you can use it as a "Desktop Engine" surrogate, but it scales well to any size you want on the server. You won't be able to start it in embedded mode, though. You will need to install it as a service with its own account, but apart from that it will do everything you requested.
Postgres is also especially well regarded when it comes to preserving your data. It has a lot of mechanisms that ensure you don't lose your data.|||I have spent the whole day looking at this from every imaginable angle. The insights here have been really valuable.
shammat, I'll look into the Postgres option. Does anyone know offhand if this is licensable for commercial use?
hmscott, thanks for your input. This brings up a design decision that will have to be carefully evaluated.|||shammat, I'll look into the Postgres option. Does anyone know offhand if this is licensable for commercial use? Yes, absolutely free. Quote from their FAQ:
"PostgreSQL is distributed under the classic BSD license. Basically, it allows users to do anything they want with the code, including reselling binaries without the source code"
http://www.postgresql.org/docs/faqs.FAQ.html#item1.3|||I think you should check this out
http://www.firebirdsql.org|||I did some looking. I think Postgres is the best match for what we need because it will allow full text fuzzy searching and such. The lack of full text search capabilities was a critical problem with Firebird when I looked at it. Other than that, however, Firebird WAS a better match because it was smaller and supported embedding the DB into the software. I'll just have to live with running it as a service. Thanks anyway for the suggestion wilsonchew, I was almost convinced.|||1. Our product is a standalone, end user, desktop product. Therefore, we need a database with a desktop engine that we can install with the software.
I confused....:S|||I confused....:S
So was I. I walked into this part of the planning knowing very little about relational databases.
Essentially we were looking for something that didn't require a supercomputer to run and wouldn't slow things down much if installed on the end user's computer. Most databases are server based and accessed via a client. Our database is local to the user's computer and is not coupled to any server.
After a lot of searching, there were only a few databases that looked like they could be installed like this without messing things up. There is SQL Express, but that is not sufficient because of the 4 GB limit. There is Firebird, which unfortunately doesn't do the full text search that we need. MySQL and Berkeley DB both looked like they could do this, but they had other issues. Postgres installs in under 100MB and can run as a service in the background. It shouldn't hurt the performance of the machine too much and has all the abilities we need.
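For anyone curious, the Postgres full text queries we would be generating look roughly like this (the table and column names are placeholders, and on older releases this lives in the tsearch2 contrib module, so the exact function names may differ):
    -- match rows whose indexed text contains both terms (or their
    -- thesaurus expansions, if a thesaurus dictionary is configured)
    SELECT id, title
    FROM documents
    WHERE to_tsvector('english', body) @@ to_tsquery('english', 'car & repair');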
I hope you are less confused now? If not, I worry that this means I am still confused, and I don't feel confused.|||Have you looked at this information?
http://www.ibexpert.info/ibexpert_fts/documentation/index.html|||Have you looked at this information?
http://www.ibexpert.info/ibexpert_fts/documentation/index.html
I just glanced at this, but it doesn't look like it would fit our specific needs. The users of this product will have the database on their computer but will need to be completely insulated from its existence, since 95% of them won't have a clue what to do with it. It looks like the system suggested would have to be set up by hand.
Also, using this system wouldn't buy us anything, since it would still be a service running in the background. The background service is my only remaining complaint about the PostgreSQL solution.
Thanks anyway for the input.|||The background service is my only remaining complaint about the PostgreSQL solution.
Maybe you are interested in this: I have my Postgres service up and running all the time, and if I don't use it, the service consumes about 20MB of memory (but I'm sure this can be tweaked, as I have increased most of the default parameters). As long as the DB is not accessed, I have never seen it consume CPU. My PC is a 1GHz AMD Athlon.
I don't know if harddisk space is an issue for you, but PG does require more harddisk space than Firebird (due to the WAL which is basically used for recovery)|||Like I said, kind of a bummer to have a service running all the time. Thanks for the numbers. It is considered bad behavior to stop the service when our program gets closed? It doesn't take too many 20MB services to bog down a computer if the user buys the minimum 265 MB RAM.
With the kind of computers available today, I would say that HD space is a concern, but not a big one. Is it possible to optionally turn the WAL on or off? If not, it's ok. Many users will probably never use a whole ton of HD space with this thing. Those that do probably had a huge computer already (just because of what it would take to use the space).|||Like I said, kind of a bummer to have a service running all the time. Thanks for the numbers. Is it considered bad behavior to stop the service when our program gets closed? From my point of view, I'd consider this a very good behaviour, and actually MS is doing that as well for some services (start them when needed). I'm not sure what privileges an account must have in order to be able to start and stop a service. Might be a problem for users running with restricted privileges.
Btw: starting and stopping the service is a matter of seconds (maybe 2 or so, depending on the hardware)
Is it possible to optionally turn the WAL on or off? No, as this is a major cornerstone of their data safety.