Best way to structure a database for a large/static dataset
There can be a universal model, each record.
for e.g:- suppose u have a answer model,it greatly depends on the needs of application
@answers = Answer.find(...) @answers.text_answers //for text answers @answers.boolean_answers //for boolean answers
Or may be u can have like this:-
@answers = Answer.find(...)
@answers.each do |a|
So it purely depends on the need and the semantics u would like to use in database.
Off the top of my head, I think your idea of having a big table with all the responses is on the right track. To reduce some of the redundancy of saying that person 123 took the survey in 1989 over and over for each response on it, you could have a table of, say, Takings, that would represent all the data about a given person having taken the survey. You may have many other such opportunities to extract things out of that table and into a much smaller one.
As for the variability of the answers, you can have a type column that will tell the app which of several supplementary tables to look in -- one for booleans, one for strings, one for dates, etc. Yes this introduces some complexity and inefficiency of operation, but at least the data will be reasonably normalized and efficient to store. Your Response class could have a method like "answer" that would check which table the actual answer is in, then go retrieve it.
I'm a rails newbie and am looking to create an app to work off a very large dataset. The dataset contains survey data from 1980 through 2011 for about 8,000 respondents. Each year's survey has between 900 and 7000 variable associated with it.
Basically, the app will be used to view all information for one respondent (seeing all the years at a glance), compare respondents, etc.
It seems like creating a table for each year isn't necessarily the best strategy. What would be a good way to structure this data for the database? I've considered just having a table that's basically the respondent ID, survey year, variable name, and value name. The difficulty there is that the values would range from simple boolean fields to fields with extensive amounts of text, so standardizing that field would be tricky (and probably very inefficient).
Any thoughts/tips/resources on where to turn?
Thanks in advance.