Back at ya with another Programming Challenge. OK, this one is much more complicated than the first. And with this one you'll have to find the "best" answer, so it's not about who can write the code fastest. If this challenge looks too wordy for you, stop reading now!
The idea is simple. It's based on the idea that there are hidden codes throughout the Bible (Google for 'Bible Code' for more info on that). You take a sample text and try to find hidden codes (words and phrases) inside the text.
For this challenge, you take the attached CNN story and I want you to develop a simple function that finds the most number of hidden words. What hidden words are you looking for? Any word that is in the text itself. So you can find 'the' and 'century' and 'hub' and 'city', etc. That is your dictionary. Throw out all punctuation and numbers to make it easier.
How can you find words? Simple. Take any combination of letters in order throughout the text. The only rule is that the letters cannot be next to each other (including spaces). Example: I pick the first 't', then the first 'h' after that (but not the one right next to the t), then the first 'a' after that, then the first 't' after that and I have the word 'that'. And that word is in our dictionary. Does that make sense?
This actually might be kind of tough! Hehe. Hints? I'll make a separate post for that.
Quote:
The Boston of a century ago was a beer-brewing hub to rival Midwestern suds capitals such as Milwaukee, St. Louis and Chicago. Research by local historians has turned up evidence of 31 operating breweries inside the city limits in the late 19th century.
A city better known for its baked beans and clam chowder had the greatest number of breweries per capita at the time, said Michael Reiskind, a Boston historian who has researched Boston's brewing history.
But few visible reminders of the industry remain -- except a two-decades-old beer named after one of the city's fieriest Revolutionary War patriots. The Samuel Adams Brewery Tour and its Boston Beer Museum as the chief repository of the city's brewing history -- a place to learn about the past and toast the legacy by downing some sample Sam Adams varieties.
"There are empty remnants, and some old warehouses left behind, but nothing else really interesting like the Sam Adams Brewery to walk around and really get a feel for what it was like," Reiskind said.
Modern Boston brewers like Sam Adams, founded in 1984, and more recent arrivals including the Harpoon Brewery and Tremont Ale in nearby Waltham "brought back a whole local industry of brewing" that began to die out during Prohibition and the subsequent consolidation in the U.S. brewing industry, Reiskind said.
The Sam Adams tour, which is free with a $2 donation to charity encouraged, promotes the city's best-known contemporary beer brand. It also recaptures some of the city's brewing past through historical displays, memorabilia, and even a 30-foot-long glass-lined brewing tank that has been cut open and placed on its side to walk through.
Included is a section on Sam Adams, who brewed beer for a time before fueling some of the political ferment that preceded the Revolutionary War. Jim Koch, a sixth-generation brewer whose family has brewing roots in St. Louis and Cincinnati, borrowed the Sam Adams name when he worked off an old family recipe to create his modern brew.
One of the first major Boston breweries was the Boston Beer Co., a defunct brand chartered in 1828 that now is the namesake of the corporate entity behind Sam Adams Beer. The company develops new brews at the Sam Adams Brewery but brews most of its product elsewhere.
The brewery is in Boston's Jamaica Plain section, which, along with neighboring Roxbury, was home to about three-quarters of Boston's breweries in the late 19th and early 20th centuries.
The brewers were drawn to the then-inexpensive land in the Stony Brook Valley and also to the Stony Brook Aquifer, which no longer boasts the pure water of a century ago.
Most of the old brewery structures still standing have been turned into warehouses and storage, and they're hard to find tucked away amid a maze of narrow streets once populated mostly by immigrants from countries like Germany and Ireland. The German legacy is easy to see in Jamaica Plain's streets, which bear names such as Beethoven and Bismarck.
Irish and English immigrants produced hearty ales in Boston, while Germans favored lagers, Reiskind said. Some of the brews shared partnerships with Boston's professional baseball teams, such as the Burkhard Brewery's Red Sox Beer and Pennant Ale. The Pfaff brewing family cut a deal with the former Boston Braves, Reiskind said.
Prohibition forced many of the breweries to either shut down or convert to other products like soft drinks. By the time the ban on alcohol consumption was lifted in 1933, many small breweries were unable to survive as industry consolidation took hold, Reiskind said.
At 30 Germania St. stands the Samuel Adams Brewery, which occupies several structures in a complex of two dozen buildings dating to the 1870s that once housed the Haffenreffer Brewery.
The Haffenreffer was the last of the old Boston breweries to shut down when it ceased operations in the 1960s, two decades before Koch set up shop there and reinvigorated the local industry.
Sam Adams beer fans come from far and wide to take the brewery tour, which Koch figures has drawn as many as 300,000 visitors since 1988. They come even though the brewery is hard to find and the tours are offered just six times a week most of the year.
"To me, they're pilgrims," Koch said. "If they find their way here, we want them to have a good experience, learn something and enjoy some great beer."
No, like in his example when he got "that", he would've picked out these letters:
Quote:
The Boston of a century ago was a beer-brewing hub to rival Midwestern suds capitals such as Milwaukee, St. Louis and Chicago. Research by local historians has turned up evidence of 31 operating breweries inside the city limits in the late 19th century.
So he found "T", "h", "a", and "t" -- but without using letters that were right next to eachother (i.e. there's a "th" in the first "The", but he didn't use that "h" because it was right next to the "t" that he already got).
I don't get how you could even make a halfway logical algorhythm to pick out random letters to form words though (or how these people came up with those Bible Codes. It's like the crazy scientists recently who say they discovered a "God Gene" saying religion is genetic. Random ideas....) _________________ Current Site (2008) http://www.cuvou.com/
Joined: 27 Jan 2005 Posts: 109 Location: Lewiston, ME
Posted: Wed Mar 23, 2005 12:59 pm Post subject:
The people that came up with the bible codes used a step variable. IE, each letter was the same number of characters apart from the next.
Majove's only rules are that each letter must come after the one before it, and that they cannot be right next to each other. Not having a step variable is going give many more combinations of letters.
You will most likely be able to find hidden mesages within your hidden messages.
Joined: 06 Jan 2004 Posts: 562 Location: Netherlands
Posted: Wed Mar 23, 2005 2:19 pm Post subject:
Backtracking is the solution, I think.
The general problem is that (What I understood from Moj's post) you pick a letter, then pick the next possible letter, etc. The only limitation is that the letters may not be next to each other (In the example, you pick "t", then "e". You skip "h" first because its next to "t"). This already creates a "step" of 2 letters.
So, logically, you would need the following variables: - You have a letter index "I" which denotes the index of the first letter - There is an index "J", which starts at I+2 and is increased each loop - An array (preferably hash to use "exists" on) with all the words (without punctuation!).
Now the actual backtracking solution. First you grab the letter on index I, in the post example it would be the letter "T". Then you grab letter J, which is I+2 = "e". To prevent useless looping you will now need to check if there is a word "te" in your text. Now, there are two scenarios:
- It exists, you add it to an array or something similar (string is fine too). You reset J to I+2 and start over again, preserving the previous stirng "te". Another "e" is added to it in this case (We now have "tee"). Repeat the previous check (existance in the "dictionary") until...
- It does not exist. If J is I+2 then you should add TWO to J (Remember, the letters may not be next to each other!). Otherwise, add one to J. The index is now "4". The 4rd letter in the CNN story is "o" (without spaces). Let us imagine that "te" was a real word in the story, and "tee" was not. Now, take off the last e, as the word didn't existed, and add the o. We now have "teo". Go back to step 1 and check if it exists.
If you hit the end of the string (J > length(story)) you should reset the entire search string, add one to I and start the process all over again. Repeat until I also hits the end of the string and youre done.
I hope I explained it correctly. Either way, this will cost a lot of CPU time to try and find all possible matches of words ("Hidden messages") in a story that long.. I don't think it would finish in a day or two.. Have fun
Edit: There's a bit of a flaw in my explanation, I think, but it's definitely backtracking alright.
As malefactor said, the Bible codes were found using a step technique. For example, in:
Quote:
31:28 And hast not suffered me to kiss my sons and my daughters? thou hast now done foolishly in so doing.
If you start at the letter R in 'daughters', then skip 3 letters to get O from 'thou', etc, you will get the word ROSWELL (a town in New Mexico, US, where supposedly an alien UFO crashed). See the attached image.
For my challenge, I thought the text was too short to get anything meaningful out of using exact steps, so I opened it to random steps.
Siebe definitely has an algorithm, though it is the brute force approach. He's right, this will take a lot of CPU cycles. But if you think about it, there is a better method. Check out my hints post. The hints aren't all that revealing, but all I can say is that there is a better, faster method, given my challenge constraints.
Or you could make an array of words, and when it gets to the first letter it starts a first word.... it can't use the second letter in that first word so it starts a second one... and so-on. When it gets to a letter that makes sense in a word (like "t + h" it adds that to the "t" word). _________________ Current Site (2008) http://www.cuvou.com/
According to my algorythm, there are 54801 hidden words in this story, allthough i only count a 'word' to be at least 3 letters long. I had no intrest in finding smaller words like 'a' and 'as', since that doesn't make much sense here, besides the numeral extensions 'th' and 'st' would have to be filtered out manually then.
Like described above, i create and index a little dictionary of the words in the story, and use it to find the hidden ones. The code only takes a few seconds to execute on my humble duron 800mhz, so i suppose it needs little optimalization.
Here is my proposed solution:
Code:
#!/usr/bin/perl<br /><br />use strict;<br /><br />open( F, "< $ARGV[0]" ) or die "can't open file: $ARGV[0]";<br />my @text = <F>;<br />close F;<br /><br /># let's get rid of punctuation and end-of-lines<br />my $text = lc ( join( " ", @text ) );<br />$text =~ s/[^a-z ]/ /g;<br /><br /># and initialize our dictionary<br />my @words = split / /, $text;<br /># little perl magic to find our uniques...<br />my %done = undef;<br />@done{@words} = ();<br />@words = sort keys %done;<br /># now let's 'categorize' them by their first<br /># letter to speed up searches later, 2 letter words<br /># or less are skipped, since they aren't intresting<br /># in this context.<br />my %dict = ();<br />foreach my $word ( @words ) {<br /> if(length( $word ) > 2) {<br /> push @{$dict{substr( $word, 0, 1 )}}, $word;<br /> }<br />}<br /><br /># now lets do the big work<br />$text =~ s/ //g;<br />for(my $i = 0; $i < length( $text ) - 1; $i++) {<br /> # let's go by each letter in the text and<br /> # get the words that start with it<br /> foreach my $word (@{$dict{substr( $text, $i, 1 )}}) {<br /> my $cindex = $i+1;<br /> my $found = 1;<br /> # shall we see if we can make the word from<br /> # the remaining letters, keeping in mind they<br /> # may not reside next to eachother in the original<br /> # text.<br /> for(my $j = 1; $j < length( $word ); $j++) {<br /> if( ( $cindex = index( $text, substr( $word, $j, 1 ), $cindex ) ) == -1 ) { $found = 0; last; }<br /> $cindex++;<br /> }<br /> if( $found ) { print $word . "\n"; }<br /> }<br />}
I have attached the output of the command as a reference.