IDKSM Search Engine
Forget the explanation, just give me the file: download it here.
Wait a minute, let me see a demo first.
Now in: Spanish - Danish - Dutch - French - Slovenian
What is it?
The IDKSM* Search Engine is
a full text search engine that was designed for use on
CD-ROMs and Intranets. The engine consists of 2 parts.
- The Indexer which is used to process the
HTML files and place the output into a database.
- The runtime engine which is used to read that
database and present a hit list that matches the
keywords you are searching for.
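The two-part split above can be illustrated with a minimal sketch. This is not the IDKSM database format or its actual code; the function names `build_index` and `search`, the in-memory dictionary, and the sample pages are all assumptions made for illustration. The first function plays the role of the Indexer, the second the role of a runtime engine.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, skipping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def build_index(pages):
    """Indexer step: map each lowercased word to the set of files containing it."""
    index = defaultdict(set)
    for filename, html in pages.items():
        extractor = _TextExtractor()
        extractor.feed(html)
        extractor.close()
        text = " ".join(extractor.chunks).lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index[word].add(filename)
    return index


def search(index, keywords):
    """Runtime step: return the sorted hit list of files containing every keyword."""
    hits = None
    for word in keywords.lower().split():
        files = index.get(word, set())
        hits = files if hits is None else hits & files
    return sorted(hits or [])


# Hypothetical sample content, used only to exercise the sketch.
pages = {
    "intro.html": "<html><body><h1>Search engine</h1><p>Full text search.</p></body></html>",
    "cdrom.html": "<html><body><p>Runs from a CD-ROM or an intranet search page.</p></body></html>",
}
index = build_index(pages)
print(search(index, "search engine"))
print(search(index, "search"))
```

Keeping the index as a plain word-to-files mapping is what lets several different runtime engines read the same database.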
Currently 4 versions of the runtime engine exist.
- One written in Java 1.0.2 which has been tested and
works on the following:
- Netscape 3.0
- Netscape 4.x
- Internet Explorer 3.0
- Internet Explorer 4.0
- HotJava 1.0
- A second version of the engine is written for use in
Delphi. It consists of a simple .dcu file that you just
need to compile along with the rest of your code. Delphi
1 & 3 versions are available.
- The third version is a DLL. A 16-bit and 32-bit DLL
are included. Any applications that can use a DLL can now
include a Full Text Search Engine.
- The fourth version consists of a Windows Web Server
CGI application and a Java Servlet. The Windows CGI
should work with any Windows based web server. The
Servlet has been tested with Apache but again it should
work with any web server that deals with servlets.
Sample code for each of the engines, except the CGI
versions, is included in the release.
The Indexer
Fig. 1 - The Indexer Window
The Indexer is the shareware product. The version you
find in the release is fully functional but it will only
process 10 files. Once you register the product you can
process as many files as you wish. I have successfully
created projects with 2,000 html files.
Registration is a simple matter,
see the bottom of the page for instructions.
As of the 1.3 release you can now index html and ASCII
files. You can also instruct the indexer to link to any
file. This means that you can actually index the content
from ANY type of file and have the database point to
the actual file that has that content. For example, you
could take the narration found in an audio file and place
it in a text file. Run the Indexer against the text file
but have it point to the audio file. When the user
searches for specific content, the audio file will be listed
as having that content and the user can then elect to listen
to it. The Indexer can easily work with any file type that a
browser can handle.
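The idea of indexing one file while linking to another can be sketched as follows. The function name `index_with_links`, the file paths, and the sample narration are hypothetical, and this is not how the Indexer stores its data; it only shows that the words come from the text while the hit points at a different target.

```python
from collections import defaultdict


def index_with_links(entries):
    """Index the words of each text, but record the link target as the hit.

    entries is a list of (text_content, link_target) pairs.
    """
    index = defaultdict(set)
    for text, target in entries:
        for raw in text.lower().split():
            word = raw.strip(".,;:!?\"'()")
            if word:
                index[word].add(target)
    return index


# Hypothetical example: a transcript of an audio narration points at the audio file.
entries = [
    ("Welcome to the factory tour. The press line runs all day.", "audio/tour.wav"),
    ("Safety rules for the press line.", "docs/safety.html"),
]
index = index_with_links(entries)
print(sorted(index["press"]))
```

A search for "press" then lists the audio file next to the HTML page, even though the audio file itself was never read.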
Delphi Runtime Engine
Fig. 2 - Sample Delphi Search Engine Dialog
Figure 2 represents a sample of how the Delphi Search
Engine Dialog can look. This is included in the release. You
can modify this or use it in any fashion you wish. The
engine uses standard boolean logic and now understands the
wildcard character "*". See the ToDo list in the release
for future features. After the user enters the words
they want to search for, they select the "Search" button. If
the search is successful, a hit list of matching files appears
along with the count of how many documents contain those words.
Double-clicking the line item, or selecting it and clicking
"Go", will process the information in the manner that you
choose.
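As a rough illustration of boolean logic combined with the "*" wildcard, here is a small sketch. The query syntax (terms separated by AND, OR and NOT, evaluated strictly left to right) is an assumption, not the documented IDKSM syntax, and the functions `lookup` and `boolean_search` and the sample index are hypothetical.

```python
from fnmatch import fnmatchcase


def lookup(index, term):
    """Return the files for one term; '*' matches any run of characters."""
    term = term.lower()
    if "*" not in term:
        return set(index.get(term, ()))
    files = set()
    for word, word_files in index.items():
        if fnmatchcase(word, term):
            files |= word_files
    return files


def boolean_search(index, query):
    """Evaluate 'a AND b OR c NOT d' from left to right (assumed syntax)."""
    tokens = query.split()
    if not tokens:
        return []
    hits = lookup(index, tokens[0])
    # The remaining tokens are read as (operator, term) pairs.
    for op, term in zip(tokens[1::2], tokens[2::2]):
        files = lookup(index, term)
        op = op.upper()
        if op == "AND":
            hits &= files
        elif op == "OR":
            hits |= files
        elif op == "NOT":
            hits -= files
        else:
            raise ValueError(f"unknown operator: {op}")
    return sorted(hits)


# Hypothetical dictionary: word -> files containing it.
index = {
    "search": {"a.html", "b.html"},
    "searching": {"c.html"},
    "engine": {"a.html"},
    "delphi": {"b.html", "c.html"},
}
print(boolean_search(index, "search* AND delphi"))
```

The length of the returned list corresponds to the document count shown next to the hit list.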
Java Runtime Engine
Fig. 3 - Java Applet Search Engine Window
The Java engine is written in Java 1.0.2 so that
a larger number of browsers are able to use
the product. It was also written so that it doesn't require
the applet to be trusted. It can work off of a CD-ROM, hard
drive, or intranet. Due to its design it requires the use of a
CGI application to be used over the internet (see
below).
The applet window above accepts several parameters that
modify the title and indicate how the results are handled.
It is assumed that it is running in a frames environment but
it isn't necessary. The design lends itself to a frames
environment. If you place the applet button on a frame that
will always be available it means that the engine is loaded
upon start-up and is always accessible and fast. It also
means that any hit list that you create stays around until
you ask for another one. This means the user can keep trying
out the different files the hit list found without
generating a new hit list. Since the search information is
in a separate applet window it can be closed and opened over
and over again without losing the data. Several parameters
are passed to the applet to allow this to work. The sample
code details the information that needs to be passed.
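For readers who generate their pages with a script, the applet call can be assembled from a table of parameters. The sketch below only shows the mechanics. The applet class name `SearchApplet.class` and the width and height are placeholders, not values from the release; the only parameter name taken from this page is `language`, which is listed among the Version 1.4 features. The sample code in the release remains the reference for the real parameter names.

```python
from html import escape


def applet_tag(code, params, width=100, height=30):
    """Render an <applet> element with one <param> line per dictionary entry."""
    lines = [f'<applet code="{escape(code)}" width="{width}" height="{height}">']
    for name, value in params.items():
        lines.append(f'  <param name="{escape(name)}" value="{escape(value)}">')
    lines.append("</applet>")
    return "\n".join(lines)


# "SearchApplet.class" is a hypothetical class name used for illustration only.
tag = applet_tag("SearchApplet.class", {"language": "en"})
print(tag)
```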
With the 1.2 release the Java interface as seen in Fig. 3
has been separated from the Search Engine itself. The code
that was used to create the interface above has been
included with the release and this now allows you to write
any kind of interface that you would like. It also allows
the Search Engine to be used in standard Java applications
as well.
DLL Runtime Engine
Both 16-bit and 32-bit versions of the DLL are
included in the release. This allows you to write any
interface to the Engine that you like, in any language
you wish, as long as it can access a DLL. Examples in
Delphi and Visual Basic have been supplied in the release.
CGI Runtime Engine
Two CGI versions are available since the 1.3 release.
One is a Windows-based web server CGI application. It should
work with any standard Windows web server. The other version
is a Java Servlet. It has been tested with Apache and should
work with any web server that can handle servlets. In fact
the demos found on this page use the servlet version running
on Apache. Both of these CGI applications talk to and work
with the same Java Applet that you use for a CD-ROM setup.
The help documentation explains how to
configure your Applet call to use the CGI applications.
Features included in Version 1.4
- The Indexer works in batch mode.
Command line arguments accepted by the Indexer:
<path>IDKSM Indexer.exe <path>[project] autorun
- The first parameter is of course the
project file that contains all the information
required.
- The second parameter is "autorun" which is the
command telling it to, well as it says, autorun.
- The Indexer now uses a project file which stores your
options and file selections for a given job. The file
type .idksm is registered now and links you to the
Indexer. This allows you to set options and files on a
project by project basis. It also makes it easier for
batch processing because now you only need to list the
project file to open along with the autorun command.
- Color modification. You can now change the applet's
background, foreground, button background and button
foreground colors via param statements in the applet
call.
- A bug was fixed in which the last word processed was
dropped and not added to the dictionary.
- A new tag is being recognized. It is <IDKSM
ignore> </IDKSM>. In some cases users want to
index a file but do not want certain sections of the file
to be indexed. By placing the section you DO NOT WANT
indexed between these tags it will not be processed by
the Indexer.
- CD-ROM launch program. I have included a program
called launch.exe with this distribution. In some cases
users would like to have their CD use autorun to open up
their index.html file. However, autorun will only work
with .exe files. You can use the launch program to make
the CD autorun your html content. In your autorun.inf file
type:
open=launch.exe index.htm
This will open the user's browser and load the .htm file
you indicate. Be sure to use paths if needed.
- A bug was fixed in the Delphi Engine. This bug
also existed in the Windows CGI application as well as the
DLL. Be sure to use the new versions.
- We now have a list of default Spanish stopwords to go
along with the Indexer.
- <META> tags are recognized, in particular one
designed specifically for use by IDKSM: <META
name=IDKSMTitle content="[your title]">. In some cases
users don't have the option of giving specific titles
using the <Title> tag. It can also be inconvenient to
modify a large filelist.txt file to add the T: option
with a different title. This <META> tag lets you
add a title just for use by IDKSM directly into the HTML
content.
- The <META name=keywords content=""> tag is also
honored. Currently the keywords are added to the
dictionary with the rest of the body text. In future
versions the keywords and body content will be kept
separate so that you can search on one or the other. For
now if you only want to use keywords in your search
engine you can use the <IDKSM ignore> tag on the
body of the file and then only the keywords will be
included in the dictionary.
- The Indexer now does the majority of its work from
the hard drive. Previously the Indexer tried to do most
of the work dynamically in RAM. Some users have a HUGE
number of files that contain a lot of content, which
required them to have HUGE amounts of RAM to operate.
Now the Indexer does its sorting and word access directly
on the hard drive. The downside of this will be longer
processing times for large databases. It's a trade
off.
- You can now specify a complete URL for the F:
option tag in the filelist.txt file. This means that you
can index files from several machines or hosts and
collect all of the content into one database. Then by
giving the complete URL, ex.
- F:http://www.miraclec.com/software.html
the file can be accessed on this other host. This URL
can be an HTTP:, FTP:, FILE: or any kind of valid URL.
Please note that in testing it was observed that IE
didn't work exactly right when using the FILE: URL. Talk
to me if this is something you want to use.
- A bug was still hanging around from an earlier version of
the engine that limited file searches to fewer than 3,200.
This has been removed.
- A bug was found that involved word stemming when
using the "*" character. It didn't happen all the time
but was dependent on word order in the dictionary. This
has been fixed.
- Another hold over from an earlier version was found.
This wasn't a bug; it was a feature that has been removed.
It forced the selected file to always be converted to
lower case. This would cause files with upper case
letters to not be found.
- The interface now honors a double-click in the list
box. You can now double-click on your selection and no
longer need to use the "Go" button.
- The interface is now multi-language enabled. It can
now easily support alternate languages and already has
Spanish and Danish included. A new parameter has been
added to the applet call:
- <param name=language value=en> {for english}
- <param name=language value=sp> {for spanish}
- <param name=language value=da> {for danish}
- <param name=language value=da> {for dutch}
- <param name=language value=fr> {for french}
- <param name=language value=sl> {for slovenian}
The interface defaults to English if the parameter
isn't supplied.
- The IE HTTP 1.1 bug has been worked around. All
versions of IE should now work with the CGI versions of
IDKSM.
- A bug, where &nbsp; (non-breaking space) wasn't
honored correctly, has been fixed.
- All indexed words have a size limit of 27 characters.
The previous sorting routine handled words longer than
this. The new sorting routine didn't and would trash the
database files. This has been corrected.
- The Indexer and the various runtime engines now
recognize and work with extended characters. Previously
extended characters, which are those characters other
than 0-9, a-z, and A-Z, were not handled correctly. As
of this release you can index and search with these
characters.
- Error codes are now available in both the Delphi and
DLL versions of the runtime engines.
- Possibly one or two other items I can't recall right
now. :-)
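To make the tag handling in the list above more concrete, here is a minimal sketch of how an indexer might treat the <IDKSM ignore> section and the two <META> tags when reading an HTML file. It is an illustration built on Python's standard `html.parser`, not the Indexer's actual code; the class name `IndexableText` and the sample page are assumptions.

```python
from html.parser import HTMLParser


class IndexableText(HTMLParser):
    """Collects the words to index, honoring <IDKSM ignore> and <META> tags."""

    def __init__(self):
        super().__init__()
        self.ignoring = False   # True while inside <IDKSM ignore> ... </IDKSM>
        self.title = None       # taken from <META name=IDKSMTitle ...>
        self.words = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)     # html.parser lowercases tag and attribute names
        if tag == "idksm" and "ignore" in attrs:
            self.ignoring = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content") or ""
            if name == "idksmtitle":
                self.title = content
            elif name == "keywords":
                # Keywords join the same dictionary as the body text.
                self.words.extend(content.lower().replace(",", " ").split())

    def handle_endtag(self, tag):
        if tag == "idksm":
            self.ignoring = False

    def handle_data(self, data):
        if not self.ignoring:
            self.words.extend(data.lower().split())


# Hypothetical page used only to exercise the sketch.
page = """<html><head>
<META name=IDKSMTitle content="Product Overview">
<META name=keywords content="search, indexer">
</head><body>
<IDKSM ignore>Navigation menu</IDKSM>
<p>Full text engine</p>
</body></html>"""

parser = IndexableText()
parser.feed(page)
parser.close()
print(parser.title)
print(parser.words)
```

Wrapping the whole body in the ignore tag would leave only the keywords in the word list, which matches the keywords-only setup described in the list.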
Is there a feature you would like to see?
Write me and tell me
about it. Maybe it is on our todo list or we can add it.
Demo
I have combined all the demos into one page. On this one
page you can see examples of the applet in different
languages, colors, and as an embedded applet. To see the
demo your browser must have Java enabled.
Demo - Due to the hacker attack the demo is currently offline.
Download
Version 1.4 of the Search Engine has been released and
can be found on the FTP site. Until the 1.4 version is more
widely available I will retain the links to the 1.3 version.
You can download the file from several sites. If one
doesn't work well for you please try another one.
Sites where the Ver. 1.4 file is available (1,454,103 bytes):
- Miracle Concepts - FTP
- Torry's Delphi Pages
Don't forget to check the Updates section.
Call for foreign language assistance:
I would like to release versions of the Runtime
Engine in different languages. However, not being
fluent in different languages I need your help. I
would appreciate anyone giving me information as to
the comparable names of buttons and text in
different languages. Send me an email and I can
send you a list of the exact words and phrases I
need translated. -- Thank you.
Updates
Mailing Lists
Two mailing lists have been set up to support the IDKSM
Search Engine. The first list is used to announce new
versions and features. The second list is for users to report
bugs, request new features, and share
problems and ideas. To subscribe to the first mailing list
for announcements, send email to:
idksm-announce-request@miraclec.com
and place the word subscribe in the subject line. To
subscribe to the second mailing list for user feedback and
to report bugs send email to:
idksm-request@miraclec.com
and place the word subscribe in the subject line.
NOTE: Please do not add any additional text to
the body of a subscription request because the email is not
read by anyone. The subscription is processed automatically.
Only send mail to the idksm@miraclec.com address
To get more help information on how to subscribe and
unsubscribe to either of these lists, send email to:
idksm-announce-request@miraclec.com
or
idksm-request@miraclec.com
and place the word help in the subject line.
Once you have subscribed to the mailing list you can send
email to the second mailing list by using the address:
idksm@miraclec.com
Registration
The price of IDKSM is currently $50. This gets you a
completely functional Indexer and the rights to distribute
royalty free as many copies of the Runtime Engines as you
wish. The Indexer is not to be distributed.
After you pay your registration fee I will send you a key
to unlock the Indexer.
Payment method:
You have 3 methods to make a payment: secure online
payment with a credit card, i-check online bank check, or
a check sent by snail mail.
Please note that our store uses Javascript so it must be
enabled on your browser to go shopping. The store also uses
cookies to store your shopping cart information. No other
information is collected or stored. A secure server is used
to collect your credit card information so your transactions
are completely safe.
If you are still uncomfortable about using your credit
card on-line you can Fax me the required information.
FAX: 570-388-6101
or send a check by mail:
Amount: $50.00/copy
Pay to the order of: Miracle Concepts, Inc.
Mail to:
Miracle Concepts, Inc.
74 Hex Street (Harding)
West Pittston, PA 18643-9615
Note: Please include a valid email address so that I
know where to send the registration key.
*IDKSM - I Don't Know Search Me