DEVONagent simply explained, hopefully
DEVONagent (OSX app) allows you to perform complex web searches that go beyond what most search engines can do. To really get a grasp of how that little piece of software with a somewhat outdated look works and what it is good for, one has to understand its workflow.
What is DEVONagent for
Let’s first clarify what DEVONagent is good at and what it is not, so as not to raise unrealistic hopes.
DON’T use DEVONagent and stick to your usual search engine, when…
- looking for one precise bit of casual information you know exists (like the opening hours of a shop),
- you know your favourite search engine will get you immediately to the right answer.
DO consider using DEVONagent…
- to regularly scan/crawl/trawl the web for all the (sparse) information that (newly) exists on a niche topic, like a rare medical condition, a specific piece of software, an antique camera brand, …
- to search in your set of favourite/trusted sources,
- to crawl entire website(s) and extract/export/download some objects (like the PDF editions of the journal an institute publishes),
- to monitor several online shops for a specific article.
DEVONagent workflow
DEVONagent follows a certain workflow that can be either fully automated or used in an interactive and iterative manner. Understanding its workflow is essential to being able to use it.
The DEVONagent workflow, summarized in steps:
In the above diagram, DEVONagent’s workflow is simplified as a sequence of functions applied to document sets. In reality, DEVONagent runs these steps in a parallelized manner. Yet, for the sake of our understanding, let’s keep that pipeline workflow model.
As you can also see in the diagram, some iteration is possible: you may go back to the configuration step to improve the search mission you give to DEVONagent, and decide whether to restart and get a new set of results or to append to the current results set.
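To make the pipeline model concrete, here is a minimal sketch in Python. It is not how DEVONagent is implemented; the function name run_mission, its parameters and the example URLs are all hypothetical, and documents are reduced to plain URL strings. It only shows the three automated steps chained together, plus the "restart or append" choice.

```python
from typing import Callable, Iterable, Set

Documents = Set[str]  # a document set, reduced here to a set of URLs


def run_mission(
    bootstrap: Callable[[], Documents],
    crawl: Callable[[Documents], Documents],
    keep: Callable[[str], bool],
    previous: Iterable[str] = (),
) -> Documents:
    """Chain the three automated steps; optionally append to earlier results."""
    initial = bootstrap()                              # 2a: initial set of documents
    extended = crawl(initial)                          # 2b: extended set of documents
    results = {doc for doc in extended if keep(doc)}   # 2c: filtering
    return set(previous) | results                     # empty `previous` means a fresh start


# First run, with hypothetical stand-ins for the three steps.
first = run_mission(
    bootstrap=lambda: {"a.example/1"},
    crawl=lambda docs: docs | {"a.example/2", "b.example/1"},
    keep=lambda url: url.startswith("a.example"),
)

# Iteration: change the brief, then append to the current results set.
second = run_mission(
    bootstrap=lambda: {"c.example/1"},
    crawl=lambda docs: docs,
    keep=lambda url: True,
    previous=first,
)
```

Leaving previous empty corresponds to restarting with a new set of results; passing the earlier results corresponds to appending to them.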
Configuring the search: the search mission brief
I use the image of a “search mission”, because once configured and started, DEVONagent executes the “search work” and does it without you for the middle (dark-grey) part of the workflow, before delivering the final set of results to you for manual exploration and further processing.
This is why the configuration step is crucial, and why you need to understand how each configuration setting will influence DEVONagent while it performs the “search mission” on its own.
DEVONagent has a slightly less metaphoric name for a pre-configured “search mission”: it calls it a “search set”, and it comes with a bunch of preconfigured search sets for you. Weirdly, those are hard to find:
- Select one of them when you create a new search by clicking on the “magnifier dropdown” in the query field.
- Go to “Window > Search Sets” to see them all and access their settings.
If you are not familiar with search sets, just have a look at the different panes of the “Web > Deep” search set, as an example.
Let’s now look at how one configures each of the 3 steps of a “search mission” which DEVONagent will perform on its own.
How to configure the search bootstrapping step (2a)
The first thing you need to configure is how DEVONagent will initiate the search. Since the web is practically infinite, it boils down to telling DEVONagent where to start looking for potentially promising results.
DEVONagent proposes three approaches to “seed your search”, approaches that you can actually combine:
Starting from some hand-picked webpage(s)
In this option, you configure the search set to crawl links from some given URL(s) that you list in the “Sites” pane and which you set to “crawl” mode. This tells DEVONagent to basically add each of these provided URLs into the “initial set of documents”.
Searching in a given search engine(s)
In this option, you configure the search set to perform a query on one/many search engine(s) (through plugins), which will also return an “initial set of documents”.
Searching website(s) through Google / Bing
In this option, you narrow the search down to some website(s) that you list in the “Sites” pane and which you set to “search” mode. You thus don’t have to create a plugin for each website that has an internal search engine. If the website content is open (meaning Google knows about it), you’re better off using this option.
“Search Set” settings at your disposal for the “search bootstrapping” step:
- [Sites] (pane): a list of starting URLs, in crawl mode; a list of domains in which to restrict a Google & Bing search, in search mode.
- [Plugins]: specific search engine(s) you want to initiate the “search mission” with.
- [General] > Default query: the initial query you want to pass to these search engine(s). DEVONagent comes with a powerful query language that goes beyond what most search engines provide. Read its manual to discover more about it.
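The way these three settings combine into a starting point can be sketched as follows. This is an illustration under assumptions, not DEVONagent’s actual behaviour: the SearchSet class, the seed_plan function and the plugin name are hypothetical, and the use of the site: operator (which Google and Bing both support) is only a guess at how a domain-restricted search could be expressed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SearchSet:
    """Hypothetical model of the bootstrapping settings of a search set."""
    default_query: str
    # Each entry is (URL or domain, mode), where mode is "crawl" or "search".
    sites: List[Tuple[str, str]] = field(default_factory=list)
    plugins: List[str] = field(default_factory=list)


def seed_plan(s: SearchSet) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (URLs put straight into the initial set, (engine, query) pairs to run)."""
    # Sites in crawl mode: the URLs themselves join the initial set.
    seed_urls = [site for site, mode in s.sites if mode == "crawl"]
    # Plugins: each selected search engine receives the default query.
    queries = [(plugin, s.default_query) for plugin in s.plugins]
    # Sites in search mode: restrict a Google & Bing search to those domains
    # (assumption: expressed here with the site: operator).
    domains = [site for site, mode in s.sites if mode == "search"]
    if domains:
        restriction = " OR ".join(f"site:{d}" for d in domains)
        for engine in ("Google", "Bing"):
            queries.append((engine, f"{s.default_query} ({restriction})"))
    return seed_urls, queries


brief = SearchSet(
    default_query="rangefinder camera",
    sites=[("https://example.org/links.html", "crawl"), ("example.com", "search")],
    plugins=["Hypothetical Plugin"],
)
seed_urls, queries = seed_plan(brief)
```

The point of the sketch is that the three approaches are additive: whatever each of them returns ends up in the same initial set of documents.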
How to configure the crawling step (2b)
At the end of the “search bootstrapping” step, DEVONagent will have aggregated an “initial set” of documents (that you actually don’t get to see). It then starts “crawling around it”, following the links it finds in those documents.
Settings at your disposal for the crawling step:
- [General] > Follow links: defines how DEVONagent will actually follow links (off, all, same host, same directory, sub-directories) and how many hops (1 to 5) it will be allowed to take from the original.
- [General] > Express Search: tells DEVONagent to not follow links nor filter results and return all results from the initial set.
- [Advanced] > Exclude domains: to exclude all documents from certain domains from the results.
- [Advanced] > Exclude links: to exclude all documents matching a certain URL (pattern).
At the end of this (recursive) crawling, DEVONagent will have amassed an “extended set of documents”, the pertinence of which still needs to be assessed.
Since you could configure it to pretty much trawl the whole web, DEVONagent comes with a [stop] button to halt the crawling and proceed to the last step.
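To get a feeling for how the “Follow links” mode and the number of hops interact, here is a small Python sketch of a hop-limited crawl over a toy, in-memory web. It is an approximation under assumptions, not DEVONagent’s algorithm: the functions allowed and crawl are hypothetical, each link is compared with the page it was found on, “sub-directories” is read as “same directory or below”, and only domain exclusion is modelled.

```python
import posixpath
from collections import deque
from typing import Dict, Iterable, List, Set
from urllib.parse import urlsplit


def allowed(page: str, link: str, mode: str) -> bool:
    """Decide whether `link`, found on `page`, may be followed under `mode`."""
    p, l = urlsplit(page), urlsplit(link)
    p_dir, l_dir = posixpath.dirname(p.path), posixpath.dirname(l.path)
    if mode == "off":
        return False
    if mode == "all":
        return True
    if p.netloc != l.netloc:          # every remaining mode stays on the same host
        return False
    if mode == "same host":
        return True
    if mode == "same directory":
        return l_dir == p_dir
    if mode == "sub-directories":
        return l_dir == p_dir or l_dir.startswith(p_dir.rstrip("/") + "/")
    raise ValueError(f"unknown mode: {mode}")


def crawl(seeds: Iterable[str], links: Dict[str, List[str]], mode: str,
          max_hops: int, excluded_domains: Iterable[str] = ()) -> Set[str]:
    """Breadth-first crawl from the initial set, at most `max_hops` links away."""
    excluded = set(excluded_domains)
    found = set(seeds)
    queue = deque((url, 0) for url in found)
    while queue:
        url, hops = queue.popleft()
        if hops >= max_hops:          # hop budget used up for this branch
            continue
        for link in links.get(url, []):
            if link in found or urlsplit(link).netloc in excluded:
                continue
            if allowed(url, link, mode):
                found.add(link)
                queue.append((link, hops + 1))
    return found


# Toy web: page URL -> links found on that page (all hypothetical).
INDEX = "https://a.example/docs/index.html"
WEB = {
    INDEX: ["https://a.example/docs/one.html", "https://a.example/docs/sub/two.html",
            "https://a.example/other/three.html", "https://b.example/four.html"],
    "https://a.example/docs/one.html": ["https://a.example/docs/five.html"],
}
extended = crawl([INDEX], WEB, mode="same directory", max_hops=2)
```

Even on this tiny example, moving from “same directory” to “all”, or from 1 hop to 2, visibly grows the extended set, which is why these two settings deserve attention before starting a large crawl.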
How to configure the filtering step (2c)
After the crawling step is done, DEVONagent will have aggregated an extended set of documents (potentially thousands) and won’t just dump all of this in front of you. Rather, DEVONagent will try its best to only retain what seems relevant to you, by matching the content of the found documents with several criteria.
Settings at your disposal to configure the filtering step:
- [General] > Secondary query: this second query is used to assess the content of gathered documents and keep the ones matching it. If you leave it empty, DEVONagent will use the default query (provided just above it). If you enter “*”, the filtering step will be omitted (since all documents match *).
- [General] > Express Search: tells DEVONagent to not follow links nor filter results and return all results from the initial set.
- [General] > Language: retain only documents matching a certain language.
- [General] > Language > Ignore diacritics: strips umlauts, accents & co. when matching queries and documents.
- [General] > Language > Fuzzy: allows matches with minimal spelling variations.
- [General] > Filter > Similar pages: ignores webpages that look too similar to existing retained pages.
- [General] > Filter > Archived pages: ignores webpages that already exist in its archive.
- [General] > Scanner: tells DEVONagent to only retain documents that embed some object types, like PDFs or emails, hinting at the fact that you are actually interested in collecting those.
- [Advanced] > Files: defines which document types you’re actually interested in.
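The interplay between the default query, the secondary query and the “Ignore diacritics” option can be illustrated with a short Python sketch. It is a simplification under assumptions: the function names are hypothetical, documents are plain strings, and a query is treated as a list of words that must all appear, which is far cruder than DEVONagent’s real query language.

```python
import unicodedata
from typing import Dict, Optional


def strip_diacritics(text: str) -> str:
    """Remove accents, umlauts & co. by dropping Unicode combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def effective_query(default_query: str, secondary_query: str) -> Optional[str]:
    """Pick the query used for filtering; None means 'skip the filtering step'."""
    secondary = secondary_query.strip()
    if secondary == "*":
        return None                      # every document matches "*"
    return secondary or default_query    # empty secondary query: use the default one


def filter_documents(docs: Dict[str, str], default_query: str,
                     secondary_query: str = "",
                     ignore_diacritics: bool = False) -> Dict[str, str]:
    """Keep the documents whose text contains every word of the effective query."""
    query = effective_query(default_query, secondary_query)
    if query is None:
        return dict(docs)
    norm = strip_diacritics if ignore_diacritics else (lambda s: s)
    terms = norm(query).lower().split()
    return {url: text for url, text in docs.items()
            if all(term in norm(text).lower() for term in terms)}


# Hypothetical extended set: URL -> page text.
DOCS = {
    "u1": "Émile Zola à Médan",
    "u2": "Emile Zola biography",
    "u3": "Unrelated page",
}
strict = filter_documents(DOCS, "emile zola")
relaxed = filter_documents(DOCS, "emile zola", ignore_diacritics=True)
```

With the strict setting only the unaccented spelling is retained, while ignoring diacritics keeps both spellings; entering “*” as the secondary query would return all three documents untouched.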
What to do with the results set
Once DEVONagent stops and returns to you with a list of results, you can of course explore them interactively, export them as files or to DEVONthink, or export the objects the scanner found, etc. But that part is a lot more obvious; look at the manual for more information about it.