In an earlier article, I described using Suprtool to check the syntax of email addresses and delete invalid ones.
I was toying with figuring out how to call MPE CI string functions on the Suprtool defined strings... like WORD() to extract the domain name... then call a ci command/program that did a dns lookup of the domain name. I figured something like that would be pretty slick. Turns out Mike Hornsby had the same idea:
Chris,
To take this to the next level, I like to use the MPE/IX WHOIS freeware utility from this web location:http://www.allegro.com/software/hp3000/allegro.htmlThe following script will validate an email address and look up the domain name registration using the Allegro ported version of the WHOIS program. This can be very useful for validating the URL and it also returns the company name, address, and sometimes a general phone number.
Warning: The script must run the WHOIS program under the Posix shell (sh.hpbin.sys) and the system must be connected to the Internet to route the WHOIS requests. Also, you must set up the security in the Allegro account according to the instructions on their web page.
Emailchk Command File:
PARM EMAIL_NAME = "MIKEH@BEECHGLEN.COM" SETVAR ATPOS, POS("@","!EMAIL_NAME") IF ATPOS = 0 THEN ECHO BAD EMAIL FORMAT ERROR -1 SETVAR EMAIL_ERR,-1 echo !EMAIL_NAME >> BADDOTS ESCAPE ENDIF SETVAR DOTPOS, POS(".","!EMAIL_NAME") IF DOTPOS = 0 THEN ECHO BAD EMAIL FORMAT ERROR -2 SETVAR EMAIL_ERR,-2 echo !EMAIL_NAME >> BADDOTS ESCAPE ENDIF SETVAR URLPOS,ATPOS+1 SETVAR WHO_URL,RTRIM(STR("!EMAIL_NAME",!URLPOS,80)) ECHO !WHO_URL PURGE WHOIN BUILD WHOIN;REC=-80,,F,ASCII ECHO /ALLEGRO/PUB/WHOIS -C !WHO_URL > WHOIN RUN SH.HPBIN.SYS <WHOINAs well, if the email name is found to be missing dots (.), this script appends the name to a BADDOTS error file.
Mike Hornsby (mikeh@beechglen.com)
Co-founder/Chief Technical Officer
Beechglen Development Inc. (beechglen.com)
513-922-0509
MikeH@beechglen.com
Paul Gobes of Robelle Technical Support tested the command file with WHOIS and below are the results for a successful and unsuccessful email address.
Below are the results for a successful and unsuccessful email address.
:emailchk paulgobes@hotmail.com
hotmail.com
Whois Server Version 1.3
Domain names in the .com, .net, and .org domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.
Domain Name: HOTMAIL.COM
Registrar: NETWORK SOLUTIONS, INC.
Whois Server: whois.networksolutions.com
Referral URL: www.networksolutions.com
Name Server: NS1.HOTMAIL.COM
Name Server: NS3.HOTMAIL.COM
Updated Date: 12-may-2000
>>> Last update of whois database: Mon, 18 Dec 2000 11:02:57 EST <<<
The Registry database contains ONLY .COM, .NET, .ORG, .EDU domains and
Registrars.
Found InterNIC referral to whois.networksolutions.com.
Registrant:
Hotmail Corporation (HOTMAIL-DOM)
1065 La Avenida
Mtn. View, CA 94043
US
Domain Name: HOTMAIL.COM
Administrative Contact, Technical Contact, Billing Contact:
Records, Custodian of (COR58) enforce_policy@HOTMAIL.COM
MSN Hotmail
1065 La Avendia
Mtn. View , CA 94043
(650) 693-7066 (FAX) (650) 693-7061
Record last updated on 12-May-2000.
Record expires on 28-Mar-2001.
Record created on 27-Mar-1996.
Database last updated on 18-Dec-2000 14:23:20 EST.
Domain servers in listed order:
NS1.HOTMAIL.COM 216.200.206.140
NS3.HOTMAIL.COM 209.185.130.68
End Run
:emailchk paul@jsdlfkjslfj.com jsdlfkjslfj.com Whois Server Version 1.3 Domain names in the .com, .net, and .org domains can now be registered with many different competing registrars. Go to http://www.internic.net for detailed information. No match for "JSDLFKJSLFJ.COM". >>> Last update of whois database: Mon, 18 Dec 2000 11:02:57 EST <<< The Registry database contains ONLY .COM, .NET, .ORG, .EDU domains and Registrars. End Run
Paul couldn't find any error codes returned by WHOIS, nor did the documentation make any reference to them. This makes it difficult to separate the failed domains from the successful domains.
Another restriction is that WHOIS defaults to a server that only finds .COM, .NET, .ORG and .EDU domains, but not national domains such as robelle.ca (Canada) and bbc.co.uk (United Kingdom). You could modify the command file to point to a different server using the '-h server' flag, if your addresses were from a different domain. A final problem is that WHOIS can fail due to heavy network traffic.
So it doesn't really make sense to use WHOIS in batch to check thousands of email addresses.
Here is a job to extract all the valid email addresses from the Ecometry cust-xref dataset and convert them into a series of EMAILCHK command file calls.
!SUPR BA macord,1,DOALL get cust-xref def email,xref-no[3],47 def emtype,xref-no[1],2 if emtype='EM','EU' and email=='?@&@?@.?@' extract cust-edp extract "emailchk ",email output mydsnschk,temp exit ! !mydsnchk
This task inserts the string "emailchk" in front of the EMAIL address in
each record, making it into an invocation of the EMAILCHK command file. Then
just type MYDNSCHK to invoke EMAILCHK once per entry in the EMAILS file,
producing a long listing of WHOIS output that you can review manually.
[Note: an improvement on this would be to add a second parameter to EMAILCHK
with the customer id number, then add CUST-EDP to the command file
invocation: ext ",",cust-edp; this would allow you to print out which
customer has which email address in the listing.]
I call the command file veremail.xeq.
You pass it a domain name (the part of
the email address after the "@") and a customer number. It reports whether
the domain is valid or not, using the same criteria that Internet mail
systems use (looking for either MX or A type DNS records). It can be
modified to output only bad addresses in a way that could be imported back
into Ecometry (or whatever) so the bad addresses could be deleted, or just
create a list of bad addresses that a customer service team could manually
investigate.
parm domain,custno=0 setvar hpautocont TRUE purge NSRES > $NULL purge NSRES,temp > $NULL purge NSCOMM > $NULL purge NSCOMM,temp > $NULL purge NSINX > $NULL purge NSINX,temp > $NULL purge NSOUTX > $NULL purge NSOUTX,temp > $NULL purge NSERRX > $NULL purge NSERRX,temp > $NULL echo set type=any > NSINX echo !domain >> NSINX save NSINX echo /BIND/PUB/bin/nslookup !NSOUTX 2!>NSERRX >NSCOMM save NSCOMM xeq sh.hpbin.sys "./NSCOMM" grep.hpbin.sys "'Non-existent ' NSERRX" > NSRES grep.hpbin.sys "'Unrecognized command: ' NSOUTX" >> NSRES setvar _okdomain TRUE if finfo("NSRES","exists") then if finfo("NSRES","EOF")>0 then setvar _okdomain FALSE endif endif if !_okdomain then echo Domain name "!domain" for customer#!custno is ok else echo Domain name "!domain" for customer#!custno is invalid endif purge NSRES > $NULL purge NSRES,temp > $NULL purge NSCOMM > $NULL purge NSCOMM,temp > $NULL purge NSINX > $NULL purge NSINX,temp > $NULL purge NSOUTX > $NULL purge NSOUTX,temp > $NULL purge NSERRX > $NULL purge NSERRX,temp > $NULL setvar hpautocont TRUE
If you look closely, you will see that this command file also outputs the customer number with each result, making it possible to update the original database.
What to do? Working with Paul Gobes at Robelle we came up with a strategy to save already checked domain names so they don't have to be checked again on the Internet. Once the first big processing was done, the regular checking would be much faster. We also adjusted the job to strip off the user name and only test the domain part of the email address, thereby eliminating many duplicates (this was done with Qedit and regular expressions - thanks to Dave Lo at Robelle for the help on this).
Here is Paul's 7-step plan to do what needs doing. Note: one crucial step requires Qedit, which the US Mint has, but not everyone else will.. so some users will need to buy Qedit ;-)
1) Get emails that need testing:
SUPR
BA macord,1,DOALL
get cust-xref
def custz9,1,9,display
def email,xref-no[3],47
def emtype,xref-no[1],2
def domain,1,47
table t1,email,file,emailok.data.sgaii
if emtype='EM','EU' and email=='?@&@?@.?@' and &
email >< '?@ ?@' and NOT $lookup(t1,email)
extract custz9 = cust-edp
extract domain = email
extract email
out file1,link
exit
2) Use Qedit to trim the username from the domain. Please note that Qedit is editing a self-describing file; it retains the user labels between the Text and Keep so that the file can still be input into Suprtool and the field attributes will be recognized. Qedit edits a single field by doing a Set Left and Set Right to set margins.
QEDIT
set lang text
set incr .001
set work labels on {keep as link file}
text file1
set left 10 {skip the cust acct#}
set right 56 {skip the full email}
change "(.*@)(.*)" (reg) "\2" @ {use regular expressions}
set left
set right
keep file2,yes
exit
3) Use Suprtool to remove duplicates and the customer #
SUPR
in file2
sort domain
dup none keys
ext "domcheck ", domain
out file3
exit
4) Now create the domcheck command file
SUPR
in *;out domcheck,temp;exit
PARM domain = "robelle.com"
setvar hpautocont TRUE
purge NSRES > $NULL
purge NSCOMM > $NULL
purge NSINX > $NULL
purge NSOUTX > $NULL
purge NSERRX > $NULL
echo set type=any > NSINX
echo !domain >> NSINX
save NSINX
echo /SYS/UTIL/NSLOOKUP !NSOUTX 2!>NSERRX >NSCOMM
save NSCOMM
xeq sh.hpbin.sys "./NSCOMM"
grep.hpbin.sys "'Non-existent ' NSERRX" > NSRES
setvar _okdomain TRUE
if finfo("NSRES","exists") then
if finfo("NSRES","EOF")>0 then
setvar _okdomain FALSE
endif
endif
if not !_okdomain then
echo !domain >> badadds
endif
purge NSRES > $NULL
purge NSCOMM > $NULL
purge NSINX > $NULL
purge NSOUTX > $NULL
purge NSERRX > $NULL
setvar hpautocont TRUE
!eod
5) Invoke the domain checking command file
!file3
6) Use Suprtool to find the customer #s for the bad addressess
SUPR
in file2
table t1,domain,sorted,badadds
if $lookup(t1,domain)
ext "Customer# ", custz9, "has bad domain: ", email
out *
xeq
7) Append the good emails to emailok.data.sgaii
in file2
table t2,domain,sorted,badadds
if not $lookup(t2,domain)
ext email
out emailok.data.sgaii,append
exit
As you can see, this idea is still a work in progress. Please send any further improvements to me and to support@robelle.com - thanks.
Chris.Bartram@usmint.treas.gov
January 2, 2001
|
|
|---|