Today, June 28, 2002, the latest version of HowNet, "HowNet 2002", is officially released.
The authors would like to call your attention to the following.
The difference between HowNet 2002 and HowNet 2000
HowNet 2002 inherits the theory and concepts of HowNet 2000. That is to say, the two do not differ in their conceptual design.
HowNet 2002 has been greatly improved and strengthened in the following aspects:
- The vocabulary in HowNet 2002 has improved greatly in both quantity and quality. Nearly 10,000 Chinese words and expressions
have been added. In addition, many English equivalents have been adjusted: explanatory English translations have been replaced
by idiomatic English words and expressions where possible. On this basis, a large number of English synonyms have been added.
- The improvement of HowNet KDML (Knowledge Database Mark-up Language) is the most significant revision in HowNet 2002. The
original KDML had a linear structure, whereas HowNet 2002 adopts an embedded tree structure. Relationships between concepts,
which the original KDML represented implicitly or even ambiguously, are represented explicitly in the new KDML.
Take the word “tomb” as an example; its definitions in HowNet 2002 and HowNet 2000, respectively, are as follows:
In the old definition, it is unclear whether the sememe “die|死” refers to the sememe “human|人” or to the sememe
“facilities|设施”, and the sememe “space|空间” is related to “facilities|设施” only implicitly. The definition
in HowNet 2002 is much more precise. Moreover, because the new structure allows definitions to be refined, the defining
capability of the new KDML is greatly improved.
For example, in HowNet 2002 the words “body bag”, “autopsy”, etc. share the part of their definitions that indicates “body”:
- The sememes for attributes, attribute values, quantities and quantity values have been substantially adjusted. In HowNet 2000
every attribute value (adjective) is coded by a specific value (usually the third item) together with its corresponding attribute
(usually the second item). For instance, the word “healthy” is coded as follows:
while one of the meanings of the word “heavy” (as in heavy rain) is coded as:
In other words, HowNet 2000 uses a combination of attribute and value to differentiate meanings. In HowNet 2002 we adopt a new
method that uses a single unique value. The word “healthy” is coded as:
and the word “heavy” (as in heavy rain) is coded as:
This change has brought a substantial increase in the number of sememes: HowNet 2000 has over 1,500 sememes, whereas HowNet 2002 has around 2,200.
- The HowNet 2002 browser has been greatly improved: more functions have been added, and a new module, “Event and Role”, has been merged in.
- We have regularized and analyzed all the structure patterns in HowNet – Chinese Message Structure Bank.
A computer-aided tool for extracting examples and discovering new structure patterns from corpora is being developed.
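As a rough illustration of the difference between a linear definition and an embedded tree definition described in the KDML item above, the following Python sketch may help. It is not KDML syntax and does not reproduce any actual HowNet definition. Only the four sememes are taken from the text; the order of the flat list, the `Node` class, the helper functions, the relation labels `relation_a`, `relation_b`, `relation_c`, and the choice to attach “die|死” to “human|人” are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One sememe plus its explicitly labelled sub-nodes (hypothetical model)."""
    sememe: str
    children: dict = field(default_factory=dict)  # relation label -> Node


def parent_of(root, target):
    """Return the sememe that directly governs `target` in the tree, or None."""
    for child in root.children.values():
        if child.sememe == target:
            return root.sememe
        found = parent_of(child, target)
        if found is not None:
            return found
    return None


def candidates_in_flat(flat_def, target):
    """In a flat list nothing says what `target` relates to: every other item is a candidate."""
    return [s for s in flat_def if s != target]


# Linear style: a plain sequence of sememes (order assumed for illustration).
flat = ["facilities|设施", "space|空间", "human|人", "die|死"]

# Tree style: nesting makes each relation explicit. The relation labels are
# placeholders, and this is only one possible reading of the concept.
tree = Node("facilities|设施", {
    "relation_a": Node("space|空间"),
    "relation_b": Node("human|人", {"relation_c": Node("die|死")}),
})

print(candidates_in_flat(flat, "die|死"))  # several candidates: ambiguous
print(parent_of(tree, "die|死"))           # exactly one governing sememe
```

In the flat form, a reader or program must guess which sememe “die|死” belongs to; in the nested form, the answer is read directly from the structure.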
In the download area, a HowNet 2002 Demo and a Chinese word and expression list are provided; both can be downloaded free of charge.
HowNet 2002 Demo
It includes around 10,000 Chinese words and expressions and around 15,000 English words and phrases, covering all the Chinese
single-character words and expressions contained in the full HowNet 2002. We selected the Chinese single-character
words and expressions because the sememes used in HowNet are based on 4,000 Chinese characters. We believe that users can get
a clear picture of HowNet as a whole by trying the Demo.
Chinese word and expression list
It lists over 60,000 Chinese words and expressions that HowNet 2002 covers. From the list you can see how HowNet has been updated
and estimate its capability for your application. The English word and expression list will be released by the end of this year.
As indicated at the “Symposium on HowNet” sponsored by the Chinese Information Processing Society of China in 2000,
HowNet 2002 will not be provided free of charge. It will adopt a membership system. Any user can contact our technical support
staff to obtain an application form for HowNet membership. A HowNet member can get the latest version of HowNet 2002 approximately
every three months free of charge.
HowNet 2000 will remain in the download area and can be downloaded as usual.
(86-010) 8238-2578 ext: 205
Department of Language Knowledge Research
Research Center of Computer & Language Information Engineering,
Chinese Academy of Sciences
Room 310 of Kequn Building (West Wing)
No. 257 Beisihuan Middle Road, Haidian District, Beijing, China