Service ID | HtmlTextExtractor | ||||||||
---|---|---|---|---|---|---|---|---|---|
Service Name | HTML Text Extractor | ||||||||
Service Type | Other | ||||||||
Service Description | This service separates an HTML document into its texts and an HTML skeleton. For example, when this service receives the HTML document
"<html> <body> <h1>Weather</h1> <div>It's fine today.</div> </body> </html>", it outputs an array of ($1, Weather) and ($2, It's fine today.) together with the HTML skeleton "<html><body><h1>$1</h1><div>$2</div></body></html>". You can generate the HTML document in another language by replacing each key ($x) in the skeleton with the corresponding translation; in the above example, replace $1 and $2 with the translations of "Weather" and "It's fine today." respectively. The service interface is defined as follows.
<OPERATION>
HTMLDocumentSeparation separate(String htmlDocument)
<INPUT>
htmlDocument - a document in escaped HTML, such as &lt;html&gt;, &lt;h1&gt;, and so on.
<OUTPUT>
HTMLDocumentSeparation{ CodeAndText[] codesAndTexts; String skeletonHTML; }
codesAndTexts - an array of IDs and the texts surrounded by a pair of HTML tags.
skeletonHTML - the HTML document in which each text is replaced with the corresponding ID.
CodeAndText{ String code; String text; }
code - the ID.
text - the text surrounded by a pair of HTML tags.
<EXAMPLE>
(SOAP request)
<soapenv:Envelope>
  <soapenv:Header/>
  <soapenv:Body>
    <separate>
      <htmlDocument>&lt;html&gt;&lt;body&gt;&lt;h1&gt;Weather&lt;/h1&gt;&lt;div&gt;It's fine today&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;</htmlDocument>
    </separate>
  </soapenv:Body>
</soapenv:Envelope>
(SOAP response)
<soapenv:Envelope>
  <soapenv:Body>
    <separateResponse>
      <separateReturn>
        <codesAndTexts>
          <codesAndTexts> <code>$1</code> <text>Weather</text> </codesAndTexts>
          <codesAndTexts> <code>$2</code> <text>It's fine today</text> </codesAndTexts>
        </codesAndTexts>
        <skeletonHtml> <![CDATA[<html><body><h1>$1</h1><div>$2</div></body></html>]]> </skeletonHtml>
      </separateReturn>
    </separateResponse>
  </soapenv:Body>
</soapenv:Envelope> |
||||||||
Atomic or Composite | Atomic Service | ||||||||
Languages |
|
||||||||
Purpose of Use | Non-profit, Research | ||||||||
Type of Application Control | Under Client Control, Under Server Control | ||||||||
Federation Use | Allowed | ||||||||
Permitted Users | For All Users | ||||||||
WSDL | |||||||||
Wrapper Source Code | - | ||||||||
Resource in Use |
|
||||||||
Provider | Language Infrastructure Group, National Institute of Information and Communications Technology | ||||||||
Registration Date | 2009/11/18 | ||||||||
Last Update Date | 2010/12/03 | ||||||||
Status | Run |
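
The separation and reassembly described under Service Description can be illustrated with a small local sketch. This is not the service's implementation, which is not published here; it only mimics the documented behavior using Python's standard `html.parser`. The function name `fill_skeleton`, the choice to drop whitespace-only text between tags, and the re-escaping of translations are assumptions made for this example.

```python
import re
from html import escape
from html.parser import HTMLParser


class _Separator(HTMLParser):
    """Collects text nodes and builds a skeleton with $n placeholders."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.codes_and_texts = []  # list of (code, text) pairs
        self.skeleton_parts = []

    def handle_starttag(self, tag, attrs):
        # Re-emit the start tag exactly as it appeared in the input.
        self.skeleton_parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.skeleton_parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.skeleton_parts.append(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # assumption: whitespace between tags is not a text
        code = f"${len(self.codes_and_texts) + 1}"
        self.codes_and_texts.append((code, text))
        self.skeleton_parts.append(code)


def separate(html_document):
    """Return (codes_and_texts, skeleton) for an unescaped HTML string."""
    parser = _Separator()
    parser.feed(html_document)
    parser.close()
    return parser.codes_and_texts, "".join(parser.skeleton_parts)


def fill_skeleton(skeleton, translations):
    """Replace each $n key with its translation (hypothetical helper).

    Keys are matched as whole numbers so that $1 does not clobber $10;
    keys without a translation are left in place.
    """
    def substitute(match):
        code = match.group(0)
        if code not in translations:
            return code
        return escape(translations[code], quote=False)

    return re.sub(r"\$\d+", substitute, skeleton)


if __name__ == "__main__":
    source = ("<html> <body> <h1>Weather</h1> "
              "<div>It's fine today.</div> </body> </html>")
    codes, skeleton = separate(source)
    print(codes)
    print(skeleton)
    # Placeholder strings stand in for real translations.
    print(fill_skeleton(skeleton, {"$1": "Weather (translated)",
                                   "$2": "It's fine today. (translated)"}))
```

The sketch ignores comments, doctype declarations, and script or style content, and a literal `$n` inside an attribute value would be mistaken for a key, so it is only suitable for simple documents like the one in the example.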