# Reference Types: Reverse Engineering

experiments/references_types.livemd

## Intro

At some point we noticed that Lexin Mobi's output was missing information compared with the original Lexin website.

These notes record how we worked out how to parse, and then present, the extra data from the original XML files.

## Types of Types

A quick check shows that the original XML files contain an extra piece of XML: tags with a `TYPE` attribute. We grepped all the files to find out what these `TYPE`s are:

```console
for file in swe_*.xml; echo $file; cat $file | grep "
```

### Unique TYPEs

Here are all the unique `TYPE` values we found across all the language files:

```console
for file in swe_*.xml; cat $file | grep "
```

### Some Random Extra Statistics

Some random statistics (on `swe_rus.xml`):

```console
> cat swe_rus.xml | grep '
> cat swe_rus.xml | grep '
> cat swe_rus.xml | grep '
> cat swe_rus.xml | grep '
> cat swe_rus.xml | grep '
> cat swe_rus.xml | grep '
> cat swe_rus.xml | grep '
> cat swe_rus.xml | grep '
```

## Going Deeper

Let's look more closely at each of the types we found.

### Type: Animation

```console
cat swe_rus.xml | grep -3 '
dra in luft i (och skicka ut luft ur) lungorna ²An:das andades andats
```

#### In the UI

Shows as a link "VISA FILM" (between the meaning and graminfo) to http://lexin.nada.kth.se/lang/lexinanim/andas.mp4 (not SWF!):

![](images/Animation%20reference%20in%20the%20UI.png)

### Type: Compare

```console
cat swe_rus.xml | grep -3 '
uppmärksamhet i fraser ak:t ta tillfället i akt
```

#### In the UI

Shows as '...subst. jämför "iakttar"' on the first line, after the "lyssna" link and the word type. This could be a link to the word, but the value sometimes contains numbers, which may point to sub-definitions.

![](images/Compare%20reference%20in%20the%20UI.png)

### Type: Phonetic

```console
cat swe_rus.xml | grep -3 '
alla (tillsammans) vardagligt al:esAm:an(s) sjung med allesamman(s)!
```

#### In the UI

Shows as a link next to the first "lyssna", with a URL like http://lexin.nada.kth.se/sound/allesammans.mp3 (swf -> mp3).

![](images/Phonetic%20reference%20in%20the%20UI.png)

> 💡 **Important Note**
>
> Swedish characters get converted to numeric codes, for example:
>
> * `urspårning.swf` links to http://lexin.nada.kth.se/sound/ursp0345rning.mp3
> * `pärlemor.swf` links to http://lexin.nada.kth.se/sound/p0344rlemor.mp3
> * `omvänt baksträck.swf` links to http://lexin.nada.kth.se/sound/omv0344nt%20bakstr0344ck.mp3
> * `död(s).swf` links to http://lexin.nada.kth.se/sound/d0366d(s).mp3
> * `dossié.swf` links to http://lexin.nada.kth.se/sound/dossi0351.mp3
>
> This table looks like a good source for these conversions: https://www.ic.unicamp.br/~stolfi/EXPORT/www/ISO-8859-1-Encoding.html

### Type: See

```console
cat swe_rus.xml | grep -3 '
A:de:
```

#### In the UI

Shows as "se VALUE" (where VALUE is the content of the `VALUE` attribute) after the "lyssna" link. We could possibly turn these words (occasionally separated by commas) into links.
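
#### Sketch: Phonetic file name to sound URL

The five conversions listed in the Important Note under Type: Phonetic are all consistent with one rule: each non-ASCII character is replaced by `0` followed by the three-digit octal value of its ISO-8859-1 byte (`å` is octal 345, `ä` is 344, `ö` is 366, `é` is 351), and a space becomes `%20`. This rule is inferred from those examples only, so treat it as an assumption rather than the documented behaviour of the Lexin site. The function names `encode_char` and `sound_url` below are hypothetical; a minimal sketch:

```python
import unicodedata

# Base URL taken from the examples in the Phonetic section.
SOUND_BASE_URL = "http://lexin.nada.kth.se/sound/"


def encode_char(ch: str) -> str:
    """Encode one character the way the example URLs appear to (assumed rule)."""
    if ch == " ":
        return "%20"
    if ord(ch) < 128:
        # ASCII characters, including parentheses, appear unchanged in the examples.
        return ch
    # Assumption: non-ASCII characters become a zero-padded 4-digit octal
    # ISO-8859-1 code, e.g. "å" (0xE5) -> "0345".
    # Raises UnicodeEncodeError for characters outside ISO-8859-1.
    code = ch.encode("latin-1")[0]
    return f"{code:04o}"


def sound_url(swf_name: str) -> str:
    """Build the .mp3 URL for a .swf file name found in a Phonetic reference."""
    # Normalise to NFC so that "a" + combining ring is treated as a single "å".
    name = unicodedata.normalize("NFC", swf_name)
    stem = name[: -len(".swf")] if name.endswith(".swf") else name
    return SOUND_BASE_URL + "".join(encode_char(c) for c in stem) + ".mp3"


if __name__ == "__main__":
    for example in ["urspårning.swf", "omvänt baksträck.swf", "död(s).swf"]:
        print(example, "->", sound_url(example))
```

Only the characters that appear in the examples have been checked against this rule; other punctuation may need different handling.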