How the **** do you parse Japanese in a program?

So I'm making a program in Python that, if I'm lucky, will be able to parse Japanese words and sayings. However, the lack of spaces between words makes it unbelievably difficult. I looked into Yomitan and it seems to use prefix trees or something like that.
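To show what I mean by prefix trees, here is a rough sketch of the idea as I understand it. This is not Yomitan's actual code: the `Trie` class, the `prefixes_at` method, and the tiny word list are all made up for illustration. A real dictionary would have hundreds of thousands of entries.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps one character to the next node
        self.is_word = False  # True if the path to this node spells a dictionary word


class Trie:
    """Toy prefix tree (hypothetical helper, not taken from any real tool)."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def prefixes_at(self, text, start):
        """Return every dictionary word that begins at text[start], shortest first."""
        node = self.root
        found = []
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break  # no dictionary word continues with this character
            if node.is_word:
                found.append(text[start:i + 1])
        return found


# Toy word list, assumed purely for this example.
TOY_WORDS = ["簡単", "な", "おや", "おやつ", "は", "いかが", "でしょう", "か"]
trie = Trie(TOY_WORDS)

sentence = "簡単なおやつはいかがでしょうか"
print(trie.prefixes_at(sentence, 3))  # candidates starting at お: ['おや', 'おやつ']
```

The useful part is that one walk down the tree gives you every candidate word starting at a position, so the caller can decide which one to keep.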

However, not even Yomitan parses some passages correctly. See: 、簡単なおやつはいかがでしょうか。

At least with my setup, it sees 簡単なおや… If it tried the longest matching section first it might work better, but I'm not sure that would be flawless, and Yomitan wasn't made for breaking down entire sentences in the first place.
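Here is a minimal sketch of the longest-match-first idea, just to test my hunch. The function name `longest_match_segment` and both toy dictionaries are my own assumptions; this is plain greedy matching, not how Yomitan or MeCab actually work.

```python
def longest_match_segment(text, dictionary):
    """Greedy segmentation: at each position take the longest dictionary word.

    Characters that match nothing are emitted as single-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
        else:
            # Nothing matched: fall back to a single unknown character.
            tokens.append(text[i])
            i += 1
    return tokens


sentence = "簡単なおやつはいかがでしょうか"

# Toy dictionary, assumed for this example only.
toy_dict = {"簡単", "な", "おや", "おやつ", "は", "いかが", "でしょう", "か"}
good = longest_match_segment(sentence, toy_dict)
print(good)  # ['簡単', 'な', 'おやつ', 'は', 'いかが', 'でしょう', 'か']

# Same dictionary plus the real word はい ("yes"): greedy matching now
# grabs はい and the rest of the sentence falls apart.
bad = longest_match_segment(sentence, toy_dict | {"はい"})
print(bad)  # ['簡単', 'な', 'おやつ', 'はい', 'か', 'が', 'でしょう', 'か']
```

So longest match does fix the おや vs. おやつ case, but a single extra dictionary entry is enough to break it again. That is presumably why real analyzers score whole candidate paths instead of committing greedily.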

Has anybody here had any success with breaking down Japanese sentences? How did you handle verb endings? Were there any unexpected difficulties you faced?

I've tried MeCab and will probably keep working with it, but it feels really clunky and forces kanji onto every lemma (base form).

by StorKuk69
