Class OCR.Options

  • All Implemented Interfaces:
    java.lang.Cloneable
    Enclosing class:
    OCR

    public static class OCR.Options
    extends java.lang.Object
    implements java.lang.Cloneable
    A container for the options relevant for using OCR on Region or Image.

    Use new OCR.Options() to create a new option set.

    Use OCR.globalOptions() to access the global options.

    If you need more detail on individual settings, consult the Tesseract docs.

    See Also:
    Tesseract docs
    • Constructor Detail

      • Options

        public Options()
        Creates a new Options set from the initial default settings.

        For the default settings, see OCR.reset().

    • Method Detail

      • clone

        public OCR.Options clone()
        Makes a copy of this Options set.
        Returns:
        a new Options set as a copy of this one
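
        The following is a minimal sketch of why a copy is useful: a variant can be derived without changing the original set. DemoOptions is a hypothetical stand-in written for this illustration, not the SikuliX class; it only mirrors the documented pattern of fluent setters that return this plus a clone() that returns an independent copy. The defaults eng and 3 are taken from the sample toString() output on this page.

```java
// Hypothetical stand-in for illustration only; this is NOT org.sikuli.script.OCR.Options.
final class DemoOptions implements Cloneable {
    private String language = "eng"; // assumed default, as in the sample toString() output
    private int psm = 3;             // assumed default, as in the sample toString() output

    // Fluent setters return this so that calls can be chained.
    DemoOptions language(String language) { this.language = language; return this; }
    DemoOptions psm(int psm) { this.psm = psm; return this; }

    String language() { return language; }
    int psm() { return psm; }

    @Override
    public DemoOptions clone() {
        try {
            // A field-by-field copy is sufficient here: all fields are immutable values.
            return (DemoOptions) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen, the class implements Cloneable
        }
    }

    public static void main(String[] args) {
        DemoOptions base = new DemoOptions().language("deu");
        DemoOptions copy = base.clone().psm(7); // change only the copy
        System.out.println("base psm=" + base.psm() + ", copy psm=" + copy.psm());
    }
}
```

        With the real class, the same idea would presumably apply to a copy of the global options, so that a special case does not alter the shared settings.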
      • reset

        public OCR.Options reset()
        Resets this Options set to the initial defaults.
        Returns:
        this
        See Also:
        OCR.reset()
      • toString

        public java.lang.String toString()
        Current state of this Options as some formatted lines of text.
         OCR.Options:
         data = ...some-path.../tessdata
         language(eng) oem(3) psm(3) height(15,1) factor(1,99) dpi(96)
         configs: conf1, conf2, ...
         variables: key:value, ...
         
        Overrides:
        toString in class java.lang.Object
        Returns:
        the formatted text as shown above
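
        For logging or tests it can be handy to pick individual values out of the settings line shown above. The sketch below assumes the name(value) token shape of that sample line; OptionsLineParser and parse are hypothetical names, and the text format is not documented as a stable contract, so treat this as an illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of SikuliX.
final class OptionsLineParser {
    // Matches tokens of the form name(value), e.g. "psm(3)" or "height(15,1)".
    private static final Pattern TOKEN = Pattern.compile("(\\w+)\\(([^)]*)\\)");

    // Returns the tokens of one settings line as name -> raw value, in order of appearance.
    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "language(eng) oem(3) psm(3) height(15,1) factor(1,99) dpi(96)";
        System.out.println(parse(line));
    }
}
```

        The values are kept as raw strings because the sample shows height and factor with a comma, so their numeric interpretation is left to the caller.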
      • oem

        public int oem()
        Gets the OEM (OCR engine mode) of this Options set.
        Returns:
        the OEM as int
        See Also:
        OCR.OEM
      • oem

        public OCR.Options oem(int oem)
        Sets the OEM (OCR engine mode) of this Options set.
        Parameters:
        oem - the OEM as int
        Returns:
        this Options
        See Also:
        OCR.OEM
      • oem

        public OCR.Options oem(OCR.OEM oem)
        Sets the OEM (OCR engine mode) of this Options set.
        Parameters:
        oem - the OEM as enum constant
        Returns:
        this Options
        See Also:
        OCR.OEM
      • psm

        public int psm()
        Gets the PSM (page segmentation mode) of this Options set.
        Returns:
        the PSM as int
        See Also:
        OCR.PSM
      • psm

        public OCR.Options psm(int psm)
        Sets the PSM (page segmentation mode) of this Options set.
        Parameters:
        psm - the PSM as int
        Returns:
        this Options
        See Also:
        OCR.PSM
      • psm

        public OCR.Options psm(OCR.PSM psm)
        Sets the PSM (page segmentation mode) of this Options set.
        Parameters:
        psm - the PSM as enum constant
        Returns:
        this Options
        See Also:
        OCR.PSM
      • resetPSM

        public OCR.Options resetPSM()
        Sets the PSM of this Options set to -1.

        This causes Tess4J not to set the PSM at all.
        Only use it if you know what you are doing.

        Returns:
        this Options
      • asLine

        public OCR.Options asLine()
        Configure Options to recognize a single line.
        Returns:
        this Options
      • asWord

        public OCR.Options asWord()
        Configure Options to recognize a single word.
        Returns:
        this Options
      • asChar

        public OCR.Options asChar()
        Configure Options to recognize a single character.
        Returns:
        this Options
      • language

        public java.lang.String language()
        Gets the current language.
        Returns:
        the language short string
        See Also:
        language(String)
      • language

        public OCR.Options language(java.lang.String language)
        Sets the language short string.

        The string must not be null or empty (see Settings.OcrLanguage for a usable fallback).

        According to the Tesseract rules, this is a string of three lowercase letters such as eng, deu, fra, rus, ....

        In special cases it may take the form xxx_yyy (chi_sim), xxx_yyyy (deu_frak) or xxx_yyy_zzzz (chi_tra_vert), but it is always all lowercase.

        Make sure that the corresponding ....traineddata file is in the datapath/tessdata folder, at the latest when an OCR feature is used.

        Parameters:
        language - the language string
        Returns:
        this Options
        See Also:
        Tesseract language files
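
        The naming rules described above can be checked before the string is passed to language(String). The sketch below covers only the shapes listed on this page (xxx, xxx_yyy, xxx_yyyy, xxx_yyy_zzzz); LanguageCodes and looksValid are hypothetical names, and a match says nothing about whether the traineddata file actually exists.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of SikuliX or Tesseract.
final class LanguageCodes {
    // Three lowercase letters, optionally followed by one or two
    // underscore-separated suffixes of three or four lowercase letters.
    private static final Pattern SHAPE = Pattern.compile("[a-z]{3}(_[a-z]{3,4}){0,2}");

    // True if the string has one of the shapes described in the text above.
    static boolean looksValid(String language) {
        return language != null && SHAPE.matcher(language).matches();
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"eng", "chi_sim", "deu_frak", "chi_tra_vert", "ENG", ""}) {
            System.out.println("'" + candidate + "' -> " + looksValid(candidate));
        }
    }
}
```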
      • dataPath

        public java.lang.String dataPath()
        Gets the current datapath in this Options set.

        The value might be null if no OCR feature has been used so far.

        If it is null, it is evaluated when an OCR feature is used, either to the default SikuliX path or to Settings.OcrDataPath (if set).

        Returns:
        the current Tesseract datapath in this Options
      • dataPath

        public OCR.Options dataPath(java.lang.String dataPath)
        Sets the folder in which Tesseract finds the language and configs files.

        The files are expected in the tessdata subfolder (the path may be given without the trailing /tessdata).

        TAKE CARE that everything is in place when an OCR feature is used.

        If the value is null, it is evaluated when an OCR feature is used, either to the default SikuliX path or to Settings.OcrDataPath (if set).

        Parameters:
        dataPath - the absolute path of the folder as a string
        Returns:
        this Options
        See Also:
        language(String)
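
        Because everything has to be in place when an OCR feature is used, it may help to verify the folder layout up front. The sketch below uses only java.nio and assumes the layout described above, with language files named after the language short string plus .traineddata; TessdataCheck, tessdataFolder and hasLanguage are hypothetical names, not SikuliX API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of SikuliX.
final class TessdataCheck {
    // Accepts the data path with or without the trailing "tessdata" element
    // and returns the path of the tessdata folder itself.
    static Path tessdataFolder(String dataPath) {
        Path path = Paths.get(dataPath);
        Path last = path.getFileName();
        if (last != null && last.toString().equals("tessdata")) {
            return path;
        }
        return path.resolve("tessdata");
    }

    // True if <tessdata>/<language>.traineddata exists as a regular file.
    static boolean hasLanguage(String dataPath, String language) {
        return Files.isRegularFile(tessdataFolder(dataPath).resolve(language + ".traineddata"));
    }

    public static void main(String[] args) {
        if (args.length == 2) {
            System.out.println(hasLanguage(args[0], args[1]));
        } else {
            System.out.println("usage: TessdataCheck <dataPath> <language>");
        }
    }
}
```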
      • smallFont

        public OCR.Options smallFont()
        Convenience: configures the image optimization of this Options set for small fonts.

        Might give better results for text in small fonts with a pixel height below 12 (font sizes below 10).

        Returns:
        this Options
      • textHeight

        public float textHeight()
        Gets the current base value for image optimization before OCR.
        Returns:
        the current text height
        See Also:
        textHeight(float)
      • textHeight

        public OCR.Options textHeight(float height)
        Configures the image optimization.

        The value should be the height in pixels (the average, if heights vary) of an uppercase X in the image's text.

        NOTE: Only try this when the defaults do not lead to acceptable results.

        Parameters:
        height - a number of pixels
        Returns:
        this Options
      • fontSize

        public OCR.Options fontSize(int size)
        Configures the image optimization.

        The value should be the font size (the average, if sizes vary); it is used as the base for internally calculating the textHeight().

        NOTE: Only try this when the defaults do not lead to acceptable results.

        Parameters:
        size - of a font
        Returns:
        this Options
      • resizeInterpolation

        public OCR.Options resizeInterpolation(org.sikuli.script.Element.Interpolation method)
        INTERNAL: (under investigation).

        Should not be used; not supported.

        See Element.Interpolation for the available methods.

        Parameters:
        method - the interpolation method
        Returns:
        this Options
      • bestDPI

        public OCR.Options bestDPI(int dpi)
        INTERNAL: (under investigation).

        Should not be used; not supported.

        Parameters:
        dpi - the dpi value
        Returns:
        this Options
      • userDPI

        public OCR.Options userDPI(int dpi)
        INTERNAL: (under investigation).

        Should not be used; not supported.

        Parameters:
        dpi - the dpi value, 70 .. 2400
        Returns:
        this Options
      • variable

        public OCR.Options variable(java.lang.String key,
                                    java.lang.String value)
        Sets a variable for Tesseract.

        Only use this if you know what you are doing; consult the Tesseract docs.

        Parameters:
        key - the key
        value - the value
        Returns:
        this Options
        See Also:
        Tesseract docs
      • configs

        public java.util.List<java.lang.String> configs()
        Gets the current configs.
        Returns:
        currently stored names of configs files
      • configs

        public OCR.Options configs(java.lang.String... configs)
        Sets one or more configs file names.

        Only use this if you know what you are doing; consult the Tesseract docs.

        Parameters:
        configs - one or more configs filenames
        Returns:
        this Options
        See Also:
        Tesseract docs
      • configs

        public OCR.Options configs(java.util.List<java.lang.String> configs)
        Sets a list of configs file names.

        Only use this if you know what you are doing; consult the Tesseract docs.

        Parameters:
        configs - a list of configs filenames
        Returns:
        this Options
        See Also:
        Tesseract docs