Sublime Forum

How to run beautifulsoup for sublimetext?

#1

I’m trying to get this beautifulsoup repository working in ST3. So far I have copied the folder st3/bs4 into the Packages folder as bs4. Then I’m just trying to run this code:

import sublime_plugin


class FooCommand(sublime_plugin.TextCommand):

    def run(self, edit, **kwargs):
        from bs4 import BeautifulSoup

        print('debug starts-------->')
        # Parse deliberately broken markup (no closing tags).
        soup = BeautifulSoup("<p>Some<b>bad<i>HTML")
        print(soup.__class__)
        # The repaired document, then its pretty-printed form.
        print(soup)
        print(repr(soup.prettify()))
        print('debug ends-------->')

But for some reason the output I’m getting is this:

debug starts-------->
<class 'bs4.BeautifulSoup'>

''
debug ends-------->

I don’t know why. When I run the same code on Python 3.6 outside of the ST3 environment, with beautifulsoup4 installed from pip, I get the correct output:

debug starts-------->
<p>
 Some
 <b>
  bad
  <i>
   HTML
  </i>
 </b>
</p>
debug ends-------->

Any idea what’s going on here? :confused:

#2

That repository is meant to be a Sublime Text package dependency, so you haven’t installed it correctly. I would check the Sublime console, because when I do this I see Sublime throwing an error when it tries to load the bs4 package.

Try this:

  1. Remove the bs4 folder that you put in place above
  2. Create a file named dependencies.json in the folder of your package, and put the following in it:
    { "*": { "*": [ "bs4" ] } }
  3. Select Package Control: Satisfy Dependencies from the command palette (or restart Sublime, which will make PC notice that the dependency is missing and install it for you).
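
If you would rather generate the file in step 2 from a script, here is a minimal sketch. The helper name `write_dependencies_json` is made up for illustration. As I understand the Package Control format, the outer `"*"` selects the platform and the inner `"*"` selects the Sublime Text version, so treat the comment in the code as an assumption rather than a specification.

```python
import json
import os


def write_dependencies_json(package_dir, dependencies):
    """Write a dependencies.json that requests `dependencies` everywhere.

    Hypothetical helper: `package_dir` is the folder of your own package.
    """
    # Assumed meaning: outer "*" = any platform, inner "*" = any ST version.
    content = {"*": {"*": list(dependencies)}}
    path = os.path.join(package_dir, "dependencies.json")
    with open(path, "w") as handle:
        json.dump(content, handle, indent=4)
    return path
```

Calling `write_dependencies_json(your_package_folder, ["bs4"])` produces the same content as the one-liner in step 2, just pretty-printed.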

Package Control will install bs4 for you as a dependency, and if you look at the folder structure, it’s laid out differently than it is if you just copy that folder directly. Now the errors in the console at startup are gone, because Sublime isn’t trying to load the bs4 package as a plugin (it has no top-level Python files).
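
To see which files Sublime would treat as plugins in a given package folder, you can list the Python files sitting directly at the top level. This is only a rough sketch of that check, not how Sublime itself does it, and the function name `top_level_python_files` is invented for the example.

```python
import os


def top_level_python_files(package_dir):
    """Return the .py files directly inside `package_dir`, sorted by name.

    Files in subfolders are ignored on purpose: only top-level files
    are the ones this thread is concerned with.
    """
    return sorted(
        name
        for name in os.listdir(package_dir)
        if name.endswith(".py")
        and os.path.isfile(os.path.join(package_dir, name))
    )
```

Run against a manually copied bs4 folder, this should list files such as `__init__.py`; run against the folder that Package Control creates, the expectation is an empty list.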

Now your command works as you would expect:

plugins loaded
Package Control: Skipping automatic upgrade, last run at 2017-12-18 22:15:30, next run at 2017-12-18 23:15:30 or after
>>> view.run_command("foo")
debug starts-------->
/tmp/sublime_text_3/Data/Packages/bs4/all/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")

  markup_type=markup_type))
<class 'bs4.BeautifulSoup'>
<p>Some<b>bad<i>HTML</i></b></p>
'<p>\n Some\n <b>\n  bad\n  <i>\n   HTML\n  </i>\n </b>\n</p>'
debug ends-------->
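
The warning in that output can be silenced by naming the parser explicitly, as the message itself suggests. A small sketch, assuming bs4 is importable in your environment; `make_soup` and `PARSER` are names invented for this example.

```python
PARSER = "html.parser"  # the parser named in the warning text


def make_soup(markup, parser=PARSER):
    """Build a BeautifulSoup object with an explicit parser.

    The import is done inside the function so that merely loading this
    file does not fail when bs4 is not installed.
    """
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, parser)
```

Inside the command you would then call `make_soup("<p>Some<b>bad<i>HTML")` instead of calling `BeautifulSoup` directly.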

#3

@OdatNurd Hi there! As usual you’ve saved the day once again, so thank you very much :wink:

In any case, even though your answer is a very good one, one doubt still remains for me: why didn’t I see the console error you’re talking about? Take a look at this diff. It compares the startup console log with the bs4 package installed manually, as I described above, against the console log without any bs4 package:

--- from_file
+++ DPI scale: 1
@@ -7,9 +7,9 @@
 zip path: /D/software/sublimetext3/Packages
 zip path: /D/software/sublimetext3/Data/Installed Packages
 ignored_packages: ["Vintage"]
-pre session restore time: 0.172808
-startup time: 0.192808
-first paint time: 0.193808
+pre session restore time: 0.244265
+startup time: 0.266265
+first paint time: 0.267265
 reloading plugin Default.auto_indent_tag
 reloading plugin Default.block
 reloading plugin Default.comment
@@ -131,10 +131,6 @@
 SublimeLinter: pydocstyle linter loaded 
 reloading plugin Toggle Words.ToggleWords
 reloading plugin UnitTesting.ut
-reloading plugin bs4.__init__
-reloading plugin bs4.dammit
-reloading plugin bs4.diagnose
-reloading plugin bs4.element
 reloading plugin ColorPicker.sublimecp
 reloading plugin Miscelanea.commands
 reloading plugin Nasm.assembly
@@ -144,32 +140,20 @@
 reloading plugin Zebra.__init__
 reloading plugin Zebra.commands
 plugins loaded
-Loaded instruction set
-Loaded support set
-updating ANSI build settings...
 SublimeLinter: debug mode: on 
 SublimeLinter: temp directory: c:\users\kneda\appdata\local\temp\SublimeLinter3-KneDa 
 SublimeLinter: find_python(version='3', script=None, module=None) 
-SublimeLinter: find_python: requested version = {'major': 3, 'minor': None} 
+SublimeLinter: find_python: requested version = {'minor': None, 'major': 3} 
 SublimeLinter: find_windows_python: <= None 
 SublimeLinter: find_python: system python = None 
 SublimeLinter: find_python: <= (None, None, None, None) 
 SublimeLinter: no python 3 available to augment sys.path 
 SublimeLinter: WARNING: import of pep8 module in pep8 failed, linter will not work with python 3 code 
-SublimeLinter: flake8 version query: D:\software\python362_32\Scripts\flake8.exe --version 
-SublimeLinter: flake8 version: 3.5.0 
-SublimeLinter: flake8: (>= 2.2.2) satisfied by 3.5.0 
-SublimeLinter: flake8 activated: D:\software\python362_32\Scripts\flake8.exe (disabled in settings) 
-SublimeLinter: find_python(version=None, script='pep257', module=None) 
-SublimeLinter: find_python: default python = None 
-SublimeLinter: find_python: <= (None, None, None, None) 
-SublimeLinter: WARNING: pydocstyle deactivated, cannot locate 'pep257@python' 
-SublimeLinter: find_python(version=None, script='pep8', module=None) 
-SublimeLinter: find_python: default python = None 
-SublimeLinter: find_python: <= (None, None, None, None) 
-SublimeLinter: WARNING: pep8 deactivated, cannot locate 'pep8@python' 
+updating ANSI build settings...
+Loaded instruction set
+Loaded support set
 AutoPEP8:
     sublime: version=3131, platform=windows, arch=x64, packages_path=D:\software\sublimetext3\Data\Packages
 , installed_packages_path=D:\software\sublimetext3\Data\Installed Packages
     plugin: version=1.3.3
-    config: {'list-fixes': '', 'max-line-length': 79, 'file_menu_search_depth': 3, 'format_on_save': False, 'aggressive': 0, 'ignore': 'E24', 'indent-size': 4, 'select': '', 'avoid_new_line_in_select_mode': False, 'syntax_list': ['Python'], 'debug': True}
+    config: {'debug': True, 'list-fixes': '', 'avoid_new_line_in_select_mode': False, 'aggressive': 0, 'file_menu_search_depth': 3, 'format_on_save': False, 'max-line-length': 79, 'syntax_list': ['Python'], 'select': '', 'ignore': 'E24', 'indent-size': 4}

As you can see, there isn’t any error that would have made me suspect that the way I was installing it was wrong. My first assumption was that the package was buggy, so I started debugging it, but then I just gave up.

Could you please clarify this so it won’t happen to me next time?

Anyway, once again, tyvm!

#4

Interesting. I’ll have to check when I get back to my development machine later, but the error I was seeing was happening in one of the bs4 modules as it was loading because it was trying to import something that didn’t exist.

Do you get the same results if you try this with a portable Sublime that has nothing but the bs4 module in it? Perhaps one of the other packages that loads before it is modifying sys.path to point at an external Python directory, allowing it to find the missing item?
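
One way to test that idea is to look for sys.path entries that live outside the Sublime folders. The following is a rough sketch you could paste into a plugin file. `paths_outside` is a made-up helper, and which folders count as "inside" is up to you (for a portable install, probably the install folder).

```python
import os
import sys


def paths_outside(roots, paths=None):
    """Return the entries of `paths` (default: sys.path) not under any root."""
    paths = sys.path if paths is None else paths
    norm_roots = [os.path.normcase(os.path.abspath(r)) for r in roots]
    outside = []
    for entry in paths:
        norm = os.path.normcase(os.path.abspath(entry))
        # An entry is "inside" if it is a root or sits below one.
        inside = any(
            norm == root or norm.startswith(root + os.sep)
            for root in norm_roots
        )
        if not inside:
            outside.append(entry)
    return outside
```

Anything this returns that points at a system Python install would be a candidate for where the missing module is being picked up.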

#5

Just for the record, what I described above was tested with the ST 3131 portable version. Right now I’m checking the fresh stable 3141 portable edition to see whether it’s worth upgrading or not. Maybe with this one I’ll get the same error you were getting; I’ll test it later and let you know.

As for your hypothesis about sys.path being tweaked before bs4 loads, it makes total sense. That’s also something I’ll try to check out later :wink:

#6

I was using the latest development build when I did it, but using dev build 3131 clean I get the same result:

reloading plugin bs4.__init__
reloading plugin bs4.dammit
reloading plugin bs4.diagnose
Traceback (most recent call last):
  File "/tmp/sublime_text_3/sublime_plugin.py", line 109, in reload_plugin
    m = importlib.import_module(modulename)
  File "./python3.3/importlib/__init__.py", line 90, in import_module
  File "<frozen importlib._bootstrap>", line 1584, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1565, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1532, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 584, in _check_name_wrapper
  File "<frozen importlib._bootstrap>", line 1022, in load_module
  File "<frozen importlib._bootstrap>", line 1003, in load_module
  File "<frozen importlib._bootstrap>", line 560, in module_for_loader_wrapper
  File "<frozen importlib._bootstrap>", line 868, in _load_module
  File "<frozen importlib._bootstrap>", line 313, in _call_with_frames_removed
  File "/tmp/sublime_text_3/Data/Packages/bs4/diagnose.py", line 7, in <module>
    import cProfile
  File "./python3.3/cProfile.py", line 9, in <module>
ImportError: No module named '_lsprof'
reloading plugin bs4.element
plugins loaded

I’m not familiar with the library itself, so I don’t know if that particular error is directly related to it not working if it’s used that way. On the other hand if you don’t see that error then your environment is different in some way, which could also be the cause. Also the version of bs4 that you’re using was likely reworked to be a Sublime dependency, so it wouldn’t surprise me if it expects to be loaded differently than how Sublime does it while it’s loading packages or something.
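
A quick way to compare environments is to check whether the module named in the traceback can be imported at all. This is just a sketch with a made-up helper name, `can_import`; it only tells you whether the import succeeds, not where the module comes from.

```python
def can_import(module_name):
    """Return True if `module_name` imports cleanly, False on ImportError."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True
```

Running `can_import("_lsprof")` and `can_import("cProfile")` from a plugin in both setups should show whether the two environments differ on exactly this point.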

I’m sort of curious why you’re normally using an older (development) version and thinking about upgrading to a slightly less old but still not recent development version. I would generally expect that someone using a development build would be using the most recent one available, and if you’re going to run a previous version it would be the stable version (which is 3143 currently).

Is there something broken in the latest stable that makes you want to lag behind it?

#7

ST should not be loading bs4 as a plugin and scanning it for plugins. This looks like the bs4 dependency is badly structured. I’m on mobile, so I’m not gonna check it right now, but this seems like something that definitely should be fixed. It doesn’t explain the import error though. If it only happens in an old build, I wouldn’t even bother tbh. Something in ST’s Python environment might have changed.
